Per discussion on 10/4/2016, we're considering a demonstration for NDSC6 that is compelling to the library research data management community.

Basic premise:

  • Generalize the Clowder Tool Manager to work with other services, such as Dataverse or DSpace
  • Extend Dataverse to allow users to analyze datasets via Tool Manager applications (Jupyter/RStudio)
  • Do this work as much as possible in NDS Labs Workbench

Status:

  • Cloned the Dataverse repo into the Workbench using the Java development environment and built via Maven without error.  Was able to easily package the dvinstall.zip, which will be required to leverage the NDS Dataverse docker container.
  • Brief walkthrough of the code in Dataverse that supports TwoRavens integration.  Basically, if TwoRavens is enabled, Dataverse displays a button that links to TwoRavens for analysis.  Adding another button like this will be fairly straightforward.  The change will need to be built and packaged into a suitable dvinstall, and the ndslabs-dataverse Dockerfile will need to be extended to use it.
  • Created a custom Tool Manager Docker image based on "docker-in-docker", which will be required to run in Workbench. Launched Tool Manager in Workbench without issue.
  • Brief walkthrough of Tool Manager code to see what would need to be extended.
  • Brief evaluation of the Clowder Tool Manager integration, specifically the UI for managing running containers. We will likely want to replicate this in a UI for the Tool Manager.

Tasks:

  • Fork Dataverse repo
  • Implement Tool Manager button in Dataverse UI (following the TwoRavens model)
  • Fork Tool Manager repo
  • Extend Tool Manager to support non-Clowder-specific URLs and, if possible, local file mounts
  • Implement a basic UI for the Tool Manager that would allow users to stop/delete running analysis instances.

 

Estimate:

  • Dataverse changes: < 1 day
  • Tool Manager changes: < 1 day
  • Tool Manager UI: (Mike needs to provide estimate)
  • Demonstration video: < 4 hours for 10-minute video + presentation materials

Tool Manager in Labs Workbench

The Tool Manager developed for Clowder is a Python-based REST API that runs in a privileged Docker container. The toolserver Python script reads the toolconfig.json file and accepts requests to create tool instances, which it does by spawning additional Docker containers. Details of running tools are stored in /usr/local/data/instances.json (which raises questions of concurrency). A URL is returned with a port assigned from the Docker ephemeral range. This poses a few problems for running the Tool Manager in Labs Workbench:

  • Requires Docker:  This can be solved by our DinD approach used for Jenkins and Docker services
  • Requires Docker port range: This can be solved by using the Nginx proxy_pass (ala the Ingress controller)
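On the concurrency question around instances.json raised above, one possible mitigation is to serialize updates and write the file atomically. The following is only a sketch: update_instances and its mutate callback are hypothetical names, not part of the existing toolserver, and the lock only protects threads within a single process.

```python
import json
import os
import tempfile
import threading

# Hypothetical helper, not part of the toolserver: serializes updates to
# instances.json within one process and replaces the file atomically.
_lock = threading.Lock()


def update_instances(path, mutate):
    """Load the JSON file at `path`, apply `mutate(dict)`, and save atomically."""
    with _lock:
        try:
            with open(path) as f:
                instances = json.load(f)
        except FileNotFoundError:
            instances = {}
        mutate(instances)
        # Write to a temp file in the same directory, then rename it over the
        # original so readers never observe a half-written file.
        directory = os.path.dirname(os.path.abspath(path))
        fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
        try:
            with os.fdopen(fd, "w") as tmp:
                json.dump(instances, tmp, indent=4, sort_keys=True)
            os.replace(tmp_path, path)
        except BaseException:
            os.unlink(tmp_path)
            raise
        return instances
```

If the toolserver were ever run as multiple processes, a file lock or a small database would be needed instead; this sketch does not cover that case.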

The Labs Workbench "Tool Manager" (aka Analysis Toolbox) is a customized version of the original Clowder tools server that can run in Labs Workbench:

https://github.com/nds-org/ndslabs-toolmanager/

It includes the following changes:

  • Custom toolconfig.json contains JupyterLab and RStudio (for demonstration only)
  • Dockerfile adds DinD, nginx
  • entrypoint.sh starts DinD, nginx, and the toolserver
  • templates directory contains jinja templates used to generate the nginx.conf
  • toolserver added the following:
    • readTemplate to read a template file
    • writeNginxConf to write the nginx conf
    • reloadNginx to call nginx -s reload when the configuration changes
  • Each time an instance is updated or removed, the nginx.conf is written and reloaded.
  • The templates/tool.tmp files each contain an nginx location directive template with reverse proxy configuration settings, including tool-specific rules. When a new instance is added, a location directive is added whose path is the first 10 characters of the container ID and which proxies requests to the instance's ephemeral port.  This means that we don't need to open the Docker port range, but we do need to write rules for any new services going forward.
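The location-per-instance approach described above can be sketched roughly as follows. This is an illustration only: it uses Python's string.Template in place of the Jinja templates so that it has no dependencies, the template text is an assumption (the real templates carry tool-specific rules), and render_location / render_locations are hypothetical names rather than the toolserver's functions.

```python
from string import Template

# Assumed template text; the real templates/tool.tmp files are Jinja templates
# and may contain additional tool-specific rules.
LOCATION_TEMPLATE = Template("""\
location /$short_id/ {
    proxy_pass http://127.0.0.1:$port/;
    proxy_set_header Host $$host;
}
""")


def render_location(container_id, port):
    """Return an nginx location block for one running instance."""
    # The path prefix is the first 10 characters of the container ID.
    return LOCATION_TEMPLATE.substitute(short_id=container_id[:10], port=port)


def render_locations(instances):
    """Render one block per instance; `instances` maps container ID -> details."""
    return "".join(
        render_location(cid, details["port"])
        for cid, details in sorted(instances.items())
    )
```

The rendered blocks would then be placed into the generated nginx.conf before nginx is reloaded.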

It's also worth looking at the Jupyter TmpNB tool https://github.com/jupyter/tmpnb used by ytHub going forward.  They use a similar approach but with a nodejs proxy – and of course only support Jupyter.

 

Standalone Tool Server (NDSC6)

The goal here is to provide a standalone tool server that implements the basic management functions currently in Clowder:

  • View list of available tools
  • View a list of running tool instances
  • Launch a tool for a given dataset
  • Add a dataset to a running tool instance
  • Stop a running tool instance
  • Access a running tool instance

Additional features might include

  • Token-based access for remote applications (e.g., Dataverse)
  • Admin versus user access to running instances
  • Allow users to set passwords for running instances

tool-mgr-ui

Current REST endpoints

  • GET /tools
  • GET /instances?ownerId=<userId>
  • POST /instances
  • PUT /instances/<instance>
  • DELETE /instances/<instance>
  • GET /logs
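As a rough illustration of how a client might address these endpoints, the sketch below builds (but does not send) requests for the read and delete operations using only the standard library. The helper names and the base URL are assumptions. The POST and PUT calls are left out because their request bodies are not described here.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request


def build_request(base_url, method, path, query=None):
    """Build (but do not send) a Request for a Tool Manager endpoint."""
    url = base_url.rstrip("/") + path
    if query:
        url += "?" + urlencode(query)
    return Request(url, method=method)


def list_tools(base_url):
    # GET /tools
    return build_request(base_url, "GET", "/tools")


def list_instances(base_url, owner_id):
    # GET /instances?ownerId=<userId>
    return build_request(base_url, "GET", "/instances", query={"ownerId": owner_id})


def stop_instance(base_url, instance_id):
    # DELETE /instances/<instance>
    return build_request(base_url, "DELETE", "/instances/" + quote(instance_id, safe=""))
```

Passing one of these Request objects to urllib.request.urlopen would perform the call against a running Tool Manager.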

 

Dataverse flow (see diagram)

  • User accesses Dataverse dataset page. Next to each file is an "explore" button that links to the Tool Manager Dashboard
  • http://toolmanager/do?token=TOKEN&user=userId&dataset=datasetId
    • token: Tool Manager access token that encodes information about the service (e.g., to differentiate Clowder from Dataverse)
    • user: Service-internal user ID.  If not specified, defaults to "guest".
    • dataset: ID of the dataset in Dataverse/Clowder.  The plugin in Tool Manager must know what to do with it.
  • Tool Manager UI presents the user with several options:
    • Open dataset in new instance (select from list of available tools). This launches a new tool instance (POST)
    • Open dataset in existing instance (PUT)
    • Stop instance (DELETE)
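The dashboard link in the flow above could be built and parsed along these lines. This is a minimal sketch: the function names are hypothetical, the token is treated as an opaque string, and the only behavior taken from the notes is the parameter names and the "guest" default.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_dashboard_url(token, dataset_id, user_id=None, base="http://toolmanager/do"):
    """Build the link that the calling service (e.g., Dataverse) would render."""
    params = {"token": token, "user": user_id or "guest", "dataset": dataset_id}
    return base + "?" + urlencode(params)


def parse_dashboard_url(url):
    """Extract token, user, and dataset on the Tool Manager side."""
    query = parse_qs(urlsplit(url).query)
    return {
        "token": query["token"][0],
        # A missing user parameter falls back to "guest".
        "user": query.get("user", ["guest"])[0],
        "dataset": query["dataset"][0],
    }
```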

Example data (current Tool Manager)

toolconfig.json

{
    "jupyterlab": {
        "description": "JupyterLab environment", 
        "name": "JupyterLab"
    }, 
    "rstudio": {
        "description": "RStudio analysis environment", 
        "name": "RStudio"
    }
}

 

instances.json

 
{
    "c255e06ce3b45e56bb4ca3be802366320922649a4557492e3a381bda429bc173": {
        "created": "2016-10-11T10:24:07.435488+00:00", 
        "description": "JupyterLab environment", 
        "name": "Test User's instance", 
        "ownerId": "57fcbb8cba996f10d6ded357", 
        "port": "32768", 
        "toolName": "JupyterLab", 
        "toolPath": "jupyterlab", 
        "uploadHistory": [
            {
                "datasetId": "57fcbc5ce4b01702ddce9383", 
                "datasetName": "Test", 
                "time": "2016-10-11T10:24:07.435488+00:00", 
                "uploaderId": "57fcbb8cba996f10d6ded357", 
                "url": "sia2v9-clowder.labsdev.ndslabs.org/api/datasets/57fcbc5ce4b01702ddce9383/download"
            }, 
            {
                "datasetId": "57fcc10ee4b01702ddce9393", 
                "datasetName": "Test2", 
                "time": "2016-10-11T10:39:26.594829+00:00", 
                "uploaderId": "57fcbb8cba996f10d6ded357", 
                "url": "sia2v9-clowder.labsdev.ndslabs.org/api/datasets/57fcc10ee4b01702ddce9393/download"
            }
        ], 
        "url": "https://s2hqhs-toolmanager.labsdev.ndslabs.org/c255e06ce3/"
    }
}
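Given the structure shown in instances.json, the filtering behind GET /instances?ownerId=<userId> might look roughly like the sketch below. The function names are hypothetical, and sorting by the "time" string assumes that all timestamps share the ISO 8601 format and UTC offset seen in the example.

```python
def instances_for_owner(instances, owner_id):
    """Return {container_id: details} for the instances owned by `owner_id`."""
    return {cid: d for cid, d in instances.items() if d.get("ownerId") == owner_id}


def dataset_ids(instance):
    """List the dataset IDs uploaded to one instance, oldest first."""
    history = sorted(instance.get("uploadHistory", []), key=lambda u: u["time"])
    return [u["datasetId"] for u in history]
```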

 

Dataverse Data Access API

http://guides.dataverse.org/en/4.2/api/dataaccess.html

Example request Dataverse/TwoRavens

https://s5fiod-dataverse.labsdev.ndslabs.org/api/access/datafile/3?key=6b7494fc-cc77-49cb-b29f-c4196e5ba49c&format=prep
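A Tool Manager plugin for Dataverse would need to construct URLs of this shape. The sketch below mirrors only what is visible in the example request (the /api/access/datafile/<id> path and the key and format query parameters). datafile_url is a hypothetical helper, and the API key shown in the usage is a placeholder.

```python
from urllib.parse import urlencode


def datafile_url(base_url, file_id, api_key=None, fmt=None):
    """Build a Data Access API URL for a single data file."""
    params = {}
    if api_key:
        params["key"] = api_key
    if fmt:
        params["format"] = fmt
    url = "{}/api/access/datafile/{}".format(base_url.rstrip("/"), file_id)
    return url + ("?" + urlencode(params) if params else "")
```

For example, datafile_url("https://s5fiod-dataverse.labsdev.ndslabs.org", 3, api_key="API-KEY", fmt="prep") yields a URL of the same form as the request above.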

 

http://guides.dataverse.org/en/4.2/api/native-api.html#dataverses

File structure:

/usr/local/glassfish4/glassfish/domains/domain1/files $ find .
.                                                                                                   
./temp                                                                                              
./temp/157b599bf54-d8e778f23e26.thumb400                                                            
./temp/157b599bf54-d8e778f23e26.thumb48                                                             
./10.5072                                                                                           
./10.5072/FK2                                                                                       
./10.5072/FK2/2KFT6A                                                                                
./10.5072/FK2/2KFT6A/157b5603c9f-dbe3f1e9cb5a.orig                                                  
./10.5072/FK2/2KFT6A/157b5603c9f-dbe3f1e9cb5a                                                       
./10.5072/FK2/2KFT6A/157b5603c9f-dbe3f1e9cb5a.90d                                                   
./10.5072/FK2/2KFT6A/157b5603c9f-dbe3f1e9cb5a.prep                                                  
./10.5072/FK2/2KFT6A/157b599bf54-d8e778f23e26                                                       
./10.5072/FK2/2KFT6A/157b599bf54-d8e778f23e26.thumb400                                              
./10.5072/FK2/2KFT6A/157b599bf54-d8e778f23e26.thumb64                                               
./10.5072/FK2/2KFT6A/157b599bf54-d8e778f23e26.thumb48

 

Example Native API request:

sh-4.2# curl localhost:8080/api/datasets/2 

{
	"status": "OK",
	"data": {
		"id": 2,
		"identifier": "2KFT6A",
		"persistentUrl": "http://dx.doi.org/10.5072/FK2/2KFT6A",
		"protocol": "doi",
		"authority": "10.5072/FK2",
		"latestVersion": {
			"id": 1,
			"versionNumber": 1,
			"versionMinorNumber": 0,
			"versionState": "RELEASED",
			"productionDate": "Production Date",
			"UNF": "UNF:6:oImW54v0kC7t1HBUrVIG2A==",
			"lastUpdateTime": "2016-10-11T21:07:37Z",
			"releaseTime": "2016-10-11T21:07:37Z",
			"createTime": "2016-10-11T20:12:48Z",
			"metadataBlocks": {
				"citation": {
					"displayName": "Citation Metadata",
					"fields": [{
						"typeName": "title",
						"multiple": false,
						"typeClass": "primitive",
						"value": "Test"
					}, {
						"typeName": "author",
						"multiple": true,
						"typeClass": "compound",
						"value": [{
							"authorName": {
								"typeName": "authorName",
								"multiple": false,
								"typeClass": "primitive",
								"value": "Admin, Dataverse"
							},
							"authorAffiliation": {
								"typeName": "authorAffiliation",
								"multiple": false,
								"typeClass": "primitive",
								"value": "Dataverse.org"
							}
						}]
					}, {
						"typeName": "datasetContact",
						"multiple": true,
						"typeClass": "compound",
						"value": [{
							"datasetContactName": {
								"typeName": "datasetContactName",
								"multiple": false,
								"typeClass": "primitive",
								"value": "Admin, Dataverse"
							},
							"datasetContactAffiliation": {
								"typeName": "datasetContactAffiliation",
								"multiple": false,
								"typeClass": "primitive",
								"value": "Dataverse.org"
							},
							"datasetContactEmail": {
								"typeName": "datasetContactEmail",
								"multiple": false,
								"typeClass": "primitive",
								"value": "dataverse@mailinator.com"
							}
						}]
					}, {
						"typeName": "dsDescription",
						"multiple": true,
						"typeClass": "compound",
						"value": [{
							"dsDescriptionValue": {
								"typeName": "dsDescriptionValue",
								"multiple": false,
								"typeClass": "primitive",
								"value": "Tset"
							}
						}]
					}, {
						"typeName": "subject",
						"multiple": true,
						"typeClass": "controlledVocabulary",
						"value": ["Mathematical Sciences"]
					}, {
						"typeName": "depositor",
						"multiple": false,
						"typeClass": "primitive",
						"value": "Admin, Dataverse"
					}, {
						"typeName": "dateOfDeposit",
						"multiple": false,
						"typeClass": "primitive",
						"value": "2016-10-11"
					}]
				}
			},
			"files": [{
				"label": "fearonLaitin.tab",
				"version": 2,
				"datasetVersionId": 1,
				"datafile": {
					"id": 3,
					"name": "fearonLaitin.tab",
					"contentType": "text/tab-separated-values",
					"filename": "157b5603c9f-dbe3f1e9cb5a",
					"originalFileFormat": "text/csv",
					"originalFormatLabel": "Comma Separated Values",
					"UNF": "UNF:6:oImW54v0kC7t1HBUrVIG2A==",
					"md5": "fcc7c6d8f80b416407e356b123e889b5"
				}
			}]
		}
	}
}
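Comparing this response with the file listing above suggests that a stored file's path can be derived as <files root>/<authority>/<identifier>/<datafile.filename>, which would be relevant for the local file mount option. The sketch below encodes that inference. It is an observation from this one example rather than documented Dataverse behavior, and stored_file_paths is a hypothetical helper.

```python
import posixpath


def stored_file_paths(dataset_response, files_root):
    """Map datafile ID -> inferred on-disk path, from a Native API dataset response."""
    data = dataset_response["data"]
    # e.g. <files_root>/10.5072/FK2/2KFT6A
    dataset_dir = posixpath.join(files_root, data["authority"], data["identifier"])
    return {
        f["datafile"]["id"]: posixpath.join(dataset_dir, f["datafile"]["filename"])
        for f in data["latestVersion"]["files"]
    }
```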

 

 

 
