
SEAD's Clowder component has a RESTful service API that allows you to read, write, annotate, tag, and delete datasets and collections (among other things). These services are used within SEAD's bulk upload/sync command line tool and can be called from curl or your own code to allow your application to read/write directly to/from Clowder spaces. (The SEAD 1.5 ACR API is described at http://sead.ncsa.illinois.edu/sead-acr-api. For SEAD 1.5, SEAD's Google ID for devices is in a file called sead-google.json, which is available upon request.)

SEAD Uploader

The SEAD Uploader is a command line tool that can be used to upload or sync portions of your disk with a Clowder project space. The upload speed you achieve will depend on your disk and network speed, but the client itself can manage >100K file uploads, and we've seen at least 250MB/minute transfer speeds over a mix of large and small files. Note: With SEAD 1.5, the first time you use the SEADUploader on a machine, it will run through an authentication process with Google (see below).

Steps to get started:

  1. Please make sure you have Java installed on your computer. If you don't, you can download it here: https://java.com/en/download/
  2. Download the SEAD Uploader jar file: sead2.3.jar (1.5 users can still use sead1.2e.jar). For SEAD 1.5 only, request the sead-google.json file by sending an email to SEADdatanet@umich.edu.
  3. On your computer, open the root directory that contains files and folders you would like to upload to your Project Space in SEAD.
  4. Put the sead2.3.jar file and (for 1.5) the sead-google.json file in this directory.
  5. Hold the SHIFT key and right-click in the directory to open a menu. Select the "Open Command Window Here" option.
  6. Once you have the Command Window open, invoke the SEADUploader by typing the following in the Command Window:

java -cp sead2.3.jar org.sead.uploader.clowder.SEADUploader <-listonly> <-limit=<X>> <-skip=<n>> <-verify> <-ex=<Y>> <-key=<apiKey>> <-forcenew> -server=<serverUrl> <-id=<id>> <directories list...>

where:

-listonly: write information about what would/would not be transferred without doing any upload

-merge (older Uploader versions such as sead1.2e.jar): do not create new collections or datasets if ones uploaded from the same path already exist

-limit=<X>: limit this run to at most X dataset uploads (any required collections will be automatically created)

-skip=<n>: skip the first <n> files found on disk before starting to check whether files exist on the server and uploading those that are not yet in SEAD (any required collections will be automatically created)

-ex=<Y>: exclude any file that matches the provided regular expression pattern, e.g. -ex=^\..* (exclude files that start with a period) or -ex=.*\.txt$ (exclude all files ending in .txt). This flag can be repeated to exclude based on multiple patterns.

-forcenew: By default, the Uploader will search for an existing dataset with the same name as the specified directory and only upload files that are not already in that dataset. -forcenew will always create a new dataset and, subject to other settings, will therefore always upload all files.

-key=<apiKey>: use an API key you create (on your Profile page) to avoid having to enter a username/password.

-id=<id>: if you know a dataset exists, specifying its id here will improve performance, as the Uploader won't have to scan through all datasets to find it.

-server=<serverUrl>: the base URL of the SEAD/Clowder server you're interacting with, e.g. -server=https://sead2.ncsa.illinois.edu

directories list: a list of one or more directory or file names the Uploader should work on. The Uploader will recurse (depth first) through the files and subdirectories contained within any listed directory.

-verify: adding this flag will cause the uploader to use a cryptographic hash to verify that the local file and the copy in SEAD are exactly the same, byte-for-byte. With SEAD 2.0,  this flag should be used on a second run of the Uploader - hashes in 2.0 are calculated asynchronously and may not be available immediately after a file is uploaded.
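
To illustrate what a -verify style check involves, here is a minimal Python sketch that hashes a local file in chunks and compares the result with a digest reported by a server. This page does not say which hash algorithm the Uploader or SEAD uses, so SHA-512 is only an assumption here, and file_digest and matches_remote are hypothetical helper names rather than part of the Uploader.

```python
import hashlib


def file_digest(path, algorithm="sha512", chunk_size=1024 * 1024):
    """Return the hex digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.new(algorithm)  # algorithm choice is an assumption, not documented here
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_remote(path, remote_hex_digest, algorithm="sha512"):
    """Compare a local file against a hex digest string obtained from the server."""
    return file_digest(path, algorithm) == remote_hex_digest.strip().lower()
```

Reading in fixed-size chunks keeps memory use flat for very large files. A mismatch only tells you that the two copies differ, not which one is correct.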

...

Note: SEAD recommends using an API key rather than your username/password. With older Uploader versions that take the -merge flag (such as sead1.2e.jar), SEAD recommends always using it unless you want duplicate file copies in SEAD for some purpose.
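
Because the exclude flag takes a regular expression rather than a shell wildcard, it can help to try patterns out before a large run. The sketch below uses Python's re module as a stand-in. The Uploader itself is a Java program and regex dialects differ in details, but simple patterns like these behave the same way. It is an assumption here that the pattern is matched against the whole file name. compile_excludes and is_excluded are hypothetical helper names.

```python
import re


def compile_excludes(patterns):
    """Compile exclude patterns, failing early on any that are not valid regular expressions."""
    compiled = []
    for pattern in patterns:
        try:
            compiled.append(re.compile(pattern))
        except re.error as err:
            raise ValueError(f"invalid exclude pattern {pattern!r}: {err}") from err
    return compiled


def is_excluded(file_name, compiled_patterns):
    """Return True if the whole file name matches any exclude pattern (assumed semantics)."""
    return any(p.fullmatch(file_name) for p in compiled_patterns)


# Hidden files, and files ending in .txt
excludes = compile_excludes([r"^\..*", r".*\.txt"])
for name in [".DS_Store", "notes.txt", "data.csv"]:
    print(name, "excluded" if is_excluded(name, excludes) else "kept")
```

Note that a glob such as *.txt is not a valid regular expression (a leading * has nothing to repeat), so compile_excludes(["*.txt"]) raises an error. The regex equivalent is .*\.txt.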


Examples:

2.0 examples: use of the Uploader creates one Dataset with Folders and Files inside

 

java -cp sead2.3.jar org.sead.uploader.clowder.SEADUploader -listonly -server=https://sead2.ncsa.illinois.edu mydir

Using SEAD's 2.0 instance, check ./mydir and list the Dataset and all Folders and Files that would be created without the -listonly flag

java -cp sead2.3.jar org.sead.uploader.clowder.SEADUploader -server=https://sead2.ncsa.illinois.edu mydir

Using SEAD's 2.0 instance, create a 'mydir' Dataset in your account on sead2.ncsa.illinois.edu and create Folders and Files for all contained items. Each File will be annotated with metadata indicating the original path (user metadata: "instanceOf" (http://purl.org/vocab/frbr/core#embodimentOf), with the value /mydir for the mydir directory in this example and /mydir/<relative path> for all Files).
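
If you drive the Uploader from a script, the two invocations above can be assembled as an argument list instead of a hand-typed command line. The following is a minimal Python sketch, not an official wrapper. build_uploader_command is a hypothetical helper, and the jar name, class name, and flag spellings are taken from this page.

```python
import shlex


def build_uploader_command(server_url, paths, jar="sead2.3.jar", list_only=False,
                           limit=None, api_key=None, excludes=()):
    """Assemble the argument list for one Uploader run, using flags described on this page."""
    command = ["java", "-cp", jar, "org.sead.uploader.clowder.SEADUploader"]
    if list_only:
        command.append("-listonly")          # report only, upload nothing
    if limit is not None:
        command.append(f"-limit={limit}")    # cap the number of uploads in this run
    for pattern in excludes:
        command.append(f"-ex={pattern}")     # one -ex flag per pattern
    if api_key:
        command.append(f"-key={api_key}")    # avoids the username/password prompt
    command.append(f"-server={server_url}")
    command.extend(paths)
    return command


dry_run = build_uploader_command("https://sead2.ncsa.illinois.edu", ["mydir"], list_only=True)
print(shlex.join(dry_run))
# To actually run it: subprocess.run(dry_run, check=True)
```

Passing a list to subprocess.run avoids shell quoting problems with regular expression characters such as ^ and * in exclude patterns. Doing a -listonly run first is a cheap way to check the arguments before any data is transferred.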

Note that if this command is run with the -forcenew flag, it will create a new Dataset and upload all folders and files again, even if a Dataset already exists from a prior run (which is not usually what you'd want). By default, the Uploader checks for an existing Dataset and only uploads missing content, or folders/files added to your local directory since the last upload.

java -cp sead2.3.jar org.sead.uploader.clowder.SEADUploader -limit=100 -server=https://... mydir

Using SEAD's 2.0 instance, locate/create a 'mydir' Dataset in your account on sead2.ncsa.illinois.edu and create Folders and Files for the first 100 contained items. Running the command again will add the next 100 items (because the Uploader finds and skips the ones already uploaded).

java -cp sead2.3.jar org.sead.uploader.clowder.SEADUploader -verify -server=https://sead2.ncsa.illinois.edu mydir

Using SEAD's 2.0 instance, verify that all files previously uploaded to SEAD 2 are exactly the same as those on your disk. Since SEAD 2 currently calculates hash values with a delay, the -verify flag should be added on a second run of the Uploader.

1.5 examples: use of the Uploader creates a Collection (Dataset in 2.0) with Sub-collections (Folders in 2.0) and Datasets (Files in 2.0) inside

...

Check ./mydir and list all collections and datasets that would be created without the -listonly flag

...

Note that if this command is run a second time, new collections and datasets will be created - probably not what you'd want. Instead use:

...

This command will only create collections and datasets that have not previously been uploaded. The "instanceOf" metadata is used to identify matching items, so you must start from the same directory on your disk (starting up or down one directory will result in new uploads).

java -cp sead1.2e.jar org.sead.acr.client.SEADUploader -merge -limit100 https://sead-demo.ncsa.illinois.edu/acr mydir

Upload a maximum of 100 datasets (files) and only create the collections (directories) required. Using this command repeatedly (with the -merge flag) will upload the next 100 datasets, so repeated use will eventually upload all datasets, as though the limit flag had not been used. Since this mode (with -merge and -limit) requires the Uploader to check for existing files to identify where to begin, it will be slower than not using the -limit switch.

java -cp sead1.2e.jar org.sead.acr.client.SEADUploader -merge https://sead-demo.ncsa.illinois.edu/acr file1.txt file2.txt file3.txt

Upload the three listed files if they do not exist in the ACR space
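
When uploading in batches with the limit flag as in the examples above, it can be useful to estimate how many runs a directory will need. The sketch below simply counts the files under a directory and divides by the limit. It assumes every file counts toward the limit and ignores exclude patterns and files already on the server. count_files and runs_needed are hypothetical helper names.

```python
import math
import os


def count_files(root):
    """Count files under root, recursing through all subdirectories."""
    return sum(len(file_names) for _dir, _subdirs, file_names in os.walk(root))


def runs_needed(root, limit):
    """Estimate how many limited runs are needed to upload every file once (at least one)."""
    if limit <= 0:
        raise ValueError("limit must be a positive integer")
    return max(1, math.ceil(count_files(root) / limit))


if __name__ == "__main__":
    print(runs_needed("mydir", 100))
```

For example, a directory tree holding 250 files and a limit of 100 gives an estimate of three runs.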

 

SEAD 2.0 Authentication Process

Uploading data to SEAD 2.0 requires an API key or a local SEAD username/password (social logins are not yet supported). In username/password mode, each time you run the Uploader, you will be prompted for your username and password. Your password will be transmitted via https to the SEAD2 server, but no copy is stored on your local machine.

SEAD 1.5 First Time Google Authentication Process

The SEADUploader requires authentication. SEAD's Google ID for devices is in sead-google.json. The first time the SEADUploader is invoked, it will initiate a Google "device" authorization request for SEAD (similar to what you may have seen with Netflix or other services on your TV). The Uploader will generate a code and provide a Google URL.

...

Once you've completed that, hit <enter> and the SEADUploader will continue and acquire a Google token it will use to authenticate you to SEAD.

...

Help

If you experience problems with using the SEADUploader, please contact SEAD at SEADdatanet@umich.edu and we will be happy to walk you through the steps.