SEAD's RESTful service API (see http://sead.ncsa.illinois.edu/sead-acr-api ) allows you to read, write, annotate, tag, and delete datasets and collections (among other things). These services are used within SEAD's bulk upload/sync command line tool and can be called from Curl or your own code, allowing your application to read from and write to ACR spaces directly.
SEAD Uploader
The SEAD Uploader is a command line tool that can be used to upload or sync portions of your disk with an ACR project space. (The upload speed you achieve will depend on your disk and network speed, but the client itself can manage >10K file uploads and we've seen at least 250MB/minute transfer speeds over a mix of large and small files.)
The uploader is invoked as
java -cp sead1.jar org.sead.acr.client.SEADUploader <-listonly> <-merge> <-limit<X>> <serverUrl> <directories/files list...>
where:
-listonly: write information about what would/would not be transferred without doing any upload
-merge: do not create new collections or datasets if ones uploaded from the same path already exist
-limit<X>: limit this run to at most X dataset uploads (any required collections will be automatically created)
serverUrl: the base URL of the ACR project space you're interacting with, e.g. http://sead-demo.ncsa.illinois.edu/acr
directories/files list: a list of one or more directory or file names the Uploader should work on. The Uploader will recurse (depth first) through the files and subdirectories contained within any listed directory.
Examples:
java -cp sead1.jar org.sead.acr.client.SEADUploader -listonly http://sead-demo.ncsa.illinois.edu/acr mydir
Check ./mydir and list all collections and datasets that would be created without the -listonly flag
java -cp sead1.jar org.sead.acr.client.SEADUploader http://sead-demo.ncsa.illinois.edu/acr mydir
Create a "mydir" collection in the project space and create collections and datasets for all contained items. Each item will be annotated with metadata indicating the original path (user metadata: "instanceOf" (http://purl.org/vocab/frbr/core#embodimentOf), with the value /mydir for the mydir directory in this example and /mydir/<relative path> for all children).
Note that if this command is run a second time, new collections and datasets will be created - probably not what you'd want. Instead use:
java -cp sead1.jar org.sead.acr.client.SEADUploader -merge http://sead-demo.ncsa.illinois.edu/acr mydir
This command will only create collections and datasets that have not previously been uploaded. The "instanceOf" metadata is used to identify matching items, so you must start from the same directory on your disk each time: starting one directory up or down will result in new uploads.
java -cp sead1.jar org.sead.acr.client.SEADUploader -merge -limit100 http://sead-demo.ncsa.illinois.edu/acr mydir
Upload a maximum of 100 datasets (files) and only create the collections (directories) required. Using this command repeatedly (with the -merge flag) will upload the next 100 datasets, so repeated use will eventually upload all datasets, as though the limit flag had not been used. Since this mode (with -merge and -limit) requires the Uploader to check for existing files to identify where to begin, it will be slower than not using the -limit switch.
java -cp sead1.jar org.sead.acr.client.SEADUploader -merge http://sead-demo.ncsa.illinois.edu/acr file1.txt file2.txt file3.txt
Upload the three listed files if they do not already exist in the ACR space.
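If you drive the Uploader from a script rather than typing the commands above, the argument list can be assembled programmatically. The following is a minimal Python sketch, not part of the SEAD distribution: the function names build_uploader_command and run_uploader are hypothetical, and it assumes sead1.jar and sead-google.json are in the current directory. Only the flags documented above are used.

```python
import subprocess
from typing import List, Optional


def build_uploader_command(server_url: str, paths: List[str], merge: bool = False,
                           limit: Optional[int] = None, list_only: bool = False,
                           jar: str = "sead1.jar") -> List[str]:
    """Assemble the argument list for one SEADUploader run (hypothetical helper)."""
    cmd = ["java", "-cp", jar, "org.sead.acr.client.SEADUploader"]
    if list_only:
        cmd.append("-listonly")
    if merge:
        cmd.append("-merge")
    if limit is not None:
        # The flag and its number are written together, e.g. -limit100.
        cmd.append("-limit%d" % limit)
    cmd.append(server_url)
    cmd.extend(paths)
    return cmd


def run_uploader(cmd: List[str]) -> int:
    """Run the Uploader in the current directory and return its exit code.

    The child process inherits the terminal, so the first-time
    authorization prompt can still be answered interactively.
    """
    return subprocess.call(cmd)
```

For example, build_uploader_command("http://sead-demo.ncsa.illinois.edu/acr", ["mydir"], merge=True, limit=100) produces the same arguments as the -merge -limit100 example above; passing the result to run_uploader repeatedly would upload the datasets in batches of 100.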
The SEADUploader requires authentication. The first time it is invoked, it will initiate a Google "device" authorization request for SEAD (similar to what you may have seen with Netflix or other services on your TV). The Uploader will generate a code and provide a Google URL. Using a browser on any machine, you can go to the URL and enter the code. Once you've completed that, hit <enter> and the SEADUploader will continue and acquire a Google token it will use to authenticate you to SEAD. After the first time, the Uploader will use the acquired refresh_token to automatically log in on your behalf. This process requires the information in the sead-google.json file provided with the Uploader, and it will write a refresh.txt file to your disk to remember your login. Deleting refresh.txt will cause the Uploader to generate a new code as it did the first time. Deleting sead-google.json (or not having it in the directory you're invoking the client from) will cause the Uploader to fail.
Curl
Curl (curl.haxx.se) is a command line tool useful for invoking web services. In addition to the RESTful services listed at http://sead.ncsa.illinois.edu/sead-acr-api, SEAD provides a command line tool and session management services to simplify authentication:
Authentication and session management:
To acquire a Google access token, invoke the following tool:
java -cp sead1.jar org.sead.acr.client.SEADGoogleLogin
(or follow the alternative directions below to use Curl and manage this process yourself)
The first invocation will initiate a Google "device" authorization process as with the SEADUploader:
Did not find stored refresh token. Initating first-time device authorization request.
1) Go to : http://www.google.com/device in your browser
2) Type : 2c47gdhu in your browser
3) Hit <Return> to continue.
Once you've entered the code in a browser and hit <return>, the process will complete:
Proceeding
New Access Token is: ya29.KgD2pVWLsA5TBR4AAACbQLkXOa0nnxD0Es2taQPbx5ebeKR0BJ02VmMJS-7CXg
Expires in 3600 seconds
On subsequent invocations, you'll simply see:
New Access Token is: ya29.KgAEcQrmKtUJTx4AAABuo3_Qs2iTnlSOp4dpNz9EWCUuSpDcVmGiQZh2YYunsg
Expires in 3600 seconds
This access token can then be used to login to SEAD and retrieve a session cookie that can be used for all subsequent service requests:
curl --cookie-jar <temporary cookie file> -d "username=<your email>&googleAccessToken=<your access token>" http://<desired ACR space>/acr/api/authenticate
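The login call above can also be made from your own code. The following Python sketch uses only the standard library and mirrors the curl command: it posts the same two form fields to /api/authenticate and keeps the returned session cookie in memory. The function names are hypothetical, and acr_base_url is assumed to be the space URL ending in /acr.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def build_auth_request(acr_base_url: str, email: str,
                       access_token: str) -> urllib.request.Request:
    """Build the POST request that exchanges a Google access token for a session."""
    body = urllib.parse.urlencode(
        {"username": email, "googleAccessToken": access_token}
    ).encode("ascii")
    # Supplying data makes urllib send a form-encoded POST, like curl -d.
    return urllib.request.Request(
        acr_base_url.rstrip("/") + "/api/authenticate", data=body)


def open_session(acr_base_url: str, email: str,
                 access_token: str) -> urllib.request.OpenerDirector:
    """Log in and return an opener that resends the session cookie on later calls."""
    jar = http.cookiejar.CookieJar()  # plays the role of curl's cookie file
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    opener.open(build_auth_request(acr_base_url, email, access_token)).close()
    return opener
```

The opener returned by open_session can then be used for subsequent service requests in place of curl -b <temporary cookie file>.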
Calling SEAD services
With a session cookie, calling SEAD services is straightforward. For example,
curl -b <temporary cookie file> http://<desired ACR space>/acr/resteasy/collections
will retrieve JSON Linked Data (JSON-LD - see http://json-ld.org/) formatted metadata for the top-level collections in the specified ACR space:
{
"tag:cet.ncsa.uiuc.edu,2008:/bean/Collection/75D08C13-A727-4631-8401-555CFA330411": {
"Identifier": "tag:cet.ncsa.uiuc.edu,2008:/bean/Collection/75D08C13-A727-4631-8401-555CFA330411",
"Title": "el_BOB",
"Date": "2013-10-01T19:10:14.155Z",
"Uploaded By": "http://cet.ncsa.uiuc.edu/2007/person/admin",
"Creator": "Coleman, Eric : http://vivo-vis-test.slis.indiana.edu/vivo/individual/a1375"
},
"tag:cet.ncsa.uiuc.edu,2008:/bean/Collection/1339ECC8-6FAE-4C63-BDBE-DA7FC1999965": {
"Identifier": "tag:cet.ncsa.uiuc.edu,2008:/bean/Collection/1339ECC8-6FAE-4C63-BDBE-DA7FC1999965",
"Title": "Test",
"Date": "2014-01-13T18:55:58.082Z",
"Uploaded By": "http://cet.ncsa.uiuc.edu/2007/person/anonymous"
},
"tag:cet.ncsa.uiuc.edu,2008:/bean/Collection/41c3486d-a18f-468f-a0fa-50c1bb5102a8": {
"Identifier": "tag:cet.ncsa.uiuc.edu,2008:/bean/Collection/41c3486d-a18f-468f-a0fa-50c1bb5102a8",
"Title": "test-classes",
"Date": "2014-03-27T16:25:37.536Z",
"Uploaded By": "http://cet.ncsa.uiuc.edu/2007/person/admin",
"Abstract": "cheese"
},
"tag:medici@uiuc.edu,2009:col_BJtxI2-d8-ng5TpQPNcdlA": {
"Identifier": "tag:medici@uiuc.edu,2009:col_BJtxI2-d8-ng5TpQPNcdlA",
"Title": "classes",
"Date": "2014-05-28T13:36:42.543Z",
"Uploaded By": "http://cet.ncsa.uiuc.edu/2007/person/myersjd@umich.edu"
},
"@context": {
"Identifier": "http://purl.org/dc/elements/1.1/identifier",
"Title": "http://purl.org/dc/elements/1.1/title",
"Date": "http://purl.org/dc/terms/created",
"Uploaded By": "http://purl.org/dc/elements/1.1/creator",
"Abstract": "http://purl.org/dc/terms/abstract",
"Contact": "http://sead-data.net/terms/contact",
"Creator": "http://purl.org/dc/terms/creator"
}
}
Note that in JSON-LD, the @context can be used to associate the labels given with RDF identifiers for the predicates.
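To illustrate how the @context is used, the following Python sketch replaces each label in a response with the predicate identifier the @context assigns to it. The function name expand_labels is hypothetical, and the sample is a trimmed copy of the response shown above. This is a simple dictionary lookup that fits the flat structure of this response, not a general JSON-LD processor.

```python
from typing import Dict


def expand_labels(document: Dict) -> Dict[str, Dict[str, str]]:
    """Map each item's labels to the predicate identifiers given in @context."""
    context = document.get("@context", {})
    expanded = {}
    for identifier, fields in document.items():
        if identifier == "@context":
            continue  # the context describes the labels; it is not an item
        expanded[identifier] = {
            # Labels missing from the context are kept unchanged.
            context.get(label, label): value
            for label, value in fields.items()
        }
    return expanded


# Trimmed copy of the response shown above.
sample = {
    "tag:medici@uiuc.edu,2009:col_BJtxI2-d8-ng5TpQPNcdlA": {
        "Identifier": "tag:medici@uiuc.edu,2009:col_BJtxI2-d8-ng5TpQPNcdlA",
        "Title": "classes",
        "Date": "2014-05-28T13:36:42.543Z",
    },
    "@context": {
        "Identifier": "http://purl.org/dc/elements/1.1/identifier",
        "Title": "http://purl.org/dc/elements/1.1/title",
        "Date": "http://purl.org/dc/terms/created",
    },
}

expanded_sample = expand_labels(sample)
```

After this runs, the "Title" value of the "classes" collection is stored under http://purl.org/dc/elements/1.1/title, and the @context entry itself is no longer present in the result.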
The identifiers of the collections can be used to get further information about them:
curl -b <temporary cookie file> http://<desired ACR space>/acr/resteasy/collections/<id> - the same metadata as above for one collection
curl -b <temporary cookie file> http://<desired ACR space>/acr/resteasy/collections/<id>/biblio - further bibliographic metadata
curl -b <temporary cookie file> http://<desired ACR space>/acr/resteasy/collections/<id> - all known metadata (biblio, user-provided, and extracted metadata)
curl -b <temporary cookie file> http://<desired ACR space>/acr/resteasy/collections/<id>/datasets - the list of datasets in the collection
curl -b <temporary cookie file> http://<desired ACR space>/acr/resteasy/collections/<id>/collections - the list of subcollections
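When calling these services from code, the request URLs can be built from a collection identifier. The Python sketch below is illustrative only: the function name collection_url is hypothetical, and it assumes that the identifier must be percent-encoded before being placed in the URL path, because identifiers such as the ones shown above contain ":" and "/" characters. That encoding is an assumption and should be checked against your ACR space.

```python
import urllib.parse

# Sub-resources listed above; "" is the collection's own metadata.
COLLECTION_VIEWS = ("", "biblio", "datasets", "collections")


def collection_url(acr_base_url: str, collection_id: str, view: str = "") -> str:
    """Build the URL for a collection or one of its sub-resources."""
    if view not in COLLECTION_VIEWS:
        raise ValueError("unknown view: %r" % view)
    # Assumption: the identifier is percent-encoded in the path.
    encoded_id = urllib.parse.quote(collection_id, safe="")
    url = "%s/resteasy/collections/%s" % (acr_base_url.rstrip("/"), encoded_id)
    return url + "/" + view if view else url
```

For instance, collection_url(base, some_id, "datasets") gives the URL used to list the datasets in a collection, where base is the space URL ending in /acr.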
Curl can be used to create collections, upload data (and provide provenance/metadata in the same call), add tags, add metadata, etc.
For example, uploading a dataset would be done like this:
curl -v -b <temporary cookie file> --form "datablob=@readme.txt" --form "http://purl.org/vocab/frbr/core#embodimentOf=/test/readme.txt" http://<your ACR space>/acr/resteasy/datasets
which would upload a readme.txt file and add metadata identifying the path for it on the local disk.
Downloading would be done with:
curl -b <temporary cookie file> -O http://<your ACR space>/acr/resteasy/datasets/<id>/file
Curl Tips:
cookie files and the destination for downloaded files need to be in writable directories
-v (verbose) is a useful flag to see more about the service request/response
Using Curl for Google login:
Curl can be used instead of SEAD's SEADGoogleLogin utility to initially register a device (your computer) with Google and acquire SEAD credentials as well as to acquire new tokens over time (see https://developers.google.com/accounts/docs/OAuth2ForDevices for details):
curl -v -d "client_id=972225704837-hlsr4m5f69v2s0vig8c6fbdmdohvqjv6.apps.googleusercontent.com&scope=email profile" https://accounts.google.com/o/oauth2/device/code
will request a user_code that can be entered in a browser.
curl -v -d "client_id=972225704837-hlsr4m5f69v2s0vig8c6fbdmdohvqjv6.apps.googleusercontent.com&client_secret=<SEAD client secret>&code=<device_code from step 1>&grant_type=http://oauth.net/grant_type/device/1.0" https://accounts.google.com/o/oauth2/token
will acquire an access token (used by SEAD) and a refresh_token (used to acquire new access_tokens in the future). Your code should store the refresh_token for future use to avoid having your user deal with the code step.
curl -v -d "client_id=972225704837-hlsr4m5f69v2s0vig8c6fbdmdohvqjv6.apps.googleusercontent.com&client_secret=<SEAD Client Secret>&refresh_token=<refresh_token from step 2>&grant_type=refresh_token" https://accounts.google.com/o/oauth2/token
will acquire an access token from the refresh token.
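The three requests above can be issued from your own code as well. The following Python sketch builds the same form parameters and posts them to the same Google endpoints. The function names are hypothetical, the client secret must be supplied by you, and the sketch assumes the responses are JSON objects carrying the user_code, device_code, access_token, and refresh_token fields referred to above.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict

DEVICE_CODE_URL = "https://accounts.google.com/o/oauth2/device/code"
TOKEN_URL = "https://accounts.google.com/o/oauth2/token"
DEVICE_GRANT_TYPE = "http://oauth.net/grant_type/device/1.0"


def device_code_params(client_id: str) -> Dict[str, str]:
    """Step 1: request a user_code and device_code."""
    return {"client_id": client_id, "scope": "email profile"}


def token_params(client_id: str, client_secret: str, device_code: str) -> Dict[str, str]:
    """Step 2: exchange the device_code for an access token and refresh_token."""
    return {"client_id": client_id, "client_secret": client_secret,
            "code": device_code, "grant_type": DEVICE_GRANT_TYPE}


def refresh_params(client_id: str, client_secret: str, refresh_token: str) -> Dict[str, str]:
    """Step 3: exchange a stored refresh_token for a new access token."""
    return {"client_id": client_id, "client_secret": client_secret,
            "refresh_token": refresh_token, "grant_type": "refresh_token"}


def post_form(url: str, params: Dict[str, str]) -> Dict:
    """POST form-encoded parameters and parse the reply (assumed to be JSON)."""
    body = urllib.parse.urlencode(params).encode("ascii")
    with urllib.request.urlopen(url, data=body) as response:
        return json.loads(response.read().decode("utf-8"))
```

A first-time login would call post_form(DEVICE_CODE_URL, device_code_params(client_id)), show the returned user_code to the user, and then call post_form(TOKEN_URL, token_params(...)) once the user has entered the code in a browser. Later runs need only the refresh_params call.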