The RESTful API to SEAD's Curation and Publication Services focuses on management (basic CRUD: Create, Read, Update, Delete) of repository, people, and research object publication entries. Matchmaking, which essentially requires creating a 'test' research object publication request, is handled through additional endpoints of the Research Object service. The following sections describe the basic use of the services, and the final section walks through how to create new publication requests from the Beta Test Project Space.

 

Repositories:

Repositories working with SEAD should POST a profile to the /cp/repositories endpoint. The profile is a JSON-LD object containing descriptive terms about the repository (primarily from the re3data vocabulary) along with any additional descriptive terms repositories wish to add (e.g. from other vocabularies). To participate in matchmaking, profiles must also contain terms that trigger rules (this list is available at /cp/researchobjects/matchingrepositories/rules and is discussed further below). Repositories are also expected to identify preference terms they will respond to (and that users will see in the staging area). The format for such preferences is not yet defined.

Repositories are required to include an "orgidentifier" value in their profile object. This should be a short, unique term (e.g. sda, ideals, icpsr); it is used to list existing repository profiles at /cp/repositories. Individual profiles can be read by going to /cp/repositories/<orgidentifier>, e.g. /cp/repositories/sda.

Profiles may be updated by doing a PUT with a new profile to /cp/repositories/<orgidentifier>, and can be deleted by doing a DELETE to the same URL.
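As a minimal sketch of the four profile operations described above, the following Python builds (but does not send) each request using only the standard library. The base URL, the repository_request helper, the Content-Type header value, and the one-field example profile are assumptions for illustration; a real profile would also carry re3data terms and the entries that trigger matchmaking rules.

```python
import json
import urllib.request

# Assumption: replace with the root URL of your SEAD services deployment.
BASE_URL = "https://example.org/sead-cp"


def repository_request(method, profile=None, orgidentifier=None):
    """Build (but do not send) a request against /cp/repositories."""
    url = BASE_URL + "/cp/repositories"
    if orgidentifier is not None:
        url += "/" + orgidentifier
    data = json.dumps(profile).encode("utf-8") if profile is not None else None
    # Assumption: the service accepts application/json for profile bodies.
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return urllib.request.Request(url, data=data, headers=headers, method=method)


# Hypothetical profile: only "orgidentifier" is shown here.
profile = {"orgidentifier": "sda"}

create = repository_request("POST", profile)
read = repository_request("GET", orgidentifier="sda")
update = repository_request("PUT", profile, orgidentifier="sda")
delete = repository_request("DELETE", orgidentifier="sda")
```

Any of these can be sent with urllib.request.urlopen(create) once BASE_URL points at a real service.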

Currently, the Bob repository has a profile that shows the entries required to trigger the existing rules (/cp/researchobjects/matchingrepositories/rules): 

People:

Personal profiles are available through the /cp/people endpoint. Navigating to this endpoint lists the current set of profiles being managed. New profiles can be added by sending a POST to /cp/people with a JSON object indicating the provider and the identifier for the person at that provider. Currently only ORCID profiles are supported, so POSTs would contain {"provider": "ORCID", "identifier": "<numeric ORCID_ID>"}, where "0000-0003-2164-8132" is one such identifier. Individual profiles can be retrieved from /cp/people/<id>, e.g. /cp/people/0000-0003-2164-8132. An empty PUT to a specific profile endpoint will cause a refresh (the profile will be retrieved from ORCID again), and DELETE will remove the profile.

Since people profiles are primarily for automated use during publication and matchmaking, it is expected that Project Spaces will automatically request profile harvesting, for example for people who are creators of research objects and, eventually, for all people who register with SEAD. The format of what is saved is expected to change as other providers are added.
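The POST body described above can be sketched as follows. The person_payload and person_path helpers are hypothetical names, and the client-side format check is an added convenience, not something the service is stated to require.

```python
import json
import re

# ORCID iDs are four hyphen-separated groups of four characters;
# the final character may be the check character X.
ORCID_ID = re.compile(r"\d{4}-\d{4}-\d{4}-\d{3}[\dX]")


def person_payload(orcid_id):
    """Return the JSON body for a POST to /cp/people."""
    if not ORCID_ID.fullmatch(orcid_id):
        raise ValueError(f"not an ORCID iD: {orcid_id!r}")
    return json.dumps({"provider": "ORCID", "identifier": orcid_id})


def person_path(orcid_id):
    """Path used to GET, refresh (empty PUT), or DELETE one profile."""
    return "/cp/people/" + orcid_id


body = person_payload("0000-0003-2164-8132")
path = person_path("0000-0003-2164-8132")
```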

Research Objects:

Research objects have a lifecycle that, from a curation and preservation service perspective, begins with a request for the object to be published, proceeds through a series of processing steps that result in status updates, and ends with successful publication, failure at some step, or revocation of the request. A request is made by POSTing to the /cp/researchobjects endpoint. The object sent is a compound JSON-LD object consisting of an object describing the top level item (an ORE Aggregation) to be published, the list of user preferences (i.e. from the staging area), and the repository the request is for. A GET at /cp/researchobjects shows the list of requests, and doing a GET on /cp/researchobjects/<Identifier> will show the full request. Within the full request is a URL from which the OREMap for the Aggregation can be retrieved. The Map includes a list of all subobjects, their metadata, and their relationships in an ORE-compliant JSON-LD structure.

A DELETE to /cp/researchobjects/<Identifier> causes the request to be revoked/deleted.

A PUT to /cp/researchobjects/<Identifier>/status can be used to send a status message to update the request.

A GET to /cp/researchobjects/<Identifier>/status will retrieve just the status information known about a request.

A GET to /cp/repositories/<orgidentifier>/researchobjects will return a list of the research object publication requests to a given repository.

For Matchmaking, a POST can be made to /cp/researchobjects/matchingrepositories using the same format as for a publication request - minus the key/value identifying a specific repository - and the score for the proposed research object at all repositories will be given. The total score is a sum of scores from all rules and the response details the individual scores as well as the total.
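For quick reference, the research object endpoints listed in this section can be collected into one lookup table. This is only a sketch: the function name and the operation labels are made up for illustration, and the status update is shown with PUT as stated in this section (the walkthrough later describes a POST, so check which verb your deployment accepts).

```python
def research_object_endpoints(identifier, orgidentifier):
    """Map each operation in this section to its (HTTP method, path) pair."""
    ro = "/cp/researchobjects"
    return {
        "request publication": ("POST", ro),
        "list requests": ("GET", ro),
        "read request": ("GET", f"{ro}/{identifier}"),
        "revoke request": ("DELETE", f"{ro}/{identifier}"),
        "update status": ("PUT", f"{ro}/{identifier}/status"),
        "read status": ("GET", f"{ro}/{identifier}/status"),
        "list for repository": (
            "GET",
            f"/cp/repositories/{orgidentifier}/researchobjects",
        ),
        "matchmaking": ("POST", f"{ro}/matchingrepositories"),
    }


endpoints = research_object_endpoints("<Identifier>", "sda")
```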

 

Walking Through:

Create a collection and add subcollections and content as usual in the Beta Test Project Space. To create a new publication request, hit the "Submit for Publication" Button.

An example request, current as of 9/16/15, has been created at https://sead-test.ncsa.illinois.edu/seadcp0.91/cp/researchobjects/tag:sead-data.net,2015:RO_m4U1YuLCLgDWEHuLiCbW0g .

Since we do not yet have a Staging Area, changing preferences or setting the repository target for publication requires Admin privileges and is done by setting the Project Space default values for these entries. Go to Admin Page/Config Tab/2.0 Beta Publication section, set the default repository and preferences as desired, and hit "submit". New requests will use these values.

To revoke a publication request, click the "Publication Requested" button on your collection and it will revert to say "Submit For Publication" again.

After a "Submit For Publication" request, you should then be able to go to /cp/researchobjects and see the new request. Your request will be deleted if you revoke it from the project space.

The request from the project space is augmented in two ways during processing:

If your object has metadata about its creators (dcterms:creator), which can be added in the project space, and if the value is of the form "orcid.org/<orcid-id>", and if that person has listed employment or education affiliations in their profile, these affiliations are added to the request automatically. Multiple creators can be listed.

An initial status message indicating that the request has been received by the curation and publication services is also added.
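The creator check in the first augmentation can be sketched as below. This is not the service's code: the function name is hypothetical, and accepting an optional http(s):// prefix is an assumption, since the text only states the "orcid.org/<orcid-id>" form.

```python
import re

# Matches "orcid.org/<orcid-id>"; the optional scheme prefix is an assumption.
CREATOR_FORM = re.compile(
    r"(?:https?://)?orcid\.org/(\d{4}-\d{4}-\d{4}-\d{3}[\dX])"
)


def orcid_ids_from_creators(creators):
    """Return the ORCID iDs found in a list of dcterms:creator values."""
    ids = []
    for value in creators:
        match = CREATOR_FORM.fullmatch(value.strip())
        if match:
            ids.append(match.group(1))
    return ids


# Only the first value has the required form; a plain name is skipped.
found = orcid_ids_from_creators(["orcid.org/0000-0003-2164-8132", "Jane Doe"])
```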

Copy the "Identifier" to the end of the URL (e.g. /cp/researchobjects/<Identifier> ) and you'll see the full request. Within this request, click on the link listed as the "@id" of the "Aggregation". This URL will return the full OREMap JSON-LD file for your request. This file can be used by repositories to process the request - it contains a full list of all aggregated objects (Collections and Datasets for v1.5), their metadata, and their relationships (including the collection/dataset hierarchy). These objects are listed with an ORE:similarTo URL pointing at the content itself (for 1.5, this is the restful endpoint to the live binary content for a dataset, which, since only metadata about a dataset can change, is also the content for the version being published).

A repository can find requests addressed to it at /cp/repositories/<orgidentifier>/researchobjects and can again retrieve the full request and OREMap as above. As a repository processes the request, it should add status messages using a POST to /cp/researchobjects/<Identifier>/status with a message of the form: { "reporter": "SEAD-CP", "stage": "Receipt Acknowledged", "message": "request recorded and processing will begin"}, replacing the values as appropriate. The status, with a timestamp generated by the service, will be added. The full status chain for a request can be seen by doing a GET to /cp/researchobjects/<Identifier>/status.

A message with "stage":"Success", and a "message" with the final identifier should be sent to finish processing. (The services will then notify the project space to add the DOI and reset the collection to be ready for the next version to be published. This is not yet implemented in the Beta Test Space. As soon as a repository manages to support the walkthrough listed here, and this final call is added to the Beta Test Space, the full publication process can be exercised.)
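A repository's status updates might be built as in the sketch below, which constructs the requests without sending them. The base URL, the status_request helper, and the Content-Type header are assumptions; the research object identifier is the example one from this page, and the "<final identifier>" placeholder must be replaced with the identifier the repository actually assigns.

```python
import json
import urllib.request

# Assumption: replace with the root URL of your SEAD services deployment.
BASE_URL = "https://example.org/sead-cp"


def status_request(identifier, reporter, stage, message):
    """Build a POST carrying one status message for a publication request."""
    body = json.dumps({"reporter": reporter, "stage": stage, "message": message})
    return urllib.request.Request(
        f"{BASE_URL}/cp/researchobjects/{identifier}/status",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumption
        method="POST",
    )


ro_id = "tag:sead-data.net,2015:RO_m4U1YuLCLgDWEHuLiCbW0g"

# First message: acknowledge receipt, using the example values from the text.
ack = status_request(
    ro_id, "SEAD-CP", "Receipt Acknowledged",
    "request recorded and processing will begin",
)

# Last message: "Success" plus the final identifier (placeholder here).
done = status_request(ro_id, "SEAD-CP", "Success", "<final identifier>")
```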

Matchmaking:

To test matchmaking, the simplest approach, given that there are no staging areas yet, is to copy an existing publication request entry and remove the Affiliations, Repository, and Status entries. The JSON object with the remaining Aggregation and Preferences entries can be POSTed to /cp/researchobjects/matchingrepositories to receive a JSON document with the scores. Note that rules will only fire if the repository profile has the appropriate value as listed in /cp/researchobjects/matchingrepositories/rules, e.g. to limit dataset size, a repository profile must include "Max Dataset Size":<long value in bytes>. The score document includes a message describing whether a rule fired or not and why the result is what it is.
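The copy-and-trim step above could look like the following sketch. The top-level key names are taken from the entry names used in this section and may differ in the actual JSON; the base URL, helper name, and sample request are hypothetical. The request is built but not sent.

```python
import json
import urllib.request

# Assumption: replace with the root URL of your SEAD services deployment.
BASE_URL = "https://example.org/sead-cp"

# Entries a matchmaking request does not need (names as used in this section).
DROPPED_KEYS = ("Affiliations", "Repository", "Status")


def matchmaking_request(publication_request):
    """Turn a copy of a publication request into a matchmaking POST."""
    body = {k: v for k, v in publication_request.items() if k not in DROPPED_KEYS}
    return urllib.request.Request(
        BASE_URL + "/cp/researchobjects/matchingrepositories",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


# Hypothetical, heavily shortened publication request.
sample = {
    "Aggregation": {"Title": "demo collection"},
    "Preferences": {},
    "Repository": "sda",
    "Status": [],
    "Affiliations": [],
}
request = matchmaking_request(sample)
```

The original dictionary is left unchanged, so the same copy can be reused for an actual publication request.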

(Posting the content of https://sead-test.ncsa.illinois.edu/sead-cp/cp/researchobjects/tag:sead-data.net,2015:RO_23hQD7ujKGLL0-BqJ1CTTQ as is to the matching endpoint should work. Nominally, a matching request would not include status or a repository choice, but these fields are ignored, so you don't have to delete them. To make this work, be sure that the content type and accept type are both application/json. The response I receive from this is shown below:

This depends on the current content of that curation object, the current profile for the Bob repository, and the currently active rule set.)

Behind the scenes, these rules retrieve statistical summary information from the project space by taking the ORE:similarTo link in the POSTed JSON, which refers to the live collection being aggregated, and adding "/stats" to the end of it to invoke the statistics endpoint for that collection. The response includes, for example, the maximum dataset size in the research object/collection, which can then be compared against the repository's requirement. You can make the same request manually to see the information returned.
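The URL construction and the size comparison described above can be sketched as follows. Both helper names and the example collection URL are hypothetical, and treating a dataset exactly at the limit as acceptable is an assumption; the actual rule may compare differently.

```python
def stats_url(similar_to):
    """Append /stats to the ORE:similarTo link of the live collection."""
    return similar_to.rstrip("/") + "/stats"


def within_size_limit(max_dataset_bytes, repository_profile):
    """Compare the largest dataset against the profile's "Max Dataset Size".

    Returns None when the profile has no such entry, mirroring the point
    that a rule only fires if the profile carries the matching term.
    """
    limit = repository_profile.get("Max Dataset Size")
    if limit is None:
        return None
    return max_dataset_bytes <= limit  # assumption: limit is inclusive


# Hypothetical values for illustration only.
url = stats_url("https://example.org/collections/abc123")
fits = within_size_limit(500, {"orgidentifier": "sda", "Max Dataset Size": 1000})
```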

People Management:

In this walkthrough, all management of people and their profile harvesting from ORCID occurs automatically. The endpoints described above can be used to view this information and/or to manually add, update, or delete additional profiles. These will eventually be used to support profile harvesting when a user joins SEAD or adds their ORCID ID, and to make profiles searchable so that names can be matched as dc:creator metadata is typed (as we do with VIVO IDs today).

 
