...

  1. Obtain access to the SEAD system
  2. Register the repository profile
  3. Develop a client to pull and publish the collections

1. Obtain access to the SEAD system

To obtain access to the SEAD system, contact SEAD and provide the IP addresses or subnets of the repository servers that will access SEAD services. The SEAD team will add those addresses to its registry of partner repositories and grant them access to SEAD services.

Once registered, the following endpoint will be accessible by the repository: https://seadva-test.d2i.indiana.edu/sead-c3pr/.

2. Register the repository profile

Repositories need to create a profile in the JSON-LD format (http://json-ld.org/), which the SEAD Matchmaker uses when pairing datasets with repositories. The profile should include the following information:

...

https://seadva-test.d2i.indiana.edu/sead-c3pr/api/repositories/<orgidentifier_of_your_repository>

3. Develop a client to pull and publish the collections

When SEAD receives publication requests from its users for a repository, it keeps a list of those requests. To access the collections and publish them, partner repositories need to implement an agent that handles metadata and data retrieval and sends status updates. For sample agent code, refer to https://github.com/Data-to-Insight-Center/sead2/tree/master/sead-nds-repository.

The following operations need to be implemented in the agent:

Retrieve Publication Requests from SEAD

The requests queued for your repository can be retrieved from the following endpoint: https://seadva-test.d2i.indiana.edu/sead-c3pr/api/repositories/<orgidentifier>/researchobjects

...

These endpoints return a list of requests that are queued for your repository. The collection identifier can be found in the "Aggregation.Identifier" element of each JSON object that represents a collection. You can retrieve the full request document using https://seadva-test.d2i.indiana.edu/sead-c3pr/api/researchobjects/<request_id>
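
As a rough illustration of the retrieval step, the sketch below builds the two URLs given above and pulls the collection identifiers out of a listing. It assumes the listing is a JSON array of objects, each with a nested "Aggregation" object that holds "Identifier". The helper names are hypothetical, and percent-encoding the identifiers is an assumption rather than documented SEAD behaviour.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "https://seadva-test.d2i.indiana.edu/sead-c3pr/api"


def requests_url(org_identifier):
    """URL that lists the requests queued for one repository."""
    # Assumption: identifiers are percent-encoded when placed in the path.
    return f"{BASE_URL}/repositories/{quote(org_identifier, safe='')}/researchobjects"


def request_document_url(request_id):
    """URL of the full request document for one collection."""
    return f"{BASE_URL}/researchobjects/{quote(request_id, safe='')}"


def collection_identifiers(listing):
    """Collect Aggregation.Identifier from each entry, skipping entries without one.

    Assumption: 'Aggregation' is a nested JSON object that holds 'Identifier'.
    """
    identifiers = []
    for entry in listing:
        identifier = entry.get("Aggregation", {}).get("Identifier")
        if identifier:
            identifiers.append(identifier)
    return identifiers


def fetch_json(url):
    """GET a URL and decode the JSON body (requires a registered IP address)."""
    with urlopen(url) as response:
        return json.load(response)
```

An agent would typically call fetch_json(requests_url(<orgidentifier>)), pass the result to collection_identifiers, and then fetch each full request document with request_document_url.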

Accept the collection and publish to a repository

To acknowledge receipt of a request, you must first send a status message to SEAD indicating that you have started processing that request. To do so, send a POST request to the following endpoint: https://seadva-test.d2i.indiana.edu/sead-c3pr/api/researchobjects/<collection_id>/status

...

Once you send this status message to SEAD with the “reporter”:<your_orgidentifier> element, the /repositories/<orgidentifier>/researchobjects/new endpoint will no longer list that collection, but it will still be available in the /repositories/<orgidentifier>/researchobjects endpoint.

Process the Request

The request document contains summary information about the data to be published (e.g. title, abstract, creators, statistics such as the number of files, total size, and included mimetypes), information about the person making the request (e.g. ID, affiliations), and their preferences for how the request should be processed (e.g. whether it is a 'test'). Together, this information may be sufficient for you to decide whether to accept the request and begin processing it or to reject it. To reject it at this stage or any later stage, POST a status message with "stage": "failure". SEAD will assume that the repository will do no further processing and will report the failure to the user in their Project Space.
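
A rejection could be sketched as follows. The status endpoint and the "reporter" and "stage" keys come from this page, and the "message" key follows the status format described under Communicating with the Submitters. The function name and the Content-Type header are assumptions. The request is only constructed here, not sent.

```python
import json
from urllib.request import Request

STATUS_URL = (
    "https://seadva-test.d2i.indiana.edu/sead-c3pr/api/"
    "researchobjects/{collection_id}/status"
)


def build_failure_request(collection_id, org_identifier, reason):
    """Build (but do not send) a POST that reports a rejected request to SEAD."""
    body = {
        "reporter": org_identifier,  # always your organization ID
        "stage": "failure",          # reserved stage that marks the request as rejected
        "message": reason,           # explanation for the submitter
    }
    return Request(
        STATUS_URL.format(collection_id=collection_id),
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumption, not confirmed by this page
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen from a registered server would submit the status.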

...

"target" is a required field and should be the landing page of the collection in your repository. Set "permanent" to 'true' only if you need to create a permanent DOI. Otherwise a temporary DOI is created, which expires in two weeks.

Communicating with the Submitters

 

As you process the request, you can communicate with the submitters as required by your process. As part of the request, you will get information about the submitter and the creators of the data being published. This information may be as minimal as a name string, but may also include an email address and/or identifier. You can use this information to contact users out-of-band (e.g. by sending an email or looking up a phone number using their ID).

SEAD also enables you to send status messages that the submitter (and others with access to their Project Space) will see in the Staging Area as a status update. To communicate other statuses to the submitter, update the status using the status endpoint described above and the same JSON format. Always send your organization ID as the value of the "reporter" field; you may send any value for "stage" and "message". Note that "pending", "failure", and "success" are currently the only reserved terms for "stage": status messages with these stage values affect processing in SEAD and must be used as described in this document. Status messages can be informational ("stage":"In Review", "message":"Your submission is in review. This process usually takes 1-2 weeks.") or a call for action ("stage":"Approval Required", "message":"Your publication has been accepted. Please visit <URL> to accept our terms and conditions and finalize your publication.").
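
To make the distinction between reserved and free-form stages concrete, here is a minimal sketch of a helper that builds the status body. The three keys and the reserved terms are taken from the text above. The function name and the allow_reserved flag are hypothetical, and comparing stage names case-insensitively is a cautious assumption rather than documented SEAD behaviour.

```python
import json

# Stage values that affect processing in SEAD, as listed in this document.
RESERVED_STAGES = {"pending", "failure", "success"}


def status_message(org_identifier, stage, message, allow_reserved=False):
    """Return the JSON body for a status update.

    Reserved stages change how SEAD processes the request, so the caller
    must opt in explicitly before one is used.
    """
    if stage.strip().lower() in RESERVED_STAGES and not allow_reserved:
        raise ValueError(f"'{stage}' is reserved; pass allow_reserved=True to use it")
    return json.dumps({"reporter": org_identifier, "stage": stage, "message": message})


# Informational update; "my-repo" is a placeholder organization ID.
info = status_message(
    "my-repo",
    "In Review",
    "Your submission is in review. This process usually takes 1-2 weeks.",
)
```

The guard is a design choice for the agent, not something SEAD requires: it keeps an informational update from accidentally changing the state of a request.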

Complete publishing the collection

Once you have successfully published the collection in the repository, you are required to send a final status message to the status endpoint described above, with the following POST body:

...

Once this message is received, SEAD will handle registering the metadata for the new publication with external catalog(s) (currently DataOne) and returning the PID of the publication for display in the Project Space.

PID Landing Page

Once you have published the data, SEAD assumes that your repository and the PID you generated for the data become the primary means for accessing the data. At present, SEAD retains a record of the publication event including the summary information in the publication request, and may also retain a 'live' copy of the data within the submitters' Project Space. While this information is retained, SEAD may generate web pages that list these publications (e.g. as a list of datasets published through SEAD, or as lists of datasets published by specific projects) and may associate the 'live' dataset with any published versions. All of these displays will use the URL form of the PID you supply to refer viewers to the authoritative data copy you are preserving. Conversely, you are welcome to use the information provided in the request to link back to SEAD (in general, to the specific publication event, or to the live dataset, etc.) as you find useful. This could allow visitors to your repository to learn, for example, that further versions of the data have been published, that the live copies have further annotations and/or revisions since publication, that the same project has published related datasets to other repositories, etc. If you are interested, it may be possible for you to use SEAD's APIs to retrieve some of this information and incorporate it into the landing page you maintain.