A repository must carry out the following steps to register with SEAD and to accept and publish collections from SEAD:

  1. Obtain access to SEAD
  2. Register the repository profile
  3. Develop a client to pull and publish the collections

Obtain access to SEAD

First, the repository needs to contact SEAD and provide the IP addresses/subnets of the repository servers that will access the SEAD services. The SEAD team will then grant those IP addresses access to the SEAD services.

The following endpoint will be accessible to the repository afterwards:
https://seadva-test.d2i.indiana.edu/sead-c3pr/

Register the repository profile

Repositories need to create a profile in JSONLD format that includes basic profile information along with the other information that SEAD Matchmaker uses when recommending repositories for objects. The repository profile must have an “orgidentifier”, which is the ID that SEAD will use for your repository.

The following is a sample JSONLD profile of a repository:

{"@context": ["http://re3data.org/",
  {
    "Max Dataset Size": "http://sead-data.net/terms/maxdatasetsize",
    "Rights Holder IDs Required": "http://sead-data.net/terms/RightsHolderIdsRequired",
    "Total Size": "tag:tupeloproject.org,2006:/2.0/files/length",
    "Max Collection Depth": "http://sead-data.net/terms/maxcollectiondepth",
    "motto": "http://bobs.asseenon.tv/terms/motto",
    "Affiliations": "http://sead-data.net/terms/affiliations",
    "Data Mimetypes": "http://purl.org/dc/elements/1.1/format",
    "Metadata Terms": "http://sead-data.net/terms/terms"
  }
],
  "Max Dataset Size": "1000",
  "@type": "repository",
  "Total Size": "10000000",
  "orgidentifier": "bob",
  "repositoryURL": "http://www.nationaldataservice.org/projects/labs.html",
  "Rights Holder IDs Required": true,
  "Max Collection Depth": "10",
  "motto": "Our profile is up to date, so we have to be good",
  "repositoryName": "SEAD NDS Labs Publisher (Proof-of-Concept)",
  "Affiliations": [
    "SEAD",
    "NDS Members"
  ],
  "Data Mimetypes": ["text/csv"],
  "Metadata Terms": [
    "http://purl.org/dc/terms/creator",
    "http://purl.org/dc/terms/abstract",
    "http://sead-data.net/vocab/test/doesntexist"
  ]
}

Currently, SEAD Matchmaker considers the following profile information when making a recommendation:

  • “Data Mimetypes” - Acceptable types of files that the collection can contain
  • “Max Collection Depth” - Acceptable maximum collection depth
  • “Max Dataset Size” - Maximum size of individual files in the collection
  • “Total Size” - Total acceptable size of the collection
  • “Metadata Terms” - Minimum metadata fields that a collection should contain
  • “Affiliations” - Organizations that a collection should be affiliated with
  • “Rights Holder IDs Required” - Whether the collection must have a valid global identifier (ORCID, Clowder or Google ID) for the "Rights Holder" metadata
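
To illustrate how the numeric and list-valued fields above read as constraints, here is a minimal Python sketch. It is not SEAD Matchmaker's implementation. The `collection` summary dict and its keys (`depth`, `largest_file_size`, `total_size`, `mimetypes`, `metadata_terms`) are hypothetical, sizes are assumed to be in the same unit as the profile values, and “Affiliations” and “Rights Holder IDs Required” are left out.

```python
def profile_mismatches(profile: dict, collection: dict) -> list:
    """Return reasons why `collection` falls outside the limits in `profile`.

    `collection` is a hypothetical summary dict, not a SEAD data structure.
    """
    problems = []
    # The sample profile stores numeric limits as strings, hence int().
    if collection["depth"] > int(profile["Max Collection Depth"]):
        problems.append("collection is nested too deeply")
    if collection["largest_file_size"] > int(profile["Max Dataset Size"]):
        problems.append("a file exceeds Max Dataset Size")
    if collection["total_size"] > int(profile["Total Size"]):
        problems.append("collection exceeds Total Size")
    # Every file type in the collection must be one the repository accepts.
    unsupported = set(collection["mimetypes"]) - set(profile["Data Mimetypes"])
    if unsupported:
        problems.append("unsupported mimetypes: " + ", ".join(sorted(unsupported)))
    # Every metadata term the repository requires must be present.
    missing = set(profile["Metadata Terms"]) - set(collection["metadata_terms"])
    if missing:
        problems.append("missing metadata terms: " + ", ".join(sorted(missing)))
    return problems
```

An empty list means the collection fits within every limit that the sketch checks.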

Once the repository profile is created, it should be registered with SEAD by sending a POST request to the following endpoint with the JSONLD profile as the POST body.

https://seadva-test.d2i.indiana.edu/sead-c3pr/api/repositories

Once registered, the repository will be listed at the above endpoint. You can retrieve the full profile from the following endpoint:

https://seadva-test.d2i.indiana.edu/sead-c3pr/api/repositories/<orgidentifier_of_your_repository>
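
The registration and lookup calls could be sketched in Python as follows, using only the standard library. The function names are illustrative. The `Content-Type` header and the URL-encoding of the orgidentifier are assumptions, since this page does not specify them.

```python
import json
import urllib.parse
import urllib.request

# Base URL of the SEAD test instance, as given on this page.
SEAD_API = "https://seadva-test.d2i.indiana.edu/sead-c3pr/api"


def build_registration_request(profile: dict) -> urllib.request.Request:
    """Build (without sending) the POST request that registers a profile."""
    if not profile.get("orgidentifier"):
        raise ValueError("the profile must contain an 'orgidentifier'")
    return urllib.request.Request(
        url=f"{SEAD_API}/repositories",
        data=json.dumps(profile).encode("utf-8"),
        # Assumption: the service expects a JSON content type.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def profile_url(org_id: str) -> str:
    """URL of the registered profile for `org_id`."""
    # Assumption: the identifier is URL-encoded when used as a path segment.
    return f"{SEAD_API}/repositories/{urllib.parse.quote(org_id, safe='')}"


def send(request_or_url) -> str:
    """Send a request (or GET a URL) and return the response body as text."""
    with urllib.request.urlopen(request_or_url) as response:
        return response.read().decode("utf-8")
```

With these helpers, `send(build_registration_request(profile))` would register the profile and `send(profile_url("bob"))` would fetch it back, provided the calling host has been granted access.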

Develop a client to pull and publish the collections

When SEAD receives publish requests for a repository, it keeps a list of those requests in PDT. To access the collections and publish them, the repository needs to implement an agent that handles metadata/data retrieval and status updates. See https://github.com/Data-to-Insight-Center/sead2/tree/master/sead-nds-repository for sample agent code.

The agent needs to implement the following operations:

Retrieve collections from SEAD

The collections queued for your repository can be retrieved from the following endpoint:

https://seadva-test.d2i.indiana.edu/sead-c3pr/api/repositories/<orgidentifier>/researchobjects

If you want only the new collections (those that the repository has not yet started processing), query the following endpoint:

https://seadva-test.d2i.indiana.edu/sead-c3pr/api/repositories/<orgidentifier>/researchobjects/new

These endpoints return a list of collections that are queued for your repository. The collection identifier can be found in the ‘Aggregation.Identifier’ element of each JSON object that represents a collection. You can access the OREMap (a single JSONLD file that holds all the collection metadata) of a collection using the following endpoint:

https://seadva-test.d2i.indiana.edu/sead-c3pr/api/researchobjects/<collection_id>/oremap
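
As a rough Python sketch of the retrieval step, the helpers below build the queue and OREMap URLs and pull the identifiers out of a listing. It assumes that ‘Aggregation.Identifier’ denotes an "Identifier" key nested inside an "Aggregation" object, and that identifiers are URL-encoded when placed in the path. Neither detail is confirmed on this page, and the function names are illustrative.

```python
import json
import urllib.parse
import urllib.request

# Base URL of the SEAD test instance, as given on this page.
SEAD_API = "https://seadva-test.d2i.indiana.edu/sead-c3pr/api"


def new_collections_url(org_id: str) -> str:
    """URL listing the collections the repository has not started processing."""
    org = urllib.parse.quote(org_id, safe="")
    return f"{SEAD_API}/repositories/{org}/researchobjects/new"


def oremap_url(collection_id: str) -> str:
    """URL of the OREMap for one collection."""
    # Assumption: the identifier is URL-encoded when used as a path segment.
    cid = urllib.parse.quote(collection_id, safe="")
    return f"{SEAD_API}/researchobjects/{cid}/oremap"


def collection_ids(listing: list) -> list:
    """Extract the identifier of every collection in a parsed listing."""
    # Assumption: "Identifier" is nested inside an "Aggregation" object.
    return [item["Aggregation"]["Identifier"] for item in listing]


def fetch_json(url: str):
    """GET a URL and parse the response body as JSON."""
    with urllib.request.urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))
```

An agent could then call `collection_ids(fetch_json(new_collections_url("bob")))` and fetch `oremap_url(cid)` for each identifier it decides to process.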

Accept the collection and publish to a repository

To publish a collection in your queue, you are first required to send a status message to SEAD indicating that you have started processing the request. To do so, send a POST request to the following endpoint:

https://seadva-test.d2i.indiana.edu/sead-c3pr/api/researchobjects/<collection_id>/status

with the POST body:

{
	"reporter": "your_org_identifier",
	"stage": "Pending",
	"message": "The repository is now processing this request",
	"date": "Apr 14, 2016 12:29:15 PM"
}

Once you send this status message to SEAD with the “reporter”:<your_orgidentifier> element, the /repositories/<orgidentifier>/researchobjects/new endpoint will no longer list that collection, but it will still be available in the /repositories/<orgidentifier>/researchobjects endpoint.
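
A status message of this shape could be assembled in Python as sketched below. The date pattern is inferred from the single sample value "Apr 14, 2016 12:29:15 PM", so whether SEAD accepts other renderings (for example a zero-padded day) is not confirmed here. The `Content-Type` header and the URL-encoding of the collection identifier are likewise assumptions.

```python
import json
import urllib.parse
import urllib.request
from datetime import datetime

# Base URL of the SEAD test instance, as given on this page.
SEAD_API = "https://seadva-test.d2i.indiana.edu/sead-c3pr/api"

# Pattern inferred from the sample date "Apr 14, 2016 12:29:15 PM".
DATE_FORMAT = "%b %d, %Y %I:%M:%S %p"


def build_status(reporter: str, stage: str, message: str, when: datetime) -> dict:
    """Assemble a status message in the JSON shape shown on this page."""
    return {
        "reporter": reporter,
        "stage": stage,
        "message": message,
        "date": when.strftime(DATE_FORMAT),
    }


def build_status_request(collection_id: str, status: dict) -> urllib.request.Request:
    """Build (without sending) the POST request for a status update."""
    # Assumption: the identifier is URL-encoded when used as a path segment.
    cid = urllib.parse.quote(collection_id, safe="")
    return urllib.request.Request(
        url=f"{SEAD_API}/researchobjects/{cid}/status",
        data=json.dumps(status).encode("utf-8"),
        # Assumption: the service expects a JSON content type.
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the resulting request to `urllib.request.urlopen` would send the update.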

You can then fetch the OREMap of that collection to obtain its metadata and data download links, and implement your own code to deposit the collection in your repository. To communicate your progress to the submitter, you can post further status updates to the same endpoint using the same JSON format.

Complete publishing the collection

Once you have successfully published the collection in the repository, you are required to send a status message to the same status endpoint, with the following POST body:

{
	"reporter": "your_org_identifier",
	"stage": "Success",
	"message": "<PID of the collection>",
	"date": "Apr 14, 2016 12:29:15 PM"
}
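
The overall sequence for one collection (report "Pending", deposit, report "Success" with the PID) might be organized as in the Python sketch below. The `deposit` and `send_status` callables are hypothetical placeholders for your own deposit code and for whatever function POSTs to the status endpoint. Only the "Pending" and "Success" stages shown on this page are used, and an error during deposit is simply propagated because this page does not describe how to report failures.

```python
from datetime import datetime

# Pattern inferred from the sample date "Apr 14, 2016 12:29:15 PM".
DATE_FORMAT = "%b %d, %Y %I:%M:%S %p"


def publish_collection(collection_id, org_id, deposit, send_status, now=datetime.now):
    """Run the Pending -> deposit -> Success sequence for one collection.

    deposit(collection_id) -> str
        Hypothetical callable: your own code that stores the collection
        and returns its PID.
    send_status(collection_id, status_dict)
        Hypothetical callable that POSTs the dict to the status endpoint.
    """

    def status(stage, message):
        return {
            "reporter": org_id,
            "stage": stage,
            "message": message,
            "date": now().strftime(DATE_FORMAT),
        }

    # Tell SEAD that processing has started.
    send_status(collection_id, status("Pending", "The repository is now processing this request"))
    # Repository-specific work; an exception here stops the sequence.
    pid = deposit(collection_id)
    # Report completion; the message carries the PID.
    send_status(collection_id, status("Success", pid))
    return pid
```

Injecting the two callables keeps the sequence testable without network access.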

The value for “stage” should be “Success” and the “message” should contain the PID of the collection. The PID is a globally resolvable permanent identifier for the collection. You can use SEAD services to create a DOI (Digital Object Identifier) for the collection by sending a POST request to the following endpoint:

https://seadva-test.d2i.indiana.edu/sead-c3pr/api/doi

with the following POST body:

{
	"target":"http://landing_page_of_the_collection", 
	"metadata":
	{
		"title":"<title_of_the_collection>",
		"creator":"<creator>",
		"pubDate":"<publication date>"
	}, 
	"permanent":"false"
}

"target" is a required field and it should be the landing page of the collection in your repository. Set "permanent" to 'true' only if you need to create a permanent DOI. Otherwise it will create a temporary DOI which will expire in two weeks.

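A DOI request with the body described above could be built in Python as sketched here. The function name and parameters are illustrative and the `Content-Type` header is an assumption. Because the format of the service's response is not documented on this page, the sketch stops at building the request.

```python
import json
import urllib.request

# Base URL of the SEAD test instance, as given on this page.
SEAD_API = "https://seadva-test.d2i.indiana.edu/sead-c3pr/api"


def build_doi_request(target, title, creator, pub_date, permanent=False):
    """Build (without sending) the POST request that asks SEAD for a DOI."""
    if not target:
        raise ValueError('"target" is required: the landing page of the collection')
    body = {
        "target": target,
        "metadata": {"title": title, "creator": creator, "pubDate": pub_date},
        # The sample body on this page passes the flag as a string.
        "permanent": "true" if permanent else "false",
    }
    return urllib.request.Request(
        url=f"{SEAD_API}/doi",
        data=json.dumps(body).encode("utf-8"),
        # Assumption: the service expects a JSON content type.
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Leaving `permanent` at its default requests a temporary DOI, which is the safer choice while testing an agent.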
 
