Definitions

Live Objects: Datasets and Collections in a Space. Anybody can change, add to, or modify them.

Curation Object: The object that the curators modify in preparation for publication.

Publication Object: Curation Object + DOI (Digital Object Identifier). This is the object returned from the repository.

Process
1.  Curation Area - The curator picks some of the Live Objects and starts the process of submitting that dataset or collection.

Design Specification:

Design Question: What is a curation object, with respect to Clowder? 

Design Answer: It is an object that holds copies of the live objects. The curation object contains what was selected from the live objects and is put into the staging area as an object that the user will be working on. There is an entry in the staging area with all the information that was available when the user added it to the staging area. 

Design Question: What happens if the original dataset/collection is updated? Does the curation dataset keep a reference to the live object?

Design Answer: The curation dataset/collection has a link to the original dataset/collection. The two will have the same Id but be stored in different places. This link should be bi-directional: the original Live Object will hold a reference to the Curation Object while the process is under way, or to the Publication Object after the process is finished.
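
The copy-and-link behaviour described in the two answers above could be sketched as follows. This is a minimal illustration in Python, not Clowder code: the names live_store, staging_store, create_curation_object, live_ref, and curation_ref are all hypothetical, and plain dictionaries stand in for whatever storage is eventually used.

```python
import copy

# Hypothetical in-memory stand-ins for the two storage locations.
live_store = {}     # live datasets/collections, keyed by id
staging_store = {}  # curation objects in the staging area, keyed by the same id


def create_curation_object(object_id):
    """Copy a live object into the staging area and link the two both ways."""
    live = live_store[object_id]
    # Deep copy so later edits to the live object do not leak into the snapshot.
    curation = copy.deepcopy(live)
    curation["live_ref"] = object_id     # curation object -> live object
    staging_store[object_id] = curation  # same id, different place
    live["curation_ref"] = object_id     # live object -> curation object
    return curation


live_store["ds1"] = {"id": "ds1", "name": "Example dataset", "files": ["a.csv"]}
snapshot = create_curation_object("ds1")

# A later change to the live object leaves the curation copy untouched.
live_store["ds1"]["files"].append("b.csv")
print(snapshot["files"])  # ['a.csv']
```

Because the snapshot is a deep copy, whether and how live changes are pulled back into it remains an explicit design decision rather than something that happens implicitly.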

Design Question: There will be a staging area per space. Is there also a staging area that is private?  

Design Answer: Pending

2.  Matchmaker - The user 'asks' the program which repositories will accept the curation object.

Design Specification:

Design Question: If the live object is updated during the curation flow and I want to update my curation object, can this be done, and should it be? Or is it best to force the user to start a new flow?

Design Answer: The initial answer was to make the user start over. Later in the discussion that position seemed to change, so this is still to be determined. Note that curation objects can be long-lived, possibly lasting weeks or months.

 

3. C3PR

Design Specification:

TO BUILD:

Discussion Points:

Sprint 1 - Curation Projects. 

Optional: 

 

Sprint 2 - Matchmaker Calls 

 

Sprint 3 - C3PR Calls. 

 ------------------------------------------------------------------------------------------------------------------------------------------------------------

Sprint tasks:

  1. Staging area per space [Indira]
    1. standalone plugin
  2. Create curation object [Yan]
    1. Select datasets and collections from space [Luigi]
  3. Submit for publication (separate plugins?) [Rob]
    1. Call matchmaker and pick repository (separate plugins?)
    2. Refine metadata
    3. Store user preferences for publication in profile
    4. Store published object (everyone in the space can see it)
  4. Edit curation object
  5. List curation objects and published objects that a dataset/collection is part of
Steps
  1. Create curation object
  2. Matchmaker query and selection
  3. Editing of metadata and submission to repository
Questions
  1. Who can see the curation objects?
Background
  1. curation object -> publication object
  2. repository preferences
    1. a repository says what options it provides
    2. user might have preferences set in their profiles in the spaces
      1. generally speaking I want things free
      2. but in one instance I might be willing to pay
  3. attributes of the content vs attributes of the repositories
    1. "I would like"
    2. "I have images"
  4. "if my dataset doesn't have a license, assign Creative Commons"
  5. preference / requirements
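
One way to read the preference / requirement notes above: a requirement excludes a repository outright, a preference only affects ranking, and a profile default can be set aside for a single submission. The sketch below is hypothetical throughout; the Repository fields, the free and accepts_images keys, and rank_repositories are invented for illustration and are not part of the matchmaker.

```python
from dataclasses import dataclass


@dataclass
class Repository:
    name: str
    free: bool
    accepts_images: bool


def rank_repositories(repos, requirements, preferences):
    """Drop repositories that fail a requirement, then sort by preferences met."""
    eligible = [r for r in repos
                if all(getattr(r, key) == value for key, value in requirements.items())]

    def score(repo):
        return sum(getattr(repo, key) == value for key, value in preferences.items())

    # sorted() is stable, so ties keep their original order.
    return sorted(eligible, key=score, reverse=True)


repos = [
    Repository("RepoA", free=False, accepts_images=True),
    Repository("RepoB", free=True, accepts_images=True),
    Repository("RepoC", free=True, accepts_images=False),
]

profile_preferences = {"free": True}     # "generally speaking I want things free"
requirements = {"accepts_images": True}  # "I have images"

ranked = rank_repositories(repos, requirements, profile_preferences)
print([r.name for r in ranked])  # ['RepoB', 'RepoA']

# For one submission the user is willing to pay, so the default is dropped.
one_off = {k: v for k, v in profile_preferences.items() if k != "free"}
ranked_once = rank_repositories(repos, requirements, one_off)
print([r.name for r in ranked_once])  # ['RepoA', 'RepoB']
```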

Board Picture 

 

Reference Pages 

 

 

 

 

Sample Rules from Inna

1. "Affiliation match – string match

2. Max collection hierarchy - # subfolders / 1 means flat structure

3. Max collection size – number in GB / no limit

4. Max file size – number in GB / no limit

5. Preferred formats – list / any (if list, also need to check whether conversion is an option)

6. Metadata required – none / structured (fields required) / un-structured (data description or readme) / both required

7. Domain match – string match

8. Packaging - preferred type (zip, tar, gz, bagit) / packaging not preferred

9. Versions accepted – new submissions, updates, special versions, all

10. Confidential info in data – acknowledgement required / not required

11. Copyrighted material – acknowledgement required / not required

12. Data License – required / not required (type - repository specific use agreement / any type)

13. Depositor Agreement – required / not required (need to be prepared or accepted at the time of matchmaking?)

14. Access – open, restricted, embargo, enclave

 

It seems to me that at least some of this information would be generated in the staging area via clicking on some checkboxes. Some of it will come from project space metadata fields, but I'm not sure we have appropriate predicates. I'm still unclear about how to find them. I saw that all the rules that already exist have "http://sead-data.net/terms/" as part of their implementation. Does it mean we’ll rely on our own vocabulary for all the rules?"
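
A few of the sample rules above (1, 3, 4, and 5) could be evaluated against a curation object roughly as follows. This is a sketch only: the dictionary keys (affiliations, max_collection_gb, max_file_gb, preferred_formats) and the function check_rules are hypothetical, not the matchmaker's actual rule format, and None is used here to mean "no limit" or "any".

```python
def check_rules(curation, repo_rules):
    """Return a list of (rule name, passed) pairs for one repository."""
    files = curation["files"]
    results = []

    # Rule 1: affiliation match (plain string match).
    allowed = repo_rules.get("affiliations")
    results.append(("affiliation",
                    allowed is None or curation["affiliation"] in allowed))

    # Rule 3: max collection size in GB.
    limit = repo_rules.get("max_collection_gb")
    total = sum(f["size_gb"] for f in files)
    results.append(("max_collection_size", limit is None or total <= limit))

    # Rule 4: max file size in GB.
    limit = repo_rules.get("max_file_gb")
    largest = max((f["size_gb"] for f in files), default=0)
    results.append(("max_file_size", limit is None or largest <= limit))

    # Rule 5: preferred formats (conversion is not considered in this sketch).
    formats = repo_rules.get("preferred_formats")
    results.append(("preferred_formats",
                    formats is None or all(f["format"] in formats for f in files)))
    return results


curation = {
    "affiliation": "Example University",
    "files": [
        {"size_gb": 1.5, "format": "csv"},
        {"size_gb": 4.0, "format": "tiff"},
    ],
}
repo_rules = {
    "affiliations": None,
    "max_collection_gb": 10,
    "max_file_gb": 2,
    "preferred_formats": ["csv", "tiff"],
}

results = check_rules(curation, repo_rules)
failed = [name for name, passed in results if not passed]
print(failed)  # ['max_file_size']
```

Returning every rule outcome, rather than a single yes/no, would let the staging area show the user which rules a repository fails.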

 

Curation Statuses


Other curations: Does it open in an app? Is there a code book? Are variables named properly? Rearrange folders.

------------------------

09/02/2015 Notes: 

Discussion: 

1) Only the owner of a dataset or a member of a space with the EditStagingArea permission can see the publish button for a dataset.

2) Discussed what happens when a dataset is selected for publication: the curation object associated with the dataset is displayed in the staging area of every space the dataset is in.

3) Discussed a possible new flow for the application, in which the flow starts on the submit page with a preselected repository and the matchmaker results displayed. The preselected repository can be either the user's preferred repository or the last repository used. If neither is available, the page could show the message "Please select a repository". A button next to the repository name will let the user change the repository and will lead to the current Matchmaker page. The Edit Metadata page will be accessible via a link in the left navigation. A validation button should also be available on the main page to verify that the dataset is ready for publication.
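
Points 1 and 3 above can be sketched as two small checks. Everything here is hypothetical (function names, the shape of dataset and space_permissions); in particular, the sketch assumes the EditStagingArea permission is checked in the spaces the dataset belongs to, which the notes do not spell out.

```python
def can_see_publish_button(user, dataset, space_permissions):
    """True for the dataset owner, or for a member holding EditStagingArea
    in one of the dataset's spaces (assumed interpretation of point 1)."""
    if dataset["owner"] == user:
        return True
    return any("EditStagingArea" in space_permissions.get((user, space), set())
               for space in dataset["spaces"])


def preselect_repository(preferred=None, last_used=None):
    """Preferred repository first, then the last one used, else None."""
    return preferred or last_used


def submit_page_repository_label(preferred=None, last_used=None):
    """Text shown next to the 'change repository' button on the submit page."""
    repo = preselect_repository(preferred, last_used)
    return repo if repo else "Please select a repository"


dataset = {"owner": "alice", "spaces": ["space1"]}
space_permissions = {("bob", "space1"): {"EditStagingArea"}}

print(can_see_publish_button("alice", dataset, space_permissions))  # True
print(can_see_publish_button("bob", dataset, space_permissions))    # True
print(can_see_publish_button("carol", dataset, space_permissions))  # False

print(submit_page_repository_label(last_used="RepoX"))  # RepoX
print(submit_page_repository_label())                   # Please select a repository
```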

 

Reference API

https://sead-test.ncsa.illinois.edu/sead-cp/

https://github.com/qqmyers/sead2