Live Objects: Datasets and Collections in a Space. Anybody can add to or change them.
Curation Object: Object being modified for publication by the curators.
Publication Object: Curation Object + DOI (Digital Object Identifier). This is the object returned from the repository.
Process
1. Curation Area - The curator picks some of the Live Objects and starts the process of submitting that dataset or collection.
Design Specification:
Design Question: What is a curation object, with respect to Clowder?
Design Answer: It is an object that holds copies of the live objects. The curation object contains what was selected from the live objects and is put into the staging area as an object that the user will be working on. There is an entry in the staging area with all the information that was available when the user added it to the staging area.
Design Question: What happens if the original dataset/collection is updated? Does the curation dataset keep track of the live object?
Design Answer: The curation Dataset/Collection has a link to the original dataset/collection. They will have the same Id but be stored in different places. This link should be bi-directional: the original Live Object will have a reference to the Curation Object that is in process or, after the process is finished, to the Publication Object.
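The two answers above can be sketched as a small data model. This is only an illustration of the idea (a copy of the live object, the same Id, a link in both directions); the class names, field names, and the in-memory staging_area dictionary are hypothetical and are not taken from the Clowder code base.

```python
import copy
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class LiveObject:
    id: str
    content: dict
    # Reference to the in-process Curation Object (or, later, the
    # Publication Object). Hypothetical field name.
    curation_ref: Optional[str] = None


@dataclass
class CurationObject:
    id: str            # same Id as the live object, stored elsewhere
    content: dict      # copy taken when the object was added to staging
    live_ref: str      # link back to the original live object
    doi: Optional[str] = None  # set once the repository returns a DOI


# Hypothetical stand-in for the staging area of one space.
staging_area: Dict[str, CurationObject] = {}


def start_curation(live: LiveObject) -> CurationObject:
    """Copy a live object into the staging area and link both ways."""
    curation = CurationObject(
        id=live.id,
        content=copy.deepcopy(live.content),
        live_ref=live.id,
    )
    staging_area[curation.id] = curation
    live.curation_ref = curation.id
    return curation
```

Because the content is deep-copied, later changes to the live object do not alter the staged entry, which matches the statement that the staging area holds the information available at the time the object was added.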
Design Question: There will be a staging area per space. Is there also a staging area that is private?
Design Answer: Pending
2. Matchmaker - The user 'asks' the program which repositories will accept the curation object.
Design Specification
The user is given some available choices, plus selection boxes/options regarding requirements, and the matchmaker comes back with options.
We need to study and determine how the matchmaker takes those rules in.
Design Question: If, during the curation flow, the live object is updated and I want to update my curation object, can this be done, and should it be? Or is it best to force the user to start a new flow?
Design Answer: The initial answer was to make them start over. Later in the discussion that seems to have changed, so this is To Be Determined. Note that these curation objects can be long-lived, which could mean weeks or possibly months.
3. C3PR
Design Specification.
TO BUILD:
Discussion Points:
Sprint 1 - Curation Projects.
Optional:
Sprint 2 - Matchmaker Calls
Sprint 3 - C3PR Calls.
------------------------------------------------------------------------------------------------------------------------------------------------------------
Sprint tasks:
Board Picture
Reference Pages
1. "Affiliation match – string match
2. Max collection hierarchy - # subfolders / 1 means flat structure
3. Max collection size – number in GB / no limit
4. Max file size – number in GB / no limit
5. Preferred formats – list / any (if list, also need to check whether conversion is an option)
6. Metadata required – none / structured (fields required) / un-structured (data description or readme) / both required
7. Domain match – string match
8. Packaging - preferred type (zip, tar, gz, bagit) / packaging not preferred
9. Versions accepted – new submissions, updates, special versions, all
10. Confidential info in data – acknowledgement required / not required
11. Copyrighted material – acknowledgement required / not required
12. Data License – required / not required (type - repository specific use agreement / any type)
13. Depositor Agreement – required / not required (does it need to be prepared or accepted at the time of matchmaking?)
14. Access – open, restricted, embargo, enclave
It seems to me that at least some of this information would be generated in the staging area via clicking on some checkboxes. Some of it will come from project spaces metadata fields, but I'm not sure we have appropriate predicates. I'm still unclear about how to find them. I saw that all the rules that already exist have "http://sead-data.net/terms/" as part of their implementation. Does it mean we’ll rely on our own vocabulary for all the rules?"
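To make a few of the rules listed above concrete, here is a minimal sketch of how a curation object could be checked against one repository's profile. It is not the matchmaker's actual rule format, which still needs to be studied. All dictionary keys are hypothetical, and "no limit" / "any" are represented as None purely for illustration.

```python
from typing import List, Optional


def within_limit(value: float, limit: Optional[float]) -> bool:
    """A limit of None stands for 'no limit'."""
    return limit is None or value <= limit


def failed_rules(obj: dict, repo: dict) -> List[str]:
    """Return the names of the rules that the curation object fails."""
    failures = []
    # Rule 2: max collection hierarchy (1 means flat structure).
    if not within_limit(obj["hierarchy_depth"], repo.get("max_hierarchy_depth")):
        failures.append("Max collection hierarchy")
    # Rule 3: max collection size in GB.
    if not within_limit(obj["total_size_gb"], repo.get("max_collection_size_gb")):
        failures.append("Max collection size")
    # Rule 4: max file size in GB.
    if not within_limit(obj["largest_file_gb"], repo.get("max_file_size_gb")):
        failures.append("Max file size")
    # Rule 5: preferred formats (None stands for 'any').
    formats = repo.get("preferred_formats")
    if formats is not None and not set(obj["formats"]) <= set(formats):
        failures.append("Preferred formats")
    # Rule 14: access (None stands for 'any access type accepted').
    access = repo.get("access")
    if access is not None and obj["access"] not in access:
        failures.append("Access")
    return failures


# Example input: a hypothetical curation object summary and repository profile.
curation_summary = {
    "hierarchy_depth": 2,
    "total_size_gb": 12.0,
    "largest_file_gb": 3.0,
    "formats": ["csv", "pdf"],
    "access": "open",
}
strict_repo = {
    "max_hierarchy_depth": 1,
    "max_collection_size_gb": 10.0,
    "preferred_formats": ["csv", "pdf", "txt"],
    "access": ["open", "embargo"],
}
print(failed_rules(curation_summary, strict_repo))
```

Returning the failed rule names rather than a single yes/no lets the staging area show the user which requirements block a repository. The conversion check mentioned under rule 5 is left out of this sketch.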
Curation Statuses
Other curations: Does it open in an app? Is there a code book? Are variables named properly? Rearrange folders.
------------------------
09/02/2015 Notes:
Discussion:
1) Only the owner of a dataset or a member of a space with the EditStagingArea Permission can see the publish button for a dataset.
2) Discussed what happens when a dataset is selected for publication: the curation object associated with the dataset is displayed in the staging area of every space the dataset is in.
3) Discussed a possible new flow for the application, in which the flow starts at the submit page with a preselected repository and the matchmaker results displayed. The preselected repository can be either the user's preferred repository or the last repository used. The page could show the message "Please select a repository" if neither case applies. There will be a button next to the repository name to change the repository, which will lead to the current Matchmaker page. The Edit Metadata page will be accessible via the link in the left navigation. A validation button should also be available on the main page to verify that the dataset is ready for publication.
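Points 1 and 3 above can be sketched as two small functions. This is only an illustration under assumptions: the function names and arguments are hypothetical, and the notes do not say which of "preferred" and "last used" wins when both exist, so the sketch assumes the preferred repository is tried first.

```python
from typing import Optional, Set, Tuple


def can_see_publish_button(user: str, dataset_owner: str,
                           space_permissions: Set[str]) -> bool:
    """Point 1: the owner, or a space member with EditStagingArea."""
    return user == dataset_owner or "EditStagingArea" in space_permissions


def preselect_repository(preferred: Optional[str],
                         last_used: Optional[str]
                         ) -> Tuple[Optional[str], Optional[str]]:
    """Point 3: return (repository, message) for the submit page.

    Assumption: the preferred repository takes precedence over the
    last one used; the notes leave this order open.
    """
    if preferred:
        return preferred, None
    if last_used:
        return last_used, None
    return None, "Please select a repository"
```

For example, preselect_repository(None, None) yields no repository and the prompt message, which is the case where the user must go to the Matchmaker page to pick one.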
https://sead-test.ncsa.illinois.edu/sead-cp/
https://github.com/qqmyers/sead2