...
- Use an existing search engine (e.g., Solr/Lucene) to index the re3data registry
- Create a test collection of datasets/queries/relevance judgements
- This can be done manually (find a set of researchers to give us a dataset and/or query and the repository they selected)
- This can be done automatically by sampling datasets from existing repositories and assuming that the repository each dataset came from is the "most relevant" one for it
- Develop demonstration UI
The end product will be a search engine that merges re3data, biosharing (if available), and the funder and publisher lists, along with models of relevance.
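The automatic test-collection idea above can be sketched in a few lines. This is only an illustration under stated assumptions: the repository records and deposited dataset descriptions are made-up placeholders (real ones would come from re3data metadata and from the repositories themselves), the function names are hypothetical, and a toy IDF-weighted term-overlap scorer stands in for Solr/Lucene. Each sampled dataset description is used as a query, and the repository it was sampled from is assumed to be its single relevant result.

```python
import math
import random
from collections import Counter

# Hypothetical repository descriptions; real ones would come from re3data metadata.
REPOSITORIES = {
    "repo_genomics": "genomics dna sequence genome biology",
    "repo_climate": "climate weather atmosphere ocean temperature",
    "repo_social": "survey social science census population",
}

# Hypothetical descriptions of datasets already deposited in each repository.
DEPOSITED = {
    "repo_genomics": ["human genome sequence reads", "bacterial dna assembly"],
    "repo_climate": ["ocean temperature time series", "regional weather observations"],
    "repo_social": ["household survey responses", "census population tables"],
}


def tokenize(text):
    """Lowercase and split on whitespace."""
    return text.lower().split()


def build_test_collection(deposited, per_repo, seed=0):
    """Sample datasets per repository and pair each with its source repository.

    Assumption: the repository a dataset was sampled from is its relevant answer.
    """
    rng = random.Random(seed)
    collection = []
    for repo_id in sorted(deposited):
        descriptions = deposited[repo_id]
        k = min(per_repo, len(descriptions))
        for description in rng.sample(descriptions, k):
            collection.append((description, repo_id))
    return collection


def rank_repositories(query, repositories):
    """Toy stand-in for a search engine: rank by IDF-weighted term overlap."""
    n = len(repositories)
    doc_tokens = {rid: set(tokenize(text)) for rid, text in repositories.items()}
    # Document frequency: in how many repository descriptions each term occurs.
    df = Counter(term for tokens in doc_tokens.values() for term in tokens)
    query_terms = set(tokenize(query))
    scores = {
        rid: sum(math.log(1 + n / df[t]) for t in query_terms if t in tokens)
        for rid, tokens in doc_tokens.items()
    }
    # Highest score first; ties broken by repository id for determinism.
    return sorted(scores, key=lambda rid: (-scores[rid], rid))


def mean_reciprocal_rank(collection, repositories):
    """Average of 1/rank of the assumed-relevant repository over all queries."""
    total = 0.0
    for query, relevant in collection:
        ranking = rank_repositories(query, repositories)
        total += 1.0 / (ranking.index(relevant) + 1)
    return total / len(collection)


if __name__ == "__main__":
    collection = build_test_collection(DEPOSITED, per_repo=2)
    print(len(collection), "query/judgement pairs")
    print("MRR:", mean_reciprocal_rank(collection, REPOSITORIES))
```

Mean reciprocal rank is used here only because each query has exactly one assumed-relevant repository. A manually built collection with graded judgements would likely call for a different measure.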
Analysis
What tools already exist in this space?
...
- Find researchers with real datasets and have them identify the top repositories from re3data (possible future IRB study)?
- For some subset of repositories, go find a dataset.
References
Elsevier. Supported Data Repositories.
Myers, Jim. (2016). SEAD 2.0 Publication API Walkthrough.
Nature. Availability of data and materials.
PLOS ONE. Data availability.
UI RDS. Saving and Sharing your Data.
DCC. Whyte, A. (2015). "Where to keep research data: DCC checklist for evaluating data repositories", v.1.1. Edinburgh: Digital Curation Centre.