This page is intended to capture information related to NDS-211.

...

  1. Use an existing search engine (e.g., Solr/Lucene) to index the re3data registry
  2. Create a test collection of datasets, queries, and relevance judgements
    1. This can be done manually (find a set of researchers to give us a dataset and/or query and the repository they selected)
    2. This can be done automatically by sampling datasets from existing repositories and assuming that the source repositories are the "most relevant"
  3. Develop demonstration UI

Analysis

The end product will be a search engine that merges re3data, BioSharing (if available), and funder and publisher lists, along with models of relevance.

Search Engine

We can use either a research-oriented (Indri/Galago/Terrier) or general-purpose (Lucene) search engine platform. The goal would be to identify features/characteristics of repositories that can be used to improve rankings, aside from basic language models.
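As a point of reference for the "basic language models" baseline, the following is a minimal sketch of query-likelihood ranking with Dirichlet smoothing, written in plain Python rather than against Indri, Galago, Terrier, or Lucene. The function names, the `mu` default, and the toy repository descriptions are illustrative assumptions and do not come from re3data.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_query_likelihood(query, docs, mu=2000.0):
    """Rank docs (id -> text) by Dirichlet-smoothed query log-likelihood.

    mu is the smoothing parameter; its default here is an assumption,
    not a tuned value.
    """
    doc_tf = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    coll_tf = Counter()
    for tf in doc_tf.values():
        coll_tf.update(tf)
    coll_len = sum(coll_tf.values())
    if coll_len == 0:
        return list(docs)  # nothing to score against

    query_terms = tokenize(query)
    scores = {}
    for doc_id, tf in doc_tf.items():
        doc_len = sum(tf.values())
        score = 0.0
        for term in query_terms:
            p_coll = coll_tf[term] / coll_len
            if p_coll == 0.0:
                continue  # term never occurs in the collection: skip it
            score += math.log((tf[term] + mu * p_coll) / (doc_len + mu))
        scores[doc_id] = score
    return sorted(scores, key=scores.get, reverse=True)


# Toy repository descriptions, made up for illustration only.
toy_repositories = {
    "repo_a": "genomics sequence data archive",
    "repo_b": "social science survey data",
    "repo_c": "earth climate observations",
}
ranking = rank_query_likelihood("genomics sequence", toy_repositories, mu=10.0)
```

Any repository-level features would then be combined with, or used to re-rank, a baseline score of this kind.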

Potential features:

  • Retrieval score based on name, description, subject, keywords, and information crawled from associated URLs
  • Language, startDate, size
  • URL format (e.g., presence of non-standard ports, path depth)
  • Number of results in Google Scholar
  • Completeness of the re3data record (how much information it contains)
  • Number of policies
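Two of the features listed above, URL format and record completeness, can be computed without any external service. The sketch below shows one possible way to do so; the function names are hypothetical, and the field names passed to `record_completeness` are taken from the list above and may not match the actual re3data schema.

```python
from urllib.parse import urlparse

# Default ports per scheme; anything else counts as non-standard.
DEFAULT_PORTS = {"http": 80, "https": 443}


def url_features(url):
    """Return simple URL-format features for a repository URL."""
    parsed = urlparse(url)
    port = parsed.port  # None when the URL has no explicit port
    nonstandard = port is not None and port != DEFAULT_PORTS.get(parsed.scheme)
    # Path depth is the number of non-empty path segments.
    depth = len([segment for segment in parsed.path.split("/") if segment])
    return {"nonstandard_port": nonstandard, "path_depth": depth}


def record_completeness(record, fields):
    """Fraction of the given fields that are present and non-empty."""
    if not fields:
        return 0.0
    filled = sum(1 for f in fields if record.get(f) not in (None, "", [], {}))
    return filled / len(fields)


# Illustrative record; the values are made up.
example_record = {"name": "Example Repository", "description": "", "subject": ["biology"]}
features = url_features("http://example.org:8080/data/sets/")
completeness = record_completeness(
    example_record, ["name", "description", "subject", "keywords"]
)
```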

Test collection:

A key requirement is the ability to evaluate the retrieval model, which requires a suitable test collection. For NDSC6, we would only pilot this.

  • Find researchers with real datasets and have them identify the top repositories from re3data (possible future IRB study).
  • For some subset of repositories, find a dataset in each; the source repository is then assumed to be the most relevant result.
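If each test query has one or a few known relevant repositories, as in the sampling approach, one candidate measure is mean reciprocal rank (MRR). This is a minimal sketch under that assumption; the choice of MRR, the data layout, and the identifiers are illustrative and not a decision recorded on this page.

```python
def reciprocal_rank(ranking, relevant):
    """Return 1/rank of the first relevant repository, or 0.0 if none is found."""
    for position, repo_id in enumerate(ranking, start=1):
        if repo_id in relevant:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(runs, judgments):
    """Average reciprocal rank over all queries.

    runs:      query id -> ranked list of repository ids
    judgments: query id -> set of relevant repository ids
    """
    if not runs:
        return 0.0
    total = sum(
        reciprocal_rank(ranking, judgments.get(query_id, set()))
        for query_id, ranking in runs.items()
    )
    return total / len(runs)


# Made-up pilot data: two queries with one relevant repository each.
pilot_runs = {"q1": ["r1", "r2"], "q2": ["r3", "r4", "r5"]}
pilot_judgments = {"q1": {"r1"}, "q2": {"r5"}}
pilot_mrr = mean_reciprocal_rank(pilot_runs, pilot_judgments)
```

With researcher-supplied judgments that name several acceptable repositories per dataset, a graded or set-based measure may be a better fit than MRR.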

Background/Analysis

What tools already exist in this space?

Registries of Research Data Repositories

| Registry | Description | Notes |
| --- | --- | --- |
| Re3Data | Registry of research data repositories | Started from Databib, crowd-sourced. Metadata is too general for search; user feedback "precision is horrible"; not based on natural language. |
| Biosharing.org | Registry of databases and policies for life/environmental/bio sciences | Schema based on BioDBCore: http://biocuration.org/community/standards-biodbcore/. Data is not available, but will be. BioSharing: curated and crowd-sourced metadata standards, databases and data policies in the life sciences. |
| Cinergi | Community Inventory of EarthCube Resources for Geosciences Interoperability | Curated database of geoscience information resources |
| OpenAIRE | OpenAIRE data provider search | Publishes guidelines for data archives |
| LA Referencia | | |
| bioCADDIE | Data discovery index | Index of data "do for data what pubmed did for literature" |
| OpenDOAR | Directory of open-access repositories | |
| SHARE | Index of research activities/outputs including data management plans, grant proposals, preprints, presentations, and data repository deposits | |

...

  1. Do researchers come to you looking for places to put their data?
    1. Of those that come to you, do you have some estimate of the percentage of those that eventually do find a place to put their data?
  2. Thinking about the researchers that come to you, what is the typical consultation like? What types of questions or concerns do they have?
  3. Do you notice any common challenges or themes across the campus for researchers looking for places to deposit data?
  4. What are some of the tools you recommend and how well do they meet the needs of the researcher?
  5. Do you have any ideas of tools or services that could help you/them better?
  6. We’re thinking of this service (describe current vision of recommender), what do you think? Would it be useful?
  7. Are there any departments/researchers/labs that you think are representative of this problem that we could talk to? (Looking for most common cases)
  8. Is there anyone else working in this space that you think we should talk to?

References


Elsevier. Supported Data Repositories.

Myers, Jim. (2016). SEAD 2.0 Publication API Walkthrough.

Nature. Availability of data and materials.

PLOS ONE. Data availability.

UI RDS. Saving and Sharing your Data.

Whyte, A. (2015). ‘Where to keep research data: DCC checklist for evaluating data repositories’, v.1.1. Edinburgh: Digital Curation Centre.