This page is intended to capture information related to NDS-211.

...

Service | Data Repository Recommendation

U of I Research Data Service | "Deposition of data into a web-accessible repository is generally the preferred mechanism for public data sharing because it ensures wide-spread and consistent access to the data.  If your discipline already has a trusted repository, we recommend you deposit where your community knows to look.  To find a repository, re3data.org is a large, vetted, and searchable catalog of data repositories.  If no discipline-specific repository exists, there are several options, including Illinois’ IDEALS repository (free) and other general-purpose repositories like DataDryad (fee-based)."

Elsevier | List of supported data repositories

Nature | Data availability policy: "Supporting data must be made available to editors and peer-reviewers at the time of submission for the purposes of evaluating the manuscript...For information about suitable public repositories, see sections that follow."

PLOS | PLOS Data Repository Recommendation Guide: "PLOS has identified a set of established repositories below, which are recognized and trusted within their respective communities. Additionally, the Registry of Research Data Repositories (Re3Data) is a full scale resource of registered repositories across subject areas."

DCC | Where can I find a data repository?

  1. Funders: Some funders stipulate that data produced in a project they fund is offered to a specific data centre or repository identified in their policy.
  2. Repository registries: Re3data, Biosharing.org
  3. Data Journals: A data journal will not normally host data itself but recommend where it should be deposited, and then link to it.  This tends to make them useful sources of advice about repositories.
  4. Journal policies: Journals are increasingly requiring authors to deposit the data underlying their articles in a recognised repository, to complement or replace any in-house facility for supplementary materials.
  5. Learned and professional societies: A society relevant to the research domain may offer advice on data sharing that includes recommendations about where to deposit data.

Problem statement

A researcher looking for a repository has many options, all of which require manual analysis: determine funding agency requirements, identify field/domain recommendations, review publisher recommendations, or search repository registries. For example, a researcher at the U of I looking for a repository to publish their data can select a field-specific repository from curated lists based on funding agency or publisher requirements, search re3data.org, or use their local institutional repository. The NDS repository recommender will try to provide a single point where users can go to search for an appropriate repository.

There are several existing services in this space including the Registry of Research Data Repositories (RE3Data), Biosharing.org, and the SEAD C3PR service.   In addition to these existing registries of research data repositories, funding agencies and publishers provide lists of recommended repositories.

...

  • Improved search over Re3Data through the use of priors (e.g., "trustworthiness" or some sort of impact factor)
  • Accounting for user motivations (funding agency requirements, publisher requirements, data size) through guided search
  • Suitable for use by publishers (via API or otherwise)

...

Broad vision

  • Start with re3data and biosharing.org data as core
  • Develop and test priors based on repository attributes
  • In essence, answer the research question: what makes one repository more relevant to users than another?

Is it really a "recommender"? Broadly speaking, a "recommender system" attempts to predict the relevance of an item to a user based on information known about the user. This could be profile information, previous ratings or related activities.  It is more likely that this system will be a "search engine" in the sense that the user comes with an information need and is looking for a ranked list of candidate repositories. The information need might be a query or the dataset itself.
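
A minimal sketch of how a repository prior could enter such a ranked list, assuming a query-likelihood model with Dirichlet smoothing (a choice made here for illustration, not one stated in the plan). The repository records, their descriptions, and the prior values are hypothetical placeholders, not re3data or biosharing.org data.

```python
import math
from collections import Counter

# Hypothetical records: names, texts, and priors are invented for illustration.
REPOSITORIES = [
    {"name": "repo-proteins", "text": "protein structure crystallography data", "prior": 0.6},
    {"name": "repo-general", "text": "general purpose research data all disciplines", "prior": 0.3},
    {"name": "repo-ecology", "text": "ecology field observation data", "prior": 0.1},
]


def tokenize(text):
    """Lowercase whitespace tokenizer; a real system would use the search engine's analyzer."""
    return text.lower().split()


def log_query_likelihood(query_terms, doc_terms, coll_counts, coll_len, mu=10.0):
    """Dirichlet-smoothed log P(query | repository description)."""
    counts = Counter(doc_terms)
    score = 0.0
    for term in query_terms:
        p_coll = coll_counts.get(term, 0) / coll_len
        p = (counts.get(term, 0) + mu * p_coll) / (len(doc_terms) + mu)
        if p > 0.0:  # terms unseen in the whole collection are skipped
            score += math.log(p)
    return score


def rank(query, repositories):
    """Return (name, score) pairs sorted by log P(query | repo) + log prior(repo).

    Assumes every prior is strictly positive.
    """
    docs = {r["name"]: tokenize(r["text"]) for r in repositories}
    coll_counts = Counter(t for terms in docs.values() for t in terms)
    coll_len = sum(coll_counts.values())
    query_terms = tokenize(query)
    scored = [
        (
            r["name"],
            log_query_likelihood(query_terms, docs[r["name"]], coll_counts, coll_len)
            + math.log(r["prior"]),
        )
        for r in repositories
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


ranking = rank("protein structure", REPOSITORIES)
```

In this toy collection the generalist repository ends up above the ecology repository for the query "protein structure" only because of its larger prior, which is the kind of effect the proposed priors would need to be tested for.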

...

Work plan

  1. Use an existing search engine (e.g., Solr/Lucene) to index the re3data records
  2. Create a test collection of datasets/queries/relevance judgements
    1. This can be done manually (find a set of researchers to give us a dataset and/or query and the repository they selected)
    2. This can be done automatically by sampling datasets from existing repositories and assuming that each dataset's source repository is the "most relevant" one
  3. Develop demonstration UI
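
One way the test collection from step 2 could be used is to score a ranking function with mean reciprocal rank (MRR), treating the repository a dataset was actually deposited in as the single relevant answer. The choice of metric, the `search` callable, and the sample queries and repository names are assumptions for illustration only.

```python
def reciprocal_rank(ranked_names, relevant_name):
    """Return 1/position of the relevant repository, or 0.0 if it was not retrieved."""
    for position, name in enumerate(ranked_names, start=1):
        if name == relevant_name:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(test_collection, search):
    """Average reciprocal rank over (query, relevant_repository) pairs.

    `search` is any callable mapping a query string to a ranked list of
    repository names, e.g. a thin wrapper around the search engine.
    """
    if not test_collection:
        return 0.0
    total = sum(reciprocal_rank(search(query), relevant)
                for query, relevant in test_collection)
    return total / len(test_collection)


# Hypothetical test collection: each query is paired with the repository
# that the sampled dataset came from.
TEST_COLLECTION = [
    ("protein structure", "repo-a"),
    ("soil samples", "repo-b"),
    ("galaxy survey", "repo-c"),
]

# Stand-in for a real search call, with fixed rankings per query.
RANKINGS = {
    "protein structure": ["repo-a", "repo-b", "repo-c"],
    "soil samples": ["repo-a", "repo-b", "repo-c"],
    "galaxy survey": ["repo-a", "repo-b"],
}


def toy_search(query):
    return RANKINGS.get(query, [])


mrr = mean_reciprocal_rank(TEST_COLLECTION, toy_search)
```

With a single relevant repository per query, MRR rewards placing that repository near the top, which matches the automatic sampling assumption in step 2.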

Analysis

What tools already exist in this space?

...

Factor | Description

Funding agency approval | Funding agencies (e.g., NIH) have lists of approved repositories

Researcher communities | Some repositories restrict deposits to researchers in certain communities

Publisher integration | Publishers (e.g., Elsevier) have arrangements with repositories (e.g., bi-directional linking)

Domain/Field | Repositories are often restricted by domain, with some generalist services

Technical restrictions | Repositories have technical restrictions (e.g., maximum file size, supported formats)

Community mandates | Some research communities have mandated repositories (see Nature list)

Data type | Does the repository take the data you want to deposit? Some repositories are restricted to specific types of data. These criteria vary, for example: protein structures, human or non-human derived, phenotypes. Data types are often directly related to domain/field of study.

Metadata format | Some repositories are restricted to specific types of metadata (e.g., MIAME)

Licensing | Free and unrestricted use or public domain (PLOS)

Best practices | Repository adheres to best practices pertaining to responsible data sharing, digital preservation, citation, and openness (PLOS)
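
A rough sketch of how a few of these factors might act as hard filters in a guided search, before any ranking happens. The `Repository` and `DepositNeed` classes, their fields, and the example values are hypothetical; they are not re3data schema fields.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Repository:
    name: str
    domains: set = field(default_factory=set)      # empty set means generalist
    max_file_size_gb: Optional[float] = None       # None means no stated limit
    approved_by: set = field(default_factory=set)  # funders listing this repository


@dataclass
class DepositNeed:
    domain: str
    file_size_gb: float
    funder: Optional[str] = None  # None means no funder requirement


def is_eligible(repo, need):
    """Apply domain, technical, and funder constraints as hard filters."""
    if repo.domains and need.domain not in repo.domains:
        return False
    if repo.max_file_size_gb is not None and need.file_size_gb > repo.max_file_size_gb:
        return False
    if need.funder is not None and need.funder not in repo.approved_by:
        return False
    return True


def eligible_repositories(repos, need):
    """Return the names of repositories that pass every filter, in input order."""
    return [repo.name for repo in repos if is_eligible(repo, need)]


# Hypothetical candidates and deposit needs.
CANDIDATES = [
    Repository("proteins-only", {"structural biology"}, 5.0, {"NIH"}),
    Repository("generalist", set(), 2.0, set()),
    Repository("big-generalist", set(), None, {"NIH"}),
]

funded = eligible_repositories(CANDIDATES, DepositNeed("structural biology", 3.0, "NIH"))
unfunded = eligible_repositories(CANDIDATES, DepositNeed("ecology", 1.0))
```

Softer factors such as best practices or licensing would probably fit better as ranking signals than as filters, since they are harder to express as yes/no constraints.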

 

Additional factors from the DCC:

  • is a reputable repository available?
  • will it take the data you want to deposit?
  • will it be safe in legal terms?
  • will the repository sustain the data value?
  • will it support analysis and track usage?

 

Publishers, funding agencies, and libraries construct these lists of approved repositories to meet the needs of researchers. Many of these sites now link to centralized services, such as re3data.org. However, re3data.org does not capture all of the information needed to make a recommendation (e.g., C3PR technical restrictions).

...