...

How is this problem currently addressed? We can find a few cases in the wild:

Service | Data Repository Recommendation
U of I Research Data Service | "Deposition of data into a web-accessible repository is generally the preferred mechanism for public data sharing because it ensures wide-spread and consistent access to the data. If your discipline already has a trusted repository, we recommend you deposit where your community knows to look. To find a repository, re3data.org is a large, vetted, and searchable catalog of data repositories. If no discipline-specific repository exists, there are several options, including Illinois’ IDEALS repository (free) and other general-purpose repositories like DataDryad (fee-based)."
Elsevier | List of supported data repositories
Nature | Data availability policy: "Supporting data must be made available to editors and peer-reviewers at the time of submission for the purposes of evaluating the manuscript...For information about suitable public repositories, see sections that follow."
PLOS | PLOS Data Repository Recommendation Guide: "PLOS has identified a set of established repositories below, which are recognized and trusted within their respective communities. Additionally, the Registry of Research Data Repositories (Re3Data) is a full scale resource of registered repositories across subject areas."

A researcher at the U of I looking for a repository to publish their data has several options: select a field-specific repository from curated lists, based on funding agency or publisher requirements; search re3data.org; or use their local institutional repository.

...

Is it really a "recommender"? Broadly speaking, a "recommender system" attempts to predict the relevance of an item to a user based on information known about the user. This could be profile information, previous ratings or related activities.  It is more likely that this system will be a "search engine" in the sense that the user comes with an information need and is looking for a ranked list of candidate repositories. The information need might be a query or the dataset itself.
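A minimal sketch of the "search engine" framing, assuming each repository is reduced to a short text description and the information need is a free-text query. The function name `rank_repositories`, the sample records, and the TF-IDF weighting are illustrative assumptions, not a method this project has chosen.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase and split on anything that is not a letter or digit.
    cleaned = "".join(c.lower() if c.isalnum() else " " for c in text)
    return cleaned.split()


def rank_repositories(query, records):
    """Return (name, score) pairs sorted by descending TF-IDF score.

    `records` maps a repository name to its descriptive text
    (hypothetical stand-in for a re3data-style record).
    """
    docs = {name: Counter(tokenize(text)) for name, text in records.items()}
    n_docs = len(docs)
    # Document frequency: in how many records each term appears.
    doc_freq = Counter(term for counts in docs.values() for term in counts)

    scores = {}
    for name, counts in docs.items():
        score = 0.0
        for term in tokenize(query):
            if term in counts:
                idf = math.log((n_docs + 1) / (doc_freq[term] + 1)) + 1.0
                score += counts[term] * idf
        scores[name] = score
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical records, for illustration only.
records = {
    "Repo A": "genomics sequence data archive",
    "Repo B": "social science survey data",
    "Repo C": "general purpose research data",
}
ranked = rank_repositories("genomics sequence", records)
```

If the information need is the dataset itself rather than a query, the same function could be called with text extracted from the dataset's metadata in place of `query`.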

Broad vision

  • Start with re3data and biosharing.org records as core
  • Develop and test priors based on repository attributes
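One common reading of "priors" in retrieval is a query-independent weight per repository that is combined with the query-dependent retrieval score. The sketch below assumes that reading; the function name `combine_with_prior`, the `prior_weight` parameter, and all numbers are hypothetical.

```python
import math


def combine_with_prior(retrieval_scores, priors, prior_weight=1.0, floor=1e-9):
    """Rank repositories by log(retrieval score) + prior_weight * log(prior).

    Both inputs map repository name -> positive value. Values are floored
    so that a zero does not produce log(0). Repositories without a prior
    get a neutral prior of 1.0 (assumption).
    """
    combined = {}
    for name, score in retrieval_scores.items():
        prior = priors.get(name, 1.0)
        combined[name] = (
            math.log(max(score, floor))
            + prior_weight * math.log(max(prior, floor))
        )
    return sorted(combined, key=combined.get, reverse=True)


# Hypothetical values: Repo B matches the query slightly better,
# but Repo A has a much stronger attribute-based prior.
retrieval_scores = {"Repo A": 0.4, "Repo B": 0.5}
priors = {"Repo A": 0.8, "Repo B": 0.2}

with_prior = combine_with_prior(retrieval_scores, priors)
without_prior = combine_with_prior(retrieval_scores, priors, prior_weight=0.0)
```

Setting `prior_weight` to 0 recovers the plain retrieval ranking, which gives a baseline for testing whether a given prior helps.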

 

Analysis

What tools already exist in this space?

...

  1. Do researchers come to you looking for places to put their data?
    1. Of those that come to you, do you have some estimate of the percentage of those that eventually do find a place to put their data?
  2. Thinking about the researchers that come to you, what is the typical consultation like? What types of questions or concerns do they have?
  3. Do you notice any common challenges or themes across the campus for researchers looking for places to deposit data?
  4. What are some of the tools you recommend and how well do they meet the needs of the researcher?
  5. Do you have any ideas of tools or services that could help you/them better?
  6. We’re thinking of this service (describe current vision of recommender). What do you think? Would it be useful?
  7. Are there any departments/researchers/labs that you think are representative of this problem that we could talk to? (Looking for most common cases)
  8. Is there anyone else working in this space that you think we should talk to?

Potential features:

  • Retrieval score based on name, description, subject, information crawled from associated URLs, and keywords
  • Language, startDate, size
  • URL format (e.g. presence of non-standard ports, path depth)
  • Number of results in Google Scholar
  • Completeness of the re3data record (how much information it contains)
  • Number of policies
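Two of the features above can be computed directly from a record. The sketch below assumes a record is a plain dictionary; the names `url_features` and `record_completeness`, the choice of ports 80 and 443 as "standard", and the field list are illustrative assumptions.

```python
from urllib.parse import urlsplit

# Assumption: only the default HTTP and HTTPS ports count as standard.
STANDARD_PORTS = {80, 443}


def url_features(url):
    """Extract simple URL-format features from a repository URL."""
    parts = urlsplit(url)
    port = parts.port  # None when the URL gives no explicit port
    segments = [s for s in parts.path.split("/") if s]
    return {
        "nonstandard_port": port is not None and port not in STANDARD_PORTS,
        "path_depth": len(segments),
    }


def record_completeness(record, fields):
    """Fraction of the given fields that are present and non-empty."""
    if not fields:
        return 0.0
    filled = sum(1 for f in fields if record.get(f) not in (None, "", [], {}))
    return filled / len(fields)


# Hypothetical record, for illustration only.
features = url_features("http://example.org:8080/data/archive/index.html")
completeness = record_completeness(
    {"name": "Repo A", "description": "", "subject": ["genomics"]},
    ["name", "description", "subject", "size"],
)
```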

Test collection:

  • Find researchers with real datasets and have them identify the top repositories from re3data (possible future IRB study?)
  • For some subset of repositories, go find a dataset.
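Once such a collection exists, a ranked list can be scored against the repositories judged relevant for each dataset. The sketch below uses precision at k and mean reciprocal rank as example measures; the choice of measures, the function names, and the sample data are assumptions, not part of the plan above.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked repositories that are judged relevant."""
    if k <= 0:
        return 0.0
    return sum(1 for repo in ranked[:k] if repo in relevant) / k


def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant repository, or 0.0 if none is found."""
    for rank, repo in enumerate(ranked, start=1):
        if repo in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs, judgments):
    """Average reciprocal rank over all datasets in the test collection.

    `runs` maps dataset id -> ranked repository names;
    `judgments` maps dataset id -> set of relevant repository names.
    """
    if not runs:
        return 0.0
    total = sum(
        reciprocal_rank(ranked, judgments.get(ds, set()))
        for ds, ranked in runs.items()
    )
    return total / len(runs)


# Hypothetical test collection with two datasets.
runs = {"dataset1": ["Repo B", "Repo A", "Repo C"], "dataset2": ["Repo A", "Repo B"]}
judgments = {"dataset1": {"Repo A"}, "dataset2": {"Repo A"}}
mrr = mean_reciprocal_rank(runs, judgments)
```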