This page is intended to capture information related to NDS-211.

 

Background

What tools already exist in this space?

Registries of Research Data Repositories

Registry | Notes
Re3Data | Metadata is too general for search; precision is horrible; not based on natural language
Biosharing.org |
Cinergi |
OpenAIRE |
LA Referencia |

There are (at least) two major registries of research data repositories: re3data and Biosharing.org. Publishers and funding agencies often direct researchers to search for repositories using these tools.

Publishers refer to both in their lists of recommended repositories, but both services appear to be intended for librarians, curators, publishers, and funding agencies rather than the average researcher. The re3data registry data is easily available for download and could be incorporated into our system. It's not clear whether the Biosharing.org data is available for download (technically, it could be crawled).

...

  • Many of the data repositories are crawl-able or implement standard APIs (OAI-PMH) for harvesting metadata.  It might be interesting to consider whether we can harvest descriptive metadata – particularly citation information – and use journal or other publication metadata as part of the recommendation process. 
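As a rough illustration of what harvesting over OAI-PMH could look like, the sketch below builds a ListRecords request and pulls titles and related-resource links out of a Dublin Core (oai_dc) response. The endpoint URL, the sample record, and the function names are hypothetical; the assumption that citation information appears in dc:relation will not hold for every repository, and resumption tokens and error responses are not handled.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# XML namespaces defined by the OAI-PMH 2.0 and Dublin Core specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL for a repository endpoint."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def parse_records(xml_text):
    """Extract identifier, titles, and relation links from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext(
            "oai:header/oai:identifier", default="", namespaces=NS
        )
        titles = [e.text for e in record.iterfind(".//dc:title", NS) if e.text]
        # Assumption: publication links, where present, are carried in dc:relation.
        relations = [e.text for e in record.iterfind(".//dc:relation", NS) if e.text]
        records.append(
            {"identifier": identifier, "titles": titles, "relations": relations}
        )
    return records


# Hypothetical response body, trimmed to a single record for illustration.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example dataset</dc:title>
          <dc:relation>doi:10.0000/example</dc:relation>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    print(list_records_url("https://example.org/oai"))
    print(parse_records(SAMPLE))
```

The parsing step is kept separate from the request so that it can be exercised against saved responses without contacting a repository.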

What would make the existing tools better?

  • Natural language search
  • Ranking based on different characteristics
    • Does it support my requirements (identifiers, metadata, etc.)?
    • Is it trusted (sustainability/certification)? How long is the commitment?
    • Repository "impact factor"
    • Additional value adds (curatorial, linked)
    • Specialized vs general
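One way to picture ranking on these characteristics is a simple weighted score. The sketch below is only an illustration: the field names, the weights, and the 10-year cap on commitment are all assumptions, and the repository "impact factor" is left out because no measure for it has been defined here.

```python
from dataclasses import dataclass


@dataclass
class RepositoryProfile:
    """Hypothetical per-repository characteristics used for ranking."""
    name: str
    supports_required_identifier: bool  # e.g. issues the identifier type the user needs
    certified: bool                     # holds some trust/sustainability certification
    commitment_years: int               # stated preservation commitment
    curated: bool                       # offers curatorial value adds
    specialized: bool                   # domain-specific rather than generalist


# Assumed weights; a real service would need to tune or justify these.
WEIGHTS = {
    "identifier": 3.0,
    "certified": 2.0,
    "commitment": 2.0,
    "curated": 1.0,
    "specialized": 1.0,
}


def score(repo, weights=WEIGHTS):
    """Weighted sum of the characteristics; higher is better."""
    total = 0.0
    total += weights["identifier"] * repo.supports_required_identifier
    total += weights["certified"] * repo.certified
    # Cap the commitment at 10 years so this factor cannot dominate the score.
    total += weights["commitment"] * min(repo.commitment_years, 10) / 10
    total += weights["curated"] * repo.curated
    total += weights["specialized"] * repo.specialized
    return total


def rank(repos, weights=WEIGHTS):
    """Return repositories ordered from highest to lowest score."""
    return sorted(repos, key=lambda r: score(r, weights), reverse=True)


if __name__ == "__main__":
    repos = [
        RepositoryProfile("Generalist B", False, True, 5, True, False),
        RepositoryProfile("Domain A", True, True, 10, False, True),
    ]
    for repo in rank(repos):
        print(repo.name, score(repo))
```

Passing a different weights dictionary would let the ranking reflect what a particular researcher cares about, for example weighting specialization at zero for interdisciplinary data.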

Analysis

Reviewing the above publisher lists and registries, we can identify factors in the recommendation of repositories to researchers:

Factor | Description
Funding agency approval | Funding agencies (e.g., NIH) have lists of approved repositories
Researcher communities | Some repositories restrict to researchers in certain communities
Publisher integration | Publishers (e.g., Elsevier) have arrangements with repositories (e.g., bi-directional linking)
Domain | Repositories are often restricted by domain, with some generalist services
Technical restrictions | Repositories have technical restrictions (e.g., maximum file size, supported formats)
Community mandates | Some research communities have mandated repositories (see Nature list)
Data type | Some repositories are restricted to specific types of data. These criteria vary, for example: protein structures; human or non-human derived; phenotypes. Data types are often directly related to domain/field of study.
Metadata format | Some repositories are restricted to specific types of metadata (e.g., MIAME)

 

Publishers, funding agencies, and libraries construct these lists of approved repositories to meet the needs of researchers. Many of these sites now link to centralized services, such as re3data.org. However, re3data.org does not capture all of the information needed to make a recommendation (e.g., technical restrictions).
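To make the gap concrete, the sketch below shows the kind of record a recommender might need in order to apply several of the factors above as hard constraints. The field names, repository names, and values are invented for illustration and do not describe any registry's actual schema.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Repository:
    """Hypothetical repository record covering some of the factors listed above."""
    name: str
    domains: set                              # empty set means generalist
    max_file_size_gb: Optional[float] = None  # None means no stated limit
    metadata_formats: set = field(default_factory=set)  # empty means unrestricted
    funder_approved: set = field(default_factory=set)   # funders listing this repository


@dataclass
class Deposit:
    """What the researcher has and needs."""
    domain: str
    size_gb: float
    metadata_format: Optional[str] = None
    funder: Optional[str] = None


def is_eligible(repo, deposit):
    """Apply domain, technical, metadata, and funder constraints in turn."""
    if repo.domains and deposit.domain not in repo.domains:
        return False
    if repo.max_file_size_gb is not None and deposit.size_gb > repo.max_file_size_gb:
        return False
    if (deposit.metadata_format and repo.metadata_formats
            and deposit.metadata_format not in repo.metadata_formats):
        return False
    if deposit.funder and deposit.funder not in repo.funder_approved:
        return False
    return True


def eligible(repos, deposit):
    """Names of repositories that pass every constraint."""
    return [r.name for r in repos if is_eligible(r, deposit)]


if __name__ == "__main__":
    repos = [
        Repository("GenomeStore", {"genomics"}, 50.0, {"MIAME"}, {"NIH"}),
        Repository("OpenGeneral", set(), 2.0, set(), {"NIH"}),
    ]
    deposit = Deposit(domain="genomics", size_gb=10.0,
                      metadata_format="MIAME", funder="NIH")
    print(eligible(repos, deposit))
```

Here the generalist repository is excluded only because of its file-size limit, which is exactly the kind of technical detail the centralized registries may not record.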

Use cases

Who are the users?

Researchers who have data but, for various reasons, don't know where to put it.

User | Situation
No community repository | The researcher is in a community without a repository
Doesn't fit neatly | The researcher is becoming interdisciplinary, moving to a new discipline, or has data they think might be useful for other disciplines
Novice/lazy | New researcher not aware of existing resources (note: most advice would come from social media, conferences, training)

What are their motivations?

  • Responding to a request from a funding agency. Might need different characteristics (needs DOI, linking, etc.)
  • Has very large data (university can't handle it, domain repos can't handle it)
  • Has specific availability requirements (5 years, 10 years)
  • Has really complicated data (a lot of contextual information; does the service support it?)
  • Sharing – not responding to a regulatory requirement – just wants to make things available for reuse

Use cases

Q. Who are the users? While the re3data and biosharing sites seem more targeted at experts, perhaps our service is targeted at the novice researcher?

...