This page is intended to capture information related to
Background
What tools already exist in this space?
Registries of Research Data Repositories
Registry | Notes |
---|---|
Re3Data | Metadata is too general for search; precision is very poor; search is not based on natural language |
Biosharing.org | |
Cinergi | |
OpenAIRE | |
LA Referencia | |
There are (at least) two major registries of research data repositories. Publishers and funding agencies often direct researchers to search for repositories using these tools:
- http://www.re3data.org/
- https://biosharing.org/
  - The schema is based on BioDBCore: http://biocuration.org/community/standards-biodbcore/
  - License: Creative Commons Attribution-ShareAlike 4.0
- See also:
  - http://cinergi.sdsc.edu/ (used by EarthCube; includes re3data?)
  - http://www.share-research.org/
  - https://www.openaire.eu/
  - http://lareferencia.redclara.net/rfr/
Publishers refer to both in their lists of recommended repositories, but both services appear to be intended for librarians, curators, publishers, and funding agencies rather than the average researcher. The re3data registry is easily available for download and could be incorporated into our system. It's not clear whether the BioSharing data is available (technically, it could be crawled).
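Since re3data is the registry whose data is easy to download, a minimal sketch of pulling its repository list into our system may help. It assumes the re3data API v1 list endpoint (`https://www.re3data.org/api/v1/repositories`) returns `<repository>` elements with `<id>`, `<name>` and `<link>` children; that is our understanding of the API, not something verified here, so check the current re3data API documentation first. The function names and the sample records are hypothetical placeholders.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Assumed re3data API v1 list endpoint; verify against current re3data docs.
RE3DATA_LIST_URL = "https://www.re3data.org/api/v1/repositories"


def parse_repository_list(xml_text: str) -> list[dict[str, str]]:
    """Extract id, name and detail link from a re3data-style repository list.

    Assumes each <repository> element has <id>, <name> and an optional
    <link href="..."> child; missing values become empty strings.
    """
    root = ET.fromstring(xml_text)
    records = []
    for repo in root.iter("repository"):
        link = repo.find("link")
        records.append({
            "id": repo.findtext("id", default="").strip(),
            "name": repo.findtext("name", default="").strip(),
            "link": link.get("href", "") if link is not None else "",
        })
    return records


def fetch_repository_list(url: str = RE3DATA_LIST_URL) -> list[dict[str, str]]:
    """Download and parse the full list (requires network access)."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return parse_repository_list(response.read().decode("utf-8"))


# Illustrative sample shaped like the assumed response; IDs and names are placeholders.
SAMPLE = """<list>
  <repository>
    <id>r3d000000001</id>
    <name>Example Repository A</name>
    <link href="https://www.re3data.org/api/v1/repository/r3d000000001" rel="self"/>
  </repository>
  <repository>
    <id>r3d000000002</id>
    <name>Example Repository B</name>
  </repository>
</list>"""

if __name__ == "__main__":
    for record in parse_repository_list(SAMPLE):
        print(record["id"], record["name"])
```

Keeping parsing separate from fetching lets the parser be tested offline and reused if the list is downloaded in bulk rather than over the live API.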
...
- Many of the data repositories are crawlable or implement standard APIs (e.g., OAI-PMH) for harvesting metadata. It might be worth considering whether we can harvest descriptive metadata (particularly citation information) and use journal or other publication metadata as part of the recommendation process.
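As a rough sketch of the harvesting idea, the code below pages through OAI-PMH `ListRecords` responses using `oai_dc` (unqualified Dublin Core), the metadata format every OAI-PMH repository is required to support, and follows `resumptionToken`s until the last page. The verb, arguments and namespaces follow the OAI-PMH 2.0 protocol; the base URL, function names and sample page are hypothetical, and only titles and identifiers are extracted, since richer citation metadata would likely need repository-specific formats.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
from typing import Iterator, Optional

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url: str, token: Optional[str] = None) -> str:
    """Build a ListRecords request; resumptionToken is an exclusive argument."""
    if token:
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return base_url + "?" + urllib.parse.urlencode(params)


def parse_page(xml_text: str) -> tuple[list[dict[str, list[str]]], Optional[str]]:
    """Return Dublin Core titles/identifiers per record and the next token (or None)."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind(".//oai:record", NS):
        records.append({
            "title": [t.text or "" for t in record.iterfind(".//dc:title", NS)],
            "identifier": [i.text or "" for i in record.iterfind(".//dc:identifier", NS)],
        })
    token_el = root.find(".//oai:resumptionToken", NS)
    # An empty resumptionToken element marks the final page.
    token = token_el.text.strip() if token_el is not None and token_el.text else None
    return records, token


def harvest(base_url: str) -> Iterator[dict[str, list[str]]]:
    """Yield records page by page until no resumptionToken is returned."""
    token = None
    while True:
        with urllib.request.urlopen(list_records_url(base_url, token), timeout=60) as resp:
            records, token = parse_page(resp.read().decode("utf-8"))
        yield from records
        if not token:
            break


# Hypothetical single-page response used for offline testing.
SAMPLE_PAGE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example dataset</dc:title>
          <dc:identifier>https://example.org/dataset/1</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""
```

Deleted records (which carry a header but no metadata) simply come back with empty title and identifier lists here; a real harvester would probably also want rate limiting and error handling for `noRecordsMatch`.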
What would make the existing tools better?
- Natural language search
- Ranking based on different characteristics
- Does it support my identifier scheme, metadata standard, etc.?
- Is it trusted (sustainability/certification)? How long is the commitment?
- Repository "impact factor"
- Additional value-adds (curatorial services, linking)
- Specialized vs. general
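One way to approach ranking based on different characteristics is a weighted sum over a few of the features listed above. This is a minimal sketch, not a proposed design: the `RepoProfile` fields, the feature definitions, the example repositories and the weights are all hypothetical and would need tuning (or replacement by a learned ranker) against real data.

```python
from dataclasses import dataclass


@dataclass
class RepoProfile:
    """Hypothetical per-repository attributes drawn from the wish list."""
    name: str
    supports_identifiers: set[str]
    supports_metadata: set[str]
    certified: bool
    commitment_years: int
    domains: set[str]  # empty set = generalist repository


def score(repo: RepoProfile, identifier: str, metadata: str, domain: str,
          weights: dict[str, float]) -> float:
    """Weighted sum of simple 0..1 features; weights are illustrative only."""
    features = {
        "identifier": 1.0 if identifier in repo.supports_identifiers else 0.0,
        "metadata": 1.0 if metadata in repo.supports_metadata else 0.0,
        "trusted": 1.0 if repo.certified else 0.0,
        # Cap the commitment at 10 years so this feature stays in 0..1.
        "commitment": min(repo.commitment_years, 10) / 10,
        # Specialized match scores highest, generalists next, other specialists zero.
        "domain": 1.0 if domain in repo.domains else (0.5 if not repo.domains else 0.0),
    }
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


def rank(repos: list[RepoProfile], identifier: str, metadata: str, domain: str,
         weights: dict[str, float]) -> list[RepoProfile]:
    """Sort repositories from highest to lowest score."""
    return sorted(repos, key=lambda r: score(r, identifier, metadata, domain, weights),
                  reverse=True)


# Hypothetical weights and repositories for illustration.
WEIGHTS = {"identifier": 2.0, "metadata": 1.0, "trusted": 1.5, "commitment": 1.0, "domain": 2.0}
EXAMPLE_REPOS = [
    RepoProfile("GeneralistRepo", {"DOI"}, {"DataCite"}, True, 10, set()),
    RepoProfile("OceanRepo", {"DOI"}, {"ISO 19115"}, True, 5, {"oceanography"}),
]

if __name__ == "__main__":
    for repo in rank(EXAMPLE_REPOS, "DOI", "ISO 19115", "oceanography", WEIGHTS):
        print(repo.name)
```

A weighted sum keeps each factor's contribution inspectable, which may matter if researchers need to see why a repository was recommended.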
Analysis
Reviewing the publisher lists and registries above, we can identify the following factors that influence which repositories are recommended to researchers:
Factor | Description |
---|---|
Funding agency approval | Funding agencies (e.g. NIH) have lists of approved repositories |
Researcher communities | Some repositories are restricted to researchers in certain communities |
Publisher integration | Publishers (e.g., Elsevier) have arrangements with repositories (e.g., bi-directional linking) |
Domain | Repositories are often restricted by domain, with some generalist services |
Technical restrictions | Repositories have technical restrictions (e.g., maximum file size, supported formats) |
Community mandates | Some research communities have mandated repositories (see Nature list) |
Data type | Some repositories are restricted to specific types of data; these criteria vary. Data types are often directly related to the domain/field of study. |
Metadata format | Some repositories are restricted to specific types of metadata (e.g., MIAME) |
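Several of these factors act as hard constraints rather than preferences, so a first pass could simply filter out repositories that cannot accept a deposit. The sketch below assumes a hypothetical `Repository` record with one field per factor (empty sets meaning "no restriction") and a hypothetical `Deposit` describing the researcher's data; the repository names and `AgencyX` are placeholders, not real services. Publisher integration and community mandates are better treated as ranking signals and are not modeled here.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Repository:
    """Hypothetical record with one field per restricting factor."""
    name: str
    domains: set[str] = field(default_factory=set)           # empty = generalist
    data_types: set[str] = field(default_factory=set)        # empty = any type
    metadata_formats: set[str] = field(default_factory=set)  # empty = any format
    file_formats: set[str] = field(default_factory=set)      # empty = any format
    max_file_size_gb: Optional[float] = None                 # None = no stated limit
    approved_by: set[str] = field(default_factory=set)       # funding agencies
    communities: set[str] = field(default_factory=set)       # empty = open to all


@dataclass
class Deposit:
    """Hypothetical description of the data a researcher wants to deposit."""
    domain: str
    data_type: str
    metadata_format: str
    file_format: str
    largest_file_gb: float
    funder: Optional[str] = None     # assumed to require an approved repository
    community: Optional[str] = None


def accepts(repo: Repository, d: Deposit) -> bool:
    """True if no hard restriction from the factor table rules the repository out."""
    if repo.domains and d.domain not in repo.domains:
        return False
    if repo.data_types and d.data_type not in repo.data_types:
        return False
    if repo.metadata_formats and d.metadata_format not in repo.metadata_formats:
        return False
    if repo.file_formats and d.file_format not in repo.file_formats:
        return False
    if repo.max_file_size_gb is not None and d.largest_file_gb > repo.max_file_size_gb:
        return False
    if repo.communities and d.community not in repo.communities:
        return False
    if d.funder and d.funder not in repo.approved_by:
        return False
    return True


# Placeholder candidates and deposit for illustration.
CANDIDATES = [
    Repository("GeneralRepo", max_file_size_gb=50, approved_by={"AgencyX"}),
    Repository("ArrayRepo", domains={"genomics"}, data_types={"microarray"},
               metadata_formats={"MIAME"}, approved_by={"AgencyX"}),
    Repository("SmallRepo", max_file_size_gb=1),
]
DEPOSIT = Deposit("genomics", "microarray", "MIAME", "csv", 5.0, funder="AgencyX")

if __name__ == "__main__":
    print([r.name for r in CANDIDATES if accepts(r, DEPOSIT)])
```

Filtering first and ranking the survivors keeps hard requirements (such as file size limits) from being traded off against softer preferences.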
Publishers, funding agencies, and libraries construct these lists of approved repositories to meet the needs of researchers. Many of these sites now link to centralized services, such as re3data.org. However, re3data.org does not capture all of the information needed to make a recommendation (e.g., technical restrictions).
Use cases
Who are the users?
Researchers who have data but don't know where to deposit it, for various reasons.
User | Situation |
---|---|
No community repository | The researcher is in a community without a repository |
Doesn't fit neatly | A researcher is becoming interdisciplinary, moving to a new discipline, or has data they think might be useful to other disciplines |
Novice/lazy | A new researcher not aware of existing resources (note: most advice would come from social media, conferences, training) |
What are their motivations?
- Responding to a request from a funding agency; the repository may need specific characteristics (e.g., DOI assignment, linking)
- Has very large data (the university or domain repositories can't handle it)
- Has specific availability requirements (e.g., 5 years, 10 years)
- Has very complicated data (a lot of contextual information; does the service support it?)
- Sharing: not responding to a regulatory requirement, just wants to make data available for reuse
Use cases
Q. Who are the users? While the re3data and BioSharing sites seem targeted more at experts, perhaps our service is aimed at the novice researcher?
...