...
- Use an existing search engine (e.g., Solr/Lucene) to index the re3data records
- Create a test collection of datasets/queries/relevance judgements
- This can be done manually (recruit a set of researchers to give us a dataset and/or query along with the repository they selected)
- This can be done automatically by sampling datasets from existing repositories and assuming that these are the "most relevant"
- Develop demonstration UI
The end product will be a search engine that merges the re3data, biosharing (if available), funder, and publisher lists, along with models of relevance.
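Merging these lists mostly means deduplicating repositories that appear in more than one source. A minimal sketch in Python, assuming each source provides records as dicts with a homepage `url` field (the field names here are illustrative, not an actual re3data or biosharing schema):

```python
from urllib.parse import urlparse

def normalize_url(url):
    """Reduce a homepage URL to a comparable key (scheme-, www-, and trailing-slash-insensitive)."""
    parts = urlparse(url.strip().lower())
    host = parts.netloc.split(":")[0].removeprefix("www.")
    return host + parts.path.rstrip("/")

def merge_repository_lists(*sources):
    """Merge repository records from several sources, deduplicating by homepage URL.

    Later sources only fill in fields missing from earlier ones, so list order
    encodes trust (e.g. re3data first, publisher lists last).
    """
    merged = {}
    for source in sources:
        for record in source:
            key = normalize_url(record["url"])
            existing = merged.setdefault(key, {})
            for field, value in record.items():
                existing.setdefault(field, value)
    return list(merged.values())
```

URL-based deduplication is a simplifying assumption; in practice some repositories change domains or are listed under mirror URLs, so a name-similarity pass would likely be needed as well.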
Search Engine
We can use either a research-oriented (Indri/Galago/Terrier) or general-purpose (Lucene) search engine platform. The goal would be to identify features/characteristics of repositories that can be used to improve rankings, aside from basic language models.
Potential features:
- Retrieval score based on name, description, subject, information crawled from associated URLs, keywords, language, startDate, size
- URL format (e.g. presence of non-standard ports, path depth)
- Number of results in Google Scholar
- Amount of information in re3data (how complete is the record)?
- Number of policies
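As a concrete sketch, two of the features above (URL format and record completeness) could be computed as follows; the field names are illustrative rather than the actual re3data schema:

```python
from urllib.parse import urlparse

STANDARD_PORTS = {"http": 80, "https": 443}

def url_features(url):
    """URL-format features: presence of a non-standard port, and path depth."""
    parts = urlparse(url)
    port = parts.port  # None when no explicit port appears in the URL
    nonstandard_port = port is not None and port != STANDARD_PORTS.get(parts.scheme)
    path_depth = len([seg for seg in parts.path.split("/") if seg])
    return {"nonstandard_port": nonstandard_port, "path_depth": path_depth}

def completeness(record, expected_fields):
    """Fraction of expected record fields that are actually filled in."""
    filled = sum(1 for field in expected_fields if record.get(field))
    return filled / len(expected_fields)
```

These values would then be combined with the base retrieval score, e.g. as inputs to a learning-to-rank model or as simple re-ranking penalties.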
Test collection:
A key requirement will be the ability to evaluate the retrieval model, which requires a suitable test collection. For NDSC6, we would just pilot this.
- Find researchers with real datasets and have them identify the top repositories from re3data (possible future IRB study)?
- For some subset of repositories, go find a dataset.
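Either collection method yields judgements pairing a query (or dataset description) with the repository the researcher actually chose. A minimal evaluation sketch, assuming a single relevant repository per query, using mean reciprocal rank:

```python
def mean_reciprocal_rank(rankings, judgements):
    """Score a pilot test collection.

    rankings:   {query_id: [repo_id, ...]}  system output, best first
    judgements: {query_id: repo_id}         repository the researcher chose
    """
    total = 0.0
    for qid, relevant in judgements.items():
        ranked = rankings.get(qid, [])
        if relevant in ranked:
            total += 1.0 / (ranked.index(relevant) + 1)  # reciprocal of 1-based rank
    return total / len(judgements)
```

With multiple acceptable repositories per query (researchers often name several), precision@k or nDCG would be the natural generalizations.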
Background/Analysis
What tools already exist in this space?
...
- Do researchers come to you looking for places to put their data?
- Of those that come to you, do you have some estimate of the percentage of those that eventually do find a place to put their data?
- Thinking about the researchers that come to you, what is the typical consultation like? What types of questions or concerns do they have?
- Do you notice any common challenges or themes across the campus for researchers looking for places to deposit data?
- What are some of the tools you recommend and how well do they meet the needs of the researcher?
- Do you have any ideas of tools or services that could help you/them better?
- We’re thinking of this service (describe current vision of recommender) — what do you think? Would it be useful?
- Are there any departments/researchers/labs that you think are representative of this problem that we could talk to? (Looking for most common cases)
- Is there anyone else working in this space that you think we should talk to?
References
Elsevier. Supported Data Repositories.
Myers, Jim. (2016). SEAD 2.0 Publication API Walkthrough.
Nature. Availability of data and materials.
PLOS ONE. Data availability.
UI RDS. Saving and Sharing your Data.
Whyte, A. (2015). ‘Where to keep research data: DCC checklist for evaluating data repositories’ v.1.1. Edinburgh: Digital Curation Centre.