Use an existing search engine (e.g., Solr/Lucene) to index the re3data registry of research data repositories
Create a test collection of datasets/queries/relevance judgements
1. This can be done manually: find a set of researchers to give us a dataset and/or query, along with the repository they selected.
2. This can be done automatically: sample datasets from existing repositories and assume that the repository each dataset came from is the "most relevant" one.
Develop a demonstration UI
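The automatic option for building the test collection could look roughly like the sketch below. It assumes the datasets have already been harvested as (dataset id, title, repository id) tuples; the function name, the per-repository sample size, and the use of the dataset title as the query are illustrative choices, not decisions made in these notes. Judgements are written in the TREC qrels layout (query id, iteration, document id, relevance) so standard evaluation tools can read them.

```python
import random
from collections import defaultdict


def build_test_collection(datasets, per_repo=2, seed=0):
    """Hypothetical helper: sample up to `per_repo` datasets from each repository.

    `datasets` is a list of (dataset_id, title, repository_id) tuples.
    Each sampled dataset becomes a query (its title), and the repository it
    was taken from is assumed to be the single relevant answer.
    """
    by_repo = defaultdict(list)
    for dataset_id, title, repo_id in datasets:
        by_repo[repo_id].append((dataset_id, title))

    rng = random.Random(seed)  # fixed seed so the collection is reproducible
    queries, qrels = {}, []
    for repo_id in sorted(by_repo):
        items = by_repo[repo_id]
        for dataset_id, title in rng.sample(items, min(per_repo, len(items))):
            queries[dataset_id] = title
            # TREC qrels columns: query-id, iteration (unused, 0), doc-id, relevance
            qrels.append(f"{dataset_id} 0 {repo_id} 1")
    return queries, qrels


# Made-up example records; real ones would come from repository harvesting.
sample = [
    ("ds1", "Arctic sea ice thickness 2010-2015", "r3d-A"),
    ("ds2", "Zebrafish gene expression atlas", "r3d-B"),
    ("ds3", "Global ocean salinity profiles", "r3d-A"),
]
queries, qrels = build_test_collection(sample, per_repo=1)
```

Treating the source repository as the only relevant answer is the simplifying assumption from item 2; the manual judgements from item 1 could be appended to the same qrels list with additional relevant repositories per query.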

The end product will be a search engine that merges the re3data, BioSharing (if available), funder, and publisher repository lists and combines them with models of relevance.
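One way to merge the lists before indexing is sketched below, assuming each list has been reduced to records with a "name" and a "url". Matching on a normalized URL is only a heuristic chosen for illustration (it ignores scheme, a leading "www.", letter case, and trailing slashes), and the field names `id`, `name`, `url`, and `sources` are hypothetical; a Solr schema would need a multi-valued `sources` field. The upload uses Solr's JSON update handler, with the core URL left as a placeholder.

```python
import json
from urllib.parse import urlsplit
from urllib.request import Request, urlopen


def normalize_url(url):
    """Reduce a repository URL to host + path so minor variants match."""
    parts = urlsplit(url.strip().lower())
    host = parts.netloc[4:] if parts.netloc.startswith("www.") else parts.netloc
    return host + parts.path.rstrip("/")


def merge_sources(sources):
    """Merge repository records from several lists, keyed by normalized URL.

    `sources` maps a source label (e.g. "re3data") to a list of dicts with
    at least "name" and "url". Each merged record keeps the labels of the
    lists that mention it, which a relevance model could use as a feature.
    """
    merged = {}
    for label, records in sources.items():
        for rec in records:
            key = normalize_url(rec["url"])
            doc = merged.setdefault(
                key, {"id": key, "name": rec["name"], "url": rec["url"], "sources": []}
            )
            if label not in doc["sources"]:
                doc["sources"].append(label)
    return list(merged.values())


def post_to_solr(docs, solr_core_url):
    """Send documents to Solr's JSON update handler and commit."""
    req = Request(
        solr_core_url.rstrip("/") + "/update?commit=true",
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urlopen(req) as resp:
        return resp.status


# Made-up records using reserved example domains.
sources = {
    "re3data": [{"name": "Example Data Archive", "url": "https://www.archive.example.org/"}],
    "funder": [{"name": "Example Data Archive", "url": "http://archive.example.org"}],
    "publisher": [{"name": "Other Repo", "url": "https://other.example.net/data"}],
}
docs = merge_sources(sources)
# post_to_solr(docs, "http://localhost:8983/solr/repositories")  # needs a running Solr core
```

Keeping the source labels rather than discarding duplicates means "recommended by a funder and listed in re3data" stays visible to whatever relevance model is layered on top.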

Analysis

What tools already exist in this space?

...

Find researchers with real datasets and have them identify the top repositories for those datasets in re3data (possibly a future IRB study)?
For a subset of repositories, find a dataset held in each one.
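Whichever way the judgements are gathered, a candidate relevance model can be scored against them. The notes do not name a metric; the sketch below assumes mean reciprocal rank, which suits the case where each query has one (or very few) relevant repositories. The repository ids and rankings are made up for illustration.

```python
def mean_reciprocal_rank(rankings, relevant):
    """Average of 1/rank of the first relevant repository per query.

    rankings: query id -> ranked list of repository ids (best first).
    relevant: query id -> set of repository ids judged relevant.
    A query whose ranking contains no relevant repository contributes 0.
    """
    total = 0.0
    for qid, rels in relevant.items():
        for rank, repo_id in enumerate(rankings.get(qid, []), start=1):
            if repo_id in rels:
                total += 1.0 / rank
                break
    return total / len(relevant) if relevant else 0.0


relevant = {"ds1": {"r3d-A"}, "ds2": {"r3d-B"}}
rankings = {"ds1": ["r3d-A", "r3d-B"], "ds2": ["r3d-C", "r3d-A", "r3d-B"]}
score = mean_reciprocal_rank(rankings, relevant)  # (1/1 + 1/3) / 2
```

If researchers in the manual study name several acceptable repositories per dataset, a graded metric such as nDCG may be a better fit than MRR.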