What else can we do?
- Look at the data structure; currently we treat it as unstructured. What about weighting fields?
- Subset the expansion collections
- Find other expansion collections (SNOMED/UMLS?)
- Expand datasets with their repositories or associated publications
- Use MeSH and/or Wikipedia categories to work around vocabulary mismatch
- Use Boolean queries to ensure high precision
- Take the word "estrogen," which appears in some queries. It probably has a reasonably high IDF, since most documents won't be about estrogen, but its TF alone is not necessarily indicative of a document's relevance, since estrogen is relevant to many unrelated biomedical subjects. However, if both "estrogen" and "cancer" (which appear together in the query) appear in a document, it is safer to assume that the TF of "estrogen" indicates relevance (to breast cancer, in this case).
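The co-occurrence idea above could be prototyped as a scoring tweak: give a query term full TF credit only when at least one other query term also appears in the document, and damp it otherwise. This is a minimal sketch, not a tested model; the function name, the `damp` parameter, and the TF-IDF form (log-damped TF) are all assumptions for illustration.

```python
import math
from collections import Counter

def cooccurrence_score(query_terms, doc_terms, idf, damp=0.5):
    """Hypothetical scorer: a query term's TF counts fully only if
    another query term co-occurs in the same document; otherwise its
    contribution is damped by `damp`.

    query_terms: list of query tokens
    doc_terms: list of document tokens
    idf: dict mapping term -> inverse document frequency
    """
    tf = Counter(doc_terms)
    present = [t for t in query_terms if tf[t] > 0]
    score = 0.0
    for t in present:
        # Full weight if at least one other query term co-occurs,
        # damped weight if this term appears alone.
        weight = 1.0 if len(present) > 1 else damp
        score += weight * (1 + math.log(tf[t])) * idf.get(t, 0.0)
    return score
```

Under this sketch, a document containing both "estrogen" and "cancer" outscores one that repeats "estrogen" alone, matching the intuition above.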
Other thoughts
- Analyze the relevance judgments and the search results
- Are we seeing a lot of unjudged but relevant results with anything beyond a baseline model?
- Is there a systematic difference between the performance of the train and test queries?
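A first step for the unjudged-results question could be measuring, per query, what fraction of a run's top-k documents carry no judgment at all. The data shapes below (run as query-to-ranked-list dict, qrels as nested dicts) are assumptions for illustration, not a fixed format.

```python
def unjudged_fraction(run, qrels, k=10):
    """For each query, the fraction of the top-k ranked documents that
    have no relevance judgment (a hint of holes in the judgment pool).

    run: dict query_id -> ranked list of doc_ids (hypothetical format)
    qrels: dict query_id -> dict doc_id -> judgment
    """
    fractions = {}
    for qid, ranked in run.items():
        judged = qrels.get(qid, {})
        top = ranked[:k]
        if not top:
            continue
        unjudged = sum(1 for d in top if d not in judged)
        fractions[qid] = unjudged / len(top)
    return fractions
```

Comparing these fractions between a baseline run and a stronger model would show whether the stronger model is surfacing more unjudged documents; running it separately on train and test queries would speak to the second question.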