What else can we do?
- Look at the data structure; currently we treat it as unstructured. What about weighting fields?
- Subset the expansion collections
- Find other expansion collections (SNOMED/UMLS?)
- Expand datasets with their repositories or associated publications
- Use MeSH and/or Wikipedia categories to work around vocabulary mismatch
- Use Boolean queries to ensure high precision
- Take the word "estrogen," which appears in some queries. It probably has a reasonably high IDF, since most documents won't be about estrogen, but its TF alone is not necessarily indicative of a document's relevance, since estrogen is relevant to many unrelated biomedical subjects. However, if both "estrogen" and "cancer" (which appear together in the query) appear in a document, it is safer to assume that the TF of "estrogen" indicates relevance (to breast cancer, in this case).
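The co-occurrence idea above could be prototyped as a scoring tweak: give a query term full TF credit only when at least one other query term also appears in the document, and damp it otherwise. This is a minimal sketch, not a tested model; the function name, the `damp` parameter, and the TF-IDF form (log-damped TF) are all assumptions for illustration.

```python
import math
from collections import Counter

def cooccurrence_score(query_terms, doc_terms, idf, damp=0.5):
    """Hypothetical scorer: a query term's TF counts fully only if
    another query term co-occurs in the same document; otherwise its
    contribution is damped by `damp`.

    query_terms: list of query tokens
    doc_terms: list of document tokens
    idf: dict mapping term -> inverse document frequency
    """
    tf = Counter(doc_terms)
    present = [t for t in query_terms if tf[t] > 0]
    score = 0.0
    for t in present:
        # Full weight if at least one other query term co-occurs,
        # damped weight if this term appears alone.
        weight = 1.0 if len(present) > 1 else damp
        score += weight * (1 + math.log(tf[t])) * idf.get(t, 0.0)
    return score
```

Under this sketch, a document containing both "estrogen" and "cancer" outscores one that repeats "estrogen" alone, matching the intuition above.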
Other thoughts
- Analyze the relevance judgments and the search results
- Are we seeing a lot of unjudged but relevant results with anything beyond a baseline model?
- Is there a systematic difference between the performance of the train and test queries?
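A first step for the unjudged-results question could be measuring, per query, what fraction of a run's top-k documents carry no judgment at all. The data shapes below (run as query-to-ranked-list dict, qrels as nested dicts) are assumptions for illustration, not a fixed format.

```python
def unjudged_fraction(run, qrels, k=10):
    """For each query, the fraction of the top-k ranked documents that
    have no relevance judgment (a hint of holes in the judgment pool).

    run: dict query_id -> ranked list of doc_ids (hypothetical format)
    qrels: dict query_id -> dict doc_id -> judgment
    """
    fractions = {}
    for qid, ranked in run.items():
        judged = qrels.get(qid, {})
        top = ranked[:k]
        if not top:
            continue
        unjudged = sum(1 for d in top if d not in judged)
        fractions[qid] = unjudged / len(top)
    return fractions
```

Comparing these fractions between a baseline run and a stronger model would show whether the stronger model is surfacing more unjudged documents; running it separately on train and test queries would speak to the second question.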