One of the goals of the CDS and OHSUMED baselines is to find test collections that are comparable to the BioCADDIE collection, but for full-text search instead of data search. As we've discovered, CDS isn't a perfect fit (its queries are very different: they focus on clinical summaries).
The Genomics track isn't a perfect fit either, but may be closer than some of the medical/health records collections.
Take a look at the Genomics track guidelines and data:
http://trec.nist.gov/data/genomics.html
Do you think this is a reasonable baseline for comparison to BioCADDIE? Are there any issues with comparing results from the two?
Write up a summary of your findings in this ticket or in the Wiki. If it seems like a reasonable fit, create a new JIRA ticket describing what needs to be done to run the baselines.