BioCADDIE Planning

5/25/2017

Notes from NDS/BioCADDIE team meeting. This meeting is primarily to plan for the next sprint. The following are up for discussion:

Evaluation framework -- where should we go from here?
- Clean-up/prune ir-utils
- Lucene-centric evaluation
Distributed evaluation
Stemming in ES
Performance characterization (recommended by Kirk)
ES RM plugin
- Custom scoring exploration
VM resources:
- SDSC vs NCSA
- Shared data directories
New ideas?
- Boolean/"sufficient" query
- Structured search (using the document structure somehow)
- Try other collections (UMLS/MeSH, medical subsets)
- Analyze relevance judgments
- Compare baselines against medical collections
  - TREC CDS (note: this is the PubMed Open Access collection)
  - CLEF eHealth
  - OHSUMED
- Cluster-based expansion models

Notes from BioCADDIE core developer meeting

Presented status update
BioCADDIE is running ES 1.7.5 in production, but is using more recent versions in development
Xiaoling emailed results from DataMed system for full test collection in TREC format.
Kirk suggested that we look at a fallback strategy – use one model for higher precision, another for long tail
- When does it work? What queries does it work for?
- Better characterization of what's working
- DataMed is a P@20 system, mainly
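A rough sketch of how the fallback idea and the P@20 measurement could be tried offline, assuming each model's results for a query are available as a ranked list of document IDs (e.g. read from a TREC-format run). The function names, the `min_results` threshold, and the reading of "fallback" as "keep the high-precision ranking and fill the remaining slots from the second model" are our assumptions for illustration, not something decided in the meeting.

```python
from typing import List, Set


def fallback_ranking(primary: List[str], secondary: List[str],
                     min_results: int = 20) -> List[str]:
    """Keep the primary (high-precision) ranking as is. If it holds fewer
    than min_results documents, append unseen documents from the secondary
    (long-tail) ranking until min_results is reached or it runs out."""
    merged = list(dict.fromkeys(primary))  # drop duplicates, keep rank order
    seen = set(merged)
    for doc_id in secondary:
        if len(merged) >= min_results:
            break
        if doc_id not in seen:
            merged.append(doc_id)
            seen.add(doc_id)
    return merged


def precision_at_k(ranking: List[str], relevant: Set[str], k: int = 20) -> float:
    """P@k: fraction of the top k ranks that hold a relevant document.
    Ranks left empty by a short result list count as non-relevant, which
    matches the trec_eval convention of always dividing by k."""
    hits = sum(1 for doc_id in ranking[:k] if doc_id in relevant)
    return hits / k


if __name__ == "__main__":
    # Toy data: hypothetical document IDs, not taken from the test collection.
    primary = ["d1", "d2"]
    secondary = ["d2", "d3", "d4"]
    merged = fallback_ranking(primary, secondary, min_results=3)
    print(merged)
    print(precision_at_k(merged, {"d1", "d3"}, k=3))
```

Comparing P@20 per query for the primary model alone against the merged ranking would be one way to approach the "what queries does it work for?" question.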
Gerard(?) has installed the current pipeline and will document it. Maybe we can do the same.