BioCADDIE Planning

6/27

Thuong's last day ~7/15; Garrick out next week.
Sprint 28 priorities
- Create ElasticSearch indexes for PubMed and Wikipedia
- Lucene baseline runs: Use LuceneRunQuery to run baselines for all collections for comparison
- Lucene Rocchio runs: Once reviewed/merged, use LuceneRunQuery for Rocchio baselines for all collections
- Plugin implementation: With the Rocchio implementation, it should be straightforward to finalize the ElasticSearch plugin
- Audit/cleanup results: Review everything we've done, make sure we've run all models we want to
- Finalize QPP analysis
- Revisit repository priors
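For reference, Rocchio pseudo-relevance feedback moves the query vector toward the centroid of the top-ranked documents. Below is a minimal sketch, assuming documents and queries are already term-weight dictionaries (e.g. TF-IDF). The parameters (alpha=1.0, beta=0.75, num_terms=10) are illustrative defaults and are not the settings used by LuceneRunQuery. As is usual for pseudo-relevance feedback, only the positive (relevant) component is used.

```python
from collections import defaultdict


def rocchio_expand(query_vec, feedback_docs, alpha=1.0, beta=0.75, num_terms=10):
    """Return an expanded query vector using positive-only Rocchio feedback.

    query_vec: dict term -> weight for the original query.
    feedback_docs: list of dicts term -> weight for the top-k retrieved documents.
    """
    # Centroid of the (pseudo-)relevant documents.
    centroid = defaultdict(float)
    for doc in feedback_docs:
        for term, weight in doc.items():
            centroid[term] += weight / len(feedback_docs)

    # alpha * q + beta * centroid
    expanded = defaultdict(float)
    for term, weight in query_vec.items():
        expanded[term] += alpha * weight
    for term, weight in centroid.items():
        expanded[term] += beta * weight

    # Keep the highest-weighted terms; ties are broken alphabetically for determinism.
    top = sorted(expanded.items(), key=lambda kv: (-kv[1], kv[0]))[:num_terms]
    return dict(top)
```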
Revisit statement of work and task status (BioCADDIE)
- What we've done:
  - Comparative evaluation of RM and Rocchio using BioCADDIE test collection
  - Comparative evaluation of SDM
  - Decided what to implement (ElasticSearch plugin, Rocchio expansion)
- Still need to do
  - Implement actual plugin
  - Implement PubMed OA index and ingest process (ElasticSearch)
  - Testing (test plan, integration, performance, execution)
  - Release packaging (in progress)
  - Documentation
- What we can't do
  - Analysis with respect to current pipeline (we never got it running)
- What we did that wasn't on the SOW
  - Comparative evaluation with CDS, OHSUMED, Genomics
  - Document expansion
  - Train/test analysis
  - Query performance prediction

6/20

Sprint 27 extended until June 23
ElasticSearch 1.7.5: the plugin framework is not working, so we will implement the plugin against a newer ElasticSearch version for the BioCADDIE deliverable.
Train/test query analysis, rerunning test queries only (NDS-939)
Rocchio expansion with Lucene
Query performance prediction/adaptive feedback
TREC Genomics baseline

6/13

Sprint 27 extended until June 23
Craig in Seattle
Dirichlet scorer
- Lucene does not support true language modeling. Index structure is designed for TFIDF/BM25
- We will abandon LM in Lucene and focus on Rocchio expansion
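For reference, a Dirichlet-smoothed query-likelihood scorer computes score(Q, D) = sum over query terms w of c(w, Q) * log((c(w, D) + mu * p(w|C)) / (|D| + mu)). The sketch below computes this from raw counts. It helps illustrate the concern above: the sum runs over every query term, including terms absent from the document, and it needs document lengths and collection term probabilities. A scorer that only visits matching postings does not naturally compute those contributions. The default mu=2500 is a common choice, not a tuned value, and all names are illustrative.

```python
import math
from collections import Counter


def dirichlet_score(query_terms, doc_terms, collection_tf, collection_len, mu=2500.0):
    """Query-likelihood score with Dirichlet smoothing.

    query_terms: list of query tokens (repeats count).
    doc_terms: list of document tokens.
    collection_tf: dict term -> total frequency in the collection.
    collection_len: total number of tokens in the collection.
    """
    doc_tf = Counter(doc_terms)
    doc_len = len(doc_terms)
    score = 0.0
    for term, qtf in Counter(query_terms).items():
        p_coll = collection_tf.get(term, 0) / collection_len
        if p_coll == 0.0:
            continue  # term unseen in the collection: skip rather than take log(0)
        # Smoothed p(w|D); nonzero even when the term does not occur in the document.
        p_doc = (doc_tf[term] + mu * p_coll) / (doc_len + mu)
        score += qtf * math.log(p_doc)
    return score
```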
CDS/OHSUMED analysis

6/8/2017

Mike is on vacation
Craig in Seattle next week
Dirichlet scorer (NDS-914)
- Dense to get through
Boolean retrieval (NDS-912)
- Surprising result: RM3 did reasonably well
- Will not pursue further
TREC-CDS (NDS-917)
- Why does OKAPI do so poorly?
- RM3 is just as expected
- Conclusion:
OHSUMED (NDS-929)
- Surprising that LM is lower
- RM3 is better
- No judged non-relevant documents
- Why is TFIDF so much better?
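For comparison with these RM3 results, here is a minimal sketch of how an RM3 expanded query is formed. RM1 estimates P(w|R), which is proportional to the sum over feedback documents of P(w|D) * P(Q|D). RM3 then interpolates that estimate with the original query model. This sketch assumes maximum-likelihood P(w|D), document weights supplied by the caller (e.g. exp of the query-likelihood scores), and illustrative defaults (10 terms, lambda=0.5). It is not the configuration used in the runs above.

```python
from collections import Counter, defaultdict


def rm3(query_terms, feedback_docs, doc_weights, num_terms=10, orig_weight=0.5):
    """Build an RM3 expanded query as a term -> weight distribution.

    query_terms: list of original query tokens.
    feedback_docs: list of token lists for the top-k retrieved documents.
    doc_weights: P(Q|D) for each feedback document (same order), e.g. exp(score).
    orig_weight: lambda, the weight on the original query model.
    """
    # RM1: P(w|R) proportional to sum_D P(w|D) * P(Q|D), with max-likelihood P(w|D).
    rel_model = defaultdict(float)
    for tokens, weight in zip(feedback_docs, doc_weights):
        for term, count in Counter(tokens).items():
            rel_model[term] += weight * count / len(tokens)

    # Keep the top expansion terms and renormalise them to sum to 1.
    top = sorted(rel_model.items(), key=lambda kv: (-kv[1], kv[0]))[:num_terms]
    total = sum(w for _, w in top)
    rel_model = {t: w / total for t, w in top}

    # Maximum-likelihood model of the original query.
    q_model = {t: c / len(query_terms) for t, c in Counter(query_terms).items()}

    # RM3: interpolate the original query model with the relevance model.
    terms = set(q_model) | set(rel_model)
    return {
        t: orig_weight * q_model.get(t, 0.0) + (1 - orig_weight) * rel_model.get(t, 0.0)
        for t in terms
    }
```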
Query performance prediction
- Craig to send QPP papers
Query characterization
- Garrick:
  - There are a couple of queries that are really similar – look at query pairs
- Error analysis
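One quick way to surface the near-duplicate query pairs mentioned above is the Jaccard overlap of their term sets. Below is a minimal sketch, assuming lowercased whitespace tokenization (a real analysis would reuse the index analyzer). The 0.5 threshold and the query IDs in the usage are illustrative only.

```python
from itertools import combinations


def similar_query_pairs(queries, threshold=0.5):
    """Return (id_a, id_b, jaccard) for query pairs whose term sets overlap heavily.

    queries: dict query_id -> query text.
    """
    # Lowercased whitespace tokenisation (an assumption, not the index analyzer).
    term_sets = {qid: set(text.lower().split()) for qid, text in queries.items()}
    pairs = []
    for a, b in combinations(sorted(term_sets), 2):
        union = term_sets[a] | term_sets[b]
        if not union:
            continue
        sim = len(term_sets[a] & term_sets[b]) / len(union)
        if sim >= threshold:
            pairs.append((a, b, sim))
    # Most similar pairs first.
    return sorted(pairs, key=lambda p: -p[2])
```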
Sprint 27 tasks
- Differences in qrels between the example and test queries (not yet examined)
  - Analysis of variance of scores for example/test
- Error analysis
- More on query characterization
- More on QPP
- More on Lucene
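As a starting point for the example/test variance analysis, the sketch below computes descriptive statistics of a per-query metric for each query split. The split names and metric are assumptions. This covers only the descriptive step; a significance test (e.g. ANOVA or a permutation test) would be a separate step on top.

```python
import statistics


def summarize_by_split(per_query_scores, splits):
    """Mean and sample variance of a per-query metric for each query split.

    per_query_scores: dict query_id -> metric value (e.g. infAP or P@10).
    splits: dict split name -> list of query ids (e.g. "example", "test").
    """
    summary = {}
    for name, qids in splits.items():
        values = [per_query_scores[q] for q in qids if q in per_query_scores]
        summary[name] = {
            "n": len(values),
            "mean": statistics.mean(values),
            # Sample variance needs at least two queries.
            "variance": statistics.variance(values) if len(values) > 1 else 0.0,
        }
    return summary
```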

5/25/2017

Notes from NDS/BioCADDIE team meeting. This meeting is primarily to plan for the next sprint. The following are up for discussion:

Evaluation framework -- where should we go from here?
- Clean-up/prune ir-utils
- Lucene-centric evaluation (lucene4ir)
- Improving the shell-script approach (balance understandability/simplicity with scale)
- Possible tasks:
  - Tie breaking
  - Retrieval models without rescoring
    - Hack Indri or extend Lucene
  - Extend Lucene
    - Dirichlet + TwoStage
    - RM/RM3
    - Is it KL
    - PLM
    - LDA
    - Kmeans
    - Handling priors
    - CER
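To make the tie-breaking question concrete, here is a minimal sketch of per-query P@k computed directly from TREC-format run and qrels lines. It orders documents by score, ignoring the rank column. Ties are broken by docno, which is an arbitrary deterministic choice made for this sketch, not necessarily the rule trec_eval applies, so check against trec_eval before relying on it. Any relevance grade above 0 counts as relevant.

```python
from collections import defaultdict


def precision_at_k(run_lines, qrels_lines, k=20):
    """Per-query P@k from TREC-format run and qrels lines.

    run line:   qid Q0 docno rank score tag
    qrels line: qid iter docno relevance
    """
    relevant = defaultdict(set)
    for line in qrels_lines:
        qid, _, docno, rel = line.split()
        if int(rel) > 0:
            relevant[qid].add(docno)

    ranked = defaultdict(list)
    for line in run_lines:
        qid, _, docno, _, score, _ = line.split()
        ranked[qid].append((float(score), docno))

    result = {}
    for qid, docs in ranked.items():
        # Score descending; equal scores fall back to docno order (assumed tie-break).
        docs.sort(key=lambda sd: (-sd[0], sd[1]))
        hits = sum(1 for _, docno in docs[:k] if docno in relevant[qid])
        result[qid] = hits / k  # divide by k even if fewer than k docs were retrieved
    return result
```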
Distributed evaluation (Kubernetes)
- Mike has a prototype working with hyperkube
- Comment about missing Okapi expansion
- Possible tasks:
  - Test on a real cluster via deploy-tools (NDS-hackathon project)
  - Provision attached storage for each node (already done with deploy-tools?)
  - How can we get data to and from all of the nodes? (For the prototype, a manual process is fine.) Ideally, something similar to hadoop fs -put / hadoop fs -get.
  - Garrick: qrels/topics?
  - Explore AWS/GCE/Azure?
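One simple way to spread an evaluation sweep across cluster nodes is to enumerate every (collection, model, parameter) combination in a fixed order and assign the combinations round-robin, so each worker only needs its own index. Below is a minimal sketch, assuming every worker builds the same ordered job list. The collection names, model names and parameter grid in the usage are hypothetical, and how a pod learns its worker index is left to the deployment.

```python
from itertools import product


def shard_jobs(collections, models, param_grid, num_workers, worker_index):
    """Return the jobs assigned to one worker in a round-robin split of the sweep.

    param_grid: dict model -> list of parameter settings (dicts).
    """
    jobs = []
    # Sorting makes the job order identical on every worker.
    for collection, model in product(sorted(collections), sorted(models)):
        for params in param_grid.get(model, [{}]):
            jobs.append((collection, model, params))
    # Each worker keeps every num_workers-th job, starting at its own index.
    return jobs[worker_index::num_workers]
```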
ES RM plugin
- Possible tasks:
  - 1.7.5 support!! (NDS-897)
  - Actually implement the plugin (NDS-868)
  - Custom scoring exploration (Garrick)
Stemming in ES (NDS-885)
- Create index both stemmed (Snowball) and unstemmed
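As an alternative to two separate indexes, one index can hold both variants by using a Snowball-stemmed multi-field. Below is a sketch of an index-creation body that does this. It assumes a recent Elasticsearch with `text` fields and typeless mappings (7.x style). ES 1.7.5 uses `string` fields and per-type mappings, so the body would need adjusting there. The field names are hypothetical; queries would target `title` (unstemmed) or `title.stemmed`.

```python
import json


def stemmed_and_unstemmed_index(fields=("title", "description")):
    """Index-creation body: unstemmed main field plus a Snowball-stemmed subfield."""
    settings = {
        "analysis": {
            "filter": {
                # Snowball stemmer token filter for English.
                "english_snowball": {"type": "snowball", "language": "English"}
            },
            "analyzer": {
                "snowball_english": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "english_snowball"],
                }
            },
        }
    }
    properties = {
        name: {
            "type": "text",  # unstemmed: default standard analyzer
            "fields": {"stemmed": {"type": "text", "analyzer": "snowball_english"}},
        }
        for name in fields
    }
    return {"settings": settings, "mappings": {"properties": properties}}


if __name__ == "__main__":
    # JSON body to send with the index-creation (PUT) request.
    print(json.dumps(stemmed_and_unstemmed_index(), indent=2))
```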
VM resources:
- SDSC vs NCSA
- Shared data directories
Performance characterization (recommended by Kirk)
New ideas?
- Boolean/"sufficient" query - (Garrick)
    - Boolean queries in Indri (#scoreif operator)
- Structured search (using the document structure somehow)
- Try other collections (UMLS/MeSH, medical subsets)
- Analyze relevance judgments
- Compare baselines against medical collections
    - TREC CDS (note: this uses the PubMed Open Access collection)
  - CLEF eHealth
  - OHSUMED
- Cluster-based expansion models
- Query performance prediction
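For the Boolean/"sufficient" query idea, a query can require some terms as a filter and still rank by all of them. Below is a minimal sketch of a hypothetical helper that builds such an Indri query string. It assumes `#scoreif( filter query )` semantics, where only documents matching the first argument are scored by the second, and uses `#band` as the Boolean AND. Check both against the Indri query language reference. Terms are assumed to be already normalized and free of special characters.

```python
def sufficient_query(required_terms, all_terms):
    """Build an Indri query that scores only documents containing every required term.

    Hypothetical helper: the first argument of #scoreif is the Boolean filter
    (#band = all required terms present); the second is the ranked query (#combine).
    """
    ranked = " ".join(all_terms)
    if not required_terms:
        # No filter requested: plain ranked query.
        return f"#combine( {ranked} )"
    required = " ".join(required_terms)
    return f"#scoreif( #band( {required} ) #combine( {ranked} ) )"
```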

Sprint 27 tasks

Thuong:
- Finalize stemming work
- TREC-CDS baseline runs
- Boolean/sufficient-query runs
Garrick:
- Boolean/sufficient-query runs
- Lucene Dirichlet implementation
- Custom scoring exploration
- QPP
- ir-utils cleanup
Craig:
- LOOCV tie-breaking
- Output performance characterization
- ir-utils evaluation framework
Mike:
- 1.7.5 plugin support (NDS-897)
- Implement RM plugin (NDS-868)
- Distributed evaluation on real cluster (NDS-hackathon)
- Define a process for copying index data to nodes, ideally similar to hadoop fs -put
- Explore running on AWS/GCE or Azure
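For the LOOCV work, here is a sketch of leave-one-query-out cross-validation for parameter tuning. For each held-out query, it picks the setting with the best mean score on the remaining queries and reports that setting's score on the held-out query. When settings tie on the training mean, some deterministic rule is needed. This sketch prefers the first setting in sorted order, which is an arbitrary assumption rather than the team's chosen rule. The setting names in the usage are hypothetical.

```python
def loocv(scores):
    """Leave-one-query-out cross-validation over parameter settings.

    scores: dict setting -> dict query_id -> metric value (all settings cover the same queries).
    Returns dict query_id -> (chosen setting, held-out score).
    """
    settings = sorted(scores)  # sorted so ties resolve to the first setting in this order
    queries = sorted(next(iter(scores.values())))
    results = {}
    for held_out in queries:
        train = [q for q in queries if q != held_out]
        best_setting, best_mean = None, float("-inf")
        for setting in settings:
            mean = sum(scores[setting][q] for q in train) / len(train)
            if mean > best_mean:  # strict '>' keeps the earlier setting on ties
                best_setting, best_mean = setting, mean
        results[held_out] = (best_setting, scores[best_setting][held_out])
    return results
```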

5/23/2017

Notes from BioCADDIE core developer meeting

Presented status update
BioCADDIE is running ES 1.7.5 in production, but more recent versions in development
Xiaoling emailed results from DataMed system for full test collection in TREC format.
Kirk suggested that we look at a fallback strategy – use one model for higher precision, another for long tail
- When does it work? What queries does it work for?
- Better characterization of what's working
- DataMed is a P@20 system, mainly
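Kirk's fallback idea amounts to per-query routing between two models. Below is a minimal sketch, assuming a query performance predictor is available. Average IDF of the query terms stands in for that predictor here; it is a simple pre-retrieval predictor chosen only for illustration. The threshold, the routing direction (specific queries to the precision model, vague ones to the long-tail model) and the model names are all hypothetical.

```python
import math


def avg_idf(query_terms, doc_freq, num_docs):
    """Average IDF of the query terms; terms missing from the index get df=0."""
    idfs = [math.log((num_docs + 1) / (doc_freq.get(t, 0) + 1)) for t in query_terms]
    return sum(idfs) / len(idfs) if idfs else 0.0


def choose_run(query_terms, doc_freq, num_docs, threshold,
               precise_model="precision_model", fallback_model="long_tail_model"):
    """Route specific queries (high average IDF) to the precision-oriented model
    and general ones (low average IDF) to the fallback model."""
    if avg_idf(query_terms, doc_freq, num_docs) >= threshold:
        return precise_model
    return fallback_model
```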
Gerard(?) has installed the current pipeline and will document it. Maybe we can do the same.
