7/11
- Mike out at PEARC this week; Thuong's last week
- Final deliverables:
- ElasticSearch plugin (NDS-868) and test process (NDS-956)
- ES Plugin + Test process
- PubMed ingest process (new)
- biocaddie repo release
- Documentation/whitepaper
- Results of comparative evaluation
- Indri vs. Lucene
- Baselines
- BM25, BM25+Rocchio, BM25+PubMed Rocchio
6/27
- Thuong's last day ~7/15; Garrick out next week; Craig out all next week
- Open discussion/status:
- Garrick: focusing on query expansion/Rocchio; how to make a plugin
- Mike: stress testing on Gluster/Kubernetes for BioCADDIE; 4 large nodes
- Thuong: re-ran baselines with test queries only; updated results; ran TREC Genomics 2006/7 baselines; compared to official results; started looking at Lucene baselines; runquery/mkeval/compare generalization
- Craig: merged LuceneRunQuery with 6.6 support; preliminary Rocchio implementation based on Garrick's work; QPP
- Revisit statement of work and task status (BioCADDIE)
- What we've done:
- Comparative evaluation of RM and Rocchio using BioCADDIE test collection
- Comparative evaluation of SDM
- Decided what to implement (ElasticSearch plugin, Rocchio expansion)
- Still need to do
- Implement actual plugin
- Implement PubMed OA index and ingest process (ElasticSearch)
- Testing (test plan, integration, performance, execution)
- Release packaging (in progress)
- Documentation
- What we can't do
- Analysis with respect to current pipeline (we never got it running)
- What we did that wasn't on the SOW
- Comparative evaluation with CDS, OHSUMED, Genomics
- Document expansion
- Train/test analysis
- Query performance prediction
- What we've done:
- Review "test" results + Genomics results
- A few open questions (why OKAPI is so bad on 2007; why 2006 results are better for LM than 2007)
- Remaining priorities
- From SOW
- Create ES plugin (NDS-868)
- Mike had an early prototype
- NDS-840: Garrick implemented Rocchio/BM25 for Lucene
- NDS-829: We have a rudimentary example, but now we need to implement.
- Create ElasticSearch index for PubMed (NDS-876)
- Lucene baseline runs: Use LuceneRunQuery to run baselines for biocaddie (NDS-949)
- Lucene Rocchio runs: Once reviewed/merged, use LuceneRunQuery for Rocchio baselines for biocaddie
- Testing (Mike?)
- Release
- Documentation
- Other
- Create ElasticSearch index for Wikipedia
- Lucene baseline runs: Use LuceneRunQuery to run baselines for other collections
- Lucene Rocchio runs: Once reviewed/merged, use LuceneRunQuery for Rocchio baselines for other collections
- Audit/cleanup results: Review everything we've done, make sure we've run all models we want to
- Finalize QPP analysis
- Revisit repository priors
...