7/18
- Contract ends 7/30
- What's left
- ElasticSearch plugin – move repo (Mike)
- Testing – at least a manual test plan, automated would be great (Mike)
- PubMed ingest process (Craig)
- biocaddie + plugin repo release (Craig)
- Collect all data in place
- Documentation/presentation
- Bonus
- Parallel documentation
- Kubernetes review
- Publish data?
- Doc expansion on OHSUMED + Genomics (Garrick)
- Also PubMed expansion (Craig)
- "Priors" – if we wanted to implement priors in Lucene/ElasticSearch, how would we?
7/11
- Mike at PEARC this week; Thuong's last week
- Final deliverables:
- ElasticSearch plugin (NDS-868) and test process (NDS-956)
- PubMed ingest process (new)
- biocaddie repo release
- Documentation/whitepaper
- Results of comparative evaluation
- Indri v Lucene
- Baselines
- BM25, BM25+Rocchio, BM25+PubMed Rocchio
- Others
- Kubernetes + parallel
- Publish data?
- Report/paper points (ECIR/10-16-17)
- BioCADDIE
- Baseline results
- Query expansion and document expansion results
- Indri > Lucene/ElasticSearch
- Lucene's models aren't valid
- No built-in query expansion
- Limitations of the real-world search engine
- Test collection
- Train v test
- Short v orig
- Query characterization and QPP
- Other collections
- OHSUMED/TRECDS?/Genomics
- Infrastructure
- ir-tools/Maven
- Cross-validation
- Kubernetes/parallel
- BioCADDIE
...