7/11

  • Mike out at PEARC this week; Thuong's last week
  • Final deliverables:
    • ElasticSearch plugin (NDS-868) and test process (NDS-956)
    • PubMed ingest process (new)
    • biocaddie repo release
    • Documentation/whitepaper
      • Results of comparative evaluation
        • Indri vs. Lucene
        • Baselines
          • BM25, BM25+Rocchio, BM25+PubMed Rocchio
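The BM25 baselines listed above can be illustrated with a minimal Python sketch of BM25 scoring. It is not the project's Lucene or Indri configuration. It assumes whitespace-tokenized documents, the common defaults k1 = 1.2 and b = 0.75, and the non-negative IDF log(1 + (N - df + 0.5) / (df + 0.5)) used by Lucene's BM25Similarity. The function names (bm25_idf, bm25_score, rank) are hypothetical.

```python
import math
from collections import Counter


def bm25_idf(num_docs, doc_freq):
    # Smoothed IDF; the "1 +" keeps it positive even for very common terms.
    return math.log(1 + (num_docs - doc_freq + 0.5) / (doc_freq + 0.5))


def bm25_score(query_terms, doc_terms, doc_freqs, num_docs, avg_doc_len, k1=1.2, b=0.75):
    """Score one tokenized document against a tokenized query (hypothetical helper)."""
    tf = Counter(doc_terms)
    # Length normalization: documents longer than average are penalized.
    norm = k1 * (1 - b + b * len(doc_terms) / avg_doc_len)
    score = 0.0
    # Each distinct query term is counted once in this sketch.
    for term in set(query_terms):
        freq = tf.get(term, 0)
        if freq == 0:
            continue
        idf = bm25_idf(num_docs, doc_freqs[term])
        # Saturating term-frequency component.
        score += idf * freq * (k1 + 1) / (freq + norm)
    return score


def rank(query_terms, docs, k1=1.2, b=0.75):
    """Return (doc_id, score) pairs sorted by descending BM25 score."""
    num_docs = len(docs)
    avg_doc_len = sum(len(terms) for terms in docs.values()) / num_docs
    # Document frequency: number of documents containing each term.
    doc_freqs = Counter(term for terms in docs.values() for term in set(terms))
    scored = [
        (doc_id, bm25_score(query_terms, terms, doc_freqs, num_docs, avg_doc_len, k1, b))
        for doc_id, terms in docs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    docs = {
        "d1": "gene expression data".split(),
        "d2": "protein structure data".split(),
        "d3": "gene sequence".split(),
    }
    for doc_id, score in rank(["gene", "data"], docs):
        print(doc_id, round(score, 3))
```

On this toy corpus, d1 (which matches both query terms) ranks first, and the shorter d3 outranks d2 because of length normalization.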

6/27

  • Thuong's last day ~7/15; Garrick out next week; Craig out all next week
  • Open discussion/status:
    • Garrick: focusing on query expansion/Rocchio and on how to build a plugin
    • Mike: stress testing on Gluster/Kubernetes for BioCADDIE (4 large nodes)
    • Thuong: re-ran baselines with test queries only and updated results; ran TREC Genomics 2006/2007 baselines and compared them to the official results; started looking at Lucene baselines; worked on generalizing runquery/mkeval/compare
    • Craig: merged LuceneRunQuery with Lucene 6.6 support; preliminary Rocchio implementation based on Garrick's work; QPP (query performance prediction)
  • Revisit statement of work and task status (BioCADDIE)
    • What we've done:
      • Comparative evaluation of RM and Rocchio using BioCADDIE test collection
      • Comparative evaluation of SDM
      • Decided what to implement (ElasticSearch plugin, Rocchio expansion)
    • Still need to do
      • Implement actual plugin
      • Implement PubMed OA index and ingest process (ElasticSearch)
      • Testing (test plan, integration, performance, execution)
      • Release packaging (in progress)
      • Documentation
    • What we can't do
      • Analysis with respect to current pipeline (we never got it running)
    • What we did that wasn't on the SOW
      • Comparative evaluation with CDS, OHSUMED, Genomics
      • Document expansion
      • Train/test analysis
      • Query performance prediction
  • Review "test" results + Genomics results
    • A few open questions (why Okapi performs so poorly on 2007; why LM results are better on 2006 than on 2007)
  • Remaining priorities
    • From SOW
      • Create ES plugin (NDS-868)
        • Mike had an early prototype (NDS-840)
        • Garrick implemented Rocchio/BM25 for Lucene (NDS-829)
        • We have a rudimentary example; now we need to implement the actual plugin.
      • Create ElasticSearch index for PubMed (NDS-876)
      • Lucene baseline runs: Use LuceneRunQuery to run baselines for biocaddie (NDS-949)
      • Lucene Rocchio runs: Once reviewed/merged, use LuceneRunQuery for Rocchio baselines for biocaddie
      • Testing (Mike?)
      • Release
      • Documentation
    • Other
      • Create ElasticSearch index for Wikipedia
      • Lucene baseline runs: Use LuceneRunQuery to run baselines for other collections
      • Lucene Rocchio runs: Once reviewed/merged, use LuceneRunQuery for Rocchio baselines for other collections
      • Audit/cleanup results: Review everything we've done, make sure we've run all models we want to
      • Finalize QPP analysis
      • Revisit repository priors
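The Rocchio expansion that recurs in these notes (Garrick's Rocchio/BM25 work for Lucene, the planned Lucene Rocchio runs) is, in its pseudo-relevance-feedback form, a weighted sum of the original query vector and the centroid of the top-ranked documents: q' = alpha * q + beta * mean(d for d in R). The Python sketch below is an illustration only, not the NDS-829 implementation. It assumes raw-count query weights, length-normalized term frequencies for the feedback documents, no non-relevant-document term (gamma = 0), and hypothetical names (rocchio_expand, to_boosted_query, alpha, beta, num_terms).

```python
from collections import Counter, defaultdict


def rocchio_expand(query_terms, feedback_docs, alpha=1.0, beta=0.75, num_terms=10):
    """Expand a query with Rocchio pseudo-relevance feedback (hypothetical helper).

    query_terms: list of query tokens.
    feedback_docs: list of token lists, e.g. the top-k documents from a BM25 run.
    Returns a dict mapping term -> weight.
    """
    # Original query vector, scaled by alpha.
    weights = defaultdict(float)
    for term, count in Counter(query_terms).items():
        weights[term] += alpha * count

    # Centroid of the feedback documents using length-normalized term frequencies.
    docs = [doc for doc in feedback_docs if doc]  # skip empty documents
    centroid = defaultdict(float)
    for doc in docs:
        for term, count in Counter(doc).items():
            centroid[term] += count / len(doc)
    for term, total in centroid.items():
        weights[term] += beta * total / len(docs)

    # Keep every original query term, plus the num_terms strongest new terms.
    original = set(query_terms)
    candidates = sorted(
        (term for term in weights if term not in original),
        key=lambda term: (-weights[term], term),
    )
    kept = original.union(candidates[:num_terms])
    return {term: weights[term] for term in kept}


def to_boosted_query(weights):
    """Render weights in Lucene query-parser boost syntax (term^boost)."""
    ordered = sorted(weights.items(), key=lambda item: (-item[1], item[0]))
    return " ".join(f"{term}^{weight:.4f}" for term, weight in ordered)


if __name__ == "__main__":
    feedback = [["gene", "expression", "gene"], ["gene", "regulation"]]
    expanded = rocchio_expand(["gene"], feedback, num_terms=1)
    print(to_boosted_query(expanded))
```

With these toy feedback documents the example prints gene^1.4375 regulation^0.1875. Keeping all original query terms regardless of their expanded weight is a design choice made here so that expansion only adds terms; real implementations may instead re-rank and truncate the full vector.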

...