I've updated ir-utils.jar to support Lucene 6.6 and updated the biocaddie pom.xml to use the new dependencies. This will allow us to compare the Indri and Lucene implementations of various baselines on the BioCADDIE data. I've also made some changes to the BioCADDIE repository, so you'll need to update your copy:
- git pull
- mvn clean package -U
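The two update steps can be wrapped in a small helper. This is only a sketch, assuming it is run from the root of your BioCADDIE checkout; `run`, `update_biocaddie`, and the `DRY_RUN` switch are hypothetical names introduced here, not part of the repository.

```shell
# Print each command, and execute it unless DRY_RUN=1.
# DRY_RUN is a hypothetical switch for this sketch only.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-0}" != "1" ]; then
    "$@"
  fi
}

# Update and rebuild the checkout in the current directory.
# The build is skipped if the pull fails.
update_biocaddie() {
  run git pull &&
  run mvn clean package -U   # -U forces Maven to re-check remote repositories for updated dependencies
}
```

Calling `update_biocaddie` runs both steps; setting `DRY_RUN=1` first only prints them, which is a quick way to check what would be executed.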
I've added baseline scripts to the lucene/ subdirectory for Lucene's Dirichlet, Jelinek-Mercer (JM), BM25, and "classic" TF-IDF models.
We've created a Lucene 6.6 index on biocaddie.ndslabs.org under /data/biocaddie/lucene/biocaddie_all.6.6.0/. For now, just copy this to the shared directory on SDSC.
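One possible way to do the copy is with rsync. The sketch below assumes you are logged in on the SDSC side, have SSH access to biocaddie.ndslabs.org, and have rsync installed. The shared directory path is not given here, so it is left as an argument; `index_copy_cmd` is a hypothetical helper that only prints the command.

```shell
# Source index location, taken from the note above.
SRC_HOST="biocaddie.ndslabs.org"
SRC_PATH="/data/biocaddie/lucene/biocaddie_all.6.6.0/"

# Print the rsync command for a given destination directory.
# The destination is not hard-coded: pass the SDSC shared directory yourself.
index_copy_cmd() {
  dest="${1:?usage: index_copy_cmd DEST_DIR}"
  # The trailing slash on the source copies the directory contents
  # into DEST_DIR/biocaddie_all.6.6.0/ (any trailing slash on DEST_DIR is stripped).
  echo "rsync -av ${SRC_HOST}:${SRC_PATH} ${dest%/}/biocaddie_all.6.6.0/"
}
```

Run `index_copy_cmd` with the shared directory as its argument, review the printed command, then paste it into the shell to perform the copy.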
The main task is to run the usual baselines and comparison (run, mkeval, compare).
Completion criteria:
- Document your baseline results for Lucene Dirichlet, JM, BM25, and TF-IDF in the wiki.
As always, let me know if any of this is unclear!