We currently use Indri for our evaluation process. The goal of NDS-867 is to implement a similar evaluation framework based on ElasticSearch.
Requirements
Basic requirements for an evaluation framework:
- Ability to create an index controlling for specific transformations (stemming, stopping, field storage, etc.)
- Ability to index standard TREC collection formats as well as the BioCADDIE JSON or XML data.
- Using a single index, ability to dynamically change retrieval models and parameters (as IndriRunQuery allows)
- Output in TREC format for evaluation using trec_eval and related tools
- Ability to add new retrieval model implementations
- Standard baselines for comparison
- Handles standard TREC topic formats
- Multi-threaded and distributed processing for parameter sweeps
- Cross-validation
- Hypothesis/significance testing
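To make the "dynamically change retrieval models" requirement concrete: the point is that model choice and smoothing parameters are query-time decisions over fixed index statistics, so parameter sweeps never require re-indexing. A minimal sketch (the function and parameter names here are hypothetical, not existing ir-utils or lucene4ir APIs):

```python
import math

# Sketch: retrieval models as interchangeable scoring functions over
# the same index statistics, so models and parameters can be swapped
# at query time without rebuilding the index.

def dirichlet_ql(tf, doclen, term_cf, coll_len, mu=2500.0):
    """Query likelihood with Dirichlet smoothing (mu is the prior)."""
    p_coll = term_cf / coll_len
    return math.log((tf + mu * p_coll) / (doclen + mu))

def jm_ql(tf, doclen, term_cf, coll_len, lam=0.4):
    """Query likelihood with Jelinek-Mercer smoothing (lam mixes in
    the collection language model)."""
    p_doc = tf / doclen if doclen else 0.0
    p_coll = term_cf / coll_len
    return math.log((1 - lam) * p_doc + lam * p_coll)

MODELS = {"dirichlet": dirichlet_ql, "jm": jm_ql}

def score_doc(model, params, matches, doclen, coll_len):
    """Score one document under the chosen model by summing per-term
    scores. matches: list of (tf, term_cf) pairs, one per query term."""
    fn = MODELS[model]
    return sum(fn(tf, doclen, cf, coll_len, **params) for tf, cf in matches)
```

A sweep is then just a loop over `MODELS` and parameter grids against one index, which is also where the multi-threaded/distributed requirement comes in.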
With Indri (and related tools) we can do the following:
...
In short, there has been recent work to develop an evaluation framework around Lucene. We have some support for this in ir-utils, but it was never widely used (we've always used the Indri implementation for consistency). So we have a choice: work with the lucene4ir workshop code, which is open source but was developed primarily for a single workshop, or continue working in ir-utils, since that's what we've got. In the latter case, we'd need to extend ir-utils with improved support for Lucene similarities.
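Either way, the TREC-format output requirement is fixed by trec_eval, which expects the standard six-column run format (`qid Q0 docno rank score run_tag`). A small helper along these lines (the function name is hypothetical) would be shared by any retrieval backend:

```python
def write_trec_run(results, run_tag, out_path):
    """Write ranked results in the six-column run format expected by
    trec_eval: query_id Q0 docno rank score run_tag.

    results: dict mapping query id -> list of (docno, score),
    ordered with the highest-scoring document first."""
    with open(out_path, "w") as out:
        for qid, ranking in results.items():
            for rank, (docno, score) in enumerate(ranking, start=1):
                out.write(f"{qid} Q0 {docno} {rank} {score:.4f} {run_tag}\n")
```

Keeping this writer backend-agnostic means Indri, ir-utils, and any Elasticsearch-based runs stay directly comparable under trec_eval.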
Lucene4IR Framework
Supports the following:
- Indexing parameters in XML format
- Retrieval parameters in XML format
- Index support for CACM, TRECAquaint, TRECNEWS, Tipster formats
- In addition to Lucene similarities, BM25L, Okapi BM25, SMART BNNBNN
- IndexerApp
- RetrievalApp
- RetrievalAppQueryExpansion
Other notes
Re-reading Zhai's SLMIR, I noticed it gives different ranges for the Okapi BM25 parameters than other sources.
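For reference, the Robertson-style term weight that most implementations use is sketched below. Commonly cited ranges are roughly k1 in [1.2, 2.0] and b around 0.75, but references do disagree (and some use different parameter names), which is worth pinning down before running sweeps:

```python
import math

def okapi_bm25(tf, df, num_docs, doclen, avg_doclen, k1=1.2, b=0.75):
    """Okapi BM25 weight for one (term, document) pair.
    Typical defaults: k1 in roughly [1.2, 2.0], b around 0.75;
    sources differ, so treat these as starting points for tuning."""
    # +1 inside the log keeps idf non-negative for very common terms
    idf = math.log((num_docs - df + 0.5) / (df + 0.5) + 1.0)
    norm = tf * (k1 + 1) / (tf + k1 * (1 - b + b * doclen / avg_doclen))
    return idf * norm
```

The weight should grow with tf (saturating via k1) and shrink as df grows, which gives a quick sanity check when comparing parameterizations across sources.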
...