...
- Indexing parameters in XML format
- Retrieval parameters in XML format
- Index support for CACM, TRECAquaint, TRECNEWS, Tipster formats
- In addition to Lucene similarities, BM25L, Okapi BM25, SMART BNNBNN
- IndexerApp
- RetrievalApp
- RetrievalAppQueryExpansion
IR-Utils
The ir-utils project is maybe the best of both worlds: it supports evaluation using both Indri and Lucene. It's also a bit of a mess and is missing things we've added in our own forks.
What it has:
- Basic framework for running models with parameterization
- A variety of scorers
- Weak evaluation support (mainly use trec_eval)
- Abstraction of Indri and Lucene indexes
- Lucene indexer with Trec, StreamCorpus, Wiki, Xml support
- LuceneRunQuery, LuceneBuildIndex classes
- Trec-formatted output
- Feedback models
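For reference, a minimal sketch of the Trec-formatted output mentioned above, i.e. the six-column run format that trec_eval reads (query id, the literal Q0, document id, rank, score, run tag). This is not ir-utils code; the function name, the example document ids, and the run tag are made up for illustration.

```python
from typing import Iterable, List, Tuple


def format_trec_run(
    query_id: str,
    scored_docs: Iterable[Tuple[str, float]],
    run_tag: str = "example-run",  # hypothetical tag, not an ir-utils default
) -> List[str]:
    """Return run lines in the form: qid Q0 docno rank score tag."""
    # Rank by descending score; ranks start at 1.
    ranked = sorted(scored_docs, key=lambda pair: pair[1], reverse=True)
    return [
        f"{query_id} Q0 {docno} {rank} {score:.4f} {run_tag}"
        for rank, (docno, score) in enumerate(ranked, start=1)
    ]


# Hypothetical query id and document ids, for illustration only.
lines = format_trec_run("301", [("DOC-B", 1.5), ("DOC-A", 2.25)])
print("\n".join(lines))
```

A file of such lines, one per retrieved document, can be passed to trec_eval together with a qrels file.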
What it could have with a few PRs:
- YAML-based collection/model parameterization framework
- Multi-threaded query runner
- Distributed query runner (via Mike's Kubernetes work)
- Cross-validation framework
- Permutation test (via Galago ireval)
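To make the permutation test item concrete, here is a rough sketch of a paired randomization test over per-query metric values (e.g. average precision for two runs on the same queries). It is not the Galago ireval implementation; the function name, the trial count, and the seed are assumptions for illustration.

```python
import random
from typing import Sequence


def paired_randomization_test(
    scores_a: Sequence[float],
    scores_b: Sequence[float],
    trials: int = 10000,  # assumed number of random sign assignments
    seed: int = 0,        # fixed seed so the sketch is repeatable
) -> float:
    """Two-sided p-value for the mean per-query difference between two runs."""
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two non-empty, equal-length score lists")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(sum(diffs) / len(diffs))
    rng = random.Random(seed)
    extreme = 0
    for _ in range(trials):
        # Under the null hypothesis the run labels are exchangeable per query,
        # so each per-query difference keeps or flips its sign with equal odds.
        flipped = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(flipped / len(diffs)) >= observed:
            extreme += 1
    return extreme / trials


# Made-up per-query scores for two hypothetical runs.
p_value = paired_randomization_test([0.5] * 10, [0.1] * 10)
print(p_value)
```

The per-query scores would come from trec_eval's per-query output; sampling sign flips rather than enumerating all of them keeps the cost fixed as the number of queries grows.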
Other notes
Re-reading Zhai's SLMIR, noticed different ranges for Okapi BM25 parameters.
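As a reminder of which parameters are in play, a small sketch of one common form of the Okapi BM25 per-term score, with k1 controlling term-frequency saturation and b controlling document-length normalization. The idf form is the non-negative variant (the one Lucene's BM25Similarity uses); other write-ups differ. The function name and the numbers in the usage line are illustrative only and are not the ranges referred to in the note above.

```python
import math


def bm25_term_score(
    tf: float,
    doc_len: float,
    avg_doc_len: float,
    doc_freq: int,
    num_docs: int,
    k1: float,
    b: float,
) -> float:
    """Score one query term against one document (one common BM25 variant)."""
    # Non-negative idf variant; other formulations can go negative.
    idf = math.log(1.0 + (num_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    # b = 0 disables length normalization, b = 1 applies it fully.
    length_norm = k1 * (1.0 - b + b * doc_len / avg_doc_len)
    return idf * tf * (k1 + 1.0) / (tf + length_norm)


# Illustrative values only (k1 = 1.2 and b = 0.75 are commonly cited defaults).
print(bm25_term_score(tf=3, doc_len=120, avg_doc_len=100,
                      doc_freq=50, num_docs=10000, k1=1.2, b=0.75))
```

Leaving k1 and b as required arguments makes it easy to sweep whatever ranges a given source recommends.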
...