...
- Ability to create an index controlling for specific transformations (stemming, stopping, field storage, etc.)
- Ability to index standard TREC collection formats as well as BioCADDIE JSON, XML, HTML, and similar data
- Ability to dynamically change retrieval models and parameters against a single index (e.g., as IndriRunQuery does)
- Output in TREC format for evaluation using trec_eval and related tools
- Ability to add new retrieval model implementations
- Standard baselines for comparison
- Handles standard TREC topic formats
- Multi-threaded and distributed processing for parameter sweeps
- Ideally, works with large collections, such as ClueWeb
- Cross validation
- Hypothesis/significance testing
- Query performance prediction: implement the basics
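
As a rough illustration of the TREC output requirement above, here is a minimal sketch that writes ranked results in the six-column run format read by trec_eval (`qid Q0 docno rank score run_tag`). The function name `format_trec_run`, the shape of the `results` dictionary, and the default run tag are assumptions made for this example, not part of Indri or any existing tool.

```python
from typing import Dict, List, Tuple


def format_trec_run(
    results: Dict[str, List[Tuple[str, float]]],
    run_tag: str = "baseline",  # hypothetical default tag
) -> List[str]:
    """Return run-file lines: qid Q0 docno rank score run_tag."""
    lines = []
    for qid in sorted(results):
        # Rank documents by descending score; ranks start at 1.
        ranked = sorted(results[qid], key=lambda pair: pair[1], reverse=True)
        for rank, (docno, score) in enumerate(ranked, start=1):
            lines.append(f"{qid} Q0 {docno} {rank} {score:.4f} {run_tag}")
    return lines


if __name__ == "__main__":
    # Made-up scores for a single topic, only to show the layout.
    demo = {"1": [("doc-b", 1.5), ("doc-a", 2.25)]}
    print("\n".join(format_trec_run(demo)))
```

Writing these lines to a file gives a run that can be passed to trec_eval together with a qrels file for the same topics.
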

With Indri (and related tools), we can do the following:
...