...

  • Ability to create an index controlling for specific transformations (stemming, stopping, field storage, etc.)
  • Ability to index standard TREC collection formats as well as BioCADDIE JSON, XML, and HTML data
  • Ability to change retrieval models and parameters dynamically against a single index (as with IndriRunQuery)
  • Output in TREC format for evaluation using trec_eval and related tools
  • Ability to add new retrieval model implementations
  • Standard baselines for comparison
  • Handles standard TREC topic formats
  • Multi-threaded and distributed processing for parameter sweeps
    • Ideally, works with large collections, such as ClueWeb
  • Cross validation
  • Hypothesis/significance testing
  • Basic query performance prediction
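
As a rough illustration of the "output in TREC format" requirement above, the sketch below writes ranked results in the six-column run format that trec_eval reads (query id, the literal Q0, document id, rank, score, run tag). The function name, the input structure, and the run tag are assumptions made for this example; they are not taken from Indri, Lucene, or ir-utils.

```python
def format_trec_run(results, run_tag="sketch"):
    """Format ranked results as TREC run lines.

    results: dict mapping a query id to a list of (docno, score) pairs.
    This input shape is an assumption made for the example.
    """
    lines = []
    for qid in sorted(results):
        # Rank documents by descending score; ties keep their input order.
        ranked = sorted(results[qid], key=lambda pair: pair[1], reverse=True)
        for rank, (docno, score) in enumerate(ranked, start=1):
            # Columns: qid, the literal "Q0", docno, rank, score, run tag.
            lines.append(f"{qid} Q0 {docno} {rank} {score:.6f} {run_tag}")
    return lines


if __name__ == "__main__":
    demo = {"51": [("DOC-A", 1.5), ("DOC-B", 2.25)]}
    print("\n".join(format_trec_run(demo, run_tag="demo")))
```

Writing the returned lines to a file gives a run that can be scored against a qrels file with trec_eval.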

With Indri (and related tools) we can do the following:

...

  • Indexing parameters in XML format
  • Retrieval parameters in XML format
  • Index support for CACM, TRECAquaint, TRECNEWS, Tipster formats
  • Additional similarities beyond the Lucene ones: BM25L, Okapi BM25, SMART BNNBNN
  • IndexerApp
  • RetrievalApp
  • RetrievalAppQueryExpansion

IR-Utils

The ir-utils project is maybe the best of both worlds: it supports evaluation using both Indri and Lucene. It's also a bit of a mess and is missing things we've added on our own forks.

What it has:

  • Basic framework for running models with parameterization
  • A variety of scorers
  • Weak evaluation support (in practice we mainly use trec_eval)
  • Abstraction of Indri and Lucene indexes
  • Lucene indexer support for TREC, StreamCorpus, Wiki, and XML formats
  • LuceneRunQuery, LuceneBuildIndex classes
  • TREC-formatted output
  • Feedback models

What it could have with a few PRs:

  • YAML-based collection/model parameterization framework
  • Multi-threaded query runner
  • Distributed query runner (via Mike's Kubernetes work)
  • Cross-validation framework
  • Permutation test (via Galago ireval)
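
To make the permutation test item concrete, here is a minimal sketch of a paired randomization (sign-flip) test over per-query scores from two runs. It is a generic illustration and not the Galago ireval implementation; the function name, the default sample count, and the fixed seed are all assumptions.

```python
import random


def paired_permutation_test(scores_a, scores_b, num_samples=10000, seed=0):
    """Two-sided paired randomization test on per-query metric values.

    scores_a, scores_b: per-query scores (e.g. average precision) for two
    runs, aligned so that position i refers to the same query in both.
    Returns an estimated p-value for the difference in means.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("score lists must cover the same queries")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    observed = abs(sum(diffs) / n)
    rng = random.Random(seed)
    extreme = 0
    for _ in range(num_samples):
        # Under the null hypothesis the run labels are exchangeable per
        # query, so each difference keeps or flips its sign with p = 0.5.
        total = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(total / n) >= observed:
            extreme += 1
    # Add-one smoothing so the estimate is never exactly zero.
    return (extreme + 1) / (num_samples + 1)
```

The per-query scores would typically come from trec_eval run with per-query output; the same function could be applied to held-out folds in a cross-validation setup.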


Other notes

Re-reading Zhai's SLMIR, I noticed different ranges for the Okapi BM25 parameters.
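
For reference when comparing parameter ranges, the sketch below shows where k1 and b enter one common form of the Okapi BM25 term weight. The default values (k1=1.2, b=0.75) are widely used starting points and are assumptions here, not the ranges from any particular source; the IDF uses the non-negative log(1 + ...) variant, and the query-term-frequency component is omitted. The function name is hypothetical.

```python
import math


def bm25_term_score(tf, df, doc_len, avg_doc_len, num_docs, k1=1.2, b=0.75):
    """Score contribution of one query term for one document.

    tf: term frequency in the document; df: document frequency of the term;
    doc_len / avg_doc_len: document length and collection average length;
    num_docs: number of documents in the collection.
    """
    if tf <= 0:
        return 0.0
    # Non-negative IDF variant (an assumption; other variants exist).
    idf = math.log(1.0 + (num_docs - df + 0.5) / (df + 0.5))
    # b controls length normalization: b=0 disables it, b=1 applies it fully.
    length_norm = 1.0 - b + b * (doc_len / avg_doc_len)
    # k1 controls term-frequency saturation: k1=0 reduces the score to IDF.
    return idf * (tf * (k1 + 1.0)) / (tf + k1 * length_norm)
```

A parameter sweep would vary k1 and b over whichever ranges are chosen and rescore each query against the same index.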

...