We currently use Indri for our evaluation process. The goal of NDS-867 is to implement a similar evaluation framework based on ElasticSearch.

Requirements

Basic requirements for an evaluation framework:

  • Ability to create an index controlling for specific transformations (stemming, stopping, field storage, etc)
  • Ability to index standard TREC collection formats as well as BioCADDIE JSON, XML, and HTML data, etc.
  • Using a single index, ability to dynamically change retrieval models and parameters at query time (i.e., the IndriRunQuery pattern); see the Lucene sketch after this list
  • Output in TREC run format for evaluation using trec_eval and related tools (also covered in the sketch)
  • Ability to add new retrieval model implementations
  • Standard baselines for comparison
  • Handles standard TREC topic formats
  • Multi-threaded and distributed processing for parameter sweeps
    • Ideally, works with large collections, such as ClueWeb
  • Cross validation
  • Hypothesis/significance testing
  • Query performance prediction: implement the basics
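
As a concrete illustration of the single-index and TREC-output requirements, here is a minimal Lucene sketch: open one index, sweep BM25's k1 at query time, and write trec_eval-compatible run lines. The index path, the "contents" and "docno" field names, and topic 301 are made up for illustration; the API is the Lucene 6.x-era one.

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.similarities.BM25Similarity;
import org.apache.lucene.store.FSDirectory;

import java.nio.file.Paths;

public class Bm25Sweep {
    public static void main(String[] args) throws Exception {
        // One index on disk; retrieval parameters vary per run, no re-indexing.
        DirectoryReader reader = DirectoryReader.open(
                FSDirectory.open(Paths.get("/path/to/index")));
        QueryParser parser = new QueryParser("contents", new StandardAnalyzer());
        Query query = parser.parse("information retrieval evaluation");

        for (float k1 : new float[] {0.9f, 1.2f, 2.0f}) {
            IndexSearcher searcher = new IndexSearcher(reader);
            searcher.setSimilarity(new BM25Similarity(k1, 0.75f));
            TopDocs hits = searcher.search(query, 1000);

            // TREC run format: topic Q0 docno rank score tag
            int rank = 1;
            for (ScoreDoc sd : hits.scoreDocs) {
                String docno = searcher.doc(sd.doc).get("docno");
                System.out.printf("301 Q0 %s %d %f bm25_k1_%.1f%n",
                        docno, rank++, sd.score, k1);
            }
        }
        reader.close();
    }
}
```

Each parameter setting just swaps the Similarity on the searcher; the index itself is untouched, which is exactly the IndriRunQuery-style workflow we want.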

With Indri (and related tools) we can do the following:

...

Also worth a read: Report on the SIGIR 2015 Workshop on Reproducibility, Inexplicability, and Generalizability of Results (RIGOR)

In short, it looks like there's been recent work to develop an evaluation framework around Lucene. We have some support for this in ir-utils, but it wasn't widely used (we've always used the Indri implementation for consistency). So we have a choice: work with the lucene4ir workshop code, which is open source but was developed primarily for a single workshop, or continue working in ir-utils, since that's what we've got. In the latter case, we'd need to extend ir-utils with improved support for Lucene similarities.
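
If we go the ir-utils route, "improved support for Lucene similarities" mostly means making it easy to plug new Similarity implementations into the scorer framework. Lucene's SimilarityBase gives us the extension point; a minimal sketch follows (the scoring formula here is a toy for illustration, not a real model, and the float-based signature is the Lucene 6.x one, which changed in later releases):

```java
import org.apache.lucene.search.similarities.BasicStats;
import org.apache.lucene.search.similarities.SimilarityBase;

// Illustrative only: a simple tf-idf-style similarity to show the
// extension point; a real model (BM25L, a SMART variant, etc.) would
// go here instead.
public class ToySimilarity extends SimilarityBase {

    @Override
    protected float score(BasicStats stats, float freq, float docLen) {
        // Log-scaled tf, damped by document length; idf from collection stats.
        double tf = 1 + Math.log(1 + freq);
        double idf = Math.log((stats.getNumberOfDocuments() + 1.0)
                / (stats.getDocFreq() + 0.5));
        return (float) (tf * idf / Math.sqrt(docLen));
    }

    @Override
    public String toString() {
        return "ToySimilarity";
    }
}
```

A BM25L or SMART-style implementation drops in the same way and gets picked up via searcher.setSimilarity(...).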

Lucene4IR Framework

Supports the following:

  • Indexing parameters in XML format
  • Retrieval parameters in XML format
  • Index support for CACM, TRECAquaint, TRECNEWS, Tipster formats
  • Additional similarities beyond Lucene's built-ins: BM25L, Okapi BM25, SMART BNNBNN
  • IndexerApp
  • RetrievalApp
  • RetrievalAppQueryExpansion

IR-Utils

The ir-utils project is maybe the best of both worlds, supporting evaluation with both Indri and Lucene. It's also a bit of a mess and is missing things we've added on our own forks.

What it has:

  • Basic framework for running models with parameterization
  • A variety of scorers
  • Weak evaluation support (mainly use trec_eval)
  • Abstraction of Indri and Lucene indexes (rough shape sketched after this list)
  • Lucene indexer support with Trec, StreamCorpus, Wiki, Xml support
  • LuceneRunQuery, LuceneBuildIndex classes
  • Trec-formatted output
  • Feedback models
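
For anyone new to the codebase, the index abstraction is roughly the following shape. This is a hypothetical sketch; the names and methods are illustrative, not the actual ir-utils interfaces.

```java
import java.util.List;

// Hypothetical sketch of an engine-neutral index abstraction so the same
// scorers can run over either an Indri or a Lucene index.
public interface SearchIndex {
    /** Run a query and return the top k results. */
    List<Result> search(String query, int k) throws Exception;

    /** Collection statistics needed by the scorers. */
    long docCount();
    long termFreq(String term);
    long docFreq(String term);

    /** One retrieved document. */
    class Result {
        public final String docno;
        public final double score;
        public Result(String docno, double score) {
            this.docno = docno;
            this.score = score;
        }
    }
}
```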

What it could have with a few PRs:

  • YAML-based collection/model parameterization framework
  • Multi-threaded query runner
  • Distributed query runner (via Mike's Kubernetes work)
  • Cross-validation framework
  • Permutation test (via Galago ireval); a sketch follows below
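
The permutation test itself is simple enough that we could carry our own implementation if the ireval dependency becomes a problem. A minimal sketch of the standard two-sided paired randomization test over per-topic scores (the score arrays in main are made up for illustration):

```java
import java.util.Random;

public class PairedPermutationTest {

    // Two-sided paired randomization test over per-topic effectiveness
    // scores (e.g., per-topic AP for systems A and B): randomly swap the
    // A/B labels within each topic and count how often the permuted mean
    // difference is at least as extreme as the observed one.
    static double pValue(double[] a, double[] b, int trials, long seed) {
        Random rng = new Random(seed);
        double observed = Math.abs(meanDiff(a, b));
        int extreme = 0;
        for (int t = 0; t < trials; t++) {
            double sum = 0;
            for (int i = 0; i < a.length; i++) {
                double d = a[i] - b[i];
                sum += rng.nextBoolean() ? d : -d;  // random label swap per topic
            }
            if (Math.abs(sum / a.length) >= observed) extreme++;
        }
        // +1 smoothing so the estimate is never exactly zero.
        return (extreme + 1.0) / (trials + 1.0);
    }

    static double meanDiff(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) sum += a[i] - b[i];
        return sum / a.length;
    }

    public static void main(String[] args) {
        // Made-up per-topic AP values, for illustration only.
        double[] sysA = {0.31, 0.42, 0.18, 0.55, 0.27};
        double[] sysB = {0.28, 0.40, 0.21, 0.49, 0.25};
        System.out.println("p = " + pValue(sysA, sysB, 100000, 42));
    }
}
```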


Other notes

Re-reading Zhai's SLMIR, I noticed it gives different ranges for the Okapi BM25 parameters.
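
For context, the usual statement of the Okapi BM25 term weight, with f(t, D) the term frequency, |D| the document length, and avgdl the average document length over the collection:

```latex
\mathrm{score}(D, t) = \mathrm{IDF}(t)\cdot
  \frac{f(t, D)\,(k_1 + 1)}
       {f(t, D) + k_1\left(1 - b + b\,\frac{|D|}{\mathrm{avgdl}}\right)}
```

The commonly cited defaults are k1 in [1.2, 2.0] and b = 0.75; worth double-checking the exact ranges SLMIR recommends before the next parameter sweep.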

...