(This is a description of my current evaluation process, mainly for review).

Overview

The evaluation process consists of the following steps:

  • Build BioCADDIE index
  • Run baseline models with parameter sweeps
  • Run our models with parameter sweeps
  • Run leave-one-query-out cross validation
  • Compare cross-validation results using a paired t-test

Some details:

Working directory is assumed to be:

biocaddie.ndslabs.org:/data/willis8/bioCaddie

Java classes are in https://github.com/craig-willis/biocaddie

Building the index

Currently using IndriBuildIndex.

Convert the documents to TREC text format for indexing. This is done using the following script:

scripts/dats2trec.sh

Note that the edu.gslis.biocaddie.util.DATSToTrecText class can operate on all fields or on a subset of fields (title, description). At this point, I'm using all fields.

The output of this process is the file:

/data/willis8/bioCaddie/data/biocaddie_all.txt
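
Each record in this file uses the standard TREC text format. A made-up record (the DOCNO value and field text are placeholders) looks roughly like:

<DOC>
<DOCNO>biocaddie_00001</DOCNO>
<TEXT>
... title, description, and other DATS fields as plain text ...
</TEXT>
</DOC>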

Use IndriBuildIndex to construct the index:

IndriBuildIndex build_index.all.params
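
The contents of build_index.all.params aren't reproduced here; a minimal Indri build parameter file for this corpus would look roughly like the following (the index path, memory setting, and stemmer are placeholder assumptions; only the corpus path comes from the previous step):

<parameters>
  <index>/data/willis8/bioCaddie/indexes/all</index>
  <memory>2G</memory>
  <corpus>
    <path>/data/willis8/bioCaddie/data/biocaddie_all.txt</path>
    <class>trectext</class>
  </corpus>
  <stemmer><name>krovetz</name></stemmer>
</parameters>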

Run baseline models

I have scripts that sweep parameters for several baseline models under the baselines/ directory:

  • dir.sh: LM/Dirichlet
  • jm.sh: LM/Jelinek-Mercer
  • okapi.sh: Indri's Okapi implementation
  • rm3.sh: Indri's RM3 implementation
  • tfidf.sh: Indri's TFIDF baseline
  • two.sh: LM/Two-stage smoothing

Each script takes two arguments:

  • topics: orig, short, stopped
  • collection: combined, train, test
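
For example, to sweep the Dirichlet prior over the short topics on the combined index:

./dir.sh short combined

The scripts themselves aren't reproduced here, but the core of dir.sh looks roughly like the following sketch (the mu values, topic parameter file name, index path, and output file naming are placeholder assumptions):

#!/bin/bash
# Sketch of a Dirichlet (LM) parameter sweep; details are hypothetical
topics=$1        # orig, short, or stopped
collection=$2    # combined, train, or test
mkdir -p output/dir/$collection/$topics
for mu in 250 500 1000 1500 2000 2500 5000; do
    IndriRunQuery topics.$topics.params -index=indexes/$collection \
        -rule=method:dirichlet,mu:$mu -trecFormat=true -count=1000 \
        > output/dir/$collection/$topics/dir_$mu.out
done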

Each of these scripts produces a set of TREC-formatted output files under the following directory structure:

  • output
    • model (dir, jm, okapi, rm3, tfidf, two)
      • collection (combined, train, test)
        • topics (orig, short, stopped)

Cross validation

The "mkeval.sh" script generates trec_eval -c -q -m all_trec formatted output for each parameter combination. For example:

./mkeval.sh dir short combined 

Produces:

eval/dir/combined/short

With one file per parameter combination.
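
The internals of mkeval.sh aren't shown here; conceptually it loops over the run files for one model/collection/topics combination and calls trec_eval on each (the qrels file path below is a placeholder):

for run in output/dir/combined/short/*.out; do
    trec_eval -c -q -m all_trec qrels/combined.qrels $run > eval/dir/combined/short/$(basename $run)
done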

The script then runs a simple leave-one-query-out CrossValidation utility optimizing for multiple metrics (map, ndcg, ndcg_cut_20, p_20). This produces a set of output files in the loocv/ directory of the form:

model.collection.topics.metric.out
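
For example, the Dirichlet run on the combined collection with the short topics, optimized for map, ends up in loocv/dir.combined.short.map.out.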

Comparing runs

A simple R script, compare.R, reads the cross-validation output from two models and compares them across multiple metrics via a paired t-test. For example:

Rscript compare.R combined tfidf dir short
[1] "map 0.2444 0.2776 p= 0.0257"
[1] "ndcg 0.4545 0.5252 p= 0.0356"
[1] "P_20 0.431 0.531 p= 0.0266"
[1] "ndcg_cut_20 0.3982 0.4859 p= 0.0161"

 

The columns are:

  • Metric
  • First model (tfidf)
  • Second model (dir)
  • p-value from a one-tailed paired t-test (alternative hypothesis: first model < second model)

In this example, the Dirichlet model outperforms TFIDF on all four metrics at p < 0.05.

 
