...

Also make a copy at /data/trecgenomics/data/ 
 

2. Indexes (/shared/trecgenomics/indexes/trecgenomics_all) 

...

Also make a copy at /data/trecgenomics/indexes/trecgenomics_all 

3. Queries 

#download topics to /shared/trecgenomics/queries folder 

...

Also make a copy of the queries at /data/trecgenomics/queries

4. Qrels 

#download qrels to /shared/trecgenomics/qrels folder 

No Format
wget http://skynet.ohsu.edu/trec-gen/data/2007/trecgen2007.all.judgments.tsv.txt 

#convert qrels into the format trec_eval expects (insert 0 as the second column, replace NOT_RELEVANT with 0 and RELEVANT with 2, then remove columns 4 and 5)

No Format
grep -v "#" /shared/trecgenomics/qrels/trecgen2007.all.judgments.tsv.txt | sed -e 's/\tRELEVANT/\t2/g' -e 's/\tNOT_RELEVANT/\t0/g' -e 's/\t/\t0\t/1' | cut -f 1,2,3,6 > trecgenomics-qrels.txt  
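To make the conversion easier to follow, here is a minimal sketch that runs the same pipeline on two made-up judgment lines. The sample values are hypothetical, and the five-column layout (topic, document id, span offset, span length, judgment) is an assumption inferred from the command above. Like the original command, it relies on GNU sed understanding \t.

```shell
# Hypothetical sample; assumed layout is
# topic, doc id, span offset, span length, judgment (tab-separated).
sample=$(mktemp)
printf '#topic\tdocid\toffset\tlength\tjudgment\n' > "$sample"
printf '200\t12345\t100\t50\tRELEVANT\n' >> "$sample"
printf '200\t67890\t0\t80\tNOT_RELEVANT\n' >> "$sample"

# Same pipeline as above: drop comment lines, map the labels to 2/0,
# insert a 0 after the first column, then keep columns 1, 2, 3 and 6.
converted=$(grep -v "#" "$sample" \
  | sed -e 's/\tRELEVANT/\t2/g' -e 's/\tNOT_RELEVANT/\t0/g' -e 's/\t/\t0\t/1' \
  | cut -f 1,2,3,6)

printf '%s\n' "$converted"
rm -f "$sample"
```

Each output line should then carry the four fields trec_eval reads: topic, 0, document id, and a numeric judgment.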

...

The relevance judgments generated above contain duplicates: a document can have multiple judgments (RELEVANT/NOT_RELEVANT) for the same query, one for each of the document's maximum-length spans.

E.g., in the trecgen2007.all.judgments.tsv.txt file:

...
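One possible way to collapse such duplicates is sketched below. It assumes that a document should count as relevant for a query if any of its spans was judged relevant, so it keeps the highest judgment per topic/document pair. That rule and the helper name dedupe_qrels are assumptions for illustration, not necessarily the procedure used on this page.

```shell
# Keep one line per (topic, document) pair, retaining the highest judgment.
# Assumes the 4-column qrels layout: topic, 0, doc id, relevance.
# dedupe_qrels is a hypothetical helper name.
dedupe_qrels() {
  awk -F'\t' -v OFS='\t' '
    {
      key = $1 OFS $3
      if (!(key in best) || $4 + 0 > best[key] + 0) best[key] = $4 + 0
    }
    END {
      for (key in best) {
        split(key, parts, OFS)
        print parts[1], 0, parts[2], best[key]
      }
    }
  ' "$1" | sort -t "$(printf '\t')" -k1,1n -k3,3
}

# Made-up demo data: document 12345 is judged twice for topic 200.
demo=$(mktemp)
printf '200\t0\t12345\t0\n200\t0\t12345\t2\n201\t0\t67890\t0\n' > "$demo"
deduped=$(dedupe_qrels "$demo")
printf '%s\n' "$deduped"
rm -f "$demo"
```

On the real file this could be called as dedupe_qrels trecgenomics-qrels.txt, with the output redirected to a new file.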

Also make a copy of the qrels at /data/trecgenomics/qrels
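The copy steps in this guide can be done with a plain recursive copy. The sketch below wraps that in a small helper that also checks the copy against the source; the name mirror_dir is hypothetical, and the demo uses temporary directories instead of the real paths.

```shell
# Copy a directory tree and verify that the copy matches the source.
# mirror_dir is a hypothetical helper; cp -a preserves modes and timestamps.
mirror_dir() {
  mirror_src=$1
  mirror_dst=$2
  mkdir -p "$mirror_dst" \
    && cp -a "$mirror_src/." "$mirror_dst/" \
    && diff -r "$mirror_src" "$mirror_dst" > /dev/null
}

# Demo on temporary directories standing in for /shared and /data.
shared_demo=$(mktemp -d)
data_demo=$(mktemp -d)
echo "200 0 12345 2" > "$shared_demo/trecgenomics-qrels.txt"

if mirror_dir "$shared_demo" "$data_demo/qrels"; then
  copy_status=ok
else
  copy_status=failed
fi
echo "copy $copy_status"
```

For the qrels this would be mirror_dir /shared/trecgenomics/qrels /data/trecgenomics/qrels.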
 

5. IndriRunQuery - Output  

No Format
cd ~/biocaddie/baselines/trecgenomics 
./<model>.sh <topic> <collection> | parallel -j 20 bash -c "{}"
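The pipeline above implies that the model script prints one shell command per line and that up to 20 of those commands run at the same time. The sketch below illustrates that pattern with a made-up generator, emit_commands, standing in for the model script, and with xargs -P standing in for GNU parallel so that it runs where parallel is not installed. The command contents and file names are hypothetical.

```shell
# Hypothetical stand-in for <model>.sh: prints one shell command per line.
workdir=$(mktemp -d)
emit_commands() {
  for q in 1 2 3; do
    echo "echo query-$q > $workdir/$q.out"
  done
}

# Run the emitted commands concurrently, two at a time.
# xargs -P plays the role of "parallel -j"; each input line
# becomes the script of one "bash -c" invocation.
emit_commands | xargs -P 2 -I{} bash -c "{}"

# xargs returns only after all commands have finished.
finished=$(ls "$workdir" | wc -l | tr -d ' ')
echo "$finished commands finished"
```

Raising the -P value (or -j with GNU parallel) trades memory and disk contention for shorter wall-clock time.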

...

/data/trecgenomics/output/rm3/combined/orig  

 

6. Cross-validation 

No Format
cd ~/biocaddie  
scripts/mkeval_trecgenomics.sh <model> <topics> <collection> 

E.g.: scripts/mkeval_trecgenomics.sh tfidf orig combined
 

7. Compare models 

No Format
cd ~/biocaddie   
Rscript scripts/compare_trecgenomics.R <collection> <from model> <to model> <topic> 

...