...

In the trecgen2007.all.judgments.tsv.txt file:


No Format
200 9063387 2059 1870 NOT_RELEVANT 
200 9063387 7300 1702 RELEVANT 
200 9063387 58122 4989 NOT_RELEVANT 
200 9063387 82135 1426 RELEVANT 
200 9063387 83588 3235 RELEVANT 
200 9063387 97901 27036 NOT_RELEVANT 

In trecgenomics-qrels.txt: 

No Format
root@integration-1:/data/trecgenomics/qrels# grep 9063387  trecgenomics-qrels.txt 
200     0       9063387 0 
200     0       9063387 2 
200     0       9063387 0 
200     0       9063387 2 
200     0       9063387 2 
200     0       9063387 0 
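The two listings above line up row for row: each judgments row appears in the qrels as `topic 0 docid grade`, with NOT_RELEVANT shown as 0 and RELEVANT shown as 2. A minimal sketch of that mapping follows. It assumes the third and fourth judgment columns are a passage offset and length (they are dropped here), and that the two labels in the sample are the only ones; `LABEL_TO_GRADE` and `judgment_to_qrel` are hypothetical names, not part of the biocaddie scripts.

```python
# Assumed label-to-grade mapping, inferred only from the sample rows above.
LABEL_TO_GRADE = {"NOT_RELEVANT": 0, "RELEVANT": 2}


def judgment_to_qrel(line):
    """Convert one judgments row into a qrels row: topic, 0, docid, grade."""
    # Columns 3 and 4 are assumed to be passage offset and length; unused here.
    topic, docid, _offset, _length, label = line.split()
    return f"{topic}\t0\t{docid}\t{LABEL_TO_GRADE[label]}"


sample = [
    "200 9063387 2059 1870 NOT_RELEVANT",
    "200 9063387 7300 1702 RELEVANT",
]
for row in sample:
    print(judgment_to_qrel(row))
```

An unknown label raises a `KeyError`, which is deliberate: it is safer to stop than to guess a grade for a label that does not appear in the sample.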

...

Also make a copy of the qrels at /data/trecgenomics/qrels


5. IndriRunQuery - Output  

No Format
cd ~/biocaddie/baselines/trecgenomics 
./<model>.sh <topic> <collection> | parallel -j 20 bash -c "{}"

E.g.:

./jm.sh orig combined | parallel -j 20 bash -c "{}"
./dir.sh orig combined | parallel -j 20 bash -c "{}"
./tfidf.sh orig combined | parallel -j 20 bash -c "{}"
./two.sh orig combined | parallel -j 20 bash -c "{}"
./okapi.sh orig combined | parallel -j 20 bash -c "{}"
./rm3.sh orig combined | parallel -j 20 bash -c "{}"

IndriRunQuery outputs for different baselines are stored at: 

/data/trecgenomics/output/tfidf/combined/orig 

/data/trecgenomics/output/dir/combined/orig 

/data/trecgenomics/output/okapi/combined/orig 

/data/trecgenomics/output/jm/combined/orig 

/data/trecgenomics/output/two/combined/orig 

/data/trecgenomics/output/rm3/combined/orig  
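
The paths listed above appear to follow the pattern `/data/trecgenomics/output/<model>/<collection>/<topic>`. This is inferred from the six paths and the script arguments, not from the scripts themselves. Under that assumption, a small helper (hypothetical name `output_dir`) can build the directory for any baseline:

```python
from pathlib import PurePosixPath

# Root directory taken from the paths listed above.
OUTPUT_ROOT = PurePosixPath("/data/trecgenomics/output")


def output_dir(model, collection="combined", topic="orig"):
    """Build the output directory, assuming a <model>/<collection>/<topic> layout."""
    return OUTPUT_ROOT / model / collection / topic


for model in ["tfidf", "dir", "okapi", "jm", "two", "rm3"]:
    print(output_dir(model))
```

`PurePosixPath` only manipulates path strings, so the snippet runs on any machine without touching the filesystem.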

 

6. Cross-validation 

No Format
cd ~/biocaddie  
scripts/mkeval_trecgenomics.sh <model> <topic> <collection>

E.g.: scripts/mkeval_trecgenomics.sh tfidf orig combined
 

7. Compare models 

No Format
cd ~/biocaddie   
Rscript scripts/compare_trecgenomics.R <collection> <from model> <to model> <topic> 

...

No Format
root@integration-1:~/biocaddie# Rscript scripts/compare_trecgenomics.R combined tfidf dir orig 
[1] "map 0.2465 0.2176 p= 0.9297" 
[1] "ndcg 0.528 0.4772 p= 0.9838" 
[1] "P_20 0.3361 0.3514 p= 0.2011" 
[1] "ndcg_cut_20 0.4077 0.4069 p= 0.5111" 
[1] "P_100 0.2081 0.1881 p= 0.9771" 
[1] "ndcg_cut_100 0.3915 0.3576 p= 0.885" 
root@integration-1:~/biocaddie# Rscript scripts/compare_trecgenomics.R combined tfidf two orig 
[1] "map 0.2465 0.2379 p= 0.7532" 
[1] "ndcg 0.528 0.5128 p= 0.8973" 
[1] "P_20 0.3361 0.3569 p= 0.1197" 
[1] "ndcg_cut_20 0.4077 0.437 p= 0.1039" 
[1] "P_100 0.2081 0.1986 p= 0.8416" 
[1] "ndcg_cut_100 0.3915 0.399 p= 0.3308" 
root@integration-1:~/biocaddie# Rscript scripts/compare_trecgenomics.R combined tfidf jm orig 
[1] "map 0.2465 0.2136 p= 0.996" 
[1] "ndcg 0.528 0.4771 p= 1" 
[1] "P_20 0.3361 0.3403 p= 0.4073" 
[1] "ndcg_cut_20 0.4077 0.3951 p= 0.7083" 
[1] "P_100 0.2081 0.1847 p= 0.9802" 
[1] "ndcg_cut_100 0.3915 0.3583 p= 0.9727" 
root@integration-1:~/biocaddie# Rscript scripts/compare_trecgenomics.R combined tfidf okapi orig 
[1] "map 0.2465 0.0666 p= 1" 
[1] "ndcg 0.528 0.2568 p= 1" 
[1] "P_20 0.3361 0.1389 p= 0.9999" 
[1] "ndcg_cut_20 0.4077 0.1393 p= 1" 
[1] "P_100 0.2081 0.0953 p= 0.9998" 
[1] "ndcg_cut_100 0.3915 0.1415 p= 1"
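
Each output line above has the form `metric <from-model score> <to-model score> p= <p-value>`. The first score is the same tfidf value in every run, so it belongs to the `<from model>`. A minimal sketch for turning this output into rows that are easier to compare follows. `parse_comparison` is a hypothetical helper, and it assumes the lines always keep this exact shape.

```python
import re

# Matches the quoted payload of one R print line, e.g. "map 0.2465 0.2176 p= 0.9297".
LINE_RE = re.compile(r'"(\S+) (\S+) (\S+) p= (\S+)"')


def parse_comparison(text):
    """Parse compare_trecgenomics.R output into a list of dicts."""
    rows = []
    for match in LINE_RE.finditer(text):
        metric, from_score, to_score, p_value = match.groups()
        rows.append({
            "metric": metric,
            "from": float(from_score),  # score of <from model>
            "to": float(to_score),      # score of <to model>
            "p": float(p_value),
        })
    return rows


sample = '''[1] "map 0.2465 0.2176 p= 0.9297"
[1] "P_20 0.3361 0.3514 p= 0.2011"'''
for row in parse_comparison(sample):
    diff = row["to"] - row["from"]
    print(f'{row["metric"]:<14}{diff:+.4f}  p={row["p"]}')
```

In the runs above, p-values sit close to 1 whenever the `<to model>` scores lower than tfidf, which suggests a one-sided test. The R script is not shown here, so treat that as an assumption when reading the numbers.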