1. Query stats:
Train queries (EA1 - EA6)
```
root@integration-1:~/biocaddie# Rscript scripts/qrels_stats.R train
   Rel Query Doccount
1   No   EA1       19
2  Yes   EA1       39
3   No   EA2       69
4  Yes   EA2       11
5   No   EA3       46
6  Yes   EA3       41
7   No   EA4       46
8  Yes   EA4       31
9  Yes   EA5       94
10  No   EA6        9
11 Yes   EA6       78
```
Test queries (T1 - T15)
```
root@integration-1:~/biocaddie# Rscript scripts/qrels_stats.R test
   Rel Query Doccount
1   No    T1      994
2  Yes    T1      637
3   No   T10      993
4  Yes   T10      244
5   No   T11     1520
6  Yes   T11      234
7   No   T12      732
8  Yes   T12      126
9   No   T13     1146
10 Yes   T13      207
11  No   T14      586
12 Yes   T14      250
13  No   T15      908
14 Yes   T15      484
15  No    T2     1029
16 Yes    T2       39
17  No    T3      657
18 Yes    T3      595
19  No    T4     1127
20 Yes    T4      279
21  No    T5     1376
22 Yes    T5       84
23  No    T6      872
24 Yes    T6      392
25  No    T7     1622
26 Yes    T7       94
27  No    T8     1657
28 Yes    T8       77
29  No    T9     1084
30 Yes    T9      139
```
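The per-query counts above come from qrels_stats.R. The same tabulation can be sketched in Python, assuming the standard TREC qrels layout (query, iteration, docno, judgement); the qrels rows and document IDs below are invented for illustration:

```python
from collections import Counter
from io import StringIO

# Hypothetical qrels excerpt in TREC format: query iteration docno judgement
qrels = StringIO("""\
EA1 0 doc-001 1
EA1 0 doc-002 0
EA1 0 doc-003 2
EA2 0 doc-001 0
""")

counts = Counter()
for line in qrels:
    query, _, docno, rel = line.split()
    # A graded judgement > 0 counts as relevant ("Yes"), 0 as "No"
    counts[(query, "Yes" if int(rel) > 0 else "No")] += 1

for (query, rel), n in sorted(counts.items()):
    print(query, rel, n)
```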
2. Baseline run results for the train (EA) and test (T) queries
Using only train queries (EA1 - EA6)
metric | tfidf | dir | jm | two | okapi | rm3 | variance |
---|---|---|---|---|---|---|---|
map | 0.1707 | 0.1586 | 0.1463 | 0.1619 | 0.171 | 0.1518 | 9.919e-05 |
ndcg | 0.5167 | 0.5117 | 0.4951 | 0.5234 | 0.5237 | 0.5015 | 1.370e-04 |
P_20 | 0.2583 | 0.275 | 0.2167 | 0.275 | 0.2583 | 0.2167 | 7.211e-04 |
ndcg_20 | 0.3 | 0.2824 | 0.2299 | 0.2694 | 0.2824 | 0.2199 | 1.022e-03 |
P_100 | 0.1433 | 0.1567 | 0.1433 | 0.1483 | 0.1683+ | 0.1533 | 9.086e-05 |
ndcg_100 | 0.3166 | 0.3146 | 0.2925 | 0.318 | 0.3305 | 0.2901 | 2.501e-04 |

A "+" marks a run whose difference from the tfidf baseline is statistically significant (p < 0.05 in the compare.R output below; the ndcg_20 and ndcg_100 rows correspond to ndcg_cut_20 and ndcg_cut_100 there).
Using only test queries (T1 - T15)
metric | tfidf | dir | jm | two | okapi | rm3 | variance |
---|---|---|---|---|---|---|---|
map | 0.3032 | 0.3284 | 0.2988 | 0.3261 | 0.2913 | 0.344 | 0.0004213 |
ndcg | 0.4348 | 0.544+ | 0.5242 | 0.545+ | 0.4466 | 0.5567+ | 0.0028851 |
P_20 | 0.5767 | 0.6833 | 0.6667 | 0.6667 | 0.6067+ | 0.73+ | 0.0030327 |
ndcg_20 | 0.4921 | 0.5923+ | 0.5829+ | 0.5549 | 0.488 | 0.6023+ | 0.0025603 |
P_100 | 0.4467 | 0.5013 | 0.4747 | 0.4853 | 0.4233 | 0.4787 | 0.0008037 |
ndcg_100 | 0.4293 | 0.5099+ | 0.4974 | 0.4906 | 0.4321 | 0.5334+ | 0.0017997 |
Result details (for verification):
```
root@integration-1:~/biocaddie# Rscript scripts/compare.R train tfidf dir short
[1] "map 0.1707 0.1586 p= 0.7679"
[1] "ndcg 0.5167 0.5117 p= 0.6238"
[1] "P_20 0.2583 0.275 p= 0.2881"
[1] "ndcg_cut_20 0.3 0.2824 p= 0.7313"
[1] "P_100 0.1433 0.1567 p= 0.173"
[1] "ndcg_cut_100 0.3166 0.3146 p= 0.5544"
root@integration-1:~/biocaddie# Rscript scripts/compare.R train tfidf jm short
[1] "map 0.1707 0.1463 p= 0.8619"
[1] "ndcg 0.5167 0.4951 p= 0.8548"
[1] "P_20 0.2583 0.2167 p= 0.8295"
[1] "ndcg_cut_20 0.3 0.2299 p= 0.8995"
[1] "P_100 0.1433 0.1433 p= 0.5"
[1] "ndcg_cut_100 0.3166 0.2925 p= 0.7781"
root@integration-1:~/biocaddie# Rscript scripts/compare.R train tfidf two short
[1] "map 0.1707 0.1619 p= 0.7705"
[1] "ndcg 0.5167 0.5234 p= 0.2322"
[1] "P_20 0.2583 0.275 p= 0.3303"
[1] "ndcg_cut_20 0.3 0.2694 p= 0.7444"
[1] "P_100 0.1433 0.1483 p= 0.1816"
[1] "ndcg_cut_100 0.3166 0.318 p= 0.4317"
root@integration-1:~/biocaddie# Rscript scripts/compare.R train tfidf okapi short
[1] "map 0.1707 0.171 p= 0.4889"
[1] "ndcg 0.5167 0.5237 p= 0.3181"
[1] "P_20 0.2583 0.2583 p= 0.5"
[1] "ndcg_cut_20 0.3 0.2824 p= 0.7526"
[1] "P_100 0.1433 0.1683 p= 0.0378"
[1] "ndcg_cut_100 0.3166 0.3305 p= 0.0996"
root@integration-1:~/biocaddie# Rscript scripts/compare.R train tfidf rm3 short
[1] "map 0.1707 0.1518 p= 0.92"
[1] "ndcg 0.5167 0.5015 p= 0.7868"
[1] "P_20 0.2583 0.2167 p= 0.7907"
[1] "ndcg_cut_20 0.3 0.2199 p= 0.9231"
[1] "P_100 0.1433 0.1533 p= 0.1887"
[1] "ndcg_cut_100 0.3166 0.2901 p= 0.9506"
```
```
root@integration-1:~/biocaddie# Rscript scripts/compare.R test tfidf dir short
[1] "map 0.3032 0.3284 p= 0.1419"
[1] "ndcg 0.4348 0.544 p= 0.0212"
[1] "P_20 0.5767 0.6833 p= 0.0617"
[1] "ndcg_cut_20 0.4921 0.5923 p= 0.0328"
[1] "P_100 0.4467 0.5013 p= 0.0889"
[1] "ndcg_cut_100 0.4293 0.5099 p= 0.032"
root@integration-1:~/biocaddie# Rscript scripts/compare.R test tfidf jm short
[1] "map 0.3032 0.2988 p= 0.5479"
[1] "ndcg 0.4348 0.5242 p= 0.069"
[1] "P_20 0.5767 0.6667 p= 0.0754"
[1] "ndcg_cut_20 0.4921 0.5829 p= 0.0407"
[1] "P_100 0.4467 0.4747 p= 0.2595"
[1] "ndcg_cut_100 0.4293 0.4974 p= 0.0684"
root@integration-1:~/biocaddie# Rscript scripts/compare.R test tfidf two short
[1] "map 0.3032 0.3261 p= 0.1853"
[1] "ndcg 0.4348 0.545 p= 0.0233"
[1] "P_20 0.5767 0.6667 p= 0.1001"
[1] "ndcg_cut_20 0.4921 0.5549 p= 0.13"
[1] "P_100 0.4467 0.4853 p= 0.202"
[1] "ndcg_cut_100 0.4293 0.4906 p= 0.0812"
root@integration-1:~/biocaddie# Rscript scripts/compare.R test tfidf okapi short
[1] "map 0.3032 0.2913 p= 0.8685"
[1] "ndcg 0.4348 0.4466 p= 0.1854"
[1] "P_20 0.5767 0.6067 p= 0.0349"
[1] "ndcg_cut_20 0.4921 0.488 p= 0.61"
[1] "P_100 0.4467 0.4233 p= 0.9528"
[1] "ndcg_cut_100 0.4293 0.4321 p= 0.4053"
root@integration-1:~/biocaddie# Rscript scripts/compare.R test tfidf rm3 short
[1] "map 0.3032 0.344 p= 0.1121"
[1] "ndcg 0.4348 0.5567 p= 0.016"
[1] "P_20 0.5767 0.73 p= 0.0073"
[1] "ndcg_cut_20 0.4921 0.6023 p= 0.0105"
[1] "P_100 0.4467 0.4787 p= 0.2809"
[1] "ndcg_cut_100 0.4293 0.5334 p= 0.0127"
```
```
root@integration-1:~/biocaddie/scripts# Rscript ./variance.R traindata
    metric  tfidf    dir     jm    two  okapi    rm3  variance
1      map 0.1707 0.1586 0.1463 0.1619 0.1710 0.1518 9.919e-05
2     ndcg 0.5167 0.5117 0.4951 0.5234 0.5237 0.5015 1.370e-04
3     P_20 0.2583 0.2750 0.2167 0.2750 0.2583 0.2167 7.211e-04
4  ndcg_20 0.3000 0.2824 0.2299 0.2694 0.2824 0.2199 1.022e-03
5    P_100 0.1433 0.1567 0.1433 0.1483 0.1683 0.1533 9.086e-05
6 ndcg_100 0.3166 0.3146 0.2925 0.3180 0.3305 0.2901 2.501e-04
root@integration-1:~/biocaddie/scripts# Rscript ./variance.R testdata
    metric  tfidf    dir     jm    two  okapi    rm3  variance
1      map 0.3032 0.3284 0.2988 0.3261 0.2913 0.3440 0.0004213
2     ndcg 0.4348 0.5440 0.5242 0.5450 0.4466 0.5567 0.0028851
3     P_20 0.5767 0.6833 0.6667 0.6667 0.6067 0.7300 0.0030327
4  ndcg_20 0.4921 0.5923 0.5829 0.5549 0.4880 0.6023 0.0025603
5    P_100 0.4467 0.5013 0.4747 0.4853 0.4233 0.4787 0.0008037
6 ndcg_100 0.4293 0.5099 0.4974 0.4906 0.4321 0.5334 0.0017997
```
3. Top k document statistics for train and test queries.
3.1. Get the top k documents for each query in the train and test sets, and their statistics.
Run Rscript result_stats.R
```
cd ~/biocaddie
mkdir -p stats
Rscript scripts/result_stats.R <model> <collection> <topic> <k_value>
```
This script does 4 tasks:
- load data from train/test qrel file into dataframe qrelsData
Columns: "query", "ID", "docno", "relno", "rel"
***Note: "rel" is "yes" if relno > 0 and "no" if relno = 0
- load the top k documents returned by IndriRunQuery (in ~/biocaddie/output/<model>/<collection>/<topic>) into dataframe output_df
Columns: "query", "ID", "docno", "topk", "score", "indri", "file"
***Note: get top k documents by selecting records with topk <= k_value
"file" column is the output filename as each baseline run using different parameter combinations (Eg: sweeping mu for dir). Based on the parameter combination, top k documents will vary.
- merge (left join) the qrels (qrelsData) and the top k output (output_df) on the "query" and "docno" columns to determine which documents are relevant, non-relevant, or unjudged. The result is saved in dataframe topdocs
Columns: "query", "docno", "topk", "file", "rel"
***Note: for records with a missing "rel" value (NA), set it to "unjudged"
- count the judged (rel = yes/no) and unjudged documents among the top k for each query in each file. The result is saved in the topdocs_stats dataframe and in the stats output file ~/biocaddie/stats/stats.<model>.<collection>.<topic>.<k-value>.csv
Columns: "file", "query", "rel", "docno"
Eg: stats.dir.train.short.5.csv
```
root@integration-1:~/biocaddie/stats# head stats.dir.train.short.5.csv
"file","query","rel","docno"
"./output/dir/train/short/10000.out","EA1","unjudged",3
"./output/dir/train/short/10000.out","EA1","yes",2
"./output/dir/train/short/10000.out","EA2","no",2
"./output/dir/train/short/10000.out","EA2","unjudged",3
"./output/dir/train/short/10000.out","EA3","unjudged",5
"./output/dir/train/short/10000.out","EA4","no",1
"./output/dir/train/short/10000.out","EA4","unjudged",2
"./output/dir/train/short/10000.out","EA4","yes",2
"./output/dir/train/short/10000.out","EA5","unjudged",3
```
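The actual script is R, but the merge-and-count logic described above can be sketched with pandas; the rows below are invented toy data standing in for the real qrels and IndriRunQuery output, and only the column names follow the description:

```python
import pandas as pd

# Toy stand-ins for the real inputs
qrelsData = pd.DataFrame({
    "query": ["EA1", "EA1", "EA1"],
    "docno": ["d1", "d2", "d3"],
    "rel":   ["yes", "no", "yes"],
})
output_df = pd.DataFrame({
    "query": ["EA1"] * 4,
    "docno": ["d1", "d2", "d9", "d7"],
    "topk":  [1, 2, 3, 4],
    "file":  ["./output/dir/train/short/50.out"] * 4,
})

k_value = 3
topdocs = (
    output_df[output_df["topk"] <= k_value]               # keep the top k documents
    .merge(qrelsData, on=["query", "docno"], how="left")  # left join keeps unjudged docs
)
topdocs["rel"] = topdocs["rel"].fillna("unjudged")        # missing judgement -> "unjudged"

# Count documents per file/query/relevance category, as in the stats CSV
topdocs_stats = (
    topdocs.groupby(["file", "query", "rel"])["docno"].count().reset_index()
)
print(topdocs_stats)
```

Here d1 comes out "yes", d2 "no", and d9 (judged by no assessor) "unjudged"; d7 is dropped by the top-k filter.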
3.2. Visualize the statistics for top k documents.
Use R script result_plot.R
```
cd ~/biocaddie
mkdir -p plot
Rscript scripts/result_plot.R <model> <collection> <topic>
```
a. For each stats output file from 3.1, the distribution of judged and unjudged documents for each query at each k-value can be viewed in ~/biocaddie/plot/<model>.<collection>.<topic>.<k-value>.png
Eg: dir.test.short.50.png
*** Note: calculation of the average number of judged/unjudged documents per query for each k-value.
For each baseline, collection, topic, and k-value there are multiple output files, one per swept parameter combination.
To get the average number of documents in each relevance category (yes, no, unjudged) for a query, sum the document counts for that category over all output files and divide by the number of output files (for dir there are 7).
Eg: the average number of relevant documents for train query EA1 at k-value=5 for the dir baseline:
#relevant_docs = (2+3+3+3+2+3+3)/7 = 2.7143
```
root@integration-1:~/biocaddie/stats# cat stats.dir.train.short.5.csv | grep EA1 | grep yes
"./output/dir/train/short/10000.out","EA1","yes",2
"./output/dir/train/short/1000.out","EA1","yes",3
"./output/dir/train/short/2500.out","EA1","yes",3
"./output/dir/train/short/250.out","EA1","yes",3
"./output/dir/train/short/5000.out","EA1","yes",2
"./output/dir/train/short/500.out","EA1","yes",3
"./output/dir/train/short/50.out","EA1","yes",3
```
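As a minimal check of the arithmetic, the seven per-file "yes" counts from the grep output average out as follows:

```python
# Per-file "yes" counts for query EA1 at k=5 for the dir baseline,
# one entry per swept mu value (from the grep output above)
yes_counts = [2, 3, 3, 3, 2, 3, 3]

avg_relevant = sum(yes_counts) / len(yes_counts)
print(round(avg_relevant, 4))  # 2.7143
```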
b. From the per-query averages of judged/unjudged documents for each k-value in part a, compute the average over all queries for each k-value.
To do this, for each k-value, sum the per-query averages of each relevance category over all queries and divide by the number of queries (15 for the test query set, 6 for the train query set).
Results are saved in ~/biocaddie/plot/<model>.<collection>.<topic>.all.png
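The second averaging step is a plain mean over queries; a sketch with invented per-query averages (the real values come from the part-a computation) for one hypothetical k-value:

```python
from statistics import mean

# Hypothetical per-query averages of "unjudged" documents at k=5
# for the 6 train queries (values made up for illustration)
per_query_avg = {"EA1": 3.0, "EA2": 2.5, "EA3": 4.8,
                 "EA4": 2.0, "EA5": 3.2, "EA6": 4.5}

# Average over all queries for this k-value
avg_over_queries = mean(per_query_avg.values())
print(round(avg_over_queries, 4))  # 3.3333
```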
Below is the difference in the judged/unjudged document distribution when sweeping the k-value, between train and test queries, for the dir, okapi, and rm3 baselines.
As expected, the results for the train queries are dominated by unjudged documents, since very few judgements are available.