...

3. Top k document statistics for train and test queries.

3.1. Get the top k documents for each query in the train and test sets, along with their statistics.

Run the R script result_stats.R:


No Format
cd ~/biocaddie
mkdir -p stats
Rscript scripts/result_stats.R <model> <collection> <topic> <k_value>


This script performs four tasks:

  • Load data from the train/test qrels file into the data frame qrelsData.

              Columns: "query", "ID", "docno", "relno", "rel"

              ***Note: the "rel" value is "yes" if relno > 0 and "no" if relno = 0.

  • Load the top k documents returned by IndriRunQuery (outputs in ~/biocaddie/output/<model>/<collection>/<topic>) into the data frame output_df.

              Columns: "query", "ID", "docno", "topk", "score", "indri", "file"

              ***Note: the top k documents are obtained by selecting records with topk <= k_value.

                            The "file" column holds the output filename, since each baseline run uses a different parameter combination (Eg: sweeping mu for dir). The top k documents vary with the parameter combination.

  • Inner join/merge the qrels (qrelsData) and the top k output (output_df) on the "query" and "docno" columns to determine which documents are relevant, non-relevant, or unjudged. The result is saved in the data frame topdocs.

             Columns: "query", "docno", "topk", "file", "rel"

             ***Note: records whose "rel" value is missing (NA) are assigned the value "unjudged".

  • Count the judged (rel = yes/no) and unjudged documents among the top k documents for each query in each file. The result is saved in the data frame topdocs_stats and in the stats output file ~/biocaddie/stats/stats.<model>.<collection>.<topic>.<k-value>.csv.

             Columns: "file", "query", "rel", "docno" (in this file, the "docno" column holds the document count per category, as the sample below shows)

             Eg: stats.dir.train.short.5.csv

No Format
root@integration-1:~/biocaddie/stats# head stats.dir.train.short.5.csv
"file","query","rel","docno"
"./output/dir/train/short/10000.out","EA1","unjudged",3
"./output/dir/train/short/10000.out","EA1","yes",2
"./output/dir/train/short/10000.out","EA2","no",2
"./output/dir/train/short/10000.out","EA2","unjudged",3
"./output/dir/train/short/10000.out","EA3","unjudged",5
"./output/dir/train/short/10000.out","EA4","no",1
"./output/dir/train/short/10000.out","EA4","unjudged",2
"./output/dir/train/short/10000.out","EA4","yes",2
"./output/dir/train/short/10000.out","EA5","unjudged",3

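The four steps above (load qrels, load top-k output, merge, count per file/query/category) can be sketched in plain Python. This is an illustrative translation of the R script's logic, not the script itself; the docnos (d1, d2, d3) and the single-file example are made up, while the "unjudged"/topk rules follow the description above:

```python
from collections import Counter

# Toy qrels: (query, docno) -> "yes"/"no", derived from relno as described above
qrels = {
    ("EA1", "d1"): "yes",
    ("EA1", "d2"): "no",
}

# Toy IndriRunQuery output rows: (query, docno, topk, file)
output = [
    ("EA1", "d1", 1, "10000.out"),
    ("EA1", "d2", 2, "10000.out"),
    ("EA1", "d3", 3, "10000.out"),  # not in qrels -> will be "unjudged"
]

k_value = 5
counts = Counter()
for query, docno, topk, fname in output:
    if topk <= k_value:                              # keep only the top k documents
        rel = qrels.get((query, docno), "unjudged")  # missing judgement -> "unjudged"
        counts[(fname, query, rel)] += 1             # one stats row per (file, query, rel)

# Each entry corresponds to one row of the stats CSV
for key, n in sorted(counts.items()):
    print(key, n)
```

Each `(file, query, rel) -> count` entry corresponds to one row of the stats.<model>.<collection>.<topic>.<k-value>.csv file shown above.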

3.2. Visualize the statistics for top k documents.

Use the R script result_plot.R:


No Format
cd ~/biocaddie
mkdir -p plot
Rscript scripts/result_plot.R <model> <collection> <topic>


a. For each stats output file produced in 3.1, we can view the distribution of judged and unjudged documents for each query at each k-value in the file ./biocaddie/plot/<model>.<collection>.<topic>.<k-value>.png

Eg: dir.test.short.50.png


*** Note: calculation of the average number of judged/unjudged documents for each query at each k-value.

For each baseline, collection, topic and k-value, there are multiple output files, one per parameter combination in the sweep.

To get the average number of documents in each relevance category (yes, no, unjudged) for each query, sum the document counts for that category and query over all output files and divide by the number of output files (for dir, there are 7 output files).

Eg: the average number of relevant documents for train query EA1 with k-value = 5 under the dir baseline:

#relevant_docs = (2+3+3+3+2+3+3)/7 ≈ 2.7143

No Format
root@integration-1:~/biocaddie/stats# cat stats.dir.train.short.5.csv | grep EA1 | grep yes
"./output/dir/train/short/10000.out","EA1","yes",2
"./output/dir/train/short/1000.out","EA1","yes",3
"./output/dir/train/short/2500.out","EA1","yes",3
"./output/dir/train/short/250.out","EA1","yes",3
"./output/dir/train/short/5000.out","EA1","yes",2
"./output/dir/train/short/500.out","EA1","yes",3
"./output/dir/train/short/50.out","EA1","yes",3

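The averaging above is a plain mean over the per-file counts. A quick arithmetic check, using the seven "yes" counts from the CSV rows shown:

```python
# "yes" counts for query EA1 at k-value = 5, one per dir output file
# (taken from the stats.dir.train.short.5.csv rows above)
yes_counts = [2, 3, 3, 3, 2, 3, 3]

# Average over the 7 output files in the dir parameter sweep
avg_relevant = sum(yes_counts) / len(yes_counts)
print(round(avg_relevant, 4))  # 2.7143
```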

b. From the per-query averages for each k-value in part a, compute the average number of judged/unjudged documents over all queries for each k-value.

To do this, for each k-value, sum the per-query averages in each relevance category over all queries and divide by the number of queries (Eg: 15 for the test query set and 6 for the train query set).

Results are saved in  ./biocaddie/plot/<model>.<collection>.<topic>.all.png
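The second averaging step in part b can be sketched the same way. The per-query averages below are made-up illustrative numbers, not values from a real run:

```python
# Hypothetical per-query averages for one k-value and one category ("unjudged"),
# as produced by the part-a calculation
per_query_avg = {"EA1": 3.0, "EA2": 2.5, "EA3": 4.0}

# Average over all queries for this k-value (divide by the number of queries)
overall = sum(per_query_avg.values()) / len(per_query_avg)
print(round(overall, 4))  # 3.1667
```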

Below are the differences in the judged/unjudged document distributions between train and test queries when sweeping the k-value, for the dir, okapi and rm3 baselines.

As expected, the outputs for train queries are dominated by unjudged documents, since very few judgements are available for them.