There are differences in the number of qrels between the train (EA1-EA6) and test (T1-T9) queries. It would be good to know whether these differences in the number of judgments are having a negative effect on our retrieval metrics.
I suggest you start with the following:
- Find the exact numbers of judged relevant (qrel >= 1) and non-relevant (qrel = 0) documents for each query
- For the usual baseline runs (QL, TF-IDF, Okapi, RM3), get the usual metrics (MAP, nDCG, P@20, nDCG@20, etc.) using just the EA queries and then just the T queries.
- Calculate the variance of each metric across runs for each type of query. In R, you can use the var() function with a vector of the metric values. For example, if 0.23 is the MAP for the EA queries under QL, 0.35 is the MAP for the EA queries under TF-IDF, etc.:
var(c(0.23, 0.35, 0.56, ...))
- Record all of the above on the wiki
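For the first step, counting judged relevant and non-relevant documents per query can be scripted directly from the qrels file. A minimal Python sketch, assuming the standard TREC qrels format (query-id, iteration, doc-id, judgment) and using inline sample lines in place of the real file:

```python
from collections import Counter

# Sample lines standing in for the real qrels file; the query IDs and
# judgments here are illustrative only.
qrels_lines = """\
EA1 0 doc1 1
EA1 0 doc2 0
EA1 0 doc3 2
T1 0 doc1 0
T1 0 doc4 1
""".splitlines()

relevant = Counter()     # qrel >= 1
nonrelevant = Counter()  # qrel == 0
for line in qrels_lines:
    qid, _iteration, _docid, judgment = line.split()
    if int(judgment) >= 1:
        relevant[qid] += 1
    else:
        nonrelevant[qid] += 1

for qid in sorted(set(relevant) | set(nonrelevant)):
    print(qid, relevant[qid], nonrelevant[qid])
```

To run it on the real data, replace `qrels_lines` with the lines read from the qrels file.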
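For the second step, one way to score the EA and T queries separately is to split the run (and qrels) files by query-ID prefix and evaluate each half on its own, e.g. with trec_eval. A sketch, assuming standard TREC run lines (qid Q0 docid rank score tag) and that the query sets are distinguishable by the "EA"/"T" prefix:

```python
def split_by_query_set(lines):
    """Separate TREC-format lines into EA-query and T-query lines."""
    ea, t = [], []
    for line in lines:
        qid = line.split()[0]
        (ea if qid.startswith("EA") else t).append(line)
    return ea, t

# Illustrative run lines only; read the real run file in practice.
run_lines = [
    "EA1 Q0 doc1 1 12.3 QL",
    "T1 Q0 doc4 1 10.1 QL",
    "EA2 Q0 doc2 1 11.7 QL",
]
ea_run, t_run = split_by_query_set(run_lines)
```

Writing `ea_run` and `t_run` out to separate files (and doing the same for the qrels) lets the usual evaluation produce MAP, nDCG, P@20, etc. per query set without any other changes.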