Baselines
The following baselines were run on the biocaddie_all index with the combined, short topics. The baselines/ scripts sweep the parameter combinations, and the mkeval.sh script performs leave-one-out cross-validation (LOOCV) for each desired metric.
Model | MAP | NDCG | P@20 | NDCG@20 | P@100 | NDCG@100 | Notes |
---|---|---|---|---|---|---|---|
TFIDF | 0.2524 | 0.459 | 0.4643 | 0.4144 | 0.3638 | 0.3973 | Sweep b and k1 |
Okapi | 0.2548 | 0.466 | 0.5+ | 0.4414+ | 0.3419 | 0.3981 | Sweep b, k1, k3 |
QL (JM) | 0.2573 | 0.5159 | 0.5524+ | 0.4886+ | 0.3752 | 0.4249 | Sweep lambda |
QL (Dir) | 0.2837+ | 0.5306+ | 0.5667+ | 0.5054+ | 0.3981 | 0.4541+ | Sweep mu |
QL (TS) | 0.2794 | 0.5391+ | 0.531 | 0.4702 | 0.391 | 0.4455 | Sweep mu and lambda |
+ indicates a significant improvement over the TFIDF baseline (p < 0.05)
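The leave-one-out procedure used for these numbers can be sketched roughly as follows. This is a minimal illustration, not the contents of mkeval.sh: the function name `loocv`, the shape of the `scores` dictionary, and the toy values are all assumptions made for the example. For each held-out topic, the parameter setting with the best mean metric on the remaining topics is selected, and its score on the held-out topic is recorded.

```python
from statistics import mean


def loocv(scores):
    """Leave-one-out cross-validation over a parameter sweep.

    scores: {param_setting: {topic_id: metric_value}} (hypothetical layout).
    Returns (mean held-out score, {topic_id: held-out score}).
    Requires at least two topics.
    """
    topics = sorted(next(iter(scores.values())))
    held_out = {}
    for topic in topics:
        training = [t for t in topics if t != topic]
        # Select the setting that does best on every topic except the held-out one.
        best = max(scores, key=lambda p: mean(scores[p][t] for t in training))
        # Score that setting on the topic it never saw during selection.
        held_out[topic] = scores[best][topic]
    return mean(held_out.values()), held_out


# Toy values for illustration only; they are not taken from the tables.
example_scores = {
    "mu=500": {"T1": 0.2, "T2": 0.6, "T3": 0.3},
    "mu=2500": {"T1": 0.3, "T2": 0.3, "T3": 0.5},
}
cv_mean, per_topic = loocv(example_scores)
```

Because the setting is chosen without looking at the held-out topic, the cross-validated mean can be lower than the best mean obtained by tuning on all topics at once.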
Feedback/SDM models
Model | MAP | NDCG | P@20 | NDCG@20 | P@100 | NDCG@100 | Notes |
---|---|---|---|---|---|---|---|
QL (Dir) | 0.2837 | 0.5306 | 0.5667 | 0.5054 | 0.3981 | 0.4541 | |
RM3 | 0.2982 | 0.5381 | 0.5476 | 0.5054 | 0.4038 | 0.4768 | Sweep mu, fbDocs, fbTerms, and lambda |
Pubmed | 0.3076+ | 0.5853+ | 0.5381 | 0.4855 | 0.4248 | 0.4501 | Use mu=2500 for the PubMed query while sweeping fbDocs, fbTerms, and lambda; sweep mu for the final retrieval. |
Wikipedia | 0.2947 | 0.5956+ | 0.5738 | 0.4956 | 0.4062 | 0.4487 | Use mu=2500 for the Wikipedia query while sweeping fbDocs, fbTerms, and lambda; sweep mu for the final retrieval. |
SDM | 0.2874 | 0.5558 | 0.4833 | 0.4706 | 0.4019 | 0.4559 | Sweep mu, w1, w2, w3 |
+ indicates a significant improvement over the QL (Dir) baseline (p < 0.05)
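The tables do not state which statistical test produced the p-values. As one possible way to check a "+" marker, the sketch below assumes a paired, one-sided sign-flip randomization test on per-topic scores; the function name `paired_randomization_p` and the sample values are hypothetical, and the actual evaluation may use a different test (for example, a paired t-test).

```python
from itertools import product


def paired_randomization_p(system, baseline):
    """Exact one-sided sign-flip test on paired per-topic scores.

    Returns the fraction of sign assignments whose summed difference is
    at least as large as the observed one. Enumerates 2**n assignments,
    so it is only practical for a small number of topics.
    """
    diffs = [s - b for s, b in zip(system, baseline)]
    observed = sum(diffs)
    at_least_as_large = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        flipped = sum(sign * d for sign, d in zip(signs, diffs))
        # Small tolerance so floating-point rounding does not drop exact ties.
        if flipped >= observed - 1e-12:
            at_least_as_large += 1
    return at_least_as_large / total


# Made-up per-topic scores for illustration only; not taken from the tables.
baseline_scores = [0.2, 0.3, 0.4, 0.1, 0.5]
system_scores = [0.3, 0.5, 0.45, 0.4, 0.65]
p_value = paired_randomization_p(system_scores, baseline_scores)
significant = p_value < 0.05
```

With only a handful of topics the smallest attainable p-value is 1 / 2**n, so very small topic sets cannot reach significance under this test regardless of the size of the improvement.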