
Using original ("orig") queries (pre-test queries included)

||Model||MAP||NDCG||P@20||NDCG@20||P@100||NDCG@100||Notes||Date||
|tfidf|0.2204|0.4538|0.2995|0.2904|0.1735|0.3376|Sweep b and k1|06/07/17|
|Okapi|0.2218|0.4557|0.2819-|0.3035|0.1717|0.3386|Sweep b, k1, k3|06/07/17|
|QL (JM)|0.1876-|0.4212-|0.2505-|0.2773|0.1403-|0.295-|Sweep lambda|06/07/17|
|QL (Dir)|0.2032-|0.4359-|0.2713-|0.2927|0.1633-|0.3304|Sweep mu|06/07/17|
|QL (TS)|0.2101-|0.4415-|0.2761-|0.3029|0.1638-|0.3277|Sweep mu and lambda|06/07/17|
|RM3|0.2618+|0.4592|0.3277+|0.2965|0.1913+|0.3662+|Sweep mu, fbDocs, fbTerms, and lambda|06/08/17|

(In all tables, + / - mark a statistically significant improvement / degradation relative to the tfidf baseline, one-sided p < 0.05; see the significance tests below.)
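
The "Sweep" notes mean each row reports the best setting found over a grid of that model's parameters. As a rough illustration of how such a sweep can be scored (not the actual pipeline), a minimal R sketch that picks the best lambda for QL (JM) by MAP; the eval/ directory layout and file names are hypothetical:

No Format
# Hypothetical sketch: choose the best parameter setting by MAP.
# Assumes one trec_eval summary file per setting, e.g. eval/jm_lambda_0.1.txt,
# each containing whitespace-separated lines like:  map   all   0.1876
files <- Sys.glob("eval/jm_lambda_*.txt")

read_map <- function(f) {
  ev <- read.table(f, col.names = c("metric", "query", "value"),
                   stringsAsFactors = FALSE)
  as.numeric(ev$value[ev$metric == "map" & ev$query == "all"])
}

maps <- sapply(files, read_map)
cat("best setting:", files[which.max(maps)], "map =", max(maps), "\n")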


No Format
root@integration-1:~/biocaddie# Rscript scripts/compare_ohsumed.R combined tfidf dir orig 
[1] "map 0.2204 0.2032 p= 0.9988" 
[1] "ndcg 0.4538 0.4359 p= 0.9987" 
[1] "P_20 0.2995 0.2713 p= 0.9985" 
[1] "ndcg_cut_20 0.2904 0.2927 p= 0.417" 
[1] "P_100 0.1735 0.1633 p= 0.9945" 
[1] "ndcg_cut_100 0.3376 0.3304 p= 0.7764" 
root@integration-1:~/biocaddie# Rscript scripts/compare_ohsumed.R combined tfidf jm orig 
[1] "map 0.2204 0.1876 p= 0.9966" 
[1] "ndcg 0.4538 0.4212 p= 0.9992" 
[1] "P_20 0.2995 0.2505 p= 0.9999" 
[1] "ndcg_cut_20 0.2904 0.2773 p= 0.8572" 
[1] "P_100 0.1735 0.1403 p= 1" 
[1] "ndcg_cut_100 0.3376 0.295 p= 0.9996" 
root@integration-1:~/biocaddie# Rscript scripts/compare_ohsumed.R combined tfidf two orig 
[1] "map 0.2204 0.2101 p= 0.972" 
[1] "ndcg 0.4538 0.4415 p= 0.9859" 
[1] "P_20 0.2995 0.2761 p= 0.9954" 
[1] "ndcg_cut_20 0.2904 0.3029 p= 0.1072" 
[1] "P_100 0.1735 0.1638 p= 0.9992" 
[1] "ndcg_cut_100 0.3376 0.3277 p= 0.857" 
root@integration-1:~/biocaddie# Rscript scripts/compare_ohsumed.R combined tfidf okapi orig 
[1] "map 0.2204 0.2218 p= 0.4445" 
[1] "ndcg 0.4538 0.4557 p= 0.414" 
[1] "P_20 0.2995 0.2819 p= 0.975" 
[1] "ndcg_cut_20 0.2904 0.3035 p= 0.1157" 
[1] "P_100 0.1735 0.1717 p= 0.6907" 
[1] "ndcg_cut_100 0.3376 0.3386 p= 0.4437" 

Using short queries (pre-test queries not included)

||Model||MAP||NDCG||P@20||NDCG@20||P@100||NDCG@100||Notes||Date||
|tfidf|0.3188|0.6084|0.45|0.4255|0.2657|0.4625|Sweep b and k1|06/07/17|
|Okapi|0.3117|0.6044|0.4408|0.4277|0.261|0.4569|Sweep b, k1, k3|06/07/17|
|QL (JM)|0.2545-|0.5527-|0.3908-|0.3882-|0.2135-|0.3883-|Sweep lambda|06/07/17|
|QL (Dir)|0.2924-|0.5866-|0.3975|0.4018-|0.2492-|0.432-|Sweep mu|06/07/17|
|QL (TS)|0.2934-|0.5828-|0.4092-|0.4122|0.2508-|0.4385-|Sweep mu and lambda|06/07/17|
|RM3|0.3717+|0.6087|0.5067+|0.4529 (p-value: 0.0541)|0.291+|0.4934+|Sweep mu, fbDocs, fbTerms, and lambda|06/08/17|


No Format
root@integration-1:~/biocaddie# Rscript scripts/compare_ohsumed.R combined tfidf dir short
[1] "map 0.3188 0.2924 p= 0.9997"
[1] "ndcg 0.6084 0.5866 p= 0.9994"
[1] "P_20 0.45 0.3975 p= 0.9998"
[1] "ndcg_cut_20 0.4255 0.4018 p= 0.9881"
[1] "P_100 0.2657 0.2492 p= 0.9947"
[1] "ndcg_cut_100 0.4625 0.432 p= 0.9999"
root@integration-1:~/biocaddie# Rscript scripts/compare_ohsumed.R combined tfidf jm short
[1] "map 0.3188 0.2545 p= 1"
[1] "ndcg 0.6084 0.5527 p= 1"
[1] "P_20 0.45 0.3908 p= 0.9984"
[1] "ndcg_cut_20 0.4255 0.3882 p= 0.9973"
[1] "P_100 0.2657 0.2135 p= 1"
[1] "ndcg_cut_100 0.4625 0.3883 p= 1"
root@integration-1:~/biocaddie# Rscript scripts/compare_ohsumed.R combined tfidf okapi short
[1] "map 0.3188 0.3117 p= 0.7974"
[1] "ndcg 0.6084 0.6044 p= 0.6834"
[1] "P_20 0.45 0.4408 p= 0.7506"
[1] "ndcg_cut_20 0.4255 0.4277 p= 0.4236"
[1] "P_100 0.2657 0.261 p= 0.791"
[1] "ndcg_cut_100 0.4625 0.4569 p= 0.747"
root@integration-1:~/biocaddie# Rscript scripts/compare_ohsumed.R combined tfidf two short
[1] "map 0.3188 0.2934 p= 1"
[1] "ndcg 0.6084 0.5828 p= 0.9997"
[1] "P_20 0.45 0.4092 p= 0.9989"
[1] "ndcg_cut_20 0.4255 0.4122 p= 0.89"
[1] "P_100 0.2657 0.2508 p= 0.9991"
[1] "ndcg_cut_100 0.4625 0.4385 p= 0.9992"


8. Comments:

The bioCADDIE dataset contains descriptive metadata (structured and unstructured) for more than 1.5 million documents from biomedical datasets. There are 20 queries, which were manually refined and shortened to keep the important keywords. Relevance judgements have 3 categories: 0 "not relevant", 1 "possibly relevant", and 2 "definitely relevant".

The TREC CDS dataset is a collection of 733,328 full-text biomedical journal articles. 30 topics are provided, each of which includes a topic "description" (a complete account of the patient's visit, including details such as vital statistics, drug dosages, etc.) and a topic "summary" (a simplified version of the narrative that contains less irrelevant information). Queries are constructed from the topic summaries. Similar to bioCADDIE, relevance judgements are divided into 3 categories: 0 "not relevant", 1 "possibly relevant", and 2 "definitely relevant".

The OHSUMED test collection is a set of 348,566 references/documents from MEDLINE, the online medical information database, consisting of titles and/or abstracts from 270 medical journals. Compared to the two collections above, the OHSUMED dataset is quite small. OHSUMED topics include 2 fields: "title" (patient description) and "description" (information request). The topic descriptions were selected to construct queries. Relevance judgements include 2 categories: 1 "possibly relevant" and 2 "definitely relevant".

Based on the characteristics of the 3 collections, TREC CDS is far different from bioCADDIE and OHSUMED, as it uses full-text search and its queries are summaries of patient visit records rather than common information queries. The OHSUMED collection is closer to bioCADDIE in terms of dataset similarity (non-full-text). However, bioCADDIE queries are short keyword queries, while OHSUMED queries are short but verbose (natural-language) queries.

Judging by the baseline results over all 3 collections, the RM3 baselines generally perform well and consistently. In particular, for TREC CDS and OHSUMED, RM3 gives the best results on most metrics compared to the other baselines. This was expected, since RM3 is based on Rocchio-style relevance feedback, which can help generate a good expanded query even when we do not know the collection well.
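
For reference, RM3 interpolates the original query model with a relevance model estimated from the top-ranked feedback documents. A sketch of the standard formulation (the exact implementation behind these runs may differ):

\[
P(w \mid \theta_{Q'}) = \lambda \, P(w \mid \theta_Q) + (1 - \lambda) \, P(w \mid R),
\qquad
P(w \mid R) \propto \sum_{d \in D_{fb}} P(w \mid \theta_d) \prod_{q \in Q} P(q \mid \theta_d)
\]

Here D_fb is the set of the top fbDocs feedback documents, P(w | R) is truncated to the top fbTerms terms, and lambda is the interpolation weight; these are exactly the parameters swept in the RM3 rows above.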

One surprising result was that the query-likelihood baselines with smoothing (JM, Dir, and TS) did not improve retrieval over TFIDF on any metric for the TREC CDS and OHSUMED collections, as they did for bioCADDIE and in previous studies (http://trec.nist.gov/pubs/trec23/papers/pro-UCLA_MII_clinical.pdf). However, the type of query could be an important factor behind these differences. This was also noted in the study by Zhai (http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.58.8978): queries containing only keywords tend to perform better than more verbose queries.
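
To make the smoothing contrast concrete, the standard formulations (following the Zhai study cited above; shown as a sketch, with parameter names matching the sweeps in the tables) are:

\[
p_{\lambda}(w \mid d) = (1 - \lambda) \, p_{ml}(w \mid d) + \lambda \, p(w \mid C)
\quad \text{(Jelinek-Mercer)}
\]
\[
p_{\mu}(w \mid d) = \frac{c(w; d) + \mu \, p(w \mid C)}{|d| + \mu}
\quad \text{(Dirichlet)}
\]

where c(w; d) is the count of w in d, |d| is the document length, and p(w | C) is the collection language model. Two-stage (TS) smoothing first applies Dirichlet smoothing to the document model and then interpolates JM-style with a background model, which is why both mu and lambda are swept for QL (TS).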

We tried to examine the difference between using verbose queries and keyword queries on the OHSUMED collection, as shown in the two tables below.

Using original queries (verbose queries) for OHSUMED

||Model||MAP||NDCG||P@20||NDCG@20||P@100||NDCG@100||Notes||Date||
|tfidf|0.3188|0.6084|0.45|0.4255|0.2657|0.4625|Sweep b and k1|06/07/17|
|QL (JM)|0.2545-|0.5527-|0.3908-|0.3882-|0.2135-|0.3883-|Sweep lambda|06/07/17|
|QL (Dir)|0.2924-|0.5866-|0.3975|0.4018-|0.2492-|0.432-|Sweep mu|06/07/17|
|QL (TS)|0.2934-|0.5828-|0.4092-|0.4122|0.2508-|0.4385-|Sweep mu and lambda|06/07/17|

Using manually refined queries (mostly keywords) for OHSUMED

||Model||MAP||NDCG||P@20||NDCG@20||P@100||NDCG@100||Notes||Date||
|tfidf|0.315|0.5949|0.4198|0.3802|0.2614|0.4454|Sweep b and k1|06/09/17|
|QL (JM)|0.2587-|0.5466-|0.3817-|0.3608|0.2257-|0.3806-|Sweep lambda|06/09/17|
|QL (Dir)|0.3027-|0.5883|0.4087|0.379|0.261|0.4333-|Sweep mu|06/09/17|
|QL (TS)|0.3052-|0.5871|0.4159|0.3896|0.2627|0.4354-|Sweep mu and lambda|06/09/17|

We can see that when using keyword queries, the difference in retrieval results between tfidf and QL is smaller.

Specifically, tfidf performed worse on all metrics with keyword queries than with the verbose/original queries. QL (JM) also performed worse for NDCG, P@20, NDCG@20, and NDCG@100. However, QL (Dir) and QL (TS) performed better on most metrics. This matches the finding in Zhai's study that JM works worst for short keyword queries but is more effective when queries are verbose, while Dir works better for concise keyword queries than for verbose queries.

The number of queries used for running the baselines in each collection could also account for some of the differences.

(to be continued)