...

a) Lucene Run (lucene-output)

Using biocaddie_all indexes

No Format
cd ~/biocaddie
baselines/new/<model>-lucene.sh <topics> <subset> <col> | parallel -j 20 bash -c "{}"
baselines/new/<model>-lucene.sh <topics> <subset> <col> <year> | parallel -j 20 bash -c "{}"

Eg: baselines/new/dir-lucene.sh short test biocaddie | parallel -j 20 bash -c "{}"
baselines/new/tfidf-lucene.sh short test biocaddie | parallel -j 20 bash -c "{}"
baselines/new/jm-lucene.sh short test biocaddie | parallel -j 20 bash -c "{}"
baselines/new/bm25-lucene.sh short test biocaddie | parallel -j 20 bash -c "{}"
baselines/new/rocchio-lucene.sh short test biocaddie | parallel -j 20 bash -c "{}"

Using biocaddie_all.snowball indexes

baselines/new/<model>-lucene-snowball.sh <topics> <subset> <col> | parallel -j 20 bash -c "{}"
baselines/new/<model>-lucene-snowball.sh <topics> <subset> <col> <year> | parallel -j 20 bash -c "{}"

Eg: baselines/new/dir-lucene-snowball.sh short test biocaddie | parallel -j 20 bash -c "{}"
baselines/new/tfidf-lucene-snowball.sh short test biocaddie | parallel -j 20 bash -c "{}"
baselines/new/jm-lucene-snowball.sh short test biocaddie | parallel -j 20 bash -c "{}"
baselines/new/bm25-lucene-snowball.sh short test biocaddie | parallel -j 20 bash -c "{}"
baselines/new/rocchio-lucene-snowball.sh short test biocaddie | parallel -j 20 bash -c "{}"
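
The ten example commands above differ only in the model name and the optional -snowball suffix, so they can be generated in a loop. The following is a minimal sketch, not part of the repository: the helper name lucene_run_cmd is hypothetical, and it only prints each command line so the list can be reviewed before being piped to bash from ~/biocaddie.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (do not execute) one Lucene run command.
# $1 = model, $2 = index suffix ("" or "-snowball"),
# $3 = topics, $4 = subset, $5 = col
lucene_run_cmd() {
  printf 'baselines/new/%s-lucene%s.sh %s %s %s | parallel -j 20 bash -c "{}"\n' \
    "$1" "$2" "$3" "$4" "$5"
}

# The five models used on this page, for both index variants.
for model in dir tfidf jm bm25 rocchio; do
  lucene_run_cmd "$model" ""          short test biocaddie
  lucene_run_cmd "$model" "-snowball" short test biocaddie
done
```

Saving this as a script and piping its output to bash would run the commands one after another; each command still fans out its own work through parallel.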

b) Evaluation and Cross-validation (lucene-eval, loocv)

No Format
cd ~/biocaddie
scripts/new/mkeval-lucene.sh <model> <topics> <subset> <col>
scripts/new/mkeval-lucene.sh <model> <topics> <subset> <col> <year>

Eg: scripts/new/mkeval-lucene.sh dir short test biocaddie
scripts/new/mkeval-lucene.sh tfidf short test biocaddie
scripts/new/mkeval-lucene.sh jm short test biocaddie
scripts/new/mkeval-lucene.sh bm25 short test biocaddie
scripts/new/mkeval-lucene.sh rocchio short test biocaddie

scripts/new/mkeval-lucene.sh dir-snowball short test biocaddie
scripts/new/mkeval-lucene.sh tfidf-snowball short test biocaddie
scripts/new/mkeval-lucene.sh jm-snowball short test biocaddie
scripts/new/mkeval-lucene.sh bm25-snowball short test biocaddie
scripts/new/mkeval-lucene.sh rocchio-snowball short test biocaddie

c) Compare models

compare.R prompts for the run method of the two models being compared. Enter one of:

0 - both the from and to models come from Indri runs

1 - both the from and to models come from Lucene runs

2 - the from model comes from an Indri run, the to model from a Lucene run

3 - the from model comes from a Lucene run, the to model from an Indri run

No Format
cd ~/biocaddie
Rscript scripts/new/compare.R <subset> <from> <to> <topics> <col>
Rscript scripts/new/compare.R <subset> <from> <to> <topics> <col> <year>

Eg: Rscript scripts/new/compare.R test tfidf dir short biocaddie

Rscript scripts/new/compare.R test tfidf-snowball dir-snowball short biocaddie
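
When many comparisons are scripted, it can help to derive the run-method code from the engines instead of remembering the numbers. The sketch below maps the engine of the from and to models to the codes 0 to 3 listed above. The function name run_method_code is hypothetical, and the piping shown in the final comment is an assumption: it only works if compare.R reads its answer from standard input, which this page does not confirm.

```shell
#!/usr/bin/env bash
# Hypothetical helper: translate the engines of the <from> and <to> models
# into the run-method code that compare.R asks for.
# $1 = engine of <from>, $2 = engine of <to>; each is "indri" or "lucene".
run_method_code() {
  case "$1:$2" in
    indri:indri)   echo 0 ;;
    lucene:lucene) echo 1 ;;
    indri:lucene)  echo 2 ;;
    lucene:indri)  echo 3 ;;
    *) echo "unknown engines: $1 $2" >&2; return 1 ;;
  esac
}

# Assumed usage (requires compare.R to read the answer from stdin):
# run_method_code lucene lucene | Rscript scripts/new/compare.R test tfidf dir short biocaddie
```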

4. Results

...

Using biocaddie_all indexes.

Model	MAP	NDCG	P@20	NDCG@20	P@100	NDCG@100	Notes	Date
classic tfidf	0.3282	0.5824	0.6867	0.5478	0.5013	0.5018	No parameters	07/05/17
BM25	0.3543	0.6105+	0.7467+	0.5917+	0.506	0.5186	Sweep b, k1	07/05/17
QL (JM)	0.3382	0.6022	0.7233	0.571	0.5	0.4996	Sweep lambda	07/05/17
QL (Dir)	0.3675 (p-value=0.0548)	0.6163+ (p-value=0.0502)	0.6567	0.5664	0.5213	0.522	Sweep mu	07/05/17
Rocchio	0.4044+	0.6417 (p-value=0.0533)	0.6967	0.5403	0.492	0.4912	Sweep b, k1, fbTerms, fbDocs, fbOrigWeight	07/05/17

Using biocaddie_all.snowball indexes

Model	MAP	NDCG	P@20	NDCG@20	P@100	NDCG@100	Notes	Date
classic tfidf (tfidf-snowball)	0.3375	0.5944	0.6667	0.5256	0.4987	0.5002	No parameters	07/06/17
BM25 (bm25-snowball)	0.3764+	0.6239+	0.73+	0.6006+	0.5413+	0.539+	Sweep b, k1	07/06/17
QL (JM) (jm-snowball)	0.3448	0.6058	0.67	0.5813+	0.4987	0.5289+	Sweep lambda	07/06/17
QL (Dir) (dir-snowball)	0.3776+	0.6315+	0.7033	0.6006+	0.5307	0.5365+	Sweep mu	07/06/17
Rocchio (rocchio-snowball)	0.3959	0.6052	0.7267	0.598+	0.5453	0.525	Sweep b, k1, fbTerms, fbDocs, fbOrigWeight	07/06/17

Difference between unstemmed and stemmed indexes

Model	MAP	NDCG	P@20	NDCG@20	P@100	NDCG@100	Notes	Date
classic tfidf	0.3282	0.5824	0.6867	0.5478	0.5013	0.5018	No parameters	07/10/17
classic tfidf (tfidf-snowball)	0.3375	0.5944	0.6667	0.5256	0.4987	0.5002	No parameters	07/10/17
BM25	0.3543	0.6105	0.7467	0.5917	0.506	0.5186	Sweep b, k1	07/10/17
BM25 (bm25-snowball)	0.3764+	0.6239	0.73	0.6006	0.5413+	0.539+	Sweep b, k1	07/10/17
QL (JM)	0.3382	0.6022	0.7233	0.571	0.5	0.4996	Sweep lambda	07/10/17
QL (JM) (jm-snowball)	0.3448	0.6058	0.67	0.5813	0.4987	0.5289+	Sweep lambda	07/10/17
QL (Dir)	0.3675	0.6163	0.6567	0.5664	0.5213	0.522	Sweep mu	07/10/17
QL (Dir) (dir-snowball)	0.3776	0.6315 (p-value=0.0534)	0.7033+	0.6006+	0.5307	0.5365	Sweep mu	07/10/17
Rocchio	0.4044	0.6417	0.6967	0.5403	0.492	0.4912	Sweep b, k1, fbTerms, fbDocs, fbOrigWeight	07/11/17
Rocchio (rocchio-snowball)	0.3959	0.6052-	0.7267	0.598+	0.5453+	0.525	Sweep b, k1, fbTerms, fbDocs, fbOrigWeight	07/11/17

Verification

Using biocaddie_all indexes:

No Format

thphan@biocaddie-dev:/data/thphan/biocaddie$ Rscript scripts/new/compare.R test tfidf dir short biocaddie
Please enter run methods for comparison:
        0: both are Indri
        1: both are Lucene
        2: from is Indri, to is Lucene
        3: from is Lucene, to is Indri
1
[1] "map 0.3282 0.3675 p= 0.0548"
[1] "ndcg 0.5824 0.6163 p= 0.0502"
[1] "P_20 0.6867 0.6567 p= 0.9461"
[1] "ndcg_cut_20 0.5478 0.5664 p= 0.186"
[1] "P_100 0.5013 0.5213 p= 0.2168"
[1] "ndcg_cut_100 0.5018 0.522 p= 0.1401"


thphan@biocaddie-dev:/data/thphan/biocaddie$ Rscript scripts/new/compare.R test tfidf jm short biocaddie
Please enter run methods for comparison:
        0: both are Indri
        1: both are Lucene
        2: from is Indri, to is Lucene
        3: from is Lucene, to is Indri
1
[1] "map 0.3282 0.3382 p= 0.1719"
[1] "ndcg 0.5824 0.6022 p= 0.0932"
[1] "P_20 0.6867 0.7233 p= 0.0831"
[1] "ndcg_cut_20 0.5478 0.571 p= 0.145"
[1] "P_100 0.5013 0.5 p= 0.5301"
[1] "ndcg_cut_100 0.5018 0.4996 p= 0.5552"


thphan@biocaddie-dev:/data/thphan/biocaddie$ Rscript scripts/new/compare.R test tfidf bm25 short biocaddie
Please enter run methods for comparison:
        0: both are Indri
        1: both are Lucene
        2: from is Indri, to is Lucene
        3: from is Lucene, to is Indri
1
[1] "map 0.3282 0.3543 p= 0.0846"
[1] "ndcg 0.5824 0.6105 p= 0.0148"
[1] "P_20 0.6867 0.7467 p= 0.0491"
[1] "ndcg_cut_20 0.5478 0.5917 p= 0.0496"
[1] "P_100 0.5013 0.506 p= 0.428"
[1] "ndcg_cut_100 0.5018 0.5186 p= 0.2195"

thphan@biocaddie-dev:/data/thphan/biocaddie$ Rscript scripts/new/compare.R test tfidf rocchio short biocaddie
Please enter run methods for comparison:
        0: both are Indri
        1: both are Lucene
        2: from is Indri, to is Lucene
        3: from is Lucene, to is Indri
1
[1] "map 0.3282 0.4044 p= 0.0188"
[1] "ndcg 0.5824 0.6417 p= 0.0533"
[1] "P_20 0.6867 0.6967 p= 0.3785"
[1] "ndcg_cut_20 0.5478 0.5403 p= 0.6276"
[1] "P_100 0.5013 0.492 p= 0.6184"
[1] "ndcg_cut_100 0.5018 0.4912 p= 0.6071"
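
The + and - suffixes in the result tables appear to follow the p-values that compare.R prints. As a rough sketch, the function below reads compare.R output and appends a marker to the second score. It assumes + means p < 0.05 and - means p > 0.95; that threshold is not stated on this page, so treat it as a guess. The function name mark_significant is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: annotate compare.R output lines with +/- markers.
# Reads lines such as: [1] "map 0.3282 0.4044 p= 0.0188"
# $1 = assumed significance level (default 0.05)
mark_significant() {
  awk -v alpha="${1:-0.05}" '
    /^\[1\] "/ {
      gsub(/"/, "")   # strip the quotes R prints; awk re-splits the fields
      # fields are now: [1] metric from to p= pvalue
      marker = ""
      if ($6 < alpha)          marker = "+"
      else if ($6 > 1 - alpha) marker = "-"
      printf "%s %s %s%s p=%s\n", $2, $3, $4, marker, $6
    }'
}

# Example with one line copied from the transcript above.
printf '%s\n' '[1] "map 0.3282 0.4044 p= 0.0188"' | mark_significant
```

Lines that do not start with [1], such as the shell prompt and the run-method menu, are ignored, so a whole transcript can be piped through the function.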

Using biocaddie_all.snowball indexes:

No Format

thphan@biocaddie-dev:/data/thphan/biocaddie$ Rscript scripts/new/compare.R test tfidf-snowball dir-snowball short biocaddie
Please enter run methods for comparison:
        0: both are Indri
        1: both are Lucene
        2: from is Indri, to is Lucene
        3: from is Lucene, to is Indri
1
[1] "map 0.3375 0.3776 p= 0.0387"
[1] "ndcg 0.5944 0.6315 p= 0.0072"
[1] "P_20 0.6667 0.7033 p= 0.1042"
[1] "ndcg_cut_20 0.5256 0.6006 p= 0.0046"
[1] "P_100 0.4987 0.5307 p= 0.0652"
[1] "ndcg_cut_100 0.5002 0.5365 p= 0.0207"


thphan@biocaddie-dev:/data/thphan/biocaddie$ Rscript scripts/new/compare.R test tfidf-snowball jm-snowball short biocaddie
Please enter run methods for comparison:
        0: both are Indri
        1: both are Lucene
        2: from is Indri, to is Lucene
        3: from is Lucene, to is Indri
1
[1] "map 0.3375 0.3448 p= 0.2782"
[1] "ndcg 0.5944 0.6058 p= 0.2069"
[1] "P_20 0.6667 0.67 p= 0.475"
[1] "ndcg_cut_20 0.5256 0.5813 p= 0.0161"
[1] "P_100 0.4987 0.4987 p= 0.5"
[1] "ndcg_cut_100 0.5002 0.5289 p= 0.0117"

thphan@biocaddie-dev:/data/thphan/biocaddie$ Rscript scripts/new/compare.R test tfidf-snowball bm25-snowball short biocaddie
Please enter run methods for comparison:
        0: both are Indri
        1: both are Lucene
        2: from is Indri, to is Lucene
        3: from is Lucene, to is Indri
1
[1] "map 0.3375 0.3764 p= 0.0284"
[1] "ndcg 0.5944 0.6239 p= 0.011"
[1] "P_20 0.6667 0.73 p= 0.0331"
[1] "ndcg_cut_20 0.5256 0.6006 p= 0.0045"
[1] "P_100 0.4987 0.5413 p= 0.0326"
[1] "ndcg_cut_100 0.5002 0.539 p= 0.0149"


thphan@biocaddie-dev:/data/thphan/biocaddie$ Rscript scripts/new/compare.R test tfidf rocchio-snowball short biocaddie
Please enter run methods for comparison:
        0: both are Indri
        1: both are Lucene
        2: from is Indri, to is Lucene
        3: from is Lucene, to is Indri
1
[1] "map 0.3282 0.3959 p= 0.0427"
[1] "ndcg 0.5824 0.6052 p= 0.2869"
[1] "P_20 0.6867 0.7267 p= 0.1189"
[1] "ndcg_cut_20 0.5478 0.598 p= 0.0424"
[1] "P_100 0.5013 0.5453 p= 0.1152"
[1] "ndcg_cut_100 0.5018 0.525 p= 0.2733"

Compare results between unstemmed and stemmed indexes:

No Format

thphan@biocaddie-dev:/data/thphan/biocaddie$ Rscript scripts/new/compare.R test tfidf tfidf-snowball short biocaddie
Please enter run methods for comparison:
        0: both are Indri
        1: both are Lucene
        2: from is Indri, to is Lucene
        3: from is Lucene, to is Indri
1
[1] "map 0.3282 0.3375 p= 0.1463"
[1] "ndcg 0.5824 0.5944 p= 0.1454"
[1] "P_20 0.6867 0.6667 p= 0.808"
[1] "ndcg_cut_20 0.5478 0.5256 p= 0.8715"
[1] "P_100 0.5013 0.4987 p= 0.5819"
[1] "ndcg_cut_100 0.5018 0.5002 p= 0.5652"

thphan@biocaddie-dev:/data/thphan/biocaddie$ Rscript scripts/new/compare.R test dir dir-snowball short biocaddie
Please enter run methods for comparison:
        0: both are Indri
        1: both are Lucene
        2: from is Indri, to is Lucene
        3: from is Lucene, to is Indri
1
[1] "map 0.3675 0.3776 p= 0.0842"
[1] "ndcg 0.6163 0.6315 p= 0.0534"
[1] "P_20 0.6567 0.7033 p= 0.0011"
[1] "ndcg_cut_20 0.5664 0.6006 p= 0.0222"
[1] "P_100 0.5213 0.5307 p= 0.1942"
[1] "ndcg_cut_100 0.522 0.5365 p= 0.0645"

thphan@biocaddie-dev:/data/thphan/biocaddie$ Rscript scripts/new/compare.R test jm jm-snowball short biocaddie
Please enter run methods for comparison:
        0: both are Indri
        1: both are Lucene
        2: from is Indri, to is Lucene
        3: from is Lucene, to is Indri
1
[1] "map 0.3382 0.3448 p= 0.2603"
[1] "ndcg 0.6022 0.6058 p= 0.3358"
[1] "P_20 0.7233 0.67 p= 0.8885"
[1] "ndcg_cut_20 0.571 0.5813 p= 0.2551"
[1] "P_100 0.5 0.4987 p= 0.55"
[1] "ndcg_cut_100 0.4996 0.5289 p= 0.0026"

thphan@biocaddie-dev:/data/thphan/biocaddie$ Rscript scripts/new/compare.R test bm25 bm25-snowball short biocaddie
Please enter run methods for comparison:
        0: both are Indri
        1: both are Lucene
        2: from is Indri, to is Lucene
        3: from is Lucene, to is Indri
1
[1] "map 0.3543 0.3764 p= 0.0317"
[1] "ndcg 0.6105 0.6239 p= 0.0945"
[1] "P_20 0.7467 0.73 p= 0.8548"
[1] "ndcg_cut_20 0.5917 0.6006 p= 0.2775"
[1] "P_100 0.506 0.5413 p= 0.0209"
[1] "ndcg_cut_100 0.5186 0.539 p= 0.0441"


thphan@biocaddie-dev:/data/thphan/biocaddie$ Rscript scripts/new/compare.R test rocchio rocchio-snowball short biocaddie
Please enter run methods for comparison:
        0: both are Indri
        1: both are Lucene
        2: from is Indri, to is Lucene
        3: from is Lucene, to is Indri
1
[1] "map 0.4044 0.3959 p= 0.6841"
[1] "ndcg 0.6417 0.6052 p= 0.9667"
[1] "P_20 0.6967 0.7267 p= 0.2037"
[1] "ndcg_cut_20 0.5403 0.598 p= 0.0465"
[1] "P_100 0.492 0.5453 p= 0.0035"
[1] "ndcg_cut_100 0.4912 0.525 p= 0.0625"
