...

The following plots illustrate the effect of varying the Dirichlet mu parameter (a) and the fbOrigWeight parameter for RM3 (b), PubMed RM3 (c), and Wikipedia RM3 (d) on NDCG@1000 and NDCG@20. The zero line is the per-query cross-validated QL (Dirichlet) score for each topic. For the RM3 models, all other parameters are fixed at their cross-validated values (i.e., only fbOrigWeight is changed). The x-axis shows the BioCADDIE topics; the y-axis is the difference in NDCG from the cross-validated QL (Dirichlet) baseline. Each box plot shows the variation in scores as the parameter changes, with one point per parameter value (blue for lower, red for higher values). The green dots mark the cross-validated fbOrigWeight values for the RM3 models.

...

Figure panels: (a) Dirichlet (mu); (b) RM3 (fbOrigWeight); (c) PubMed RM3 (fbOrigWeight); (d) Wikipedia RM3 (fbOrigWeight)


In each of these cases, in our current runs the fbOrigWeight that controls the mixing of the original and feedback query is relatively fixed for each held-out query, since it is learned, in effect on average, from the training queries. In the next section, we explore whether we can reliably predict when to apply one model or another, or predict the fbOrigWeight mixing parameter, via query performance prediction methods.

...

A central goal of query performance prediction, or query difficulty estimation, is to identify features, both pre- and post-retrieval, that can be used to predict the performance of a query. A common approach is to predict some effectiveness metric (e.g., MAP or NDCG) for use in model selection, or to predict a parameter, such as the RM3 fbOrigWeight. In the past, performance predictors were evaluated based on simple correlation with the target metric. Hauff et al. (2009) argue that linear correlation coefficients are misleading and overstate performance; they instead use RMSE (via linear regression) to compare individual predictors and penalized regression for combinations of predictors.

For BioCADDIE, we are focused on expansion models and therefore are primarily concerned with adaptive feedback. Lv and Zhai's (2009) approach seems to be the most applicable – estimating the feedback mixing parameter per-query.  This will require the following:

  • A framework for implementing baseline and custom predictors (ir-utils or otherwise)
    • Preliminary implementation in edu/gslis/biocaddie/qpp/predictors
  • Ability to generate a set of pre- and post-retrieval predictor values for each query for multiple collections. This will output a matrix of queries by predictors.
    • edu.gslis.biocaddie.qpp.RunQPP produces predictor matrix.
  • Calculate correlation (Pearson and Spearman) and RMSE between each predictor and a given metric or parameter (e.g., RM3 lambda); a minimal sketch is given after this list.
  • Ability to select features (manually or automatically) and to construct a predictive model (i.e., regression) using one or more predictors.
    • Investigating penalized regression via R glmnet.
  • Evaluate the predictive model via cross validation.
    • Implemented preliminary draft of CrossValidateQPP class.
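
As a minimal sketch of the correlation/RMSE step, the following compares one predictor column against per-query target values (e.g., the cross-validated fbOrigWeight) using Apache Commons Math. The class name and toy values are illustrative assumptions; in practice the arrays would be read from the RunQPP predictor matrix.

    import org.apache.commons.math3.stat.correlation.PearsonsCorrelation;
    import org.apache.commons.math3.stat.correlation.SpearmansCorrelation;
    import org.apache.commons.math3.stat.regression.SimpleRegression;

    // Sketch: evaluate a single predictor against a per-query target value.
    public class PredictorEval {

        public static void main(String[] args) {
            // Hypothetical values: predictor[i] and target[i] for query i.
            double[] predictor = {0.12, 0.48, 0.33, 0.75, 0.21, 0.60};
            double[] target    = {0.30, 0.70, 0.50, 0.90, 0.40, 0.80};

            // Rank and linear correlation between predictor and target.
            double pearson  = new PearsonsCorrelation().correlation(predictor, target);
            double spearman = new SpearmansCorrelation().correlation(predictor, target);

            // RMSE of a one-predictor linear fit (Hauff et al., 2009 style).
            SimpleRegression reg = new SimpleRegression();
            for (int i = 0; i < predictor.length; i++)
                reg.addData(predictor[i], target[i]);
            double sse = 0;
            for (int i = 0; i < predictor.length; i++) {
                double err = target[i] - reg.predict(predictor[i]);
                sse += err * err;
            }
            double rmse = Math.sqrt(sse / predictor.length);

            System.out.printf("Pearson=%.3f Spearman=%.3f RMSE=%.3f%n",
                    pearson, spearman, rmse);
        }
    }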

Adaptive feedback

The approach explored by Lv and Zhai (2009) is to learn a model to predict the expansion mixing weight. They found six features to be predictive of the feedback weight in a linear model (listed below in order of significance). All of these are post-retrieval predictors, some with significant overhead (marked with *); a clarity-style computation is sketched after the list.

  • Topic model clarity (FBEnt_R3*): Relative entropy of the feedback document topic model with respect to the collection.
  • Exponentiated feedback clarity (FBEnt_R2*): Exponentiated relative entropy of the feedback documents with respect to the collection.
  • Divergence (QFBDiv_A): KL-divergence between the query and the feedback documents.
  • Feedback radius (FBRadius): Average divergence between each feedback document and the centroid of the feedback documents.
  • Query clarity (QEnt_R1): Relative entropy of the query with respect to the collection.
  • Log query clarity (QEnt_R3): Log of the relative entropy of the query with respect to the collection.
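
As an illustration of the clarity-style predictors above (e.g., QEnt_R1), the following is a minimal sketch of a relative-entropy computation between a query language model and the collection language model. The class name, map-based models, and the smoothing floor are assumptions for illustration; in our framework the models would come from the ir-utils language-model classes.

    import java.util.Map;

    // Sketch: clarity-style predictor as KL(P(w|Q) || P(w|C)).
    public class ClarityPredictor {

        public static double clarity(Map<String, Double> queryModel,
                                     Map<String, Double> collectionModel) {
            double kl = 0;
            for (Map.Entry<String, Double> e : queryModel.entrySet()) {
                double pq = e.getValue();
                // Assume the collection model is smoothed; fall back to a
                // small floor probability for unseen terms (illustrative).
                double pc = collectionModel.getOrDefault(e.getKey(), 1e-10);
                if (pq > 0)
                    kl += pq * (Math.log(pq / pc) / Math.log(2));
            }
            return kl;
        }

        public static void main(String[] args) {
            // Toy models for two query terms.
            Map<String, Double> queryModel = Map.of("tumor", 0.6, "expression", 0.4);
            Map<String, Double> collectionModel = Map.of("tumor", 0.01, "expression", 0.02);
            System.out.println(clarity(queryModel, collectionModel));
        }
    }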

...



There are a variety of other predictors, such as those discussed in Carmel and Yom-Tov's (2010) monograph. These include:

...

  • Clarity (Cronen-Townsend, Zhou, & Croft, 2002)
  • Drift (Shtok, Kurland, & Carmel, 2009)
  • Deviation (Pérez-Iglesias & Araujo, 2010); a score-deviation sketch follows this list
  • Absolute/relative divergence (Lv & Zhai, 2009)
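
Both the drift and deviation predictors are based on the spread of the top-k retrieval scores. The following is a minimal sketch under that assumption; the class name, the normalization by a collection-level score, and the toy scores are illustrative and do not reproduce the exact published formulations.

    import org.apache.commons.math3.stat.descriptive.moment.StandardDeviation;

    // Sketch: score-spread predictor over the top-k retrieval scores.
    public class ScoreDeviationPredictor {

        public static double deviation(double[] topKScores, double collectionScore) {
            double sd = new StandardDeviation().evaluate(topKScores);
            // Hypothetical NQC-style normalization by a collection-level score.
            return collectionScore != 0 ? sd / Math.abs(collectionScore) : sd;
        }

        public static void main(String[] args) {
            double[] scores = {-4.2, -4.5, -4.9, -5.3, -5.8};  // toy QL scores
            System.out.println(deviation(scores, -7.0));
        }
    }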

Preliminary results

NDCG

The following predictors of fbOrigWeight were found to be significant in an (overfitted) analysis of the test data:

  • deviation (R^2=0.49, p<0.001)
  • varSCQ (R^2=0.37, p<0.01)
  • drift (R^2=0.37, p<0.01)

The following linear models were found to have the best fit for the RM3 fbOrigWeight:

  • fbOrigWeight^(1/2) ~ drift + varSCQ (R^2=0.6269, p<0.0001)
  • fbOrigWeight^(1/2) ~ drift + deviation (R^2=0.5801, p<0.001)
  • fbOrigWeight^(1/2) ~ varSCQ + deviation (R^2=0.6496, p<0.001)

Using the "varSCQ + deviation" model to estimate fbOrigWeight for the held-out query using leave-one-query-out crossvalidation, we see NDCG=0.59 compared to NDCG=0.56. This is a significant difference with 1-tailed t-test and p < 0.01

NDCG@20

NDCG@20 is more of a challenge. Only one standard predictor, deviation, was found to be significant:

  • deviation (R^2=0.31, p<0.01)

With two predictors, again deviation and varSCQ provide the best fit:

  • fbOrigWeight ~ varSCQ + deviation (R^2=0.43, p<0.01)

Using the two-predictor model to estimate fbOrigWeight, we see NDCG@20=0.53 versus 0.52.  There is no significant difference between these two.

References

Carmel, D., & Yom-Tov, E. (2010). Estimating the Query Difficulty for Information Retrieval. Synthesis Lectures on Information Concepts, Retrieval, and Services, 2(1), 1–89. http://doi.org/10.2200/S00235ED1V01Y201004ICR015

Cronen-Townsend, S., Zhou, Y., & Croft, W. B. (2002). Predicting Query Performance. In Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 299–306). New York, NY, USA: ACM. http://doi.org/10.1145/564376.564429

Hauff, C., Azzopardi, L., & Hiemstra, D. (2009). The Combination and Evaluation of Query Performance Prediction Methods. In Proceedings of the 31st European Conference on IR Research on Advances in Information Retrieval (ECIR 2009) (pp. 301–312). Berlin, Heidelberg: Springer-Verlag. http://doi.org/10.1007/978-3-642-00958-7_28

Lv, Y., & Zhai, C. (2009). Adaptive Relevance Feedback in Information Retrieval. In Proceedings of the 18th ACM Conference on Information and Knowledge Management (pp. 255–264). New York, NY, USA: ACM. http://doi.org/10.1145/1645953.1645988

Zhao, Y., Scholer, F., & Tsegay, Y. (2008). Effective Pre-retrieval Query Performance Prediction Using Similarity and Variability Evidence. In C. Macdonald, I. Ounis, V. Plachouras, I. Ruthven, & R. W. White (Eds.), Advances in Information Retrieval: 30th European Conference on IR Research, ECIR 2008, Glasgow, UK, March 30-April 3, 2008. Proceedings (pp. 52–64). Berlin, Heidelberg: Springer Berlin Heidelberg. http://doi.org/10.1007/978-3-540-78646-7_8