Notes from an attempt to move from Indri to Lucene for evaluation.
Similarity
- The LMSimilarity classes implement basic query likelihood (QL), not KL divergence.
- Similarities are used at both index and query time.
  - At index time:
    - computeNorm: stores a per-document normalization value, later read back via getNormValues
  - At query time:
    - computeWeight is called once per query
    - getValueForNormalization is the query normalization, called once per query
    - score() method is called for each document, on the scorer returned by:
      - exactSimScorer
      - sloppySimScorer
- Document length is only accessible inside Similarity; our re-ranking approach cannot get it unless we explicitly store it as a field.
- For some reason, Lucene's JM (Jelinek-Mercer) smoothing is giving very different results from Indri's JM.
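
One way to start narrowing down the JM discrepancy is to compare the two per-term formulas on toy numbers. The sketch below is Python rather than Lucene code, and everything in it is an assumption to verify against the versions in use: textbook_jm is the standard JM query-likelihood form (assumed here to approximate Indri), lucene_style_jm is the form LMJelinekMercerSimilarity appears to use, and the statistics in docs are made up. It suggests that the raw scores differ by a per-term constant, -log(lambda * p_coll), so a difference in score values alone should not change rankings.

```python
import math

def textbook_jm(tf, doc_len, p_coll, lam):
    """Textbook JM query likelihood for one term (assumed Indri-like form)."""
    return math.log((1.0 - lam) * tf / doc_len + lam * p_coll)

def lucene_style_jm(tf, doc_len, p_coll, lam):
    """Per-term form that Lucene's JM similarity appears to use (assumption)."""
    return math.log(1.0 + ((1.0 - lam) * tf / doc_len) / (lam * p_coll))

def rank(docs, scorer, p_coll, lam):
    """Return document ids sorted by descending score for a single term."""
    scored = {d: scorer(tf, dl, p_coll, lam) for d, (tf, dl) in docs.items()}
    return sorted(scored, key=scored.get, reverse=True)

# Hypothetical toy statistics: doc id -> (term frequency, document length)
docs = {"d1": (3, 100), "d2": (1, 20), "d3": (5, 400), "d4": (0, 50)}
lam = 0.4        # weight on the collection model in both formulas
p_coll = 0.001   # collection probability of the term

# Difference between the two scores for each document
offsets = {
    d: lucene_style_jm(tf, dl, p_coll, lam) - textbook_jm(tf, dl, p_coll, lam)
    for d, (tf, dl) in docs.items()
}
expected_offset = -math.log(lam * p_coll)

textbook_ranking = rank(docs, textbook_jm, p_coll, lam)
lucene_ranking = rank(docs, lucene_style_jm, p_coll, lam)

print(offsets, expected_offset)
print(textbook_ranking, lucene_ranking)
```

If the rankings still differ with matched inputs, candidates worth checking (none confirmed by these notes) are: lambda having the opposite meaning in the two toolkits, document length being read from a lossy stored norm in Lucene, different tokenization, stemming or stopword handling, and different collection statistics.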