  1. National Data Service
  2. NDS-958

Explore edge cases of example Rocchio implementation


    • Type: Task
    • Resolution: Won't Do
    • Priority: Normal

      Currently, we are checking for very basic attributes during validation:

      • index and query must be explicitly specified
      • type and field must have values; defaults are assumed if they are not passed
      • fbDocs and fbTerms must be positive integers
      • alpha and beta must be positive reals between 0 and 1
      • k1 and b must be positive reals
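
      The checks listed above could be sketched roughly as follows. This is a minimal illustration, not the code in Rocchio.java: the class name RocchioParamCheck, the method signatures, and the reading of "between 0 and 1" as 0 < v <= 1 are all assumptions.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper; names are illustrative and not taken from Rocchio.java. */
final class RocchioParamCheck {

    private RocchioParamCheck() {}

    /** Returns the given value, or the fallback when the value was not passed. */
    static String orDefault(String value, String fallback) {
        return isBlank(value) ? fallback : value;
    }

    /** Returns a list of validation errors; an empty list means every check passed. */
    static List<String> validate(String index, String query,
                                 int fbDocs, int fbTerms,
                                 double alpha, double beta,
                                 double k1, double b) {
        List<String> errors = new ArrayList<>();
        if (isBlank(index)) errors.add("index must be specified");
        if (isBlank(query)) errors.add("query must be specified");
        if (fbDocs < 1) errors.add("fbDocs must be a positive integer");
        if (fbTerms < 1) errors.add("fbTerms must be a positive integer");
        if (!inUnitInterval(alpha)) errors.add("alpha must be in (0, 1]");
        if (!inUnitInterval(beta)) errors.add("beta must be in (0, 1]");
        if (!isPositive(k1)) errors.add("k1 must be a positive real");
        if (!isPositive(b)) errors.add("b must be a positive real");
        return errors;
    }

    private static boolean isBlank(String s) {
        return s == null || s.trim().isEmpty();
    }

    // Assumption: 1.0 is allowed and 0.0 is not. NaN fails because NaN > 0.0 is false.
    private static boolean inUnitInterval(double v) {
        return v > 0.0 && v <= 1.0;
    }

    // Rejects zero, negatives, NaN, and infinity.
    private static boolean isPositive(double v) {
        return v > 0.0 && !Double.isInfinite(v);
    }
}
```

      Collecting every error instead of failing on the first one lets a caller report all invalid parameters in a single response.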

      We also check that the target index has been instantiated properly to run Rocchio:

      1. index exists and contains the expected type
      2. index.type contains a mapping (either under "_all" or under the target field) that enables term vectors by setting store=true
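
      Check 2 could be sketched as a pure function over an already-parsed mapping. Everything here is an assumption made for illustration: the class name TermVectorMappingCheck, the representation of the mapping as nested maps, and the layout in which "_all" and "properties" sit directly under the type mapping. Retrieving the mapping from the cluster is out of scope.

```java
import java.util.Map;

/** Hypothetical helper; the mapping layout it expects is an assumption. */
final class TermVectorMappingCheck {

    private TermVectorMappingCheck() {}

    /**
     * Returns true if either the "_all" entry or the entry for the target field
     * (looked up under "properties") has store set to true.
     */
    static boolean storesTermVectors(Map<String, Object> typeMapping, String field) {
        if (isStored(child(typeMapping, "_all"))) {
            return true;
        }
        return isStored(child(child(typeMapping, "properties"), field));
    }

    // Returns the nested map under the key, or null if it is absent or not a map.
    @SuppressWarnings("unchecked")
    private static Map<String, Object> child(Map<String, Object> parent, String key) {
        if (parent == null) {
            return null;
        }
        Object value = parent.get(key);
        return (value instanceof Map) ? (Map<String, Object>) value : null;
    }

    // Accepts both Boolean.TRUE and the string "true"; anything else counts as false.
    private static boolean isStored(Map<String, Object> fieldMapping) {
        if (fieldMapping == null) {
            return false;
        }
        return Boolean.parseBoolean(String.valueOf(fieldMapping.get("store")));
    }
}
```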

      Open questions:

      • Must alpha = 1 - beta, or is that just a convention I've seen elsewhere? (Right now I make no assumption that the two values are related, but enforcing this would remove the need to specify beta explicitly.)
      • Is there a minimum fbTerms/fbDocs (perhaps 10?) under which the values would be nonsensical? (currently just checking that they are both >= 1)
      • Do k1 and b have upper/lower bounds? (these are easy to adjust in Rocchio.java)
      • Can any of these values ever be zero or negative? (my assumption is currently no)
      • Are there other things we should verify on the index settings? (e.g., check that the target index has documents added?)
      • Is check #2 above sufficient? Perhaps I should actually retrieve term vectors / field stats instead, to verify that they are accessible?
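
      As background for the alpha/beta question, the textbook Rocchio update treats the weights as independent coefficients. Here q_0 is the original query vector, D_r the set of feedback (relevant) documents, and D_nr the set of non-relevant documents. Whether Rocchio.java follows this form exactly is not stated in this ticket.

```latex
\vec{q}_m = \alpha\,\vec{q}_0
          + \beta\,\frac{1}{|D_r|}\sum_{\vec{d}_j \in D_r}\vec{d}_j
          - \gamma\,\frac{1}{|D_{nr}|}\sum_{\vec{d}_j \in D_{nr}}\vec{d}_j
```

      With no negative feedback (gamma = 0), constraining beta = 1 - alpha gives the interpolated variant below, which is a normalization convention and not something the general formula requires.

```latex
\vec{q}_m = \alpha\,\vec{q}_0 + (1 - \alpha)\,\frac{1}{|D_r|}\sum_{\vec{d}_j \in D_r}\vec{d}_j
```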

      This ticket is complete when we have discussed and explored the edge cases described above.

              Assignee: Unassigned
              Reporter: Sara Lambert (lambert8)