  1. National Data Service
  2. NDS-951

Generalize run/mkeval/compare scripts

    • Type: Task
    • Resolution: Fixed
    • Priority: Normal

      The "run", "mkeval", and "compare" scripts currently take the following arguments:

      • Model: retrieval model (e.g., dir, rm3, jm, pubmed)
      • Topics: name of the topics file (e.g., short, orig, stopped)
      • Collection: name of the "collection" – for biocaddie this is "combined" vs. "test"

      These variables are used to standardize paths and filenames and allow us to do things like easily compare dir/short/test to rm3/short/test for a single collection.

      In these scripts, the index is hardcoded.

      For new test collections, such as OHSUMED, Genomics, and CDS, we've created subdirectories under the baselines directory and copied the baseline scripts, changing the index path and sometimes the topics file name/format. We've also copied the mkeval.sh and compare.R scripts for each collection.

      Let's consider how we can generalize this into a single set of scripts that works for all of the test collections. Ideally, there would be only one "dir.sh", "mkeval.sh", and "compare.R", each working for every collection.

       

      My motivation in asking for this is mostly the Lucene runs. We will need to compare the Lucene runs against the Indri runs across the different models, topics, and collections.

      Another motivation is to have fewer scripts to maintain, since much of the content of the scripts is the same.

      It will probably make sense to add another parameter to each script to differentiate biocaddie from ohsumed, genomics, and treccds. You might use the "col" variable for this and then come up with another name to distinguish combined from test within biocaddie. It will also likely make sense to externalize the index paths, so they aren't maintained in each file.
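
As a rough illustration of the two suggestions above (a "col" parameter plus externalized index paths), here is a minimal bash sketch of a shared helper that each script could source. Only the collection names come from this ticket. The function names, the "subset" parameter name, the INDEX_ROOT variable, and every path are hypothetical placeholders, not the layout actually used in the baselines directory.

```bash
#!/usr/bin/env bash
# Hypothetical shared helper, e.g. sourced by dir.sh and mkeval.sh.
# All paths below are placeholders (assumptions), not the real index locations.

# Root directory for all indexes; override via the environment if needed.
INDEX_ROOT="${INDEX_ROOT:-/path/to/indexes}"

# index_path COL [SUBSET]
# Prints the index path for a collection. SUBSET (combined/test) is only
# meaningful for biocaddie in this sketch. Returns 1 for unknown input.
index_path() {
  local col="$1" subset="${2:-}"
  case "$col/$subset" in
    biocaddie/combined) echo "$INDEX_ROOT/biocaddie_combined" ;;
    biocaddie/test)     echo "$INDEX_ROOT/biocaddie_test" ;;
    ohsumed/*)          echo "$INDEX_ROOT/ohsumed" ;;
    genomics/*)         echo "$INDEX_ROOT/genomics" ;;
    treccds/*)          echo "$INDEX_ROOT/treccds" ;;
    *) echo "unknown collection/subset: $col/$subset" >&2; return 1 ;;
  esac
}

# run_dir MODEL TOPICS COL [SUBSET]
# Prints a standardized output directory so runs can be compared by name.
run_dir() {
  local model="$1" topics="$2" col="$3" subset="${4:-}"
  if [ -n "$subset" ]; then
    echo "output/$col/$subset/$topics/$model"
  else
    echo "output/$col/$topics/$model"
  fi
}
```

With a helper like this, a single dir.sh could call `index_path "$col" "$subset"` instead of hardcoding the index, and `run_dir rm3 short biocaddie test` and `run_dir dir short biocaddie test` would name the two runs to hand to the compare script.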

      Of course, this might also not be worth the effort if it takes too much time, so feel free to push back.

       

       

              willis8 Craig Willis