  1. National Data Service
  2. NDS-949

Run Lucene baselines on BioCADDIE data

    • Type: Task
    • Resolution: Fixed
    • Priority: Normal

      I've updated ir-utils.jar to support Lucene 6.6 and updated the biocaddie pom.xml to use the new dependencies. This will allow us to compare Indri's and Lucene's implementations of various baselines on the BioCADDIE data. I've also made some changes to the BioCADDIE repository, so you'll need to update your copy and rebuild:

      • git pull
      • mvn clean package -U

      I've added some baseline scripts to the lucene/ subdirectory for Lucene's Dirichlet, JM, BM25, and "classic" tfidf models.
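
      For reference, here is a minimal sketch of the per-term scoring behind the four models named above, in textbook form. It is not taken from ir-utils or from Lucene's source, and Lucene's own implementations may differ in details such as normalization and document-length encoding. The function names and the default parameter values (mu, lam, k1, b) are assumptions chosen for illustration.

```python
import math


def dirichlet_score(tf, doc_len, coll_prob, mu=2000.0):
    """Log probability of a term under Dirichlet-smoothed query likelihood.

    coll_prob is the term's probability in the whole collection.
    mu=2000 is an assumed, commonly used value.
    """
    return math.log((tf + mu * coll_prob) / (doc_len + mu))


def jm_score(tf, doc_len, coll_prob, lam=0.5):
    """Log probability of a term under Jelinek-Mercer smoothing.

    lam weights the collection model; 0.5 is an illustrative value only.
    """
    ml = tf / doc_len if doc_len > 0 else 0.0
    return math.log((1.0 - lam) * ml + lam * coll_prob)


def bm25_score(tf, doc_len, avg_doc_len, doc_freq, num_docs, k1=1.2, b=0.75):
    """BM25 contribution of one term (k1 and b are assumed defaults)."""
    idf = math.log(1.0 + (num_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    length_norm = k1 * (1.0 - b + b * doc_len / avg_doc_len)
    return idf * tf * (k1 + 1.0) / (tf + length_norm)


def tfidf_score(tf, doc_len, doc_freq, num_docs):
    """Simplified "classic" TF-IDF: sqrt(tf) * idf^2 / sqrt(doc_len)."""
    idf = 1.0 + math.log((num_docs + 1.0) / (doc_freq + 1.0))
    return math.sqrt(tf) * idf * idf / math.sqrt(doc_len)
```

      A document's score for a query is the sum of these per-term values over the query terms. The sketch is only meant to make clear what each baseline is measuring when you compare the Lucene runs against the Indri runs.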

      We've created a Lucene 6.6 index on biocaddie.ndslabs.org under /data/biocaddie/lucene/biocaddie_all.6.6.0/. For now, just copy this to the shared directory on SDSC.

      The main task is to run the usual baselines and comparison (run, mkeval, compare).

      Completion criteria:

      • Document your baseline results for Lucene Dirichlet, JM, BM25, and TFIDF in the wiki.

      As always, let me know if any of this is unclear!

       

              thphan2 Thuong Phan
              willis8 Craig Willis