  1. National Data Service
  2. NDS-955

Run Lucene Rocchio baselines


    • Type: Task
    • Resolution: Fixed
    • Priority: Normal

      I've merged the pull request with the initial Rocchio implementation. This isn't final, but I'd like to get a first pass.

      You'll need to update your Maven dependencies. You should be able to run "mvn package -U". Make sure that you have the right version of ir-utils:

      ls -al ~/.m2/repository/edu/illinois/lis/ir-utils/0.2.0-SNAPSHOT/*

      You should see ir-utils-0.2.0-20170630.185648-5.jar. If not, delete the contents of the 0.2.0-SNAPSHOT directory and run mvn package again.
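      The check above can be sketched as a few shell functions. This is only a rough sketch: the path and jar name come from this comment, while the function names and the SNAPSHOT_DIR / EXPECTED_JAR variables are hypothetical and not part of the biocaddie repo.

```shell
#!/bin/sh
# Sketch: verify the ir-utils snapshot in the local Maven repository.
# Function and variable names are hypothetical; adjust to taste.

SNAPSHOT_DIR="${SNAPSHOT_DIR:-$HOME/.m2/repository/edu/illinois/lis/ir-utils/0.2.0-SNAPSHOT}"
EXPECTED_JAR="${EXPECTED_JAR:-ir-utils-0.2.0-20170630.185648-5.jar}"

# Return 0 if the jar named $2 exists in directory $1.
has_expected_jar() {
    [ -f "$1/$2" ]
}

# Delete the contents of directory $1, keeping the directory itself.
clear_snapshot_dir() {
    [ -d "$1" ] || return 0
    find "$1" -mindepth 1 -delete
}

# If the expected jar is missing, clear the snapshot directory and rebuild.
# Run this from the project root, since it calls "mvn package -U".
refresh_ir_utils() {
    if has_expected_jar "$SNAPSHOT_DIR" "$EXPECTED_JAR"; then
        echo "ok: $EXPECTED_JAR is present"
    else
        echo "missing: clearing $SNAPSHOT_DIR and rebuilding"
        clear_snapshot_dir "$SNAPSHOT_DIR"
        mvn package -U
    fi
}
```

      Sourcing the file only defines the functions; nothing is deleted or rebuilt until you call refresh_ir_utils.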

      I've also added the lucene/rocchio.sh runner to the biocaddie repo. Like RM3, this will create a lot of output files (>9000). I recommend that you run this on the NCSA server (biocaddie.ndslabs.org), since it has 32 cores. Because of disk space issues, you'll want to store your results in /data/thphan (I created a directory for you).
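      Given the number of output files and the disk space concern, a couple of quick checks before and after the run may help. This is a minimal sketch: the function names and the RESULTS_DIR variable are hypothetical, and it does not show how rocchio.sh is invoked, since its arguments are not described here.

```shell
#!/bin/sh
# Sketch: sanity checks around a run that writes many files.
# RESULTS_DIR defaults to the directory mentioned in this comment.

RESULTS_DIR="${RESULTS_DIR:-/data/thphan}"

# Print the number of regular files under directory $1.
count_output_files() {
    find "$1" -type f | wc -l | tr -d ' '
}

# Print the free space, in kilobytes, on the filesystem holding $1.
# "df -Pk" uses the POSIX output format, where column 4 is "Available".
free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}
```

      For example, run free_kb "$RESULTS_DIR" before starting and count_output_files "$RESULTS_DIR" once the run finishes; nproc reports the number of cores available on the machine.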

      Since this is a new implementation, there may be problems. You can either email me or assign the ticket to me if you get blocked.

       

       

              willis8 Craig Willis