National Data Service / NDS-929

Run baselines with OHSUMED test collection


    • Type: Task
    • Resolution: Fixed
    • Priority: Normal

      The goal of this task is to run our baselines against the OHSUMED test collection:

      http://trec.nist.gov/data/t9_filtering.html
       

      Basic steps:

      • Download the test collection and put the data in the shared data directory.
      • Create new build_index scripts in biocaddie/index for the collection.
      • Build the index and put the output in the shared directory (but also keep a copy on your VM for performance).
      • Convert the topics to Indri format and check the converted topics into biocaddie/queries.
      • Run the baselines (all non-feedback + RM3) and add the results to the Wiki.
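
The topic conversion step could be sketched roughly as follows. This is a minimal example, not the script used in the biocaddie repo. It assumes the OHSUMED topics use TREC-style `<top>` blocks with `<num>` and `<title>` lines, that the title is used as the query text, and that a plain `#combine` query is wanted. The function names `clean_terms` and `topics_to_indri` are hypothetical.

```python
import re
from xml.sax.saxutils import escape

# Assumed topic layout (verify against the downloaded file):
#   <top>
#   <num> Number: OHSU1
#   <title> some title text
#   <desc> Description:
#   some description text
#   </top>
TOP_RE = re.compile(r"<top>(.*?)</top>", re.DOTALL)
NUM_RE = re.compile(r"<num>\s*(?:Number:)?\s*(\S+)")
TITLE_RE = re.compile(r"<title>\s*(.*)")


def clean_terms(text):
    """Lowercase and keep only alphanumeric tokens.

    Punctuation is dropped because it can break Indri query parsing.
    """
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))


def topics_to_indri(raw):
    """Convert TREC-style topics into an Indri query parameter file (as a string)."""
    lines = ["<parameters>"]
    for block in TOP_RE.findall(raw):
        num = NUM_RE.search(block)
        title = TITLE_RE.search(block)
        if not num or not title:
            continue  # skip malformed topics instead of guessing
        terms = clean_terms(title.group(1))
        lines.append("  <query>")
        lines.append(f"    <number>{escape(num.group(1))}</number>")
        lines.append(f"    <text>#combine({terms})</text>")
        lines.append("  </query>")
    lines.append("</parameters>")
    return "\n".join(lines) + "\n"
```

Reading the topic file into a string and writing the result of `topics_to_indri` to a file under biocaddie/queries would give a parameter file that can be passed to IndriRunQuery along with the index path.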

      When done, create a PR with your changes to the biocaddie repo and assign this ticket to Craig for review.

       

      Note: The OHSUMED test collection was originally used for filtering, so the queries and qrels appear to be split into train/test. You'll want to combine these for ad-hoc evaluation. Also, the documents are in a non-standard format (probably an old PubMed format). You might spend some time checking whether TREC-formatted documents are available or whether someone has provided a script to convert the data.
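
Combining the train and test qrels for ad-hoc evaluation might look like the sketch below. It rests on an assumption that should be checked against the downloaded files: that each OHSUMED qrels line has three columns (topic, document ID, relevance). The code also accepts the standard four-column TREC layout. The function name `merge_qrels` is hypothetical, and the document IDs in the usage note are made up for illustration.

```python
def merge_qrels(*sources):
    """Merge several qrels sources into one list of 4-column TREC qrels lines.

    Each source is an iterable of lines. Lines may have three columns
    (qid docid rel, the assumed OHSUMED layout) or four columns
    (qid iteration docid rel, the standard trec_eval layout).
    If a (qid, docid) pair appears more than once, the highest
    relevance grade is kept.
    """
    merged = {}
    for lines in sources:
        for line in lines:
            parts = line.split()
            if len(parts) == 3:
                qid, docid, rel = parts
            elif len(parts) == 4:
                qid, _, docid, rel = parts
            else:
                continue  # skip blank or malformed lines
            key = (qid, docid)
            merged[key] = max(merged.get(key, 0), int(rel))
    # Emit in a stable order, with a constant 0 in the iteration column
    return [f"{qid} 0 {docid} {rel}" for (qid, docid), rel in sorted(merged.items())]
```

For example, `merge_qrels(open("train.qrels"), open("test.qrels"))` (file names are placeholders) returns lines that can be written to a single qrels file for trec_eval. Keeping the maximum grade for duplicate pairs is one possible choice; if the two splits never overlap, it has no effect.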

       

              thphan2 Thuong Phan
              willis8 Craig Willis
              Original Estimate: 1 day
              Remaining Estimate: 1 day
              Time Spent: Not Specified