National Data Service / NDS-953

Create Lucene/ElasticSearch PubMed indexes


    • Type: Task
    • Resolution: Fixed
    • Priority: Normal
    • Sprint: NDS Sprint 28, NDS Sprint 29

      One of our final tasks will be to evaluate the Lucene/Rocchio implementation, using both BioCADDIE and PubMed for query expansion. Ultimately the BioCADDIE system will use ElasticSearch, but the evaluation needs to use Lucene. This means we need to create Lucene indexes for both BioCADDIE and PubMed.
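      For reference, Rocchio query expansion with pseudo-relevance feedback can be sketched as below. This is a generic textbook version (alpha times the query vector plus beta times the centroid of the top-ranked documents, with no negative-feedback term) that uses length-normalized term frequencies as document vectors. It is an illustrative assumption, not the ir-utils implementation; the function name and default parameter values are hypothetical.

```python
from collections import Counter


def rocchio_expand(query_terms, feedback_docs, alpha=1.0, beta=0.5, num_terms=10):
    """Hypothetical sketch of Rocchio expansion using pseudo-relevance feedback.

    query_terms: list of query tokens.
    feedback_docs: list of token lists for the top-ranked (assumed relevant) documents.
    Returns the `num_terms` highest-weighted (term, weight) pairs of the updated query.
    Document vectors here are length-normalized term frequencies (an assumption;
    real implementations often use tf-idf weights instead).
    """
    weights = Counter()
    # alpha * original query vector (term counts normalized by query length)
    for term in query_terms:
        weights[term] += alpha / len(query_terms)
    # beta * centroid of the feedback documents (no negative-feedback term)
    for doc in feedback_docs:
        for term, count in Counter(doc).items():
            weights[term] += beta * count / (len(doc) * len(feedback_docs))
    return weights.most_common(num_terms)
```

      The expanded query keeps the original terms (boosted by alpha) and adds the terms that are most frequent on average across the feedback documents.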

      There are two ways to create the Lucene indexes. We can use the LuceneBuildIndex command from the ir-utils library, which creates an index comparable to the Indri indexes we have used for evaluation. Alternatively, we can create the Lucene index through ElasticSearch itself, which is closer to what BioCADDIE will use in production.

      This task is to create two separate PubMed indexes:

      • First, using the LuceneBuildIndex command on the existing TREC-text formatted PubMed data.
      • Second, using the ElasticSearch API. In this case, the PubMed data will likely need to be converted to an ElasticSearch-friendly format (for example, JSON).
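      The conversion for the second index could look roughly like the sketch below, which turns TREC-text documents into the newline-delimited JSON body accepted by the ElasticSearch `_bulk` endpoint (an action line followed by a source line per document). It assumes each record has the form `<DOC><DOCNO>id</DOCNO> ... <TEXT>body</TEXT></DOC>`; the actual PubMed files may use additional tags, and the index name `pubmed` and field name `text` are hypothetical choices.

```python
import json
import re

# Assumed TREC-text layout: <DOC><DOCNO>id</DOCNO> ... <TEXT>body</TEXT></DOC>
DOC_RE = re.compile(r"<DOC>(.*?)</DOC>", re.DOTALL)
DOCNO_RE = re.compile(r"<DOCNO>\s*(.*?)\s*</DOCNO>", re.DOTALL)
TEXT_RE = re.compile(r"<TEXT>(.*?)</TEXT>", re.DOTALL)


def trec_to_bulk(trec_text, index_name="pubmed"):
    """Convert TREC-text documents into ElasticSearch _bulk NDJSON.

    `index_name` and the "text" field are hypothetical; adjust to the real mapping.
    """
    lines = []
    for doc in DOC_RE.findall(trec_text):
        docno = DOCNO_RE.search(doc)
        if docno is None:
            continue  # skip records without an identifier
        text = TEXT_RE.search(doc)
        body = text.group(1).strip() if text else ""
        # Action line (which index and document id), then the document source.
        lines.append(json.dumps({"index": {"_index": index_name, "_id": docno.group(1)}}))
        lines.append(json.dumps({"text": body}))
    # Every line, including the last one, must end with a newline for _bulk.
    return "".join(line + "\n" for line in lines)
```

      The output can be written to a file and posted to `_bulk` with the `application/x-ndjson` content type; for a collection the size of PubMed it would need to be sent in batches rather than as one request.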

      When this is done, we'll want to re-run the PubMed expansion using these indexes.

       

              willis8 Craig Willis
              Votes: 0
              Watchers: 1
