...

We create pubmeddata.json with the script ~/biocaddie/scripts/xml2json-pubmed.sh. For each document, the script extracts the pmcid value and uses it for both the "_id" and "pmcid" values, uses the filename for the "name" value, and uses the document text (with XML tags, special characters, and newlines stripped) for the "text" value.
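The mapping described above can be sketched in Python. This is an illustration only, not the contents of xml2json-pubmed.sh: the function names pmc_record and to_bulk_lines are hypothetical, the location of the PMC id (an article-id element with pub-id-type="pmc") is an assumption about the input XML, and the set of characters kept by the regular expression is a guess at what "special characters" means here. The two-line action/source layout is what the Elasticsearch _bulk endpoint expects.

```python
import json
import re
import xml.etree.ElementTree as ET


def pmc_record(xml_string, filename):
    """Build the (action, source) pair for one article (hypothetical helper)."""
    root = ET.fromstring(xml_string)
    # Assumption: the PMC id lives in <article-id pub-id-type="pmc">.
    pmcid = None
    for node in root.iter("article-id"):
        if node.get("pub-id-type") == "pmc":
            pmcid = (node.text or "").strip()
            break
    if not pmcid:
        raise ValueError("no pmc article-id found in " + filename)
    # itertext() drops the tags and keeps only the character data.
    text = " ".join(root.itertext())
    # Assumption: keep letters, digits, and basic punctuation; everything
    # else (including newlines) becomes a space.
    text = re.sub(r"[^A-Za-z0-9.,;:()\- ]+", " ", text)
    text = re.sub(r"\s+", " ", text).strip()
    action = {"index": {"_id": pmcid}}
    source = {"pmcid": pmcid, "name": filename, "text": text}
    return action, source


def to_bulk_lines(records):
    """Serialize (action, source) pairs as newline-delimited JSON."""
    lines = []
    for action, source in records:
        lines.append(json.dumps(action))
        lines.append(json.dumps(source))
    # The bulk endpoint requires the body to end with a newline.
    return "\n".join(lines) + "\n"
```

Because the index and type are already given in the request URL, the action line in this sketch carries only the "_id".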

3. Index the PubMed test dataset in Elasticsearch using pubmeddata.json.

Run the bulk indexing request: curl -H 'Content-Type: application/json' -XPOST 'localhost:9200/pubmed/dataset/_bulk?pretty' --data-binary "@/data/thphan/pubmed/json_test/pubmeddata.json"
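A bulk request can return HTTP 200 even when individual documents are rejected, so it may be worth checking the response body. The following is a minimal sketch, assuming the curl output has been saved as text; failed_items is a hypothetical helper, and it relies only on the "errors" flag and the per-item "error" and "status" fields of the bulk response.

```python
import json


def failed_items(bulk_response_text):
    """Return (_id, status, error) for every item Elasticsearch rejected."""
    response = json.loads(bulk_response_text)
    failures = []
    # "errors" is false when every action in the request succeeded.
    if not response.get("errors"):
        return failures
    for item in response.get("items", []):
        # Each item has a single key naming the action, e.g. "index".
        for result in item.values():
            if "error" in result:
                failures.append(
                    (result.get("_id"), result.get("status"), result["error"])
                )
    return failures
```

An empty list means every document in the bulk file was accepted.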

...