...
We create pubmeddata.json using the script ~/biocaddie/scripts/xml2json-pubmed.sh. For each document, this script extracts the pmcid value and uses it for both the "_id" and "pmcid" values, uses the filename for the "name" value, and uses the document text (with XML tags stripped and special characters and newlines removed) for the "text" value.
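As a rough illustration of the mapping described above, here is a minimal Python sketch that builds the two bulk lines for one document. It is not the xml2json-pubmed.sh script itself. The function name `xml_to_bulk_lines`, the `<article-id pub-id-type="pmc">` lookup, the definition of "special characters" (anything other than letters, digits and spaces), and the assumption that each document is preceded by an `index` action line carrying the "_id" are all illustrative guesses, not taken from the script.

```python
import json
import re

# Assumption: the pmcid is stored in a JATS-style <article-id pub-id-type="pmc"> element.
PMCID_RE = re.compile(r'<article-id pub-id-type="pmc">([^<]+)</article-id>')
TAG_RE = re.compile(r"<[^>]+>")


def xml_to_bulk_lines(filename, xml):
    """Return the action line and source line for one document (hypothetical helper)."""
    match = PMCID_RE.search(xml)
    if match is None:
        raise ValueError("no pmcid found in " + filename)
    pmcid = match.group(1).strip()

    # Strip XML tags, then drop special characters and newlines.
    text = TAG_RE.sub(" ", xml)
    text = re.sub(r"[^A-Za-z0-9 ]+", " ", text)  # assumed definition of "special"
    text = re.sub(r"\s+", " ", text).strip()

    action = {"index": {"_id": pmcid}}  # pmcid used for "_id"
    source = {"pmcid": pmcid, "name": filename, "text": text}
    # The bulk format is newline-delimited JSON; every line ends with "\n".
    return json.dumps(action) + "\n" + json.dumps(source) + "\n"
```

Using `json.dumps` rather than string concatenation keeps quotes and backslashes in the text correctly escaped.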
3. Use pubmeddataIndex pubmeddata.json to index the PubMed test dataset with Elasticsearch.
Run the bulk request: curl -H 'Content-Type: application/json' -XPOST 'localhost:9200/pubmed/dataset/_bulk?pretty' --data-binary "@/data/thphan/pubmed/json_test/pubmeddata.json"
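Before sending the file, it can help to confirm that it has the shape the _bulk endpoint expects: newline-delimited JSON whose last line also ends with a newline. The sketch below is one possible pre-flight check, assuming the file contains only `index` actions, each followed by a source line with the "pmcid", "name" and "text" fields. The function name `check_bulk_body` is hypothetical.

```python
import json


def check_bulk_body(body):
    """Return the number of documents in a bulk body, or raise ValueError."""
    if not body.endswith("\n"):
        raise ValueError("bulk body must end with a newline")
    lines = body.split("\n")[:-1]  # drop the empty piece after the final newline
    if len(lines) % 2 != 0:
        raise ValueError("expected pairs of action and source lines")
    for i in range(0, len(lines), 2):
        action = json.loads(lines[i])
        source = json.loads(lines[i + 1])
        # Assumption: only "index" actions are used in this file.
        if "index" not in action:
            raise ValueError("line %d is not an index action" % (i + 1))
        for field in ("pmcid", "name", "text"):
            if field not in source:
                raise ValueError("line %d lacks field %s" % (i + 2, field))
    return len(lines) // 2
```

For a large file, the same check could be done line by line instead of reading the whole body into memory.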
...