This page tracks the refactoring of the existing extractors. The original wiki page, Hosted VMs, is still used for deployments.
While we figure out who is working on what, please start with the following steps for the extractor(s) you chose:
- Make sure you can run the extractor
- Add a README, specifically a README.md (i.e. in Markdown), with information on how to install dependencies and run the extractor (in its current shape)
- Look at the dbpedia extractor as a template
- Learn about JSON-LD by experimenting in the playground at http://json-ld.org/
- Go through the README for the docker extractors template: https://opensource.ncsa.illinois.edu/bitbucket/projects/BD/repos/bd-extractor-template/browse
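To make the JSON-LD step above more concrete, here is a minimal sketch of how an extractor's output could be wrapped in a JSON-LD document. The vocabulary URL, the field names (`extractor`, `file_id`, `content`) and the helper `build_jsonld` are assumptions made for illustration only; they are not taken from the extractor template or from any agreed metadata schema.

```python
import json

# Hypothetical vocabulary URL, used only to keep the sketch self-contained.
EXAMPLE_VOCAB = "http://example.org/extractor-terms#"


def build_jsonld(extractor_name, file_id, content):
    """Wrap extractor output in a JSON-LD document (illustrative layout)."""
    return {
        # "@vocab" maps every plain key below to an IRI under EXAMPLE_VOCAB.
        "@context": {"@vocab": EXAMPLE_VOCAB},
        "extractor": extractor_name,
        "file_id": file_id,
        "content": content,
    }


if __name__ == "__main__":
    doc = build_jsonld("ncsa.image.ocr", "example-file-id", {"text": "hello"})
    # Paste the printed document into the JSON-LD playground to inspect it.
    print(json.dumps(doc, indent=2, sort_keys=True))
```

The printed document can be pasted into the playground to see how the keys expand against the context.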
Steps to take for every extractor in this list:
- Docker containers
- JSON-LD
- Extractor info registration
- Use pyclowder (for python extractors)
- Add status messages to all extractors and fix level granularity
- Make status constants (DONE, ERROR)
- Arcgis multiprocessing extractor
- Register on on-demand execution queues
- Add the on-demand key binding to the configuration file: messageType = "*.file.text.plain", "extractors."+extractorName
- Standardize around python logging
- Figure out what to log and what format to follow
- Add logstash to docker compose
- Add sample input/output to git repository
- Add icon for tools catalog to git repository
- Add entry to Tools catalog, with icon
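Several of the steps above (status constants, the on-demand key binding, and standard Python logging) could be sketched as follows. This is only an illustration under stated assumptions: the `Status` enum, the `PROCESSING` value, the helper names, the log format, and the extractor name `ncsa.example` are all hypothetical. Only `DONE`, `ERROR`, the `"*.file.text.plain"` message type, and the `"extractors." + extractorName` key come from the list above.

```python
import logging
from enum import Enum


class Status(Enum):
    """Status constants shared by extractors (DONE and ERROR are from the list)."""
    PROCESSING = "PROCESSING"  # assumption: an intermediate status
    DONE = "DONE"
    ERROR = "ERROR"


def routing_keys(extractor_name, message_types):
    """Return the message-type bindings plus the on-demand key."""
    return list(message_types) + ["extractors." + extractor_name]


def format_status(status, file_id, detail=""):
    """Build a status message string; the layout is an assumption."""
    text = "{}: file={}".format(status.value, file_id)
    if detail:
        text = text + " " + detail
    return text


def get_logger(extractor_name):
    """Configure a logger with one stream handler and a fixed format."""
    logger = logging.getLogger(extractor_name)
    if not logger.handlers:  # avoid adding duplicate handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger


if __name__ == "__main__":
    name = "ncsa.example"  # hypothetical extractor name
    log = get_logger(name)
    log.info("bindings: %s", routing_keys(name, ["*.file.text.plain"]))
    log.info(format_status(Status.DONE, "example-file-id"))
```

Keeping the constants and the message format in one shared module would let every extractor report status in the same way, whatever format is eventually agreed on.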
ID (Extractor Name from config file, same as queue name) | Programming Language | Software | OS | Can be Dockerized? | Assigned To | Repo | Author |
---|
DEPLOYED |
---|
ncsa.image.ocr | Python | Tesseract | Linux | | Rui | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-cv/browse/ocr | |
---|
ncsa.cv.faces | Python | OpenCV | Linux | | Rui | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-cv/browse/opencv | Liana |
---|
ncsa.cv.eyes | Python | OpenCV | Linux | | Rui | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-cv/browse/opencv | Liana |
---|
ncsa.cv.closeups | Python | OpenCV | Linux | | Rui | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-cv/browse/opencv | Liana |
---|
ncsa.cv.profiles | Python | OpenCV | Linux | | Rui | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-cv/browse/opencv | Liana |
---|
ncsa.cellprofiler.fluorescentcomet | Python | pymedici | Windows | No | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-cv/browse/cellprofiler | Liana |
---|
ncsa.cellprofiler.fly | Python | | Windows | No | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-cv/browse/cellprofiler | Liana |
---|
ncsa.cellprofiler.human | Python | | Windows | No | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-cv/browse/cellprofiler | Liana |
---|
ncsa.cellprofiler.silvercomet | Python | | Windows | No | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-cv/browse/cellprofiler | Liana |
---|
ncsa.cellprofiler.speckle | Python | | Windows | No | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-cv/browse/cellprofiler | Liana |
---|
ncsa.cellprofiler.trackobject | Python | | Windows | No | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-cv/browse/cellprofiler | Liana |
---|
ncsa.cellprofiler.tumor | Python | | Windows | No | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-cv/browse/cellprofiler | Liana |
---|
ncsa.cellprofiler.yeast | Python | | Windows | No | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-cv/browse/cellprofiler | Liana |
---|
ncsa.image.sphog | Python | Matlab, mnist-sphog | Linux | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-cv/browse/handwritten/HandwrittenNumbers | |
---|
ncsa.image.caltech101 | | | | | | | |
---|
ncsa.bisque.histogram (notes: disabled) | Python | | Linux | | | | |
---|
ncsa.bisque.metadata (notes: disabled) | Python | | Linux | | | | |
---|
census-section-segmentor | Java | | Linux | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-cv/browse/census | Liana, Inna |
---|
ncsa.cv.river | Python | OpenCV (python), convert (from imagemagick), and Gdal | Linux | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-cv/browse/river | Liana |
---|
ncsa.geo.shpExtractor | Python | gdal | Linux | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-geo/browse | Jong Lee |
---|
ncsa.geo.tiffExtractor | Python | gdal | Linux | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-geo/browse | Jong Lee |
---|
ncsa.image.geotiff | Python | GDAL, Cython, numpy, pygeoprocessing | Linux | | Rui | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-geotiff/browse | Rui, Mostafa Elag |
---|
ncsa.image.ponddetect | Python | Matlab | Linux | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-maps/browse/feature_detection | Marcus, Ankit |
---|
ncsa.image.humanpref | Python | Matlab | Linux | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-maps/browse/humanpref | Marcus, Ankit |
---|
ncsa.xml.greenindexroute, ncsa.csv.greenindexroute | Python | OpenCV | Linux | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-maps/browse/greenroute | Marcus |
---|
ncsa.image.knn_numerals | Python | OpenCV | Linux | | | | Marcus |
---|
ncsa.audio.speech2text | Java | CMU Sphinx, ffmpeg, sox | Linux | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-core/browse/audio/speech2text | Marcus |
---|
ncsa.audio.preview | Python | | | | Inna | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-core/browse/audio/preview | |
---|
ncsa.nlp.simplelanguage | Python | numpy | | | Inna | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-nlp/browse/SimpleLanguage | Liana |
---|
ncsa.nlp.simplesummary | Python | Natural Language Toolkit (NLTK) for Python, NLTK Data or at least: nltk.corpus,nltk.stem.porter and nltk.tokenize.punkt. | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-nlp/browse/SimpleSummary | Liana |
---|
ncsa.nlp.SNLPSentiment | Java | Stanford CoreNLP tool, java, maven | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-nlp/browse/SNLP/SNLPSentimentExtractor | Liana, Marcus(?) |
---|
ncsa.nlp.wordtables | Python | requests, pika, win32com | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-nlp/browse/WordTablesExtractor | Liana |
---|
siegfried | Python | | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-siegfried/browse | Gregory Jansen |
---|
ncsa.versus.image | Java | Versus | Linux | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-versus/browse | Kenton, Smruti |
---|
ncsa.image.preview (note: check if really deployed. there is an extractor in Hosted VMs list with a similar name.) | Python | | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-core/browse/image/preview | Rob, Sandeep |
---|
ncsa.pdf.preview (note: check if really deployed. there is an extractor in Hosted VMs list with a similar name.) | Python | | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-core/browse/pdf/preview | Rob |
---|
ncsa.video.preview (note: check if really deployed. there is an extractor in Hosted VMs list with a similar name.) | Python | | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-core/browse/video/preview | Rob |
---|
NOT DEPLOYED |
---|
ncsa.image.digitpy | Python | opencv | | | |
---|
...
ID (Extractor Name from config file, same as queue name) | Programming Language | Software | OS | Assigned To | Link to repo | Who wrote or worked on the code |
---|
DEPLOYED | | | | | | |
---|
ncsa.image.ocr | Python | Tesseract | Linux | Rui | ocr | |
---|
ncsa.cv.faces | Python | OpenCV | Linux | Inna (may be?) | cv/opencv | Liana |
---|
ncsa.cv.eyes | Python | OpenCV | Linux | Inna | cv/opencv | Liana |
---|
ncsa.cv.closeups | Python | OpenCV | Linux | Inna | cv/opencv | Liana |
---|
ncsa.cv.profiles | Python | OpenCV | Linux | Inna | cv/opencv | Liana |
---|
ncsa.cellprofiler.fluorescentcomet | Python | pymedici | Windows | | cv/cellprofiler | Liana |
---|
ncsa.cellprofiler.fly | Python | | Windows | | cv/cellprofiler | Liana |
---|
ncsa.cellprofiler.human | Python | | Windows | | cv/cellprofiler | Liana |
---|
ncsa.cellprofiler.silvercomet | Python | | Windows | | cv/cellprofiler | Liana |
---|
ncsa.cellprofiler.speckle | Python | | Windows | | cv/cellprofiler | Liana |
---|
ncsa.cellprofiler.trackobject | Python | | Windows | | cv/cellprofiler | Liana |
---|
ncsa.cellprofiler.tumor | Python | | Windows | | cv/cellprofiler | Liana |
---|
ncsa.cellprofiler.yeast | Python | | Windows | | cv/cellprofiler | Liana |
---|
ncsa.image.sphog | Python | Matlab, mnist-sphog | Linux | | cv/handwritten/HandwrittenNumbers | |
---|
ncsa.bisque.histogram (notes: disabled) | Python | | Linux | | | |
---|
ncsa.bisque.metadata (notes: disabled) | Python | | Linux | | | |
---|
census-section-segmentor | Java | | Linux | | cv/census | Liana, Inna |
---|
ncsa.cv.river | Python | OpenCV (python), convert (from imagemagick), and Gdal | Linux | | cv/river | Liana |
---|
ncsa.geo.shpExtractor | Python | gdal | Linux | | geo | Jong Lee |
---|
ncsa.geo.tiffExtractor | Python | gdal | Linux | | geo | Jong Lee |
---|
ncsa.image.ponddetect | Python | Matlab | Linux | Marcus | maps/feature_detection | Marcus, Ankit |
---|
ncsa.image.humanpref | Python | Matlab | Linux | Marcus | maps/humanpref | Marcus, Ankit |
---|
ncsa.xml.greenindexroute, ncsa.csv.greenindexroute | Python | OpenCV | Linux | Marcus | maps/greenroute | Marcus |
---|
ncsa.image.knn_numerals | Python | OpenCV | Linux | Marcus | | Marcus |
---|
ncsa.audio.speech2text | Java | CMU Sphinx, ffmpeg, sox | Linux | Marcus | | Marcus |
---|
ncsa.nlp.simplelanguage | Python | numpy | | | nlp/SimpleLanguage | Liana |
---|
ncsa.nlp.simplesummary | Python | Natural Language Toolkit (NLTK) for Python, NLTK Data or at least: nltk.corpus, nltk.stem.porter and nltk.tokenize.punkt. | | | nlp/SimpleSummary | Liana |
---|
ncsa.nlp.SNLPSentiment | Java | Stanford CoreNLP tool, java, maven | | | nlp/SNLP/SNLPSentimentExtractor | Liana, Marcus(?) |
---|
ncsa.nlp.wordtables | Python | requests, pika, win32com | | | nlp/WordTablesExtractor | Liana |
---|
terra.plantcv | Python | pika requests wheel | | | | Yan |
---|
medici_PTM_thumbnails | Java | | | | | Constantinos |
---|
medici_ptm_3d | Java | | | | | Constantinos |
---|
extractors-rabbitmq (look like examples) | | | | | | Constantinos |
---|
NOT DEPLOYED | | | | | | |
---|
ncsa.image.digitpy (notes: not in the Wiki page) | Python | opencv | | | cv/handwritten/SimpleDigitPython | |
---|
ncsa.cv.pdfimages (not in the wiki page) | | pdfimages, from poppler-utils | | | cv/poppler | |
---|
ncsa.cv.caltech101 | Python | Matlab and VLFeat | 64-bit Mac OS | | cv/vlfeat | |
---|
dbpedia | | Natural Language Toolkit (NLTK) and rdflib | | Luigi Marini | dbpedia | Luigi Marini |
---|
digest | | | | | digest | |
---|
ncsa.image.geotiff | Python | GDAL, Cython, numpy, pygeoprocessing, pika, requests | Linux | Rui | geotiff | Rui, Mostafa Elag |
---|
ncsa.hpc | Python | | | | hpc | |
---|
LSVA | | | | | lsva | |
---|
LSVA integrated | | | | | lsva-integrated | |
---|
ncsa.movieslice | Python | | | | movieslice | Sandeep |
---|
mri2mesh | | pymedici, subprocess, logging, os, numpy, shutil, zipfile | | | mri/mri2mesh | Marcus |
---|
msc-ChemCBCExtractor | Python | requests, pika, openpyxl, xlrd, pymongo | Linux | | msc/ChemCBCExtractor | Yan |
---|
msc-IsletExtractor | Python | requests, pika, openpyxl, xlrd, pymongo | Linux | | CATS msc/IsletExtractor | Yan |
---|
msc-MonitorExtractor | Python | requests, pika, openpyxl, xlrd, pymongo | Linux | | CATS msc/MonitorExtractor | Yan |
---|
ncsa.msc.dailymonitor | Python | requests, pika, openpyxl, xlrd, pymongo | | | CATS msc/OldMonitorExtractor | Ashwini |
---|
msc-PhenotypeExtractor | Python | requests, pika, openpyxl, xlrd, pymongo | Linux | | CATS msc/PhenotypeExtractor | Yan |
---|
nlp.SNLP | Java | Stanford CoreNLP tool, java, maven | | | CATS extractors-nlp/SNLP/SNLPExtractor | Liana |
---|
nlp.tika | Python | Tika project page, pymedici | | | CATS extractors-nlp/tika | Liana |
---|
ncsa.arcgis.landsat7mosaic | | | | | | Smruti |
---|
ncsa.arcgis.floodplain | | | | | | Constantinos |
---|
ncsa.image.metadata | | | | | | |