...
- be able to run the extractor
- add a README, specifically a readme.md (i.e. in Markdown), with instructions for installing dependencies and running the extractor
- add an entry to the Tools catalog, with icon and sample input/output
- (in its current shape)
- start looking at the dbpedia extractor as a template for how to dockerize it
- ... more to come
Steps to take for every extractor in this list:
- Docker containers
- JSON-LD
- Extractor info registration
- Use pyclowder (for Python extractors)
- Add status messages to all extractors and fix level granularity
- Make status constants (DONE, ERROR)
- ArcGIS multiprocessing extractor
- Register on the on-demand execution queues
- Add the on-demand key binding to the configuration file: `messageType = "*.file.text.plain", "extractors."+extractorName`
- Standardize around python logging
- Figure out what to log and what format to follow
- Add Logstash to docker compose
- Add sample input/output to git repository
- Add icon for tools catalog to git repository
- Add entry to Tools catalog, with icon
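The status-message, status-constant and logging items in the list could share one small module. The sketch below is an assumption rather than existing code: `DONE` and `ERROR` come from the list, while `STARTED`, `PROCESSING`, the level mapping and the JSON field names are hypothetical choices. It uses only the standard `logging` and `json` modules; writing one JSON object per line is a common way to make logs easy for Logstash to ingest.

```python
import enum
import json
import logging


class Status(str, enum.Enum):
    """Shared status constants. DONE and ERROR come from the to-do list;
    STARTED and PROCESSING are hypothetical additions."""
    STARTED = "STARTED"
    PROCESSING = "PROCESSING"
    DONE = "DONE"
    ERROR = "ERROR"


# Hypothetical level granularity: progress chatter at DEBUG, milestones at INFO.
STATUS_LEVELS = {
    Status.STARTED: logging.INFO,
    Status.PROCESSING: logging.DEBUG,
    Status.DONE: logging.INFO,
    Status.ERROR: logging.ERROR,
}


class JsonFormatter(logging.Formatter):
    """Render each record as a single JSON line (field names are assumptions)."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "time": self.formatTime(record),
            "level": record.levelname,
            "extractor": record.name,
            "message": record.getMessage(),
        })


def get_extractor_logger(extractor_name: str) -> logging.Logger:
    """Return a logger named after the extractor (its queue name), configured once."""
    logger = logging.getLogger(extractor_name)
    if not logger.handlers:  # avoid stacking duplicate handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
        logger.setLevel(logging.DEBUG)
    return logger


def report_status(logger: logging.Logger, status: Status,
                  resource_id: str, detail: str = "") -> str:
    """Log a status message at the level mapped to its status; return the text."""
    message = f"{status.value}: {resource_id}" + (f" ({detail})" if detail else "")
    logger.log(STATUS_LEVELS[status], message)
    return message


if __name__ == "__main__":
    log = get_extractor_logger("ncsa.image.ocr")
    report_status(log, Status.STARTED, "file-123")
    report_status(log, Status.DONE, "file-123")
```

Naming the logger after the extractor ID keeps the `extractor` field consistent with the queue names in the table.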
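The on-demand binding item means each extractor's queue is bound with two kinds of routing keys: the message-type pattern it subscribes to (e.g. `*.file.text.plain`) and a direct key `"extractors." + extractorName`, so that a single file can be sent to that extractor on demand. The following is a minimal sketch of that idea, assuming RabbitMQ topic-exchange semantics (`*` matches exactly one dot-separated word, `#` matches zero or more). `binding_keys`, `topic_matches` and `queue_receives` are hypothetical helpers. In a real extractor, each key would instead be passed to pika's `queue_bind` on whatever exchange the deployment uses. The `example.` routing-key prefix is a placeholder.

```python
from typing import List


def binding_keys(extractor_name: str, message_types: List[str]) -> List[str]:
    """Routing keys to bind an extractor's queue with: the message types it
    subscribes to, plus the on-demand key "extractors." + extractor_name."""
    return list(message_types) + ["extractors." + extractor_name]


def topic_matches(pattern: str, routing_key: str) -> bool:
    """Topic-exchange matching: '*' matches exactly one word, '#' zero or more."""
    words, parts = pattern.split("."), routing_key.split(".")

    def match(i: int, j: int) -> bool:
        if i == len(words):  # pattern used up: the key must be used up too
            return j == len(parts)
        if words[i] == "#":  # '#' may absorb any number of key words
            return any(match(i + 1, k) for k in range(j, len(parts) + 1))
        if j == len(parts):  # key used up but pattern still has words
            return False
        return words[i] in ("*", parts[j]) and match(i + 1, j + 1)

    return match(0, 0)


def queue_receives(extractor_name: str, message_types: List[str],
                   routing_key: str) -> bool:
    """True if a message with routing_key would reach this extractor's queue."""
    return any(topic_matches(key, routing_key)
               for key in binding_keys(extractor_name, message_types))


if __name__ == "__main__":
    name = "ncsa.nlp.simplesummary"
    types = ["*.file.text.plain"]
    print(binding_keys(name, types))  # ['*.file.text.plain', 'extractors.ncsa.nlp.simplesummary']
    print(queue_receives(name, types, "example.file.text.plain"))            # True
    print(queue_receives(name, types, "extractors.ncsa.nlp.simplesummary"))  # True
    print(queue_receives(name, types, "example.file.image.png"))             # False
```

Because the extractor name itself contains dots, the on-demand key has no wildcards and matches only that exact name. It therefore cannot accidentally capture traffic meant for other extractors.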
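For the JSON-LD and extractor-registration items, this sketch shows the two documents an extractor might produce. The field names in `extractor_info` (`name`, `version`, `description`, `author`, `process`, `repository`) are an assumption modeled loosely on the `extractor_info.json` file used by pyclowder-based extractors. Check them against the Clowder version in use. In `jsonld_metadata`, the `@context` maps each output key to an IRI under a placeholder vocabulary base (`https://example.org/...`), which is hypothetical. The sample values in the usage block are also placeholders.

```python
import json
from typing import Dict, Iterable, Optional


def extractor_info(name: str, version: str, description: str, author: str,
                   mime_types: Iterable[str],
                   repo_url: Optional[str] = None) -> Dict:
    """Registration record for an extractor (hypothetical field layout)."""
    info = {
        "name": name,  # same as the queue name used in the table
        "version": version,
        "description": description,
        "author": author,
        "process": {"file": list(mime_types)},  # MIME types the extractor accepts
    }
    if repo_url:
        info["repository"] = [{"repType": "git", "repUrl": repo_url}]
    return info


def jsonld_metadata(extractor_name: str, file_iri: str, content: Dict,
                    vocab_base: str) -> Dict:
    """Wrap extractor output as a JSON-LD node that describes the file."""
    # "@context" expands each plain key to a full IRI: <base><extractor>#<key>
    context = {key: f"{vocab_base}{extractor_name}#{key}" for key in content}
    node = {"@context": context, "@id": file_iri}
    node.update(content)
    return node


if __name__ == "__main__":
    info = extractor_info("ncsa.image.ocr", "1.0", "OCR using Tesseract",
                          "Example Author", ["image/*"])
    print(json.dumps(info, indent=2))
    doc = jsonld_metadata("ncsa.image.ocr", "https://example.org/files/123",
                          {"text": "recognized text"}, "https://example.org/metadata/")
    print(json.dumps(doc, indent=2))
```

Keeping the `@context` per extractor means two extractors can both emit a `text` key without their terms colliding once the JSON-LD is expanded.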
ID (Extractor Name from config file, same as queue name) | Programming Language | Software | OS | Can be Dockerized? | Can be uploaded to Docker Hub? | Assigned To | Link to repo | Who wrote or worked on the code |
---|---|---|---|---|---|---|---|---|
DEPLOYED | | | | | | | | |
ncsa.image.ocr | Python | Tesseract | Linux | | | Rui | ocr | |
ncsa.cv.faces | Python | OpenCV | Linux | | | Rui | opencv | |
ncsa.cv.eyes | | OpenCV | Linux | | | Rui | opencv | |
ncsa.cv.closeups | | OpenCV | Linux | | | Rui | opencv | |
ncsa.cv.profiles | | OpenCV | Linux | | | Rui | opencv | |
ncsa.cellprofiler.fluorescentcomet | | pymedici | | | | | | |
ncsa.cellprofiler.fly | | | | | | | | |
ncsa.cellprofiler.human | | | | | | | | |
ncsa.cellprofiler.silvercomet | | | Windows | No | | | cv/cellprofiler | Liana |
ncsa.cellprofiler.speckle | | | Windows | No | | | | |
ncsa.cellprofiler.trackobject | Python | | Windows | No | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-cv/browse/cellprofiler | Liana |
ncsa.cellprofiler.tumor | Python | | Windows | No | | | cellprofiler | |
ncsa.cellprofiler.yeast | | | Windows | No | | | cv/cellprofiler | Liana |
ncsa.image.sphog | | Matlab, mnist-sphog | Linux | | | Gregory Jansen | cv/browse/handwritten/HandwrittenNumbers | Gregory Jansen |
ncsa.bisque.histogram (notes: disabled) | Python | | Linux | | | | | |
ncsa.bisque.metadata (notes: disabled) | Python | | Linux | | | | | |
census-section-segmentor | Java | | Linux | | | Sandeep Puthanveetil Satheesan | cv/census | Liana, Inna |
ncsa.cv.river | Python | OpenCV (python), convert (from ImageMagick), and GDAL | Linux | | | Smruti Padhy | cv/river | Liana |
ncsa.image.humanpref | Python | Matlab | Linux | | | Marcus Slavenas | maps/humanpref | Marcus, Ankit |
ncsa.xml.greenindexroute, ncsa.csv.greenindexroute | Python | OpenCV | Linux | | | | maps/greenroute | |
ncsa.image.knn_numerals | | OpenCV | Linux | | | Marcus Slavenas | | Marcus |
ncsa.image.ponddetect | | Matlab | Linux | | | Marcus Slavenas | maps/feature_detection | |
ncsa.audio.speech2text | Java | CMU Sphinx, ffmpeg, sox | Linux | | | Marcus Slavenas | core/audio/speech2text | Marcus |
ncsa.audio.preview | | | | | | Inna | core/audio/preview | |
ncsa.nlp.simplelanguage | Python | numpy | | | | Inna | nlp/SimpleLanguage | Liana |
ncsa.nlp.simplesummary | Python | Natural Language Toolkit (NLTK) for Python; NLTK Data (at least nltk.corpus, nltk.stem.porter, and nltk.tokenize.punkt) | | | | Gregory Jansen | nlp/SimpleSummary | Liana |
ncsa.nlp.SNLPSentiment | Java | Stanford CoreNLP tool, java, maven | | | | | nlp/SNLP/SNLPSentimentExtractor | Liana, Marcus(?) |
ncsa.nlp.wordtables | Python | requests, pika, win32com | | | | | nlp/WordTablesExtractor | Liana |
siegfried | | | | | | | siegfried | Gregory Jansen |
versus.image | Java | Versus | Linux | | | Smruti Padhy | versus | Kenton, Smruti |
ncsa.geo.shp | | GDAL | | | | Jong Lee | geo/shpExtractor | Jong Lee |
ncsa.geo.geotiff | | GDAL | | | | Jong Lee | geo/geotiffExtractor | Jong Lee |
ncsa.image.geotiff | Python | GDAL, Cython, numpy, pygeoprocessing | Linux | | | Rui | geotiff | Rui, Mostafa Elag |
ncsa.image.preview (note: check whether it is really deployed; the Hosted VMs list has an extractor with a similar name) | Python | | | | | | core/browse/image/preview | Rob, Sandeep |
ncsa.pdf.preview (note: check whether it is really deployed; the Hosted VMs list has an extractor with a similar name) | Python | | | | | | core/pdf/preview | Rob |
ncsa.video.preview (note: check whether it is really deployed; the Hosted VMs list has an extractor with a similar name) | Python | | | | | | core/browse/video/preview | Rob |
NOT DEPLOYED | | | | | | | | |
ncsa.image.digitpy | Python | opencv | | | | | cv/handwritten/SimpleDigit | |
ncsa.cv.pdfimages | | pdfimages, from poppler-utils | | | | | cv/poppler | |
ncsa.cv.caltech101 | Python | Matlab and VLFeat | 64-bit Mac OS | | | | cv/vlfeat | |
dbpedia | | Natural Language Toolkit (NLTK) and rdflib | | | | Luigi Marini | dbpedia | Luigi Marini |
mri2mesh | Python | pymedici, subprocess, logging, os, numpy, shutil, zipfile | | | | | mri/mri2mesh | Marcus |
ncsa.movieslice | Python | | | | | | movieslice | Sandeep |
msc-ChemCBCExtractor | | requests, pika, openpyxl, xlrd, pymongo | Linux | | | Yan | msc/ChemCBCExtractor | Yan |
msc-IsletExtractor | | requests, pika, openpyxl, xlrd, pymongo | Linux | | | Yan | msc/IsletExtractor | Yan |
msc-MonitorExtractor | Python | requests, pika, openpyxl, xlrd, pymongo | Linux | | | Yan | msc/MonitorExtractor | Yan |
ncsa.msc.dailymonitor (note: not used) | Python | requests, pika, openpyxl, xlrd, pymongo | Linux | | | | msc/OldMonitorExtractor | Ashwini |
msc-PhenotypeExtractor | Python | requests, pika, openpyxl, xlrd, pymongo | Linux | | | Yan | msc/PhenotypeExtractor | Yan |
digest | | | | | | | | |
ncsa.hpc | | | | | | | hpc | |
LSVA | Java | | | | | | lsva | |
LSVA integrated | | | | | | Yan | lsva-integrated | Ashwini |
person-detector | Python | MATLAB, FFMPEG, requests and pika | | | | | person-detector/python | Sandeep |
ncsa.person-tracker | Python | python, MATLAB, FFMPEG, requests and pika | | | | | person-tracking/python | Sandeep |
terra.plantcv | Python | pika, requests, wheel | | | | | plantcv | Yan |
ncsa.nlp.SNLP | Java | Stanford CoreNLP tool, java, maven | | | | | nlp/SNLP/SNLPExtractor | Liana |
ncsa.nlp.tika | Python | Tika, pymedici | | | | | nlp/tika | Liana |
medici_PTM_thumbnails | Java | | | | | | ptm/PTMThumbnailExtractor | Constantinos |
medici_PTM_metadata | Java | | | | | | ptm/PTMMetadataExtractor | Constantinos |
PtmMetadata (? name not clear) | | | | | | | ptm/PTMMetadata | |
ptm_maps | | | | | | | ptm/PTMMapsExtractor | |
ptm_3d | | | | | | | ptm/PTM3DExtractor | |
medici_images_ptm | Java | | | | | | ptm/ImagesPTMExtractor | Constantinos |
extractors-rabbitmq (look like examples; name not clear) | | | | | | | rabbitmq | |
extractors-seabird | Scala | | | | | | seabird | Luigi |
medici_3d_x3d (one of extractors-3d) | Java | | | | | | 3d/ObjJSONExtractor | Constantinos |
medici_3d_obj_merger (one of extractors-3d) | Java | | | | | | 3d/OBJMergerExtractor | Constantinos |
oni (one of extractors-3d) | | | | | | | 3d/OniExtractor | Constantinos |
ply_obj (one of extractors-3d) | | | | | | | 3d/PlyObjExtractor | Constantinos |
medici_3d_metadata (one of extractors-3d) | Java | | | | | | 3d/browse/ThreeDMetadataExtractor | Constantinos |
medici_x3d_html (one of extractors-3d) | Java | | | | | | 3d/X3DhtmlExtractor | Constantinos |
ncsa.arcgis.landsat7mosaic | Python | ArcGIS | Windows | No | | Smruti Padhy | bd-cz/ndviextractor | Smruti |
ncsa.arcgis.floodplain | | ArcGIS | Windows | No | | Smruti Padhy | bd-cz/terex_floodplain/config.py | Smruti |
medici_book | Java | | | | | | books/BookPreviewExtractor | Theerasit Issaranon |
medici_image_pyramid | Java | | | | | | CATS/books/ImagePreviewPyramidExtractor-shebook | Theerasit Issaranon |
shebook | Java | | | | | | CATS/books/SheBookPreviewExtractor/src/BookPreviewExtractor, CATS/books/SheBookPreviewExtractor/src/bookpreviewextractor | Theerasit Issaranon |
lsva-cedd | Java | | | | | | CATS/cedd | Constantinos |
cinemetrics | Python | | | | | | CATS/cinemetrics | Constantinos |
ncsa.image.metadata | Python | | | | | | CATS/extractors-core/imagemetadata | Max, Rob |