This page tracks the refactoring of the existing extractors. The original wiki page, Hosted VMs, is still used for deployments.
As we figure out who's working on what, please start with the following steps for the extractor(s) you chose:
- be able to run the extractor
- add a README, specifically a readme.md (i.e. in markdown), with information on how to install dependencies and run the extractor (in its current shape)
- start looking at the dbpedia extractor as a template for how to dockerize your extractor
- ... more to come
Steps to take for every extractor in this list:
- Docker containers
- JSONLD
- Extractor info registration
- Use pyclowder (for Python extractors)
- Add status messages to all extractors and fix level granularity
- Make status constants (DONE, ERROR)
- ArcGIS multiprocessing extractor
- Register on on-demand execution queues
- Add the on-demand key binding to the configuration file: messageType = "*.file.text.plain", "extractors."+extractorName
- Standardize on Python logging
- Figure out what to log and what format to follow
- Add logstash to docker compose
- Add sample input/output to git repository
- Add icon for tools catalog to git repository
- Add entry to Tools catalog, with icon
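As a rough illustration of three of the steps above (status constants, the on-demand key binding, and standard Python logging), here is a minimal sketch using only the standard library. It does not use pyclowder, and the names `Status`, `routing_keys`, `make_logger`, and `status_message`, as well as the log format string, are assumptions made for this example and not an agreed convention.

```python
import logging
from enum import Enum


class Status(Enum):
    """Hypothetical status constants; the checklist only names DONE and ERROR."""
    DONE = "DONE"
    ERROR = "ERROR"


def routing_keys(extractor_name, message_type="*.file.text.plain"):
    """Return the regular binding plus the on-demand binding for one extractor.

    Mirrors the checklist item:
    messageType = "*.file.text.plain", "extractors."+extractorName
    """
    return [message_type, "extractors." + extractor_name]


def make_logger(extractor_name, level=logging.INFO):
    """Build a standard-library logger named after the extractor.

    The format string is an assumption; the checklist still lists
    "what to log and what format to follow" as an open question.
    """
    logger = logging.getLogger(extractor_name)
    logger.setLevel(level)
    if not logger.handlers:  # avoid stacking handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger


def status_message(status, detail):
    """Format a status message that starts with one of the shared constants."""
    return "{}: {}".format(status.value, detail)


if __name__ == "__main__":
    name = "ncsa.image.ocr"  # extractor ID taken from the table on this page
    log = make_logger(name)
    log.info("bindings: %s", routing_keys(name))
    log.info(status_message(Status.DONE, "text extracted"))
```

Keeping the constants and the key-building logic in one shared place would let every Python extractor report status and register its queue bindings in the same way; whether that belongs in pyclowder or in each repository is left open here.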
ID (Extractor Name from config file, same as queue name) | Programming Language | Software | OS | Can be Dockerized? | Can be uploaded to Docker Hub? | Assigned To | Link to repo | Who wrote or worked on the code |
---|
DEPLOYED | | | | | | | | |
---|
ncsa.image.ocr | Python | Tesseract | Linux | | | Rui | ocr | |
---|
ncsa.cv.faces | Python | OpenCV | Linux | | | Inna (may be?)opencvcveyesOpenCVLinux Inna | opencvcvcloseupsOpenCVLinux InnaopencvcvprofilesOpenCVLinux Inna | opencvfluorescentcometpymedici flytrackobject | Python | | Windows | No | |
---|
| human silvercomet cellprofilerspecklesphog | Python | Matlab, mnist-sphog |
---|
WindowsNo | /cellprofilerLianacellprofiler.trackobjectbisque.histogram (notes: disabled) | Python | | Linux |
---|
WindowsNohttps://opensource.illinois.edu/bitbucket/projects/CATS/repos/extractors-cv/browse/cellprofilerbisque.metadata (notes: disabled) | Python | | Linux | | | | |
---|
census-section-segmentor | Java | | Linux | | |
---|
Liana | ncsa.cellprofiler.tumor | Python | | Windows | No | | cellprofilercellprofileryeastPython | | Windows | No | | Python | OpenCV (python), convert (from imagemagick), and Gdal | Linux | | |
cellprofilerimagesphog Matlab, mnist-sphog Gregory Jansen | cv/handwritten/HandwrittenNumbers imagecaltech101 | Gregory Jansen | | | ncsa.bisque.histogram (notes: disabled) | Python | | Linux | | | | | |
---|
ncsa.bisque.metadata (notes: disabled) | Python | | | | | census-section-segmentor | Java | | cvcensusLiana Innacvriver Python | OpenCV (python), convert (from imagemagick), and GdalSmruti Padhy | cvriverLiana | humanpref | Marcus, Ankit |
ncsa.xml.greenindexroute, ncsa.csv.greenindexroute |
---|
ncsa.geo.shpExtractorgdalJong Lee | geoJong LeegeotiffExtractorgdal | Marcus |
ncsa.audio.speech2text | Java | CMU Sphinx, ffmpeg, sox | Linux | | |
---|
Jong Leegeo/browseJong LeeimagegeotiffGDAL, Cython, numpy, pygeoprocessing | Linux | RuigeotiffRui, Mostafa ElagimageponddetectMatlabLinux | Marcusmapsfeature_detectionSimpleLanguage | Liana |
ncsa.nlp.simplesummary | Python | Natural Language Toolkit (NLTK) for Python, NLTK Data or at least: nltk.corpus, nltk.stem.porter, and nltk.tokenize.punkt |
---|
Marcus, Ankit | ncsa.image.humanpref | Python | Matlab | LinuxMarcusmapshumanprefMarcus, Ankitxml.greenindexroute, ncsa.csv.greenindexroutePython | OpenCV | Linuxmapsgreenrouteimageknn_numeralsOpenCV | Linux | | | Marcus | | Marcus | ncsa.audio.speech2text | Java | CMU Sphinx, ffmpeg, sox | Linux | | | Marcuscoreaudio/speech2textMarcus | ncsa.audio.preview | core/audio/preview nlpsimplelanguagePythonnumpy | nlp/SimpleLanguageLiana | ncsa.nlp.simplesummary | Python | Natural Language Toolkit (NLTK) for Python, NLTK Data or at least:
Kenton, Smruti |
ncsa.image.preview (note: check if really deployed. there is an extractor in Hosted VMs list with a similar name.) | Python |
---|
nltk.corpus,nltk.stem.porter and nltk.tokenize.punkt.Gregory JansennlpSimpleSummaryLiana | ncsa.nlp.SNLPSentiment | Java | Stanford CoreNLP tool, java, mavenimage/preview | Rob, Sandeep |
ncsa.pdf.preview (note: check if really deployed. there is an extractor in Hosted VMs list with a similar name.) | Python | | | | |
---|
| nlpSNLPSNLPSentimentExtractorLiana, Marcus(?).nlp.wordtablesPython | requests, pika, win32comnlpWordTablesExtractorLiana | siegfried | siegfriedGregory JansenversusimageJava | Versus | | pdfimages, from poppler-utils | |
LinuxSmruti PadhyversusKenton, Smruti | ncsa.image.preview (note: check if really deployed. there is an extractor in Hosted VMs list with a similar name.) | Python | | | /poppler | |
ncsa.cv.caltech101 | Python | Matlab and VLFeat | 64-bit Mac OS |
---|
core/browse/image/previewRob, Sandeep | ncsa.pdf.preview (note: check if really deployed. there is an extractor in Hosted VMs list with a similar name.) | core/pdf/previewRob | ncsa.video.preview (note: check if really deployed. there is an extractor in Hosted VMs list with a similar name.) | Python | core/browse/video/previewRob | | | | | | | | | |
---|
NOT DEPLOYED | | | | image.digitpyPython | opencvcv/handwritten/SimpleDigitPython | ncsa.cv.pdfimages | | pdfimages, from poppler-utilscv/poppler cv.caltech101Matlab and VLFeat64-bit Mac OS cv/vlfeat dbpedia Natural Language Toolkit (NLTK) and rdflib.pymedici, subprocess, logging, os, numpy, shutil, zipfile | | | |
Luigi MarinidbpediaLuigi Marini | Marcus |
msc-ChemCBCExtractor |
---|
digest | pika, openpyxl, xlrd, pymongo | Linux |
digest | ncsa.hpc | requests, pika, openpyxl, xlrd, pymongo | Linux |
hpc | LSVA | /IsletExtractor | Yan |
msc-MonitorExtractor | Python | requests, pika, openpyxl, xlrd, pymongo | Linux |
---|
Java | | lsva | LSVA integrated | | Yan |
ncsa.msc.dailymonitor | Python | requests, pika, openpyxl, xlrd, pymongo |
---|
| lsva-integrated | Ashwini |
msc-PhenotypeExtractor | Python | requests, pika, openpyxl, xlrd, pymongo | Linux |
---|
ncsa.movieslice | Python | | moviesliceSandeep | mri2mesh | Python | pymedici, subprocess, logging, os, numpy, shutil, zipfile | mrimri2meshMarcus | msc-ChemCBCExtractorrequests, pika, openpyxl, xlrd, pymongo | Linux | YanmscChemCBCExtractorYanmscIsletExtractordetector | Python | MATLAB, FFMPEG, requests |
---|
, pika, openpyxl, xlrd, pymongoLinuxYan | mscIsletExtractorYanmscMonitorExtractortracker | Python | python, MATLAB, FFMPEG requests |
---|
, pika, openpyxl, xlrd, pymongoLinuxYanmscMonitorExtractorYanncsamsc.dailymonitorplantcv | Python | pika requests |
---|
, pika, openpyxl, xlrd, pymongonot usedmsc/OldMonitorExtractorAshwini | msc-PhenotypeExtractor | Python | requests, pika, openpyxl, xlrd, pymongo | Yan |
medici_PTM_thumbnails | Java | | |
---|
LinuxYanmscPhenotypeExtractorYan | Constantinos |
medici_PTM_metadata |
---|
ncsa.nlp.SNLPStanford CoreNLP tool, java, maven | nlpSNLP/SNLPExtractorLiana | ncsa.nlp.tika | Python | Tika project page, pymedicinlptikaLiana | person-detector | Python | MATLAB, FFMPEG, requests and pikaperson-detectorpythonSandeep | ncsa.person-tracker | Python | python, MATLAB, FFMPEG requests and pikaperson-trackingpythonSandeep | Constantinos |
medici_images_ptm | Java |
---|
terra.plantcv | Python | pikarequestswheelplantcvYan | Constantinos |
extractors-rabbitmq (looks like examples) |
---|
medici_PTM_thumbnails | Javaptm/PTMThumbnailExtractorConstantinos | medici_PTM_metadata | Java | ptm/PTMMetadataExtractorConstantinos | Name not clear
PtmMetadata(? ptmPTMMetadataptm_maps ptmPTMMapsExtractorptm_3d ptmPTM3DExtractorimages_ptm ptmImagesPTMExtractorrabbitmq(look like examples)
rabbitmq | Name not clear extractors-seabird/ | Scala | seabirdLuigi | medici_3d_x3d (one of extractors-3d) | Java | | | | | Constantinos |
ncsa.arcgis.landsat7mosaic | Python | ArcGIS | Windows | No | |
---|
extractors3dObjJSONExtractorConstantinos | medici_3d_obj_merger (one of extractors-3d) | Java | | | | | | 3dOBJMergerExtractorConstantinosoni (one of extractors-3d) 3dOniExtractorConstantinosply_obj (one of extractors-3d) 3dPlyObjExtractorConstantinos | medici_3d_metadata (one of extractors-3d) | Java | 3d/browse/ThreeDMetadataExtractorConstantinos | medici_x3d_html (one of extractors-3d) | Java | | | | | | 3dX3DhtmlExtractorConstantinos | ncsa.arcgis.landsat7mosaic | Python | ArcGIS | Windows | No | | Smruti Padhybd-cz/ndviextractorSmrutiarcgis.floodplainArcGISWindowsNoSmruti Padhybd-cz/terex_floodplain/config.py | Constantinos |
ncsa.image.metadata | Python |
---|
Smruti | medici_book | Java | booksBookPreviewExtractorTheerasit Issaranon | medici_image_pyramid | JavaCATSbooks/ImagePreviewPyramidExtractor-shebookTheerasit Issaranon | shebook | Java | CATSbooks/SheBookPreviewExtractor/src/BookPreviewExtractor CATSbooks/SheBookPreviewExtractor/src/bookpreviewextractorTheerasit Issaranon | lsva-cedd | Java | CATSceddConstantinos.cinemetricsPythonCATS-cinemetricsbrowseConstantinosimagemetadataPythonCATSextractors-coreimagemetadataMax. Rob | | | | | | | | | |