This page is for the refactoring of the existing extractors. The original wiki page Hosted VMs is still used for the deployments. 

 

As we figure out who's working on what, please take the following steps for every extractor you chose from this list:

  1. Docker containers
  2. JSONLD
  3. Extractor info registration
  4. Use pyclowder (for Python extractors)
  5. Add status messages to all extractors and fix level granularity
    1. Make status constants (DONE, ERROR)
    2. ArcGIS multiprocessing extractor
  6. Register on on-demand execution queues
    1. Add the on-demand key binding to the configuration file: messageType = "*.file.text.plain", "extractors."+extractorName
  7. Standardize around Python logging
    1. Figure out what to log and what format to follow
  8. Add logstash to docker compose
  9. Add sample input/output to git repository
  10. Add icon for tools catalog to git repository
  11. Add entry to Tools catalog, with icon
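
Steps 5 to 7 above can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions: the names `Status`, `LOG_FORMAT`, `on_demand_bindings`, `format_status` and `report_status` are hypothetical helpers invented for illustration and are not part of pyclowder, and the log format is just one possible choice, since what to log and in which format is still to be decided. Only `messageType`, `extractorName`, the `"extractors."` prefix and the `DONE`/`ERROR` constants come from the list above.

```python
import logging


class Status:
    """Step 5.1: shared status constants (hypothetical container class)."""
    DONE = "DONE"
    ERROR = "ERROR"


# Step 7.1: one candidate log format; the real format is still undecided.
LOG_FORMAT = "%(asctime)s %(levelname)-7s %(name)s : %(message)s"


def on_demand_bindings(extractor_name, message_types):
    """Step 6.1: return the routing keys, adding the on-demand key once."""
    on_demand_key = "extractors." + extractor_name
    keys = list(message_types)
    if on_demand_key not in keys:
        keys.append(on_demand_key)
    return keys


def format_status(status, file_id, message):
    """Build a status line with a fixed layout so log consumers can parse it."""
    return "{}: file {}: {}".format(status, file_id, message)


def report_status(logger, status, file_id, message):
    """Log a status message; ERROR goes to the error level, the rest to info."""
    level = logging.ERROR if status == Status.ERROR else logging.INFO
    text = format_status(status, file_id, message)
    logger.log(level, text)
    return text


# Example values: the extractor ID is taken from the table on this page,
# the file ID is a placeholder.
extractorName = "ncsa.image.ocr"
messageType = on_demand_bindings(extractorName, ["*.file.text.plain"])

logging.basicConfig(level=logging.INFO, format=LOG_FORMAT)
log = logging.getLogger(extractorName)
report_status(log, Status.DONE, "example-file-id", "extraction finished")
```

Using the extractor ID as the logger name keeps the log source identical to the queue name, which should make the messages easier to filter once they are collected centrally.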

 


DEPLOYED

| ID (Extractor Name from config file, same as queue name) | Programming Language | Software | OS | Can be Dockerized? | Assigned To | Repo | Author |
| --- | --- | --- | --- | --- | --- | --- | --- |
| ncsa.image.ocr | Python | Tesseract | Linux | | Rui | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-cv/browse/ocr | |
| ncsa.cv.faces | Python | OpenCV | Linux | | Rui | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-cv/browse/opencv | Liana |
| ncsa.cv.eyes | Python | OpenCV | Linux | | Rui | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-cv/browse/opencv | Liana |
| ncsa.cv.closeups | Python | OpenCV | Linux | | Rui | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-cv/browse/opencv | Liana |
| ncsa.cv.profiles | Python | OpenCV | Linux | | Rui | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-cv/browse/opencv | Liana |
| ncsa.cellprofiler.fluorescentcomet | Python | pymedici (?) | Windows | No | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-cv/browse/cellprofiler | Liana |
| ncsa.cellprofiler.fly | Python | | Windows | No | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-cv/browse/cellprofiler | Liana |
| ncsa.cellprofiler.human | Python | | Windows | No | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-cv/browse/cellprofiler | Liana |
| ncsa.cellprofiler.silvercomet | Python | | Windows | No | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-cv/browse/cellprofiler | Liana |
| ncsa.cellprofiler.speckle | Python | | Windows | No | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-cv/browse/cellprofiler | Liana |
| ncsa.cellprofiler.trackobject | Python | | Windows | No | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-cv/browse/cellprofiler | Liana |
| ncsa.cellprofiler.tumor | Python | | Windows | No | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-cv/browse/cellprofiler | Liana |
| ncsa.cellprofiler.yeast | Python | | Windows | No | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-cv/browse/cellprofiler | Liana |
| ncsa.image.sphog | Python | Matlab, mnist-sphog | Linux | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-cv/browse/handwritten/HandwrittenNumbers | |
| ncsa.image.caltech101 | | | | | | | |
| ncsa.bisque.histogram (notes: disabled) | Python | | Linux | | | | |
| ncsa.bisque.metadata (notes: disabled) | Python | | Linux | | | | |
| census-section-segmentor | Java | | Linux | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-cv/browse/census | Liana, Inna |
| ncsa.cv.river | Python | OpenCV (python), convert (from imagemagick), and Gdal | Linux | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-cv/browse/river | Liana |
| ncsa.geo.shpExtractor | Python | gdal | Linux | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-geo/browse | Jong Lee |
| ncsa.geo.tiffExtractor | Python | gdal | Linux | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-geo/browse | Jong Lee |
| ncsa.image.geotiff | Python | GDAL, Cython, numpy, pygeoprocessing | Linux | | Rui | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-geotiff/browse | Rui, Mostafa Elag |
| ncsa.image.ponddetect | Python | Matlab | Linux | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-maps/browse/feature_detection | Marcus, Ankit |
| ncsa.image.humanpref | Python | Matlab | Linux | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-maps/browse/humanpref | Marcus, Ankit |
| ncsa.xml.greenindexroute, ncsa.csv.greenindexroute | Python | OpenCV | Linux | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-maps/browse/greenroute | Marcus |
| ncsa.image.knn_numerals | Python | OpenCV | Linux | | | | Marcus |
| ncsa.audio.speech2text | Java | CMU Sphinx, ffmpeg, sox | Linux | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-core/browse/audio/speech2text | Marcus |
| ncsa.audio.preview | Python | | | | Inna | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-core/browse/audio/preview | |
| ncsa.nlp.simplelanguage | Python | numpy | | | Inna | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-nlp/browse/SimpleLanguage | Liana |
| ncsa.nlp.simplesummary | Python | Natural Language Toolkit (NLTK) for Python, NLTK Data or at least: nltk.corpus, nltk.stem.porter and nltk.tokenize.punkt. | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-nlp/browse/SimpleSummary | Liana |
| ncsa.nlp.SNLPSentiment | Java | Stanford CoreNLP tool, java, maven | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-nlp/browse/SNLP/SNLPSentimentExtractor | Liana, Marcus(?) |
| ncsa.nlp.wordtables | Python | requests, pika, win32com | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-nlp/browse/WordTablesExtractor | Liana |
| siegfried | Python | | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-siegfried/browse | Gregory Jansen |
| ncsa.versus.image | Java | Versus | Linux | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-versus/browse | Kenton, Smruti |
| ncsa.image.preview (note: check if really deployed. there is an extractor in Hosted VMs list with a similar name.) | Python | | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-core/browse/image/preview | Rob, Sandeep |
| ncsa.pdf.preview (note: check if really deployed. there is an extractor in Hosted VMs list with a similar name.) | Python | | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-core/browse/pdf/preview | Rob |
| ncsa.video.preview (note: check if really deployed. there is an extractor in Hosted VMs list with a similar name.) | Python | | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-core/browse/video/preview | Rob |

NOT DEPLOYED

| ID (Extractor Name from config file, same as queue name) | Programming Language | Software | OS | Can be Dockerized? | Assigned To | Repo | Author |
| --- | --- | --- | --- | --- | --- | --- | --- |
| ncsa.image.digitpy | Python | opencv | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-cv/browse/handwritten/SimpleDigitPython | |
| ncsa.cv.pdfimages | | pdfimages, from poppler-utils | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-cv/browse/poppler | |
| ncsa.cv.caltech101 | Python | Matlab and VLFeat | 64-bit Mac OS | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-cv/browse/vlfeat | |
| dbpedia | Python | Natural Language Toolkit (NLTK) and rdflib. | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-dbpedia/browse | |
| digest | Python | | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-digest/browse | |
| ncsa.hpc | Python | | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-hpc/browse | |
| LSVA | Java | | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-lsva/browse | Liana, Constantinos |
| LSVA integrated | | | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-lsva-integrated/browse | |
| ncsa.movieslice | Python | | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-movieslice/browse | Sandeep |
| mri2mesh | Python | pymedici, subprocess, logging, os, numpy, shutil, zipfile | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-mri/browse/mri2mesh | Marcus |
| msc-ChemCBCExtractor | Python | requests, pika, openpyxl, xlrd, pymongo | Linux | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-msc/browse/ChemCBCExtractor | Yan |
| msc-IsletExtractor | Python | requests, pika, openpyxl, xlrd, pymongo | Linux | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-msc/browse/IsletExtractor | Yan |
| msc-MonitorExtractor | Python | requests, pika, openpyxl, xlrd, pymongo | Linux | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-msc/browse/MonitorExtractor | Yan |
| ncsa.msc.dailymonitor | Python | requests, pika, openpyxl, xlrd, pymongo | | | not used | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-msc/browse/OldMonitorExtractor | Ashwini |
| msc-PhenotypeExtractor | Python | requests, pika, openpyxl, xlrd, pymongo | Linux | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-msc/browse/PhenotypeExtractor | Yan |
| ncsa.nlp.SNLP | Java | Stanford CoreNLP tool, java, maven | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-nlp/browse/SNLP/SNLPExtractor | Liana |
| ncsa.nlp.tika | Python | Tika project page, pymedici | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-nlp/browse/tika | Liana |
| person-detector | Python | MATLAB, FFMPEG, requests and pika | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-person-detector/browse/python | Sandeep |
| ncsa.person-tracker | Python | python, MATLAB, FFMPEG requests and pika | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-person-tracking/browse/python | Sandeep |
| terra.plantcv | Python | pika, requests, wheel | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-plantcv/browse | Yan |
| medici_PTM_thumbnails | Java | | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-ptm/browse/PTMThumbnailExtractor | Constantinos |
| medici_PTM_metadata | Java | | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-ptm/browse/PTMMetadataExtractor | Constantinos |
| Name not clear PtmMetadata(?) | Java | | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-ptm/browse/PTMMetadata | Constantinos |
| medici_ptm_maps | Java | | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-ptm/browse/PTMMapsExtractor | Constantinos |
| medici_ptm_3d | Java | | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-ptm/browse/PTM3DExtractor | Constantinos |
| medici_images_ptm | Java | | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-ptm/browse/ImagesPTMExtractor | Constantinos |
| extractors-rabbitmq (look like examples) | | | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-rabbitmq/browse | |
| Name not clear extractors-seabird/ | Scala | | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-seabird/browse | Luigi |
| medici_3d_x3d (one of extractors-3d) | Java | | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-3d/browse/ObjJSONExtractor | Constantinos |
| medici_3d_obj_merger (one of extractors-3d) | Java | | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-3d/browse/OBJMergerExtractor | Constantinos |
| medici_oni (one of extractors-3d) | Java | | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-3d/browse/OniExtractor | Constantinos |
| medici_ply_obj (one of extractors-3d) | Java | | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-3d/browse/PlyObjExtractor | Constantinos |
| medici_3d_metadata (one of extractors-3d) | Java | | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-3d/browse/ThreeDMetadataExtractor | Constantinos |
| medici_x3d_html (one of extractors-3d) | Java | | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-3d/browse/X3DhtmlExtractor | Constantinos |
| ncsa.arcgis.landsat7mosaic | Python | ArcGIS | Windows | No | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-bd-cz/browse/ndviextractor | Smruti |
| ncsa.arcgis.floodplain | Python | ArcGIS | Windows | No | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-bd-cz/browse/terex_floodplain/config.py | Smruti |
| medici_book | Java | | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-books/browse/BookPreviewExtractor | Theerasit Issaranon |
| medici_image_pyramid | Java | | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-books/browse/ImagePreviewPyramidExtractor-shebook | Theerasit Issaranon |
| shebook | Java | | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-books/browse/SheBookPreviewExtractor/src/BookPreviewExtractor, https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-books/browse/SheBookPreviewExtractor/src/bookpreviewextractor | Theerasit Issaranon |
| lsva-cedd | Java | | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-cedd/browse | Constantinos |
| ncsa.cinemetrics | Python | | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-cinemetrics/browse | Constantinos |
| ncsa.image.metadata | Python | | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-core/browse/image/metadata | Max. Rob |
| ncsa.debod.segmentor | | | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/DEBOD/repos/extractors-cellsegmentor/browse | |
| ncsa.image.dmp | | | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/DEBOD/repos/extractors-debod/browse, https://opensource.ncsa.illinois.edu/bitbucket/projects/DEBOD/repos/extractors-dmp/browse | |
| ncsa.image.sphog.debod | | | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/DEBOD/repos/extractors-handwrittendecimals/browse | |
| ncsa.image.iarp_remove_circle | | | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/IARP/repos/image_fetcher/browse/extractors/remove_circle | Marcus |
| ncsa.cv.meangrey | | | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/IARP/repos/image_fetcher/browse/extractors/mean_grey | Marcus |