This page tracks the refactoring of the existing extractors. The original wiki page, Hosted VMs, is still used for deployments.

As we figure out who's working on what, please start with the following steps for the extractor(s) you chose:

Steps to take for every extractor in this list:

  1. Docker containers
  2. JSONLD
  3. Extractor info registration
  4. Use pyclowder (for Python extractors)
  5. Add status messages to all extractors and fix level granularity
    1. Make status constants (DONE, ERROR)
    2. Arcgis multiprocessing extractor
  6. Register on on-demand execution queues
    1. Add the on-demand key binding to the configuration file: messageType = "*.file.text.plain", "extractors."+extractorName
  7. Standardize around python logging
    1. Figure out what to log and what format to follow
  8. Add logstash to docker compose
  9. Add sample input/output to git repository
  10. Add icon for tools catalog to git repository
  11. Add entry to Tools catalog, with icon
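
Steps 5.1, 6.1 and 7 above could be sketched roughly as follows. This is only an illustration that uses the Python standard library, not pyclowder itself: the names Status, binding_keys, make_logger and status_message are hypothetical, the log format is an assumption (step 7.1 is still open), and pairing ncsa.image.ocr with "*.file.text.plain" is just for the example.

```python
import logging
from enum import Enum


class Status(Enum):
    """Hypothetical status constants (step 5.1); only DONE and ERROR are named on this page."""
    DONE = "DONE"
    ERROR = "ERROR"


def binding_keys(extractor_name, message_types):
    """Return the configured message types plus the on-demand key
    "extractors.<extractorName>" described in step 6.1."""
    return list(message_types) + ["extractors." + extractor_name]


def make_logger(extractor_name):
    """Create a logger named after the extractor (step 7).
    The line format is an assumption, not an agreed standard."""
    logger = logging.getLogger(extractor_name)
    if not logger.handlers:  # avoid adding a second handler on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(logging.Formatter(
            "%(asctime)s %(levelname)-7s %(name)s : %(message)s"))
        logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger


def status_message(status, text):
    """Prefix a status message with its constant so levels stay consistent."""
    return "%s: %s" % (status.value, text)


if __name__ == "__main__":
    name = "ncsa.image.ocr"
    log = make_logger(name)
    log.info("binding keys: %s", binding_keys(name, ["*.file.text.plain"]))
    log.info(status_message(Status.DONE, "finished processing file"))
```

Keeping the key list in one helper means the on-demand key always matches the extractor name from the config file, which is also the queue name.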

 

DEPLOYED

| ID (Extractor Name from config file, same as queue name) | Programming Language | Software | OS | Can be Dockerized? | Assigned To | Repo | Author |
|---|---|---|---|---|---|---|---|
| ncsa.image.ocr | Python | Tesseract | Linux | | Rui | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-cv/browse/ocr | |
| ncsa.cv.faces | Python | OpenCV | Linux | | Rui | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-cv/browse/opencv | Liana |
| ncsa.cv.eyes | Python | OpenCV | Linux | | Rui | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-cv/browse/opencv | Liana |
| ncsa.cv.closeups | Python | OpenCV | Linux | | Rui | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-cv/browse/opencv | Liana |
| ncsa.cv.profiles | Python | OpenCV | Linux | | Rui | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-cv/browse/opencv | Liana |
| ncsa.cellprofiler.fluorescentcomet | Python | pymedici (?) | Windows | No | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-cv/browse/cellprofiler | Liana |
| ncsa.cellprofiler.fly | Python | | Windows | No | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-cv/browse/cellprofiler | Liana |
| ncsa.cellprofiler.human | Python | | Windows | No | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-cv/browse/cellprofiler | Liana |
| ncsa.cellprofiler.silvercomet | Python | | Windows | No | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-cv/browse/cellprofiler | Liana |
| ncsa.cellprofiler.speckle | Python | | Windows | No | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-cv/browse/cellprofiler | Liana |
| ncsa.cellprofiler.trackobject | Python | | Windows | No | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-cv/browse/cellprofiler | Liana |
| ncsa.cellprofiler.tumor | Python | | Windows | No | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-cv/browse/cellprofiler | Liana |
| ncsa.cellprofiler.yeast | Python | | Windows | No | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-cv/browse/cellprofiler | Liana |
| ncsa.image.sphog | Python | Matlab, mnist-sphog | Linux | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-cv/browse/handwritten/HandwrittenNumbers | |
| ncsa.image.caltech101 | | | | | | | |
| ncsa.bisque.histogram (notes: disabled) | Python | | Linux | | | | |
| ncsa.bisque.metadata (notes: disabled) | Python | | Linux | | | | |
| census-section-segmentor | Java | | Linux | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-cv/browse/census | Liana, Inna |
| ncsa.cv.river | Python | OpenCV (python), convert (from imagemagick), and Gdal | Linux | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-cv/browse/river | Liana |
| ncsa.geo.shpExtractor | Python | gdal | Linux | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-geo/browse | Jong Lee |
| ncsa.geo.tiffExtractor | Python | gdal | Linux | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-geo/browse | Jong Lee |
| ncsa.image.geotiff | Python | GDAL, Cython, numpy, pygeoprocessing | Linux | | Rui | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-geotiff/browse | Rui, Mostafa Elag |
| ncsa.image.ponddetect | Python | Matlab | Linux | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-maps/browse/feature_detection | Marcus, Ankit |
| ncsa.image.humanpref | Python | Matlab | Linux | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-maps/browse/humanpref | Marcus, Ankit |
| ncsa.xml.greenindexroute, ncsa.csv.greenindexroute | Python | OpenCV | Linux | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-maps/browse/greenroute | Marcus |
| ncsa.image.knn_numerals | Python | OpenCV | Linux | | | | Marcus |
| ncsa.audio.speech2text | Java | CMU Sphinx, ffmpeg, sox | Linux | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-core/browse/audio/speech2text | Marcus |
| ncsa.audio.preview | Python | | | | Inna | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-core/browse/audio/preview | |
| ncsa.nlp.simplelanguage | Python | numpy | | | Inna | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-nlp/browse/SimpleLanguage | Liana |
| ncsa.nlp.simplesummary | Python | Natural Language Toolkit (NLTK) for Python, NLTK Data or at least: nltk.corpus, nltk.stem.porter and nltk.tokenize.punkt | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-nlp/browse/SimpleSummary | Liana |
| ncsa.nlp.SNLPSentiment | Java | Stanford CoreNLP tool, java, maven | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-nlp/browse/SNLP/SNLPSentimentExtractor | Liana, Marcus(?) |
| ncsa.nlp.wordtables | Python | requests, pika, win32com | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-nlp/browse/WordTablesExtractor | Liana |
| siegfried | Python | | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-siegfried/browse | Gregory Jansen |
| ncsa.versus.image | Java | Versus | Linux | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-versus/browse | Kenton, Smruti |
| ncsa.image.preview (note: check if really deployed; there is an extractor in the Hosted VMs list with a similar name) | Python | | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-core/browse/image/preview | Rob, Sandeep |
| ncsa.pdf.preview (note: check if really deployed; there is an extractor in the Hosted VMs list with a similar name) | Python | | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-core/browse/pdf/preview | Rob |
| ncsa.video.preview (note: check if really deployed; there is an extractor in the Hosted VMs list with a similar name) | Python | | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-core/browse/video/preview | Rob |

NOT DEPLOYED

| ID (Extractor Name from config file, same as queue name) | Programming Language | Software | OS | Can be Dockerized? | Assigned To | Repo | Author |
|---|---|---|---|---|---|---|---|
| ncsa.image.digitpy | Python | opencv | | | ... | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-cv/browse/handwritten/SimpleDigitPython | |
| ncsa.cv.pdfimages | | pdfimages, from poppler-utils | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-cv/browse/poppler | |
| ncsa.cv.caltech101 | Python | Matlab and VLFeat | 64-bit Mac OS | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-cv/browse/vlfeat | |
| dbpedia | Python | Natural Language Toolkit (NLTK) and rdflib | | | | | |

| ID (Extractor Name from config file, same as queue name) | Programming Language | Software | OS | Assigned To | Link to repo | Who wrote or worked on the code |
|---|---|---|---|---|---|---|
| dbpedia | Python | Natural Language Toolkit (NLTK) and rdflib | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-dbpedia/browse | Luigi Marini |
| digest | Python | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-digest/browse | |
| ncsa.hpc | Python | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-hpc/browse | |
| LSVA | Java | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-lsva/browse | Liana, Constantinos |
| LSVA integrated | | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-lsva-integrated/browse | |
| ncsa.movieslice | Python | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-movieslice/browse | Sandeep |
| mri2mesh | Python | pymedici, subprocess, logging, os, numpy, shutil, zipfile | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-mri/browse/mri2mesh | Marcus |
| msc-ChemCBCExtractor | Python | requests, pika, openpyxl, xlrd, pymongo | Linux | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-msc/browse/ChemCBCExtractor | Yan |
| msc-IsletExtractor | Python | requests, pika, openpyxl, xlrd, pymongo | Linux | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-msc/browse/IsletExtractor | Yan |
| msc-MonitorExtractor | Python | requests, pika, openpyxl, xlrd, pymongo | Linux | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-msc/browse/MonitorExtractor | Yan |
| ncsa.msc.dailymonitor | Python | requests, pika, openpyxl, xlrd, pymongo | | not used | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-msc/browse/OldMonitorExtractor | Ashwini |
| msc-PhenotypeExtractor | Python | requests, pika, openpyxl, xlrd, pymongo | Linux | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-msc/browse/PhenotypeExtractor | Yan |
| ncsa.nlp.SNLP | Java | Stanford CoreNLP tool, java, maven | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-nlp/browse/SNLP/SNLPExtractor | Liana |
| ncsa.nlp.tika | Python | Tika project page, pymedici | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-nlp/browse/tika | Liana |
| person-detector | Python | MATLAB, FFMPEG, requests and pika | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-person-detector/browse/python | Sandeep |
| ncsa.person-tracker | Python | python, MATLAB, FFMPEG, requests and pika | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-person-tracking/browse/python | Sandeep |
| terra.plantcv | Python | pika, requests, wheel | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-plantcv/browse | Yan |
| medici_PTM_thumbnails | Java | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-ptm/browse/PTMThumbnailExtractor | Constantinos |
| medici_PTM_metadata | Java | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-ptm/browse/PTMMetadataExtractor | Constantinos |
| Name not clear: PtmMetadata(?) | Java | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-ptm/browse/PTMMetadata | Constantinos |
| medici_ptm_maps | Java | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-ptm/browse/PTMMapsExtractor | Constantinos |
| medici_ptm_3d | Java | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-ptm/browse/PTM3DExtractor | Constantinos |
| medici_images_ptm | Java | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-ptm/browse/ImagesPTMExtractor | Constantinos |
| extractors-rabbitmq (look like examples) | | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-rabbitmq/browse | |
| Name not clear: extractors-seabird/ | Scala | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-seabird/browse | Luigi |
| medici_3d_x3d (one of extractors-3d) | Java | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-3d/browse/ObjJSONExtractor | Constantinos |
| medici_3d_obj_merger (one of extractors-3d) | Java | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-3d/browse/OBJMergerExtractor | Constantinos |
| medici_oni (one of extractors-3d) | Java | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-3d/browse/OniExtractor | Constantinos |
| medici_ply_obj (one of extractors-3d) | Java | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-3d/browse/PlyObjExtractor | Constantinos |
| medici_3d_metadata (one of extractors-3d) | Java | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-3d/browse/ThreeDMetadataExtractor | Constantinos |
| medici_x3d_html (one of extractors-3d) | Java | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-3d/browse/X3DhtmlExtractor | Constantinos |
| ncsa.arcgis.landsat7mosaic | Python | ArcGIS | Windows (can be Dockerized: No) | Luigi Marini | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-bd-cz/browse/ndviextractor | Smruti |
| ncsa.arcgis.floodplain | Python | ArcGIS | Windows (can be Dockerized: No) | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-bd-cz/browse/terex_floodplain/config.py | Smruti |
| medici_book | Java | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-books/browse/BookPreviewExtractor | Theerasit Issaranon |
| medici_image_pyramid | Java | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-books/browse/ImagePreviewPyramidExtractor-shebook | Theerasit Issaranon |
| shebook | Java | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-books/browse/SheBookPreviewExtractor/src/BookPreviewExtractor and https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-books/browse/SheBookPreviewExtractor/src/bookpreviewextractor | Theerasit Issaranon |
| lsva-cedd | Java | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-cedd/browse | Constantinos |
| ncsa.cinemetrics | Python | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-cinemetrics/browse | Constantinos |
| ncsa.image.metadata | Python | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-core/browse/image/metadata | Max, Rob |
| ncsa.debod.segmentor | | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/DEBOD/repos/extractors-cellsegmentor/browse | |
| ncsa.image.dmp | | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/DEBOD/repos/extractors-debod/browse and https://opensource.ncsa.illinois.edu/bitbucket/projects/DEBOD/repos/extractors-dmp/browse | |
| ncsa.image.sphog.debod | | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/DEBOD/repos/extractors-handwrittendecimals/browse | |
| ncsa.image.iarp_remove_circle | | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/IARP/repos/image_fetcher/browse/extractors/remove_circle | Marcus |
| ncsa.cv.meangrey | Python | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/IARP/repos/image_fetcher/browse/extractors/mean_grey | Marcus |