This page tracks the refactoring of the existing extractors. The original wiki page, Hosted VMs, is still used for deployments.

 

While we figure out who is working on what, please start with the steps below for the extractor(s) you chose. They apply to every extractor in this list:

  1. Docker containers
  2. JSONLD
  3. Extractor info registration
  4. Use pyclowder (for python extractors)
  5. Add status messages to all extractors and fix level granularity
    1. Make status constants (DONE, ERROR)
    2. Arcgis multiprocessing extractor
  6. Register on the on-demand execution queues
    1. Add the on-demand key binding to the configuration file: messageType = "*.file.text.plain", "extractors."+extractorName
  7. Standardize around python logging
    1. Figure out what to log and what format to follow
  8. Add logstash to docker compose
  9. Add sample input/output to git repository
  10. Add icon for tools catalog to git repository
  11. Add entry to Tools catalog, with icon
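
Steps 5 to 7 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the pyclowder API: the names Status, routing_keys, get_logger and status_message are hypothetical, the log format is a placeholder until step 7.1 is settled, and only the DONE and ERROR constants and the two binding keys come from the list above.

```python
import logging
from enum import Enum


class Status(Enum):
    """Status constants from step 5.1 (hypothetical class name)."""
    DONE = "DONE"
    ERROR = "ERROR"


def routing_keys(extractor_name, message_type="*.file.text.plain"):
    """Return the regular binding plus the on-demand binding (step 6.1)."""
    return [message_type, "extractors." + extractor_name]


def get_logger(extractor_name):
    """Configure a standard-library logger once per extractor (step 7)."""
    logger = logging.getLogger(extractor_name)
    if not logger.handlers:
        # Assumed format; replace once the team agrees on what to log.
        formatter = logging.Formatter(
            "%(asctime)s %(levelname)-7s %(name)s : %(message)s")
        handler = logging.StreamHandler()
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger


def status_message(status, detail):
    """Build a status string that always starts with one of the constants."""
    return "{}: {}".format(status.value, detail)


if __name__ == "__main__":
    name = "ncsa.nlp.simplelanguage"
    log = get_logger(name)
    log.info("binding to %s", routing_keys(name))
    log.info(status_message(Status.DONE, "sample file processed"))
```

Keeping the constants in one place means every extractor reports the same status strings, and deriving the on-demand key from the extractor name keeps it in sync with the queue name in the ID column below.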

 

ID (Extractor Name from config file, same as queue name) | Programming Language | Software | OS | Can be Dockerized? | Assigned To | Repo | Author

DEPLOYED

ncsa.image.ocr | Python | Tesseract | Linux | | Rui | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-cv/browse/ocr |
ncsa.cv.faces | Python | OpenCV | Linux | | Rui | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-cv/browse/opencv | Liana
ncsa.cv.eyes | Python | OpenCV | Linux | | Rui | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-cv/browse/opencv | Liana
ncsa.cv.closeups | Python | OpenCV | Linux | | Rui | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-cv/browse/opencv | Liana
ncsa.cv.profiles | Python | OpenCV | Linux | | Rui | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-cv/browse/opencv | Liana
ncsa.cellprofiler.fluorescentcomet | Python | pymedici (?) | Windows | No | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-cv/browse/cellprofiler | Liana
ncsa.cellprofiler.fly | Python | | Windows | No | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-cv/browse/cellprofiler | Liana
ncsa.cellprofiler.human | Python | | Windows | No | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-cv/browse/cellprofiler | Liana
ncsa.cellprofiler.silvercomet | Python | | Windows | No | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-cv/browse/cellprofiler | Liana
ncsa.cellprofiler.speckle | Python | | Windows | No | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-cv/browse/cellprofiler | Liana
ncsa.cellprofiler.trackobject | Python | | Windows | No | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-cv/browse/cellprofiler | Liana
ncsa.cellprofiler.tumor | Python | | Windows | No | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-cv/browse/cellprofiler | Liana
ncsa.cellprofiler.yeast | Python | | Windows | No | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-cv/browse/cellprofiler | Liana
ncsa.image.sphog | Python | Matlab, mnist-sphog | Linux | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-cv/browse/handwritten/HandwrittenNumbers |
ncsa.image.caltech101 | | | | | | |
ncsa.bisque.histogram (notes: disabled) | Python | | Linux | | | |
ncsa.bisque.metadata (notes: disabled) | Python | | Linux | | | |
census-section-segmentor | Java | | Linux | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-cv/browse/census | Liana, Inna
ncsa.cv.river | Python | OpenCV (python), convert (from imagemagick), and Gdal | Linux | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-cv/browse/river | Liana
ncsa.geo.shpExtractor | Python | gdal | Linux | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-geo/browse | Jong Lee
ncsa.geo.tiffExtractor | Python | gdal | Linux | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-geo/browse | Jong Lee
ncsa.image.geotiff | Python | GDAL, Cython, numpy, pygeoprocessing | Linux | | Rui | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-geotiff/browse | Rui, Mostafa Elag
ncsa.image.ponddetect | Python | Matlab | Linux | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-maps/browse/feature_detection | Marcus, Ankit
ncsa.image.humanpref | Python | Matlab | Linux | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-maps/browse/humanpref | Marcus, Ankit
ncsa.xml.greenindexroute, ncsa.csv.greenindexroute | Python | OpenCV | Linux | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-maps/browse/greenroute | Marcus
ncsa.image.knn_numerals | Python | OpenCV | Linux | | | | Marcus
ncsa.audio.speech2text | Java | CMU Sphinx, ffmpeg, sox | Linux | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-core/browse/audio/speech2text | Marcus
ncsa.audio.preview | Python | | | | Inna | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-core/browse/audio/preview |
ncsa.nlp.simplelanguage | Python | numpy | | | Inna | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-nlp/browse/SimpleLanguage | Liana
ncsa.nlp.simplesummary | Python | Natural Language Toolkit (NLTK) for Python, NLTK Data or at least: nltk.corpus, nltk.stem.porter and nltk.tokenize.punkt. | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-nlp/browse/SimpleSummary | Liana
ncsa.nlp.SNLPSentiment | Java | Stanford CoreNLP tool, java, maven | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-nlp/browse/SNLP/SNLPSentimentExtractor | Liana, Marcus(?)
ncsa.nlp.wordtables | Python | requests, pika, win32com | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-nlp/browse/WordTablesExtractor | Liana
siegfried | Python | | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-siegfried/browse | Gregory Jansen
ncsa.versus.image | Java | Versus | Linux | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-versus/browse | Kenton, Smruti
ncsa.image.preview (note: check if really deployed; there is an extractor in the Hosted VMs list with a similar name) | Python | | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-core/browse/image/preview | Rob, Sandeep
ncsa.pdf.preview (note: check if really deployed; there is an extractor in the Hosted VMs list with a similar name) | Python | | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-core/browse/pdf/preview | Rob
ncsa.video.preview (note: check if really deployed; there is an extractor in the Hosted VMs list with a similar name) | Python | | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-core/browse/video/preview | Rob

NOT DEPLOYED

ncsa.image.digitpy | Python | opencv | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-cv/browse/handwritten/SimpleDigitPython |
ncsa.cv.pdfimages | | pdfimages, from poppler-utils | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-cv/browse/poppler |
ncsa.cv.caltech101 | Python | Matlab and VLFeat | 64-bit Mac OS | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-cv/browse/vlfeat |
dbpedia | Python | Natural Language Toolkit (NLTK) and rdflib. | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-dbpedia/browse |
digest | Python | | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-digest/browse |
ncsa.hpc | Python | | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-hpc/browse |
LSVA | Java | | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-lsva/browse | Liana, Constantinos
LSVA integrated | | | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-lsva-integrated/browse |
ncsa.movieslice | Python | | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-movieslice/browse | Sandeep
mri2mesh | Python | pymedici, subprocess, logging, os, numpy, shutil, zipfile | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-mri/browse/mri2mesh | Marcus
msc-ChemCBCExtractor | Python | requests, pika, openpyxl, xlrd, pymongo | Linux | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-msc/browse/ChemCBCExtractor | Yan
msc-IsletExtractor | Python | requests, pika, openpyxl, xlrd, pymongo | Linux | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-msc/browse/IsletExtractor | Yan
msc-MonitorExtractor | Python | requests, pika, openpyxl, xlrd, pymongo | Linux | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-msc/browse/MonitorExtractor | Yan
ncsa.msc.dailymonitor | Python | requests, pika, openpyxl, xlrd, pymongo | | | not used | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-msc/browse/OldMonitorExtractor | Ashwini
msc-PhenotypeExtractor | Python | requests, pika, openpyxl, xlrd, pymongo | Linux | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-msc/browse/PhenotypeExtractor | Yan
ncsa.nlp.SNLP | Java | Stanford CoreNLP tool, java, maven | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-nlp/browse/SNLP/SNLPExtractor | Liana
ncsa.nlp.tika | Python | Tika project page, pymedici | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-nlp/browse/tika | Liana
person-detector | Python | MATLAB, FFMPEG, requests and pika | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-person-detector/browse/python | Sandeep
ncsa.person-tracker | Python | python, MATLAB, FFMPEG, requests and pika | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-person-tracking/browse/python | Sandeep
terra.plantcv | Python | pika, requests, wheel | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-plantcv/browse | Yan
medici_PTM_thumbnails | Java | | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-ptm/browse/PTMThumbnailExtractor | Constantinos
medici_PTM_metadata | Java | | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-ptm/browse/PTMMetadataExtractor | Constantinos
Name not clear: PtmMetadata(?) | Java | | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-ptm/browse/PTMMetadata | Constantinos
medici_ptm_maps | Java | | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-ptm/browse/PTMMapsExtractor | Constantinos
medici_ptm_3d | Java | | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-ptm/browse/PTM3DExtractor | Constantinos
medici_images_ptm | Java | | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-ptm/browse/ImagesPTMExtractor | Constantinos
extractors-rabbitmq (look like examples) | | | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-rabbitmq/browse |
Name not clear: extractors-seabird/ | Scala | | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-seabird/browse | Luigi
medici_3d_x3d (one of extractors-3d) | Java | | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-3d/browse/ObjJSONExtractor | Constantinos
medici_3d_obj_merger (one of extractors-3d) | Java | | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-3d/browse/OBJMergerExtractor | Constantinos
medici_oni (one of extractors-3d) | Java | | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-3d/browse/OniExtractor | Constantinos
medici_ply_obj (one of extractors-3d) | Java | | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-3d/browse/PlyObjExtractor | Constantinos
medici_3d_metadata (one of extractors-3d) | Java | | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-3d/browse/ThreeDMetadataExtractor | Constantinos
medici_x3d_html (one of extractors-3d) | Java | | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-3d/browse/X3DhtmlExtractor | Constantinos
ncsa.arcgis.landsat7mosaic | Python | ArcGIS | Windows | No | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-bd-cz/browse/ndviextractor | Smruti
ncsa.arcgis.floodplain | Python | ArcGIS | Windows | No | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-bd-cz/browse/terex_floodplain/config.py | Smruti
medici_book | Java | | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-books/browse/BookPreviewExtractor | Theerasit Issaranon
medici_image_pyramid | Java | | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-books/browse/ImagePreviewPyramidExtractor-shebook | Theerasit Issaranon
shebook | Java | | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-books/browse/SheBookPreviewExtractor/src/BookPreviewExtractor ; https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-books/browse/SheBookPreviewExtractor/src/bookpreviewextractor | Theerasit Issaranon
lsva-cedd | Java | | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-cedd/browse | Constantinos
ncsa.cinemetrics | Python | | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-cinemetrics/browse |
ncsa.image.metadata | Python | | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-core/browse/image/metadata | Max, Rob
ncsa.debod.segmentor | | | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/DEBOD/repos/extractors-cellsegmentor/browse |
ncsa.image.dmp | | | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/DEBOD/repos/extractors-debod/browse ; https://opensource.ncsa.illinois.edu/bitbucket/projects/DEBOD/repos/extractors-dmp/browse |
ncsa.image.sphog.debod | | | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/DEBOD/repos/extractors-handwrittendecimals/browse |
ncsa.image.iarp_remove_circle | | | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/IARP/repos/image_fetcher/browse/extractors/remove_circle | Marcus
ncsa.cv.meangrey | | | | | | https://opensource.ncsa.illinois.edu/bitbucket/projects/IARP/repos/image_fetcher/browse/extractors/mean_grey | Marcus