This page tracks the refactoring of the existing extractors. The original wiki page, Hosted VMs, is still used for deployments.

 

As we figure out who's working on what, please start with the following steps for the extractor(s) you chose:

Steps to take for every extractor in this list:

  1. Docker containers
  2. JSONLD
  3. Extractor info registration
  4. Use pyclowder (for python extractors)
  5. Add status messages to all extractors and fix level granularity
    1. Make status constants (DONE, ERROR)
    2. Arcgis multiprocessing extractor
  6. Register with on-demand execution queues
    1. Add the on-demand key binding to the configuration file: messageType = "*.file.text.plain", "extractors."+extractorName
  7. Standardize around python logging
    1. Figure out what to log and what format to follow
  8. Add logstash to docker compose
  9. Add sample input/output to git repository
  10. Add icon for tools catalog to git repository
  11. Add entry to Tools catalog, with icon
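
Steps 5 to 7 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the pyclowder API: the names ExtractorStatus, routing_keys, status_message and make_logger are hypothetical, the log format is a placeholder until step 7.1 is decided, and the extractor name and message type are taken from the examples on this page.

```python
import logging
from enum import Enum


class ExtractorStatus(Enum):
    """Status constants from step 5.1 (hypothetical class name)."""
    DONE = "DONE"
    ERROR = "ERROR"


def routing_keys(extractor_name, message_types):
    """Return the queue bindings for one extractor.

    The regular message types come first, followed by the on-demand
    key "extractors.<extractorName>" described in step 6.1.
    """
    return list(message_types) + ["extractors." + extractor_name]


def status_message(status, detail):
    """Format a status update so every extractor reports the same way."""
    return "%s: %s" % (status.value, detail)


def make_logger(extractor_name, level=logging.INFO):
    """Create a logger named after the extractor.

    The format string is an assumption; replace it once the team
    agrees on what to log and which format to follow.
    """
    logger = logging.getLogger(extractor_name)
    if not logger.handlers:  # avoid stacking handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(logging.Formatter(
            "%(asctime)s %(levelname)-8s %(name)s : %(message)s"))
        logger.addHandler(handler)
    logger.setLevel(level)
    return logger


if __name__ == "__main__":
    name = "ncsa.image.ocr"
    log = make_logger(name)
    log.info("bindings: %s", routing_keys(name, ["*.file.text.plain"]))
    log.info(status_message(ExtractorStatus.DONE, "processed sample file"))
```

Keeping the constants, the bindings and the logger setup in one shared module would let each extractor import them instead of redefining its own strings.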

 

| ID (Extractor Name from config file, same as queue name) | Programming Language | Software | OS | Can be Dockerized? | Assigned To | Repo | Author |
| DEPLOYED | | | | | | | |
| ncsa.image.ocr | Python | Tesseract | Linux | | Rui | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-cv/browse/ocr | |
| ncsa.cv.faces | Python | OpenCV | Linux | | Rui | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-cv/browse/opencv | Liana |
| ncsa.cv.eyes | Python | OpenCV | Linux | | Rui | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-cv/browse/opencv | Liana |
| ncsa.cv.closeups | Python | OpenCV | Linux | | Rui | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-cv/browse/opencv | Liana |
| ncsa.cv.profiles | Python | OpenCV | Linux | | Rui | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-cv/browse/opencv | Liana |
| ncsa.cellprofiler.fluorescentcomet | Python | pymedici (?) | Windows | No | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-cv/browse/cellprofiler | Liana |
| ncsa.cellprofiler.fly | Python | | Windows | No | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-cv/browse/cellprofiler | Liana |
| ncsa.cellprofiler.human | Python | | Windows | No | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-cv/browse/cellprofiler | Liana |
| ncsa.cellprofiler.silvercomet | Python | | Windows | No | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-cv/browse/cellprofiler | Liana |
| ncsa.cellprofiler.speckle | Python | | Windows | No | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-cv/browse/cellprofiler | Liana |
| ncsa.cellprofiler.trackobject | Python | | Windows | No | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-cv/browse/cellprofiler | Liana |
| ncsa.cellprofiler.tumor | Python | | Windows | No | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-cv/browse/cellprofiler | Liana |
| ncsa.cellprofiler.yeast | Python | | Windows | No | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-cv/browse/cellprofiler | Liana |
| ncsa.image.sphog | Python | Matlab, mnist-sphog | Linux | | | | |

ID (Extractor Name from config file,

same as queue name)

Programming

Language

SoftwareOSAssigned ToLink to repoWho wrote or worked on the codeDEPLOYED      ncsa.image.ocrPythonTesseractLinuxRuiocr 

ncsa.cv.faces

PythonOpenCVLinuxInna (may be?)
https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-
cv/browse/opencvLiana
cv/browse/handwritten/HandwrittenNumbers 

ncsa.image.caltech101

      
ncsa.bisque.histogram (notes: disabled)Python Linux    
ncsa.bisque.metadata (notes: disabled)Python Linux    
census-section-segmentorJava Linux 

ncsa.cv.eyes

PythonOpenCVLinuxInnahttps://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-cv/browse/opencvLiana

ncsa.cv.closeups

PythonOpenCVLinuxInna
https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-cv/browse/
opencv
censusLiana, Inna
ncsa.cv.
profiles
river
Python
 PythonOpenCV (python), convert (from imagemagick), and GdalLinux 
Inna
https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-cv/browse/
opencv
riverLiana
ncsa.
cellprofiler
geo.
fluorescentcomet
shpExtractorPython
pymedici (question)
gdal
Windows
Linux https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-
cv
geo/browse
/cellprofiler
Liana
Jong Lee
ncsa.
cellprofiler
geo.
fly
tiffExtractorPython
 
gdal
Windows
Linux https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-
cv
geo/browse
/cellprofiler
Liana
Jong Lee
ncsa.
cellprofiler
image.
human
geotiffPython
 

GDAL, Cython, numpy,
pygeoprocessing

Linux
Windows
 Ruihttps://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-
cv
geotiff/browse
/cellprofiler
Rui, Mostafa Elag
Liana

ncsa.

cellprofiler

image.

silvercomet

ponddetect

Python
 
Matlab
Windows
Linux https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-
cv
maps/browse/
cellprofiler
feature_detectionMarcus, Ankit
Liana
ncsa.
cellprofiler
image.
speckle
humanprefPython
 
Matlab
Windows
Linux https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-
cv
maps/browse/
cellprofiler
humanpref
Liana
Marcus, Ankit

ncsa.xml.greenindexroute, ncsa.

cellprofiler

csv.

trackobject

greenindexroute

Python
 
OpenCV
Windows
Linux https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-
cv/browse/cellprofiler
maps/browse/greenrouteMarcus

ncsa.image.knn_numerals

PythonOpenCVLinux  Marcus
Liana

ncsa.

cellprofiler

audio.

tumor

speech2text

Python Windows
JavaCMU Sphinx, ffmpeg, soxLinux 
 
https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-
cv
core/browse/audio/
cellprofiler
speech2text
Liana
Marcus
ncsa.
cellprofiler
audio.
yeast
previewPython 
Windows
  Innahttps://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-
cv
core/browse/audio/
cellprofiler
preview
Liana
 
ncsa.
image
nlp.
sphog
simplelanguagePythonnumpy 
Matlab, mnist-sphog 
 
Linux
Innahttps://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-
cv
nlp/browse/
handwritten/HandwrittenNumbers
SimpleLanguage
 
Liana
ncsa.
image
nlp.
caltech101
simplesummary
      ncsa.bisque.histogram (notes: disabled)Python Linux   ncsa.bisque.metadata (notes: disabled)Python Linux   census-section-segmentorJava Linux 
Python

Natural Language Toolkit (NLTK) for Python, NLTK Data or at least:

 nltk.corpus,nltk.stem.porter and nltk.tokenize.punkt.

  https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-
cv
nlp/browse/
census
SimpleSummaryLiana
, Inna
ncsa.
cv
nlp.
river
SNLPSentiment
 Python
Java
OpenCV (python), convert (from imagemagick), and GdalLinux
 Stanford CoreNLP tool, java, maven  
 
https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-
cv
nlp/browse/SNLP/
river
SNLPSentimentExtractorLiana, Marcus(?)
ncsa.
geo
nlp.
shpExtractor
wordtablesPython
gdal
 requestspikawin32com  
Linux
 https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-
geo
nlp/browse/WordTablesExtractor
Jong Leencsa.geo.tiffExtractor
Liana
siegfriedPython 
gdal
 
Linux
  https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-
geo
siegfried/browse
Jong Lee
Gregory Jansen
ncsa.versus.image
.ponddetect
Python
Java
Matlab
VersusLinux 
Marcus
https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-
maps
versus/browse
/feature_detection
Marcus
Kenton,
Ankit
Smruti
ncsa.image.
humanpref
preview (note: check if really deployed. there is an extractor in Hosted VMs list with a similar name.)Python
Matlab
  
Linux
 
Marcus
 https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-
maps
core/browse/image/
humanpref
preview
Marcus
Rob,
Ankit
Sandeep
ncsa.
xml.greenindexroute, ncsa.csv.greenindexroute
pdf.preview (note: check if really deployed. there is an extractor in Hosted VMs list with a similar name.)Python    
PythonOpenCVLinuxMarcus
https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-
maps
core/browse
/greenrouteMarcus
/pdf/previewRob

ncsa.image.knn_numerals

PythonOpenCVLinuxMarcus Marcus
ncsa.
audio.speech2textJavaCMU Sphinx, ffmpeg, soxLinuxMarcus Marcus
video.preview (note: check if really deployed. there is an extractor in Hosted VMs list with a similar name.)Python  
ncsa.nlp.simplelanguagePythonnumpy
  https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-
nlp
core/browse/
SimpleLanguage
video/previewRob
NOT DEPLOYED
Liana
ncsa.
nlp
image.
simplesummary
digitpyPython

Natural Language Toolkit (NLTK) for Python, NLTK Data or at least:

opencv 
 nltk.corpus,nltk.stem.porter and nltk.tokenize.punkt.
  https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-
nlp
cv/browse/handwritten/
SimpleSummary
SimpleDigitPython
Liana
 
ncsa.
nlp
cv.
SNLPSentiment
pdfimages
Java Stanford CoreNLP tool, java, maven
 pdfimages, from poppler-utils   https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-
nlp
cv/browse/
SNLP/SNLPSentimentExtractorLiana, Marcus(?)
poppler 
ncsa.
nlp
cv.
wordtables
caltech101Python
 requestspikawin32com
Matlab and VLFeat 64-bit Mac OS   https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-
nlp
cv/browse/
WordTablesExtractor
vlfeat
Liana
 
siegfried
dbpediaPython Natural Language Toolkit (NLTK) and rdflib.  https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-
siegfried
dbpedia/browse
Gregory Jansen
digestPython 
ncsa.versus.imageJava
   https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-
versus
digest/browse
Kenton, Smruti
 
 
ncsa.hpc
 
Python    
              NOT DEPLOYED 
https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-hpc/browse
LSVAJava
 
    
ncsa.image.digitpy (notes: not in the Wiki page)Pythonopencv  
https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-
cv
lsva/browse
/handwritten/SimpleDigitPython ncsa.cv.pdfimages (not in the wiki page) pdfimages, from poppler-utils
Liana, Constantinos
LSVA integrated    https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-lsva-
cv
integrated/browse
/poppler
 
ncsa.
cv.caltech101
movieslicePython 
Matlab and VLFeat
 
64-bit Mac OS 
  https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-
cv
movieslice/browse
/vlfeat
 
Sandeep
dbpedia
mri2meshPython
 Natural Language Toolkit (NLTK) and rdflib.
pymedici, subprocess, logging, os, numpy, shutil, zipfile  
Luigi Marini
 https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-
dbpedia
mri/browse/mri2mesh
Luigi Marini
Marcus
msc-ChemCBCExtractor
digest
Pythonrequests, pika, openpyxl, xlrd, pymongoLinux 
 
https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-
digest
msc/browse/ChemCBCExtractor
 
Yan
msc-IsletExtractor
ncsa.image.geotiff
Python
GDAL
requests,
Cython
pika,
numpy,
pygeoprocessing,
pika,
requests
openpyxl, xlrd, pymongoLinux 
LinuxRui
https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-
geotiffncsa.hpcPython  
msc/browse
Rui, Mostafa Elag
/IsletExtractorYan
msc-MonitorExtractorPythonrequests, pika, openpyxl, xlrd, pymongoLinux 
 
https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-
hpc
msc/browse
 LSVA
/MonitorExtractorYan
ncsa.msc.dailymonitorPythonrequests, pika, openpyxl, xlrd, pymongo
Java
  
 
not usedhttps://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-
lsva
msc/browse
 LSVA integrated
/OldMonitorExtractorAshwini
msc-PhenotypeExtractorPython

requests, pika, openpyxl, xlrd, pymongo

Linux
 
 
 
https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-
lsva-integrated
msc/browse/PhenotypeExtractor
 
Yan
ncsa.nlp.
movieslicePython
SNLPJava Stanford CoreNLP tool, java, maven   https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-
movieslice
nlp/browse/SNLP/SNLPExtractor
Sandeep
Liana
ncsa.nlp.tika
mri2mesh
Python
pymedici, subprocess, logging, os, numpy, shutil, zipfile
 Tika project page, pymedici  https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-
mri
nlp/browse/
mri2mesh
tika
Marcus
Liana
msc
person-
ChemCBCExtractor
detectorPython MATLAB, FFMPEG, requests and pika   https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-person-
msc
detector/browse/
ChemCBCExtractor
python
Yan
Sandeep
msc
ncsa.person-
IsletExtractor
trackerPython
requests, pika, openpyxl, xlrd, pymongo
python, MATLAB, FFMPEG requests and pika   https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-person-
msc
tracking/browse/
IsletExtractor
python
Yan
Sandeep
terra.plantcv
msc-MonitorExtractor
Python

pika
requests

, pika, openpyxl, xlrd, pymongo


wheel

   https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-
msc
plantcv/browse
/MonitorExtractor
Yan
ncsa.msc.dailymonitorPythonrequests, pika, openpyxl, xlrd, pymongo
medici_PTM_thumbnailsJava    https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-
msc
ptm/browse/
OldMonitorExtractorAshwinimsc-PhenotypeExtractorPythonrequests, pika, openpyxl, xlrd, pymongo
PTMThumbnailExtractorConstantinos
medici_PTM_metadataJava    https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-
msc
ptm/browse/
PhenotypeExtractorYanncsa.nlp.SNLPJava Stanford CoreNLP tool, java, maven
PTMMetadataExtractorConstantinos

Name not clear

PtmMetadata(?)

Java    https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-
nlp
ptm/browse/
SNLP/SNLPExtractorLianancsa.nlp.tikaPython Tika project page, pymedici
PTMMetadataConstantinos
medici_ptm_mapsJava    https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-
nlp
ptm/browse/
tikaLianaperson-detectorPython MATLAB, FFMPEG, requests and pika
PTMMapsExtractorConstantinos
medici_ptm_3dJava    https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-
person-detector
ptm/browse/
pythonSandeepncsa.person-trackerPythonpython, MATLAB, FFMPEG requests and pika
PTM3DExtractorConstantinos
medici_images_ptmJava    https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-
person-tracking
ptm/browse/
python
ImagesPTMExtractor
Sandeep
Constantinos

extractors-rabbitmq

(look like examples)

   
terra.plantcvPythonpika
requests
wheel
  https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-
plantcv
rabbitmq/browse
Yanmedici_PTM_thumbnails
 
Name not clear extractors-seabird/Scala 
Java
   https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-
ptm
seabird/browse
/PTMThumbnailExtractor
Constantinos
Luigi
medici_
PTM_metadata
3d_x3d (one of extractors-3d)Java    https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-
ptm
3d/browse/
PTMMetadataExtractorPtmMetadata(?
ObjJSONExtractorConstantinos

Name not clear

medici_3d_obj_merger (one of extractors-3d)Java    https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-
ptm
3d/browse/
PTMMetadata
OBJMergerExtractorConstantinos
medici_
ptm_maps
oni (one of extractors-3d)Java    https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-
ptm
3d/browse/
PTMMapsExtractor
OniExtractorConstantinos
medici_
ptm_3d
ply_obj (one of extractors-3d)Java    https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-
ptm
3d/browse/
PTM3DExtractor
PlyObjExtractorConstantinos
medici_
images_ptm
3d_metadata (one of extractors-3d) Java    https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-
ptm
3d/browse/
ImagesPTMExtractor
ThreeDMetadataExtractorConstantinos
medici_x3d_html (one of extractors-
rabbitmq(look like examples)
3d) Java    https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-
rabbitmq
3d/browse
 
/X3DhtmlExtractorConstantinos
ncsa.arcgis.landsat7mosaicPythonArcGISWindowsNo
Name not clear extractors-seabird/Scala   
https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-bd-
seabird
cz/browse
Luigimedici_3d_x3d (one of extractors-3d)Java  
/ndviextractorSmruti
ncsa.arcgis.floodplainPythonArcGISWindowsNohttps://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-
3d/browse/ObjJSONExtractorConstantinosmedici_3d_obj_merger (one of extractors-3d)
bd-cz/browse/terex_floodplain/config.pySmruti
medici_bookJava 
Java
   https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-
3d
books/browse/
OBJMergerExtractor
BookPreviewExtractor
Constantinos
Theerasit Issaranon
medici
_oni (one of extractors-3d)
_image_pyramidJava    https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-
3d
books/browse/
OniExtractorConstantinosmedici_ply_obj (one of extractors-3d)
ImagePreviewPyramidExtractor-shebookTheerasit Issaranon
shebookJava 
Java
   

https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-

3d

books/browse

/PlyObjExtractorConstantinosmedici_3d_metadata (one of extractors-3d) Java 

/SheBookPreviewExtractor/src/BookPreviewExtractor

 

 

https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-

3d

books/browse/SheBookPreviewExtractor/src/

ThreeDMetadataExtractor

bookpreviewextractor

Constantinosmedici_x3d_html (one of extractors-3d) 
Theerasit Issaranon
lsva-ceddJava    https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-
3d
cedd/browse
/X3DhtmlExtractor
Constantinos
ncsa.
arcgis.landsat7mosaic
cinemetricsPython    https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-
bd-cz
cinemetrics/browse
/ndviextractor
Smruti
Constantinos
ncsa.
arcgis
image.
floodplain
metadataPython    https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-
bd-cz
core/browse/
terex_floodplain/config.py
image/metadataMax. Rob
ncsa.debod.segmentor  
Smrutimedici_bookJava
   https://opensource.ncsa.illinois.edu/bitbucket/projects/
CATS
DEBOD/repos/extractors-
books
cellsegmentor/browse
/BookPreviewExtractor
Theerasit Issaranonmedici_image_pyramid
 
ncsa.image.dmp  
Java
   

https://opensource.ncsa.illinois.edu/bitbucket/projects/

CATS

DEBOD/repos/extractors-

books

debod/browse

/ImagePreviewPyramidExtractor-shebookTheerasit IssaranonshebookJava   

https://opensource.ncsa.illinois.edu/bitbucket/projects/

CATS

DEBOD/repos/extractors-

books/browse/SheBookPreviewExtractor/src/BookPreviewExtractor

dmp/browse

 
ncsa.image.sphog.debod     https://opensource.ncsa.illinois.edu/bitbucket/projects/
CATS
DEBOD/repos/extractors-
books/browse/SheBookPreviewExtractor/src/bookpreviewextractor
handwrittendecimals/browse 

ncsa.image.iarp_remove_circle

  
Theerasit Issaranonlsva-ceddJava
   https://opensource.ncsa.illinois.edu/bitbucket/projects/
CATS
IARP/repos/image_fetcher/browse/extractors
-cedd
/
browse
remove_circle
Constantinos
Marcus
ncsa.cv.
cinemetrics
meangrey 
Python
    https://opensource.ncsa.illinois.edu/bitbucket/projects/
CATS
IARP/repos/image_fetcher/browse/extractors
-cinemetrics
/
browse
mean_grey
Constantinos
Marcus