This page tracks the refactoring of the existing extractors. The original wiki page, Hosted VMs, is still used for deployments.

As we figure out who's working on what, please start with the following steps for the extractor(s) you chose:

Steps to take for every extractor in this list:

  1. Docker containers
  2. JSON-LD
  3. Extractor info registration
  4. Use pyclowder (for Python extractors)
  5. Add status messages to all extractors and fix level granularity
    1. Make status constants (DONE, ERROR)
    2. ArcGIS multiprocessing extractor
  6. Register on on-demand execution queues
    1. Add the on-demand key binding to the configuration file: messageType = "*.file.text.plain", "extractors."+extractorName
  7. Standardize around Python logging
    1. Figure out what to log and what format to follow
  8. Add Logstash to Docker Compose
  9. Add sample input/output to the git repository
  10. Add an icon for the Tools catalog to the git repository
  11. Add an entry to the Tools catalog, with icon
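
Steps 5.1 and 6.1 above can be sketched roughly as follows. This is only an illustration, not the agreed implementation: the `ExtractorStatus` enum, the `status_message` and `on_demand_message_types` helpers, and the extractor name `ncsa.example.wordcount` are all hypothetical. Only the two status names (DONE, ERROR) and the `"extractors."+extractorName` binding come from the list above.

```python
from enum import Enum


class ExtractorStatus(Enum):
    """Shared status constants (hypothetical names; the list only asks for DONE and ERROR)."""
    DONE = "DONE"
    ERROR = "ERROR"


def status_message(status, detail):
    """Build a status string with a constant prefix so consumers can filter on it."""
    return "{}: {}".format(status.value, detail)


def on_demand_message_types(extractor_name, content_keys):
    """Return the content-type keys plus the on-demand key "extractors.<name>"."""
    return list(content_keys) + ["extractors." + extractor_name]


# Hypothetical extractor name, used only for illustration.
extractorName = "ncsa.example.wordcount"
messageType = on_demand_message_types(extractorName, ["*.file.text.plain"])

print(messageType)
print(status_message(ExtractorStatus.DONE, "word count written"))
```

Keeping the on-demand key derived from the extractor name means the queue name and the binding cannot drift apart when an extractor is renamed.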

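For step 7, one possible starting point is a single shared logger factory built on the standard `logging` module. The format string, the `get_extractor_logger` helper, and the extractor name below are assumptions for discussion; what to log and which format to follow is still open, as noted in step 7.1.

```python
import logging

# Assumed format: timestamp, level, extractor (logger) name, message.
LOG_FORMAT = "%(asctime)s %(levelname)-7s %(name)s : %(message)s"


def get_extractor_logger(extractor_name, level=logging.INFO):
    """Return a logger named after the extractor, configured only once."""
    logger = logging.getLogger(extractor_name)
    logger.setLevel(level)
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(logging.Formatter(LOG_FORMAT))
        logger.addHandler(handler)
    return logger


# Hypothetical extractor name and file id, used only for illustration.
logger = get_extractor_logger("ncsa.example.wordcount")
logger.info("Started processing file %s", "example-file-id")
```

Naming the logger after the extractor ID keeps log lines traceable to a queue name once they are shipped to Logstash.
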
 

ID (extractor name from config file, same as queue name) | Programming Language | Software | OS | Can be Dockerized? | Assigned To | Repo | Author
DEPLOYED
ncsa.image.ocr | Python | Tesseract | Linux | | Rui | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-cv/browse/ocr |
ncsa.cv.faces | Python | OpenCV | Linux | | Rui | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-cv/browse/opencv | Liana
ncsa.cv.eyes | Python | OpenCV | Linux | | Rui | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-cv/browse/opencv | Liana

ncsa.cv.closeups

DEPLOYED      ncsa.image.ocrPythonTesseractLinuxRuiocr ncsa.cv.faces

PythonOpenCVLinux
Inna (may be?)
 Ruihttps://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-cv/browse/opencvLiana

ncsa.cv.

eyes

profiles

PythonOpenCVLinux 
Inna
Ruihttps://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-cv/browse/opencvLiana

ncsa.

cv

cellprofiler.

closeups

fluorescentcomet

Pythonpymedici (question)
OpenCV
Windows
Linux
No
Inna
 https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-cv/browse/
opencv
cellprofilerLiana

ncsa.

cv

cellprofiler.

profiles

fly

Python 
OpenCV
Windows
Linux
No
Inna
 https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-cv/browse/
opencv
cellprofilerLiana

ncsa.cellprofiler.

fluorescentcomet

human

Python
pymedici (question)
 WindowsNo https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-cv/browse/cellprofilerLiana

ncsa.cellprofiler.

fly

silvercomet

Python WindowsNo https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-cv/browse/cellprofilerLiana

ncsa.cellprofiler.

human

speckle

Python WindowsNo https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-cv/browse/cellprofilerLiana

ncsa.cellprofiler.

silvercomet

trackobject

Python WindowsNo https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-cv/browse/cellprofilerLiana

ncsa.cellprofiler.

speckle

tumor

Python WindowsNo https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-cv/browse/cellprofilerLiana

ncsa.cellprofiler.

trackobject

yeast

Python WindowsNo https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-cv/browse/cellprofilerLiana

ncsa.

cellprofiler

image.

tumor

sphog

Python Matlab, mnist-sphog 
Windows
Linux https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-cv/browse
/cellprofiler
/handwritten/HandwrittenNumbers 

ncsa.image.caltech101

      
ncsa.bisque.histogram (notes: disabled)Python Linux    
Liana
ncsa.
cellprofiler.yeast
bisque.metadata (notes: disabled)Python Linux    
census-section-segmentorJava
Python
 
Windows
Linux https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-cv/browse/
cellprofiler
censusLiana, Inna
ncsa.
image
cv.
sphog
river
Python Matlab, mnist-sphog 
 PythonOpenCV (python), convert (from imagemagick), and GdalLinux https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-cv/browse/
handwritten/HandwrittenNumbers
river
 
Liana
ncsa.
image.caltech101      ncsa.bisque.histogram (notes: disabled)Python Linux   ncsa.bisque.metadata (notes: disabled)
geo.shpExtractorPython
 
gdalLinux 
  
https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-geo/browseJong Lee
ncsa.geo.tiffExtractorPythongdal
census-section-segmentorJava 
Linux https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-
cv
geo/browse
/census
Jong Lee
Liana, Inna
ncsa.
cv
image.
river
geotiff
 PythonOpenCV (python), convert (from imagemagick), and Gdal
Python

GDAL, Cython, numpy,
pygeoprocessing

Linux Ruihttps://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-
cv
geotiff/browse
/river
Rui, Mostafa Elag
Liana

ncsa.

geo

image.

shpExtractor

ponddetect

Python
gdal
MatlabLinux 
Jong Lee
https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-
geo
maps/browse
/feature_detectionMarcus, Ankit
Jong Lee
ncsa.
geo
image.
tiffExtractor
humanprefPython
gdal
MatlabLinux 
Jong Lee
https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-
geo
maps/browse
Jong Leencsa.image.ponddetect
/humanprefMarcus, Ankit

ncsa.xml.greenindexroute, ncsa.csv.greenindexroute

Python
Matlab
OpenCVLinux https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-maps/browse/
feature_detection
greenrouteMarcus
, Ankit

ncsa.image.

humanpref

knn_numerals

PythonOpenCVLinux  Marcus

ncsa.audio.speech2text

JavaCMU Sphinx, ffmpeg, soxLinux 
PythonMatlabLinuxMarcus
https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-
maps
core/browse/audio/
humanpref
speech2textMarcus
, Ankit
ncsa.
xml.greenindexroute, ncsa.csv.greenindexroute
audio.previewPython 
OpenCV
 
Linux
 
Marcus
Innahttps://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-
maps
core/browse/audio/
greenroute
preview
Marcus
 
ncsa.
image
nlp.
knn_numerals
simplelanguagePython
OpenCVLinuxMarcus Marcus
numpy  Inna

ncsa.audio.speech2text

JavaCMU Sphinx, ffmpeg, soxLinuxMarcus
https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-
core
nlp/browse/
audio/speech2text
SimpleLanguage
Marcus
Liana
ncsa.
audio
nlp.
preview
simplesummaryPython

Natural Language Toolkit (NLTK) for Python, NLTK Data or at least:

 nltk.corpus,nltk.stem.porter and nltk.tokenize.punkt.

Python 

  https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-
core
nlp/browse/
audio/preview
SimpleSummary
 
Liana
ncsa.nlp.
simplelanguage
SNLPSentiment
Python
Java Stanford CoreNLP tool, java, maven
numpy
  https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-nlp/browse/SNLP/
SimpleLanguage
SNLPSentimentExtractorLiana, Marcus(?)
ncsa.nlp
.simplesummaryPython

Natural Language Toolkit (NLTK) for Python, NLTK Data or at least:

.wordtablesPython requestspikawin32com 
 nltk.corpus,nltk.stem.porter and nltk.tokenize.punkt.
  https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-nlp/browse/
SimpleSummary
WordTablesExtractorLiana
ncsa.nlp.SNLPSentimentJava Stanford CoreNLP tool, java, maven
siegfriedPython    https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-
nlp
siegfried/browse
/SNLP/SNLPSentimentExtractorLiana, Marcus(?)
Gregory Jansen
ncsa.
nlp
versus.
wordtables
imageJava
Python requestspikawin32com
VersusLinux 
 
https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-
nlp
versus/browse
/WordTablesExtractor
Liana
Kenton, Smruti
ncsa.image.preview (note: check if really deployed. there is an extractor in Hosted VMs list with a similar name.)Python 
siegfriedPython
   https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-
siegfried/browseGregory Jansenncsa.versus.imageJavaVersusLinux
core/browse/image/previewRob, Sandeep
ncsa.pdf.preview (note: check if really deployed. there is an extractor in Hosted VMs list with a similar name.)Python    
Smruti Padhy
https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-
versus
core/browse
/pdf/previewRob
Kenton, Smruti
ncsa.
image
video.preview (note: check if really deployed. there is an extractor in Hosted VMs list with a similar name.)Python    https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-core/browse/
image
video/previewRob
, Sandeepncsa.pdf.preview (note: check if really deployed. there is an extractor in Hosted VMs list with a similar name.)
NOT DEPLOYED
ncsa.image.digitpyPythonopencv
Python
   https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-
core
cv/browse/
pdf
handwritten/
preview
SimpleDigitPython
Rob
 
ncsa.
video.preview (note: check if really deployed. there is an extractor in Hosted VMs list with a similar name.)
cv.pdfimages pdfimages, from poppler-utils
Python
   https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-
core
cv/browse/
video/previewRob
poppler 
   
ncsa.cv.caltech101PythonMatlab and VLFeat 64-bit Mac OS 
 
  
          
https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-cv/browse/vlfeat 
dbpediaPython Natural Language Toolkit (NLTK) and rdflib.
 
  
 NOT DEPLOYED      ncsa.image.digitpyPython
https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-dbpedia/browse
digestPython  
opencv
  https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-
cv
digest/browse
/handwritten/SimpleDigitPython
 
ncsa.
cv.pdfimages
hpcPython 
pdfimages, from poppler-utils
   https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-
cv
hpc/browse
/poppler ncsa.cv.caltech101PythonMatlab and VLFeat 64-bit Mac OS 
LSVAJava    https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-
cv
lsva/browse
/vlfeatPython
Liana, Constantinos
LSVA integrated 
dbpedia
  
Natural Language Toolkit (NLTK) and rdflib.
 
Luigi Marini
https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-lsva-
dbpedia
integrated/browse
Luigi Marini
ncsa.movieslice
digest
Python    https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-
digest
movieslice/browse
 ncsa.image.geotiff
Sandeep
mri2meshPython
GDAL, Cython
pymedici, subprocess, logging, os, numpy,

pygeoprocessing
shutil, zipfile
pika,
 
requestsRui
 
Linux
 https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-
geotiff
mri/browse
Rui, Mostafa Elag
/mri2meshMarcus
msc-ChemCBCExtractor
ncsa.hpc
Pythonrequests, pika, openpyxl, xlrd, pymongoLinux 
 
https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-
hpc
msc/browse
 LSVA
/ChemCBCExtractorYan
msc-IsletExtractorPythonrequests, pika, openpyxl, xlrd, pymongoLinux
Java
 
 
https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-
lsva
msc/browse
 LSVA integrated
/IsletExtractorYan
msc-MonitorExtractorPythonrequests, pika, openpyxl, xlrd, pymongoLinux
 
 
 
https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-
lsva-integrated
msc/browse/MonitorExtractor
 
Yan
ncsa.msc.
movieslice
dailymonitorPythonrequests, pika, openpyxl, xlrd, pymongo  
 
not usedhttps://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-
movieslice
msc/browse/OldMonitorExtractor
Sandeep
Ashwini
mri2mesh
msc-PhenotypeExtractorPython
pymedici

requests,

subprocess

pika,

logging, os, numpy, shutil, zipfile

openpyxl, xlrd, pymongo

Linux https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-
mri
msc/browse/
mri2meshMarcus
PhenotypeExtractorYan
ncsa.nlp.SNLPJava Stanford CoreNLP tool, java, maven
msc-ChemCBCExtractorPython
 
Linux
 
Yan
 https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-
msc
nlp/browse/SNLP/
ChemCBCExtractor
SNLPExtractor
Yan
Liana
ncsa.nlp.tika
msc-IsletExtractor
Python
requests, pika, openpyxl, xlrd, pymongoLinux
 Tika project page, pymedici  
Yan
https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-
msc
nlp/browse/
IsletExtractor
tika
Yan
Liana
msc
person-
MonitorExtractor
detectorPython MATLAB, FFMPEG, requests
, pika, openpyxl, xlrd, pymongo
and pika   
LinuxYan
https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-person-
msc
detector/browse/
MonitorExtractor
python
Yan
Sandeep
ncsa.
msc.dailymonitor
person-trackerPythonpython, MATLAB, FFMPEG requests
, pika, openpyxl, xlrd, pymongo
and pika  
not used
 https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-person-
msc
tracking/browse/
OldMonitorExtractor
python
Ashwini
Sandeep
terra.plantcv
msc-PhenotypeExtractor
Python

pika
requests

, pika, openpyxl, xlrd, pymongo


wheel

   
LinuxYan
https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-
msc
plantcv/browse
/PhenotypeExtractor
Yan
ncsa.nlp.SNLP
medici_PTM_thumbnailsJava  
Stanford CoreNLP tool, java, maven
  https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-
nlp
ptm/browse/
SNLP/SNLPExtractorLianancsa.nlp.tikaPython Tika project page, pymedici
PTMThumbnailExtractorConstantinos
medici_PTM_metadataJava    https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-
nlp
ptm/browse/
tikaLianaperson-detectorPython MATLAB, FFMPEG, requests and pika
PTMMetadataExtractorConstantinos

Name not clear

PtmMetadata(?)

Java    https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-
person-detector
ptm/browse/
pythonSandeepncsa.person-trackerPythonpython, MATLAB, FFMPEG requests and pika
PTMMetadataConstantinos
medici_ptm_mapsJava    https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-
person-tracking
ptm/browse/
python
PTMMapsExtractor
Sandeep
Constantinos
medici_ptm_3dJava  
terra.plantcvPythonpika
requests
wheel
  https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-
plantcv
ptm/browse/PTM3DExtractor
Yan
Constantinos
medici_
PTM
images_
thumbnails
ptmJava    https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-ptm/browse/
PTMThumbnailExtractorConstantinos
ImagesPTMExtractorConstantinos

extractors-rabbitmq

(look like examples)

  
medici_PTM_metadataJava
   https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-
ptm
rabbitmq/browse
/PTMMetadataExtractor
Constantinos
 
Name not clear

PtmMetadata(?)

extractors-seabird/Scala 
Java
   https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-
ptm
seabird/browse
/PTMMetadata
Constantinos
Luigi
medici_
ptm_maps
3d_x3d (one of extractors-3d)Java    https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-
ptm
3d/browse/
PTMMapsExtractor
ObjJSONExtractorConstantinos
medici
_ptm_3d
_3d_obj_merger (one of extractors-3d)Java 
Java
   https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-
ptm
3d/browse/
PTM3DExtractor
OBJMergerExtractorConstantinos
medici_
images_ptm
oni (one of extractors-3d)Java    https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-
ptm
3d/browse/
ImagesPTMExtractor
OniExtractorConstantinos
medici_ply_obj (one of extractors-
rabbitmq(look like examples)
3d)Java    https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-
rabbitmq
3d/browse/PlyObjExtractor
 
Constantinos
medici_3d_metadata (one of extractors-3d) Java 
Name not clear extractors-seabird/Scala
   https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-
seabird
3d/browse/ThreeDMetadataExtractor
Luigi
Constantinos
medici_
3d
x3d_
x3d
html (one of extractors-3d) Java    https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-3d/browse/
ObjJSONExtractor
X3DhtmlExtractorConstantinos
medici_3d_obj_merger (one of extractors-3d)Java  
ncsa.arcgis.landsat7mosaic | Python | ArcGIS | Windows | No | | https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-bd-cz/browse/ndviextractor | Smruti
ncsa.arcgis.floodplainPythonArcGISWindowsNohttps://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-bd-
3d
cz/browse
/OBJMergerExtractor
/terex_floodplain/config.pySmruti
Constantinos
medici_
oni (one of extractors-3d)
bookJava    https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-
3d
books/browse/
OniExtractor
BookPreviewExtractor
Constantinos
Theerasit Issaranon
medici_
ply_obj (one of extractors-3d)
image_pyramidJava    https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-
3d
books/browse/
PlyObjExtractorConstantinos
ImagePreviewPyramidExtractor-shebookTheerasit Issaranon
shebookJava 
medici_3d_metadata (one of extractors-3d) Java
   

https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-

3d

books/browse

/ThreeDMetadataExtractorConstantinosmedici_x3d_html (one of extractors-3d) Java  

/SheBookPreviewExtractor/src/BookPreviewExtractor

 

https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-

3d

books/browse

/X3DhtmlExtractorConstantinosncsa.arcgis.landsat7mosaicPythonArcGISWindows

/SheBookPreviewExtractor/src/bookpreviewextractor

Theerasit Issaranon
lsva-ceddJava    
Smruti Padhy
https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-
bd-cz
cedd/browse
/ndviextractor
Smruti
Constantinos
ncsa.
arcgis.floodplain
cinemetricsPython 
ArcGIS
 
Windows
 
Smruti Padhy
 https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-
bd-cz/browse/terex_floodplain/config.pySmruti
cinemetrics/browseConstantinos
ncsa.image.metadataPython 
medici_bookJava
   https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/extractors-
books
core/browse
/BookPreviewExtractorTheerasit Issaranonmedici_image_pyramid
/image/metadataMax. Rob
ncsa.debod.segmentor  
Java
   https://opensource.ncsa.illinois.edu/bitbucket/projects/
CATS
DEBOD/repos/extractors-
books
cellsegmentor/browse
/ImagePreviewPyramidExtractor-shebookTheerasit Issaranon
 
ncsa.image.dmp  
shebookJava
   

https://opensource.ncsa.illinois.edu/bitbucket/projects/

CATS

DEBOD/repos/extractors-

books

debod/browse

/SheBookPreviewExtractor/src/BookPreviewExtractor 

https://opensource.ncsa.illinois.edu/bitbucket/projects/

CATS

DEBOD/repos/extractors-

books

dmp/browse

/SheBookPreviewExtractor/src/bookpreviewextractor

 
ncsa.image.sphog.debod  
Theerasit Issaranonlsva-ceddJava
   https://opensource.ncsa.illinois.edu/bitbucket/projects/
CATS
DEBOD/repos/extractors-
cedd
handwrittendecimals/browse
Constantinos
 

ncsa

.cinemetricsPython

.image.iarp_remove_circle

     https://opensource.ncsa.illinois.edu/bitbucket/projects
/CATS/repos
/IARP/repos/image_fetcher/browse/extractors
-cinemetrics
/
browse
remove_circle
Constantinos
Marcus
ncsa.
image
cv.
metadata
meangrey 
Python
    https://opensource.ncsa.illinois.edu/bitbucket/projects/
CATS
IARP/repos/
extractors-core
image_fetcher/browse/
image
extractors/
metadataMax. Rob             
mean_greyMarcus