
...

  • import pyclowder
  • from pyclowder.extractors import Extractor
  • import pyclowder.files

...etc.

Running a sample extractor

Now that we have our necessary dependencies, we can try running a simple extractor to make sure we've installed things correctly. The wordcount extractor is included with pyClowder 2 and will add metadata to text files when they are uploaded to Clowder.
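As a rough illustration of what such an extractor computes, the standalone Python sketch below counts lines, words and characters in a text file, in the spirit of the Unix `wc` command. It does not use pyClowder. The function names and metadata keys (`count_text`, `count_file`, `lines`, `words`, `characters`) are hypothetical and are not taken from the wordcount sample.

```python
from pathlib import Path


def count_text(text):
    """Count lines, words and characters in a string, similar to `wc -l -w -m`."""
    return {
        "lines": text.count("\n"),    # like wc -l: number of newline characters
        "words": len(text.split()),   # whitespace-separated tokens
        "characters": len(text),      # Unicode code points, not bytes
    }


def count_file(path):
    """Read a UTF-8 text file and return its counts (hypothetical helper)."""
    return count_text(Path(path).read_text(encoding="utf-8"))


if __name__ == "__main__":
    print(count_text("hello world\nfoo\n"))
    # {'lines': 2, 'words': 3, 'characters': 16}
```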

  1. Go to /pyclowder2/sample-extractors/wordcount/
  2. Run the extractor
    1. python wordcount.py is the basic invocation.
    2. If you're running Docker, you need to specify the correct RabbitMQ URL, because the Docker container is not at localhost:
      python wordcount.py --rabbitmqURI amqp://guest:guest@<dockerIP>/%2f
    3. Run python wordcount.py -h to see the other command-line options.
  3. When the extractor reports "Starting to listen for messages", it is ready.
  4. Upload a .txt file into Clowder and verify that the extractor is triggered and that metadata is added to the file.
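The trailing `%2f` in the RabbitMQ URI is the percent-encoded name of RabbitMQ's default virtual host, `/`. The Python sketch below uses only the standard library to show how such a URI can be built and decoded. The helper names and the address 172.17.0.2 are hypothetical, and 5672 is simply the standard AMQP port.

```python
from urllib.parse import quote, unquote, urlsplit


def rabbitmq_uri(host, user="guest", password="guest", port=5672, vhost="/"):
    """Build an AMQP URI; the vhost is percent-encoded, so "/" becomes "%2F"."""
    return "amqp://{}:{}@{}:{}/{}".format(
        quote(user, safe=""), quote(password, safe=""), host, port,
        quote(vhost, safe=""))


def vhost_of(uri):
    """Return the decoded virtual host named in the path of an AMQP URI."""
    return unquote(urlsplit(uri).path[1:])


if __name__ == "__main__":
    uri = rabbitmq_uri("172.17.0.2")  # hypothetical Docker address
    print(uri)  # amqp://guest:guest@172.17.0.2:5672/%2F
    print(vhost_of("amqp://guest:guest@172.17.0.2/%2f"))  # /
```

With the real address of your RabbitMQ container substituted, the printed value is the kind of string you would pass to `--rabbitmqURI`.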

Writing an extractor

Extractor events

...

| message type | trigger event | message payload | examples |
|---|---|---|---|
| *.file.# | when any file is uploaded | added file ID; added filename; destination dataset ID, if applicable | clowder.file.image.png, clowder.file.text.csv, clowder.file.application.json |
| *.file.image.#, *.file.text.#, ... | when any file of the given MIME type is uploaded (this is just a more specific match) | added file ID; added filename; destination dataset ID, if applicable | see above |
| *.dataset.file.added | when a file is added to a dataset | added file ID; dataset ID; full list of files in dataset | clowder.dataset.file.added |
| *.dataset.file.removed | when a file is removed from a dataset | removed file ID; dataset ID; full list of files in dataset | clowder.dataset.file.removed |
| *.metadata.added | when metadata is added to a file or dataset | file or dataset ID; the metadata that was added | clowder.metadata.added |
| *.metadata.removed | when metadata is removed from a file or dataset | file or dataset ID | clowder.metadata.removed |
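The message types in this table are written in AMQP topic-pattern syntax. Keys are dot-separated words, `*` stands for exactly one word, and `#` stands for zero or more words. The Python sketch below reimplements that matching rule so you can check which events a given pattern would deliver to an extractor. It only illustrates the AMQP semantics: `topic_matches` is a hypothetical helper and is not part of pyClowder or RabbitMQ.

```python
def topic_matches(pattern, routing_key):
    """Return True if routing_key matches an AMQP topic pattern.

    Words are separated by '.'; '*' matches exactly one word and
    '#' matches zero or more words.
    """
    return _match(pattern.split("."), routing_key.split("."))


def _match(pat, key):
    if not pat:
        return not key  # both exhausted means a full match
    head, rest = pat[0], pat[1:]
    if head == "#":
        # Try letting '#' absorb 0, 1, 2, ... of the remaining words.
        return any(_match(rest, key[i:]) for i in range(len(key) + 1))
    if not key:
        return False
    if head == "*" or head == key[0]:
        return _match(rest, key[1:])
    return False


if __name__ == "__main__":
    keys = ["clowder.file.image.png", "clowder.file.text.csv",
            "clowder.dataset.file.added", "clowder.metadata.added"]
    for pattern in ["*.file.#", "*.file.image.#", "*.dataset.file.added"]:
        print(pattern, [k for k in keys if topic_matches(pattern, k)])
```

Note that `*.file.#` does not match `clowder.dataset.file.added`, because the second word must be `file`. The recursive form favours readability over speed.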

 

common requirements

 

Code Block
languagebash
sudo -s
export RABBITMQ_URL="amqp://guest:guest@localhost:5672/%2F"
export EXTRACTORS_HOME="/home/clowder"
 
apt-get -y install git python-pip
pip install pika requests
 
cd ${EXTRACTORS_HOME}
git clone https://opensource.ncsa.illinois.edu/stash/scm/cats/pyclowder.git
chown -R clowder:users pyclowder

opencv

 

Code Block
languagebash
apt-get -y install python-opencv opencv-data
cd ${EXTRACTORS_HOME}
git clone https://opensource.ncsa.illinois.edu/stash/scm/cats/extractors-cv.git
for x in opencv-closeups opencv-eyes opencv-faces opencv-profiles; do
    ln -s ${EXTRACTORS_HOME}/pyclowder/pyclowder ${EXTRACTORS_HOME}/extractors-cv/opencv/$x
    sed -i -e "s#rabbitmqURL = .*#rabbitmqURL = '${RABBITMQ_URL}'#" \
           -e "s#/usr/local/share/OpenCV#/usr/share/opencv#" ${EXTRACTORS_HOME}/extractors-cv/opencv/$x/config.py
    cp ${EXTRACTORS_HOME}/extractors-cv/opencv/$x/*.conf /etc/init
done
chown -R clowder:users extractors-cv
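Each `sed -i` call above edits `config.py` in place. For readers who prefer Python, here is a sketch of the same two substitutions applied to a string. `patch_config` and `patch_config_file` are hypothetical helpers. The sample configuration contents in the usage are also hypothetical, and the real `config.py` files may be laid out differently.

```python
import re
from pathlib import Path


def patch_config(text, rabbitmq_url):
    """Apply the same edits as the two sed expressions in the opencv loop."""
    # s#rabbitmqURL = .*#rabbitmqURL = '<url>'#  ('.' stops at end of line).
    # A function replacement keeps re.sub from interpreting backslashes in the URL.
    text = re.sub(r"rabbitmqURL = .*",
                  lambda m: "rabbitmqURL = '{}'".format(rabbitmq_url), text)
    # s#/usr/local/share/OpenCV#/usr/share/opencv#
    # (str.replace changes every occurrence; sed without 'g' changes the first per line)
    return text.replace("/usr/local/share/OpenCV", "/usr/share/opencv")


def patch_config_file(path, rabbitmq_url):
    """Rewrite a config.py in place, like sed -i."""
    p = Path(path)
    p.write_text(patch_config(p.read_text(), rabbitmq_url))


if __name__ == "__main__":
    sample = ("rabbitmqURL = 'amqp://guest:guest@localhost:5672/%2F'\n"
              "cascade = '/usr/local/share/OpenCV/haarcascades/face.xml'\n")
    print(patch_config(sample, "amqp://guest:guest@rabbitmq.example:5672/%2F"))
```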

ocr

 

Code Block
languagebash
apt-get -y install tesseract-ocr
cd ${EXTRACTORS_HOME}
# extractors-cv may already be checked out if the opencv extractors were installed
[ -d extractors-cv ] || git clone https://opensource.ncsa.illinois.edu/stash/scm/cats/extractors-cv.git
ln -s ${EXTRACTORS_HOME}/pyclowder/pyclowder ${EXTRACTORS_HOME}/extractors-cv/ocr
sed -i -e "s#rabbitmqURL = .*#rabbitmqURL = '${RABBITMQ_URL}'#" ${EXTRACTORS_HOME}/extractors-cv/ocr/config.py
cp ${EXTRACTORS_HOME}/extractors-cv/ocr/clowder-ocr.conf /etc/init
chown -R clowder:users extractors-cv
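The `ln -s` line makes the `pyclowder` package from the shared checkout visible inside the extractor's directory. The extractor script can then `import pyclowder` without the package being installed. Plain `ln -s` fails with "File exists" when it is run a second time. The Python sketch below performs the same step but skips an existing link instead. `link_pyclowder` is a hypothetical helper, and the demo uses a throwaway directory instead of `/home/clowder`.

```python
import os
import tempfile


def link_pyclowder(extractors_home, extractor_dir):
    """Create <extractor_dir>/pyclowder -> <extractors_home>/pyclowder/pyclowder.

    Mirrors `ln -s $EXTRACTORS_HOME/pyclowder/pyclowder <extractor_dir>`,
    but leaves an existing symlink alone instead of failing.
    """
    source = os.path.join(extractors_home, "pyclowder", "pyclowder")
    link = os.path.join(extractor_dir, "pyclowder")
    if not os.path.islink(link):
        os.symlink(source, link)
    return link


if __name__ == "__main__":
    # Demonstrate on a temporary directory tree.
    home = tempfile.mkdtemp()
    os.makedirs(os.path.join(home, "pyclowder", "pyclowder"))
    ocr_dir = os.path.join(home, "extractors-cv", "ocr")
    os.makedirs(ocr_dir)
    link = link_pyclowder(home, ocr_dir)
    link_pyclowder(home, ocr_dir)  # second call is a no-op
    print(link, "->", os.readlink(link))
```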

audio

 

Code Block
languagebash
apt-get -y install sox libsox-fmt-mp3
cd ${EXTRACTORS_HOME}
git clone https://opensource.ncsa.illinois.edu/stash/scm/cats/extractors-audio.git
ln -s ${EXTRACTORS_HOME}/pyclowder/pyclowder ${EXTRACTORS_HOME}/extractors-audio/preview/
sed -i -e "s#Binary = .*#Binary = '`which sox`'#" -e "s#rabbitmqURL = .*#rabbitmqURL = '${RABBITMQ_URL}'#" extractors-audio/preview/config.py
cp ${EXTRACTORS_HOME}/extractors-audio/preview/clowder-audio-preview.conf /etc/init
chown -R clowder:users extractors-audio
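The audio block, like the image, pdf and video blocks, fills in a `...Binary` setting from the output of `which`. If the program is missing, the shell version silently writes an empty path. The Python sketch below, built around a hypothetical helper `set_binary`, does the same substitution with `shutil.which` but fails loudly instead. The `soxBinary` setting in the usage is an assumed example name.

```python
import re
import shutil


def set_binary(text, program):
    """Like: sed -e "s#Binary = .*#Binary = '`which PROGRAM`'#" config.py."""
    path = shutil.which(program)
    if path is None:
        # The shell version would silently write Binary = '' here.
        raise FileNotFoundError("{} not found on PATH".format(program))
    return re.sub(r"Binary = .*", lambda m: "Binary = '{}'".format(path), text)


if __name__ == "__main__":
    try:
        print(set_binary("soxBinary = 'sox'\n", "sox"))
    except FileNotFoundError as err:
        print(err)
```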

image

 

...

Code Block
languagebash
apt-get -y install imagemagick
cd ${EXTRACTORS_HOME}
git clone https://opensource.ncsa.illinois.edu/stash/scm/cats/extractors-image.git
ln -s ${EXTRACTORS_HOME}/pyclowder/pyclowder ${EXTRACTORS_HOME}/extractors-image/preview/
sed -i -e "s#imageBinary = .*#imageBinary = '`which convert`'#" -e "s#rabbitmqURL = .*#rabbitmqURL = '${RABBITMQ_URL}'#" extractors-image/preview/config.py
cp ${EXTRACTORS_HOME}/extractors-image/preview/clowder-image-preview.conf /etc/init
chown -R clowder:users extractors-image

pdf

 

Code Block
languagebash
apt-get -y install imagemagick
cd ${EXTRACTORS_HOME}
git clone https://opensource.ncsa.illinois.edu/stash/scm/cats/extractors-pdf.git
ln -s ${EXTRACTORS_HOME}/pyclowder/pyclowder ${EXTRACTORS_HOME}/extractors-pdf/preview/
sed -i -e "s#Binary = .*#Binary = '`which convert`'#" -e "s#rabbitmqURL = .*#rabbitmqURL = '${RABBITMQ_URL}'#" extractors-pdf/preview/config.py
cp ${EXTRACTORS_HOME}/extractors-pdf/preview/clowder-pdf-preview.conf /etc/init
chown -R clowder:users extractors-pdf

video

Code Block
languagebash
apt-get -y install libav-tools
cd ${EXTRACTORS_HOME}
git clone https://opensource.ncsa.illinois.edu/stash/scm/cats/extractors-video.git
ln -s ${EXTRACTORS_HOME}/pyclowder/pyclowder ${EXTRACTORS_HOME}/extractors-video/preview/
sed -i -e "s#Binary = .*#Binary = '`which avconv`'#" -e "s#rabbitmqURL = .*#rabbitmqURL = '${RABBITMQ_URL}'#" extractors-video/preview/config.py
cp ${EXTRACTORS_HOME}/extractors-video/preview/clowder-video-preview.conf /etc/init
chown -R clowder:users extractors-video

start extractors

Code Block
languagebash
cd /etc/init
for x in clowder-*.conf; do
  start `basename $x .conf`
done
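Before starting everything at once, it can help to see which Upstart jobs the loop will start. The Python sketch below does a dry run using the hypothetical helpers `job_names` and `start_commands`. It derives each job name from its `.conf` file name the same way `basename $x .conf` does, and returns the commands without executing them.

```python
from pathlib import Path


def job_names(filenames):
    """Turn '/etc/init/clowder-ocr.conf' into 'clowder-ocr', like `basename $x .conf`."""
    return sorted(Path(name).stem for name in filenames
                  if Path(name).name.startswith("clowder-") and name.endswith(".conf"))


def start_commands(init_dir="/etc/init"):
    """Dry run: list the `start` commands the loop above would execute."""
    init = Path(init_dir)
    if not init.is_dir():
        return []
    return ["start " + job
            for job in job_names(str(p) for p in init.glob("clowder-*.conf"))]


if __name__ == "__main__":
    for cmd in start_commands():
        print(cmd)
```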