...
```python
import pyclowder
from pyclowder.extractors import Extractor
import pyclowder.files
```
...etc.
Running a sample extractor
Now that the necessary dependencies are installed, we can run a simple extractor to confirm that everything is set up correctly. The wordcount extractor is included with pyClowder 2 and adds metadata to text files when they are uploaded to Clowder.
- Go to `/pyclowder2/sample-extractors/wordcount/`.
- Run the extractor. The basic invocation is `python wordcount.py`.
- If RabbitMQ is running in Docker, you need to pass the correct RabbitMQ URL, because it is not reachable at localhost:
  `python wordcount.py --rabbitmqURI amqp://guest:guest@<dockerIP>/%2f`
- Run `python wordcount.py -h` to list the other command-line options.
- When the extractor reports "Starting to listen for messages", it is ready.
- Upload a .txt file into Clowder and verify that the extractor is triggered and that metadata is added to the file.
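To make the expected result concrete, here is a minimal, hypothetical sketch of the kind of metadata a wordcount-style extractor computes for an uploaded text file. It assumes `wc`-style line, word, and character counts. The function name `count_text` and the result keys are illustrative and are not taken from pyClowder 2's own `wordcount.py`.

```python
import os
import tempfile


def count_text(path):
    """Hypothetical helper: return wc-style counts for a UTF-8 text file."""
    with open(path, "r", encoding="utf-8") as f:
        content = f.read()
    return {
        # like `wc -l`, count newline characters
        "lines": content.count("\n"),
        # like `wc -w`, count whitespace-separated tokens
        "words": len(content.split()),
        # Unicode code points (closer to `wc -m`; `wc -c` counts bytes)
        "characters": len(content),
    }


if __name__ == "__main__":
    # write a small sample file, count it, then clean up
    fd, sample = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write("hello clowder\nsecond line here\n")
    print(count_text(sample))  # → {'lines': 2, 'words': 5, 'characters': 31}
    os.remove(sample)
```

In a real extractor, a dictionary like this would be wrapped as Clowder metadata and attached to the uploaded file. The sketch stops at the counting step.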
Writing an extractor
Extractor events
...
message type | trigger event | message payload | examples |
---|---|---|---|
`*.file.#` | when any file is uploaded | | `clowder.file.image.png`, `clowder.file.text.csv`, `clowder.file.application.json` |
`*.file.image.#`, `*.file.text.#`, ... | when any file of the given MIME type is uploaded (this is just more specific matching) | | see above |
`*.dataset.file.added` | when a file is added to a dataset | | `clowder.dataset.file.added` |
`*.dataset.file.removed` | when a file is removed from a dataset | | `clowder.dataset.file.removed` |
`*.metadata.added` | when metadata is added to a file or dataset | | `clowder.metadata.added` |
`*.metadata.removed` | when metadata is removed from a file or dataset | | `clowder.metadata.removed` |
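The message types above are RabbitMQ topic-exchange binding patterns. Routing keys are dot-separated words: `*` matches exactly one word and `#` matches zero or more words. The sketch below re-implements that rule purely for illustration, since the broker does the real matching. The function name `topic_matches` is hypothetical.

```python
def topic_matches(pattern, routing_key):
    """Return True if an AMQP topic pattern matches a routing key.

    '*' matches exactly one dot-separated word; '#' matches zero or more words.
    """
    pat = pattern.split(".")
    key = routing_key.split(".")

    def match(i, j):
        # pattern fully consumed: match only if the key is consumed too
        if i == len(pat):
            return j == len(key)
        if pat[i] == "#":
            # '#' absorbs zero words (advance pattern) or one more word (advance key)
            return match(i + 1, j) or (j < len(key) and match(i, j + 1))
        if j == len(key):
            return False
        if pat[i] == "*" or pat[i] == key[j]:
            return match(i + 1, j + 1)
        return False

    return match(0, 0)


if __name__ == "__main__":
    print(topic_matches("*.file.#", "clowder.file.text.csv"))                  # True
    print(topic_matches("*.file.image.#", "clowder.file.text.csv"))            # False
    print(topic_matches("*.dataset.file.added", "clowder.dataset.file.added"))  # True
```

This also shows why `*.file.#` does not fire on `clowder.dataset.file.added`. The second word of that key is `dataset`, not `file`.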
common requirements
```bash
sudo -s
export RABBITMQ_URL="amqp://guest:guest@localhost:5672/%2F"
export EXTRACTORS_HOME="/home/clowder"
apt-get -y install git python-pip
pip install pika requests
cd ${EXTRACTORS_HOME}
git clone https://opensource.ncsa.illinois.edu/stash/scm/cats/pyclowder.git
chown -R clowder:users pyclowder
```
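In `RABBITMQ_URL`, the trailing `%2F` is the URL-encoded name of the default RabbitMQ virtual host, `/`. The sketch below uses Python's standard `urllib.parse` to show how such a URL breaks down, assuming the format used above. `parse_amqp_url` is a hypothetical helper and is not part of pyclowder or pika.

```python
from urllib.parse import unquote, urlparse


def parse_amqp_url(url):
    """Hypothetical helper: split an AMQP URL into its connection settings."""
    parts = urlparse(url)
    if parts.scheme not in ("amqp", "amqps"):
        raise ValueError("not an AMQP URL: " + url)
    # the path is "/<vhost>": drop the leading slash, then percent-decode;
    # with no path at all, fall back to the default vhost "/"
    vhost = unquote(parts.path[1:]) if parts.path else "/"
    return {
        # guest/guest are assumed defaults when no credentials are given
        "user": unquote(parts.username or "guest"),
        "password": unquote(parts.password or "guest"),
        "host": parts.hostname or "localhost",
        # 5672 is the standard AMQP port, 5671 for AMQP over TLS
        "port": parts.port or (5671 if parts.scheme == "amqps" else 5672),
        "vhost": vhost,
    }


if __name__ == "__main__":
    print(parse_amqp_url("amqp://guest:guest@localhost:5672/%2F"))
    # → {'user': 'guest', 'password': 'guest', 'host': 'localhost', 'port': 5672, 'vhost': '/'}
```

The same decoding applies to the `--rabbitmqURI` value passed to `wordcount.py`. There the lowercase `%2f` and the omitted port resolve to vhost `/` on port 5672.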
opencv
```bash
apt-get -y install python-opencv opencv-data
cd ${EXTRACTORS_HOME}
git clone https://opensource.ncsa.illinois.edu/stash/scm/cats/extractors-cv.git
for x in opencv-closeups opencv-eyes opencv-faces opencv-profiles; do
  # make the shared pyclowder package importable from each extractor directory
  ln -s ${EXTRACTORS_HOME}/pyclowder/pyclowder ${EXTRACTORS_HOME}/extractors-cv/opencv/$x
  # point the extractor at RabbitMQ and at the Ubuntu location of the OpenCV data files
  sed -i -e "s#rabbitmqURL = .*#rabbitmqURL = '${RABBITMQ_URL}'#" \
         -e "s#/usr/local/share/OpenCV#/usr/share/opencv#" ${EXTRACTORS_HOME}/extractors-cv/opencv/$x/config.py
  # install the Upstart job definition
  cp ${EXTRACTORS_HOME}/extractors-cv/opencv/$x/*.conf /etc/init
done
chown -R clowder:users extractors-cv
```
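Each `sed -i` call above rewrites the `rabbitmqURL = ...` line in an extractor's `config.py`. For the OpenCV extractors, it also swaps the `/usr/local/share/OpenCV` data path for the Ubuntu package location. If you prefer to script this in Python, a rough equivalent using `re.sub` might look like the following. `patch_config` is a hypothetical helper, and the sample `config.py` contents are invented for illustration.

```python
import re


def patch_config(text, rabbitmq_url, replacements=None):
    """Hypothetical helper mirroring the sed edits applied to config.py.

    Rewrites any `rabbitmqURL = ...` line to point at rabbitmq_url, then applies
    plain string replacements (e.g. the OpenCV data path). Unlike sed without
    the g flag, str.replace changes every occurrence, not just the first per line.
    """
    # '.' does not match newlines, so '.*' stops at the end of the line, as in sed;
    # a function replacement avoids backslash-escape surprises in the URL
    text = re.sub(r"rabbitmqURL = .*",
                  lambda m: "rabbitmqURL = '%s'" % rabbitmq_url,
                  text)
    for old, new in (replacements or {}).items():
        text = text.replace(old, new)
    return text


if __name__ == "__main__":
    # invented config.py contents, for illustration only
    sample = (
        "rabbitmqURL = 'amqp://guest:guest@localhost:5672/%2f'\n"
        "cascade = '/usr/local/share/OpenCV/haarcascades/face.xml'\n"
    )
    print(patch_config(sample,
                       "amqp://guest:guest@rabbit.example.org:5672/%2F",
                       {"/usr/local/share/OpenCV": "/usr/share/opencv"}))
```

To use it on a real file, read `config.py`, pass its text through `patch_config`, and write the result back. That is what `sed -i` does in place.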
ocr
```bash
apt-get -y install tesseract-ocr
cd ${EXTRACTORS_HOME}
# extractors-cv may already have been cloned by the opencv step
[ -d extractors-cv ] || git clone https://opensource.ncsa.illinois.edu/stash/scm/cats/extractors-cv.git
ln -s ${EXTRACTORS_HOME}/pyclowder/pyclowder ${EXTRACTORS_HOME}/extractors-cv/ocr
sed -i -e "s#rabbitmqURL = .*#rabbitmqURL = '${RABBITMQ_URL}'#" ${EXTRACTORS_HOME}/extractors-cv/ocr/config.py
cp ${EXTRACTORS_HOME}/extractors-cv/ocr/clowder-ocr.conf /etc/init
chown -R clowder:users extractors-cv
```
audio
```bash
apt-get -y install sox libsox-fmt-mp3
cd ${EXTRACTORS_HOME}
git clone https://opensource.ncsa.illinois.edu/stash/scm/cats/extractors-audio.git
ln -s ${EXTRACTORS_HOME}/pyclowder/pyclowder ${EXTRACTORS_HOME}/extractors-audio/preview/
sed -i -e "s#Binary = .*#Binary = '$(which sox)'#" \
       -e "s#rabbitmqURL = .*#rabbitmqURL = '${RABBITMQ_URL}'#" extractors-audio/preview/config.py
cp ${EXTRACTORS_HOME}/extractors-audio/preview/clowder-audio-preview.conf /etc/init
chown -R clowder:users extractors-audio
```
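The `sed` call above fills in the `Binary = ...` setting with the output of `which sox`, so the extractor runs the tool from its installed absolute path. If the tool is missing, `which` prints nothing and the config silently ends up with `Binary = ''`. A Python sketch of the same lookup uses `shutil.which`, which returns `None` in that case, so it can fail early instead. `binary_setting` is a hypothetical helper.

```python
import shutil


def binary_setting(tool, key="Binary"):
    """Hypothetical helper: build a `Binary = '<path>'` config line for a tool.

    Raises instead of emitting an empty path when the tool is not installed.
    """
    path = shutil.which(tool)
    if path is None:
        raise FileNotFoundError("%s not found on PATH; install it first" % tool)
    return "%s = '%s'" % (key, path)


if __name__ == "__main__":
    # prints e.g. a line with sox's absolute path once sox is installed
    for tool in ("sox", "convert"):
        try:
            print(binary_setting(tool))
        except FileNotFoundError as err:
            print(err)
```

The `key` parameter covers settings such as `imageBinary` in the image preview extractor's config.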
image
...
```bash
apt-get -y install imagemagick
cd ${EXTRACTORS_HOME}
git clone https://opensource.ncsa.illinois.edu/stash/scm/cats/extractors-image.git
ln -s ${EXTRACTORS_HOME}/pyclowder/pyclowder ${EXTRACTORS_HOME}/extractors-image/preview/
sed -i -e "s#imageBinary = .*#imageBinary = '$(which convert)'#" \
       -e "s#rabbitmqURL = .*#rabbitmqURL = '${RABBITMQ_URL}'#" extractors-image/preview/config.py
cp ${EXTRACTORS_HOME}/extractors-image/preview/clowder-image-preview.conf /etc/init
chown -R clowder:users extractors-image
```
pdf
```bash
apt-get -y install imagemagick
cd ${EXTRACTORS_HOME}
git clone https://opensource.ncsa.illinois.edu/stash/scm/cats/extractors-pdf.git
ln -s ${EXTRACTORS_HOME}/pyclowder/pyclowder ${EXTRACTORS_HOME}/extractors-pdf/preview/
sed -i -e "s#Binary = .*#Binary = '$(which convert)'#" \
       -e "s#rabbitmqURL = .*#rabbitmqURL = '${RABBITMQ_URL}'#" extractors-pdf/preview/config.py
cp ${EXTRACTORS_HOME}/extractors-pdf/preview/clowder-pdf-preview.conf /etc/init
chown -R clowder:users extractors-pdf
```
video
```bash
apt-get -y install libav-tools
cd ${EXTRACTORS_HOME}
git clone https://opensource.ncsa.illinois.edu/stash/scm/cats/extractors-video.git
ln -s ${EXTRACTORS_HOME}/pyclowder/pyclowder ${EXTRACTORS_HOME}/extractors-video/preview/
sed -i -e "s#Binary = .*#Binary = '$(which convert)'#" \
       -e "s#rabbitmqURL = .*#rabbitmqURL = '${RABBITMQ_URL}'#" extractors-video/preview/config.py
cp ${EXTRACTORS_HOME}/extractors-video/preview/clowder-video-preview.conf /etc/init
chown -R clowder:users extractors-video
```
start extractors
```bash
cd /etc/init
# start one Upstart job per config file, e.g. clowder-ocr.conf -> job "clowder-ocr"
for x in clowder-*.conf; do
  start $(basename $x .conf)
done
```
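The loop above derives each Upstart job name from its config file name by stripping the `.conf` suffix, as `basename $x .conf` does. The Python sketch below performs the same derivation and only lists the names, because starting the jobs still requires Upstart's `start` command. It assumes the `/etc/init` layout used above, and `job_names` is a hypothetical helper.

```python
import glob
import os


def job_names(init_dir="/etc/init", pattern="clowder-*.conf"):
    """Hypothetical helper: list Upstart job names for the Clowder extractor configs."""
    names = []
    for path in sorted(glob.glob(os.path.join(init_dir, pattern))):
        # same as `basename $x .conf`: drop the directory and the .conf suffix
        names.append(os.path.basename(path)[: -len(".conf")])
    return names


if __name__ == "__main__":
    for name in job_names():
        # print the commands instead of running them
        print("start " + name)
```

Printing the commands first is a cheap way to confirm which extractors the loop would start before running it as root.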