Installation
- On Mac, use brew to install maven, imagemagick, rabbitmq. Install mongo either manually or using brew.
- git clone the repo.
- In the top dir, run "mvn compile" to compile the code; this should not take long. Then run "mvn package -Dmaven.test.skip=true -Dmaven.javadoc.skip=true -q".
- Start mongod on localhost.
- Start RabbitMQ on localhost.
- In the top dir, in one shell, run "bin/PolyglotRestlet.sh" to start Polyglot. In another shell tab, run "bin/SoftwareServerRestlet.sh" to start SoftwareServer. If a script fails, edit it and change "2.1.0" to "2.2.0". You can then open http://localhost:8184 to check Polyglot and http://localhost:8182 to check SoftwareServer.
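The final check can also be scripted. The following is a minimal sketch, assuming both services run locally on their default ports and answer a plain GET without credentials; if authentication is enabled the check reports the service as down. The helper name is_up and the injectable opener parameter are illustrative, not part of Polyglot.

```python
import urllib.error
import urllib.request

# Default local URLs, as given in the installation steps.
POLYGLOT_URL = "http://localhost:8184"
SOFTWARE_SERVER_URL = "http://localhost:8182"


def is_up(url, opener=urllib.request.urlopen, timeout=5):
    """Return True if a GET on url answers with a 2xx status.

    opener is injectable so the check can be exercised without a network.
    """
    try:
        with opener(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status such as 401.
        return False
```

Calling is_up(POLYGLOT_URL) and is_up(SOFTWARE_SERVER_URL) after starting both scripts gives a quick yes/no answer for each service.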
User Interface
A user interacts with Polyglot, which works internally with Software Servers. By default Polyglot runs on port 8184 and SoftwareServer on port 8182. Open "http://<polyglot_ip>:8184/" to see the available endpoints (URLs), such as "http://<polyglot_ip>:8184/servers" for the server IP list or "http://<polyglot_ip>:8184/form" for a form to submit a conversion request.
Internal Working
The Java Classes
- Software Server:
- SoftwareServer.java: handles wrapper scripts;
- SoftwareServerRestlet.java: handles the Restlet service;
- SoftwareServerRESTUtilities.java: the rabbitMQHandler() method handles all the interaction with RabbitMQ.
- Polyglot:
- The entry point is PolyglotRestlet.java.
java -cp polyglot.jar:lib/* -Xmx1g edu.illinois.ncsa.isda.softwareserver.polyglot.PolyglotRestlet
- Polyglot.java: abstract class;
- PolyglotStewardAMQ.java: handles IOGraph and interaction with RabbitMQ;
- PolyglotRestlet.java: handles the Restlet interface.
process_jobs(): at the end, writes the ".url" file.
SS Registration
A Polyglot process queries RabbitMQ for the consumer IPs, then connects to the SoftwareServer at each IP using the URL "<softwareserver_ip>:8182/applications". If the URL is accessible and contains valid content, Polyglot adds the IP to its server list.
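The registration check can be sketched as below. This is an illustration in Python, not the Java code in PolyglotStewardAMQ.java. The fetch callable is a hypothetical helper that returns the response body or raises OSError, and "valid content" is assumed here to mean simply a non-empty body, since the actual validation rule is not described.

```python
def discover_servers(candidate_ips, fetch, port=8182):
    """Return the IPs whose /applications endpoint answers with content.

    candidate_ips: consumer IPs obtained from RabbitMQ.
    fetch: hypothetical callable, fetch(url) -> body text, raises OSError on failure.
    """
    servers = []
    for ip in candidate_ips:
        url = "http://{}:{}/applications".format(ip, port)
        try:
            body = fetch(url)
        except OSError:
            continue  # unreachable server: do not register it
        # Assumption: any non-empty body counts as valid content.
        if body and body.strip():
            servers.append(ip)
    return servers
```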
SS checkin
A SoftwareServer connects to RabbitMQ, picks up jobs (i.e. messages) from the queues, processes them, and sends the results back by accessing Polyglot's endpoint at "<polyglot_ip>:8184/checkin/<jobid>/<result_url>".
SS capabilities
SoftwareServer uses SoftwareServer.conf + scripts/*/.aliases.txt to configure which applications it will process. For example, SS on dap-dev is configured to convert only demclip and streamclip:
Polyglot REST APIs
Polyglot (POL) REST endpoints that POL handles on its own, without accessing or redirecting to SSes:
GET:
/ Returns a list of supported endpoints.
/alive Returns "yes".
/checkin
/convert Returns all supported output formats.
/convert/<output_format1> Returns all supported input formats that can be converted to output_format1.
/convert/<output_format1>/<file_url1> Does the conversion: downloads file_url1
/form
/image
/inputs
/inputs/<format1>
/outputs
/requests
/servers
/software
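Because the conversion endpoint takes the source file URL as a single path segment, that URL has to be percent-encoded before it is appended. A minimal sketch of building such a request URL follows; the function name conversion_url is illustrative and not part of Polyglot.

```python
from urllib.parse import quote


def conversion_url(polyglot_host, output_format, file_url, port=8184):
    """Build a /convert/<output_format>/<escaped file URL> request URL."""
    # safe="" also escapes "/" and ":", so the file URL stays one path segment.
    escaped = quote(file_url, safe="")
    return "http://{}:{}/convert/{}/{}".format(
        polyglot_host, port, output_format, escaped
    )
```

For example, conversion_url("dap-dev.ncsa.illinois.edu", "jpg", "http://browndog.ncsa.illinois.edu/examples/browndog.png") yields a URL ending in /convert/jpg/http%3A%2F%2Fbrowndog.ncsa.illinois.edu%2Fexamples%2Fbrowndog.png.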
POL REST endpoints that access or redirect to SSes:
GET:
/file/<file1> # If file1 doesn't exist and file1.url exists.
/servers/<server1_ip>[/...] # Redirects to server1_ip:8182/software/...
/software/<sw1> # Accesses all SS:8182/software until finding one that contains sw1 and redirects to <ss_ip1>:8182/software/<sw1>.
POST:
/servers/<ip1>[/...] # Redirects to ip1:8182/software/<...>.
/software/<sw1> # Accesses all SS:8182/software until finding one that contains sw1 and redirects to <ss_ip1>:8182/software/<sw1>.
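The lookup behind /software/<sw1> can be sketched as follows. This is an illustration of the "try each SS until one lists sw1" logic, not the actual Java implementation. The fetch callable is hypothetical, and the listing format (one application per line, name first) is an assumption.

```python
def find_software_redirect(server_ips, software, fetch, port=8182):
    """Return the redirect URL on the first server that lists software, else None.

    fetch: hypothetical callable, fetch(url) -> body text, raises OSError on failure.
    """
    for ip in server_ips:
        base = "http://{}:{}/software".format(ip, port)
        try:
            listing = fetch(base)
        except OSError:
            continue  # skip servers that cannot be reached
        # Assumption: one application per line, with the name as the first token.
        names = [line.split()[0] for line in listing.splitlines() if line.strip()]
        if software in names:
            return base + "/" + software
    return None
```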
Implementation Details
Configuration files
Polyglot and SS are implemented in Java, currently using Restlet. Current configuration files are:
- Polyglot – PolyglotRestlet.conf (Polyglot port #, Softwareserver port #, etc.), PolyglotStewardAMQ.conf (RabbitMQ URI)
- Softwareserver – SoftwareServerRestlet.conf (port #, RabbitMQ URI).
SoftwareServer job checkin is implemented in SoftwareServerRESTUtilities.java; Polyglot's RabbitMQ access is in polyglot/PolyglotStewardAMQ.java.
Start-up / Initialization
- When Polyglot starts, it does:
- read the configuration file;
- start the PolyglotStewardAMQ thread;
- start a thread that calls PolyglotRESTUtilities.updateMongo() to update Mongo, by default every 2 s;
- start the restlet service.
- When PolyglotStewardAMQ thread starts, it starts 3 threads:
- discoveryAMQ(), every 30 s;
- process_jobs(), every 3 s;
- heartbeat(), which removes unresponsive SSs, every Heartbeat seconds (default 10 s).
- Two queryEndpoint() methods.
In SoftwareServerRESTUtilities.java: returns pure text; in PolyglotStewardAMQ.java: returns JSON.
kgm.utility.Utility source code
https://isda.ncsa.illinois.edu/svn/isda/trunk/kgm/Utilities/src/kgm/utility/Utility.java
Testing
Testing conversion on dap-dev
- Commands:
Send the conversion request:
curl http://browndog.user:password1@dap-dev.ncsa.illinois.edu:8184/convert/jpg/http%3A%2F%2Fbrowndog.ncsa.illinois.edu%2Fexamples%2Fbrowndog.png
Get the converted file:
curl -u browndog.user:password1 -O http://dap-dev.ncsa.illinois.edu:8184/file/200598567_browndog.jpg
- Used a png to test.
http://browndog.ncsa.illinois.edu/examples/browndog.png
The conversion endpoint requires the source URL to be escaped (percent-encoded).
To generate an escaped URL using perl:
perl -e 'use URI::Escape; print uri_escape("http://browndog.ncsa.illinois.edu/examples/browndog.png")'
http://www.perlhowto.com/encode_and_decode_url_strings
man URI::Escape
Bi-hourly tests.py
Located on dap-dev in /var/www/html/dap/tests/. It called the Polyglot GET API
with the URL to convert, received a URL in the response, downloaded that URL to
"tmp/<count>_<file_basename>.<output_format>", and then checked
that the file existed and was not empty.
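The file naming and the final check can be sketched as below. This is not the actual tests.py. The function names are illustrative, and <file_basename> is assumed to be the source file name without its extension.

```python
import os
from urllib.parse import urlparse


def download_target(count, source_url, output_format, directory="tmp"):
    """Build "tmp/<count>_<file_basename>.<output_format>" for a source URL."""
    name = os.path.basename(urlparse(source_url).path)  # e.g. "browndog.png"
    # Assumption: the basename excludes the original extension.
    stem = os.path.splitext(name)[0]
    return os.path.join(directory, "{}_{}.{}".format(count, stem, output_format))


def is_nonempty_file(path):
    """Return True if path is an existing regular file with at least one byte."""
    return os.path.isfile(path) and os.path.getsize(path) > 0
```

A test run would download the returned URL to download_target(...) and report success only if is_nonempty_file(...) holds for that path.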
Python requests:
r = requests.get(api_call, auth=(username, password),
headers=headers, timeout=timeout)
result = r.text
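Here result contains only the URL text, not the HTML tags. requests does not strip markup from r.text, so the difference presumably comes from how the server responds to these request headers.
That is, the content returned otherwise is: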
<a href=http://dap-dev.ncsa.illinois.edu:8184/file/200598567_browndog.jpg>http://dap-dev.ncsa.illinois.edu:8184/file/200598567_browndog.jpg</a>
but result is only:
http://dap-dev.ncsa.illinois.edu:8184/file/200598567_browndog.jpg
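Since the response may arrive either as an HTML anchor or as a bare URL, a client can normalise both forms. A minimal sketch, assuming the body is one of the two shapes shown above; the helper name extract_result_url is illustrative.

```python
import re

# Matches <a href=URL>, with or without quotes around the URL.
_HREF = re.compile(r'<a\s+href=["\']?([^"\'\s>]+)', re.IGNORECASE)


def extract_result_url(body):
    """Return the result URL from an anchor tag, or the stripped body itself."""
    match = _HREF.search(body)
    if match:
        return match.group(1)
    return body.strip()
```

With this helper, result = extract_result_url(r.text) gives the same URL regardless of which form the server returned.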
python requests API:
http://docs.python-requests.org/en/master/api/
Restlet Documentation
In pom.xml: org.restlet 2.3.1.
- User guide.
https://restlet.com/technical-resources/restlet-framework/guide/2.3
Resource package Overview:
https://restlet.com/technical-resources/restlet-framework/guide/2.3/core/resource/overview
Annotation Get.
https://restlet.com/technical-resources/restlet-framework/javadocs/2.1/jse/api/org/restlet/resource/Get.html
Response: redirectTemporary. Used in url redirection.
https://restlet.com/technical-resources/restlet-framework/javadocs/2.1/jse/api/org/restlet/Response.html
https://restlet.com/technical-resources/restlet-framework/javadocs/2.1/jee/api/org/restlet/resource/ServerResource.htm
Installation of Required Software Packages on Ubuntu Trusty (14.04)
Need to install the programs used in Softwareserver scripts:
convert, unoconv, daffodil, ffmpeg, 7z and 7za, libreoffice, avconv, xvfb-run, eog, flac, ps2pdf, gthumb, htmldoc, kabeja, rar (requires "multiverse"), unzip, unrar, cabextract, ncdump, pdf2djvu, prince, soundconverter, TeighaFileConverter, txt2html, unrtf, cvlc, ebook-convert
which are in the following Ubuntu packages:
imagemagick, unoconv, (daffodil/), ffmpeg, p7zip-full, libreoffice*, libav-tools, xvfb, eog, flac, gthumb, htmldoc, (kabeja?), rar, unrar, cabextract, netcdf-bin, pdf2djvu, (prince?), soundconverter, (TeighaFileConverter?), txt2html, unrtf, vlc-nox, calibre
unzip was already installed. ps2pdf is in ghostscript, already installed.
To install the packages that are available on Ubuntu 14.04, plus Daffodil and Kabeja, do: