As the name "Brown Dog" suggests the project aims at bringing together a number of external tools as part of the two services being constructed. For the DAP, which handles conversions, these tools are incorporated as scripts to the Software Servers in Polyglot or as DFDL schemas for Daffodil. For the DTS, which handles the automatic extraction of metadata and content signatures, tools are incorporated as either extractors for Medici or extractors for Versus. Below we show examples for incorporating each of these components. This overview assumes a basic level of knowledge about the four main components of the Brown Dog software platform, i.e. Polyglot, Medici, Versus, and Daffodil. For a more in depth overview of each of these components and their function it is recommended that you first read through their online documentation and/or go through one of the online tutorial videos:

The purpose of this document is to provide quick examples of each means of incorporating tools so as to bootstrap ones ability to include their code within one of the two services.

Anchor
StartHere
StartHere
Start Here

To begin does your code, software, or tool carry out a data conversion or a data extraction? If a conversion the tool should be included in the Data Access Proxy. If an extraction the tool should be included in the Data Tilling Service.

Anchor
DAP
DAP
The Data Access Proxy (DAP)

The Data Access Proxy handles data conversions. If a piece of software or tool exists to carry out the conversion its incorporation into the DAP will be through Polyglot. If the specification of the file format is known then in can be incorporated as a DFDL schema within Daffodil.

Anchor
Polyglot
Polyglot
Polyglot Software Server Scripts

Software Server scripts are used by Polyglot to automate the interaction with software that is capable of converting from one file format to another. These scripts can directly wrap command line utilities that carry out conversions for use in Polyglot or split the steps of opening a file in one format and saving a file in a different format, typical of GUI driven applications. These wrapper scripts can be written in pretty much any text based scripting language. Below we show a few simple examples. Full details on the creation of these wrapper scripts, the required naming conventions, and required header conventions please refer to the the Scripting Manual.

Anchor
CommandLIne
CommandLIne
Command Line Applications

Bash Script

The following is an example of a bash wrapper script for ImageMagick. Note that it is fairly straight forward. The comments at the top contain the information Polyglot needs to use the application: the name and version of the application, they type of data it supports, the input formats it supports, and the output formats it supports.

Code Block

title	ImgMgk_convert.sh

#!/bin/sh
#ImageMagick (v6.5.2)
#image
#bmp, dib, eps, fig, gif, ico, jpg, jpeg, pdf, pgm, pict, pix, png, pnm, ppm, ps, rgb, rgba, sgi, sun, svg, tga, tif, tiff, ttf, x, xbm, xcf, xpm, xwd, yuv
#bmp, dib, eps, gif, jpg, jpeg, pdf, pgm, pict, png, pnm, ppm, ps, rgb, rgba, sgi, sun, svg, tga, tif, tiff, ttf, x, xbm, xpm, xwd, yuv

convert $1 $2

Batch File

Some GUI based applications are capable of being called in a headless mode. The following is an example wrapper script for OpenOffice called in its headless mode.

Code Block

title	OpenOffice_convert.bat

REM OpenOffice (v3.1.0)
REM document
REM doc, odt, rtf, txt
REM doc, odt, pdf, rtf, txt

"C:\Program Files\OpenOffice.org 3\program\soffice.exe" -headless -norestore "-accept=socket`,host=localhost`,port=8100;urp;StarOffice.ServiceManager"
"C:\Program Files\OpenOffice.org 3\program\python.exe" "C:\Converters\DocumentConverter.py" "%1%" "%2%"

Anchor
GUI
GUI
GUI Applications

AutoHotKey

The following is an example of an AutoHotKey script to convert files with Adobe Acrobat, a GUI driven application. Note it contains a similar header in the comments at the beginning of the script. Also note that the open and save operation can be broken into two separate scripts.

Code Block

title	Acrobat_open.ahk

;Adobe Acrobat (v9.3.0 Pro Extended)
;document
;pdf

;Parse input filename
arg1 = %1%
StringGetPos, index, arg1, \, R
ifLess, index, 0, ExitApp
index += 2
input_filename := SubStr(arg1, index)

;Run program if not already running
IfWinNotExist, Adobe 3D Reviewer
{
  Run, C:\Program Files\Adobe\Acrobat 9.0\Acrobat\Acrobat.exe
  WinWait, Adobe Acrobat Pro Extended
}

;Activate the window
WinActivate, Adobe Acrobat Pro Extended
WinWaitActive, Adobe Acrobat Pro Extended

;Open document
Send, ^o
WinWait, Open
ControlSetText, Edit1, %1%
ControlSend, Edit1, {Enter}

;Make sure model is loaded before exiting
Loop
{
  IfWinExist, %input_filename% - Adobe Acrobat Pro Extended
  {
    break
  }

  Sleep, 500
}

Code Block

title	Acrobat_save.ahk

;Adobe Acrobat (v9.3.0 Pro Extended)
;document
;doc, html, jpg, pdf, ps, rtf, txt

;Parse output format
arg1 = %1%
StringGetPos, index, arg1, ., R
ifLess, index, 0, ExitApp
index += 2
out := SubStr(arg1, index)

;Parse filename root
StringGetPos, index, arg1, \, R
ifLess, index, 0, ExitApp
index += 2
name := SubStr(arg1, index)
StringGetPos, index, name, ., R
ifLess, index, 0, ExitApp
name := SubStr(name, 1, index)

;Activate the window
WinActivate, %name%.pdf - Adobe Acrobat Pro Extended
WinWaitActive, %name%.pdf - Adobe Acrobat Pro Extended

;Save document
Send, ^S
WinWait, Save As

if(out = "doc"){
  ControlSend, ComboBox3, m
}else if(out = "html"){
  controlSend, ComboBox3, h
}else if(out = "jpg"){
  controlSend, ComboBox3, j
}else if(out = "pdf"){
  controlSend, ComboBox3, a
}else if(out = "ps"){
  controlSend, ComboBox3, p
  controlSend, ComboBox3, p
  controlSend, ComboBox3, p
  controlSend, ComboBox3, p
  controlSend, ComboBox3, p
}else if(out = "rtf"){
  controlSend, ComboBox3, r
}else if(out = "txt"){
  controlSend, ComboBox3, t
  controlSend, ComboBox3, t
}

ControlSetText, Edit1, %1%
ControlSend, Edit1, {Enter}

;Return to main window before exiting
Loop
{
  ;Continue on if main window is active
  IfWinActive, %name%.pdf - Adobe Acrobat Pro Extended
  { 
    break
  }

  ;Click "Yes" if asked to overwrite files
  IfWinExist, Save As
  {
    ControlGetText, tmp, Button1, Save As

    if(tmp = "&Yes")
    {
      ControlClick, Button1, Save As
    }
  }

  Sleep, 500
}

;Wait a lit bit more just in case
Sleep, 1000

;Close whatever document is currently open
Send, ^w

;Make sure it actually closed before exiting
Loop
{
  ;Continue on if main window is active
  IfWinActive, Adobe Acrobat Pro Extended
  { 
    break
  }

  Sleep, 500
}

Anchor
DFDL
DFDL
DFDL Schemas

The Data Format Description Language (DFDL) allows one to write an XML schema definition which defines how to automatically parse a file in that format into an XML representation of the data. DFDL provides an ideal means of preserving the many ad hoc formats created and used in labs. The DFDL schema below is a simple example that parses the data from a PGM image file.

Code Block

title	pgm.dfdl.xsd

<?xml version="1.0" encoding="UTF-8"?>

<!--
Load image data from a PGM file and represent the data as a sequence of pixels in row major order.
-->

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/" xmlns:ex="http://example.com" targetNamespace="http://example.com">
  <xs:include schemaLocation="xsd/built-in-formats.xsd"/>

  <xs:annotation>
    <xs:appinfo source="http://www.ogf.org/dfdl/">
      <dfdl:format ref="ex:daffodilTest1" separator="" initiator="" terminator="" leadingSkip='0' textTrimKind="none" initiatedContent="no"
        alignment="implicit" alignmentUnits="bits" trailingSkip="0" ignoreCase="no" separatorPolicy="suppressed" 
        separatorPosition="infix" occursCountKind="parsed" emptyValueDelimiterPolicy="both" representation="text" 
        textNumberRep="standard" lengthKind="delimited" encoding="ASCII"/>
    </xs:appinfo>
  </xs:annotation>

  <xs:element name="file">
    <xs:complexType>
      <xs:sequence>

        <xs:element name="header" dfdl:lengthKind="implicit" maxOccurs="1">
          <xs:complexType>
            <xs:sequence dfdl:sequenceKind="ordered" dfdl:separator="%NL;" dfdl:separatorPosition="postfix">
              <xs:element name="type" type="xs:string"/>
              <xs:element name="dimensions" maxOccurs="1" dfdl:occursCountKind="implicit">
                <xs:complexType>
                  <xs:sequence dfdl:sequenceKind="ordered" dfdl:separator="%SP;">
                    <xs:element name="width" type="xs:integer"/>
                    <xs:element name="height" type="xs:integer"/>
                  </xs:sequence>
                </xs:complexType>
              </xs:element>
              <xs:element name="depth" type="xs:integer"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>

        <xs:element name="pixels" dfdl:lengthKind="implicit" maxOccurs="1">
          <xs:complexType>
            <xs:sequence dfdl:separator="%SP; %NL; %SP;%NL;" dfdl:separatorPosition="postfix" dfdl:separatorSuppressionPolicy="anyEmpty">
              <xs:element name="pixel" type="xs:integer" maxOccurs="unbounded" dfdl:occursCountKind="expression"
                dfdl:occursCount="{../../ex:header/ex:dimensions/ex:width * ../../ex:header/ex:dimensions/ex:height }"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>

      </xs:sequence>
    </xs:complexType>
  </xs:element>

</xs:schema>

An example of the XML produced the above DFDL schema when applied to a PGM file is shown below.

Code Block

<ex:file xmlns:ex="http://example.com">
  <ex:header>
    <ex:type>P2</ex:type>
    <ex:dimensions>
      <ex:width>16</ex:width>
      <ex:height>16</ex:height>
    </ex:dimensions>
    <ex:depth>255</ex:depth>
  </ex:header>
  <ex:pixels>
    <ex:pixel>136</ex:pixel>
    <ex:pixel>136</ex:pixel>
    <ex:pixel>136</ex:pixel>
    ...
    <ex:pixel>136</ex:pixel>
    <ex:pixel>136</ex:pixel>
  </ex:pixels>
</ex:file>

Anchor
DTS
DTS
The Data Tilling Service (DTS)

The Data Tilling Services handles data extractions. If your code, tool, or software extracts information such as keywords from a file or its contents then it should be included in the DTS as a Medici extractor. If your code, tool, or software extracts a signature from the file's contents which in turn can be compared to the signatures of other files via some distance measure to find similar pieces of data, then, it should be included in the DTS as a Versus extractor.

Anchor
Medici
Medici
Medici Extractors

Medici extractors typically serve to automatically extract some new kind of information from a file's content when it is uploaded into Medici. These extractors do this by connecting to a shared RabbitMQ bus. When a new file is uploaded to Medici it is announced on this bus. Extractors that can handle a file of the type posted on the bus are triggered and the data they in turn create is returned to Medici as derived data to be associated with that file. The extractors themselves can be implemented in a variety of languages. Examples of these extractors in different languages can be found in the extractors-templates code repository.

Anchor
Java
Java
Java

Java extractors can be created using the amqp-client jar file. This allows you to connect to the RabitMQ bus and received messages. The easiest way to get up and running is to use maven with java to add all required dependencies. An example of an extractor written in java can be found at Medici Extractor in Java.

Anchor
Python
Python
Python

Python extractors will often be based on the packages pika and requests. This allows you to connect to the RabittMQ message bus and easily send requests to medici. A complete example of the python extractor can be found at Medici Extractor in Python

Calling R Scripts from Python

Coming soon...

Anchor
Versus
Versus
Versus Extractors

Versus extractors serve to extract a signature from a file's content. These signatures, effectively a hash for the data, are typically numerical vectors which capture some semantically meaningful aspect of the content so that two such signatures can then be compared using some distance measure. Within Versus extractors operate on a data structure representing the content of a file, produced a Versus adapter, and the returned signatures compared by either a Versus similarity or distance measure. The combination of these adapters, extractors, and measures in turn compose a comparison which can be used for relating files according their contents.

Anchor
Java Measure
Java Measure
Java

The main class sets up the comparison, this is done by adding the two files that need to be compared, as well as the adapter to load the file, the extractor to extract a feature from the file, and a measurement to compare the two features.

Code Block

language	java
title	Main

    static public void main(String[] args) {         
        PairwiseComparison comparison = new PairwiseComparison();
        comparison.setId(UUID.randomUUID().toString());
        comparison.setFirstDataset(new File("data/test1.txt"));
        comparison.setSecondDataset(new File("data/test2.txt"));
        comparison.setAdapterId(TextAdapter.class.getName());
        comparison.setExtractorId(TextHistogramExtractor.class.getName());
        comparison.setMeasureId(LabelHistogramEuclidianDistanceMeasure.class.getName());

        ExecutionEngine ee = new ExecutionEngine(); 
        ee.submit(comparison, new ComparisonStatusHandler() {
            @Override
            public void onStarted() {
                System.out.println("STARTED : ");
            }

            @Override
            public void onFailed(String msg, Throwable e) {
                System.out.println("FAILED  : " + msg);
                e.printStackTrace();
                System.exit(0);
            }

            @Override
            public void onDone(double value) {
                System.out.println("DONE    : " + value);
                System.exit(0);
            }

            @Override
            public void onAborted(String msg) {
                System.out.println("ABORTED : " + msg);
                System.exit(0);
            }
        });
    }

The text adapter will take a text file, and load all the file, splitting the text into words and return a list of all words in the text. The words are still in the right order, and it is possible to read the original information of the file by reading the words in the order as they are returned by getWords().

Code Block

language	java
title	Text Adapter

public class TextAdapter implements FileLoader, HasText {
    private File         file;
    private List<String> words;

    public TextAdapter() {}

    // ----------------------------------------------------------------------
    // FileLoader
    // ----------------------------------------------------------------------
    @Override
    public void load(File file) {
        this.file = file;
    }

    @Override
    public String getName() {
        return "Text Document";
    }

    @Override
    public List<String> getSupportedMediaTypes() {
        List<String> mediaTypes = new ArrayList<String>();
        mediaTypes.add("text/*");
        return mediaTypes;
    }

    // ----------------------------------------------------------------------
    // HasText
    // ----------------------------------------------------------------------
    @Override
    public List<String> getWords() {
        if (words == null) {
            words = new ArrayList<String>();
            try {
                BufferedReader br = new BufferedReader(new FileReader(file));
                String line;
                while((line = br.readLine()) != null) {
                    String[] w = line.split(" ");
                    words.addAll(Arrays.asList(w));
                }
                br.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
        return words;
    }
 }

The extractor will take the words returned by the adapter and count the occurrence of each word. At this point we are left with a histogram with all words and how often they occur in the text, we can no longer read the text since the information about the order of the words is lost.

Code Block

language	java
title	Text Histogram Extractor

public class TextHistogramExtractor implements Extractor
{
    @Override
    public Adapter newAdapter() {
        throw (new RuntimeException("Not supported."));
    }

    @Override
    public String getName() {
        return "Text Histogram Extractor";
    }

    @Override
    public Set<Class<? extends Adapter>> supportedAdapters() {
        Set<Class<? extends Adapter>> adapters = new HashSet<Class<? extends Adapter>>();
        adapters.add(HasText.class);
        return adapters;
    }

    @Override
    public Class<? extends Descriptor> getFeatureType() {
        return LabelHistogramDescriptor.class;
    }

    @Override
    public Descriptor extract(Adapter adapter) throws Exception {
        if (adapter instanceof HasText) {
            LabelHistogramDescriptor desc = new LabelHistogramDescriptor();

            for (String word : ((HasText) adapter).getWords()) {
                desc.increaseBin(word);
            }

            return desc;
        } else {
            throw new UnsupportedTypeException();
        }
    }
    
    @Override
    public boolean hasPreview(){
        return false;
    }
    
    @Override
    public String previewName(){
        return null;
    }
}

To compare two texts we use the euclidian distance measure of two histograms. First we normalize each histogram, so we can compare a large text with a small text, next we compare each big of the two histograms. If the bin is missing from either histogram it is assumed to have a value of 0.

Code Block

language	java
title	Euclidian Distance Measure

public class LabelHistogramEuclidianDistanceMeasure implements Measure 
{
    @Override
    public SimilarityPercentage normalize(Similarity similarity) {
        return new SimilarityPercentage(1 - similarity.getValue());
    }

    @Override
    public String getFeatureType() {
        return LabelHistogramDescriptor.class.getName();
    }

    @Override
    public String getName() {
        return "Histogram Distance";
    }

    @Override
    public Class<LabelHistogramEuclidianDistanceMeasure> getType() {
        return LabelHistogramEuclidianDistanceMeasure.class;
    }

    // correlation

    @Override
    public Similarity compare(Descriptor desc1, Descriptor desc2) throws Exception {
        if ((desc1 instanceof LabelHistogramDescriptor) && (desc2 instanceof LabelHistogramDescriptor)) {
            LabelHistogramDescriptor lhd1 = (LabelHistogramDescriptor) desc1;
            LabelHistogramDescriptor lhd2 = (LabelHistogramDescriptor) desc2;

            // get all possible labels
            Set<String> labels = new HashSet<String>();
            labels.addAll(lhd1.getLabels());
            labels.addAll(lhd2.getLabels());

            // normalize
            lhd1.normalize();
            lhd2.normalize();
            
            // compute distance
            double sum = 0;

            for (String s : labels) {
                Double b1 = lhd1.getBin(s);
                Double b2 = lhd2.getBin(s);
             
                if (b1 == null) {
                    sum += b2 * b2;
                } else if (b2 == null) {
                    sum += b1 * b1;
                } else {
                    sum += (b1 - b2) * (b1 - b2);
                }
            }

            return new SimilarityNumber(Math.sqrt(sum), 0, 1, 0);
        } else {
            throw new UnsupportedTypeException();
        }
    }
}

Adding Tools to the DAP and DTS, Overview and Examples

Image Removed

Introduction

This guide is intended as an introduction for new users working with the Brown Dog software platform. An introduction to the 3 main components of the platform, Polyglot, Medici, and Versus will be presented, and examples of scripts and code are provided. These 3 tools can be leveraged to add tools to the Data Access Proxy (DAP) and Data Tilling Service (DTS).

Table of Contents
- Prerequisites
- Polyglot
  - Overview
  - Brief History
    - Information Loss
    - File Formats
  - Scripting
    - AutoHotKey
    - AppleScript
    - Python
    - Bash
- Medici
  - Overview
  - Example Extractors
    - Java
    - C++
    - Python
- Versus
  - Overview
  - Example Measures
    - Java

...

This overview assumes a basic level of knowledge about the three main components of the Brown Dog software platform, Polyglot, Medici, and Versus. Some background information will be provided, however, for a more in depth overview of each of these components and their function, it is recommended that you take the opportunity to view the provided online tutorial sessions that may be found on the ISDA's YouTube account: http://www.youtube.com/channel/UCGIXAeNEa2v7Gt-tvfdJPvw.

...

Image Removed

...

Polyglot is intended to be a universal and scalable file format converter. Data preservation and curation is an extremely difficult problem faced by many within the scientific community. One of the most difficult issues faced by those hoping to preserve data is that over time the file formats used to store important scientific data may become unreadable. This is a significant problem in the scientific community, as preservation of data is necessary to ensure reproducible science. A significant problem addressed by Polyglot is that there exist a multitude of different file formats that represent the same data (e.g. .mp3, .flac, .wav for audio) as well as numerous different software programs to view that data. This diminishes the lifespan of that data, as the software available to utilize the files can become unsupported by the creator, or it may be the case that it is not supported by software that functions on the user's current system. It can also be the case that the software required to read a particular file format is unavailable to a particular user due to lack of access to the relevant program or licensing issues. When this occurs accessing data stored in these file formats requires the user to find a different program to read the files, or in many cases to convert the data to a different file format that is usable by some other program. In addition to this, some proprietary file formats may be unreadable to those who might benefit from the data if access to the appropriate software is not available. Again, in this case the user is forced to find a different program to read the files, or to convert them to a different format. Polyglot seeks to allow the user to convert from any file format to one that is supported by the software available to the user. In this way, Polyglot preserves data, allowing data that might otherwise become unusable to persist over time.

...

Image Removed

...

Code Block

title	Open

;Adobe Acrobat (v9.3.0 Pro Extended)
;document
;pdf

;Parse input filename
arg1 = %1%
StringGetPos, index, arg1, \, R
ifLess, index, 0, ExitApp
index += 2
input_filename := SubStr(arg1, index)

;Run program if not already running
IfWinNotExist, Adobe 3D Reviewer
{
  Run, C:\Program Files\Adobe\Acrobat 9.0\Acrobat\Acrobat.exe
  WinWait, Adobe Acrobat Pro Extended
}

;Activate the window
WinActivate, Adobe Acrobat Pro Extended
WinWaitActive, Adobe Acrobat Pro Extended

;Open document
Send, ^o
WinWait, Open
ControlSetText, Edit1, %1%
ControlSend, Edit1, {Enter}

;Make sure model is loaded before exiting
Loop
{
  IfWinExist, %input_filename% - Adobe Acrobat Pro Extended
  {
    break
  }

  Sleep, 500
}

Code Block

title	Save

;Adobe Acrobat (v9.3.0 Pro Extended)
;document
;doc, html, jpg, pdf, ps, rtf, txt

;Parse output format
arg1 = %1%
StringGetPos, index, arg1, ., R
ifLess, index, 0, ExitApp
index += 2
out := SubStr(arg1, index)

;Parse filename root
StringGetPos, index, arg1, \, R
ifLess, index, 0, ExitApp
index += 2
name := SubStr(arg1, index)
StringGetPos, index, name, ., R
ifLess, index, 0, ExitApp
name := SubStr(name, 1, index)

;Activate the window
WinActivate, %name%.pdf - Adobe Acrobat Pro Extended
WinWaitActive, %name%.pdf - Adobe Acrobat Pro Extended

;Save document
Send, ^S
WinWait, Save As

if(out = "doc"){
  ControlSend, ComboBox3, m
}else if(out = "html"){
  controlSend, ComboBox3, h
}else if(out = "jpg"){
  controlSend, ComboBox3, j
}else if(out = "pdf"){
  controlSend, ComboBox3, a
}else if(out = "ps"){
  controlSend, ComboBox3, p
  controlSend, ComboBox3, p
  controlSend, ComboBox3, p
  controlSend, ComboBox3, p
  controlSend, ComboBox3, p
}else if(out = "rtf"){
  controlSend, ComboBox3, r
}else if(out = "txt"){
  controlSend, ComboBox3, t
  controlSend, ComboBox3, t
}

ControlSetText, Edit1, %1%
ControlSend, Edit1, {Enter}

;Return to main window before exiting
Loop
{
  ;Continue on if main window is active
  IfWinActive, %name%.pdf - Adobe Acrobat Pro Extended
  { 
    break
  }

  ;Click "Yes" if asked to overwrite files
  IfWinExist, Save As
  {
    ControlGetText, tmp, Button1, Save As

    if(tmp = "&Yes")
    {
      ControlClick, Button1, Save As
    }
  }

  Sleep, 500
}

;Wait a lit bit more just in case
Sleep, 1000

;Close whatever document is currently open
Send, ^w

;Make sure it actually closed before exiting
Loop
{
  ;Continue on if main window is active
  IfWinActive, Adobe Acrobat Pro Extended
  { 
    break
  }

  Sleep, 500
}

Code Block

title	Kill

;Adobe Acrobat (v9.3.0 Pro Extended)

;Kill any scripts that could be using this application first
RunWait, taskkill /f /im Acrobat_open.exe
RunWait, taskkill /f /im Acrobat_save.exe

;Kill the application
RunWait, taskkill /f /im Acrobat.exe

...

Code Block

theme	Emacs
language	java
title	Connecting to RabbitMQ

protected void startExtractor(String rabbitMQUsername,
	String rabbitMQpassword) {
	try{ 
 		//Open channel and declare exchange and consumer
		ConnectionFactory factory = new ConnectionFactory();
		factory.setHost(serverAddr);
		factory.setUsername(rabbitMQUsername);
		factory.setPassword(rabbitMQpassword);
		Connection connection = factory.newConnection();

 		final Channel channel = connection.createChannel();
		channel.exchangeDeclare(EXCHANGE_NAME, "topic", true);

		channel.queueDeclare(QUEUE_NAME,DURABLE,EXCLUSIVE,AUTO_DELETE,null);
		channel.queueBind(QUEUE_NAME, EXCHANGE_NAME, "*.file.text.plain.#");
 
 		this.channel = channel;

 		// create listener
		channel.basicConsume(QUEUE_NAME, false, CONSUMER_TAG, new DefaultConsumer(channel) {
 			@Override
 			public void handleDelivery(String consumerTag, Envelope envelope, AMQP.BasicProperties properties, byte[] body) throws IOException {
				messageReceived = new String(body);
 				long deliveryTag = envelope.getDeliveryTag();
 				// (process the message components here ...)
				System.out.println(" {x} Received '" + messageReceived + "'");
 
				replyProps = new AMQP.BasicProperties.Builder().correlationId(properties.getCorrelationId()).build();
				replyTo = properties.getReplyTo();
 
				processMessageReceived();
				System.out.println(" [x] Done");
				channel.basicAck(deliveryTag, false);
			}
		});

 		// start listening 
		System.out.println(" [*] Waiting for messages. To exit press CTRL+C");
 		while (true) {
			Thread.sleep(1000);
		}
	}
 	catch(Exception e){
		e.printStackTrace();
		System.exit(1);
	} 
}

...

Code Block

language	cpp
title	Connecting to RabbitMQ

#include <amqpcpp.h>

namespace CPPExample {

  class RabbitMQConnectionHandler : public AMQP::ConnectionHandler {
      /**
      *  Method that is called by the AMQP library every time it has data
      *  available that should be sent to RabbitMQ. 
      *  @param  connection  pointer to the main connection object  
      *  @param  data        memory buffer with the data that should be sent to RabbitMQ
      *  @param  size        size of the buffer
      */
     virtual void onData(AMQP::Connection *connection, const char *data, size_t size)
     {
         // @todo 
         //  Add your own implementation, for example by doing a call to the
         //  send() system call. But be aware that the send() call may not
         //  send all data at once, so you also need to take care of buffering
         //  the bytes that could not immediately be sent, and try to send 
         //  them again when the socket becomes writable again
     }

      /**
      *  Method that is called by the AMQP library when the login attempt 
      *  succeeded. After this method has been called, the connection is ready 
      *  to use.
      *  @param  connection      The connection that can now be used
      */
      virtual void onConnected(Connection *connection)
      {
         // @todo
         //  add your own implementation, for example by creating a channel 
         //  instance, and start publishing or consuming
      }

      /**
      *  Method that is called by the AMQP library when a fatal error occurs
      *  on the connection, for example because data received from RabbitMQ
      *  could not be recognized.
      *  @param  connection      The connection on which the error occured
      *  @param  message         A human readable error message
      */
      virtual void onError(Connection *connection, const std::string &message)
      {
        // @todo
        //  add your own implementation, for example by reporting the error
        //  to the user of your program, log the error, and destruct the 
        //  connection object because it is no longer in a usable state
      }
  };

}

Code Block

language	cpp
title	Receiver

namespace CPPExample {

  /**
   *  Parse data that was recevied from RabbitMQ
   *  
   *  Every time that data comes in from RabbitMQ, you should call this method to parse
   *  the incoming data, and let it handle by the AMQP-CPP library. This method returns the number
   *  of bytes that were processed.
   *
   *  If not all bytes could be processed because it only contained a partial frame, you should
   *  call this same method later on when more data is available. The AMQP-CPP library does not do
   *  any buffering, so it is up to the caller to ensure that the old data is also passed in that
   *  later call.
   *
   *  @param  buffer      buffer to decode
   *  @param  size        size of the buffer to decode
   *  @return             number of bytes that were processed
   */
  size_t parse(char *buffer, size_t size)
  {
     return _implementation.parse(buffer, size);
  }
}

...

Code Block

theme	Emacs
language	py
title	Instantiating the logger and starting the extractor

def main():
 global logger

 # name of receiver
receiver='ExamplePythonExtractor'

 # configure the logging system
logging.basicConfig(format="%(asctime)-15s %(name)-10s %(levelname)-7s : %(message)s", level=logging.WARN)
logger = logging.getLogger(receiver)
logger.setLevel(logging.DEBUG)
 
 if len(sys.argv) != 4:
logger.info("Input RabbitMQ username, followed by RabbitMQ password and Medici REST API key.")
sys.exit()
 
 global playserverKey
playserverKey = sys.argv[3]

Code Block

theme	Emacs
language	py
title	Connecting to RabbitMQ

# connect to rabbitmq using input username and password 
credentials = pika.PlainCredentials(sys.argv[1], sys.argv[2])
parameters = pika.ConnectionParameters(credentials=credentials)
connection = pika.BlockingConnection(parameters)
 
 # connect to channel
channel = connection.channel()

 # declare the exchange
channel.exchange_declare(exchange='medici', exchange_type='topic', durable=True)

 # declare the queue
channel.queue_declare(queue=receiver, durable=True)

 # connect queue and exchange
channel.queue_bind(queue=receiver, exchange='medici', routing_key='*.file.text.plain')

 # create listener
channel.basic_consume(on_message, queue=receiver, no_ack=False)

 # start listening
logger.info("Waiting for messages. To exit press CTRL+C")
 try:
channel.start_consuming()
 except KeyboardInterrupt:
channel.stop_consuming()

...

Code Block

language	java
title	Measure

public class WordCountMeasure implements Serializable,Measure {

	private static final long SLEEP = 10000;

	@Override
	public Similarity compare(Descriptor feature1, Descriptor feature2)
			throws Exception {
		Thread.sleep(SLEEP);
		return new SimilarityNumber(0);
	}

	@Override
	public SimilarityPercentage normalize(Similarity similarity) {
		return null;
	}

	@Override
	public String getFeatureType() {
		return WordCountMeasure.class.getName();
	}

	@Override
	public String getName() {
		return "Word Count Measure";
	}

	@Override
	public Class<WordCountMeasure> getType() {
		return WordCountMeasure.class;
	}

}

...

Page tree

Versions Compared

Old Version 53

New Version Current

Key

Anchor
StartHere
StartHere
Start Here

Anchor
DAP
DAP
The Data Access Proxy (DAP)

Anchor
Polyglot
Polyglot
Polyglot Software Server Scripts

Anchor
CommandLIne
CommandLIne
Command Line Applications

Bash Script

Batch File

Anchor
GUI
GUI
GUI Applications

AutoHotKey

Anchor
DFDL
DFDL
DFDL Schemas

Anchor
DTS
DTS
The Data Tilling Service (DTS)

Anchor
Medici
Medici
Medici Extractors

Anchor
Java
Java
Java

Anchor
Python
Python
Python

Calling R Scripts from Python

Anchor
Versus
Versus
Versus Extractors

Anchor
Java Measure
Java Measure
Java

Adding Tools to the DAP and DTS, Overview and Examples

Introduction

Page tree

Page History

Versions Compared

Old Version 53

New Version Current

Key

AnchorStartHereStartHereStart Here

AnchorDAPDAPThe Data Access Proxy (DAP)

AnchorPolyglotPolyglotPolyglot Software Server Scripts

AnchorCommandLIneCommandLIneCommand Line Applications

Bash Script

Batch File

AnchorGUIGUIGUI Applications

AutoHotKey

AnchorDFDLDFDLDFDL Schemas

AnchorDTSDTSThe Data Tilling Service (DTS)

AnchorMediciMediciMedici Extractors

AnchorJavaJavaJava

AnchorPythonPythonPython

Calling R Scripts from Python

AnchorVersusVersusVersus Extractors

AnchorJava MeasureJava MeasureJava

Adding Tools to the DAP and DTS, Overview and Examples

Introduction

Anchor
StartHere
StartHere
Start Here

Anchor
DAP
DAP
The Data Access Proxy (DAP)

Anchor
Polyglot
Polyglot
Polyglot Software Server Scripts

Anchor
CommandLIne
CommandLIne
Command Line Applications

Anchor
GUI
GUI
GUI Applications

Anchor
DFDL
DFDL
DFDL Schemas

Anchor
DTS
DTS
The Data Tilling Service (DTS)

Anchor
Medici
Medici
Medici Extractors

Anchor
Java
Java
Java

Anchor
Python
Python
Python

Anchor
Versus
Versus
Versus Extractors

Anchor
Java Measure
Java Measure
Java