
Adding Tools to the DAP and DTS: Overview and Examples


Introduction

This guide introduces new users to the Brown Dog software platform. It presents the three main components of the platform, Polyglot, Medici, and Versus, and provides example scripts and code. These three components can be used to add tools to the Data Access Proxy (DAP) and the Data Tilling Service (DTS).

 

Prerequisites

This overview assumes a basic knowledge of the three main components of the Brown Dog software platform: Polyglot, Medici, and Versus. Some background information is provided, but for a more in-depth overview of each component and its function, we recommend viewing the online tutorial sessions on the ISDA's YouTube channel: http://www.youtube.com/channel/UCGIXAeNEa2v7Gt-tvfdJPvw.

Polyglot

Overview

Polyglot is intended to be a universal and scalable file format converter. Data preservation and curation are extremely difficult problems for many in the scientific community. One of the hardest issues is that, over time, the file formats used to store important scientific data may become unreadable as the software that reads them becomes obsolete. A user may also lack access to the software required to read a particular format, for example because a proprietary program is unavailable or its license is restricted. In either case, accessing the data requires finding a different program that can read the files or, in many cases, converting the data to a different file format. Polyglot aims to let the user convert from any file format to one supported by the software available to them. In this way, Polyglot preserves data that might otherwise become unusable over time.
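One way to picture "convert from any format to a supported one" is as a search over a graph whose nodes are file formats and whose edges are conversions that some available application can perform; chaining several applications can reach a format that no single program handles. The sketch below uses breadth-first search to find the shortest such chain. It is an illustration under stated assumptions, not Polyglot's code: the `conversions` table, the function name `find_conversion_path`, and the application `ImageToolX` are hypothetical.

```python
from collections import deque


def find_conversion_path(conversions, source, target):
    """Breadth-first search for the shortest chain of conversions.

    conversions maps an input format to a list of (application, output format)
    pairs (a hypothetical stand-in for a registry of installed software).
    Returns a list of (application, input format, output format) steps,
    [] if source == target, or None if target cannot be reached.
    """
    if source == target:
        return []
    previous = {source: None}  # format -> (application, format it was reached from)
    queue = deque([source])
    while queue:
        fmt = queue.popleft()
        for application, out_fmt in conversions.get(fmt, []):
            if out_fmt in previous:
                continue  # already reached by an equally short or shorter chain
            previous[out_fmt] = (application, fmt)
            if out_fmt == target:
                # Walk back from the target to rebuild the chain of steps
                steps = []
                current = out_fmt
                while previous[current] is not None:
                    app, prev_fmt = previous[current]
                    steps.append((app, prev_fmt, current))
                    current = prev_fmt
                return list(reversed(steps))
            queue.append(out_fmt)
    return None


conversions = {
    "pdf": [("Acrobat", "doc"), ("Acrobat", "jpg")],
    "jpg": [("ImageToolX", "png")],  # ImageToolX is a hypothetical application
}
print(find_conversion_path(conversions, "pdf", "png"))
# [('Acrobat', 'pdf', 'jpg'), ('ImageToolX', 'jpg', 'png')]
```

Breadth-first search returns a chain with the fewest conversion steps, which is a reasonable proxy for fewer opportunities to lose information along the way; a weighted search would be needed if some conversions were known to be lossier than others.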

Brief History

Information Loss

File Formats

Scripting

  • AutoHotKey
    • Open
      ;Adobe Acrobat (v9.3.0 Pro Extended)
      ;document
      ;pdf
      
      ;Parse input filename
      arg1 = %1%
      StringGetPos, index, arg1, \, R
      ifLess, index, 0, ExitApp
      index += 2
      input_filename := SubStr(arg1, index)
      
      ;Run program if not already running
      IfWinNotExist, Adobe Acrobat Pro Extended
      {
        Run, C:\Program Files\Adobe\Acrobat 9.0\Acrobat\Acrobat.exe
        WinWait, Adobe Acrobat Pro Extended
      }
      
      ;Activate the window
      WinActivate, Adobe Acrobat Pro Extended
      WinWaitActive, Adobe Acrobat Pro Extended
      
      ;Open document
      Send, ^o
      WinWait, Open
      ControlSetText, Edit1, %1%
      ControlSend, Edit1, {Enter}
      
      ;Make sure the document is loaded before exiting
      Loop
      {
        IfWinExist, %input_filename% - Adobe Acrobat Pro Extended
        {
          break
        }
      
        Sleep, 500
      }
    • Save
      ;Adobe Acrobat (v9.3.0 Pro Extended)
      ;document
      ;doc, html, jpg, pdf, ps, rtf, txt
      
      ;Parse output format
      arg1 = %1%
      StringGetPos, index, arg1, ., R
      ifLess, index, 0, ExitApp
      index += 2
      out := SubStr(arg1, index)
      
      ;Parse filename root
      StringGetPos, index, arg1, \, R
      ifLess, index, 0, ExitApp
      index += 2
      name := SubStr(arg1, index)
      StringGetPos, index, name, ., R
      ifLess, index, 0, ExitApp
      name := SubStr(name, 1, index)
      
      ;Activate the window
      WinActivate, %name%.pdf - Adobe Acrobat Pro Extended
      WinWaitActive, %name%.pdf - Adobe Acrobat Pro Extended
      
      ;Open the Save As dialog (in Send, the uppercase S adds Shift, so ^S is Ctrl+Shift+S)
      Send, ^S
      WinWait, Save As
      
      ;Select the output type in the file type box (ComboBox3) by typing the
      ;first letter of its entry; repeated keys cycle through entries sharing that letter
      if(out = "doc"){
        ControlSend, ComboBox3, m
      }else if(out = "html"){
        ControlSend, ComboBox3, h
      }else if(out = "jpg"){
        ControlSend, ComboBox3, j
      }else if(out = "pdf"){
        ControlSend, ComboBox3, a
      }else if(out = "ps"){
        ControlSend, ComboBox3, p
        ControlSend, ComboBox3, p
        ControlSend, ComboBox3, p
        ControlSend, ComboBox3, p
        ControlSend, ComboBox3, p
      }else if(out = "rtf"){
        ControlSend, ComboBox3, r
      }else if(out = "txt"){
        ControlSend, ComboBox3, t
        ControlSend, ComboBox3, t
      }
      
      ControlSetText, Edit1, %1%
      ControlSend, Edit1, {Enter}
      
      ;Return to main window before exiting
      Loop
      {
        ;Continue on if main window is active
        IfWinActive, %name%.pdf - Adobe Acrobat Pro Extended
        { 
          break
        }
      
        ;Click "Yes" if asked to overwrite files
        IfWinExist, Save As
        {
          ControlGetText, tmp, Button1, Save As
      
          if(tmp = "&Yes")
          {
            ControlClick, Button1, Save As
          }
        }
      
        Sleep, 500
      }
      
      ;Wait a little longer just in case
      Sleep, 1000
      
      ;Close whatever document is currently open
      Send, ^w
      
      ;Make sure it actually closed before exiting
      Loop
      {
        ;Continue on if main window is active
        IfWinActive, Adobe Acrobat Pro Extended
        { 
          break
        }
      
        Sleep, 500
      }
    • Kill
      ;Adobe Acrobat (v9.3.0 Pro Extended)
      
      ;Kill any scripts that could be using this application first
      RunWait, taskkill /f /im Acrobat_open.exe
      RunWait, taskkill /f /im Acrobat_save.exe
      
      ;Kill the application
      RunWait, taskkill /f /im Acrobat.exe
  • AppleScript
  • Python
  • Bash
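Each AutoHotKey script above opens with comment lines that identify the application (`;Adobe Acrobat (v9.3.0 Pro Extended)`), the kind of data (`;document`), and the formats it handles (`;pdf` for Open; `;doc, html, jpg, pdf, ps, rtf, txt` for Save). The Kill script also refers to the compiled scripts as `Acrobat_open.exe` and `Acrobat_save.exe`, which suggests an application_operation naming scheme. The sketch below reads such a header, assuming that convention (line 1 application, line 2 data type, line 3 comma-separated formats, and the first non-comment line, such as a blank line, ends the header). `parse_script_header` is a hypothetical helper, not part of Polyglot.

```python
def parse_script_header(text, comment=";"):
    """Return the application, data type and formats declared in a script header.

    Assumed convention (from the AutoHotKey examples): the first comment line
    names the application, the second the data type, and the third a
    comma-separated list of file formats. The header ends at the first line
    that is not a comment; lines that are absent come back as None or [].
    """
    header = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith(comment):
            break  # a blank or code line ends the header
        header.append(line[len(comment):].strip())

    application = header[0] if len(header) > 0 else None
    data_type = header[1] if len(header) > 1 else None
    formats = [f.strip() for f in header[2].split(",") if f.strip()] if len(header) > 2 else []
    return {"application": application, "type": data_type, "formats": formats}


open_script = ";Adobe Acrobat (v9.3.0 Pro Extended)\n;document\n;pdf\n\n;Parse input filename\n"
print(parse_script_header(open_script))
# {'application': 'Adobe Acrobat (v9.3.0 Pro Extended)', 'type': 'document', 'formats': ['pdf']}
```

A table built this way (application, operation, formats) is the kind of input the conversion-path search needs: an Open script contributes the formats an application can read, and a Save script the formats it can write.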

Medici

Overview

Example Extractors

  • Java

    Connecting to RabbitMQ
    // Assumes the imports com.rabbitmq.client.* and java.io.IOException, and that
    // serverAddr, EXCHANGE_NAME, QUEUE_NAME, DURABLE, EXCLUSIVE, AUTO_DELETE,
    // CONSUMER_TAG, channel, messageReceived, replyProps, replyTo and
    // processMessageReceived() are declared in the enclosing extractor class.
    protected void startExtractor(String rabbitMQUsername,
    		String rabbitMQpassword) {
    	try {
    		// Open a connection and channel, then declare the exchange and queue
    		ConnectionFactory factory = new ConnectionFactory();
    		factory.setHost(serverAddr);
    		factory.setUsername(rabbitMQUsername);
    		factory.setPassword(rabbitMQpassword);
    		Connection connection = factory.newConnection();
    
    		final Channel channel = connection.createChannel();
    		channel.exchangeDeclare(EXCHANGE_NAME, "topic", true);
    
    		// Bind the queue to messages about plain-text files
    		channel.queueDeclare(QUEUE_NAME, DURABLE, EXCLUSIVE, AUTO_DELETE, null);
    		channel.queueBind(QUEUE_NAME, EXCHANGE_NAME, "*.file.text.plain.#");
    
    		this.channel = channel;
    
    		// Create the listener; autoAck is false, so each message is
    		// acknowledged explicitly once it has been processed
    		channel.basicConsume(QUEUE_NAME, false, CONSUMER_TAG, new DefaultConsumer(channel) {
    			@Override
    			public void handleDelivery(String consumerTag, Envelope envelope,
    					AMQP.BasicProperties properties, byte[] body) throws IOException {
    				messageReceived = new String(body);
    				long deliveryTag = envelope.getDeliveryTag();
    				System.out.println(" [x] Received '" + messageReceived + "'");
    
    				// Keep the correlation id and reply-to queue for the response
    				replyProps = new AMQP.BasicProperties.Builder()
    						.correlationId(properties.getCorrelationId()).build();
    				replyTo = properties.getReplyTo();
    
    				// Process the message, then acknowledge it
    				processMessageReceived();
    				System.out.println(" [x] Done");
    				channel.basicAck(deliveryTag, false);
    			}
    		});
    
    		// Start listening; deliveries are handled on the client's consumer
    		// threads, so this loop only keeps the main thread alive
    		System.out.println(" [*] Waiting for messages. To exit press CTRL+C");
    		while (true) {
    			Thread.sleep(1000);
    		}
    	} catch (Exception e) {
    		e.printStackTrace();
    		System.exit(1);
    	}
    }

     

  • C++

    Connecting to RabbitMQ
    #include <amqpcpp.h>
    
    namespace CPPExample {
    
      class RabbitMQConnectionHandler : public AMQP::ConnectionHandler {
      public:
          /**
           *  Method that is called by the AMQP library every time it has data
           *  available that should be sent to RabbitMQ.
           *  @param  connection  pointer to the main connection object
           *  @param  data        memory buffer with the data that should be sent to RabbitMQ
           *  @param  size        size of the buffer
           */
          virtual void onData(AMQP::Connection *connection, const char *data, size_t size)
          {
              // @todo
              //  Add your own implementation, for example by calling the
              //  send() system call. Be aware that send() may not send all
              //  data at once, so you also need to buffer the bytes that
              //  could not be sent immediately and send them again when the
              //  socket becomes writable.
          }
    
          /**
           *  Method that is called by the AMQP library when the login attempt
           *  succeeded. After this method has been called, the connection is ready
           *  to use.
           *  @param  connection      The connection that can now be used
           */
          virtual void onConnected(AMQP::Connection *connection)
          {
              // @todo
              //  Add your own implementation, for example by creating a channel
              //  instance and starting to publish or consume.
          }
    
          /**
           *  Method that is called by the AMQP library when a fatal error occurs
           *  on the connection, for example because data received from RabbitMQ
           *  could not be recognized.
           *  @param  connection      The connection on which the error occurred
           *  @param  message         A human-readable error message
           */
          virtual void onError(AMQP::Connection *connection, const std::string &message)
          {
              // @todo
              //  Add your own implementation, for example by reporting the error
              //  to the user of your program, logging it, and destroying the
              //  connection object, which is no longer in a usable state.
          }
      };
    
    }
    Receiver
    #include <amqpcpp.h>
    
    namespace CPPExample {
    
      /**
       *  Parse data that was received from RabbitMQ
       *
       *  Every time data comes in from RabbitMQ, pass it to the connection's
       *  parse() method so the AMQP-CPP library can decode it. The method returns
       *  the number of bytes that were processed.
       *
       *  If not all bytes could be processed because the buffer contained only a
       *  partial frame, call this method again when more data is available. The
       *  AMQP-CPP library does not buffer, so the caller must pass the unprocessed
       *  old data again in that later call.
       *
       *  @param  connection  the AMQP::Connection the data belongs to
       *  @param  buffer      buffer to decode
       *  @param  size        size of the buffer to decode
       *  @return             number of bytes that were processed
       */
      size_t parse(AMQP::Connection &connection, char *buffer, size_t size)
      {
         // The connection object decodes the frames and invokes the handler callbacks
         return connection.parse(buffer, size);
      }
    }
  • Python

     
    Instantiating the logger and starting the extractor
    import logging
    import sys
    
    import pika
    
    def main():
        global logger
    
        # name of receiver
        receiver = 'ExamplePythonExtractor'
    
        # configure the logging system
        logging.basicConfig(format="%(asctime)-15s %(name)-10s %(levelname)-7s : %(message)s", level=logging.WARN)
        logger = logging.getLogger(receiver)
        logger.setLevel(logging.DEBUG)
    
        # expect exactly three arguments: username, password, API key
        if len(sys.argv) != 4:
            logger.error("Usage: provide the RabbitMQ username, followed by the RabbitMQ password and the Medici REST API key.")
            sys.exit(1)
    
        global playserverKey
        playserverKey = sys.argv[3]
    Connecting to RabbitMQ
        # connect to rabbitmq using input username and password
        credentials = pika.PlainCredentials(sys.argv[1], sys.argv[2])
        parameters = pika.ConnectionParameters(credentials=credentials)
        connection = pika.BlockingConnection(parameters)
    
        # connect to channel
        channel = connection.channel()
    
        # declare the exchange
        channel.exchange_declare(exchange='medici', exchange_type='topic', durable=True)
    
        # declare the queue
        channel.queue_declare(queue=receiver, durable=True)
    
        # connect queue and exchange
        channel.queue_bind(queue=receiver, exchange='medici', routing_key='*.file.text.plain')
    
        # create listener (pika 1.x signature); on_message is the extractor's
        # callback, defined elsewhere, and must acknowledge each message itself
        channel.basic_consume(queue=receiver, on_message_callback=on_message, auto_ack=False)
    
        # start listening
        logger.info("Waiting for messages. To exit press CTRL+C")
        try:
            channel.start_consuming()
        except KeyboardInterrupt:
            channel.stop_consuming()
        connection.close()
    

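The Java extractor binds its queue with the routing pattern `*.file.text.plain.#`, while the Python extractor uses `*.file.text.plain`. On a topic exchange, routing keys are dot-separated words, `*` matches exactly one word, and `#` matches zero or more words, so the Java pattern also accepts keys with extra trailing words. The sketch below reimplements that matching rule for illustration only (RabbitMQ performs the matching on the broker); the routing keys in the example, such as `medici.file.text.plain`, are hypothetical.

```python
def topic_matches(pattern, routing_key):
    """Return True if routing_key matches an AMQP topic binding pattern."""
    return _match(pattern.split("."), routing_key.split("."))


def _match(pattern_words, key_words):
    if not pattern_words:
        return not key_words  # both exhausted means a full match
    head, rest = pattern_words[0], pattern_words[1:]
    if head == "#":
        # '#' may absorb zero words, or consume one word and try again
        return _match(rest, key_words) or (
            bool(key_words) and _match(pattern_words, key_words[1:])
        )
    if not key_words:
        return False
    if head == "*" or head == key_words[0]:
        return _match(rest, key_words[1:])  # '*' or a literal consumes one word
    return False


print(topic_matches("*.file.text.plain.#", "medici.file.text.plain.extra"))  # True
print(topic_matches("*.file.text.plain", "medici.file.text.plain.extra"))  # False
```

Both patterns accept a key with exactly four words such as `medici.file.text.plain`; they differ only for keys with additional trailing words.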
Versus

Overview

Example Measures

  • Java

    Measure
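    public class WordCountMeasure implements Serializable, Measure {
    
    	// Artificial delay in milliseconds, used to simulate a slow comparison
    	private static final long SLEEP = 10000;
    
    	@Override
    	public Similarity compare(Descriptor feature1, Descriptor feature2)
    			throws Exception {
    		// Placeholder: waits SLEEP ms, then always reports a similarity of 0
    		Thread.sleep(SLEEP);
    		return new SimilarityNumber(0);
    	}
    
    	@Override
    	public SimilarityPercentage normalize(Similarity similarity) {
    		// Normalization is not implemented in this example
    		return null;
    	}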
    
    	@Override
    	public String getFeatureType() {
    		return WordCountMeasure.class.getName();
    	}
    
    	@Override
    	public String getName() {
    		return "Word Count Measure";
    	}
    
    	@Override
    	public Class<WordCountMeasure> getType() {
    		return WordCountMeasure.class;
    	}
    
    }
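The `WordCountMeasure` above is a placeholder: `compare` sleeps and always returns 0, and `normalize` returns `null`. The sketch below shows, in plain Python rather than the Versus Java API, what comparing two word-count descriptors might compute. The functions `word_count`, `compare`, and `normalize`, and the choice of relative difference as the normalization, are assumptions for illustration.

```python
def word_count(text):
    """Hypothetical descriptor: number of whitespace-separated words in a text."""
    return len(text.split())


def compare(count1, count2):
    """Distance between two word-count descriptors (0 means identical counts)."""
    return abs(count1 - count2)


def normalize(distance, count1, count2):
    """Map a distance to a similarity in [0, 1], where 1 means identical counts.

    Assumption: the distance is scaled by the larger of the two counts.
    """
    largest = max(count1, count2)
    if largest == 0:
        return 1.0  # two empty texts are treated as identical
    return 1.0 - distance / largest


a = word_count("the quick brown fox")
b = word_count("the quick brown fox jumps")
print(compare(a, b), normalize(compare(a, b), a, b))
```

Scaling by the larger count keeps the result between 0 and 1 for non-negative counts, which matches the idea of a similarity percentage that the Java `normalize` signature suggests.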

 
