
This document contains discussion and plans about moving computation towards data.

Moving the computation, i.e. the data manipulation or analysis code, closer to the data is an increasingly common approach for dealing with large data sets. For example, if machine A hosts a data set and the analysis code for that data runs on machine B, then as the data grows it becomes increasingly impractical to move the data from A to B in order to run the analysis. The more frequently used alternative in these cases, especially now that portable containerized code has become practical with technologies such as Docker, is to move the containerized analysis code to the machine hosting the data and execute it there instead of moving the data (given that the containers are significantly smaller than the data sets, and assuming some computational resource is also available on or near the server hosting the data).

Rough Outline of Steps

  1. From the site where the data resides (Site A), a Brown Dog client application (BD-client) will first open the local file that needs to be processed
  2. BD-client will determine the local file's type
  3. BD-client then hits an endpoint to find out which extractors are running at that moment and can process that file type
  4. The client queries those extractors (detailed information is needed here) to find out what dependencies they have, installs them, and submits the file for extraction at Site A (see the sketch after this list)
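As a concrete illustration, the following is a minimal Python sketch of these client-side steps. The API base URL, endpoint paths, parameter names, and response fields below are assumptions for illustration only; the actual Brown Dog API may differ.

    import mimetypes
    import requests

    # Hypothetical Brown Dog API base URL: the real endpoint paths and
    # response fields are not specified in this document.
    BD_API = "https://bd.example.org/api"

    def find_and_submit(path):
        # Steps 1-2: open the local file and determine its type (by extension).
        mime_type, _ = mimetypes.guess_type(path)
        if mime_type is None:
            raise ValueError("could not determine file type for " + path)

        # Step 3: ask the (assumed) endpoint which currently running
        # extractors can process this file type.
        resp = requests.get(BD_API + "/extractors",
                            params={"file_type": mime_type})
        resp.raise_for_status()
        extractors = resp.json()

        # Step 4: after installing the chosen extractor's dependencies,
        # submit the file for extraction at Site A (assumed endpoint).
        with open(path, "rb") as f:
            submit = requests.post(BD_API + "/extractions",
                                   data={"extractor": extractors[0]["name"]},
                                   files={"file": f})
        submit.raise_for_status()
        return submit.json()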

Endpoint Tasks

  1. The endpoint first queries the RabbitMQ management API to get all of the available queues (/api/queues). This can also be scoped to a specific virtual host (/api/queues/{vhost}).
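For example, a minimal Python sketch of this query against the RabbitMQ management HTTP API (the host, port, and guest credentials below are the RabbitMQ defaults; adjust them for the actual deployment):

    import requests

    MGMT = "http://localhost:15672"

    def list_queues(vhost=None):
        # GET /api/queues returns all queues; GET /api/queues/{vhost} scopes
        # the result to one virtual host. The vhost name must be URL-encoded,
        # so the default vhost "/" becomes "%2F".
        url = MGMT + "/api/queues"
        if vhost is not None:
            url += "/" + requests.utils.quote(vhost, safe="")
        resp = requests.get(url, auth=("guest", "guest"))
        resp.raise_for_status()
        return [q["name"] for q in resp.json()]

    # Example: list_queues() for all vhosts, or list_queues("/") for the
    # default vhost.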
