
This document contains discussions and plans about moving computation towards data.

Introduction

Moving computation closer to data is a well-known paradigm in the realm of Big Data. Suppose A is the site where data is hosted and B is the site where the compute / processing programs are hosted. Transferring data from A to B for processing, and the processed data or metadata back from B to A, becomes increasingly time consuming as the amount of data grows. So, instead of moving data around, a better approach is to move the computation, i.e. the processing programs, towards the data. This is based on the assumption that executables or source code generally use much less disk space than the data itself.

Rough Outline of Steps

  1. At the site where the data resides (Site A), a Brown Dog client application (BD-client) first opens the local file that needs to be processed
  2. BD-client determines the local file's type
  3. BD-client then hits an endpoint to find out which extractors are currently running and can process that file type
  4. The client queries the extractors (detailed information is needed here) to find out what dependencies each has, installs them, and submits the file for extraction at Site A
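The client-side flow above (steps 2 and 3) could be sketched roughly as follows. The extractor record layout, the helper name, and the example extractor names are assumptions for illustration only, not part of any actual Brown Dog API; file type detection here is a simple extension-based guess.

```python
import mimetypes

def find_matching_extractors(file_path, running_extractors):
    """Given a local file and a list of currently running extractors
    (as returned by a hypothetical discovery endpoint), return the
    extractors that can process the file's type.

    Each extractor is assumed to be a dict with at least a 'name'
    and a list of supported 'mime_types'.
    """
    # Step 2: read the local file's type (guessed from the extension here)
    mime_type, _ = mimetypes.guess_type(file_path)
    if mime_type is None:
        return []
    # Step 3: filter the running extractors by supported file type
    return [e for e in running_extractors
            if mime_type in e.get("mime_types", [])]

# Example with two hypothetical extractors, one of which handles images
extractors = [
    {"name": "ocr-extractor", "mime_types": ["image/png", "image/jpeg"]},
    {"name": "csv-profiler", "mime_types": ["text/csv"]},
]
matches = find_matching_extractors("scan.png", extractors)
print([e["name"] for e in matches])  # ['ocr-extractor']
```

In a real deployment the `running_extractors` list would come from the discovery endpoint described in the next section, and step 4 (dependency installation and local extraction) would follow.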

Endpoint Tasks

  1. The endpoint first queries the RabbitMQ server to get all the available queues (/api/queues). This can also be restricted to specific virtual hosts (/api/queues/{vhost})
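A minimal sketch of that first endpoint task, using the RabbitMQ management HTTP API's /api/queues resource. The management URL and the guest/guest credentials are assumptions (they are RabbitMQ's defaults, and guest only works from localhost); `queue_names` is a hypothetical helper for pulling names out of the JSON response.

```python
import base64
import json
import urllib.parse
import urllib.request

def list_queues(mgmt_url, user, password, vhost=None):
    """Query the RabbitMQ management HTTP API for queue information.

    With no vhost, GET /api/queues returns queues across all virtual
    hosts; with a vhost, /api/queues/{vhost} restricts the listing.
    """
    path = "/api/queues" if vhost is None else \
        "/api/queues/" + urllib.parse.quote(vhost, safe="")
    req = urllib.request.Request(mgmt_url.rstrip("/") + path)
    # The management API uses HTTP basic authentication
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", "Basic " + token)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def queue_names(queues):
    """Extract just the queue names from a management API response."""
    return [q["name"] for q in queues]
```

For example, `queue_names(list_queues("http://localhost:15672", "guest", "guest"))` would return the names of all queues the endpoint can then match against the requested file type.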