
This document contains discussions and plans about moving computation towards data.

Introduction

Moving computation closer to data is a well-known paradigm in the realm of Big Data. Suppose A is the site where data is hosted and B is the site where the compute / processing programs are hosted. Transferring data from A to B for processing, and the processed data or metadata back from B to A, becomes increasingly time consuming as the amount of data grows. So, instead of moving data around, a better approach is to move the computation, i.e. the processing programs, towards the data. This is based on the assumption that executables or source code generally use much less disk space than the data itself.

Rough Outline of Steps

  1. At the site where the data resides (Site A), a Brown Dog client application (BD-client) first opens the local file that needs to be processed
  2. BD-client determines the local file's type
  3. BD-client then hits an endpoint to find out which extractors are currently running and can process that file type
  4. The client queries the extractors (detailed information is needed here) to find out what dependencies each has, installs them, and submits the file for extraction at Site A
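The client-side flow above (steps 2 and 3) could be sketched roughly as follows. The extractor record layout, the helper name, and the example extractor names are assumptions for illustration only, not part of any actual Brown Dog API; file type detection here is a simple extension-based guess.

```python
import mimetypes

def find_matching_extractors(file_path, running_extractors):
    """Given a local file and a list of currently running extractors
    (as returned by a hypothetical discovery endpoint), return the
    extractors that can process the file's type.

    Each extractor is assumed to be a dict with at least a 'name'
    and a list of supported 'mime_types'.
    """
    # Step 2: read the local file's type (guessed from the extension here)
    mime_type, _ = mimetypes.guess_type(file_path)
    if mime_type is None:
        return []
    # Step 3: filter the running extractors by supported file type
    return [e for e in running_extractors
            if mime_type in e.get("mime_types", [])]

# Example with two hypothetical extractors, one of which handles images
extractors = [
    {"name": "ocr-extractor", "mime_types": ["image/png", "image/jpeg"]},
    {"name": "csv-profiler", "mime_types": ["text/csv"]},
]
matches = find_matching_extractors("scan.png", extractors)
print([e["name"] for e in matches])  # ['ocr-extractor']
```

In a real deployment the `running_extractors` list would come from the discovery endpoint described in the next section, and step 4 (dependency installation and local extraction) would follow.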

Endpoint Tasks

  1. The endpoint first queries the RabbitMQ server to get all the available queues (/api/queues). This can also be restricted to specific virtual hosts (/api/queues/{vhost})
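A minimal sketch of that first endpoint task, using the RabbitMQ management HTTP API's /api/queues resource. The management URL and the guest/guest credentials are assumptions (they are RabbitMQ's defaults, and guest only works from localhost); `queue_names` is a hypothetical helper for pulling names out of the JSON response.

```python
import base64
import json
import urllib.parse
import urllib.request

def list_queues(mgmt_url, user, password, vhost=None):
    """Query the RabbitMQ management HTTP API for queue information.

    With no vhost, GET /api/queues returns queues across all virtual
    hosts; with a vhost, /api/queues/{vhost} restricts the listing.
    """
    path = "/api/queues" if vhost is None else \
        "/api/queues/" + urllib.parse.quote(vhost, safe="")
    req = urllib.request.Request(mgmt_url.rstrip("/") + path)
    # The management API uses HTTP basic authentication
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", "Basic " + token)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def queue_names(queues):
    """Extract just the queue names from a management API response."""
    return [q["name"] for q in queues]
```

For example, `queue_names(list_queues("http://localhost:15672", "guest", "guest"))` would return the names of all queues the endpoint can then match against the requested file type.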