Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Moving the computation, i.e. data manipulation or analysis code, closer to the data is becoming a much more frequently utilized approach when dealing with large data sets. For example, if A hosts a data set and the analysis code on that data is running on machine B, as the size of the data gets larger it becomes increasingly impractical to move the data from A to B for the analysis code to run.  The more frequently used alternative in these cases, especially as portable containerized code has become more practical with technologies such as docker, is to move the containerized analysis code over to the the machine hosting the data and executing it their as opposed to moving the data (given that the containers are significantly smaller than the datasets and assuming some computational resource is also avaialble available on or near the server hosting the data).

...