Comment: Added the beginning part of the algorithm part, also added some sections below it, to be filled in later.

...

The goal is to support auto-scaling of system resources so that the Brown Dog Data Tiling Service adapts to the load of external requests. In general, this covers Medici, MongoDB, RabbitMQ, and the extractors; the current design focuses only on auto-scaling the extractors. Specifically, the system needs to start or use more extractors when certain criteria are met, such as the number of outstanding requests exceeding a threshold, and to suspend or stop extractors when other criteria are met. Scaling down is intended mainly to free resources (CPU, memory, etc.) for other purposes.

 

  • Algorithm / Logic

The following assumptions are made in the design:

  1. each extractor is installed as a service on a VM, so when a VM starts, all the extractors installed on it as services start automatically;
  2. the limiting resource when extractors process input data is CPU, not memory, hard disk I/O, or network I/O, so the design scales on CPU usage only;
  3. multiple OS types must be supported, including both Linux and Windows;
  4. the entire Brown Dog system uses RabbitMQ as its messaging technology.

 

This VM elasticity system / module has the following data as inputs:

  1. RabbitMQ queue lengths and the number of consumers for the queues;
  2. for each queue, the corresponding extractor name;
  3. for a given extractor, the list of running VMs where an instance of the extractor is running, and the list of suspended VMs where it was running;
  4. the number of vCPUs of the VMs;
  5. the load averages of the VMs;
  6. for a given extractor, the list of VM images where the extractor is available.

In the above data, items 2, 4 and 6 are static (or near static); the others are dynamic and change at run time.
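The six inputs above could be grouped into a few small records. The following is a minimal sketch only: every class and field name (`QueueStatus`, `VMStatus`, `ExtractorStatus`, `ElasticityInputs`, and so on) is hypothetical and not part of the existing design, and the comments mark which fields are static and which are dynamic.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class QueueStatus:
    """One RabbitMQ queue (input 1) and its extractor (input 2)."""
    extractor: str   # static: name of the extractor that consumes this queue
    length: int      # dynamic: number of messages waiting in the queue
    consumers: int   # dynamic: number of consumers attached to the queue


@dataclass
class VMStatus:
    """One VM that hosts extractors (inputs 4 and 5)."""
    name: str
    vcpus: int            # static: number of vCPUs
    load_average: float   # dynamic: load average reported by the VM


@dataclass
class ExtractorStatus:
    """Per-extractor VM bookkeeping (inputs 3 and 6)."""
    running_vms: List[str] = field(default_factory=list)    # dynamic
    suspended_vms: List[str] = field(default_factory=list)  # dynamic
    vm_images: List[str] = field(default_factory=list)      # static


@dataclass
class ElasticityInputs:
    """Everything the elasticity module reads, keyed by name."""
    queues: Dict[str, QueueStatus] = field(default_factory=dict)
    vms: Dict[str, VMStatus] = field(default_factory=dict)
    extractors: Dict[str, ExtractorStatus] = field(default_factory=dict)
```

Keeping the static and dynamic fields in the same records means only the dynamic fields need to be refreshed on each polling cycle.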

Algorithm:

The system monitors the above data:

Periodically (at a configurable interval, such as every minute), the system checks whether it needs to scale up and whether it needs to scale down. These two checks can run in parallel, but if they do, shared data such as the list of running VMs must be protected and synchronized.
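One possible shape for the periodic loop is sketched below, assuming the two checks run as threads and a single lock guards the shared data. `SharedState`, `run_checks_once`, `run_periodically`, and the `max_cycles` parameter are illustrative names, not part of the design; the check functions are supplied by the caller and are expected to acquire `state.lock` before touching `state.running_vms`.

```python
import threading
import time
from typing import Callable, List, Optional


class SharedState:
    """Data shared by both checks; callers hold `lock` while using it."""

    def __init__(self) -> None:
        self.lock = threading.Lock()
        self.running_vms: List[str] = []


Check = Callable[[SharedState], None]


def run_checks_once(state: SharedState, check_up: Check, check_down: Check) -> None:
    """Run the scale-up and scale-down checks in parallel and wait for both."""
    threads = [
        threading.Thread(target=check_up, args=(state,)),
        threading.Thread(target=check_down, args=(state,)),
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()


def run_periodically(
    state: SharedState,
    check_up: Check,
    check_down: Check,
    interval_seconds: float = 60.0,    # assumed default: every minute
    max_cycles: Optional[int] = None,  # None means run forever
) -> None:
    """Repeat both checks at a fixed interval."""
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        run_checks_once(state, check_up, check_down)
        cycles += 1
        if max_cycles is None or cycles < max_cycles:
            time.sleep(interval_seconds)
```

Joining both threads before sleeping keeps one cycle from overlapping the next; running the two checks sequentially would remove the need for the lock at the cost of a slightly longer cycle.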

 

The criterion for scaling up is that RabbitMQ queue lengths exceed pre-defined thresholds, such as 100 or 1000.
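The scale-up criterion can be illustrated with a small function. This sketch assumes that thresholds are configured per queue with a shared default, and that the comparison is strictly greater-than; the function name, the `DEFAULT_THRESHOLD` constant, and the queue names in the usage example are all hypothetical.

```python
from typing import Dict, List

# Assumed default; the design mentions 100 or 1000 as example thresholds.
DEFAULT_THRESHOLD = 100


def queues_over_threshold(
    queue_lengths: Dict[str, int],
    thresholds: Dict[str, int],
    default_threshold: int = DEFAULT_THRESHOLD,
) -> List[str]:
    """Return the names of queues whose length is above their threshold."""
    over = []
    for queue, length in sorted(queue_lengths.items()):
        limit = thresholds.get(queue, default_threshold)
        if length > limit:  # strictly greater, matching "queue lengths > thresholds"
            over.append(queue)
    return over


if __name__ == "__main__":
    lengths = {"ocr": 150, "thumbs": 100, "audio": 1200}
    limits = {"audio": 1000}
    print(queues_over_threshold(lengths, limits))  # ['audio', 'ocr']
```

In the example, `thumbs` sits exactly at the default threshold of 100 and is therefore not reported.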

If the threshold is reached

  • Programming Language
  • Testing

How to test.