Comment: Added content for "scaling up" and "scaling down" algorithms

...

Periodically (at a configurable interval, such as every minute), check whether we need to scale up and whether we need to scale down. The two checks can run in parallel; if they do, access to shared data, such as the list of running VMs, must be synchronized.
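The periodic loop and the synchronization requirement can be sketched as follows. This is only an illustration under assumptions: `VmRegistry`, `run_checks_once`, and `run_periodically` are hypothetical names, and the two checks are passed in as plain callables that receive the shared registry.

```python
import threading


class VmRegistry:
    """Hypothetical holder for the shared list of running VMs; every access takes the lock."""

    def __init__(self, vms=()):
        self._lock = threading.Lock()
        self._vms = list(vms)

    def snapshot(self):
        # Return a copy so callers can iterate without holding the lock.
        with self._lock:
            return list(self._vms)

    def add(self, vm):
        with self._lock:
            if vm not in self._vms:
                self._vms.append(vm)

    def remove(self, vm):
        with self._lock:
            if vm in self._vms:
                self._vms.remove(vm)


def run_checks_once(registry, check_scale_up, check_scale_down):
    """Run both checks in parallel threads and wait for both to finish."""
    threads = [
        threading.Thread(target=check, args=(registry,))
        for check in (check_scale_up, check_scale_down)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()


def run_periodically(registry, check_scale_up, check_scale_down,
                     period_seconds=60.0, stop_event=None):
    """Repeat the two checks every period_seconds until stop_event is set."""
    if stop_event is None:
        stop_event = threading.Event()
    while not stop_event.is_set():
        run_checks_once(registry, check_scale_up, check_scale_down)
        # wait() returns early if another thread sets the stop event.
        stop_event.wait(period_seconds)
```

Locking inside the registry keeps each individual read or write safe, but a decision that spans several reads and writes would still need coarser coordination, or the two checks could simply run one after the other.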

Scaling up:

The criterion for scaling up is that the length of a RabbitMQ queue exceeds a predefined threshold, such as 100 or 1000.

Get the list of running queues, and iterate through them:

  1. If the queue length exceeds the threshold for a given queue, say, q1, use data item 2 above to find the corresponding extractor (e1). Currently this mapping is hardcoded in the extractors: the queue name is the same as the extractor name.
  2. Look up e1 to find the corresponding running VM list, say, (vm1, vm2, vm3).
  3. Go through the list one by one. A VM has an open slot if its #vCPUs > loadavg + <cpu_buffer_room> (configurable, such as 0.5); for example, vm1 with #vCPUs == 2 and loadavg == 1.2 has an open slot, since 2 > 1.2 + 0.5. At the first VM with an open slot, start another instance of e1 on it and return. Otherwise, move on to the next VM in the list.
  4. If no VM in the list has an open slot, or the list is empty, look up e1 to find the corresponding suspended VM list, say, (vm4, vm5). If that list is not empty, resume the first VM in it and return.
  5. If the suspended VM list is also empty, we need to start a new VM to run more e1 instances. Look up e1 to find a VM image that contains it, and start a new VM from that image.
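The steps above can be sketched as a function that only plans the actions and leaves their execution to the caller. The `Vm` dataclass, `has_open_slot`, `plan_scale_up`, and the shapes of the `running`, `suspended`, and `images` dictionaries are assumptions made for this illustration, not part of the design.

```python
from dataclasses import dataclass


@dataclass
class Vm:
    name: str
    vcpus: int
    loadavg: float


def has_open_slot(vm, cpu_buffer_room=0.5):
    """Open slot test from step 3: #vCPUs > loadavg + cpu_buffer_room."""
    return vm.vcpus > vm.loadavg + cpu_buffer_room


def plan_scale_up(queue_lengths, threshold, running, suspended, images,
                  cpu_buffer_room=0.5):
    """Return one (action, extractor, target) tuple per queue over the threshold.

    queue_lengths: {queue_name: length}
    running, suspended: {extractor_name: [Vm, ...]}
    images: {extractor_name: image_name}
    """
    actions = []
    for queue, length in queue_lengths.items():
        if length <= threshold:
            continue
        extractor = queue  # step 1: queue name == extractor name
        # Steps 2 and 3: first running VM with an open slot, if any.
        vm = next(
            (v for v in running.get(extractor, [])
             if has_open_slot(v, cpu_buffer_room)),
            None,
        )
        if vm is not None:
            actions.append(("start_instance", extractor, vm.name))
        elif suspended.get(extractor):
            # Step 4: resume the first suspended VM.
            actions.append(("resume_vm", extractor, suspended[extractor][0].name))
        else:
            # Step 5: start a new VM from an image that contains the extractor.
            actions.append(("start_vm", extractor, images.get(extractor)))
    return actions
```

With the example from step 3, `has_open_slot(Vm("vm1", 2, 1.2))` is true because 2 > 1.7, so an overloaded e1 queue would yield a `start_instance` action on vm1. The sketch does not update `loadavg` after planning an instance, so two overloaded queues in the same pass may pick the same VM.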

Scaling down:

Get the list of IPs of the running VMs. Iterate through them:

For a given VM, if the number of messages in the past time period (configurable, say, 1 hour), summed across all extractors running on it, is 0, then the extractors on it have no work to do, and we can suspend the VM to free resources for other tasks. Note that the threshold is 0: if any extractor running on the VM has any work, we keep the VM running.
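A minimal sketch of this check, assuming the per-extractor message counts for the look-back period have already been collected into a dictionary keyed by VM IP (the function name and the input shape are hypothetical):

```python
def vms_to_suspend(message_counts):
    """Return the IPs of VMs whose extractors handled no messages in the period.

    message_counts: {vm_ip: {extractor_name: messages_in_period}}
    """
    return [
        ip
        for ip, per_extractor in message_counts.items()
        # The threshold is 0: a single message on any extractor keeps the VM running.
        if sum(per_extractor.values()) == 0
    ]
```

Under this sketch, a VM with no extractors recorded also sums to 0 and would be selected for suspension; whether that is desirable is left open here.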

In the future, we could improve on this by migrating extractors. For example, if 4 extractors run on vm1, only extractor 1 has work to do, and there is an open slot for extractor 1 on vm2 (where extractor 1 is already running), we could migrate the extractor 1 instance from vm1 to vm2 and then suspend vm1. This is considered lower priority.

  • Programming Language
  • Testing

...