
...

  1. the length of the RabbitMQ queue exceeds a pre-defined threshold (such as 100 or 1000), or
  2. the number of consumers (extractors) for this queue is below the configured minimum number.
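A minimal sketch of this trigger check, assuming the RabbitMQ management HTTP API is enabled; the endpoint, credentials, threshold, and per-extractor minimums below are placeholders, not the module's actual configuration.

```python
import requests

RABBITMQ_API = "http://localhost:15672/api/queues"   # assumed management API endpoint
QUEUE_LENGTH_THRESHOLD = 100                         # configurable backlog threshold
MIN_INSTANCES = {"some-extractor": 1}                # hypothetical per-extractor-type minimums

def queues_needing_scale_up():
    """Return the names of queues that satisfy either scale-up condition."""
    queues = requests.get(RABBITMQ_API, auth=("guest", "guest")).json()
    triggered = []
    for q in queues:
        too_long = q.get("messages", 0) > QUEUE_LENGTH_THRESHOLD
        too_few = q.get("consumers", 0) < MIN_INSTANCES.get(q["name"], 1)
        if too_long or too_few:
            triggered.append(q["name"])
    return triggered
```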

 

Get the list of running queues, and iterate through them (a code sketch of this loop follows the steps below):

  1. If either criterion above is met for a given queue, say q1, then use data item 2 above to find the corresponding extractor (e1). Currently this is hardcoded in the extractors, so that queue name == extractor name. If the number of consumers (extractors) for this queue is 0, find e1 and go straight to step 4 below to start a VM that contains e1.
  2. Look up e1 to find the corresponding running VM list, say, (vm1, vm2, vm3).
  3. Go through the list one by one. If there is an open slot on the VM, meaning its #vCPUs > loadavg + <cpu_buffer_room> (configurable, such as 0.5), for example vm1 has #vCPUs == 2 and loadavg == 1.2, then start another instance of e1 on vm1, finish working on this queue, and go back to Step 1 for the next queue. If there is no open slot on vm1, look at the next VM in the list; once an open slot is found and another instance of e1 is started, finish working on this queue and go back to Step 1 for the next queue.
  4. If we go through the entire list and there is no open slot, or the list is empty, then look up e1 to find the corresponding suspended VM list, say, (vm4, vm5). If the list is not empty, resume the first VM in the list; if unsuccessful, go to the next VM in the list. After a successful resumption, look up the other extractors running on the VM and mark them so that this scaling logic skips them, since resuming this VM also resumes them. Finish working on this queue and go back to Step 1 for the next queue.
  5. If the above suspended VM list is empty, then we need to start a new VM to have more e1 instances. Look up e1 to find a VM image that contains it, and start a new VM using that image. Similar to the above step, afterwards look up the other extractors available on the VM and mark them so that this scaling logic skips them, since starting this VM also starts them.
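A minimal sketch of one scale-up pass over a triggered queue, following steps 1-5 above. The helpers (running_vms_for, suspended_vms_for, vm_image_for, start_extractor, resume_vm, start_vm, mark_skip) and the VM fields are hypothetical placeholders for the VM and extractor management layer, not an existing API.

```python
from dataclasses import dataclass, field
from typing import List

CPU_BUFFER_ROOM = 0.5   # configurable open-slot margin

@dataclass
class VM:
    name: str
    num_vcpus: int
    loadavg: float
    extractors: List[str] = field(default_factory=list)   # extractors present on this VM

def has_open_slot(vm: VM) -> bool:
    # A VM has an open slot if its vCPU count exceeds loadavg plus the buffer.
    return vm.num_vcpus > vm.loadavg + CPU_BUFFER_ROOM

def scale_up(queue_name: str) -> None:
    extractor = queue_name   # queue name == extractor name (hardcoded for now)

    # Steps 2-3: start another instance on a running VM that has an open slot.
    for vm in running_vms_for(extractor):
        if has_open_slot(vm):
            start_extractor(vm, extractor)
            return

    # Step 4: no open slot (or no running VM): resume a suspended VM containing e1.
    for vm in suspended_vms_for(extractor):
        if resume_vm(vm):
            for other in vm.extractors:
                if other != extractor:
                    mark_skip(other)   # resuming this VM also resumed these extractors
            return

    # Step 5: nothing to resume: start a new VM from an image that contains e1.
    vm = start_vm(vm_image_for(extractor))
    for other in vm.extractors:
        if other != extractor:
            mark_skip(other)           # starting this VM also started these extractors
```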

...

    1. Stop idle extractor instances:
      Find idle queues (no data / activity for a configurable period of time). For each such queue, find the running VMs and the number of extractor instances. We allow a user to specify the minimum number of total running instances for an extractor type in the config file. Stop idle extractor instances, but keep at least the configured minimum number of running instances for that extractor type.

    2. Suspend idle VMs.

Get the list of IPs of the running VMs. Iterate through them:

If there is no RabbitMQ activity for any of the extractors running on a VM over a configurable time period (say, 1 hour), it indicates that there is no work for those extractors to do, so we can suspend the VM to save resources for other tasks. However, if suspending this VM would decrease the number of running instances of any extractor on it below the minimum number configured for that extractor type, we do not suspend it and leave it running.
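A minimal sketch of this scale-down pass, under the same assumptions as the scale-up sketch above; idle_queues, instances_of, stop_extractor, rabbitmq_activity, and suspend_vm are hypothetical placeholders, and the idle period and per-extractor minimums are configurable values, not actual defaults.

```python
IDLE_PERIOD_SECONDS = 3600            # configurable idle window, e.g., 1 hour
MIN_INSTANCES = {"some-extractor": 1} # hypothetical per-extractor-type minimums

def scale_down(running_vms):
    # 1. Stop idle extractor instances, but never drop below the configured minimum.
    for queue in idle_queues(IDLE_PERIOD_SECONDS):
        extractor = queue                              # queue name == extractor name
        instances = instances_of(extractor)            # running instances across all VMs
        minimum = MIN_INSTANCES.get(extractor, 1)
        for instance in instances[minimum:]:           # keep the first `minimum` instances
            stop_extractor(instance)

    # 2. Suspend VMs with no RabbitMQ activity, unless that would violate a minimum.
    for vm in running_vms:
        if rabbitmq_activity(vm, IDLE_PERIOD_SECONDS) > 0:
            continue                                   # the VM still has work to do
        would_violate_minimum = any(
            len(instances_of(e)) - vm.extractors.count(e) < MIN_INSTANCES.get(e, 1)
            for e in set(vm.extractors)
        )
        if not would_violate_minimum:
            suspend_vm(vm)
```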

Notes:

  1. This logic is suitable for a production environment. In a testing environment or a system that is not busy, it could suspend many or even all VMs, since there are few or no requests, and lead to a slow start: only at the next periodic check (say, every minute) would this module notice that the number of extractors for a queue is 0 and resume the VMs. We could make it configurable whether to maintain at least one running extractor for each queue; it is a balance between potential waste of system resources and fast processing startup time (see the configuration sketch after these notes).
  2. In the future, we could support migrating extractors. For example, if 4 extractors run on vm1, and only extractor 1 has work to do, and there is an open slot to run extractor 1 on vm2 (where extractor 1 is already running), we could migrate the extractor 1 instance from vm1 to vm2, and suspend vm1 – but this is considered lower priority.
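An illustrative set of the configurable parameters mentioned throughout this design; the names and default values are placeholders, not the module's actual config keys.

```python
# Hypothetical defaults for the elasticity module's configuration.
ELASTICITY_CONFIG = {
    "queue_length_threshold": 100,          # scale-up trigger on queue backlog
    "min_instances_per_extractor": {        # per-extractor-type minimum number of running instances
        "default": 1,
    },
    "cpu_buffer_room": 0.5,                 # open slot means #vCPUs > loadavg + this buffer
    "idle_period_seconds": 3600,            # no-activity window before stopping/suspending
    "check_interval_seconds": 60,           # how often the scaling check runs
    "keep_one_extractor_per_queue": False,  # trade resource use for faster startup (Note 1)
}
```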

...