...
- the extractor is installed as a service on a VM, so when a VM starts, all the extractors that the VM contains as services will start automatically and successfully;
- the limiting resource when extractors process input data is CPU, not memory, hard disk I/O, or network I/O, so the design only scales on CPU usage;
- the system needs to support multiple OS types, including both Linux and Windows;
- the system uses RabbitMQ as the messaging technology.
Algorithm:
The VM elasticity system / module maintains and uses the following data:
- RabbitMQ queue lengths and the number of consumers for the queues;
Can be obtained using the RabbitMQ management API. The number of consumers can be used to verify that a scale up/down action succeeded.
- for each queue, the corresponding extractor name;
Currently hard-coded in the extractor code, so that queue name == extractor name.
- for a given extractor, the list of running VMs where an instance of the extractor is running, and the list of suspended VMs where it was running;
Running VM list: can be obtained using the RabbitMQ management API, queue --> connections --> IP.
Suspended VM list: when suspending a VM, update the mapping for the given extractor: remove the entry from the running VM list and add it to the suspended VM list.
- the number of vCPUs of the VMs;
This info is fixed for a given OpenStack flavor. The flavor must be specified when starting a VM, and this data can be stored at that time.
- the load averages of the VMs;
For Linux, can be obtained by executing a command ("uptime" or "cat /proc/loadavg") over ssh (somewhat slow: the last test took 12 seconds from the devstack host to an Ubuntu machine, using an ssh public key). If needed, a separate thread can be used to get this data.
- for a given extractor, the list of VM images where the extractor is available.
This is manual and static data. Can be stored in a config file, a MongoDB collection, or in some other way.
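As a rough illustration of how some of the data above might be gathered and kept, here is a minimal Python sketch. It is not the module's actual code: the function and class names (queue_api_url, queue_stats, parse_loadavg, ExtractorVMs) are hypothetical, and it assumes the management API is on its default port 15672, that a queue document carries the "messages" and "consumers" fields, and that VMs are identified by IP address. Fetching the JSON over HTTP and running the command over ssh are left out, so the helpers only build the URL and parse text that has already been retrieved.

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field
from urllib.parse import quote


def queue_api_url(host: str, vhost: str, queue: str, port: int = 15672) -> str:
    """Build the management API URL for one queue.

    The vhost and queue name are percent-encoded, so the default vhost "/"
    becomes "%2F". Port 15672 is assumed to be the management port.
    """
    return (
        f"http://{host}:{port}/api/queues/"
        f"{quote(vhost, safe='')}/{quote(queue, safe='')}"
    )


def queue_stats(body: str) -> tuple[int, int]:
    """Return (queue length, number of consumers) from a queue JSON document."""
    doc = json.loads(body)
    return doc.get("messages", 0), doc.get("consumers", 0)


def parse_loadavg(text: str) -> tuple[float, float, float]:
    """Parse the output of "cat /proc/loadavg".

    The first three fields are the 1, 5 and 15 minute load averages.
    """
    one, five, fifteen = (float(x) for x in text.split()[:3])
    return one, five, fifteen


@dataclass
class ExtractorVMs:
    """Running and suspended VMs (by IP, an assumption) for one extractor."""

    running: set[str] = field(default_factory=set)
    suspended: set[str] = field(default_factory=set)

    def suspend(self, ip: str) -> None:
        # Move the VM from the running list to the suspended list.
        # Raises KeyError if the VM was not recorded as running.
        self.running.remove(ip)
        self.suspended.add(ip)

    def resume(self, ip: str) -> None:
        # Inverse of suspend(): move the VM back to the running list.
        self.suspended.remove(ip)
        self.running.add(ip)
```

Since queue name == extractor name, one ExtractorVMs instance per queue name would be enough to hold the running/suspended mapping, for example in a dict keyed by extractor name.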
...