Three main ideas: Internet Suspend/Resume (allow a user to start using a VM before it is fully downloaded), indexing and searching of VM images, and incrementally composing VM images. The latter two have since been done in Docker, discussed below.
Impression: academic quality; its features and quality do not seem a good fit for the Brown Dog project.
Impression: targets server consolidation and web hosting. Seems to be production quality. Linux only, so it does not seem a good fit as the high-level VM architecture technology.
Impression: similar to OpenVZ, Docker is an OS-level virtualization technology: a container runs the same kernel as the host, so it also does not seem a good fit as the high-level VM architecture technology. It provides OpenVZ-like functionality plus a distributed architecture for setting up repositories and for pushing images to and pulling images from them. Popular and under active development.
As a hardware virtualization technology, OpenStack supports multiple OSes, such as Linux and Windows.
The Brown Dog VM elasticity project needs to support multiple OSes, so OpenStack seems a viable high-level solution. We are currently considering OpenStack, and may use Docker on the VMs if needed.
The goal is to auto-scale system resources to adapt to the load of external requests to the Brown Dog Data Tiling Service. In general this includes Medici, MongoDB, RabbitMQ, and the extractors; the current design focuses only on auto-scaling the extractors. Specifically, the system needs to start or use more extractors when certain criteria are met, such as the number of outstanding requests exceeding a threshold, and suspend or stop extractors when other criteria are met. The main intention of scaling down is to free resources (CPU, memory, etc.) for other purposes.
The following assumptions are made in the design:
This VM elasticity system / module has the following data as inputs:
In the above data, items 2, 4, and 6 are static (or nearly static); the others are dynamic and change at run time.
The system watches for the above data:
Periodically (at a configurable interval, such as every minute), check whether the system needs to scale up and whether it needs to scale down. These two checks can run in parallel, but if they do, shared data such as the list of running VMs must be protected and synchronized.
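The periodic loop above can be sketched as follows. This is a minimal illustration, not the actual implementation: the names check_scale_up, check_scale_down, and elasticity_loop are hypothetical, and the real decision logic is elided.

```python
import threading
import time

CHECK_INTERVAL_SECONDS = 60  # configurable, e.g. every minute

# Shared data (the list of running VMs) must be protected when the
# two checks run in parallel.
running_vms_lock = threading.Lock()
running_vms = []  # IPs of running extractor VMs

def check_scale_up():
    with running_vms_lock:
        vms = list(running_vms)  # take a snapshot under the lock
    # ... decide, from queue lengths, whether to start/use more extractors ...
    return vms

def check_scale_down():
    with running_vms_lock:
        vms = list(running_vms)
    # ... decide, from recent message counts, whether to suspend idle VMs ...
    return vms

def elasticity_loop():
    while True:
        up = threading.Thread(target=check_scale_up)
        down = threading.Thread(target=check_scale_down)
        up.start(); down.start()
        up.join(); down.join()
        time.sleep(CHECK_INTERVAL_SECONDS)
```

Taking a snapshot of the VM list under the lock keeps the critical section short; each check then works on its own copy.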
The criterion for scaling up is that a RabbitMQ queue length exceeds a pre-defined threshold, such as 100 or 1000.
Get the list of running queues, and iterate through them:
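The scale-up check over the running queues might look like the sketch below. The function name and the idea of passing queue lengths in directly are assumptions for illustration; in practice the lengths would be fetched from RabbitMQ (e.g., via its management API).

```python
def queues_needing_scale_up(queue_lengths, threshold=100):
    """Return the names of queues whose backlog exceeds the threshold.

    queue_lengths: dict mapping queue name -> number of outstanding messages
    threshold: pre-defined limit, e.g. 100 or 1000
    """
    return [name for name, length in queue_lengths.items()
            if length > threshold]

# For each queue returned, the system would then start (or reuse) a VM
# running the corresponding extractor.
```

For example, with queue lengths {"ocr": 250, "image-preview": 5} and a threshold of 100, only the "ocr" queue triggers scale-up.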
Get the list of IPs of the running VMs. Iterate through them:
If the number of messages processed in the past time period (configurable; say, 1 hour), summed across all extractors running on a given VM, is 0, then the extractors on that VM have no work to do, so we can suspend the VM to free resources for other tasks. Note that the threshold is 0: if any extractor running on the VM has work, we keep the VM running.
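The scale-down decision above can be sketched as a pure function. The function name and input shape are assumptions for illustration; the per-VM, per-extractor message counts would come from the monitoring data listed as inputs.

```python
def vms_to_suspend(messages_per_extractor):
    """Return VMs whose extractors processed zero messages in the period.

    messages_per_extractor: dict mapping VM IP -> {extractor name: message
    count over the past period, e.g. the last hour}
    """
    idle = []
    for vm_ip, counts in messages_per_extractor.items():
        # The threshold is 0: if any extractor on the VM had work, keep it.
        if sum(counts.values()) == 0:
            idle.append(vm_ip)
    return idle
```

A VM with counts {"e1": 0, "e2": 0} is a suspend candidate, while a VM where even one extractor processed a message stays up.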
In the future, we could improve this by migrating extractors: for example, if 4 extractors run on vm1, only extractor 1 has work to do, and there is an open slot to run extractor 1 on vm2 (where extractor 1 is already running), we could migrate the extractor 1 instance from vm1 to vm2 and then suspend vm1. This is considered lower priority.
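One way the future migration idea could be expressed is sketched below. Everything here is hypothetical: the function name, the slot-based capacity model, and the input shapes are assumptions, not part of the current design.

```python
def propose_migration(vm_extractors, busy, capacity):
    """Suggest moving the single busy extractor off a mostly idle VM.

    vm_extractors: dict VM name -> set of extractor names running on it
    busy: set of extractor names that currently have work
    capacity: dict VM name -> max number of extractor instances it can host
    Returns (extractor, source VM, destination VM), or None if no move helps.
    """
    for src, extractors in vm_extractors.items():
        active = extractors & busy
        if len(active) != 1:
            continue  # only migrate when exactly one extractor has work
        (extractor,) = active
        for dst, dst_extractors in vm_extractors.items():
            if dst == src:
                continue
            # Destination already runs this extractor and has an open slot,
            # so the source VM could be drained and suspended.
            if extractor in dst_extractors and len(dst_extractors) < capacity[dst]:
                return (extractor, src, dst)
    return None
```

With the example from the text (vm1 running 4 extractors of which only extractor 1 is busy, vm2 already running extractor 1 with a free slot), the function proposes moving extractor 1 from vm1 to vm2.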
How to test.