
...

To support auto-scaling of the system resources to adapt to the load of external requests to the Brown Dog Data Tiling Service. In general, this includes Medici, MongoDB, RabbitMQ, and the extractors. Currently the design focuses only on auto-scaling the extractors. Specifically, the system needs to start or use more extractors when certain criteria are met, such as when the number of outstanding requests exceeds a certain threshold – scaling up, and suspend or stop extractors when other criteria are met – scaling down. The intention of the scaling-down part is mainly to free resources (CPU, memory, etc.) for other purposes.
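The scale-up/scale-down criterion above can be sketched as a small decision function. This is a minimal illustration, not the implemented logic; the threshold values and the function name are assumptions (actual thresholds are a design parameter to be tuned):

```python
def scaling_decision(outstanding, consumers, up_threshold=10, down_threshold=1):
    """Decide a scaling action for one extractor queue.

    outstanding    -- number of outstanding (unacknowledged/ready) requests
    consumers      -- number of extractor instances consuming the queue
    up_threshold   -- illustrative value; scale up above this backlog
    down_threshold -- illustrative value; scale down below this backlog
    """
    if outstanding > up_threshold:
        return "scale-up"
    # Keep at least one consumer so the queue is never left unserved.
    if outstanding < down_threshold and consumers > 1:
        return "scale-down"
    return "no-op"
```

In practice the decision would also consult VM load averages (see the data list below) rather than queue length alone.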

...

Three main ideas: Internet Suspend/Resume (allowing a user to start using a VM before it is fully downloaded), indexing and searching of VM images, and incrementally composing VM images. The last two have been done in Docker, described below.

Impression: academic quality; the features and quality do not seem a good fit for the Brown Dog project.

...

The Brown Dog VM elasticity project needs to support multiple OSes, so OpenStack seems a viable high-level solution. Currently considering using OpenStack; may consider using Docker on the VMs at a lower level if needed.

  • Algorithm / Logic

...

  1. the extractors are installed as services on a VM, so when a VM starts, all the extractors that the VM contains as services will start automatically and successfully;
  2. the resource limitation of using extractors to process input data is CPU processing, not memory, hard disk I/O, or network I/O, so the design is only for scaling for CPU usage;
  3. the system needs to support multiple OS types, including both Linux and Windows;
  4. we assume that the entire Brown Dog system uses RabbitMQ as the messaging technology.

...

  1. RabbitMQ queue lengths and the number of consumers for the queues;
    Can be obtained using RabbitMQ management API. The number of consumers can be used to verify that the action to scale up/down succeeded.
  2. for each queue, the corresponding extractor name;
    Currently hard coded in the extractor code, so that queue name == extractor name.
  3. for a given extractor, the list of running VMs where an instance of the extractor is running, and the list of suspended VMs where it was running;
    Running VM list: can be obtained using RabbitMQ management API, queue --> connections --> IP.
    Suspended VM list: when suspending a VM, update the mapping for the given extractor, remove the entry from the running VM list and add it to the suspended VM list.
  4. the number of vCPUs of the VMs;
    This info is fixed for a given OpenStack flavor. The flavor must be specified when starting a VM, and this data can be stored at that time.
  5. the load averages of the VMs;
    For Linux, can be obtained by executing a command ("uptime" or "cat /proc/loadavg") over ssh (a bit slow: the last test took 12 seconds from the devstack host to an Ubuntu machine, using an ssh public key).
  6. for a given extractor, the list of VM images where the extractor is available.
    This is manual, static data. It can be stored in a config file, a MongoDB collection, or in other ways.
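Items 1 and 5 above can be sketched as follows. This is a sketch under stated assumptions: the RabbitMQ management plugin is enabled on its default port (15672), the default vhost "/" is used, and the credentials, host names, and helper function names are all illustrative:

```python
import json
import urllib.request
from base64 import b64encode

def queue_stats(host, queue, user="guest", password="guest", vhost="%2F"):
    """Fetch the message backlog and consumer count for one queue via the
    RabbitMQ management HTTP API (GET /api/queues/<vhost>/<name>).
    The consumer count can verify that a scale up/down action succeeded."""
    url = "http://%s:15672/api/queues/%s/%s" % (host, vhost, queue)
    req = urllib.request.Request(url)
    token = b64encode(("%s:%s" % (user, password)).encode()).decode()
    req.add_header("Authorization", "Basic " + token)
    with urllib.request.urlopen(req) as resp:
        info = json.load(resp)
    return info["messages"], info["consumers"]

def parse_loadavg(loadavg_line, n_vcpus):
    """Turn the contents of /proc/loadavg (as fetched over ssh) into a
    per-vCPU 1-minute load figure, using the vCPU count recorded from
    the OpenStack flavor when the VM was started."""
    one_min = float(loadavg_line.split()[0])
    return one_min / n_vcpus
```

For example, `parse_loadavg("0.50 0.40 0.30 1/123 456", 2)` yields a per-vCPU load of 0.25, which the scaling logic could compare against a utilization threshold.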

...