  • Design Goal / Requirement

    BD-703. To add the use of Docker containers as a new granularity of deployment in the VM elasticity module,

...

  • to expand its functionality.

  • Design Questions and Answers
  1. Should an extractor type be managed by both a VM image and Docker, or only by one of them?
    A: Only one, for simplicity. This information needs to be specified in the config file.
  2. Should the module support managing extractors using both VM images and Docker containers at the same time?
    A: Yes.
  3. Should Docker containers be hosted on separate, dedicated machines, or on the same machines as the other VMs?

    A: Separate.

  4. Docker image storage: Docker Hub, or a private registry?
    A: Go with Docker Hub for now. Setting up a private registry takes time, and a secure one requires getting a certificate from a CA. This can be done later when needed. Use ncsa/clowder-ocr, ncsa/clowder-python-base, ncsa/clowder-opencv-closeups, etc. for now.
    Done.

  5. How do we restart a Docker container if the application crashed/stopped?
    A: docker run --restart=always ...
    This will retry indefinitely, but with a delay that doubles before each retry, starting from 100 ms (0.1 second), to avoid flooding the server. Can also consider using "--restart=on-failure[:max-retries]" to limit the number of retries, but then that could leave a container in the stopped state, without any component to restart it. Usually a RabbitMQ server restart would cause an error, and the error was observed to persist for about 2 minutes.
  6. How do we scale up an application managed by Docker?
    A: See below.
  7. How do we scale down?
    A: See below.
    1. Do we suspend and resume Docker VMs, or always keep them running?
      A: We suspend and resume Docker VMs, but keep at least one running.
  8. A manually created and updated VM image is used to start "Docker machines" or "Docker VMs" to host the Docker containers. How do we start the Docker machines – externally bootstrap them, or start them using the elasticity module?
    A: Use the elasticity module. Add the Docker VM image info in the config file, so the module knows how to start a new Docker VM. Set a minimum of one Docker machine; later on the scaling-up logic will start more as needed. Special handling is needed: the dockerized extractors depend on a Docker machine existing, so if none exists yet, one needs to be started first.
  9. How do we detect idle extractors managed by Docker?
    A: Same logic using the RabbitMQ API as before. After detection, perform Docker-specific commands to stop the idle extractors.
  10. How do we detect idle Docker machines when no container runs on them?
    A: A Docker machine itself does not make RabbitMQ connections, so if no extractor runs on it, it will not show up in the extr->VM or VM->extr maps; separate maps need to be maintained for the Docker machines. Find the Docker machines that have no extractors running on them and add them to the idle VM list.
  11. How do we specify the mapping between Docker images and the corresponding extractors?
    A: Add a [Docker] section in the config file, with a 1-to-1 mapping: "extr1: dockerimg1". When starting the elasticity module, load the config file and check for errors: one extractor type should be managed by only one method, either Docker or a VM image. If such configuration errors exist, print them out and use a default type such as Docker (make this choice configurable too). A sketch of this check follows.
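
    As a small illustration of the answer above, the sketch below loads a hypothetical config file with a [Docker] section and checks that no extractor type is managed by both methods. The file name and the [VM Extractors] section name are assumptions, not the actual config layout.

        # Illustrative sketch: load the config and check that each extractor
        # type is managed by exactly one method (Docker image or VM image).
        # File and section names here are hypothetical.
        from configparser import ConfigParser

        config = ConfigParser()
        config.read("elasticity.cfg")

        # [Docker] section: 1-to-1 mapping "extractor name: Docker image"
        docker_map = dict(config["Docker"]) if config.has_section("Docker") else {}
        # Extractors managed via VM images, e.g. from a [VM Extractors] section
        vm_map = dict(config["VM Extractors"]) if config.has_section("VM Extractors") else {}

        for extractor in sorted(set(docker_map) & set(vm_map)):
            # Configuration error: managed by both methods. Print it out and use
            # the default method (Docker), which should itself be configurable.
            print("Config error: extractor '%s' is managed by both Docker and a "
                  "VM image; using Docker." % extractor)
            del vm_map[extractor]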
  • Algorithm / Logic

The following assumptions are made in the design:

  1. an extractor is installed as a service on a VM, so when a VM starts, all the extractors that the VM contains as services will start automatically and successfully; if services do not fulfill all requirements, we might have to look into alternatives;
  2. the resource limitation of using extractors to process input data is CPU processing, not memory, disk I/O, or network I/O, so the design is only for scaling for CPU usage;
  3. the system needs to support multiple OS types, including both Linux and Windows;
  4. the system uses RabbitMQ as the messaging technology.

Algorithm:

The VM elasticity module maintains and uses the following data:

  1. RabbitMQ queue lengths and the number of consumers for the queues;
    Can be obtained using the RabbitMQ management API; a query sketch is shown below this list. The number of consumers can be used to verify that the action to scale up/down succeeded.
  2. for each queue, the corresponding extractor name;
    Currently hard coded in the extractor code, so that queue name == extractor name.
  3. for a given extractor, the list of running VMs where an instance of the extractor is running, and the list of suspended VMs where it was running;
    Running VM list: can be obtained using RabbitMQ management API, queue --> connections --> IP.
    Suspended VM list: when suspending a VM, update the mapping for the given extractor, remove the entry from the running VM list and add it to the suspended VM list.
    Also maintain the data of mapping from a running/suspended VM to the extractors that it contains. This is useful in the scaling up part.
  4. the number of vCPUs of the VMs;
    This info is fixed for a given OpenStack flavor. The flavor must be specified when starting a VM, and this data can be stored at that time.
  5. the load averages of the VMs;
    For Linux, can be obtained by executing a command ("uptime" or "cat /proc/loadavg") with ssh. Verified that using ssh connection multiplexing (SSH ControlMaster), we can get it quickly in <1 second, usually 0.3 second. If needed, can use a separate thread to get this data, instead of in-line in the execution flow.
  6. for a given extractor type, the list of VM images where the extractor is available, and the service name to start another extractor instance, i.e., a pair of (VM image name, service name). The service name is needed only for running additional extractor instances, since the first instance of that extractor will be started automatically as a service.
    Also maintain the data of mapping from a given VM image to the extractors it contains. This is useful in the scaling up part.
    This is manual and static data. Can be stored in a config file, a MongoDB collection, or using other ways.
  7. the last time a request was processed on each VM and in each queue.
    The VM part can be obtained using the RabbitMQ management API, /api/channels/: "idle_since" and "peer_host". Need to aggregate the channels that have the same peer_host IP, and skip the ones on the localhost. This info is used in the scaling down part for suspending a VM.
    The queue part can be obtained using the RabbitMQ management API, /api/queues: "idle_since". Used in the scaling down part for stopping extractor instances.

In the above data, items 2, 4 and 6 are static (or nearly static); the others are dynamic, changing at run time.
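
As an illustration of data items 1, 5 and 7, the sketch below polls the RabbitMQ management API and reads the load average over ssh. It assumes the Python requests library, the default management port 15672, guest credentials, and an "ubuntu" ssh user; field handling is simplified.

    # Illustrative sketch only: collect queue lengths, consumer counts, idle
    # times (data items 1 and 7) and load averages (data item 5).
    import subprocess
    import requests

    RABBITMQ_API = "http://localhost:15672/api"   # assumed host and port
    AUTH = ("guest", "guest")                     # assumed credentials

    def get_queue_stats():
        """Return {queue name: (message count, consumer count, idle_since)}."""
        stats = {}
        for q in requests.get(RABBITMQ_API + "/queues", auth=AUTH).json():
            stats[q["name"]] = (q.get("messages", 0), q.get("consumers", 0),
                                q.get("idle_since"))
        return stats

    def get_vm_idle_times():
        """Return {peer host IP: most recent idle_since}, skipping localhost."""
        idle = {}
        for ch in requests.get(RABBITMQ_API + "/channels", auth=AUTH).json():
            ip = ch.get("peer_host") or ch.get("connection_details", {}).get("peer_host")
            ts = ch.get("idle_since")
            if not ip or not ts or ip in ("127.0.0.1", "::1"):
                continue
            # idle_since strings sort chronologically, so keep the latest one.
            if ip not in idle or ts > idle[ip]:
                idle[ip] = ts
        return idle

    def get_load_average(ip, user="ubuntu"):
        """Read the 1-minute load average over ssh (fast with ControlMaster)."""
        out = subprocess.check_output(["ssh", "%s@%s" % (user, ip),
                                       "cat", "/proc/loadavg"])
        return float(out.decode().split()[0])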

 

Periodically (configurable, such as every minute), the system checks whether it needs to scale up and whether it needs to scale down. These two checks can be done in parallel, but if so, the system needs to protect and synchronize the shared data, such as the list of running VMs. A minimal sketch of this periodic loop follows.
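The sketch below runs the two checks serially (consistent with the single-threaded choice under Other Considerations); the interval constant and the two check functions are illustrative placeholders, not the module's actual API.

    # Illustrative sketch of the periodic elasticity check loop; the two check
    # functions stand for the scaling logic described in the sections below.
    import time

    CHECK_INTERVAL_SECONDS = 60      # configurable, e.g. every minute

    def run_elasticity_loop(check_scale_up, check_scale_down):
        while True:
            # Run serially for now; if the checks are run in parallel, shared
            # data such as the running VM list must be protected with a lock.
            check_scale_up()
            check_scale_down()
            time.sleep(CHECK_INTERVAL_SECONDS)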

Scaling up:

The criterion for the need of scaling up – to start more extractors for a queue – is:

  1. the length of the RabbitMQ queue > a pre-defined threshold, such as 100 or 1000, or
  2. the number of consumers (extractors) for this queue is below the configured minimum number.
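
Expressed as a small predicate (the threshold values are illustrative configuration defaults):

    # Illustrative predicate for the scaling-up criterion above.
    QUEUE_LENGTH_THRESHOLD = 100   # pre-defined threshold, e.g. 100 or 1000
    MIN_CONSUMERS = 1              # configured minimum number of extractors

    def needs_scale_up(queue_length, consumer_count):
        return (queue_length > QUEUE_LENGTH_THRESHOLD
                or consumer_count < MIN_CONSUMERS)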

 

Get the list of running queues, and iterate through them:

  1. If the above criterion is reached for a given queue, say, q1, then use data item 2 above to find the corresponding extractor (e1). Currently this is hardcoded in the extractors, so that queue name == extractor name.
  2. Look up e1 to find the corresponding running VM list, say, (vm1, vm2, vm3).
  3. Go through the list one by one. If there is an open slot on a VM, meaning its #vCPUs > loadavg + <cpu_buffer_room> (configurable, such as 0.5) – for example, vm1 has #vCPUs == 2 and loadavg == 1.2 – then start another instance of e1 on that VM, finish working on this queue, and go back to Step 1 for the next queue. If there is no open slot on vm1, look at the next VM in the list.
  4. If we go through the entire list and there's no open slot, or the list is empty, then look up e1 to find the corresponding suspended VM list, say, (vm4, vm5).  If the list is not empty, resume the first VM in the list. If unsuccessful, go to the next VM in the list.  After a successful resumption, look up and find the other extractors running in the VM, and set a mark for them so that this scaling logic will skip these other extractors, as resuming this VM would also resume them. Finish working on this queue and go back to Step 1 for the next queue.
  5. If the above suspended VM list is empty, then we need to start a new VM to have more e1 instances. Look up e1 to find a VM image that contains it, and start a new VM using that image. Similar to the above step, afterwards look up the other extractors available in that VM image and set a mark for them so that this scaling logic will skip them, as starting this VM also starts them.

At the end of the above iterations, we could verify whether the expected increase in the number of extractors actually occurred, and print out the result. A condensed sketch of this scaling-up pass follows.
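The sketch below condenses steps 1–5 into a planning function over plain dictionaries; it returns a list of intended actions rather than performing them, and the structure and constant names mirror the data items above but are otherwise assumptions.

    # Illustrative sketch of the scaling-up pass (steps 1-5 above).
    CPU_BUFFER_ROOM = 0.5            # configurable headroom (step 3)
    QUEUE_LENGTH_THRESHOLD = 100     # criterion 1
    MIN_CONSUMERS = 1                # criterion 2

    def plan_scale_up(queues, extractor_of, running_vms, suspended_vms,
                      vcpus, loadavg, vm_extractors, images_with):
        """Return a list of planned actions instead of performing them.

        queues:         {queue: (length, consumer count)}
        extractor_of:   {queue: extractor}          (data item 2)
        running_vms:    {extractor: [vm, ...]}      (data item 3)
        suspended_vms:  {extractor: [vm, ...]}      (data item 3)
        vcpus, loadavg: {vm: number}                (data items 4, 5)
        vm_extractors:  {vm: [extractor, ...]}      (data item 3)
        images_with:    {extractor: [image, ...]}   (data item 6)
        """
        actions, skip = [], set()
        for q, (length, consumers) in queues.items():
            if length <= QUEUE_LENGTH_THRESHOLD and consumers >= MIN_CONSUMERS:
                continue                             # criterion not met
            e1 = extractor_of[q]
            if e1 in skip:                           # covered by an earlier resume/start
                continue
            # Step 3: look for an open CPU slot on a VM already running e1.
            for vm in running_vms.get(e1, []):
                if vcpus[vm] > loadavg[vm] + CPU_BUFFER_ROOM:
                    actions.append(("start_instance", e1, vm))
                    break
            else:
                # Step 4: no open slot; resume the first suspended VM with e1.
                if suspended_vms.get(e1):
                    vm = suspended_vms[e1][0]
                    actions.append(("resume_vm", vm))
                    skip.update(vm_extractors.get(vm, []))  # they resume too
                # Step 5: nothing to resume; boot a new VM from an image with e1.
                elif images_with.get(e1):
                    actions.append(("start_new_vm", images_with[e1][0]))
                    skip.add(e1)   # the image's other extractors should be marked too
        return actions

The caller would then execute the returned actions (starting service instances, resuming or booting VMs) and afterwards verify the consumer counts as described above.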

Scaling down:

...

  1. a configurable item in the config file.
  2. Details of the Docker VM image?
    A: Ubuntu 14.04 base image + Docker installed.
    In the config file [OpenStack Image Info] section:
         docker-ubuntu-trusty = docker, m1.large, ubuntu, NCSA-Nebula, ''
    Use a larger flavor (4 or 8 CPUs), since one Docker VM hosts multiple containers. Pull all needed Docker images for the extractors in advance, for faster container start time at run time. Future enhancement: when starting the module, ensure that the Docker images specified in the config file are valid and available, and pull them onto all Docker machines if they are not already there (a small sketch follows).
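    A small sketch of that pre-pull step, assuming ssh access to each Docker machine with the docker CLI installed; the image list and ssh user are illustrative.

        # Illustrative sketch: pre-pull the configured Docker images on each
        # Docker machine so container starts are fast at run time.
        import subprocess

        DOCKER_IMAGES = ["ncsa/clowder-ocr", "ncsa/clowder-python-base",
                         "ncsa/clowder-opencv-closeups"]   # from the config file

        def prepull_images(docker_machine_ips, user="ubuntu"):
            for ip in docker_machine_ips:
                for image in DOCKER_IMAGES:
                    # "docker pull" only checks for updates if the image is
                    # already present on the machine.
                    subprocess.call(["ssh", "%s@%s" % (user, ip),
                                     "docker", "pull", image])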
  • Algorithm / Logic

    • Assumptions:
  1. The module manages the extractors using both VM images and Docker containers at the same time;
  2. A manually created and updated VM image is used to start "Docker machines" or "Docker VMs".
  3. Dockerized extractors run only on the Docker machines; the extractors managed by VM images do not run on the Docker machines.
  • Additional data structures to add support for Docker (a small sketch follows this list):
  1. a map to look up whether an extractor is managed by Docker or a VM image;
  2. a list/map of Docker machines, used to start/suspend/resume Docker machines;
  3. a map from extractor to Docker image, used when adding extractor instances.
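    A minimal sketch of these structures as plain Python dictionaries; the extractor names, IPs and variable names are illustrative, not the module's actual data.

        # Illustrative data structures for Docker support (names/values are examples).
        # 1. Which method manages each extractor type: "docker" or "vm_image".
        extractor_manager = {"ncsa.ocr": "docker",
                             "ncsa.cv.closeups": "docker",
                             "ncsa.image.preview": "vm_image"}

        # 2. Docker machines and their state, used to start/suspend/resume them.
        docker_machines = {"192.168.1.21": {"state": "running", "containers": []},
                           "192.168.1.22": {"state": "suspended", "containers": []}}

        # 3. Extractor -> Docker image, used when adding extractor instances.
        extractor_image = {"ncsa.ocr": "ncsa/clowder-ocr",
                           "ncsa.cv.closeups": "ncsa/clowder-opencv-closeups"}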
    • Scaling up
      Similar logic to that used when managing with VM images. Differences:
      1. Split each of add_extractor_instance(), resume_VM_containing_this_extractor_successful() and start_new_VM_containing_this_extractor_successful() into two methods: one using services, the other using Docker; on entry, pick one depending on the extractor type.
      2. When resuming a suspended VM or starting a new VM, wait for it to finish, then run a Docker command on it to add an extractor instance.
      3. Add new code with Docker commands to start/stop/remove the containers. New logic is needed for naming the Docker containers, possibly similar to that for naming the VMs (a small sketch follows).
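      A small sketch of starting one dockerized extractor on a Docker machine over ssh, with one possible container-naming scheme; the naming convention, ssh user and helper signature are assumptions.

        # Illustrative sketch: start one extractor container on a Docker machine,
        # with a hypothetical naming scheme "<extractor>-<sequence number>".
        import subprocess

        def start_container(docker_machine_ip, extractor, image, seq, user="ubuntu"):
            name = "%s-%d" % (extractor, seq)          # e.g. "ncsa.ocr-2"
            cmd = ["ssh", "%s@%s" % (user, docker_machine_ip),
                   "docker", "run", "-d", "--restart=always",
                   "--name", name, image]
            return subprocess.call(cmd) == 0           # True if the start succeeded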
    • Scaling down
    1. Stop idle extractor instances:
      For each idle extractor extr1 that is managed by Docker, loop through the Docker VMs and
          remove all extr1 containers, keeping the configured minimum number.
    2. Suspend idle VMs.
      If a Docker VM has at least one extractor container running on it,
          the existing RabbitMQ management API logic will detect when it becomes idle and suspend it;
      otherwise, loop through the Docker VMs and
          suspend any that do not have any containers running.
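    A small sketch of the second case, assuming ssh access to the Docker machines, the docker CLI on them, and a hypothetical suspend_vm() helper for the OpenStack suspend call.

        # Illustrative sketch: suspend Docker machines with no running containers.
        import subprocess

        def running_container_count(ip, user="ubuntu"):
            # "docker ps -q" prints one ID per running container; count the lines.
            out = subprocess.check_output(["ssh", "%s@%s" % (user, ip),
                                           "docker", "ps", "-q"])
            return len(out.splitlines())

        def suspend_empty_docker_machines(docker_machine_ips, suspend_vm, keep_min=1):
            still_running = list(docker_machine_ips)
            for ip in docker_machine_ips:
                if len(still_running) <= keep_min:     # keep at least one Docker VM
                    break
                if running_container_count(ip) == 0:
                    suspend_vm(ip)                     # hypothetical OpenStack helper
                    still_running.remove(ip)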
  • Other Considerations

  1. Threads:
    Due to time constraints, no threads will be used for now. Multiple threads may be needed in the future. When scaling up, after resuming a suspended VM or starting a new Docker machine, the module needs to log into it to start a new container, so it must block waiting for the resume or start to finish, and this may take a while – starting a new Docker VM takes 1–2 minutes.
  2. Remove stopped containers?
    Since starting a new container is fast, simplify the design of scaling down by removing the containers instead of leaving them around. Later we can explore keeping them stopped when scaling down and starting the stopped ones when scaling up – a possible future improvement.

...

Get the list of IPs of the running VMs. Iterate through them:

If there is no RabbitMQ activity on a VM in a configurable time period (say, 1 hour), then there is no work for the extractors on it to do, so we can suspend the VM to save resources for other tasks. However, if suspending this VM would decrease the number of running instances of any extractor on it below the minimum configured for that extractor type, we do not suspend it and leave it running. A small sketch of this check follows.
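The sketch below illustrates the suspend decision for one VM, assuming dictionaries that map VMs to the extractors they run and count running instances per extractor (one instance per extractor per VM is assumed for simplicity); the names are illustrative.

    # Illustrative sketch of the suspend decision for one VM.
    def can_suspend(vm, idle_seconds, idle_threshold_seconds,
                    vm_extractors, running_instances, min_instances):
        """vm_extractors:     {vm: [extractor, ...]}
        running_instances: {extractor: current number of running instances}
        min_instances:     {extractor: configured minimum number}"""
        if idle_seconds < idle_threshold_seconds:      # e.g. 1 hour without activity
            return False
        for extr in vm_extractors.get(vm, []):
            # Suspending the VM would drop this VM's instance of extr; do not
            # go below the configured minimum for that extractor type.
            if running_instances.get(extr, 0) - 1 < min_instances.get(extr, 0):
                return False
        return True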

Notes:

  1. This logic is suitable for a production environment. For a testing environment or a system that is not busy, this logic could suspend many or even all VMs, since there are few or no requests, and lead to a slow start – only the next time the check is done (say, every minute) will this module notice that the number of extractors for the queues is 0 and resume the VMs. We could make it configurable whether or not to maintain at least one running extractor for each queue – it is a balance of potential waste of system resources vs. fast processing startup time.
  2. In the future, we could support migrating extractors. For example, if 4 extractors run on vm1, and only extractor 1 has work to do, and there is an open slot to run extractor 1 on vm2 (where extractor 1 is already running), we could migrate the extractor 1 instance from vm1 to vm2, and suspend vm1 – but this is considered lower priority.
  • Programming Language and Application Type

Continue with the existing use of Python and a stand-alone program.

  • Testing

  • Input
    Use a script to generate high request rates for the OCR and OpenCV extractors to test the scaling-up part. Stop sending the requests to test the scaling-down part.
  • Output
    Use the OpenStack CLI / web UI to monitor the VMs, and use the RabbitMQ web UI and ssh into the Docker machines to monitor the extractors, to verify that the Docker containers are started/stopped and the Docker VMs are started/resumed/suspended as expected.

...