BrownDog / BD-1044

elasticity: need to detect and remove Docker containers that lost RabbitMQ connections


      During bi-hourly testing on dts-dev, we found two OpenCV faces extractors on a cloud VM whose containers had lost their RabbitMQ connections but were still running. We need a way to detect such idle extractors/converters and remove them.

      Root cause and analysis: Clowder on dts-dev was unresponsive. Each faces extractor received a message and tried to download the file from dts-dev Clowder, but got no reply and kept waiting. After a while, the RabbitMQ connection was closed because heartbeats were missed. The container kept running and occupied a slot, so the elasticity module would not start more containers on the VM.
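One way to keep an extractor from waiting indefinitely on an unresponsive Clowder is to put a timeout on the file download. The following is a minimal sketch, not the extractor's actual code: the function name `fetch_file` and the 30-second default are assumptions made for illustration.

```python
import urllib.request


def fetch_file(url, timeout_seconds=30.0):
    """Return the response body, or None if the server does not answer in time.

    Hypothetical helper: the name and the default timeout are illustrative.
    """
    try:
        # The timeout applies to connecting and to each blocking read, so a
        # server that accepts the connection but never replies cannot stall
        # the caller forever.
        with urllib.request.urlopen(url, timeout=timeout_seconds) as response:
            return response.read()
    except OSError:
        # Covers socket timeouts as well as urllib's URLError/HTTPError,
        # which are subclasses of OSError.
        return None
```

A caller that gets `None` back could give up on the message instead of holding it while the RabbitMQ heartbeat expires. How the message should then be handled (rejected, requeued, or logged) is left open here.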

      Normal extractors have connections to the RabbitMQ server at port 5672:

      ubuntu@dts-dev-docker-4:~$ docker exec -i -t opencv-faces-3 netstat -an
      Active Internet connections (servers and established)
      Proto Recv-Q Send-Q Local Address Foreign Address State
      tcp 0 0 172.17.0.3:57330 141.142.227.65:5672 ESTABLISHED

      The extractors that lost their RabbitMQ connection had no connection to port 5672, but were still connected to Clowder at port 9000:

      ubuntu@dts-dev-docker-4:~$ docker exec -i -t opencv-faces-2 netstat -an
      Active Internet connections (servers and established)
      Proto Recv-Q Send-Q Local Address Foreign Address State
      tcp 0 0 172.17.0.2:46349 141.142.227.82:9000 ESTABLISHED
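The manual check above could be automated along these lines. This is a sketch under stated assumptions, not the elasticity module's implementation: the function names are hypothetical, `netstat` is assumed to be installed in each container, and port 5672 is taken from the output shown above.

```python
import subprocess

RABBITMQ_PORT = 5672  # port seen in the netstat output above


def has_rabbitmq_connection(netstat_output, port=RABBITMQ_PORT):
    """Return True if the netstat text shows an ESTABLISHED TCP connection
    whose foreign address ends in the given port."""
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expected columns: Proto Recv-Q Send-Q Local Foreign State
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        foreign, state = fields[4], fields[5]
        if state == "ESTABLISHED" and foreign.rsplit(":", 1)[-1] == str(port):
            return True
    return False


def container_netstat(container):
    """Run `netstat -an` inside a container and return its output.
    The -i/-t flags are dropped because no terminal is attached."""
    result = subprocess.run(
        ["docker", "exec", container, "netstat", "-an"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def find_disconnected(containers):
    """Return the names of containers with no RabbitMQ connection."""
    return [name for name in containers
            if not has_rabbitmq_connection(container_netstat(name))]
```

For example, `find_disconnected(["opencv-faces-2", "opencv-faces-3"])` would be expected to report only `opencv-faces-2` for the state shown above. A container that has just started may not have connected yet, so a real check would probably want a grace period or several consecutive misses before removing anything.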

              Unassigned Unassigned
              ruiliu Rui Liu
