  1. National Data Service
  2. NDS-705

Kubernetes services/ingress are sometimes not cleaned up properly on delete


    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • None
    • None
    • Backend, Infrastructure
    • None
    • NDS Sprint 19, NDS Sprint 21

      I logged into my test node (where I did the bulk of the Selenium testing) and, despite having deleted all of my applications, found many lingering services left in Kubernetes:

      core@lambert8-dev ~ $ kubectl get svc --namespace=lambert8
      NAME                    CLUSTER-IP   EXTERNAL-IP   PORT(S)              AGE
      s3mcak-rabbitmq                      <none>        5672/TCP,15672/TCP   19d
      s48qeo-rabbitmq                      <none>        5672/TCP,15672/TCP   18d
      s7txia-rabbitmq                      <none>        5672/TCP,15672/TCP   19d
      sa1l1c-rabbitmq                      <none>        5672/TCP,15672/TCP   19d
      sahwe4-rabbitmq                      <none>        5672/TCP,15672/TCP   19d
      sbvgh5-rabbitmq                      <none>        5672/TCP,15672/TCP   18d
      scjwha-rabbitmq                      <none>        5672/TCP,15672/TCP   17d
      scsxiq-rabbitmq                      <none>        5672/TCP,15672/TCP   17d
      sii2gy-rabbitmq                      <none>        5672/TCP,15672/TCP   19d
      sjh1ry-rabbitmq                      <none>        5672/TCP,15672/TCP   19d
      skl5hm-cloudbrowserui                <none>        80/TCP               19d
      sl21j2-rabbitmq                      <none>        5672/TCP,15672/TCP   19d
      ssblma-rabbitmq                      <none>        5672/TCP,15672/TCP   19d
      sw3ulv-rabbitmq                      <none>        5672/TCP,15672/TCP   18d
      szn5g5-rabbitmq                      <none>        5672/TCP,15672/TCP   19d
      szraf2-rabbitmq                      <none>        5672/TCP,15672/TCP   19d

      This is likely due to the e2e test suite creating and deleting applications in rapid succession, but it is behavior we should correct if we want to scale this platform up any further.

      This feels like a threading problem (locking issue, race condition, etc.), but I am not as familiar with Go as I am with Java, or with how Go handles such things at a low level.

      This ticket is complete when the above behavior is diagnosed and the potential damage is mitigated to the best of our ability.

              willis8 Craig Willis
              lambert8 Sara Lambert


                  Original Estimate - 2 hours
                  Time Spent - 1 hour, 45 minutes
                  Remaining Estimate - 15 minutes