
Load Balancer

This section documents the results of NDS-239. The goal of this ticket was to determine whether the Nginx ingress controller would be a performance bottleneck for the NDS Labs system.

Baseline service: Nginx

This test uses the nginx-ingress-controller as the load balancer and a simple Nginx web server as the backend service. An ingress rule was created manually to map perf-nginx.cluster.ndslabs.org to the backend service.
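The manually created ingress rule would have looked something like the following sketch. The service name, namespace, and port are assumptions, and the API version reflects Kubernetes releases current at the time of this test:

```shell
# Hypothetical manifest for the ingress rule mapping the test hostname
# to the backend Nginx service. serviceName and servicePort are assumptions;
# substitute the actual backend service details.
kubectl create -f - <<EOF
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: perf-nginx
spec:
  rules:
  - host: perf-nginx.cluster.ndslabs.org
    http:
      paths:
      - backend:
          serviceName: perf-nginx
          servicePort: 80
EOF
```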

Load generation: boom

Use the boom load test generator to scale up concurrent requests on a Nebula m1.medium VM:

for i in $(seq 1 10)
do
   req=$((100*i))
   echo "bin/boom -cpus 4 -n 1000 -c $req http://perf-nginx.iassist.ndslabs.org/"
   bin/boom -cpus 4 -n 1000 -c $req http://perf-nginx.iassist.ndslabs.org/
   sleep 1
done
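If boom is not already available, it can be fetched with the Go toolchain (this assumes a working Go installation with GOPATH set; the project was later renamed hey):

```shell
# Fetch and build the boom load generator (assumption: standard GOPATH
# workspace layout, as was conventional at the time of this test).
go get github.com/rakyll/boom
$GOPATH/bin/boom -h
```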

Measuring latency and resource usage

Measuring latency: boom

The boom utility reports response times for each run, including a summary (total, fastest, slowest, and average times plus requests per second), a status code distribution, a response time histogram, and a latency distribution.

bin/boom -cpus 4 -n 1000 -c 500 http://perf-nginx.iassist.ndslabs.org/
Summary:
  Total:	0.1539 secs
  Slowest:	0.1335 secs
  Fastest:	0.0193 secs
  Average:	0.0685 secs
  Requests/sec:	4842.2840

Status code distribution:
  [200]	745 responses

Response time histogram:
  0.019 [1]	|
  0.031 [28]	|∎∎∎∎∎∎
  0.042 [110]	|∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  0.054 [69]	|∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  0.065 [161]	|∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  0.076 [157]	|∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  0.088 [60]	|∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  0.099 [38]	|∎∎∎∎∎∎∎∎∎
  0.111 [49]	|∎∎∎∎∎∎∎∎∎∎∎∎
  0.122 [37]	|∎∎∎∎∎∎∎∎∎
  0.134 [35]	|∎∎∎∎∎∎∎∎

Latency distribution:
  10% in 0.0394 secs
  25% in 0.0502 secs
  50% in 0.0652 secs
  75% in 0.0808 secs
  90% in 0.1103 secs
  95% in 0.1217 secs
  99% in 0.1293 secs

 

Below is a plot of average response time with increasing concurrent requests (-n 1000 requests) and replicas. Average response times increase as the number of concurrent requests increases, but still remain below 1 second. Adding more replicas has no apparent effect, suggesting that the response time is dominated by the ingress load balancer, not the backend service.

 

Below is a plot of the latency distribution at the 25th, 50th, 75th, and 90th percentiles with increasing concurrent requests. At 600 concurrent requests, the proportion of requests with longer latencies increases.

 

Measuring CPU/Memory utilization

Memory and CPU utilization were measured using pidstat. The nginx ingress controller ran two worker processes in this test, labeled proc1 and proc2 in the tables below.
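The per-second samples below can be collected with an invocation along these lines. This is a sketch; the pgrep pattern used to select the worker PIDs is an assumption, and the actual PIDs can be substituted directly:

```shell
# Sample CPU (-u) and memory (-r) statistics once per second for the
# nginx worker processes while the boom test runs.
# The 'nginx: worker' pattern is an assumption about the process names;
# pgrep -d, joins the matching PIDs with commas for pidstat -p.
pidstat -u -r -p "$(pgrep -d, -f 'nginx: worker')" 1
```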

 

CPU utilization

The following table reports CPU utilization for each process during the boom test. %CPU peaks at 12%.

Time      %usr           %system        %guest         %CPU
          proc1  proc2   proc1  proc2   proc1  proc2   proc1  proc2
15:56:10  0      0       0      0       0      0       0      0
15:56:11  0      6       0      6       0      0       0      12
15:56:12  3      0       3      0       0      0       6      0
15:56:13  3      0       3      0       0      0       6      0
15:56:14  5      0       5      0       0      0       10     0
15:56:15  0      0       1      0       0      0       1      0
15:56:16  4      0       4      0       0      0       8      0
15:56:17  0      0       1      0       0      0       1      0
15:56:18  5      0       6      0       0      0       11     0
15:56:19  1      0       1      0       0      0       2      0
15:56:20  2      0       4      0       0      0       6      0
15:56:21  1      0       1      0       0      0       2      0
15:56:22  3      0       4      0       0      0       7      0
15:56:23  1      0       0      0       0      0       1      0
15:56:24  0      4       0      5       0      0       0      9
15:56:25  0      1       0      1       0      0       0      2
15:56:26  0      5       1      6       0      0       1      11
15:56:27  0      0       0      0       0      0       0      0
15:56:28  4      0       6      0       0      0       10     0
15:56:29  0      0       1      0       0      0       1      0
15:56:30  0      0       0      0       0      0       0      0

 

Memory utilization

The following table reports memory utilization for each process during the boom test. %MEM remains relatively stable throughout the test.

 

Time      minflt/s        majflt/s       VSZ              RSS            %MEM
          proc1   proc2   proc1  proc2   proc1   proc2    proc1  proc2   proc1  proc2
15:56:52  0       0       0      0       326132  325992   15208  15068   0.38   0.37
15:56:53  0       0       0      0       326132  325992   15208  15068   0.38   0.37
15:56:54  3       0       0      0       326132  325992   15208  15068   0.38   0.37
15:56:55  29299   0       0      0       325328  325992   14404  15068   0.36   0.37
15:56:56  0       477     0      0       325328  327576   14404  16360   0.36   0.4
15:56:57  0       0       0      0       325328  325768   14404  14844   0.36   0.37
15:56:58  0       648     0      0       325328  328416   14404  17216   0.36   0.42
15:56:59  0       0       0      0       325328  325328   14404  14404   0.36   0.36
15:57:00  0       1021    0      0       325328  329420   14404  18360   0.36   0.45
15:57:01  0       0       0      0       325328  326140   14404  15216   0.36   0.38
15:57:02  0       0       0      0       325328  326140   14404  15216   0.36   0.38
15:57:03  0       630     0      0       325328  326764   14404  15840   0.36   0.39
15:57:04  0       0       0      0       325328  325808   14404  14884   0.36   0.37
15:57:05  0       1002    0      0       325328  329908   14404  18840   0.36   0.46
15:57:06  0       47      0      0       325328  325628   14404  14704   0.36   0.36
15:57:07  1       1275    0      0       325328  330784   14404  19716   0.36   0.49
15:57:08  0       0       0      0       325328  325884   14404  14960   0.36   0.37
15:57:09  0       1502    0      0       325328  331960   14404  20756   0.36   0.51
15:57:10  0       0       0      0       325328  325328   14404  14404   0.36   0.36
15:57:11  0       1258    0      0       325328  329128   14404  18204   0.36   0.45
15:57:12  0       0       0      0       325328  325328   14404  14404   0.36   0.36
15:57:13  0       0       0      0       325328  325328   14404  14404   0.36   0.36

Killing the load balancer

After running kubectl delete pod on the nginx-ilb pod, the running pod remains in a terminating state for ~30 seconds. During this time, the replication controller creates a replacement pod, but it remains in a pending state for the same ~30-second period. Some responses are still handled, but there is a risk of ~30 seconds of downtime between pod restarts. This may be related to the shutdown of the default-http-backend, but this isn't clear.
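The observation above can be reproduced with something like the following. The pod name suffix is hypothetical; the actual nginx-ilb pod name should be taken from `kubectl get pods`:

```shell
# Delete the load-balancer pod and watch the replication controller
# bring up a replacement, observing the terminating/pending overlap.
# "nginx-ilb-xxxxx" is a placeholder, not a real pod name.
kubectl delete pod nginx-ilb-xxxxx
kubectl get pods --watch
```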

 
