
...

This section documents the results of Jira ticket NDS-239. The goal of this ticket was to determine whether the Nginx ingress controller would be a performance bottleneck for the NDS Labs system.

Baseline service: Nginx

This test uses the nginx-ingress-controller as the load balancer and a simple Nginx web server as the backend service. An ingress rule was created manually to map perf-nginx.cluster.ndslabs.org to the backend service.
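To exercise the ingress rule before DNS for the virtual host is in place, a request can be sent straight to the ingress controller's address with the virtual host in the Host header. The sketch below uses Python's standard urllib. The controller address 203.0.113.10 is a hypothetical placeholder (from the TEST-NET-3 documentation range), and `ingress_request` and `fetch` are illustrative helpers, not part of NDS Labs.

```python
import urllib.request

# Hypothetical placeholder: replace with the ingress controller's real address.
INGRESS_ADDR = "203.0.113.10"
# Virtual host from the ingress rule described above.
VIRTUAL_HOST = "perf-nginx.cluster.ndslabs.org"


def ingress_request(addr: str, host: str, path: str = "/") -> urllib.request.Request:
    """Build a request aimed at the controller's IP but carrying the virtual host.

    Nginx selects the matching server (and so the ingress rule) from the Host
    header, so this can reach the backend even without a DNS record for `host`.
    """
    return urllib.request.Request(f"http://{addr}{path}", headers={"Host": host})


def fetch(addr: str = INGRESS_ADDR, host: str = VIRTUAL_HOST, timeout: float = 5.0) -> int:
    """Send the request and return the HTTP status (needs network access).

    urllib raises HTTPError for 4xx/5xx answers, e.g. if no rule matches.
    """
    with urllib.request.urlopen(ingress_request(addr, host), timeout=timeout) as resp:
        return resp.status
```

Calling `fetch()` from a machine that can reach the controller should return 200 when the rule routes to the Nginx backend.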

Load generation: boom

Use the boom load generator on a Nebula m1.medium VM to scale up the number of concurrent requests:

...
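A concurrency sweep of this kind can be scripted. This Python sketch reuses the boom path, the flags (-cpus, -n, -c) and the URL from the example output on this page. The concurrency levels passed in (for example `range(100, 1001, 100)`) are an assumption, as is the choice of Python as the driver; `boom_command`, `run_boom` and `sweep` are hypothetical helpers.

```python
import subprocess

BOOM = "bin/boom"                               # path used in the example on this page
URL = "http://perf-nginx.iassist.ndslabs.org/"  # endpoint used in the example on this page


def boom_command(concurrency: int, requests: int = 1000, cpus: int = 4, url: str = URL) -> list:
    """Build one boom invocation using only the flags shown on this page."""
    if concurrency > requests:
        # boom is expected to reject runs where -n is smaller than -c, so fail early.
        raise ValueError("concurrency must not exceed the total number of requests")
    return [BOOM, "-cpus", str(cpus), "-n", str(requests), "-c", str(concurrency), url]


def run_boom(cmd: list) -> str:
    """Run boom and return its text report (requires the boom binary)."""
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def sweep(levels, runner=run_boom) -> dict:
    """Run boom once per concurrency level; returns {concurrency: report text}."""
    return {c: runner(boom_command(c)) for c in levels}
```

Passing `runner` as a parameter keeps the sweep logic testable without the boom binary; in real use the default `run_boom` is used.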

Measuring latency and resource usage

Measuring latency: boom

The boom utility produces a plain-text report that includes a summary of response times (total, slowest, fastest and average, plus requests per second), the status code distribution, a response time histogram, and the latency distribution. For example:


bin/boom -cpus 4 -n 1000 -c 500 http://perf-nginx.iassist.ndslabs.org/
Summary:
  Total:	0.1539 secs
  Slowest:	0.1335 secs
  Fastest:	0.0193 secs
  Average:	0.0685 secs
  Requests/sec:	4842.2840

Status code distribution:
  [200]	745 responses

Response time histogram:
  0.019 [1]	|
  0.031 [28]	|∎∎∎∎∎∎
  0.042 [110]	|∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  0.054 [69]	|∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  0.065 [161]	|∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  0.076 [157]	|∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  0.088 [60]	|∎∎∎∎∎∎∎∎∎∎∎∎∎∎
  0.099 [38]	|∎∎∎∎∎∎∎∎∎
  0.111 [49]	|∎∎∎∎∎∎∎∎∎∎∎∎
  0.122 [37]	|∎∎∎∎∎∎∎∎∎
  0.134 [35]	|∎∎∎∎∎∎∎∎

Latency distribution:
  10% in 0.0394 secs
  25% in 0.0502 secs
  50% in 0.0652 secs
  75% in 0.0808 secs
  90% in 0.1103 secs
  95% in 0.1217 secs
  99% in 0.1293 secs
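To collect these numbers from many runs, the report can be parsed. The regular expressions below are written against the output shown above and are an assumption about boom's format in general; `parse_boom` is a hypothetical helper.

```python
import re


def parse_boom(report: str) -> dict:
    """Pull the headline numbers out of a boom text report.

    Patterns follow the report layout shown above; fields that are missing
    from a report are simply left out of the result.
    """
    summary = {}
    for key in ("Total", "Slowest", "Fastest", "Average", "Requests/sec"):
        m = re.search(rf"^[ \t]*{re.escape(key)}:[ \t]+([\d.]+)", report, re.M)
        if m:
            summary[key] = float(m.group(1))
    # Lines like "  [200]<tab>745 responses"
    status = {int(code): int(n)
              for code, n in re.findall(r"\[(\d{3})\]\s+(\d+) responses", report)}
    # Lines like "  0.065 [161]<tab>|###" -> (bucket label in seconds, count)
    histogram = [(float(b), int(n))
                 for b, n in re.findall(r"^[ \t]*([\d.]+)[ \t]+\[(\d+)\]", report, re.M)]
    # Lines like "  90% in 0.1103 secs"
    percentiles = {int(p): float(s)
                   for p, s in re.findall(r"^[ \t]*(\d+)% in ([\d.]+) secs", report, re.M)}
    return {"summary": summary, "status": status,
            "histogram": histogram, "percentiles": percentiles}
```

One check this makes easy: in the run above the histogram counts add up to 745, the same as the [200] count, while -n 1000 requests were requested.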

Measuring latency: netperf

Measure latency and throughput to services inside Kubernetes.

Measuring CPU/Memory/IO utilization

Results


Below is a plot of average response time as the number of concurrent requests (-c, with -n 1000 total requests) and the number of backend replicas increase. Average response time grows with the number of concurrent requests but remains below 1 second. Adding replicas has no apparent effect, suggesting that response time is driven by the ingress load balancer rather than the backend service.

[Figure: average response time vs. number of concurrent requests, for different numbers of replicas]
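Whether replicas have an effect can also be checked numerically: for each concurrency level, compare the average response times measured at different replica counts. The sketch computes the relative spread (max - min) / mean per level. The input layout and the helper name `replica_spread` are assumptions; this is not the method used to produce the plot.

```python
from statistics import mean


def replica_spread(avg_by_replicas: dict) -> dict:
    """Relative spread of average response time across replica counts.

    avg_by_replicas maps replica count -> {concurrency: average seconds}.
    For every concurrency level measured at all replica counts, returns
    (max - min) / mean of the averages; 0.0 means identical averages.
    """
    if not avg_by_replicas:
        raise ValueError("need results for at least one replica count")
    shared = set.intersection(*(set(levels) for levels in avg_by_replicas.values()))
    spread = {}
    for c in sorted(shared):
        values = [levels[c] for levels in avg_by_replicas.values()]
        spread[c] = (max(values) - min(values)) / mean(values)
    return spread
```

A spread that stays within run-to-run noise at every level is consistent with the ingress being the limiting component; a systematic drop in response time as replicas are added would point at the backend instead.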

Below is a plot of the 25th, 50th, 75th, and 90th percentile latencies as the number of concurrent requests increases. At 600 concurrent requests, the number of requests with longer latencies increases.

[Figure: 25th, 50th, 75th and 90th percentile latency vs. number of concurrent requests]
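Percentile latencies like these can be recomputed from raw per-request latencies. The sketch uses the nearest-rank definition of a percentile; boom's own computation may differ slightly, so small discrepancies with boom's report are expected. `nearest_rank` and `latency_profile` are hypothetical helpers.

```python
import math


def nearest_rank(latencies, pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of
    all samples at or below it."""
    if not latencies or not 0 < pct <= 100:
        raise ValueError("need at least one sample and 0 < pct <= 100")
    ordered = sorted(latencies)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def latency_profile(latencies, pcts=(25, 50, 75, 90)) -> dict:
    """The percentiles shown in the plot: 25th, 50th, 75th and 90th by default."""
    return {p: nearest_rank(latencies, p) for p in pcts}
```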

Measuring CPU/Memory utilization

Memory and CPU utilization were measured with pidstat. In this test the nginx ingress controller runs two worker processes, labeled proc1 and proc2 in the tables below.
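pidstat can sample both workers at once, for example with `pidstat -u -r -p <pid1>,<pid2> 1` (the PIDs are placeholders). The Python sketch below turns such text output into one record per sample. It assumes the usual layout, in which every header line contains a PID column and data lines have the same number of fields, and it skips the Average: summary lines. Column sets differ between sysstat versions, which is why names are taken from the header rather than hard-coded; `parse_pidstat` is a hypothetical helper.

```python
def _number(value: str):
    """Convert a pidstat field to int or float when possible, else keep the text."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value


def parse_pidstat(text: str) -> list:
    """Parse pidstat text output into one dict per (time, PID) sample.

    Column names come from the most recent header line (the one with a
    "PID" column), so -u, -r and combined reports can all be read.
    """
    rows, header = [], None
    for line in text.splitlines():
        parts = line.split()
        if not parts or parts[0] == "Average:":
            continue  # blank line or per-run summary
        if len(parts) > 1 and parts[1] in ("AM", "PM"):
            # 12-hour locales print "03:56:52 PM"; keep the time as one field.
            parts = [parts[0] + " " + parts[1]] + parts[2:]
        if "PID" in parts:
            header = ["Time"] + parts[1:]
            continue
        if header is None or len(parts) != len(header):
            continue  # banner line, or a row this sketch does not understand
        rows.append({key: (value if key in ("Time", "Command") else _number(value))
                     for key, value in zip(header, parts)})
    return rows
```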

CPU utilization

The following table reports CPU utilization for each process during the boom test. %CPU peaks at 12%.

Time	%usr		%system		%guest		%CPU
	proc1	proc2	proc1	proc2	proc1	proc2	proc1	proc2
15:56:10	0	0	0	0	0	0	0	0
15:56:11	0	6	0	6	0	0	0	12
15:56:12	3	0	3	0	0	0	6	0
15:56:13	3	0	3	0	0	0	6	0
15:56:14	5	0	5	0	0	0	10	0
15:56:15	0	0	1	0	0	0	1	0
15:56:16	4	0	4	0	0	0	8	0
15:56:17	0	0	1	0	0	0	1	0
15:56:18	5	0	6	0	0	0	11	0
15:56:19	1	0	1	0	0	0	2	0
15:56:20	2	0	4	0	0	0	6	0
15:56:21	1	0	1	0	0	0	2	0
15:56:22	3	0	4	0	0	0	7	0
15:56:23	1	0	0	0	0	0	1	0
15:56:24	0	4	0	5	0	0	0	9
15:56:25	0	1	0	1	0	0	0	2
15:56:26	0	5	1	6	0	0	1	11
15:56:27	0	0	0	0	0	0	0	0
15:56:28	4	0	6	0	0	0	10	0
15:56:29	0	0	1	0	0	0	1	0
15:56:30	0	0	0	0	0	0	0	0
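The peak can be pulled out of parsed samples per process. The sketch assumes each sample is a dict keyed by pidstat's column names (Time, PID, %usr, %system, %guest, %CPU). The second helper checks our reading that %CPU equals %usr + %system + %guest; that reading is an assumption worth verifying against your sysstat version. Both helper names are hypothetical.

```python
def peak_cpu(samples) -> dict:
    """Return {PID: (peak %CPU, Time of the peak)} from per-process samples.

    Each sample is a dict with at least 'Time', 'PID' and '%CPU' keys.
    """
    peaks = {}
    for s in samples:
        pid, cpu = s["PID"], s["%CPU"]
        if pid not in peaks or cpu > peaks[pid][0]:
            peaks[pid] = (cpu, s["Time"])
    return peaks


def cpu_gap(sample: dict) -> float:
    """%CPU minus (%usr + %system + %guest) for one sample.

    Assumes (our reading of pidstat) that %CPU is the sum of the three;
    a gap well beyond rounding error would mean that reading is wrong.
    """
    return sample["%CPU"] - (sample["%usr"] + sample["%system"] + sample["%guest"])
```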

Memory utilization

The following table reports memory utilization for each process during the boom test. %MEM remains low and relatively stable throughout the test, between 0.36% and 0.51% for both processes.

Time	minflt/s		majflt/s		VSZ		RSS		%MEM
	proc1	proc2	proc1	proc2	proc1	proc2	proc1	proc2	proc1	proc2
15:56:52	0	0	0	0	326132	325992	15208	15068	0.38	0.37
15:56:53	0	0	0	0	326132	325992	15208	15068	0.38	0.37
15:56:54	0	0	0	0	326132	325992	15208	15068	0.38	0.37
15:56:55	29	299	0	0	325328	325992	14404	15068	0.36	0.37
15:56:56	0	477	0	0	325328	327576	14404	16360	0.36	0.4
15:56:57	0	0	0	0	325328	325768	14404	14844	0.36	0.37
15:56:58	0	648	0	0	325328	328416	14404	17216	0.36	0.42
15:56:59	0	0	0	0	325328	325328	14404	14404	0.36	0.36
15:57:00	0	1021	0	0	325328	329420	14404	18360	0.36	0.45
15:57:01	0	0	0	0	325328	326140	14404	15216	0.36	0.38
15:57:02	0	0	0	0	325328	326140	14404	15216	0.36	0.38
15:57:03	0	630	0	0	325328	326764	14404	15840	0.36	0.39
15:57:04	0	0	0	0	325328	325808	14404	14884	0.36	0.37
15:57:05	0	1002	0	0	325328	329908	14404	18840	0.36	0.46
15:57:06	0	47	0	0	325328	325628	14404	14704	0.36	0.36
15:57:07	1	1275	0	0	325328	330784	14404	19716	0.36	0.49
15:57:08	0	0	0	0	325328	325884	14404	14960	0.36	0.37
15:57:09	0	1502	0	0	325328	331960	14404	20756	0.36	0.51
15:57:10	0	0	0	0	325328	325328	14404	14404	0.36	0.36
15:57:11	0	1258	0	0	325328	329128	14404	18204	0.36	0.45
15:57:12	0	0	0	0	325328	325328	14404	14404	0.36	0.36
15:57:13	0	0	0	0	325328	325328	14404	14404	0.36	0.36
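RSS and %MEM are tied together: pidstat reports RSS in kilobytes and %MEM as the task's share of physical memory, so %MEM = 100 * RSS / MemTotal. Inverting this gives a quick sanity check on the table. Because %MEM is printed with two decimals, the sketch also returns the range of MemTotal values consistent with that rounding; the helper name `implied_mem_total_kb` is hypothetical.

```python
def implied_mem_total_kb(rss_kb: float, pct_mem: float, decimals: int = 2) -> tuple:
    """Invert %MEM = 100 * RSS / MemTotal to estimate MemTotal in kB.

    pidstat prints %MEM rounded to `decimals` places, so the true value lies
    within +/- half a unit of the last digit; returns (estimate, low, high).
    """
    half_unit = 0.5 * 10 ** -decimals
    if pct_mem <= half_unit:
        raise ValueError("%MEM too small to invert reliably")
    estimate = 100 * rss_kb / pct_mem
    low = 100 * rss_kb / (pct_mem + half_unit)
    high = 100 * rss_kb / (pct_mem - half_unit)
    return estimate, low, high
```

For an RSS of 14404 kB reported at 0.36 %MEM, the estimate is about 4,001,111 kB (roughly 3.8 GiB), with a rounding range of roughly 3.95 to 4.06 million kB; this can be compared with MemTotal in /proc/meminfo on the node that ran pidstat.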

Scaling services

Large-file upload/download

...

Killing the load balancer

When the nginx-ilb pod is deleted with kubectl delete pod, it stays in the Terminating state for ~30 seconds. During this time, the replication controller creates a replacement pod, but the new pod remains Pending for the same 30-second period. Some requests are still served, but there is a risk of ~30 seconds of downtime between pod restarts. This may be related to the shutdown of the default-http-backend, but this is not clear.
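To turn the ~30 seconds into a measured number, one option is to probe the ingress endpoint once per second while the pod is deleted from another terminal, then report the longest run of failed probes. The sketch below does this with Python's urllib; the URL, probe interval and duration are assumptions, and the helper names are hypothetical. For reference, 30 seconds is also Kubernetes' default termination grace period (terminationGracePeriodSeconds), which may be relevant to the Terminating state observed here.

```python
import http.client
import time
import urllib.request


def http_ok(url: str, timeout: float = 2.0) -> bool:
    """Return True if `url` answers with a non-error status within `timeout` s."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status < 400
    except (OSError, http.client.HTTPException):
        # URLError/HTTPError (4xx, 5xx), timeouts and resets all end up here.
        return False


def watch(url: str, duration: float = 90.0, interval: float = 1.0, probe=http_ok) -> list:
    """Probe `url` every `interval` seconds for `duration` seconds.

    Returns (monotonic timestamp, ok) pairs.
    """
    samples = []
    end = time.monotonic() + duration
    while time.monotonic() < end:
        samples.append((time.monotonic(), probe(url)))
        time.sleep(interval)
    return samples


def longest_outage(samples: list) -> float:
    """Longest time (s) from the first failed probe to the next successful one."""
    worst, start = 0.0, None
    for t, ok in samples:
        if not ok and start is None:
            start = t
        elif ok and start is not None:
            worst = max(worst, t - start)
            start = None
    if start is not None:
        # Still failing when the run ended: count up to the last sample.
        worst = max(worst, samples[-1][0] - start)
    return worst
```

Running `longest_outage(watch(url))` with the ingress URL used in the boom test, while deleting the nginx-ilb pod, gives the observed downtime to within about one probe interval.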
