Commit 41ce270 (parent 9fbef71)

Committed by github-actions[bot], nginx-bot, and ciarams87

NFR Test Results for NGF version 2.2.0 (#4124)

* NFR Test Results for NGF version 2.2.0
* Update summaries, add longevity results

Co-authored-by: nginx-bot <integrations@nginx.com>
Co-authored-by: Ciara Stacke <c.stacke@f5.com>

82 files changed: +1678 −0 lines changed
# Results

## Test environment

NGINX Plus: false

NGINX Gateway Fabric:

- Commit: 9fbef714ea22a35c4f1a8c97bd5b4e406ae0c1e9
- Date: 2025-10-21T10:57:37Z
- Dirty: false

GKE Cluster:

- Node count: 12
- k8s version: v1.33.5-gke.1080000
- vCPUs per node: 16
- RAM per node: 65851524Ki
- Max pods per node: 110
- Zone: us-west1-b
- Instance Type: n2d-standard-16
## Summary:

- 4 out of 5 tests showed slight latency increases, consistent with the trend noted in the 2.1.0 summary.
- The latency differences are minimal overall, with most changes under 1%.
- The POST method routing increase of ~2.2% is the most significant change, though still relatively small in absolute terms (~21µs).
- All tests maintained 100% success rates with similar throughput (~1000 req/s), indicating that the slight latency variations are likely within normal performance variance.
## Test1: Running latte path based routing

```text
Requests [total, rate, throughput] 30000, 1000.04, 1000.01
Duration [total, attack, wait] 30s, 29.999s, 925.889µs
Latencies [min, mean, 50, 90, 95, 99, max] 681.943µs, 926.463µs, 901.993µs, 1.011ms, 1.053ms, 1.244ms, 30.638ms
Bytes In [total, mean] 4770000, 159.00
Bytes Out [total, mean] 0, 0.00
Success [ratio] 100.00%
Status Codes [code:count] 200:30000
Error Set:
```

## Test2: Running coffee header based routing

```text
Requests [total, rate, throughput] 30000, 1000.01, 999.98
Duration [total, attack, wait] 30.001s, 30s, 905.82µs
Latencies [min, mean, 50, 90, 95, 99, max] 733.55µs, 951.898µs, 926.202µs, 1.037ms, 1.082ms, 1.248ms, 24.506ms
Bytes In [total, mean] 4800000, 160.00
Bytes Out [total, mean] 0, 0.00
Success [ratio] 100.00%
Status Codes [code:count] 200:30000
Error Set:
```

## Test3: Running coffee query based routing

```text
Requests [total, rate, throughput] 30000, 1000.04, 1000.01
Duration [total, attack, wait] 30s, 29.999s, 885.866µs
Latencies [min, mean, 50, 90, 95, 99, max] 742.259µs, 965.539µs, 933.535µs, 1.04ms, 1.087ms, 1.345ms, 26.261ms
Bytes In [total, mean] 5040000, 168.00
Bytes Out [total, mean] 0, 0.00
Success [ratio] 100.00%
Status Codes [code:count] 200:30000
Error Set:
```

## Test4: Running tea GET method based routing

```text
Requests [total, rate, throughput] 30000, 1000.01, 999.98
Duration [total, attack, wait] 30.001s, 30s, 879.736µs
Latencies [min, mean, 50, 90, 95, 99, max] 732.423µs, 938.723µs, 917.416µs, 1.022ms, 1.066ms, 1.241ms, 21.039ms
Bytes In [total, mean] 4710000, 157.00
Bytes Out [total, mean] 0, 0.00
Success [ratio] 100.00%
Status Codes [code:count] 200:30000
Error Set:
```

## Test5: Running tea POST method based routing

```text
Requests [total, rate, throughput] 30000, 1000.04, 1000.01
Duration [total, attack, wait] 30s, 29.999s, 880.839µs
Latencies [min, mean, 50, 90, 95, 99, max] 725.559µs, 962.748µs, 938.978µs, 1.053ms, 1.098ms, 1.261ms, 23.289ms
Bytes In [total, mean] 4710000, 157.00
Bytes Out [total, mean] 0, 0.00
Success [ratio] 100.00%
Status Codes [code:count] 200:30000
Error Set:
```
# Results

## Test environment

NGINX Plus: true

NGINX Gateway Fabric:

- Commit: 9fbef714ea22a35c4f1a8c97bd5b4e406ae0c1e9
- Date: 2025-10-21T10:57:37Z
- Dirty: false

GKE Cluster:

- Node count: 12
- k8s version: v1.33.5-gke.1080000
- vCPUs per node: 16
- RAM per node: 65851524Ki
- Max pods per node: 110
- Zone: us-west1-b
- Instance Type: n2d-standard-16
## Summary:

- Average latency increased across all tests
- Largest Increase: Header-based routing (+76.461µs, +8.60%)
- Smallest Increase: Path-based routing (+28.988µs, +3.26%)
- Average Overall Increase: ~51.1µs (+5.69% average across all tests)
- Most Impacted: Header and query-based routing (8.60% and 5.91% respectively)
- Method Routing: GET and POST both increased by ~5.3%
- All tests maintained 100% success rate, similar throughput, and similar max latencies
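As a quick sanity check (not part of the original report), the percentage figures above can be reproduced from the reported means and absolute deltas, taking the previous-release mean as the current mean minus the delta. A minimal sketch using the header-based routing numbers (current mean from Test2 below, delta from the summary):

```python
# Cross-check of the header-based routing increase reported in the summary.
current_mean_us = 964.976  # mean latency from Test2, in microseconds
delta_us = 76.461          # absolute increase vs the previous release (from the summary)

previous_mean_us = current_mean_us - delta_us
pct_increase = delta_us / previous_mean_us * 100

print(f"previous mean ~{previous_mean_us:.3f}us, increase ~+{pct_increase:.1f}%")
# ~+8.6%, consistent with the +8.60% figure above
```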
## Test1: Running latte path based routing

```text
Requests [total, rate, throughput] 30000, 1000.09, 1000.06
Duration [total, attack, wait] 29.998s, 29.997s, 893.093µs
Latencies [min, mean, 50, 90, 95, 99, max] 702.667µs, 917.554µs, 892.32µs, 1.016ms, 1.066ms, 1.254ms, 21.001ms
Bytes In [total, mean] 4740000, 158.00
Bytes Out [total, mean] 0, 0.00
Success [ratio] 100.00%
Status Codes [code:count] 200:30000
Error Set:
```

## Test2: Running coffee header based routing

```text
Requests [total, rate, throughput] 30000, 1000.04, 1000.01
Duration [total, attack, wait] 30s, 29.999s, 883.984µs
Latencies [min, mean, 50, 90, 95, 99, max] 752.053µs, 964.976µs, 939.422µs, 1.067ms, 1.123ms, 1.313ms, 16.259ms
Bytes In [total, mean] 4770000, 159.00
Bytes Out [total, mean] 0, 0.00
Success [ratio] 100.00%
Status Codes [code:count] 200:30000
Error Set:
```

## Test3: Running coffee query based routing

```text
Requests [total, rate, throughput] 30000, 1000.04, 1000.01
Duration [total, attack, wait] 30s, 29.999s, 916.972µs
Latencies [min, mean, 50, 90, 95, 99, max] 745.707µs, 955.274µs, 931.109µs, 1.052ms, 1.102ms, 1.287ms, 17.84ms
Bytes In [total, mean] 5010000, 167.00
Bytes Out [total, mean] 0, 0.00
Success [ratio] 100.00%
Status Codes [code:count] 200:30000
Error Set:
```

## Test4: Running tea GET method based routing

```text
Requests [total, rate, throughput] 30000, 1000.01, 999.98
Duration [total, attack, wait] 30.001s, 30s, 938.936µs
Latencies [min, mean, 50, 90, 95, 99, max] 723.854µs, 955.401µs, 930.464µs, 1.057ms, 1.114ms, 1.306ms, 18.287ms
Bytes In [total, mean] 4680000, 156.00
Bytes Out [total, mean] 0, 0.00
Success [ratio] 100.00%
Status Codes [code:count] 200:30000
Error Set:
```

## Test5: Running tea POST method based routing

```text
Requests [total, rate, throughput] 30000, 1000.04, 1000.01
Duration [total, attack, wait] 30s, 29.999s, 888.406µs
Latencies [min, mean, 50, 90, 95, 99, max] 736.512µs, 956.475µs, 925.958µs, 1.049ms, 1.105ms, 1.293ms, 21.232ms
Bytes In [total, mean] 4680000, 156.00
Bytes Out [total, mean] 0, 0.00
Success [ratio] 100.00%
Status Codes [code:count] 200:30000
Error Set:
```
# Results

## Test environment

NGINX Plus: false

NGINX Gateway Fabric:

- Commit: e4eed2dad213387e6493e76100d285483ccbf261
- Date: 2025-10-17T14:41:02Z
- Dirty: false

GKE Cluster:

- Node count: 3
- k8s version: v1.33.5-gke.1080000
- vCPUs per node: 2
- RAM per node: 4015668Ki
- Max pods per node: 110
- Zone: europe-west2-a
- Instance Type: e2-medium
## Summary:

- Still a lot of non-2xx or 3xx responses, but vastly improved on the last test run.
- This indicates that while most of the Agent-to-control-plane connection issues have been resolved, some issues remain.
- All the observed 502s happened within a single window of time, which at least indicates the system was able to recover, although it is unclear what triggered the Agent issue.
- The increase in memory usage for NGF seen in the previous test run appears to have been resolved.
- We observe a steady increase in NGINX memory usage over time, which could indicate a memory leak.
- CPU usage remained consistent with past results.
- Errors seem to be related to a cluster upgrade or some other external factor (excluding the resolved InferencePool status error).
## Traffic

HTTP:

```text
Running 5760m test @ http://cafe.example.com/coffee
2 threads and 100 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 202.19ms 150.51ms 2.00s 83.62%
Req/Sec 272.67 178.26 2.59k 63.98%
183598293 requests in 5760.00m, 62.80GB read
Socket errors: connect 0, read 338604, write 82770, timeout 57938
Non-2xx or 3xx responses: 33893
Requests/sec: 531.24
Transfer/sec: 190.54KB
```

HTTPS:

```text
Running 5760m test @ https://cafe.example.com/tea
2 threads and 100 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 189.21ms 108.25ms 2.00s 66.82%
Req/Sec 271.64 178.03 1.96k 63.33%
182905321 requests in 5760.00m, 61.55GB read
Socket errors: connect 10168, read 332301, write 0, timeout 96
Requests/sec: 529.24
Transfer/sec: 186.76KB
```
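To put the "vastly improved" claim in perspective, the raw error ratio for the HTTP run works out to a small fraction of a percent. A quick sketch using the numbers from the load-generator output above (the previous run's count is not reproduced in this report, so only this run's ratio is shown):

```python
# Error ratio for the HTTP longevity run.
total_requests = 183_598_293  # "183598293 requests in 5760.00m"
bad_responses = 33_893        # "Non-2xx or 3xx responses: 33893"

error_pct = bad_responses / total_requests * 100
print(f"{error_pct:.4f}% of requests returned non-2xx/3xx")  # ~0.0185%
```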
## Key Metrics

### Containers memory

![oss-memory.png](oss-memory.png)

### Containers CPU

![oss-cpu.png](oss-cpu.png)
## Error Logs

### nginx-gateway

- msg: Config apply failed, rolling back config; error: error getting file data for name:"/etc/nginx/conf.d/http.conf" hash:"Luqynx2dkxqzXH21wmiV0nj5bHyGiIq7/2gOoM6aKew=" permissions:"0644" size:5430: rpc error: code = NotFound desc = file not found -> happened twice in the 4 days, related to Agent reconciliation during token rotation
- {hashFound: jmeyy1p+6W1icH2x2YGYffH1XtooWxvizqUVd+WdzQ4=, hashWanted: Luqynx2dkxqzXH21wmiV0nj5bHyGiIq7/2gOoM6aKew=, level: debug, logger: nginxUpdater.fileService, msg: File found had wrong hash, ts: 2025-10-18T18:11:24Z}
- The error indicates the Agent requested a file that had since changed

- msg: Failed to update lock optimistically: the server was unable to return a response in the time allotted, but may still be processing the request (put leases.coordination.k8s.io ngf-longevity-nginx-gateway-fabric-leader-election), falling back to slow path -> same leader election error as on Plus; seems out of scope of our product

- msg: no matches for kind "InferencePool" in version "inference.networking.k8s.io/v1" -> thousands of these, but fixed in PR 4104

### nginx

Traffic: nearly 34000 502s

- These all happened in the same window of less than a minute (approx 2025-10-18T18:11:11 - 2025-10-18T18:11:50), and resolved once NGINX restarted
- It's unclear what triggered NGINX to restart, though it does appear a memory spike was observed around this time
- The outage correlates with the config apply error seen in the control plane logs
# Results

## Test environment

NGINX Plus: true

NGINX Gateway Fabric:

- Commit: e4eed2dad213387e6493e76100d285483ccbf261
- Date: 2025-10-17T14:41:02Z
- Dirty: false

GKE Cluster:

- Node count: 3
- k8s version: v1.33.5-gke.1080000
- vCPUs per node: 2
- RAM per node: 4015668Ki
- Max pods per node: 110
- Zone: europe-west2-a
- Instance Type: e2-medium
## Summary:

- Total of 5 502s observed across the 4 days of the test run
- The increase in memory usage for NGF seen in the previous test run appears to have been resolved.
- We observe a steady increase in NGINX memory usage over time, which could indicate a memory leak.
- CPU usage remained consistent with past results.
- Errors seem to be related to a cluster upgrade or some other external factor (excluding the resolved InferencePool status error).
## Key Metrics

### Containers memory

![plus-memory.png](plus-memory.png)

### Containers CPU

![plus-cpu.png](plus-cpu.png)
## Traffic

HTTP:

```text
Running 5760m test @ http://cafe.example.com/coffee
2 threads and 100 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 203.71ms 108.67ms 2.00s 66.92%
Req/Sec 257.95 167.36 1.44k 63.57%
173901014 requests in 5760.00m, 59.64GB read
Socket errors: connect 0, read 219, write 55133, timeout 27
Non-2xx or 3xx responses: 4
Requests/sec: 503.19
Transfer/sec: 180.96KB
```

HTTPS:

```text
Running 5760m test @ https://cafe.example.com/tea
2 threads and 100 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 203.89ms 108.72ms 1.89s 66.92%
Req/Sec 257.52 167.02 1.85k 63.64%
173632748 requests in 5760.00m, 58.61GB read
Socket errors: connect 7206, read 113, write 0, timeout 0
Non-2xx or 3xx responses: 1
Requests/sec: 502.41
Transfer/sec: 177.84KB
```
## Error Logs

### nginx-gateway

- msg: Failed to update lock optimistically: the server was unable to return a response in the time allotted, but may still be processing the request (put leases.coordination.k8s.io ngf-longevity-nginx-gateway-fabric-leader-election), falling back to slow path -> same leader election error as on OSS; seems out of scope of our product

- msg: Get "https://34.118.224.1:443/apis/gateway.networking.k8s.io/v1beta1/referencegrants?allowWatchBookmarks=true&resourceVersion=1760806842166968999&timeout=10s&timeoutSeconds=435&watch=true": context canceled -> possible cluster upgrade?

- msg: no matches for kind "InferencePool" in version "inference.networking.k8s.io/v1" -> thousands of these, but fixed in PR 4104

### nginx

Traffic: 5 502s

```text
INFO 2025-10-19T00:12:04.220541710Z [resource.labels.containerName: nginx] 10.154.15.240 - - [19/Oct/2025:00:12:04 +0000] "GET /coffee HTTP/1.1" 502 150 "-" "-"
INFO 2025-10-19T18:38:18.651520548Z [resource.labels.containerName: nginx] 10.154.15.240 - - [19/Oct/2025:18:38:18 +0000] "GET /coffee HTTP/1.1" 502 150 "-" "-"
INFO 2025-10-20T21:49:05.008076073Z [resource.labels.containerName: nginx] 10.154.15.240 - - [20/Oct/2025:21:49:04 +0000] "GET /tea HTTP/1.1" 502 150 "-" "-"
INFO 2025-10-21T06:43:10.256327990Z [resource.labels.containerName: nginx] 10.154.15.240 - - [21/Oct/2025:06:43:10 +0000] "GET /coffee HTTP/1.1" 502 150 "-" "-"
INFO 2025-10-21T12:13:05.747098022Z [resource.labels.containerName: nginx] 10.154.15.240 - - [21/Oct/2025:12:13:05 +0000] "GET /coffee HTTP/1.1" 502 150 "-" "-"
```

No other errors identified in this test run.
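For reference, a tally like the "5 502s" above can be reproduced directly from the access logs. A minimal sketch (not part of the original report) that counts status codes from NGINX combined-log lines, using the lines copied from the excerpt above:

```python
import re
from collections import Counter

# Access-log lines copied from the excerpt above (GCP logging prefix + NGINX combined log format).
log_lines = [
    'INFO 2025-10-19T00:12:04.220541710Z [resource.labels.containerName: nginx] 10.154.15.240 - - [19/Oct/2025:00:12:04 +0000] "GET /coffee HTTP/1.1" 502 150 "-" "-"',
    'INFO 2025-10-19T18:38:18.651520548Z [resource.labels.containerName: nginx] 10.154.15.240 - - [19/Oct/2025:18:38:18 +0000] "GET /coffee HTTP/1.1" 502 150 "-" "-"',
    'INFO 2025-10-20T21:49:05.008076073Z [resource.labels.containerName: nginx] 10.154.15.240 - - [20/Oct/2025:21:49:04 +0000] "GET /tea HTTP/1.1" 502 150 "-" "-"',
    'INFO 2025-10-21T06:43:10.256327990Z [resource.labels.containerName: nginx] 10.154.15.240 - - [21/Oct/2025:06:43:10 +0000] "GET /coffee HTTP/1.1" 502 150 "-" "-"',
    'INFO 2025-10-21T12:13:05.747098022Z [resource.labels.containerName: nginx] 10.154.15.240 - - [21/Oct/2025:12:13:05 +0000] "GET /coffee HTTP/1.1" 502 150 "-" "-"',
]

# The status code is the first 3-digit number after the quoted request line.
status_re = re.compile(r'" (\d{3}) ')
statuses = Counter(m.group(1) for line in log_lines if (m := status_re.search(line)))
print(statuses)  # Counter({'502': 5})
```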