Description
25 failures out of 200 runs (12.5% failure rate, excluding 18 cancelled).
Breakdown by error type
1. failed to execute command from device: command failed with exit code 1 — 7 runs
Primarily affects TestE2E_DeviceTelemetry (6x). The underlying error is typically TWAMP sender bind: cannot assign requested address. Has been happening since at least Dec 18.
- Jan 27: DeviceTelemetry
- Jan 20: DeviceTelemetry, Multicast_Publisher
- Jan 20: DeviceTelemetry
- Jan 20: DeviceTelemetry
- Jan 9: DeviceTelemetry
- Jan 8: DeviceTelemetry, IBRL_WithAllocatedIP
- Dec 18: DeviceTelemetry
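"Cannot assign requested address" is the message for EADDRNOTAVAIL, which bind returns when the requested local address is not configured on any interface at that moment. If the sender starts before its address is ready (not confirmed in this issue), one possible mitigation is to retry the bind for a short window. The sketch below shows the idea in Python; the function name, parameters, and `make_socket` hook are hypothetical and are not taken from the project.

```python
import errno
import socket
import time


def bind_udp_with_retry(host, port=0, attempts=5, delay=0.2, make_socket=None):
    """Bind a UDP socket, retrying only while the address is not assignable.

    Hypothetical helper for illustration; not the project's TWAMP sender code.
    """
    if make_socket is None:
        make_socket = lambda: socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    for attempt in range(1, attempts + 1):
        sock = make_socket()
        try:
            sock.bind((host, port))
            return sock
        except OSError as exc:
            sock.close()
            # Retry only EADDRNOTAVAIL; any other error, or the final
            # attempt, is re-raised to the caller unchanged.
            if exc.errno != errno.EADDRNOTAVAIL or attempt == attempts:
                raise
            time.sleep(delay)
```

Retrying hides the race rather than explaining it, so it is worth logging how many attempts were needed when the bind eventually succeeds.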
2. failed to wait for client tunnel status BGP Session Up: polling cancelled or timed out — 5 runs
Only affects TestE2E_DeviceMaxusersRollover. Client correctly targets the second device after max-users is set to 0 on device1, but BGP session never establishes on device2 within the 90s timeout. Started Jan 22 — zero occurrences in any test before that date.
3. Condition never satisfied (route convergence timeout) — 3 runs
Affects TestE2E_MultiClient/ibrl_with_allocated_ip. Route polling times out waiting for expected routes to appear.
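Errors 2 and 3 are both poll-until-deadline failures, and their messages say only that the wait ended, not what state was last seen. A minimal sketch of a polling helper that includes the last observed state in the timeout error is shown below. It is written in Python for illustration, and `poll_until`, `PollTimeout`, and the `(ok, detail)` convention are hypothetical names, not the test suite's actual helpers.

```python
import time


class PollTimeout(Exception):
    """Raised when the condition is still unmet at the deadline."""


def poll_until(check, timeout, interval=1.0, clock=time.monotonic, sleep=time.sleep):
    """Call check() until it succeeds or `timeout` seconds have elapsed.

    check() must return (ok, detail), where detail describes what was observed.
    """
    deadline = clock() + timeout
    attempts = 0
    while True:
        ok, detail = check()
        attempts += 1
        if ok:
            return detail
        if clock() >= deadline:
            # Surface the last observation so the failure is diagnosable.
            raise PollTimeout(
                f"condition not met after {timeout}s "
                f"({attempts} attempts); last observed: {detail!r}"
            )
        sleep(interval)
```

With a 90s timeout, a failure would then report the final session or route state that the check saw, which helps distinguish a slow convergence from a session that never left its initial state.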
4. failed to start ledger: ... context deadline exceeded — 4 runs
Ledger (Solana) container fails to start in time. Affects different tests each time — likely CI resource exhaustion.
- Jan 21: MultiClient
- Jan 20: Multicast_Publisher (co-occurring with #1)
- Jan 15: SDK_Serviceability
- Nov 27: SDK_Telemetry_InternetLatencySamples
5. failed to execute command from client: command failed with exit code 1 — 2 runs
Affects TestE2E_IBRL_WithAllocatedIP.
6. TestE2E_IBRL_WithAllocatedIP/doublezero_user_list — 3 runs
Test fails at doublezero_user_list check but no clear error extracted (assertion failures without detailed message).
7. Docker infra failure — 1 run
Massive failure: pull access denied for dz-local/client and multiple network not found errors. Almost every test failed. Likely a CI build/infra issue.
- Dec 3 — 8 tests failed
Key observations
- The top 3 error types (#1, #2, #3) account for 15 of the 25 failures and all look like timing/resource issues: commands timing out, BGP sessions not establishing, routes not converging.
- Error #2 (BGP Session Up timeout) started on Jan 22 and has never occurred before that date across any test. It only hits DeviceMaxusersRollover. The test itself wasn't changed around that time.
- Error #4 (ledger startup timeout) is a clear CI resource exhaustion signal and spans the full date range.
- TestE2E_IBRL_WithAllocatedIP has been a long-running flake (since Nov 27) with various error types, mostly at the doublezero_user_list / ban_user post-connect checks.
- Some runs have multiple tests failing simultaneously (e.g. Jan 8, Jan 20, Dec 3), reinforcing CI resource contention as a contributing factor.
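
As a quick consistency check, the per-type run counts from the breakdown can be tallied against the headline numbers. The short Python snippet below does only that arithmetic, and it assumes the 200-run total already excludes the 18 cancelled runs, which the summary line does not state explicitly.

```python
# Run counts per error type, copied from the breakdown in this issue.
failures = {
    1: 7,  # command failed on device
    2: 5,  # BGP Session Up timeout
    3: 3,  # route convergence timeout
    4: 4,  # ledger startup timeout
    5: 2,  # command failed on client
    6: 3,  # doublezero_user_list check
    7: 1,  # Docker infra failure
}

total_runs = 200  # assumed to exclude the 18 cancelled runs
total_failures = sum(failures.values())
failure_rate = total_failures / total_runs
top3 = sum(failures[k] for k in (1, 2, 3))

print(f"{total_failures} failures, {failure_rate:.1%} of {total_runs} runs")
print(f"errors #1-#3: {top3} of {total_failures} ({top3 / total_failures:.0%})")
```

The counts sum to 25 (12.5% of 200), and errors #1 to #3 make up 15 of those, or 60% of all failures.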