fix(ci): test stability improvements by lidel · Pull Request #2466 · ipfs/ipfs-webui

lidel · 2026-01-24T21:04:30Z

CI and E2E tests were completely broken for the past 7 months. This PR fixes them and makes the test infrastructure more maintainable.

What changed

E2E tests now pass reliably:
- Modernized all tests to use Playwright's locator API instead of brittle CSS selectors
- Centralized test fixtures (test/e2e/setup/fixtures.js) for page navigation and peer node management
- Added semantic locators (test/e2e/setup/locators.js) so tests read more clearly and break less often
- Upgraded Playwright to 1.58.0
- Fixed Kubo daemon cleanup in test teardown

CI runs faster:
- Consolidated 4 separate workflow files into one ci.yml
- Added npm/Playwright cache with smart invalidation (skip browser download on cache hit)
- Pinned Node.js and Go versions in .tool-versions to avoid surprise breakage from upstream (I remember this happening MORE THAN ONCE in the past 5 years, due to Node.js "nuances")

Better docs for contributors:
- Slimmed down README to focus on what the project does and quick start
- Moved detailed setup, CORS config, and test debugging to docs/developer-notes.md
- Added docs/RELEASING.md with release checklist

removed 10-shard matrix that was adding complexity and CI time overhead without providing proportional benefits. tests now run in a single job with a 10-minute timeout, which is sufficient given test suite completes in ~15 seconds locally. also simplified the two conditional test runs (repeated vs non-repeated) into a single step that always uses --reporter=list for clearer output.

removed conditional logic that changed behavior based on process.env.CI. this was causing inconsistency between local and CI test runs, making it harder to reproduce CI failures locally. now uses consistent settings: 30s timeout per test, 5-minute global timeout for entire suite, no retries, and always starts fresh server.
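As a rough illustration, the settings described above might look like the following in playwright.config.js. The option names (`timeout`, `globalTimeout`, `retries`, `webServer.reuseExistingServer`) are standard Playwright config fields; the `command` and `url` values are assumptions made for this sketch and are not taken from the PR.

```javascript
// Sketch only: numeric values mirror the commit message above;
// command and url are assumed for illustration.
const config = {
  timeout: 30 * 1000, // 30s per test
  globalTimeout: 5 * 60 * 1000, // 5 minutes for the entire suite
  retries: 0, // same behavior locally and on CI, no process.env.CI branching
  webServer: {
    command: 'npx http-server ./build -a 127.0.0.1 -p 3001', // assumed command
    url: 'http://127.0.0.1:3001/', // assumed port
    reuseExistingServer: false // always start a fresh server
  }
}
// In a real playwright.config.js this object would be the module's default export.
```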

added timestamped logging to global-setup.js and ipfs-backend.js to help diagnose CI hangs when they occur. each major step now logs progress.

added timeout wrappers around async operations that could hang indefinitely:
- ipfs-backend startup: 60s timeout
- kubo daemon spawn: 30s timeout

also fixed two issues:
- disabled DHT bootstrapping (Bootstrap: []) for faster daemon startup
- changed addInitScript to page.evaluate so localStorage values are captured by storageState() before browser closes
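A timeout wrapper of the kind described here could be sketched roughly as below. This is not the code from the PR: the signature and error message are assumptions, and only the general `Promise.race` technique is shown. The unused first executor parameter is named `_resolve`, which also satisfies the `promise/param-names` lint rule.

```javascript
// Sketch of a timeout wrapper (assumed signature, not the PR's actual code).
function withTimeout (promise, ms, label) {
  let timer
  const timeout = new Promise((_resolve, reject) => {
    timer = setTimeout(() => {
      reject(new Error(`${label} timed out after ${ms}ms`))
    }, ms)
  })
  // whichever settles first wins; always clear the timer so it cannot keep the process alive
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer))
}
```

A call site might look like `await withTimeout(spawnDaemon(), 30000, 'kubo daemon spawn')`, where `spawnDaemon` is a hypothetical stand-in for whatever async operation could hang.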

files.test.js:
- changed file verification to only check the two files we uploaded instead of iterating all MFS files. other tests may have added files that would cause unexpected matches.

grid-view.test.js:
- added focusGrid() helper that tries multiple approaches to establish keyboard focus on the grid container. this fixes intermittent failures where arrow key navigation would not work because focus was not set.
- simplified test assertions to use playwright's built-in waiters instead of manual count checks.

grid.js helper:
- selectViewMode now waits for files view to be ready before checking current mode, preventing race conditions during page load.

the global teardown only removed the JSON config file but never called ipfsd.stop() on the spawned Kubo daemon. this left orphaned processes accumulating on CI runners, causing port conflicts and resource exhaustion.
- export stop() function from ipfs-backend.js
- call stop() in global-teardown.js before removing config file
- add logging for teardown progress
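The shape of this fix can be sketched as follows. Only `stop()` and `ipfsd.stop()` come from the commit message; `setDaemon`, the `removeConfigFile` callback, and the log lines are hypothetical stand-ins so the sketch stays self-contained.

```javascript
// ipfs-backend.js (sketch): keep a handle to the spawned daemon
let ipfsd = null

// hypothetical helper standing in for the real spawn code
function setDaemon (daemon) {
  ipfsd = daemon
}

async function stop () {
  if (ipfsd == null) return // nothing was spawned, or it was already stopped
  const daemon = ipfsd
  ipfsd = null
  await daemon.stop()
}

// global-teardown.js (sketch): stop the daemon first, then remove the config file
async function globalTeardown (removeConfigFile) {
  console.log('[teardown] stopping kubo daemon')
  await stop()
  console.log('[teardown] removing config file')
  await removeConfigFile()
}
```

Clearing the handle before awaiting `daemon.stop()` makes a second `stop()` call a no-op, which is convenient if teardown can run more than once.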

trace: 'on-first-retry' was ineffective because retries=0, meaning traces would never be captured. changed to 'retain-on-failure' so traces are available when debugging test failures.

migrate from deprecated waitForSelector() pattern to modern locator API which provides better error messages and auto-waiting behavior. changes include:
- replace page.waitForSelector() with page.locator().waitFor()
- remove force:true clicks, use proper waits instead
- fix missing await on click operations (files.test.js, ipns.test.js)
- replace custom checkClassWithTimeout polling with waitForFunction
- use .first() where multiple elements match to satisfy strict mode
- use more specific selectors (button#id, [role="menuitem"]) to avoid ambiguous matches
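To make the migration concrete, here is a small before/after sketch. The helper name and the `button#more` selector are hypothetical; the Playwright calls (`page.locator()`, `locator.waitFor()`, `locator.click()`, `locator.first()`) are the real locator API.

```javascript
// before (the pattern this commit moves away from):
//   await page.waitForSelector('button#more')
//   page.click('button#more', { force: true }) // missing await, forced click

// after: locator-based and awaited, relying on Playwright's auto-waiting
async function openFirstMenuItem (page) {
  const moreButton = page.locator('button#more') // hypothetical selector
  await moreButton.waitFor({ state: 'visible' })
  await moreButton.click()
  // several menu items may match; .first() keeps strict mode satisfied
  await page.locator('[role="menuitem"]').first().click()
}
```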

the `promise/param-names` rule requires Promise constructor parameters to match `^_?resolve$` and `^_?reject$` patterns. changed `_` to `_resolve` in the timeout wrapper functions.

always run `playwright install --with-deps` regardless of cache status to ensure system dependencies are present. previously this was only run on cache miss, which could cause failures if deps were missing.

tests pass locally (~17s) but hang on CI until 10-minute timeout. added comprehensive timestamped logging at every async operation:
- global-setup.js: log each step (port check, daemon spawn, browser launch, navigation)
- ipfs-backend.js: log kubo lifecycle (factory, spawn, identity, config write)
- global-teardown.js: log cleanup operations
- test-e2e.yml: add shell timestamps, enable DEBUG=pw:api for Playwright logging

logs output to both stdout and stderr to ensure CI captures output. timeout wrappers now warn at 80% of timeout before failing. after CI run, last log message before timeout identifies the hanging operation.
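A timestamped logger along these lines might look like the sketch below. The function names and the `[scope]` prefix format are assumptions; the only details taken from the commit message are the timestamps and the write to both stdout and stderr.

```javascript
// Sketch: format one log line with an ISO timestamp (format is an assumption)
function formatLogLine (scope, message, now = new Date()) {
  return `[${now.toISOString()}] [${scope}] ${message}`
}

// Write the same line to both streams so CI captures it whichever one it shows
function logStep (scope, message) {
  const line = formatLogLine(scope, message) + '\n'
  process.stdout.write(line)
  process.stderr.write(line)
}

logStep('global-setup', 'checking port')
```

With one call before each async operation, the last line printed before a timeout points at the step that hung.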

address CI hang by:
- make webui port configurable via WEBUI_PORT env var
- use dynamic port allocation in CI workflow
- increase webServer timeout from 5s to 30s for CI
- add stdout/stderr piping to capture webServer output
- add build directory check before running tests
- add config logging to track port and cwd
- use 127.0.0.1 instead of localhost for consistency

the CI was hanging with no output because Playwright initialization was blocking before globalSetup even ran. these changes will help identify exactly where the hang occurs.
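Reading the port from `WEBUI_PORT` and building a `127.0.0.1` URL could be done roughly like this. The helper name and the fallback port 3001 are assumptions for the sketch, not values confirmed by the PR.

```javascript
// Sketch: derive the web UI base URL from the environment
function webuiBaseUrl (env = process.env) {
  const port = Number.parseInt(env.WEBUI_PORT ?? '3001', 10) // 3001 is an assumed default
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`invalid WEBUI_PORT: ${env.WEBUI_PORT}`)
  }
  // 127.0.0.1 rather than localhost, so config and helpers agree on one address
  return `http://127.0.0.1:${port}`
}
```

Failing fast on a malformed value keeps a bad CI variable from turning into a silent hang later.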

previous CI runs showed 8-minute hangs with zero output from Playwright. this indicates cross-env or npm is buffering stdout. changes:
- run playwright directly in CI instead of via npm script
- add playwright version check before running tests
- use fs.writeSync for config logging to bypass Node.js buffering
- log environment variables to verify they're passed correctly

trying to isolate the CI hang:
- removed DEBUG=pw:api which might cause infinite output buffering
- removed NODE_OPTIONS which might affect behavior
- added direct node test to load playwright.config.js independently
- added timeout wrapper around playwright test command
- added chromium installation dry-run check

if config load test fails, the issue is in config/node setup. if config loads but playwright test hangs, issue is in playwright runner.

the CI was hanging because npx http-server was not starting. replaced it with a simple inline node http server that:
- serves files from ./build directory
- handles common MIME types
- starts immediately without npx overhead

also fixed eslint single-quote error.
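For illustration, a minimal static server of this kind might look like the sketch below. It is not the server from the PR (which a later commit removed in favor of npx http-server again); the function names, the MIME table, and the default port are assumptions.

```javascript
// Sketch: a small table of common MIME types (assumed selection)
const MIME_TYPES = {
  '.html': 'text/html; charset=utf-8',
  '.js': 'text/javascript',
  '.css': 'text/css',
  '.json': 'application/json',
  '.svg': 'image/svg+xml',
  '.png': 'image/png',
  '.woff2': 'font/woff2'
}

function contentTypeFor (filePath) {
  const dot = filePath.lastIndexOf('.')
  const ext = dot === -1 ? '' : filePath.slice(dot).toLowerCase()
  return MIME_TYPES[ext] ?? 'application/octet-stream'
}

// Serve files from a build directory on 127.0.0.1 (default port is an assumption)
async function serveBuild (root = './build', port = 3001) {
  const { createServer } = await import('node:http')
  const { readFile } = await import('node:fs/promises')
  const path = await import('node:path')
  const rootDir = path.resolve(root)

  const server = createServer(async (req, res) => {
    try {
      const { pathname } = new URL(req.url, 'http://127.0.0.1')
      const relative = pathname.endsWith('/') ? pathname + 'index.html' : pathname
      const filePath = path.join(rootDir, decodeURIComponent(relative))
      // refuse anything that resolves outside the build directory
      if (!filePath.startsWith(rootDir + path.sep)) {
        res.writeHead(403).end('forbidden')
        return
      }
      const body = await readFile(filePath)
      res.writeHead(200, { 'Content-Type': contentTypeFor(filePath) }).end(body)
    } catch {
      res.writeHead(404).end('not found')
    }
  })

  await new Promise((resolve) => server.listen(port, '127.0.0.1', resolve))
  return server
}
```

Calling `await serveBuild()` returns the listening server, so a caller can `close()` it during teardown.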

replace inline node command with dedicated serve-build.js script to avoid issues with shell escaping and ES module requirements

- remove timeout wrapper that was hiding failures (exit code 124)
- remove || echo that swallowed error codes
- change webServer stdout/stderr from pipe to inherit for visibility
- clean up unnecessary diagnostic steps

- remove serve-build.js, restore npx http-server
- remove withTimeout() wrappers (not needed with Node.js fix)
- keep stop() export and proper daemon shutdown
- fix grid.js to use 127.0.0.1 (matches playwright config)
- keep: locator API, Bootstrap:[], cache optimization, Node pin

- add data-testid attributes to File, FilesList, FilesGrid, GridFile
- create fixtures.js with worker-scoped peerNode fixture for speed
- create locators.js with centralized selector definitions
- replace brittle CSS selectors with getByRole/getByTestId
- replace waitFor() calls with web-first assertions (toBeVisible, toHaveClass)
- update coverage.js to re-export from fixtures.js for backward compat
- modernize all test files to use shared locators and fixtures
- add test/e2e/test/ to gitignore (artifact from running tests)
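A centralized locators module in this spirit could be sketched as below. The object name, the `'Import'` button label, and the `'file-row'` test id are assumptions made for illustration; `getByRole`, `getByTestId`, and `filter({ hasText })` are real Playwright APIs.

```javascript
// locators.js (sketch): one place that knows how to find things on the Files page
const filesPage = {
  // assumed accessible name of the button
  importButton: (page) => page.getByRole('button', { name: 'Import' }),
  // assumed data-testid value; narrows to the row containing the file name
  fileRow: (page, name) => page.getByTestId('file-row').filter({ hasText: name })
}

// In a test, a web-first assertion would then read roughly:
//   await expect(filesPage.fileRow(page, 'hello.txt')).toBeVisible()
```

Keeping selectors in one module means a markup change is fixed once rather than in every test file.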

…ions
- add .tool-versions for nodejs 24.11.0 and golang 1.25
- upgrade actions/checkout v4 to v6
- upgrade actions/cache v4 to v5
- upgrade actions/setup-node v4 to v6
- upgrade actions/upload-artifact v4 to v6
- upgrade actions/download-artifact v4 to v7
- upgrade actions/setup-go v5 to v6
- switch setup-node and setup-go to use go-version-file/node-version-file
- remove NODE_VERSION env and node-version inputs from reusable workflows
- update README to point to .tool-versions for version info

- forbidOnly now only applies in CI, allowing local debugging with .only
- restored "Bulk import" menu item assertion in files test
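The `forbidOnly` change is typically a one-line expression in the Playwright config. The fragment below is a sketch of that common idiom, not the exact line from the PR; it relies on GitHub Actions setting `CI=true`.

```javascript
// Sketch: fail the run on a stray test.only in CI, but allow it locally
const config = {
  forbidOnly: Boolean(process.env.CI)
}
```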

- pass secrets to reusable workflows so CODECOV_TOKEN is available
- add node_modules caching with conditional npm ci to skip install on cache hit
- include patches/** and .tool-versions in cache key to invalidate on changes
- add missing npm install step to e2e-coverage job

… hit
- on cache miss: run `playwright install --with-deps` (full install)
- on cache hit: run `playwright install-deps` (only OS deps, ~45s faster)

- @playwright/test: 1.48.2 -> 1.58.0
- playwright-chromium: 1.48.2 -> 1.58.0

no breaking changes affect this codebase

- reorganize README with cleaner layout and navigation links
- clarify Web UI is specifically for Kubo nodes
- add features list and "Getting Help" section
- move detailed dev docs to docs/developer-notes.md
- move release instructions to docs/RELEASING.md
- replace Matrix badge with Discourse forum badge

## [4.11.0](v4.10.0...v4.11.0) (2026-02-05)

CID `bafybeidfgbcqy435sdbhhejifdxq4o64tlsezajc272zpyxcsmz47uyc64`

---

### Features

* Add search/filter functionality to Files UI ([#2451](#2451)) ([c866be6](c866be6)), closes [#2447](#2447)
* DHT Provide Sweep Diagnostic Screen ([#2463](#2463)) ([fb22ea6](fb22ea6))
* **files:** resolve paths before inspect and support protocol URL ([#2465](#2465)) ([74a44d8](74a44d8))
* **files:** support additional image file extensions ([#2347](#2347)) ([371341a](371341a))

### Bug Fixes

* **ci:** test stability improvements ([#2466](#2466)) ([d11475a](d11475a))
* CLI tutor commands missing some parameters ([#2470](#2470)) ([ed8ad6a](ed8ad6a))
* **diagnostics:** handle Go zero time in DHT provide screen ([dc51cd4](dc51cd4))
* **files:** not found page ([#2455](#2455)) ([18b9b0d](18b9b0d))
* show proper error state in import notifications ([#2452](#2452)) ([391470e](391470e)), closes [#2448](#2448)

### Trivial Changes

* **ci:** skip publishPreview for dependabot PRs ([17f675e](17f675e))
* pull new translations ([#2467](#2467)) ([cc569f4](cc569f4))
* pull transifex translations ([#2464](#2464)) ([8d7a17f](8d7a17f))

ipfs-gui-bot · 2026-02-05T03:10:45Z

🎉 This PR is included in version 4.11.0 🎉

The release is available on GitHub release

Your semantic-release bot 📦🚀

lidel added 4 commits January 24, 2026 22:00

lidel mentioned this pull request Jan 24, 2026

feat: DHT Provide Sweep Diagnostic Screen #2463

Merged

7 tasks

lidel added 3 commits January 24, 2026 23:15

fix(e2e): enable trace capture on test failure

f2ae4ae

fix(e2e): use proper Promise parameter names for eslint

f39bad2

lidel force-pushed the fix/e2e-test-improvements branch from e6cd362 to 38393f3 Compare January 24, 2026 23:39

ci(e2e): always install playwright deps

b9ed885

lidel force-pushed the fix/e2e-test-improvements branch from 38393f3 to b9ed885 Compare January 24, 2026 23:56

test(e2e): add dedicated server script for static files

9b4cb09

test(e2e): fix CI error masking and show server output

7fd1b42

lidel force-pushed the fix/e2e-test-improvements branch from 37ee0ab to 1e6740d Compare January 26, 2026 04:40

lidel marked this pull request as ready for review January 26, 2026 05:39

lidel requested a review from a team as a code owner January 26, 2026 05:39

lidel mentioned this pull request Jan 26, 2026

feat(files): resolve paths before inspect and support protocol URL #2465

Merged

test(e2e): allow .only locally and restore bulk import check

054bff3

ci: optimize playwright install by skipping browser download on cache…

31014c6

lidel force-pushed the fix/e2e-test-improvements branch from 805ba53 to 31014c6 Compare January 26, 2026 17:40

chore(deps): update playwright to 1.58.0

94c2348

lidel merged commit d11475a into main Jan 26, 2026
12 checks passed

lidel deleted the fix/e2e-test-improvements branch January 26, 2026 18:27

This was referenced Jan 26, 2026

fix: solve flaky e2e tests #2123

Closed

Flaky E2E tests #2065

Closed

ipfs-gui-bot added the released label Feb 5, 2026

lidel merged 35 commits into main from fix/e2e-test-improvements
