incident: prombench can't build Prometheus from main (stuck on react-app npm install) #832

Closed
bwplotka opened this issue Feb 5, 2025 · 12 comments

@bwplotka
Member

bwplotka commented Feb 5, 2025

Every prombench start or restart fails, with Prometheus stuck building in the init container. prombench cancel also fails due to #831.

The only way to get unstuck is to log into the GKE cluster and manually force-delete the Prometheus pod and node pools.

See the discussion https://cloud-native.slack.com/archives/C01AUBA4PFE/p1738749878969879

Incident investigation: https://cloud-native.slack.com/archives/C07TT6DTQ02/p1738749348012559

@bwplotka bwplotka added the bug label Feb 5, 2025
@bwplotka bwplotka changed the title prombench can't Prometheus from main (stuck on react-app npm install) incident: prombench can't build Prometheus from main (stuck on react-app npm install) Feb 5, 2025
@bwplotka
Member Author

bwplotka commented Feb 5, 2025

Trying to bisect by running a v3.2.0-rc.1 vs v3.0.0 prombench: prometheus/prometheus#15973

EDIT: same failure https://github.com/prometheus/prometheus/actions/runs/13155085596/job/36710311857

@bwplotka
Member Author

bwplotka commented Feb 5, 2025

@bwplotka
Member Author

bwplotka commented Feb 5, 2025

Commits to check (if #832 (comment) succeeds):

96f31e370 doc: clarify `rate` values are averaged (#14045)
23b8dfb35 chore(deps): bump github.com/prometheus/common (#15963)
742775392 scrape: Add realistic data case for scrape loop append bench. (#15966)
8be416a67 [FIX] PromQL: Updates annotation for bin op between incompatible histograms (#15895)
130cd024e [FEATURE] PromQL: Implements `idelta` and `irate` with histograms (#15853)
cb3b17a14 fix: os.MkdirTemp with t.TempDir (#15860)
3389cdf95 Merge pull request #15903 from prometheus/beorn7/promql2
9f6c1d9cd promqltest: Small formatting improvement for native histograms
a8235d5df Merge pull request #15894 from prometheus/superq/audit_fix
e87e308f9 chore(deps): bump github.com/prometheus/alertmanager (#15878)
007d1115e chore(deps): bump golang.org/x/tools from 0.28.0 to 0.29.0 (#15876)
bd55cdcf8 chore(deps): bump github.com/ionos-cloud/sdk-go/v6 from 6.3.0 to 6.3.2 (#15879)
6c1162c03 chore(deps): bump google.golang.org/protobuf from 1.36.1 to 1.36.4 (#15874)
ccdc108f0 Apply npm audit fix to react-app
95fe7fc81 chore(deps): bump github.com/hetznercloud/hcloud-go/v2 (#15881)
a36c73c94 chore(deps): bump github.com/KimMachineGun/automemlimit (#15891)
5d8af4e5f chore(deps): bump github.com/envoyproxy/go-control-plane/envoy (#15880)
172028c71 chore(deps): bump github.com/aws/aws-sdk-go from 1.55.5 to 1.55.6 (#15884)
4aebe745b chore(deps): bump google.golang.org/grpc from 1.69.4 to 1.70.0 (#15883)
e0a6f2224 chore(deps): bump google.golang.org/api from 0.216.0 to 0.218.0 (#15885)
1efff3f9d Merge pull request #15558 from shiftstack/g2
05cbbdf09 Merge pull request #15890 from prometheus/dependabot/go_modules/github.com/miekg/dns-1.1.63
9f9f4f059 chore(deps): bump @fortawesome/fontawesome-svg-core from 6.5.2 to 6.7.2 in /web/ui/react-app (#15755)
c242a53f5 chore(deps): bump github.com/miekg/dns from 1.1.62 to 1.1.63
d734afab4 Merge pull request #15870 from NeerajGartia21/promql/func_over_time
319211372 chore(deps): bump github.com/envoyproxy/protoc-gen-validate (#15877)
d5a93ed0d chore(deps): bump github.com/Azure/azure-sdk-for-go/sdk/azidentity (#15882)
0aaa9cb9f chore(deps): bump github.com/linode/linodego from 1.44.0 to 1.46.0 (#15888)
bb30a871a deps: Use Gophercloud v2
d8eab3f1c chore(deps): bump the go-opentelemetry-io group with 14 updates (#15875)
96ef262fe chore(deps-dev): bump eslint-plugin-prettier from 4.2.1 to 5.2.1 in /web/ui/react-app (#15605)
0d26aa457 Merge pull request #15802 from KofClubs/upgrade-influxdb-client-v2
ffea9f005 Merge pull request #15539 from paulojmdias/openstack-loadbalancer-discovery
7f37a008c Merge pull request #15540 from mmorel-35/prometheus/common@v0.61.0
9d1abbb9e Call PostCreation callback only after the new series is added to the mempotings (#15579)
6823f58e5 Merge pull request #15732 from bboreham/benchmark-setup-append-periodically
b9fcc8169 adds tests for timestamp()
21afc0beb adds tests for sum_over_time and avg_over_time
2ae706be8 Merge pull request #13197 from bboreham/tsdb-lint
6ba25ba93 tsdb tests: avoid 'defer' till end of function
2f615a200 tsdb tests: restrict some 'defer' operations
f4fbe4725 tsdb tests: avoid capture-by-reference in goroutines
54cf0d687 Merge pull request #15472 from tjhop/ref/jsonfilelogger-slog-handler
6ede90050 Merge pull request #15851 from prometheus/upgradeclient
e722a3923 upgrade influxdb client to v2
2830cbacb Merge pull request #15866 from arturmelanchyk/op-mem-alloc
36cf85fc1 Addressed comments.
2915b1977 Merge pull request #15859 from prometheus/krajo/abs-test
bd0d9e7a0 Update model/rulefmt/rulefmt.go
30112f6ed promtool: optimize labels slice allocation
9097f8f4e test(promql): some functions silently ignore native histograms
8846e4252 Merge pull request #15849 from linasm/use-histogram-stats-decoder-for-histogram_avg
81484701a Merge pull request #15854 from prometheus/beorn7/doc2
80d702afd Fixed rulefmt UTF-8 expectations.
7263dfe50 Fixed relabelling; allowing UTF-8 in targetLabel.
940016e00 promql: use histogram stats decoder for histogram_avg
203371375 docs: Improve documentation of promql-delayed-name-removal flag
dd5ab743e chore(deps): use version.PrometheusUserAgent
de0060803 Upgrade client_golang to 1.21.0-rc.0
803b1565a fix: fix network endpoint id
1d49d1178 fix: fix testing
cddf729ca Merge branch 'main' of github.com:prometheus/prometheus into openstack-loadbalancer-discovery
c803f7e82 Merge branch 'openstack-loadbalancer-discovery' of github.com:paulojmdias/prometheus into openstack-loadbalancer-discovery
816a5c94b fix: fix docs typo
36ccf6269 Merge branch 'prometheus:main' into openstack-loadbalancer-discovery
d40e99c2e Merge branch 'openstack-loadbalancer-discovery' of github.com:paulojmdias/prometheus into openstack-loadbalancer-discovery
cb7254158 feat: rename status to provisioning_status and add operating_status
b2fa1c952 TSDB benchmarks: Commit periodically to speed up init
c5bb06586 Merge branch 'prometheus:main' into openstack-loadbalancer-discovery
a5c20713d Merge branch 'prometheus:main' into openstack-loadbalancer-discovery
713903fe4 fix: fix configuration and remove uneeded libs
a90aa34e7 fix: fix docs typo
d136e4310 fix: fix comment
9e9929c42 fix: remove new line
fc0141aec discovery: add openstack load balancer discovery
53eb1fe71 test: make promql engine_test use files to exercise query logger
4d54c304f ref: make query logger more efficient by building list of attrs
e0104a6b7 ref: JSONFileLogger slog handler, add scrape.FailureLogger interface

@bwplotka
Member Author

bwplotka commented Feb 5, 2025

Ok, I don't think this is something we did on main: an old commit that used to work now fails too, see prometheus/prometheus#15861 (comment).

This is an environment issue: either GKE, npm, or the prombench config.

@bwplotka
Member Author

bwplotka commented Feb 5, 2025

Handy script to resolve the situation for the PR:

export NS=15861 && \
  gcloud container clusters get-credentials test-infra --zone europe-west3-a --project macro-mile-203600 && \
  kubectl delete deployment prometheus-test-pr-$NS -n prombench-$NS && \
  kubectl delete pod -l prometheus=test-pr-$NS --namespace prombench-$NS --force

bwplotka added a commit to prometheus/prometheus that referenced this issue Feb 6, 2025
Signed-off-by: bwplotka <bwplotka@gmail.com>
@bwplotka
Member Author

bwplotka commented Feb 6, 2025

Debugging https://github.com/prometheus/test-infra/blob/master/tools/prometheus-builder/Dockerfile more closely.

 npm version
{
  npm: '9.8.0',
  node: '20.5.1',
  acorn: '8.10.0',
  ada: '2.5.1',
  ares: '1.19.1',
  base64: '0.5.0',
  brotli: '1.0.9',
  cjs_module_lexer: '1.2.2',
  cldr: '43.1',
  icu: '73.2',
  llhttp: '8.1.1',
  modules: '115',
  napi: '9',
  nghttp2: '1.55.1',
  nghttp3: '0.7.0',
  ngtcp2: '0.8.1',
  openssl: '3.0.10+quic',
  simdutf: '3.2.14',
  tz: '2023c',
  undici: '5.22.1',
  unicode: '15.0',
  uv: '1.46.0',
  uvwasi: '0.0.18',
  v8: '11.3.244.8-node.10',
  zlib: '1.2.13.1-motley'
}

Perhaps the npm and Node.js versions are too old?
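One way to test the "too old" hunch is to compare the versions reported above against a minimum. This is a minimal sketch, assuming GNU sort -V is available in the image; version_ge, check, MIN_NODE, and MIN_NPM are hypothetical names, and the minimum versions are placeholders, not the actual requirements of web/ui/react-app.

```shell
#!/usr/bin/env bash
# Sketch: flag a Node.js/npm toolchain older than a chosen minimum.
# MIN_NODE and MIN_NPM are HYPOTHETICAL placeholders, not the real
# requirements of web/ui/react-app; override them via the environment.
MIN_NODE="${MIN_NODE:-22.0.0}"
MIN_NPM="${MIN_NPM:-10.0.0}"

# version_ge A B: succeeds if version A >= version B.
# Sorts both versions with GNU sort -V; if B sorts first (or they are
# equal), A is at least B.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# check NAME HAVE WANT: prints a verdict and returns non-zero if HAVE < WANT.
check() {
  local name="$1" have="$2" want="$3"
  if version_ge "$have" "$want"; then
    echo "$name $have OK (>= $want)"
  else
    echo "$name $have is older than $want"
    return 1
  fi
}
```

Inside the builder image this might be invoked as check node "$(node --version | sed 's/^v//')" "$MIN_NODE" and check npm "$(npm --version)" "$MIN_NPM". sort -V is used instead of plain string comparison so that, for example, 20.10.0 correctly ranks above 20.5.1.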

I noticed that

  • the master image uses quay.io/prometheus/golang-builder:1.23-main
  • builder.sh checks out Prometheus at PR_NUMBER and runs make build PROMU_BINARIES="prometheus".

This is not entirely consistent with how Prometheus builds binaries on main and in CI:

  • For tests we run make build, but with the older quay.io/prometheus/golang-builder:1.22-base image.

Let's see if it works with 1.23-main:
https://github.com/prometheus/prometheus/actions/runs/13177507305/job/36780192769?pr=15981

@bwplotka
Member Author

bwplotka commented Feb 6, 2025

If I understand correctly, Prometheus CI does this to publish images:

  build_all:
    name: Build Prometheus for all architectures
    runs-on: ubuntu-latest
    if: |
      (github.event_name == 'push' && startsWith(github.ref, 'refs/tags/v2.'))
      ||
      (github.event_name == 'push' && startsWith(github.ref, 'refs/tags/v3.'))
      ||
      (github.event_name == 'pull_request' && startsWith(github.event.pull_request.base.ref, 'release-'))
      ||
      (github.event_name == 'push' && github.event.ref == 'refs/heads/main')
    strategy:
      matrix:
        thread: [ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11 ]

    # Whenever the Go version is updated here, .promu.yml
    # should also be updated.
    steps:
      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
      - uses: prometheus/promci@52c7012f5f0070d7281b8db4a119e21341d43c91 # v0.4.5
      - uses: ./.github/promci/actions/build
        with:
          parallelism: 12
          thread: ${{ matrix.thread }}

Which means running:

Not sure how it gets UI assets though 🤔

@bwplotka
Member Author

bwplotka commented Feb 6, 2025

prometheus/prometheus#15981 passed, so the quay.io/prometheus/golang-builder:1.23-main image works on GitHub Actions? 🤔

bwplotka added a commit to prometheus/prometheus that referenced this issue Feb 6, 2025
Signed-off-by: bwplotka <bwplotka@gmail.com>
@bwplotka
Member Author

bwplotka commented Feb 6, 2025

Ok, no idea what was happening, but a Node.js upgrade helped.

e.g. prometheus/prometheus#15982

bwplotka added a commit that referenced this issue Feb 7, 2025
…ch runs

This custom image is locally build from prometheus/golang-builder#296
that we agree to pause until Prometheus release. However, Prometheus release
needs working prombench. This is a tmp solution until we have an official image.

Mitigates #832

Signed-off-by: bwplotka <bwplotka@gmail.com>
bwplotka added a commit that referenced this issue Feb 7, 2025
…ch runs (#833)
@bwplotka
Member Author

bwplotka commented Feb 7, 2025

Should be mitigated by #833

@bwplotka
Member Author

bwplotka commented Feb 7, 2025

Fixed on master.

@bwplotka bwplotka closed this as completed Feb 7, 2025
kushalShukla-web pushed a commit to kushalShukla-web/test-infra that referenced this issue Feb 8, 2025
…ch runs (prometheus#833)