@chudilka1 chudilka1 commented Oct 3, 2025

CRE-1028: Optimize beholder validator in system tests: structured init and message handling (part 2)

In beholder.go:

  • Added retry logic with exponential backoff for protobuf registration errors (which usually occur in CI, during the first test runs)
  • Improved error handling for schema registration failures

In beholder_provider.go:

  • Code structure: modularized; extracted more helpers; improved naming.
  • Configuration: Added ConsumerOptions; larger buffers (40→200 / 1→100); session timeout 10s→20s; async/sync commit modes; nanosecond group IDs.
  • Reliability: Added Kafka + heartbeat preflight checks; heartbeat retries (3×, 5s); exponential backoff (2s→30s, 10% jitter); safe cleanup & error-checked closes; fixed timer leaks.
  • Consumer Behavior: Manual commits; configurable isolation; client.id added; stronger metadata validation; uses SubscribeTopics.
  • Message Processing: Blocking reads with timeout; removed old timestamp filter; only UserLogs reset timer; richer logs (offsets, partitions).
  • Observability: Detailed structured logs (subscription, partitions, retries, backoff); improved error wrapping; clear success/failure reporting.
  • API: Same signature; internal defaults; fail-fast validation; startup timeout (2 min).
  • Performance: Larger buffers, efficient blocking reads, reduced retry churn.
  • Removed: Panic recovery, channel-full retry, old timestamp filtering, ticker polling.
  • New: Heartbeat validation (msg="heartbeat"), multi-topic support, full metadata validation, startup timeout guard.

@chudilka1 chudilka1 force-pushed the test-beholder-validator branch 12 times, most recently from 7d122aa to f28ae23 on October 5, 2025 13:17
@chudilka1 chudilka1 changed the title from "Test beholder validator" to "Optimize beholder validator in system tests: structured init and message handling (part 2)" on Oct 5, 2025
@chudilka1 chudilka1 changed the title from "Optimize beholder validator in system tests: structured init and message handling (part 2)" to "[CRE-1028] Optimize beholder validator in system tests: structured init and message handling (part 2)" on Oct 5, 2025
@chudilka1 chudilka1 changed the base branch from CRE-935-rb-bindings-in-por-workflow to develop on October 5, 2025 13:25
@chudilka1 chudilka1 force-pushed the test-beholder-validator branch from f28ae23 to d543f09 on October 5, 2025 15:51
@chudilka1 chudilka1 marked this pull request as ready for review October 6, 2025 08:42
@chudilka1 chudilka1 requested review from a team as code owners October 6, 2025 08:42
@chudilka1 chudilka1 requested review from Tofel and krehermann October 6, 2025 08:42
ettec previously approved these changes Oct 6, 2025
@Tofel Tofel left a comment

proto registration retry is and should be handled in the CTF, not here.

sleepDuration := backoff + jitter
if sleepDuration > maxBackoffTimeout {
    sleepDuration = maxBackoffTimeout
}

why not use avast's retry?

@chudilka1 chudilka1 Oct 6, 2025

Regarding this specific case, I have several concerns, as I am uncertain whether the retry library can handle the following:

  • When using retry.Do(), we need to carefully determine when to close the output channel (e.g. defer close(out)) - only after all retries are exhausted or the context is canceled.
  • A jitter feature to prevent thundering herd problems will be lost, along with some essential debug logging.
  • The readyCh must be carefully managed - it should only signal once, at the first successful connection, not on every retry. This requires state tracking outside the retry function.

mchain0 previously approved these changes Oct 6, 2025
@chudilka1 chudilka1 force-pushed the test-beholder-validator branch 2 times, most recently from 049cd4e to b3026d4 on October 6, 2025 11:20
@chudilka1 chudilka1 requested review from Tofel, ettec and mchain0 October 6, 2025 11:36
mchain0 previously approved these changes Oct 6, 2025
@chudilka1 chudilka1 requested a review from Copilot October 6, 2025 12:53
Copilot AI left a comment

Pull Request Overview

This PR implements part 2 of the Beholder validator optimization in system tests, focusing on enhanced reliability, error handling, and message processing. It significantly restructures the Kafka consumer implementation to address timer leaks, improve connection handling, and add comprehensive validation.

  • Enhanced Kafka consumer with exponential backoff, heartbeat validation, and structured error handling
  • Added fail-fast validation for Beholder subscription errors during initialization
  • Updated dependencies and improved documentation/comments

Reviewed Changes

Copilot reviewed 12 out of 13 changed files in this pull request and generated 4 comments.

Show a summary per file
| File | Description |
| --- | --- |
| system-tests/tests/test-helpers/t_helpers.go | Added fail-fast error checking for Beholder subscription initialization |
| system-tests/tests/test-helpers/beholder_provider.go | Complete restructure with improved Kafka consumer, validation, and error handling |
| system-tests/tests/regression/cre/v2_evm_regression_test.go | Fixed typo and increased timeout duration |
| system-tests/tests/regression/cre/cre_regression_suite_test.go | Updated documentation comments |
| system-tests/tests/go.mod | Moved retry-go dependency from indirect to direct |
| go.md | Added dependency relationship for chainlink-evm/gethwrappers |
| core/scripts/go.mod | Moved retry-go dependency from indirect to direct |
| core/scripts/cre/environment/examples/workflows/v2/proof-of-reserve/cron-based/main.go | Replaced hardcoded ABI with generated bindings |
| core/scripts/cre/environment/examples/workflows/v2/proof-of-reserve/cron-based/go.mod | Added chainlink-evm/gethwrappers dependency |
| core/scripts/cre/environment/environment/beholder.go | Added retry logic for protobuf registration |
| .changeset/slick-drinks-like.md | Changeset for PoR workflow binding updates |
| .changeset/eight-tips-bathe.md | Changeset for Beholder optimization |


Copilot AI left a comment

Pull Request Overview

Copilot reviewed 12 out of 13 changed files in this pull request and generated 1 comment.



…it and message handling (part 2)

Add retries with exponential backoff to a proto registration function

Refactor beholder validator

Add Kafka listener reconnection when UserLogs are empty within timeout

Add Beholder heartbeat and consumer connectivity validation before starting system tests

Add Beholder heartbeat and consumer connectivity validation before starting system tests
@chudilka1 chudilka1 force-pushed the test-beholder-validator branch 2 times, most recently from bf55c1c to 439c04b on October 7, 2025 13:56
@chudilka1 chudilka1 force-pushed the test-beholder-validator branch from 439c04b to c955503 on October 7, 2025 14:29
@chudilka1 chudilka1 requested a review from mchain0 October 7, 2025 14:58
@chudilka1 chudilka1 added this pull request to the merge queue Oct 7, 2025
Merged via the queue into develop with commit 02b4069 Oct 7, 2025
192 of 193 checks passed
@chudilka1 chudilka1 deleted the test-beholder-validator branch October 7, 2025 15:28