@MStreet3 MStreet3 commented Oct 24, 2025

The localNode, which carries the state of a workflow DON, is set only once, when the workflow engine is created. This can cause DON-level issues if the DON config state differs between nodes. Specifically, capability requests will differ across nodes, so remote executable servers will not reach a quorum of matching requests.

This fix injects a subscriber loop that listens for DON updates. On each update, a new node state is read from the capabilities registry. The node state is now held behind an atomic pointer.

@MStreet3 MStreet3 force-pushed the street_donSubscriber branch from 7a9f9bd to ffbdd38 Compare October 24, 2025 18:10
@MStreet3 MStreet3 force-pushed the street_donSubscriber branch 3 times, most recently from 3d5c171 to df3f43c Compare October 27, 2025 19:41
@MStreet3 MStreet3 requested a review from krehermann October 27, 2025 19:44
@MStreet3 MStreet3 marked this pull request as ready for review October 27, 2025 20:03
@MStreet3 MStreet3 requested review from a team as code owners October 27, 2025 20:03
platform.WorkflowRegistryAddress, cfg.WorkflowRegistryAddress,
platform.WorkflowRegistryChainSelector, cfg.WorkflowRegistryChainSelector,
platform.EngineVersion, platform.ValueWorkflowVersionV2,
platform.DonVersion, strconv.FormatUint(uint64(localNode.WorkflowDON.ConfigVersion), 10),
Contributor Author

@patrickhuie19 please advise. I'm realizing that updating the labels dynamically each time we get a new workflow DON is not yet handled in this PR. We could split that into a separate change; an atomic pointer on the beholder logger seems like it would be useful for that refactor.

Contributor

The labeler isn't static, so if we wire an accessor or a pointer to the engine labeler through to the right places, you could update the label. I'd prefer an accessor, or potentially even an update-label functional hook provided by the engine, to keep access locked down.
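
A rough sketch of the two options mentioned here, a read accessor plus a narrow update hook handed out by the engine. Everything in it is hypothetical (`labeler`, `engine`, `Labeler`, `DonVersionUpdater`, and the `"donVersion"` key are invented for illustration); the real labeler type and label keys in the codebase may differ.

```go
package main

import (
	"fmt"
	"strconv"
	"sync"
)

// labeler is a hypothetical immutable label set: With returns a copy
// rather than mutating the receiver.
type labeler struct{ labels map[string]string }

func (l labeler) With(kv ...string) labeler {
	out := make(map[string]string, len(l.labels)+len(kv)/2)
	for k, v := range l.labels {
		out[k] = v
	}
	for i := 0; i+1 < len(kv); i += 2 {
		out[kv[i]] = kv[i+1]
	}
	return labeler{labels: out}
}

// engine owns the labeler and guards replacement with a mutex.
type engine struct {
	mu  sync.RWMutex
	lbl labeler
}

// Labeler is the read accessor: callers get the current snapshot only.
func (e *engine) Labeler() labeler {
	e.mu.RLock()
	defer e.mu.RUnlock()
	return e.lbl
}

// DonVersionUpdater returns a functional hook that can change exactly one
// label, so the rest of the labeler stays locked down inside the engine.
func (e *engine) DonVersionUpdater() func(configVersion uint32) {
	return func(v uint32) {
		e.mu.Lock()
		defer e.mu.Unlock()
		e.lbl = e.lbl.With("donVersion", strconv.FormatUint(uint64(v), 10))
	}
}

func main() {
	e := &engine{lbl: labeler{}.With("engineVersion", "v2", "donVersion", "1")}

	update := e.DonVersionUpdater() // handed to the DON subscriber loop
	update(2)

	fmt.Println(e.Labeler().labels["donVersion"]) // prints 2
}
```

The hook exposes a single capability, so a subscriber can refresh the DON version without being able to touch any other label.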

jmank88 previously approved these changes Oct 27, 2025
krehermann previously approved these changes Oct 27, 2025
@MStreet3 MStreet3 dismissed stale reviews from krehermann and jmank88 via d45e5e8 October 28, 2025 14:05
if h.OnRateLimited == nil {
h.OnRateLimited = func(executionID string) {}
}
if h.OnNodeSynced == nil {
Contributor

what does this do?

how is it not-nil IRL?

Contributor Author

I use it to verify the state in tests; in practice there's no other use case for it. In theory, though, we could put a metric counter in there.
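
To illustrate the pattern under discussion, here is a small sketch of optional hooks that default to no-ops and are overridden only in tests. The `LifecycleHooks` name, the `setDefaults` and `syncNode` helpers, and the `OnNodeSynced` signature are assumptions for the example; only the nil-check shape is taken from the diff above.

```go
package main

import "fmt"

// LifecycleHooks holds optional callbacks. The OnNodeSynced signature here
// is an assumption made for the example.
type LifecycleHooks struct {
	OnRateLimited func(executionID string)
	OnNodeSynced  func(configVersion uint32, err error)
}

// setDefaults replaces nil hooks with no-ops so call sites never need a
// nil check before invoking a hook.
func (h *LifecycleHooks) setDefaults() {
	if h.OnRateLimited == nil {
		h.OnRateLimited = func(string) {}
	}
	if h.OnNodeSynced == nil {
		h.OnNodeSynced = func(uint32, error) {}
	}
}

// syncNode is a hypothetical call site that reports each sync via the hook.
func syncNode(h LifecycleHooks, configVersion uint32) {
	h.OnNodeSynced(configVersion, nil)
}

func main() {
	// Production-style use: nothing is set, so the hook is a no-op.
	var prod LifecycleHooks
	prod.setDefaults()
	syncNode(prod, 1)

	// Test-style use: the hook records every sync so a test can assert on it.
	var seen []uint32
	test := LifecycleHooks{
		OnNodeSynced: func(v uint32, _ error) { seen = append(seen, v) },
	}
	test.setDefaults()
	syncNode(test, 2)
	syncNode(test, 3)

	fmt.Println(seen) // prints [2 3]
}
```

A metric counter, as suggested above, would slot into the same field without changing any call site.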

@MStreet3 MStreet3 force-pushed the street_donSubscriber branch from 634d141 to 88dd221 Compare October 28, 2025 18:24
@smartcontractkit smartcontractkit deleted a comment from trunk-io bot Oct 28, 2025
@MStreet3 MStreet3 added this pull request to the merge queue Oct 28, 2025
Merged via the queue into develop with commit 1bcbaa4 Oct 28, 2025
304 of 307 checks passed
@MStreet3 MStreet3 deleted the street_donSubscriber branch October 28, 2025 20:22
MStreet3 added a commit that referenced this pull request Oct 28, 2025
…0059)

* refactor(chainlink): app spacing

* refactor(capabilities): adds subscribe method to notifier

* refactor: passes don notifier to engine

* fix: pass args to tests

* refactor: uses atomic pointer for node state

* deps: changeset

* refactor: use local limit for local node timeout

* refactor(workflows): adds tests for local node sync

* refactor: adds additional test

* refactor(v2/engine): simplifies node sync functions

* chore: lint

* refactor(capabilities): removes manual mutexes and resolve deadlock in test

* fix(cre/utils): wires in mock subscriber to standalone engine

* refactor(workflows/v2): load local node only once

* fix(workflows/v1): pass don subscriber to v2 engine

* fix: lint

* Revert "fix: lint"

This reverts commit f52cf5a.

* refactor: moves struct

* fix: removes unused

* fix(capabilities): resolves race on channel read and close