Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add allocation_fallback_strategy option as fallback strategy for per-node strategy #3482

Conversation

lhp-nemlig
Copy link
Contributor

@lhp-nemlig lhp-nemlig commented Nov 21, 2024

Description:

Adds allocation_fallback_strategy option as fallback strategy for per-node strategy, such that targets that are not related to a specific node can be assigned.

The fallback strategy is configurable via the config option allocation_fallback_strategy

Link to tracking Issue(s):

Testing:

Fixed per-node tests to expect non-node target to be assigned according to fallback strategy

Documentation:

@lhp-nemlig lhp-nemlig requested a review from a team as a code owner November 21, 2024 11:26
Copy link

linux-foundation-easycla bot commented Nov 21, 2024

CLA Signed

The committers listed above are authorized under a signed CLA.

@swiatekm swiatekm added the discuss-at-sig This issue or PR should be discussed at the next SIG meeting label Nov 21, 2024
Copy link
Contributor

@swiatekm swiatekm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a breaking change, so we either have to mark it as such, or hide it behind a feature gate. I'd prefer the latter, but am willing to be convinced otherwise. We'll discuss it during today's SIG meeting, feel free to join if you'd like to contribute @lhp-nemlig.

On a more technical note, you'll need to sign the CLA and add a changelog entry before we can merge this PR.

cmd/otel-allocator/allocation/strategy.go Outdated Show resolved Hide resolved
@lhp-nemlig lhp-nemlig changed the title Add least-weighted as fallback strategy for per-node strategy Add consistent-hashing as fallback strategy for per-node strategy Nov 21, 2024
@lhp-nemlig
Copy link
Contributor Author

lhp-nemlig commented Nov 21, 2024

This is a breaking change, so we either have to mark it as such, or hide it behind a feature gate. I'd prefer the latter, but am willing to be convinced otherwise. We'll discuss it during today's SIG meeting, feel free to join if you'd like to contribute @lhp-nemlig.

On a more technical note, you'll need to sign the CLA and add a changelog entry before we can merge this PR.

@swiatekm I have changed the PR to add fallback strategy as a config option, making this a non-breaking change as the default is NO fallback.
That makes the changes a little more extensive of course, but I think I have implemented this in a mostly reasonable way :)

@lhp-nemlig lhp-nemlig changed the title Add consistent-hashing as fallback strategy for per-node strategy Add allocation_fallback_strategy option as fallback strategy for per-node strategy Nov 21, 2024
@swiatekm
Copy link
Contributor

This is a breaking change, so we either have to mark it as such, or hide it behind a feature gate. I'd prefer the latter, but am willing to be convinced otherwise. We'll discuss it during today's SIG meeting, feel free to join if you'd like to contribute @lhp-nemlig.
On a more technical note, you'll need to sign the CLA and add a changelog entry before we can merge this PR.

@swiatekm I have changed the PR to add fallback strategy as a config option, making this a non-breaking change as the default is NO fallback. That makes the changes a little more extensive of course, but I think I have implemented this in a mostly reasonable way :)

I'm ok with this, but keep in mind that in order for this to actually be usable, you'll also need to add an attribute to the OpenTelemetryCollector CRD. That's not a problem, but since in that case you're changing the public API, it's going to invite more scrutiny during review. Just FYI.

@lhp-nemlig
Copy link
Contributor Author

This is a breaking change, so we either have to mark it as such, or hide it behind a feature gate. I'd prefer the latter, but am willing to be convinced otherwise. We'll discuss it during today's SIG meeting, feel free to join if you'd like to contribute @lhp-nemlig.
On a more technical note, you'll need to sign the CLA and add a changelog entry before we can merge this PR.

@swiatekm I have changed the PR to add fallback strategy as a config option, making this a non-breaking change as the default is NO fallback. That makes the changes a little more extensive of course, but I think I have implemented this in a mostly reasonable way :)

I'm ok with this, but keep in mind that in order for this to actually be usable, you'll also need to add an attribute to the OpenTelemetryCollector CRD. That's not a problem, but since in that case you're changing the public API, it's going to invite more scrutiny during review. Just FYI.

Ok @swiatekm, I expect to join the SIG meeting and you can tell me what I need to do in order to complete this

@swiatekm
Copy link
Contributor

@lhp-nemlig as per our discussion during the SIG meeting, the simplest way to proceed with this change without needing to add attributes to the CRDs, would be to:

  1. Keep your current changes making it configurable on the target allocator.
  2. Add a feature flag which, when enabled, would set the fallback to the consistent-hashing strategy.

You can add feature flags here and the target allocator ConfigMap is generated here. Feature flags are global, so you can just import the package, and add the config value for the fallback strategy if the flag is enabled. Does that make sense?

@swiatekm swiatekm removed the discuss-at-sig This issue or PR should be discussed at the next SIG meeting label Nov 21, 2024
@lhp-nemlig
Copy link
Contributor Author

@swiatekm Thanks for the clear instructions, I believe everything should be in order now.

Copy link
Contributor

@swiatekm swiatekm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me overall, had some minor comments and nitpicks. I'm fine merging this PR as-is when they're addressed, but in an ideal world we'd also include an e2e test showing the fallback strategy assigning targets without Nodes when the flag is enabled. I'm ok with that happening in a follow-up though, as this is an off-by-default feature flag, and writing these kinds of tests in chainsaw isn't very straightforward.

cmd/otel-allocator/allocation/per_node_test.go Outdated Show resolved Hide resolved
cmd/otel-allocator/allocation/per_node_test.go Outdated Show resolved Hide resolved
cmd/otel-allocator/allocation/per_node_test.go Outdated Show resolved Hide resolved
cmd/otel-allocator/allocation/per_node_test.go Outdated Show resolved Hide resolved
cmd/otel-allocator/main.go Outdated Show resolved Hide resolved
.chloggen/add_fallback_strategy_for_per_node_strategy.yaml Outdated Show resolved Hide resolved
lhp-nemlig and others added 5 commits November 22, 2024 13:09
Co-authored-by: Mikołaj Świątek <mail@mikolajswiatek.com>
Co-authored-by: Mikołaj Świątek <mail@mikolajswiatek.com>
Co-authored-by: Mikołaj Świątek <mail@mikolajswiatek.com>
Copy link
Contributor

@swiatekm swiatekm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you for the contribution!

Copy link
Contributor

@jaronoff97 jaronoff97 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Thank you for your contribution 🙇

@jaronoff97 jaronoff97 merged commit 9a5a8ce into open-telemetry:main Nov 22, 2024
38 checks passed
araiu added a commit to araiu/opentelemetry-helm-charts that referenced this pull request Dec 6, 2024
Feature flag available since `v0.114.0` (Add allocation_fallback_strategy option as fallback strategy for per-node strategy [#3482](open-telemetry/opentelemetry-operator#3482))
araiu added a commit to araiu/opentelemetry-helm-charts that referenced this pull request Dec 6, 2024
Feature flag available since `v0.114.0` (Add allocation_fallback_strategy option as fallback strategy for per-node strategy [#3482](open-telemetry/opentelemetry-operator#3482))
TylerHelmuth pushed a commit to open-telemetry/opentelemetry-helm-charts that referenced this pull request Dec 6, 2024
* Release operator 0.114.1

Signed-off-by: Alex Raiu <raiu.andrei@protonmail.com>

* Update operator clusterrole

* `secrets`:  Add TLS support to auto-instrumentation  [#3338](open-telemetry/opentelemetry-operator#3338)
* `targetallocators`: Generate only TargetAllocator CR from Collector CR [#3402](open-telemetry/opentelemetry-operator#3402)

* Add `targetAllocatorFallbackStrategy` feature flag

Feature flag available since `v0.114.0` (Add allocation_fallback_strategy option as fallback strategy for per-node strategy [#3482](open-telemetry/opentelemetry-operator#3482))

* Update operator-test with manager label

Adding `control-plane: controller-manager` label to match [operator-restart e2e test](open-telemetry/opentelemetry-operator#3486)

---------

Signed-off-by: Alex Raiu <raiu.andrei@protonmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add fallback for "per-node" allocation strategy
3 participants