Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storcon: fix AZ-driven tenant selection in chaos #10443

Merged
merged 3 commits into from
Jan 30, 2025

Conversation

jcsp
Copy link
Collaborator

@jcsp jcsp commented Jan 20, 2025

Problem

In #10438 I had got the function for picking tenants backwards, and it was preferring to move things away from their preferred AZ.

Summary of changes

  • Fix condition in is_attached_outside_preferred_az

@jcsp jcsp added a/tech_debt Area: related to tech debt c/storage/controller Component: Storage Controller labels Jan 20, 2025
Copy link

github-actions bot commented Jan 20, 2025

7414 tests run: 7063 passed, 0 failed, 351 skipped (full report)


Flaky tests (7)

Postgres 17

Code coverage* (full report)

  • functions: 33.4% (8518 of 25532 functions)
  • lines: 49.1% (71513 of 145551 lines)

* collected from Rust tests only


The comment gets automatically updated with the latest test results
dd431d7 at 2025-01-30T22:25:07.945Z :recycle:

@jcsp jcsp changed the title storcon: refine chaos selection logic storcon: fix AZ-driven tenant selection & refine chaos selection logic Jan 20, 2025
@jcsp jcsp marked this pull request as ready for review January 20, 2025 12:09
@jcsp jcsp requested a review from a team as a code owner January 20, 2025 12:09
@jcsp jcsp requested review from arssher and problame and removed request for arssher January 20, 2025 12:09
@jcsp jcsp requested review from VladLazar and skyzh and removed request for problame and VladLazar January 30, 2025 20:56
Copy link
Member

@skyzh skyzh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but I don't feel it needs to be that complex.

We can simply do an O(n) shuffle on tenant_id. After that, pick the first batch_size victims?

@skyzh
Copy link
Member

skyzh commented Jan 30, 2025

Oh I realized that they also have partial_shuffle that is O(batch_size)

@skyzh
Copy link
Member

skyzh commented Jan 30, 2025

collect tenant_ids
compute out_of_home_az and at_home_az
if out_of_home_az len >= batch_size {
  shuffle out_of_home_az
  victims = shuffled out_of_home_az
} else {
  shuffle at_home_az
  victims = out_of_home_az
  victims.extend(at_home_az[..batch_size - out_of_home_az len])
  shuffle victims
}

@jcsp
Copy link
Collaborator Author

jcsp commented Jan 30, 2025

Interesting suggestion, I mainly just wanted to merge the bugfix part of this PR, so I've reduced it to that. I'll try your suggestion in #10600

@jcsp jcsp enabled auto-merge January 30, 2025 21:08
@jcsp jcsp changed the title storcon: fix AZ-driven tenant selection & refine chaos selection logic storcon: fix AZ-driven tenant selection in chaos Jan 30, 2025
@jcsp jcsp added this pull request to the merge queue Jan 30, 2025
Merged via the queue into main with commit d18f619 Jan 30, 2025
80 checks passed
@jcsp jcsp deleted the jcsp/storage-less-chaotic-chaos branch January 30, 2025 22:18
github-merge-queue bot pushed a commit that referenced this pull request Jan 30, 2025
## Problem

In #10438 it was pointed out
that it would be good to avoid picking tenants in ID order, and also to
avoid situations where we might double-select the same tenant.

There was an initial swing at this in
#10443, where Chi suggested a
simpler approach which is done in this PR

## Summary of changes

- Split total set of tenants into in and out of home AZ
- Consume out of home AZ first, and if necessary shuffle + consume from
out of home AZ
winter-loo pushed a commit to winter-loo/neon that referenced this pull request Feb 4, 2025
## Problem

In neondatabase#10438 I had got the
function for picking tenants backwards, and it was preferring to move
things _away_ from their preferred AZ.

## Summary of changes

- Fix condition in `is_attached_outside_preferred_az`
winter-loo pushed a commit to winter-loo/neon that referenced this pull request Feb 4, 2025
## Problem

In neondatabase#10438 it was pointed out
that it would be good to avoid picking tenants in ID order, and also to
avoid situations where we might double-select the same tenant.

There was an initial swing at this in
neondatabase#10443, where Chi suggested a
simpler approach which is done in this PR

## Summary of changes

- Split total set of tenants into in and out of home AZ
- Consume out of home AZ first, and if necessary shuffle + consume from
out of home AZ
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
a/tech_debt Area: related to tech debt c/storage/controller Component: Storage Controller
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants