[VAULT-17827] Rollback manager worker pool #22567
Conversation
CI Results:
This looks awesome Mia!
Unmount and remount operations trigger a rollback through the rollback manager, then wait for the rollback to complete before continuing. Because we're now using a worker pool, it's possible that unmounts and remounts will take longer to complete. Note that unmount and remount can be called by replication invalidation operations.
This is such a good callout. I had not appreciated the interaction with replication, which is certainly an additional side effect that's good to consider and anticipate.
I guess one additional factor that it makes me think about is that rollbacks might be making calls to external systems, e.g. cleaning up external database credentials. In that case, limiting concurrency could be more of a big deal. Say for example there are 10,000 database secret mounts, but the Vault active node suddenly has a few seconds of latency caused by network issues to that database provider. Assume also that these mounts are all active, and that because of the increased latency there is an elevated failure rate, so rollback operations are needed every minute. Now with just 256 concurrent rollbacks, if each one takes say 10s due to the latency to the provider, it will take about 6.5 mins to get through all of them. Even then that's probably not the end of the world, unless it blocks replication for that time due to an unmount.
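(For concreteness, that estimate works out as roughly 10,000 / 256 ≈ 39 sequential waves of rollbacks at ~10 s each, i.e. ~390 s, or about 6.5 minutes.)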
Can you confirm if it would? My mental model is that in this case the secondary would not have anything to roll back, at least with an external provider, because it's not the primary - i.e. the primary might get into the state above and have slow rollbacks, but on the secondary they'd only ever be rolling back internal state in our own store, right? If so, then that seems to mitigate the worst risks I could think of above.
Code Considerations
The worker pool package you found looks solid to me and beats re-inventing one or working around the quirks of fairshare for this use case. I wonder if eventually we might need yet another worker pool implementation that has dynamic pool sizing, so we can do adaptive concurrency control for request handling (@mpalmi is working on this). (We already have a few other implicit pools in the code base, e.g. the pool of flushers in the Consul backend.)
I guess we can cross that bridge when we get to it though. It would be kind of a shame to end up with so many different worker pool variants but then I don't think we should scope creep and this one looks great for the task at hand.
I left a few comments inline. I think overall my biggest feedback is that I wonder how important it is to keep the 0 == no pool behaviour at the expense of more code and more things to test. In practice, just setting the number of workers to 999999999 seems to mitigate virtually all the risks I can think of, especially given the pool implementation chosen executes virtually the same lines of code as the "no pool" option right up to the point it hits the limit 🤷. That would simplify the code a tiny bit but also remove test cases and things to maintain in the future.
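(As an aside for readers following along: the bounded-concurrency behaviour being discussed can be pictured with a minimal pool like the sketch below. This is an illustration only, with invented names, and is not the worker pool package the PR uses.)

```go
package poolsketch

import "sync"

// boundedPool is a minimal sketch of the bounded-concurrency idea: a fixed
// number of worker goroutines pull tasks from a channel.
type boundedPool struct {
	tasks chan func()
	wg    sync.WaitGroup
}

// newBoundedPool starts `workers` goroutines that process submitted tasks.
func newBoundedPool(workers int) *boundedPool {
	p := &boundedPool{tasks: make(chan func())}
	for i := 0; i < workers; i++ {
		p.wg.Add(1)
		go func() {
			defer p.wg.Done()
			for task := range p.tasks {
				task()
			}
		}()
	}
	return p
}

// Submit hands the task to an idle worker; it blocks once all workers are busy.
func (p *boundedPool) Submit(task func()) {
	p.tasks <- task
}

// StopWait stops accepting work and waits for in-flight tasks to finish.
func (p *boundedPool) StopWait() {
	close(p.tasks)
	p.wg.Wait()
}
```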
vault/rollback.go (Outdated)
func (g *goroutineRollbackRunner) StopWait() {
	g.inflightAll.Wait()
}
I'm wondering if StopWait should do something that prevents any further Submit calls? We could leave it up to the calling code to ensure it stops calling Submit after this, but it seems like it could be subtle and potentially cause livelocks where some shutdown process is waiting on this but hasn't yet stopped some other process from submitting new things. Not read all of this yet so it might not be important, just a thought.
Good point, there is a risk of a panic if a submit happens after StopWait(). I've added this to prevent that: https://github.com/hashicorp/vault/pull/22567/files#diff-067fef428afca5813d7e1a68bf8b43f371d6106b77056e97cc07fbb825be65baR219-R235
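For readers without the diff open, here is a minimal sketch of the kind of guard being described, assuming a stopped flag protected by a mutex; the names and shape are hypothetical, not the actual change from the linked diff.

```go
package rollbacksketch

import (
	"errors"
	"sync"
)

// goroutineRollbackRunner mirrors the runner shown above; the stopped flag
// and error return are an illustrative guess at how Submit could be made
// safe to call after StopWait.
type goroutineRollbackRunner struct {
	mu          sync.Mutex
	stopped     bool
	inflightAll sync.WaitGroup
}

var errRunnerStopped = errors.New("rollback runner is stopped")

// Submit refuses new work once StopWait has been called, so a late submit
// can't race the drained WaitGroup and panic (the risk mentioned above).
func (g *goroutineRollbackRunner) Submit(task func()) error {
	g.mu.Lock()
	if g.stopped {
		g.mu.Unlock()
		return errRunnerStopped
	}
	g.inflightAll.Add(1)
	g.mu.Unlock()

	go func() {
		defer g.inflightAll.Done()
		task()
	}()
	return nil
}

// StopWait marks the runner stopped, then waits for in-flight rollbacks.
func (g *goroutineRollbackRunner) StopWait() {
	g.mu.Lock()
	g.stopped = true
	g.mu.Unlock()
	g.inflightAll.Wait()
}
```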
DR secondaries don't start the rollback manager. Actually, many backends exclude the WAL prefix from being replicated: https://github.com/hashicorp/vault/blob/main/builtin/logical/database/backend.go#L97. This means that a performance replica's WAL rollbacks would only be duplicated effort if the backend doesn't set the path to local. The Azure secrets and auth backends are the only builtin plugins that don't set the WAL path to local and have WALRollbackFuncs.
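For readers unfamiliar with the linked line, here is a rough sketch of the pattern being referenced. The PathsSpecial/LocalStorage fields and framework.WALPrefix are real SDK names; the wrapper function is illustrative only.

```go
package localsketch

import (
	"github.com/hashicorp/vault/sdk/framework"
	"github.com/hashicorp/vault/sdk/logical"
)

// localWALBackend shows only the storage wiring: declaring the framework WAL
// prefix ("wal/") as local storage keeps those entries from being replicated
// to performance secondaries. Everything else about the backend is omitted.
func localWALBackend() *framework.Backend {
	return &framework.Backend{
		PathsSpecial: &logical.Paths{
			LocalStorage: []string{
				framework.WALPrefix, // keep WAL entries cluster-local, not replicated
			},
		},
	}
}
```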
Build Results:
@miagilepner Question for the purposes of the doc review:
If a value less than 0 means "no pool" and a value greater than 0 defines the pool size, what does a value of 0 do? Also, where were you planning on documenting the new environment variable? I only see metric partials in the docs atm.
I've updated the PR description and the code. VAULT_ROLLBACK_WORKERS must be greater than or equal to 1. If it's less than 1, we use the default value of 256.
No, I'm not planning to. The vast majority of Vault users shouldn't need to be aware of the worker pool. The environment variable is there as an escape hatch if someone does run into performance difficulties.
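A rough sketch of the behaviour described above (parse VAULT_ROLLBACK_WORKERS and fall back to 256 when unset, unparseable, or less than 1); the function name and logging are illustrative, not the actual Vault code.

```go
package rollbacksketch

import (
	"log"
	"os"
	"strconv"
)

// defaultRollbackWorkers mirrors the default pool size described above.
const defaultRollbackWorkers = 256

// rollbackWorkersFromEnv returns the configured worker count, falling back to
// the default whenever VAULT_ROLLBACK_WORKERS is missing or less than 1.
func rollbackWorkersFromEnv() int {
	raw := os.Getenv("VAULT_ROLLBACK_WORKERS")
	if raw == "" {
		return defaultRollbackWorkers
	}
	n, err := strconv.Atoi(raw)
	if err != nil || n < 1 {
		log.Printf("invalid VAULT_ROLLBACK_WORKERS=%q, using default of %d", raw, defaultRollbackWorkers)
		return defaultRollbackWorkers
	}
	return n
}
```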
Content LGTM, feel free to merge once the code review is complete.
This looks awesome Mia, great job.
I think it's ready to go!
timeout, cancel := context.WithTimeout(context.Background(), 20*time.Second)
defer cancel()
got := make(map[string]bool)
gotLock := sync.RWMutex{}
This is absolutely fine and does not need to change, but just in case you've not seen this before, I thought I'd take the opportunity to share some "wisdom" from the Go authors, which is basically "don't use RWMutex unless you've profiled and are pretty sure it makes an important difference vs. Mutex": https://github.com/golang/go/wiki/CodeReviewConcurrency#rwmutex

This is a test so none of this matters even a small amount, and I don't think you should change it, but in general I tend to suspect almost all new usages of RWMutex are likely better off as the simpler Mutex!
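Purely as an illustration of that advice (the existing test is fine as-is), the same kind of test state guarded by the simpler lock type might look like this; the helper names are invented for the example.

```go
package testsketch

import "sync"

// Readers and writers both take the one exclusive lock; for small test state
// like this, sync.Mutex is usually the better default than sync.RWMutex.
var (
	got     = make(map[string]bool)
	gotLock sync.Mutex
)

func markRolledBack(mount string) {
	gotLock.Lock()
	defer gotLock.Unlock()
	got[mount] = true
}

func wasRolledBack(mount string) bool {
	gotLock.Lock()
	defer gotLock.Unlock()
	return got[mount]
}
```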
good to know!
* backport of commit 4e3b91d (#22567)
* workerpool implementation
* rollback tests
* website documentation
* add changelog
* fix failing test
* backport of commit de043d6 (#22754)
* fix flaky rollback test
* better fix
* switch to defer
* add comment

---------

Co-authored-by: miagilepner <mia.epner@hashicorp.com>
(Description is still a WIP)
This PR adds a worker pool to the rollback manager with a default size of 256. The size of the worker pool can be adjusted with the environment variable VAULT_ROLLBACK_WORKERS.

Considerations:
* Scheduler latency profile with unlimited workers, with 9000 mounts: (profile screenshot)
* Scheduler latency profile with 256 workers, with 9000 mounts: (profile screenshot)
* When there are more mounts than workers, rollbacks for a given mount run roughly every (# mounts / # workers) * 90 seconds, rather than every 60 seconds.
* Rollback operations trigger each backend's PeriodicFunc and call WALRollback with a collection of WAL entries. To be clear, these WAL entries are not the same WAL that Vault uses for replication. This is a separate, namespace/mount-scoped storage location, and the path is only written to by plugins via framework.PutWAL. By default, the WAL entries that get passed to the WALRollback function are any entries older than 10 minutes.
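To make that last point concrete, here is a rough sketch of the plugin-side WAL pattern described above. The SDK helpers (framework.PutWAL, framework.DeleteWAL) and backend fields (WALRollback, WALRollbackMinAge) are real sdk/framework names, while the backend, payload, and createUser flow are purely illustrative.

```go
package walsketch

import (
	"context"
	"time"

	"github.com/hashicorp/vault/sdk/framework"
	"github.com/hashicorp/vault/sdk/logical"
)

// walEntry is a hypothetical payload a plugin might record before calling an
// external system, so a later rollback can undo the external change.
type walEntry struct {
	Username string `json:"username"`
}

// sketchBackend shows only the WAL-rollback wiring: the rollback manager
// invokes WALRollback with entries older than WALRollbackMinAge.
func sketchBackend() *framework.Backend {
	return &framework.Backend{
		WALRollback: func(ctx context.Context, req *logical.Request, kind string, data interface{}) error {
			// kind and data come from the matching PutWAL call below; a real
			// backend would decode data and clean up the external resource.
			return nil
		},
		WALRollbackMinAge: 10 * time.Minute, // the default described above
	}
}

// createUser sketches the write side: record intent in the mount-scoped WAL,
// do the external work, then delete the WAL entry on success. If the process
// dies in between, the entry ages past 10 minutes and WALRollback cleans up.
func createUser(ctx context.Context, req *logical.Request) error {
	walID, err := framework.PutWAL(ctx, req.Storage, "user", &walEntry{Username: "app-user"})
	if err != nil {
		return err
	}

	// ... create the user in the external system here ...

	return framework.DeleteWAL(ctx, req.Storage, walID)
}
```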