fix: parallel plan and apply also in a single workspace #3670
base: main
Conversation
Force-pushed from 693957c to c5b3063
This has a lot of core changes, but they do make sense.
@finnag I want to set the right expectations: this might take a while to be reviewed/merged, so please stick with us while we go through this process.
@finnag Thanks a lot for this contribution.
Force-pushed from d7cda33 to e465e2e
Force-pushed from d6340b9 to 580ea23
@GenPage I believe this is sanitized now; let me know if you want me to split it up or have some other reservations.
Thanks @finnag, let me take some more time to review this.
@finnag I agree with @jamengual that we will hold off on this until we can properly fix the existing lock regressions plaguing Atlantis.
We generate a list of all directories of interest, so we can target the locks at the affected directories instead of using one (too) global lock.
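The per-directory locking described above can be sketched as a map of mutexes keyed by directory, so operations on disjoint directories no longer serialize on one global lock. This is an illustrative sketch, not the actual Atlantis types; `dirLocks` and its methods are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
)

// dirLocks hands out one mutex per project directory, so plans that
// touch different directories can run concurrently instead of all
// waiting on a single global lock.
type dirLocks struct {
	mu    sync.Mutex
	locks map[string]*sync.Mutex
}

func newDirLocks() *dirLocks {
	return &dirLocks{locks: make(map[string]*sync.Mutex)}
}

// lockFor returns the mutex guarding a single directory, creating it
// on first use. The outer mutex only protects the map itself.
func (d *dirLocks) lockFor(dir string) *sync.Mutex {
	d.mu.Lock()
	defer d.mu.Unlock()
	l, ok := d.locks[dir]
	if !ok {
		l = &sync.Mutex{}
		d.locks[dir] = l
	}
	return l
}

func main() {
	dl := newDirLocks()
	var wg sync.WaitGroup
	var cmu sync.Mutex
	counts := map[string]int{}
	for _, dir := range []string{"prod/vpc", "prod/vpc", "staging/db"} {
		wg.Add(1)
		go func(dir string) {
			defer wg.Done()
			l := dl.lockFor(dir)
			l.Lock() // only operations on the same directory serialize
			defer l.Unlock()
			cmu.Lock()
			counts[dir]++
			cmu.Unlock()
		}(dir)
	}
	wg.Wait()
	fmt.Println(counts["prod/vpc"], counts["staging/db"])
}
```

The design trade-off is a small map-guarding mutex held only long enough to look up or create the per-directory lock, while the expensive work holds only the narrow lock.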
There is a race condition here: we check whether we are current and only grab the lock if we are not, but by the time we hold the lock that information can be stale. Extend the lock to cover all operations and unconditionally wait for it. We can't assume anything can be skipped if we had to wait for the lock.
All Clone() calls that have signaled an interest in merging before another Clone() checks whether a merge is necessary can skip their own checks. This should reduce the thundering herd problem at the beginning of large parallel runs.
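One way to picture this coalescing is with interest tickets: every caller registers interest first, and whichever caller performs the merge check satisfies all tickets issued before it, so the rest of the herd skips. This is a hypothetical sketch of the idea, not the Atlantis implementation; `mergeCoalescer` and its methods are invented names.

```go
package main

import (
	"fmt"
	"sync"
)

// mergeCoalescer lets many concurrent callers share one expensive
// merge-freshness check. Callers register interest, then ask for the
// check; a completed check covers every ticket issued before it began.
type mergeCoalescer struct {
	mu        sync.Mutex
	requested int // highest interest ticket handed out
	checked   int // highest ticket covered by a completed check
	checks    int // how many real checks actually ran
}

// register notes that the caller will need a fresh merge check and
// returns its ticket.
func (c *mergeCoalescer) register() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.requested++
	return c.requested
}

// ensureChecked runs the expensive check only if this caller's ticket
// is not already covered by a check performed after it registered.
func (c *mergeCoalescer) ensureChecked(ticket int) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if ticket <= c.checked {
		return // someone checked after we registered; skip our check
	}
	c.checked = c.requested // this check covers all interest so far
	c.checks++
}

func main() {
	c := &mergeCoalescer{}
	// Simulate a thundering herd: everyone registers before anyone checks.
	tickets := make([]int, 5)
	for i := range tickets {
		tickets[i] = c.register()
	}
	for _, t := range tickets {
		c.ensureChecked(t)
	}
	fmt.Println(c.checks) // one check satisfies the whole herd
}
```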
Clone is now a no-op if the PR has not changed, and loses its second return value, the MergedAgain flag. MergeAgain must be called explicitly in the only location that cares about this flag, just before planning. This cleans up the code for Clone and re-merging a bit. Also regenerated mocks.
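The reshaped split between cloning and re-merging can be sketched like this; the type, method signatures, and fields below are assumptions for illustration, not the project's actual API.

```go
package main

import "fmt"

// Illustrative stand-ins for the real repository and PR types.
type repo struct{ name string }
type pull struct{ num int }

// workingDir sketches the split: Clone no longer reports a MergedAgain
// flag, and the one call site that needs a fresh merge (the plan step)
// calls MergeAgain explicitly.
type workingDir struct {
	clonedSHA string
	clones    int
}

// Clone is a no-op when the PR head has not changed.
func (w *workingDir) Clone(r repo, p pull, headSHA string) (string, error) {
	dir := "/repos/" + r.name
	if w.clonedSHA == headSHA {
		return dir, nil // nothing changed; skip the clone entirely
	}
	w.clonedSHA = headSHA
	w.clones++
	return dir, nil
}

// MergeAgain re-merges the base branch into the working copy and
// reports whether it did so; only the plan step calls it.
func (w *workingDir) MergeAgain(r repo, p pull) (bool, error) {
	return true, nil
}

func main() {
	w := &workingDir{}
	w.Clone(repo{"infra"}, pull{3670}, "abc123")
	w.Clone(repo{"infra"}, pull{3670}, "abc123") // no-op: same head
	merged, _ := w.MergeAgain(repo{"infra"}, pull{3670})
	fmt.Println(w.clones, merged)
}
```

Dropping the flag from Clone keeps the common path (clone or no-op) free of merge concerns, at the cost of one explicit call where the flag actually matters.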
Some time has passed: we've run with this in production for more than a year now, happily planning and applying in parallel. Rebased onto v0.30.0.
If we wanted to run this in our K8s cluster, are there any changes you're aware of that would be required? We already set
what
For parallel mode to work, you must either set the environment variable TF_PLUGIN_CACHE_MAY_BREAK_DEPENDENCY_LOCK_FILE to any value, or check in your .terraform.lock.hcl files. Otherwise Terraform cannot run in parallel.
why
The Clone call had several race conditions where it could miss clones or delete the working directory out from under running processes, causing failures.
tests
references