fix: parallel plan and apply also in a single workspace #3670

Open
wants to merge 6 commits into base: main

finnag (Contributor) commented Aug 14, 2023

what

  • Add more thorough locking around Clone() calls, covering all of these phases:
    • Am I on the right commit
    • Merge with upstream
    • Clone if necessary
  • Reduce the number of remote git operations when planning or applying in parallel
  • Clean up the Clone() method, split into Clone() and MergeAgain()

For parallel mode to work, you must either set the environment variable TF_PLUGIN_CACHE_MAY_BREAK_DEPENDENCY_LOCK_FILE to some value or check in your .hcl lock files (.terraform.lock.hcl). Otherwise Terraform cannot run in parallel.

why

The Clone call had several race conditions: it could miss clones, or delete the working directory out from under running processes, causing failures.

tests

  • I have tested my changes by running make test-all
  • Ran this in production with several repos, large and small, including a monorepo multi-directory setup

references

@finnag changed the title from "Parallel plans" to "fix: parallel plans without workspaces" Aug 14, 2023
@finnag force-pushed the parallel-plans branch 4 times, most recently from 693957c to c5b3063 August 16, 2023 13:02
@finnag changed the title from "fix: parallel plans without workspaces" to "fix: parallel plan and apply also in a single workspace" Aug 16, 2023
@finnag marked this pull request as ready for review August 16, 2023 13:24
@finnag requested a review from a team as a code owner August 16, 2023 13:24
@github-actions bot added the go and provider/github labels Aug 16, 2023
jamengual (Contributor) commented Aug 16, 2023

This has a lot of core changes, but they do make sense.
I'm a workspace user and I need to make sure this works well with multiple workspaces, BUT right now we are at an inflection point with the work on Locks. I think I will have to defer this to @GenPage, because it intersects with his work on #3345 and could conflict with it, so we need to coordinate carefully on this.

@finnag I want to set the right expectations: this might take a while to be reviewed and merged, so please stick with us while we go through this process.

@finnag Thanks a lot for this contribution

@GenPage added the refactoring, waiting-on-review, and work-in-progress labels Sep 25, 2023
@finnag force-pushed the parallel-plans branch 2 times, most recently from d6340b9 to 580ea23 September 29, 2023 11:50
finnag (Contributor, Author) commented Sep 29, 2023

@GenPage I believe this is sanitized now. Let me know if you want me to split it up or if you have other reservations.

@GenPage added this to the v0.27.0 milestone Oct 6, 2023
GenPage (Member) commented Oct 6, 2023

Thanks @finnag, let me take some more time to review this.

GenPage (Member) commented Dec 11, 2023

@finnag I agree with @jamengual that we will hold on to this until we can properly fix the existing lock regressions plaguing Atlantis.

@GenPage removed this from the v0.27.0 milestone Dec 11, 2023
@GenPage added the needs discussion and never-stale labels and removed the waiting-on-review label Dec 11, 2023
@finnag requested a review from a team as a code owner April 25, 2024 14:03
@finnag removed the request for review from a team April 25, 2024 14:03
@finnag requested review from lukemassa and nitrocode April 25, 2024 14:03
finnag added 4 commits October 8, 2024 16:00
We generate a list of all interesting directories, so we can target
the locks to the affected directories instead of using a (too) global lock
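
The idea of targeting locks at the affected directories, rather than holding one global lock, could look roughly like the sketch below. This is an illustration only and not the code in this PR: the `DirLocker` type, its methods, and the directory names are all hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// DirLocker hands out one mutex per directory path, so work in unrelated
// directories never waits on the same lock. Hypothetical type, not Atlantis code.
type DirLocker struct {
	mu    sync.Mutex             // protects the map itself
	locks map[string]*sync.Mutex // one lock per directory
}

func NewDirLocker() *DirLocker {
	return &DirLocker{locks: make(map[string]*sync.Mutex)}
}

// Lock blocks until the lock for dir is held and returns its unlock function.
func (d *DirLocker) Lock(dir string) (unlock func()) {
	d.mu.Lock()
	l, ok := d.locks[dir]
	if !ok {
		l = &sync.Mutex{}
		d.locks[dir] = l
	}
	d.mu.Unlock() // release the map lock before waiting on the directory lock

	l.Lock()
	return l.Unlock
}

func main() {
	locker := NewDirLocker()
	var wg sync.WaitGroup
	// Two of the three jobs share a directory and are serialized;
	// the third runs independently.
	for _, dir := range []string{"pr-1/a", "pr-1/b", "pr-1/a"} {
		wg.Add(1)
		go func(dir string) {
			defer wg.Done()
			unlock := locker.Lock(dir)
			defer unlock()
			fmt.Println("working in", dir)
		}(dir)
	}
	wg.Wait()
}
```

The map lock is held only long enough to find or create the per-directory mutex, so a slow operation in one directory does not stall lookups for the others.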
There is a race condition here: we test whether we are current, and
grab the lock only if we are not. By the time we hold the lock,
that information could be stale.

Extend the lock to cover all operations, and unconditionally wait for
the lock. We can't assume anything can be skipped if we have to wait for
the lock.
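
A minimal sketch of the fix described in this commit message, assuming a simplified working directory that only tracks a commit string: the lock is taken unconditionally, and the "am I current" check happens while holding it. The `workDir` type and `ensureCommit` method are hypothetical names for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// workDir is a hypothetical stand-in for a checked-out working directory.
type workDir struct {
	mu     sync.Mutex
	commit string // commit currently checked out
	clones int    // number of (re)clones performed
}

// ensureCommit waits for the lock first and only then checks whether the
// directory is current, so the result of the check cannot go stale before
// it is acted upon.
func (w *workDir) ensureCommit(want string) {
	w.mu.Lock()
	defer w.mu.Unlock()

	if w.commit == want {
		return // already current, nothing to do
	}
	// Placeholder for the real clone/checkout work.
	w.commit = want
	w.clones++
}

func main() {
	w := &workDir{}
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			w.ensureCommit("abc123") // hypothetical commit id
		}()
	}
	wg.Wait()
	fmt.Println("clones performed:", w.clones)
}
```

With the racy ordering (check first, lock second), several goroutines could all observe a stale directory and each redo the clone, or remove the directory while another process is using it.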
All Clone() calls that have signaled an interest in merging
before another Clone() checks whether a merge is necessary
can skip their own checks.

This should reduce the thundering herd problem at the
beginning of large parallel runs.
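
One way to read this commit message is as a ticket scheme: each caller signals interest before waiting for the lock, and a check that starts later covers every ticket issued up to that point. The sketch below illustrates that reading under stated assumptions; `mergeChecker`, `signal`, and `await` are hypothetical names, and the PR's actual mechanism may differ.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// mergeChecker coalesces "is a merge necessary?" checks. Hypothetical type.
type mergeChecker struct {
	requested   atomic.Int64 // last ticket handed out
	mu          sync.Mutex   // serializes the actual checks
	checkedUpTo int64        // highest ticket covered by a completed check
	checks      int          // number of checks actually performed
}

// signal registers interest in a check and returns the caller's ticket.
func (m *mergeChecker) signal() int64 {
	return m.requested.Add(1)
}

// await runs check unless a check that started after this ticket was issued
// has already completed, in which case the caller can skip its own check.
func (m *mergeChecker) await(ticket int64, check func()) {
	m.mu.Lock()
	defer m.mu.Unlock()

	if ticket <= m.checkedUpTo {
		return // covered by someone else's check
	}
	upTo := m.requested.Load() // every ticket issued so far is covered
	check()
	m.checkedUpTo = upTo
	m.checks++
}

func main() {
	m := &mergeChecker{}
	var wg sync.WaitGroup
	for i := 0; i < 16; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			t := m.signal()
			m.await(t, func() { /* placeholder for a remote git operation */ })
		}()
	}
	wg.Wait()
	// Between 1 and 16, depending on how the goroutines interleave.
	fmt.Println("remote checks performed:", m.checks)
}
```

Reading `requested` before running the check is what makes skipping safe: a ticket is only treated as covered if it was issued before the covering check began.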
Clone is now a NOP if the PR has not changed, and loses its second
return value, the MergedAgain flag.

MergeAgain must be called explicitly in the only location that
cares about this flag, just before planning.

This cleans up the code for Clone and re-merging a bit.

Also regenerated mocks
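
The resulting call pattern might be sketched as follows. The signatures here are deliberately simplified and hypothetical (the real methods take more arguments); the point is only that Clone does nothing when the PR is unchanged and that the caller asks for a re-merge explicitly just before planning.

```go
package main

import "fmt"

// fakeWorkingDir is a hypothetical, in-memory stand-in for the working
// directory; it is not the type modified in this PR.
type fakeWorkingDir struct {
	head   string // PR head commit currently checked out
	base   string // upstream commit last merged in
	clones int    // number of real clones performed
}

// Clone is a no-op when the requested head is already checked out.
// It no longer reports whether a re-merge happened.
func (w *fakeWorkingDir) Clone(head string) (string, error) {
	if w.head != head {
		w.head = head
		w.clones++
	}
	return "/tmp/example-workdir", nil // hypothetical path
}

// MergeAgain merges upstream if it moved and reports whether it did so.
func (w *fakeWorkingDir) MergeAgain(upstream string) (bool, error) {
	if w.base == upstream {
		return false, nil
	}
	w.base = upstream
	return true, nil
}

// prepareForPlan is the one place that cares about the merged flag.
func prepareForPlan(w *fakeWorkingDir, head, upstream string) (dir string, merged bool, err error) {
	dir, err = w.Clone(head)
	if err != nil {
		return "", false, err
	}
	merged, err = w.MergeAgain(upstream)
	return dir, merged, err
}

func main() {
	w := &fakeWorkingDir{}
	for _, upstream := range []string{"base-1", "base-1", "base-2"} {
		_, merged, err := prepareForPlan(w, "head-1", upstream)
		fmt.Println("merged again:", merged, "err:", err, "clones so far:", w.clones)
	}
}
```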
finnag (Contributor, Author) commented Oct 8, 2024

Some time has passed. We have run this in production for more than a year now, happily planning and applying in parallel. Rebased to v0.30.0.

jseiser commented Oct 21, 2024

@finnag

If we wanted to run this in our K8s cluster, are there any changes you are aware of that would be required? We already set TF_PLUGIN_CACHE_MAY_BREAK_DEPENDENCY_LOCK_FILE to True

Labels: go, needs discussion, never-stale, provider/github, refactoring

5 participants