Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: parallel / de-duplicated secrets decryption #782

Closed
travisgroth opened this issue Jul 31, 2019 · 5 comments
Closed

feat: parallel / de-duplicated secrets decryption #782

travisgroth opened this issue Jul 31, 2019 · 5 comments

Comments

@travisgroth
Copy link
Contributor

Hi,

On larger installs leveraging the helm secrets plugin, a full run of lint, diff and apply can take a long time (I just watched a 3+ minute run). After digging into the code it seems like there's two reasons:

  1. A mutex to avoid two goroutines erasing each other's .dec.yaml files when moving to temp storage. This mutex forces all decrypts to be serial.

  2. No coordination of which secrets have already been decrypted and moved to temp storage. A secrets file is decrypted N times, where N is the number of charts referencing it.

Some implementation ideas:

  1. Allow decrypt to run free unless two routines are operating on the same file.

    a. Use filesystem flock() to shift the locking out of process. This is likely simpler, though may have platform corner cases.

    b. Shift the mutex into some sort of shared map indexed on some representation of a file path, so that locking can occur against file level granularity.

  2. Be able to detect if a secret has already been decrypted and skip it (this also helps item [1]). Like [1], there is a way both on-filesystem and in-memory.

    a. Hash the secret file path/name into a unique and secure value. Use this value as the file name inside the temporary location. Before calling decrypt, check if a file is already present. If it is, the decrypt can be skipped. If we try our new locking before checking for the tempfile, we can avoid any duplicate decrypts.

    b. Using a structure much like [1b], track the files that are already decrypted into temp storage and check if they are already decrypted.

Thoughts? This may have implications in #745

@mumoshu
Copy link
Collaborator

mumoshu commented Jul 31, 2019

@travisgroth Your proposal sounds great! Not sure I can tackle this shortly but it would definitely welcome more ideas and pull requests related to this.

Also, I was thinking that caching decrypted secrets #444 will also help.

@travisgroth
Copy link
Contributor Author

Question re #444 - I think [2] addresses that without putting them in-memory, which sounds objectively better, but more complicated.

I can probably take a pass at this in the next week or so.

@mumoshu
Copy link
Collaborator

mumoshu commented Aug 5, 2019

Question re #444 - I think [2] addresses that without putting them in-memory, which sounds objectively better, but more complicated.

I thought #444 can be better in terms of that it doesn't need to keep all the cached credentials on tmpfs until helmfile exits.

Anyway, I believe we can start with [2] because #444 is an extension of your [2].

Either way we need to decrypt the secret onto the disk(because that's how helm-secrets works) before saving the result in memory(#444).

@travisgroth
Copy link
Contributor Author

After looking a bit, I think you're right. My conclusion was that the cleanup gets complicated if we cache on disk with the current design.

Generating a new file from memory for each independent helm exec makes cleanup easier to reason about and shrinks the change scope. A data structure that implements [1] in memory could easily be extended for [2] with some extra fields and a method or two, so that sounds like a reasonable approach.

mumoshu pushed a commit that referenced this issue Aug 7, 2019
Related to #782 and #444 

- Allows concurrent decryption of different secrets files
- Caches decrypted secrets by original file path and returns decrypted results from memory
- Secrets being run through an instance of helmexec will be cached and run as fast as possible concurrently

NB: This particular PR doesn't make _all_ calls to secrets cached and concurrent.  Environment Secrets in particular seem to not be evaluated with a ScatterGather(), and doesn't use the same helmexec instance as other parts of the code, so it doesn't take advantage of these changes.  Some reworking of the plumbing there would be needed.
mumoshu pushed a commit that referenced this issue Aug 13, 2019
mumoshu pushed a commit that referenced this issue Aug 15, 2019
Closes #444 and #782 

This is the final PR to fully cache and parallelize helm secret decryption.  It threads the shared helmexec.Interface into the StateCreator and HelmState structs to be used during environment secret decryption.  This should effectively cache secrets for the duration of a helmfile run, regardless of where they are first decrypted.
@travisgroth
Copy link
Contributor Author

Closed in #804

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants