feat: parallel / de-duplicated secrets decryption #782

travisgroth · 2019-07-31T01:17:12Z

Hi,

On larger installs leveraging the helm secrets plugin, a full run of lint, diff and apply can take a long time (I just watched a 3+ minute run). After digging into the code it seems like there's two reasons:

A mutex to avoid two goroutines erasing each other's .dec.yaml files when moving to temp storage. This mutex forces all decrypts to be serial.
No coordination of which secrets have already been decrypted and moved to temp storage. A secrets file is decrypted N times, where N is the number of charts referencing it.

Some implementation ideas:

Allow decrypt to run free unless two routines are operating on the same file.

a. Use filesystem flock() to shift the locking out of process. This is likely simpler, though may have platform corner cases.

b. Shift the mutex into some sort of shared map indexed on some representation of a file path, so that locking can occur against file level granularity.
Be able to detect if a secret has already been decrypted and skip it (this also helps item [1]). Like [1], there is a way both on-filesystem and in-memory.

a. Hash the secret file path/name into a unique and secure value. Use this value as the file name inside the temporary location. Before calling decrypt, check if a file is already present. If it is, the decrypt can be skipped. If we try our new locking before checking for the tempfile, we can avoid any duplicate decrypts.

b. Using a structure much like [1b], track the files that are already decrypted into temp storage and check if they are already decrypted.

Thoughts? This may have implications in #745

mumoshu · 2019-07-31T14:31:47Z

@travisgroth Your proposal sounds great! Not sure I can tackle this shortly but it would definitely welcome more ideas and pull requests related to this.

Also, I was thinking that caching decrypted secrets #444 will also help.

travisgroth · 2019-08-02T14:10:26Z

Question re #444 - I think [2] addresses that without putting them in-memory, which sounds objectively better, but more complicated.

I can probably take a pass at this in the next week or so.

mumoshu · 2019-08-05T06:10:47Z

Question re #444 - I think [2] addresses that without putting them in-memory, which sounds objectively better, but more complicated.

I thought #444 can be better in terms of that it doesn't need to keep all the cached credentials on tmpfs until helmfile exits.

Anyway, I believe we can start with [2] because #444 is an extension of your [2].

Either way we need to decrypt the secret onto the disk(because that's how helm-secrets works) before saving the result in memory(#444).

travisgroth · 2019-08-05T12:32:20Z

After looking a bit, I think you're right. My conclusion was that the cleanup gets complicated if we cache on disk with the current design.

Generating a new file from memory for each independent helm exec makes cleanup easier to reason about and shrinks the change scope. A data structure that implements [1] in memory could easily be extended for [2] with some extra fields and a method or two, so that sounds like a reasonable approach.

Related to #782 and #444 - Allows concurrent decryption of different secrets files - Caches decrypted secrets by original file path and returns decrypted results from memory - Secrets being run through an instance of helmexec will be cached and run as fast as possible concurrently NB: This particular PR doesn't make _all_ calls to secrets cached and concurrent. Environment Secrets in particular seem to not be evaluated with a ScatterGather(), and doesn't use the same helmexec instance as other parts of the code, so it doesn't take advantage of these changes. Some reworking of the plumbing there would be needed.

Ref #782

Closes #444 and #782 This is the final PR to fully cache and parallelize helm secret decryption. It threads the shared helmexec.Interface into the StateCreator and HelmState structs to be used during environment secret decryption. This should effectively cache secrets for the duration of a helmfile run, regardless of where they are first decrypted.

travisgroth · 2019-08-16T22:18:55Z

Closed in #804

mumoshu added feature request good-first-issue labels Jul 31, 2019

This was referenced Aug 6, 2019

Cache secrets and concurrent decryption #790

Merged

bug: HelmState.scatterGather ignores item parameter #793

Closed

This was referenced Aug 9, 2019

Handle environment secrets concurrently #800

Merged

Shared helmexec for Environment Secrets #804

Merged

mumoshu pushed a commit that referenced this issue Aug 13, 2019

Handle environment secrets concurrently

765bfe6

Ref #782

travisgroth closed this as completed Aug 16, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: parallel / de-duplicated secrets decryption #782

feat: parallel / de-duplicated secrets decryption #782

travisgroth commented Jul 31, 2019

mumoshu commented Jul 31, 2019

travisgroth commented Aug 2, 2019

mumoshu commented Aug 5, 2019 •

edited

Loading

travisgroth commented Aug 5, 2019

travisgroth commented Aug 16, 2019

feat: parallel / de-duplicated secrets decryption #782

feat: parallel / de-duplicated secrets decryption #782

Comments

travisgroth commented Jul 31, 2019

mumoshu commented Jul 31, 2019

travisgroth commented Aug 2, 2019

mumoshu commented Aug 5, 2019 • edited Loading

travisgroth commented Aug 5, 2019

travisgroth commented Aug 16, 2019

mumoshu commented Aug 5, 2019 •

edited

Loading