-
Notifications
You must be signed in to change notification settings - Fork 352
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
(CODEMGMT-1299) Do not recurse into recursive globbed exclusions when purging #1178
Conversation
I still need to functionally test this against a painfully large Puppetfile, but would love feedback on the approach. Also, I'm out Friday, so if someone wants to take this over, I'm fine with that. |
Note: I thought we removed usage of whitelist throughout our code, preferring allowlist instead. However, it looks like we just changed the external facing settings? Given that I extended our usage of whitelist to be consistent with the existing code. I assume if we want to change the internal terminology it needs another ticket? |
9070c6a
to
60d4f0c
Compare
And now with the ticket number in the commit message |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ticket to remove whitelist
etc. in the code base is https://tickets.puppetlabs.com/browse/RK-368.
lib/r10k/util/purgeable.rb
Outdated
else | ||
child.to_s | ||
end | ||
end.compact |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From what I can tell, we should never get nils in here, but maybe I'm missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, good catch. There was a version that did and needed this but I updated it without removing the additional compaction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this looks good! Hopefully it speeds things up. If I have time today, I'll try it with a big Puppetfile.
Started testing this, but don't have "after" numbers yet. |
I think there's a bug in this, it seems to have cleaned up something it shouldn't have... |
This does speed things up considerably, in my testing a run that took a little over 7 minutes down to 8 seconds. |
0e4d7f7
to
2908986
Compare
lib/r10k/util/purgeable.rb
Outdated
# Directory must start with a dot and then have a non-dot character, so the *nix | ||
# special directories '.', and '..' are invalid. Matching path must have the | ||
# dot-prepended segment as an interior segment (not at the end of the path). | ||
HIDDEN_SUBPATH = Regexp.new("#{File::SEPARATOR}\\.[^.]+#{File::SEPARATOR}.*") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does FNM_DOTMATCH
not cover this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The difference in behavior stemmed from them not passing FNM_DOTMATCH
into Dir.glob
previously. I've updated the code this morning, which I think is less convoluted. Let me know what you think!
2908986
to
7c807a3
Compare
Testing this again last night I got times of 3:40 purging w/o this patch & 3 seconds with it. So similar percent improvement but smaller total times, I guess based on what my disk was otherwise doing? I've added some test fixtures that should cause the existing tests to flex the fix I've added. Let me know if I should add more. I haven't manually verified with my refactor this morning, doing so now. Would love it if you could manually verify with your reproducer as well, Maggie. |
Will do! |
Things are looking good locally with my scripts against the last refactor! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In an environment that is almost all module content (https://github.com/Magisus/control-repo/tree/big_boy), this reduces non-initial deploy times from ~50s to ~17s 👍 The bug I mentioned previously is gone.
lib/r10k/util/purgeable.rb
Outdated
logger.debug2 _("Not purging %{path} due to internal exclusion match: %{exclusion_match}") % {path: path, exclusion_match: exclusion_match} | ||
elsif allowlist_match = allowed_globs.find { |allowed| matches?(allowed, path) } | ||
logger.debug _("Not purging %{path} due to whitelist match: %{allowlist_match}") % {path: path, allowlist_match: allowlist_match} | ||
elsif desired_match = desireds_not_to_recurse_into.grep(path).first |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this just be
else
desired_match = desireds ...
end
I think that might be a little clearer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I get the instinct to match the pattern of the other cases, which do assignment in the check, but that's sometimes frowned upon in itself, so doing it here where there is no body to the check seems even less good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I thought I pushed that change up before lunch. Sorry.
# @return [Array<String>] Directory contents that are present but not expected | ||
def stale_contents(recurse, exclusions, whitelist) | ||
fn_match_opts = File::FNM_PATHNAME | File::FNM_DOTMATCH | ||
dirs = self.managed_directories | ||
desireds = self.desired_contents |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't love "desireds" 😛 But I don't have an alternative that isn't really wordy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, it should be "desires" but I didn't want to write a bunch of code about my hidden_desires
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds weird to me too, anyway. Oh well, adjectives can be nouns whenever we want, it's fine :P
… purging When we purge at the environment level we create recurse globs of our Puppetfile managed directories, eg "<environment>/modules/**/*". Previous to this patch we would collect a list of all content in the environment and then match them against this recursive glob using `File.fnmatch?`. This means we could be doing fnmatches against hundreds of thousands of then-filtered results which can become prohibitively slow. This patch removes the usage of Dir.glob to retrieve all files and instead manually recurses into child directories if set to recurse and the child does not match either the purge_exclusions or whitelist globs. However, because we match using a glob something like "<environment>/modules/**/*" won't match "<environment>/modules", but instead will filter each module within "<environment>/modules". In large deployments this may mean filtering over a thousand child files, but will prevent us from recursing into any of the modules themselves. This maintains the post-recursion filtering of "desired_contents" as this is how git-backed, environment maintained files are cleaned.
I think the testing here is fine, looks like we correctly show the hidden dir as "managed" and then proceed to no purge it, and similar tests for other scenarios 👍 |
7c807a3
to
a1a6251
Compare
When we purge at the environment level we create recurse globs of our
Puppetfile managed directories, eg "/modules/**/*".
Previous to this patch we would collect a list of all content in the
environment and then match them against this recursive glob using
File.fnmatch?
. This means we could be doing fnmatches against hundredsof thousands of then-filtered results which can become prohibitively
slow.
This patch removes the usage of Dir.glob to retrieve all files and
instead manually recurses into child directories if set to recurse and
the child does not match either the purge_exclusions or whitelist globs.
However, because we match using a glob something like
"/modules/**/*" won't match "/modules", but
instead will filter each module within "/modules". In large
deployments this may mean filtering over a thousand child files, but
will prevent us from recursing into any of the modules themselves.
This maintains the post-recursion filtering of "desired_contents" as
this is how git-backed, environment maintained files are cleaned.
Please add all notable changes to the "Unreleased" section of the CHANGELOG in the format: