Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(CODEMGMT-1299) Do not recurse into recursive globbed exclusions when purging #1178

Merged
merged 1 commit into from
Jul 7, 2021

Conversation

justinstoller
Copy link
Member

When we purge at the environment level we create recurse globs of our
Puppetfile managed directories, eg "/modules/**/*".

Previous to this patch we would collect a list of all content in the
environment and then match them against this recursive glob using
File.fnmatch?. This means we could be doing fnmatches against hundreds
of thousands of then-filtered results which can become prohibitively
slow.

This patch removes the usage of Dir.glob to retrieve all files and
instead manually recurses into child directories if set to recurse and
the child does not match either the purge_exclusions or whitelist globs.
However, because we match using a glob something like
"/modules/**/*" won't match "/modules", but
instead will filter each module within "/modules". In large
deployments this may mean filtering over a thousand child files, but
will prevent us from recursing into any of the modules themselves.

This maintains the post-recursion filtering of "desired_contents" as
this is how git-backed, environment maintained files are cleaned.

Please add all notable changes to the "Unreleased" section of the CHANGELOG in the format:

- (JIRA ticket) Summary of changes. [Issue or PR #](link to issue or PR)

@justinstoller justinstoller requested a review from a team July 2, 2021 06:06
@justinstoller
Copy link
Member Author

justinstoller commented Jul 2, 2021

I still need to functionally test this against a painfully large Puppetfile, but would love feedback on the approach. Also, I'm out Friday, so if someone wants to take this over, I'm fine with that.

@justinstoller
Copy link
Member Author

Note: I thought we removed usage of whitelist throughout our code, preferring allowlist instead. However, it looks like we just changed the external facing settings? Given that I extended our usage of whitelist to be consistent with the existing code. I assume if we want to change the internal terminology it needs another ticket?

@justinstoller justinstoller changed the title (maint) Do not recurse into recursive globbed exclusions when purging (CODEMGMT-1299) Do not recurse into recursive globbed exclusions when purging Jul 2, 2021
@justinstoller
Copy link
Member Author

And now with the ticket number in the commit message

@Magisus Magisus self-assigned this Jul 2, 2021
Copy link
Collaborator

@Magisus Magisus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ticket to remove whitelist etc. in the code base is https://tickets.puppetlabs.com/browse/RK-368.

else
child.to_s
end
end.compact
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I can tell, we should never get nils in here, but maybe I'm missing something?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, good catch. There was a version that did and needed this but I updated it without removing the additional compaction.

Copy link
Collaborator

@Magisus Magisus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks good! Hopefully it speeds things up. If I have time today, I'll try it with a big Puppetfile.

@Magisus
Copy link
Collaborator

Magisus commented Jul 3, 2021

Started testing this, but don't have "after" numbers yet.

@Magisus
Copy link
Collaborator

Magisus commented Jul 6, 2021

I think there's a bug in this, it seems to have cleaned up something it shouldn't have...

@justinstoller
Copy link
Member Author

This does speed things up considerably, in my testing a run that took a little over 7 minutes down to 8 seconds.
I've also reproduced the issue Maggie saw. IIUC, it arises from the refactoring around desired contents which are treated differently than exclusions. Looking further into it.

@justinstoller justinstoller force-pushed the CODEMGMT-1299 branch 2 times, most recently from 0e4d7f7 to 2908986 Compare July 7, 2021 04:50
# Directory must start with a dot and then have a non-dot character, so the *nix
# special directories '.', and '..' are invalid. Matching path must have the
# dot-prepended segment as an interior segment (not at the end of the path).
HIDDEN_SUBPATH = Regexp.new("#{File::SEPARATOR}\\.[^.]+#{File::SEPARATOR}.*")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does FNM_DOTMATCH not cover this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The difference in behavior stemmed from them not passing FNM_DOTMATCH into Dir.glob previously. I've updated the code this morning, which I think is less convoluted. Let me know what you think!

@justinstoller
Copy link
Member Author

Testing this again last night I got times of 3:40 purging w/o this patch & 3 seconds with it. So similar percent improvement but smaller total times, I guess based on what my disk was otherwise doing?

I've added some test fixtures that should cause the existing tests to flex the fix I've added. Let me know if I should add more. I haven't manually verified with my refactor this morning, doing so now. Would love it if you could manually verify with your reproducer as well, Maggie.

@Magisus
Copy link
Collaborator

Magisus commented Jul 7, 2021

Will do!

@justinstoller
Copy link
Member Author

Things are looking good locally with my scripts against the last refactor!

Copy link
Collaborator

@Magisus Magisus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In an environment that is almost all module content (https://github.com/Magisus/control-repo/tree/big_boy), this reduces non-initial deploy times from ~50s to ~17s 👍 The bug I mentioned previously is gone.

logger.debug2 _("Not purging %{path} due to internal exclusion match: %{exclusion_match}") % {path: path, exclusion_match: exclusion_match}
elsif allowlist_match = allowed_globs.find { |allowed| matches?(allowed, path) }
logger.debug _("Not purging %{path} due to whitelist match: %{allowlist_match}") % {path: path, allowlist_match: allowlist_match}
elsif desired_match = desireds_not_to_recurse_into.grep(path).first
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this just be

else
  desired_match = desireds ...
end

I think that might be a little clearer

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get the instinct to match the pattern of the other cases, which do assignment in the check, but that's sometimes frowned upon in itself, so doing it here where there is no body to the check seems even less good.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I thought I pushed that change up before lunch. Sorry.

# @return [Array<String>] Directory contents that are present but not expected
def stale_contents(recurse, exclusions, whitelist)
fn_match_opts = File::FNM_PATHNAME | File::FNM_DOTMATCH
dirs = self.managed_directories
desireds = self.desired_contents
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't love "desireds" 😛 But I don't have an alternative that isn't really wordy.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, it should be "desires" but I didn't want to write a bunch of code about my hidden_desires.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds weird to me too, anyway. Oh well, adjectives can be nouns whenever we want, it's fine :P

… purging

When we purge at the environment level we create recurse globs of our
Puppetfile managed directories, eg "<environment>/modules/**/*".

Previous to this patch we would collect a list of all content in the
environment and then match them against this recursive glob using
`File.fnmatch?`. This means we could be doing fnmatches against hundreds
of thousands of then-filtered results which can become prohibitively
slow.

This patch removes the usage of Dir.glob to retrieve all files and
instead manually recurses into child directories if set to recurse and
the child does not match either the purge_exclusions or whitelist globs.
However, because we match using a glob something like
"<environment>/modules/**/*" won't match "<environment>/modules", but
instead will filter each module within "<environment>/modules". In large
deployments this may mean filtering over a thousand child files, but
will prevent us from recursing into any of the modules themselves.

This maintains the post-recursion filtering of "desired_contents" as
this is how git-backed, environment maintained files are cleaned.
@Magisus
Copy link
Collaborator

Magisus commented Jul 7, 2021

I think the testing here is fine, looks like we correctly show the hidden dir as "managed" and then proceed to no purge it, and similar tests for other scenarios 👍

@Magisus Magisus merged commit fedc81a into puppetlabs:main Jul 7, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants