Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf: improve pack performance, implement tree pruning #21

Closed
wants to merge 1 commit into from

Conversation

skeggse
Copy link

@skeggse skeggse commented Jul 21, 2021

On a large repository (2^17 files, mostly ignored by git), this plus an appropriate .terraformignore improves performance from ~60 seconds to ~3 seconds.

Fixes #19 and #20.

On a large repository (2^17 files), this plus an appropriate
`.terraformignore` improves performance from ~60 seconds to ~3 seconds.
@hashicorp-cla
Copy link

hashicorp-cla commented Jul 21, 2021

CLA assistant check
All committers have signed the CLA.

@@ -149,14 +149,21 @@ func packWalkFn(root, src, dst string, tarW *tar.Writer, meta *Meta, dereference
return nil
}

if m := matchIgnoreRule(subpath, ignoreRules); m {
return nil
if m, r := matchIgnoreRule(subpath, ignoreRules); m && !(r.val == "**/*" && info.IsDir()) {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not terribly happy about having to include **/* in here, but it makes it possible to write ignorefiles like:

.git/
node_modules/
*
!*.tf
!*.json
!*.hcl

and get only the tf, json and hcl files. Without this, the * matches all directories and prevents recursion into anything that isn't explicitly allowed.

match, _ := rule.match(path)

if match {
matched = !rule.excluded
if rule.excluded {
return false, rule
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This precedence matches git's behavior, in that git will see !-prefixed lines as overriding other inclusive rules.

@@ -200,9 +206,13 @@ var defaultExclusions = []rule{
excluded: false,
},
{
val: filepath.Join("**", ".terraform", "**"),
val: filepath.Join("**", ".terraform", "?**"),
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In order to actually discovery .terraform/modules, we have to explicitly allow recursion into .terraform but not anything under it.

Comment on lines +80 to +81
for idx := range rules {
rule := &rules[idx]
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't know if there's an idiomatic go way to express this, but this is a pretty significant performance improvement

@felipeplets
Copy link

@brandonc is there anything concerning you in this PR?

@brandonc
Copy link
Contributor

brandonc commented Nov 22, 2023

@felipeplets The package has changed quite a bit since this PR was developed so it needs to be revisited. Aside from that, my main concern with this PR is the seemingly magic rule **/*

Imagine trying to prevent recursion within a large directory:

**/*
!src/this/one/folder

These kinds of rules works today but would stop working based on my understanding! This is evident in the special .terraform/?** default rule that had to be carved out. I'm afraid this kind of change would break many existing .terraformignore files that contain ! exclusions.

This suggestion does make me wonder if the rules could be defined by the parser such that the directory walk could be short-circuited if there are no exclusions defined that would pertain to an ignored directory. Character classes and globbing patterns make this a little less than simple to determine. I could also imagine using simplified matching if no exclusion rules exist, but right now there's an exclusion rule included by default. I think this is similar to how gitignore optimizes for many usecases.

@brandonc
Copy link
Contributor

brandonc commented Nov 23, 2023

I have created a second proposal which can optimize the recursive descent if absolutely no exclusion rules follow a matching .terraformignore rule on a subdirectory #50

@brandonc
Copy link
Contributor

brandonc commented Dec 4, 2023

@skeggse @felipeplets I merged #50 (but have not yet integrated it with Terraform). It addressed the problem a little more narrowly but avoids special cases. The bottom line is this: In the future, if you must use negation rules in your .terraformignore file, place them as high as possible on the list of rules. I was able to optimize for the situation where a directory can be fully ignored if no negation rules follow. Consider:

node_modules/
logs/**/*
!logs/production/**/*

With the negated rule present, .terraformignore cannot rule out the entire node_modules directory because there might be a "logs/production/" directory inside there somewhere. But, since rules are designed to be allowed to override previous rules, if you move that negation above node_modules, it can now be safely ignored.

logs/**/*
!logs/production/**/*
node_modules/

Now .terraformignore can avoid descending into node_modules since there are no rules that could possibly include files that follow it.

@brandonc brandonc closed this Dec 4, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Avoid entering excluded directories
4 participants