-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
perf: improve pack performance, implement tree pruning #21
Conversation
On a large repository (2^17 files), this plus an appropriate `.terraformignore` improves performance from ~60 seconds to ~3 seconds.
@@ -149,14 +149,21 @@ func packWalkFn(root, src, dst string, tarW *tar.Writer, meta *Meta, dereference | |||
return nil | |||
} | |||
|
|||
if m := matchIgnoreRule(subpath, ignoreRules); m { | |||
return nil | |||
if m, r := matchIgnoreRule(subpath, ignoreRules); m && !(r.val == "**/*" && info.IsDir()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not terribly happy about having to include **/*
in here, but it makes it possible to write ignorefiles like:
.git/
node_modules/
*
!*.tf
!*.json
!*.hcl
and get only the tf
, json
and hcl
files. Without this, the *
matches all directories and prevents recursion into anything that isn't explicitly allowed.
match, _ := rule.match(path) | ||
|
||
if match { | ||
matched = !rule.excluded | ||
if rule.excluded { | ||
return false, rule |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This precedence matches git
's behavior, in that git
will see !
-prefixed lines as overriding other inclusive rules.
@@ -200,9 +206,13 @@ var defaultExclusions = []rule{ | |||
excluded: false, | |||
}, | |||
{ | |||
val: filepath.Join("**", ".terraform", "**"), | |||
val: filepath.Join("**", ".terraform", "?**"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In order to actually discovery .terraform/modules
, we have to explicitly allow recursion into .terraform
but not anything under it.
for idx := range rules { | ||
rule := &rules[idx] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i don't know if there's an idiomatic go way to express this, but this is a pretty significant performance improvement
@brandonc is there anything concerning you in this PR? |
@felipeplets The package has changed quite a bit since this PR was developed so it needs to be revisited. Aside from that, my main concern with this PR is the seemingly magic rule Imagine trying to prevent recursion within a large directory:
These kinds of rules works today but would stop working based on my understanding! This is evident in the special This suggestion does make me wonder if the rules could be defined by the parser such that the directory walk could be short-circuited if there are no exclusions defined that would pertain to an ignored directory. Character classes and globbing patterns make this a little less than simple to determine. I could also imagine using simplified matching if no exclusion rules exist, but right now there's an exclusion rule included by default. I think this is similar to how gitignore optimizes for many usecases. |
I have created a second proposal which can optimize the recursive descent if absolutely no exclusion rules follow a matching .terraformignore rule on a subdirectory #50 |
@skeggse @felipeplets I merged #50 (but have not yet integrated it with Terraform). It addressed the problem a little more narrowly but avoids special cases. The bottom line is this: In the future, if you must use negation rules in your .terraformignore file, place them as high as possible on the list of rules. I was able to optimize for the situation where a directory can be fully ignored if no negation rules follow. Consider:
With the negated rule present, .terraformignore cannot rule out the entire
Now .terraformignore can avoid descending into node_modules since there are no rules that could possibly include files that follow it. |
On a large repository (2^17 files, mostly ignored by git), this plus an appropriate
.terraformignore
improves performance from ~60 seconds to ~3 seconds.Fixes #19 and #20.