-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
filebeat: expand double wildcards in prospector #3980
filebeat: expand double wildcards in prospector #3980
Conversation
// glob detects the use of "**" and expands it to standard glob patterns up to a max depth | ||
func glob(g string) ([]string, error) { | ||
var globs []string | ||
if filepath.Base(g) == "**" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this mean it only detects **
at the end of the path? Does /foo/**/bar
work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I didn't realize that was also a use case. Let me fix it, thanks
I'm 👍 with this approach, since I think it's the simplest implementation that gets us what most people want. I'm on the fence about making the 16 configurable or not, but I guess it would be good to be able to set it to 0 in which case the feature is disabled (for BWC). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the approach that by default we have the old behaviour.
This PR should also have changelog entry and docs.
@@ -112,17 +114,69 @@ func (l *Log) Run() { | |||
} | |||
} | |||
|
|||
// globPattern detects the use of "**" and expands it to standard glob patterns up to a max depth | |||
func (l *Log) globPatterns(pattern string) ([]string, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we could move this to a separate package and pass the max depth to the function. Reason is like this we could also use it in other places and it does not further "bloat" the prospector code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I like that
filebeat/prospector/config.go
Outdated
Module string `config:"_module_name"` // hidden option to set the module name | ||
Fileset string `config:"_fileset_name"` // hidden option to set the fileset name | ||
Processors processors.PluginConfig `config:"processors"` | ||
DoubleWildcardMaxDepth uint8 `config:"double_wildcard_max_depth"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Where is the default set for this one?
- The config name seems to be a bit long
I would like to have this feature disabled by default and people who need it enable it specifically. This also makes it easier for us to debug it in the future in case of errors, because we know someone enabled it.
I like the idea of defining the max depth as in case someone just needs one or two levels, this would have positive performance implications.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just assumed the golang type default (0) is used as fallback? Totally agree on the config name: how about double_star_pattern_depth
? Maybe not shorter but simpler..
patternList := strings.Split(pattern, "/") | ||
for i, dir := range patternList { | ||
if len(patterns) > 0 { | ||
if dir == "**" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That means test/**/hello/**/abc.log
is not going to work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we were discussing with @tsg and if we allow that the complexity will escalate quickly
pattern string | ||
expectedMatches []string | ||
}{ | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add tests which have a deeper depth like 4 or 5?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure
} | ||
var patterns []string | ||
isAbs := filepath.IsAbs(pattern) | ||
patternList := strings.Split(pattern, "/") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what about windows users having \
in their path?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, I should use os.PathSeparator
instead
} | ||
} | ||
if len(patterns) == 0 { | ||
patterns = []string{pattern} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we have any kind of error if pattern is empty?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me check on that, I guess we should
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok so right now we're not doing any validation on the pattern, with this change that wouldn't change wrt an empty pattern. Do we want to start validating those?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd say no, since for example, I use ""
in the filebeat modules to disable a fileset on a particular version.
{ | ||
"foo/**/bazz", | ||
[]string{}, | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some more tests to add:
- double
/
likefoo//bar
foo\bar
foo\**\bar
c:\foo\**\bar
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I couldn't add the last test since I end up joining all the paths to a root that has a volume in it on windows, via ioutil.TempDir()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I created a new test just for cases like this
filebeat/input/file/glob.go
Outdated
if len(patterns) > 0 { | ||
if dir == "**" { | ||
err := fmt.Sprintf("glob(%s) failed: cannot specify multiple ** within a pattern", pattern) | ||
logp.Err(err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't recommend logging errors and returning them. Do one or the other. It's probably best to let the caller make the decision on how to handle the error (which might be to log it).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
===== double_star_pattern_depth | ||
|
||
The maximum depth for `**` patterns. This setting is disabled by default, i.e., | ||
`**` is not expanded. When it's set to a positive value, a single `**` per |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the limit in single **
per path? With a bit of recursion in the expansion logic I think this could be handled elegantly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What worried me and @tsg was how the complexity could easily escalate here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was also worried about the number of generated globs. With a max depth of 16, and using **
twice, you need to generate 16*16 patterns, right? If you use **
three times that's already 4096 globs to evaluate, which might be problematic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is a bit mitigated by the fact that we default to zero (disabled) now, but I'd still wait for an actual request/use case for it before opening up to multiple **.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@7AC See my comment above. I'm ok with having it fixed at first but that we take into account now, it could appear in the future.
filebeat/input/file/glob.go
Outdated
} | ||
var patterns []string | ||
isAbs := filepath.IsAbs(pattern) | ||
patternList := strings.Split(pattern, string(os.PathSeparator)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
os.PathSeparator
might not be safe. windows users sometimes use /
for paths (if they didn't get \
working due to not using single-quote), which is working with go as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah interesting, didn't know that. So the windows case is basically a superset, let me handle that too.
filebeat/_meta/common.full.p2.yml
Outdated
@@ -82,6 +82,11 @@ filebeat.prospectors: | |||
# This is especially useful for multiline log messages which can get large. | |||
#max_bytes: 10485760 | |||
|
|||
# Maximum depth for "**" patterns. This setting is disabled by default, i.e., |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I somehow expect we will add more config options in the future. So I would suggest to make this a namespace with different config options below. Suggestion:
recursive_glob.enabled: false # set this to true to get the defaults for the below options
recursive_glob.depth: 16 # max depth of expanding **
recursive_glob.max_count: 1 # max number of ** in a single path
This could also solve the discussion about how many patterns we allow per path.
I named it recursive_glob
because that what its doing from my perspective. Not sure what the user would look for when searching for such a config option. I had also recursive_path
because we call it paths
in the config above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I renamed to recursive_glob_depth
for now, I don't think we should encourage users to use more than one per path so the namespace seems overkill to me.
filebeat/prospector/config.go
Outdated
DocumentType: "log", | ||
ScanFrequency: 10 * time.Second, | ||
InputType: cfg.DefaultInputType, | ||
CleanRemoved: true, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happened to all the other default configs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed all the ones that have default values for the golang type (including the new one)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I strongly prefer to have all values in here, even if they are the golang defaults. This makes it very easy (and future proof) to lookup what our defaults are without having to think about what the default in Golang is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
filebeat/filebeat.full.yml
Outdated
@@ -211,6 +211,10 @@ filebeat.prospectors: | |||
# This is especially useful for multiline log messages which can get large. | |||
#max_bytes: 10485760 | |||
|
|||
# Maximum depth for "**" patterns. When it's set to a positive value, "**" | |||
# patterns are expanded up to the specified depth. | |||
#recursive_glob_depth: 16 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assuming we don't want to make it configurable at first, could we go with recursive_glob.enabled
as config option instead? This will allow us to extend it in the future if needed. I'm aware that just recursive_glob: true
would feel more natural at first, but this will prevent us from using it as a namespace in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good
### Recursive glob configuration | ||
|
||
# Expand "**" patterns into regular glob patterns. | ||
#recursive_glob.enabled: true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We normally have the default value in the full config, which is false.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
|
||
Filebeat starts a harvester for each file that it finds under the specified | ||
paths. You can specify one path per line. Each line begins with a dash (-). | ||
|
||
===== recursive_glob | ||
|
||
*`enabled`*:: Enable expanding `**` into recursive glob patterns. With this feature enabled, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we mark this beta first to get some users to test it first? At the same time it rolls out with the 6 alpha release, so we get already some testing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do I do that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can add []beta
and it gets expanded with a message automatically in the docs. But lets leave it out. We can still add it.
matches, err := filepath.Glob(glob) | ||
for _, path := range l.config.Paths { | ||
depth := uint8(0) | ||
if l.config.recursiveGlob.enabled { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I expected to see here recursiveGlob.Enabled()
as this come with ucfg for each "object". Problem is that if we want that we would have to do unpack each time the scan runs which is not a good idea. So this looks good to me.
filebeat/prospector/config.go
Outdated
@@ -45,6 +46,11 @@ type prospectorConfig struct { | |||
Module string `config:"_module_name"` // hidden option to set the module name | |||
Fileset string `config:"_fileset_name"` // hidden option to set the fileset name | |||
Processors processors.PluginConfig `config:"processors"` | |||
recursiveGlob *recursiveGlobConfig `config:"recursive_glob"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not
recursiveGlob bool `config:"recursive_glob.enabled" `
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Could you add a CHANGELOG entry, squash the commits and add a commit message?
Expand double wildcards into standard glob patterns, up to a maximum depth of 8 levels after the wildcard. Resolves #2084
|
||
Filebeat starts a harvester for each file that it finds under the specified | ||
paths. You can specify one path per line. Each line begins with a dash (-). | ||
|
||
===== recursive_glob | ||
|
||
*`enabled`*:: Enable expanding `**` into recursive glob patterns. With this feature enabled, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@7AC This won't render correctly in asciidoc because it will try to evaluate the asterisks before the backticks. Wherever you have asterisks in an example, use plus signs (+) instead of backticks. For example: +**+
. (So replace the backticks with plus signs in the examples.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed it thank you!
|
||
Filebeat starts a harvester for each file that it finds under the specified | ||
paths. You can specify one path per line. Each line begins with a dash (-). | ||
|
||
===== recursive_glob | ||
|
||
*`enabled`*:: Enable expanding +**+ into recursive glob patterns. With this feature enabled, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@7AC I think my explanation of the behavior of plus signs was a little off in my previous comment. The backticks are OK for the doc build, but the content doesn't render for users who preview the file in GitHub. We aren't optimizing for viewing in GitHub, though, so technically you can use the backticks. But you should be consistent in this entire description (either use backticks or plus signs for any example that contains asterisks in the path). The asterisk doesn't get evaluated either way.
|
||
Filebeat starts a harvester for each file that it finds under the specified | ||
paths. You can specify one path per line. Each line begins with a dash (-). | ||
|
||
===== recursive_glob |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@7AC You need to add the section ID above this section or the doc build will fail. So add this to line 50:
[[recursive_glob]]
Is this available in 5.6.3 release? |
Expand double wildcards into standard glob patterns, up to a maximum
depth of 16 levels after the wildcard.
Resolves #2084