-
Notifications
You must be signed in to change notification settings - Fork 1.6k
Fix #14021 (Better path matching) #7645
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
f29f42d to
572dedf
Compare
|
See #7540. I had some issues with the behavior which I have not outlined in that PR yet because the code contains a bug which made the tests all pass although that should not have. Also |
test/cli/proj2_test.py
Outdated
| proj_dir = tmp_path / 'proj2' | ||
| shutil.copytree(__proj_dir, proj_dir) | ||
| create_gui_project_file(os.path.join(tmp_path, 'test.cppcheck'), root_path='proj2', import_project='proj2/proj2.sln', exclude_paths=['b']) | ||
| create_gui_project_file(os.path.join(tmp_path, 'test.cppcheck'), root_path='proj2', import_project='proj2/proj2.sln', exclude_paths=['b/']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really dislike that we need to rely on some options ending with / so they are treated as a path. That also complicated things while working on my PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, I think the path matching is currently trying to do too much which makes the syntax weird and difficult to implement. Ideally I would like it to make it simpler but I've been hesitant to do so because I don't know what might break because of it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally I would like it to make it simpler but I've been hesitant to do so because I don't know what might break because of it.
I dislike it also.. but it is good imho that you try to limit the effects for now.. it's already a big PR anyway.
|
See also https://trac.cppcheck.net/ticket/12268. And if you work on a ticket please assign yourself to it in Trac. |
I only noticed you were also working on PathMatch after I made this, sorry about that. I agree the standard regex is slow and I'd rather use pcre, but I didn't want to make it a hard dependency just for this. For the common use case of PathMatch, which is matching maybe a few hundred files or so against a handful of patterns, the performance penalty of using |
No problem. I started working on it thinking it would a low-hanging fruit which wasn't the case. And I forgot to tag it with the ticket. And the one you worked on was no properly triaged.
Right, PCRE is currently not a hard dependency. So PCRE2 might not be a good replacement.
That might depend on the scenario. If there might be thousands of files it could be slow. It has been ages since I tried using
I will try to get the regex refactoring in. The first step is awkward but with some help we might get it into better shape fast. |
|
With a glob implementation inline in That is probably where my performance concerns might have come from since the latter is also used outside of path handling stuff. |
I mostly agree with this, for the three places where matchglob is currently used for something other than paths it makes sense to keep it that way, all other uses of matchglob should be changed to use PathMatch instead. |
I agree since it makes things more explicit. But I would still prefer if we only have a single implementation of the glob logic. |
|
I've updated this to use PathMatch everywhere a pattern is matched against a file, so that file matching is consistent. While doing so I found that there are places where the performance of PathMatch is actually quite important (especially in suppressions), so I've opted to use a hand-written matcher instead of regex. Another noteworthy change is that directories can now be matched without a trailing '/'. This has the awkward side-effect that 'foo.cpp' matches 'foo.cpp/bar.cpp' when 'foo.cpp' happens to be a directory, and 'foo/' matches 'foo' even if foo is not a directory (enforcing directories for such patterns would require a file system lookup), but I still think it's preferable. Regarding matchglob, I think it makes sense to have a more general wildcard implementation in matchglob, and a separate implementation in PathMatch because I'm not convinced that the rules for path globs should be the same as for matching general wildcards in error id's and such. Path globs need to match on path component boundaries, and '*' and '?' do not match path separators. For matching path separators as well, '**' is used instead. These changes fix #12821, #12268, and #13997. Possibly related: #13983. |
test/cli/proj2_test.py
Outdated
| proj_dir = tmp_path / 'proj2' | ||
| shutil.copytree(__proj_dir, proj_dir) | ||
| create_gui_project_file(os.path.join(tmp_path, 'test.cppcheck'), root_path='proj2', import_project='proj2/proj2.sln', exclude_paths=['b']) | ||
| create_gui_project_file(os.path.join(tmp_path, 'test.cppcheck'), root_path='proj2', import_project='proj2/proj2.sln', exclude_paths=['b/']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally I would like it to make it simpler but I've been hesitant to do so because I don't know what might break because of it.
I dislike it also.. but it is good imho that you try to limit the effects for now.. it's already a big PR anyway.
test/testcmdlineparser.cpp
Outdated
| ASSERT(!fillSettingsFromArgs(argv)); | ||
| ASSERT_EQUALS(1, parser->getIgnoredPaths().size()); | ||
| ASSERT_EQUALS("src/file.cpp", parser->getIgnoredPaths()[0]); | ||
| ASSERT_EQUALS("src\\file.cpp", parser->getIgnoredPaths()[0]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Imho it is preferable to always use / as path separator internally. we don't have to ensure that backslash is handled properly everywhere then. would it cause problems to keep this behavior?
|
The ticket and PR is called "GUI: exclude folders in imported project" I think this PR does so much more. Either I think we should create a new ticket that better describes the work in this PR. Or we should rename the ticket+PR.. |
|
Here's an overview of the current state of this PR. Since the scope has grown beyond the original ticket I think it'd be appropriate to open a new ticket. Summary of changes:
New PathMatch behavior:All match patterns and paths are canonicalized during matching. Absolute patterns are matched from the start until a path component boundary in the canonicalized path. Relative patterns must be explicit by having their first path component be Other patterns, not explicitly absolute or relative, are matched between any two path component boundaries in the canonicalized path. This is working as before, except matching the middle of a path previously required a trailing slash. Also, the matching is now always done on the canonical absolute path of the specified path name, where previously a match could depend on whether a filename was given as relative or absolute (i.e. Patterns can use globs: |
|
I changed my mind; take the manual in a separate PR. The manual needs to describe the path matching much better. |
|
|
It's aggressive to merge this now so soon before the release but the old behavior is biting customers; I want to have this in 2.18. |
#7645 added support for `**` matching path separators, but didn't update `isValidGlobPattern` for `--suppress` to enable it. This also updates test cases and makes `isValidGlobPattern` more correct and robust overall. In particular multiple consecutive `?` for a glob is absolutely valid and frequently useful to ensure exactly or at least some number of characters. --------- Co-authored-by: Joel Johnson <mrjoel@stellarscience.com> Co-authored-by: Daniel Marjamäki <daniel.marjamaki@gmail.com>
This is an addendum to #7645 that adds the ability to specify directory-only matching in PathMatch patterns with a trailing slash. I originally didn't add this because I wanted to keep PathMatch purely syntactic, and this feature seemed like it would require filesystem calls in the PathMatch code. This PR solves that problem by lifting the responsibility of file mode checking to the caller, thus keeping PathMatch purely syntactic while still supporting directory-only matching. Previously `/test/foo/` would match `/test/foo` even if `foo` is a regular file. With this change the caller can specify the file mode of the file named by the provided path, and `/test/foo` will only match when the file mode is directory. The semantics of patterns that do not have a trailing slash is unchanged, e.g. `/test/foo` still matches `/test/foo/` and `/test/foo/bar.cpp`, regardless of the file mode. Also adds some more tests.



No description provided.