Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Megalinter 7.x _REGEX path handling seems not fully compatible with 6.x #2744

Closed
pfiaux opened this issue Jun 13, 2023 · 25 comments · Fixed by #2917
Closed

Megalinter 7.x _REGEX path handling seems not fully compatible with 6.x #2744

pfiaux opened this issue Jun 13, 2023 · 25 comments · Fixed by #2917
Labels
bug Something isn't working

Comments

@pfiaux
Copy link

pfiaux commented Jun 13, 2023

Describe the bug
We're using regex to limit which paths specific linters run on (monorepo setting):

teamA/
teamB/
teamC/
scripts/

So our linter configs might look like this:

GO_REVIVE_FILTER_REGEX_INCLUDE: ^(teamA|teamC)
PYTHON_FLAKE8_FILTER_REGEX_INCLUDE: ^(teamB|scripts)
JAVA_CHECKSTYLE_REGEX_INCLUDE: ^(teamA|teamB)

We use ^ in front to limit it to root folders, for example:

  • scripts/* files will be covered by the python linter
  • teamA/scripts/* shall not be covered by the python linter

To Reproduce
Assuming there are go file that would trigger a revive lint errors in:

  • test/guilty.go
  • teamA/test/guilty.go
  1. Set GO_REVIVE_FILTER_REGEX_INCLUDE: ^(test)
  2. Run linter for both files test/guilty.go teamA/test/guilty.go

For 6.22 only test/guilty.go shall remain after running the filters.

For 7.x no files remain after running the filters.

Expected behavior
The ability to use to filter root of workspace level folders by linter in v7,
ideally via _FILTER_REGEX_INCLUDE to be backward compatible with v6.
(another solution that achieves the same goal would also work)

Additional context
We're running mega-linter in ci (using the image with everything), and passing it a list of files to lint
determined by our pipeline.

I feel this could be due to the switch from relative paths to absolute paths, which would cause the ^teamA/... to not match if the path is /tmp/.../teamA/...

@pfiaux pfiaux added the bug Something isn't working label Jun 13, 2023
@nvuillam
Copy link
Member

@pfiaux you're right, it is probably related to relative paths management

Did you try to send the list of files with relative path format ?

Best regards

@pfiaux
Copy link
Author

pfiaux commented Jun 19, 2023

I just checked, looks like I'm already sending the files with relative paths in the config:

MEGALINTER_FILES_TO_LINT:
- teamA/mega-linter.yaml
- scripts/ci-lint.sh
- teamB/product1/main.go

The outputs from megalinter such as MegaLinter now collects the files to analyse or Kept files before applying linter filters: seem to already convert to absolute paths.

@pfiaux
Copy link
Author

pfiaux commented Jun 19, 2023

I'm also fine with another solution even if it's not compatible with the v6 config, as long as we can filter by base path :)

Sticking an absolute path in the config would be problematic tho, for us the absolute path isn't fixed and dependents on which linter instance will pickup the job (we run multiple replicas).

@nvuillam
Copy link
Member

I agree that it's an issue...

Would you mind sharing the full log with LOG_LEVEL: DEBUG in .mega-linter.yml ?

If sensitive you can eventually send it to me to nicolas.vuillamy@gmail.com

@github-actions
Copy link
Contributor

This issue has been automatically marked as stale because it has not had recent activity.
It will be closed in 14 days if no further activity occurs.
Thank you for your contributions.

If you think this issue should stay open, please remove the O: stale 🤖 label or comment on the issue.

@github-actions github-actions bot added the O: stale 🤖 This issue or pull request is stale, it will be closed if there is no activity label Jul 21, 2023
@pfiaux
Copy link
Author

pfiaux commented Jul 24, 2023

@nvuillam did you get my email / have a chance to take a look? Let me know if there's something I can look into from my side (we're stuck on v6 until we find a workaround/solution)

@github-actions github-actions bot removed the O: stale 🤖 This issue or pull request is stale, it will be closed if there is no activity label Jul 25, 2023
@pfiaux
Copy link
Author

pfiaux commented Jul 25, 2023

I retested this with 7.2.0 today, results are still the same:

MegaLinter now collects the files to analyse
All found files before filtering:
- /tmp/lint/linter-build-0/repo/ci/mega-linter.yaml
- /tmp/lint/linter-build-0/repo/tools/notifier/cmd/root.go
- File extensions: , .bash, .bazel, .bzl, .dash, .go, .java, .ksh, .mk, .pl, .pm, .py, .sh, .t, .yaml, .yml
- File names (regex): BUILD, Dockerfile, Makefile, WORKSPACE

...

Kept [2] files on [2] found files
Kept files before applying linter filters:
- /tmp/lint/linter-build-0/repo/ci/mega-linter.yaml
- /tmp/lint/linter-build-0/repo/tools/notifier/cmd/root.go

...

[Filters] {'name': 'GO_REVIVE', 'filter_regex_include': '^(examples|ci|tools)', 'filter_regex_exclude_descriptor': None, 'filter_regex_exclude_linter': '^(tools/test|tools/other)', 'files_sub_directory': None, 'lint_all_files': False, 'lint_all_other_linters_files': False, 'file_extensions': ['.go'], 'file_names_regex': [], 'file_names_not_ends_with': [], 'file_contains_regex': [], 'file_contains_regex_extensions': []}
GO_REVIVE linter kept 0 files after applying linter filters:
[Filters] {'name': 'JAVA_CHECKSTYLE',...

@pfiaux
Copy link
Author

pfiaux commented Jul 25, 2023

The filters for linters are applied in collect_filesfrom what I can tell.

In utils.filter_files it would seem the regex is run on both absolute and relative files:
https://github.com/oxsecurity/megalinter/blob/main/megalinter/utils.py#L154-L158

In my case it's using files_to_lint as input:
https://github.com/oxsecurity/megalinter/blob/main/megalinter/MegaLinter.py#L633C6-L633C6
I could be wrong but it looks like the files are already made absolute paths before being passed to utils.filter_files

Would it work if collect_files() used as is paths for files_to_lint? Or pass the raw files_to_lint to utils.filter_files() which should handle relative/absolute path?

@nvuillam
Copy link
Member

@pfiaux I apologize for the delay, the summer was full and not a lot of time for advanced issues :/

Any change you can reproduce the issue locally ?

Or what if I add a custom debug variable that would display the values of variables in utils.py when filtering ?

@pfiaux
Copy link
Author

pfiaux commented Aug 22, 2023

Well I can reproduce it in our CI, locally I'm on macOS with the virtualization performance it would take me ages to run it locally.

I will bump to v7.3.0 and try again but from the changelog I don't think you change the paths so I expect I can reproduce it. From digging into the code I suspect the linter filters are passed absolute paths and so their relative path handling isn't leveraged at all.

A debug statement might help confirm that, I wonder if there's a faster way to avoid too many loops between you add something and I test it. I need to check if I can add a debug myself or even maybe attempt a patch.

@pfiaux
Copy link
Author

pfiaux commented Aug 23, 2023

I can confirm still happens in v7.3.0.

@Kurt-von-Laven
Copy link
Collaborator

It is definitely possible to edit the MegaLinter Python code directly in a MegaLinter Docker container and iterate more quickly that way.

@pfiaux
Copy link
Author

pfiaux commented Aug 25, 2023

I'm having trouble testing in the large image in CI (nonroot image and other CI hardening) but I was able to reproduce it with a small gist locally.

I suggest the following change as it seems to make it work from the perspective of the filtering:

diff --git a/megalinter/MegaLinter.py b/megalinter/MegaLinter.py
index 8bcb894d5e..261394bb0d 100644
--- a/megalinter/MegaLinter.py
+++ b/megalinter/MegaLinter.py
@@ -630,7 +630,7 @@ class Megalinter:
             all_files = list()
             for file_to_lint in files_to_lint:
                 if os.path.isfile(self.workspace + os.path.sep + file_to_lint):
-                    all_files += [self.workspace + os.path.sep + file_to_lint]
+                    all_files += [file_to_lint]
                 else:
                     logging.warning(
                         "[File listing] Input file "

See here for a small example that should be easy to reproduce locally:
https://gist.github.com/pfiaux/5b2d6a37831217816551ff74993caf48

@nvuillam
Copy link
Member

This part has been tricky when switching from absolute to related paths in files... i'm afraid to break everything ^^

What if I add a config variable that would activate such behavior only if set to true ?

@pfiaux
Copy link
Author

pfiaux commented Aug 26, 2023

I figured as it sits pretty early in the processing the impact could have a large blast radius. A config option would work for our use case. If you're willing to add that we'll be very happy to upgrade to V7 and to provide feedback if we run into other issues because of that option.

The other alternative I thought of of would be to keep a list of both, say:

  • all_files_absolute (or keep that one named all_files for compatibility)
  • all_files_relative
    Then in utils.py the filter could check the regex against both. But that's probably messier than a config option and keeping a single list of files. So it's not exactly better.

@nvuillam
Copy link
Member

nvuillam commented Aug 26, 2023

@pfiaux I think I just understood your use case, which is not so usual :)

Do you confirm you send yourself the list of files using variable MEGALINTER_FILES_TO_LINT ?

@pfiaux
Copy link
Author

pfiaux commented Aug 26, 2023

Correct, we set MEGALINTER_FILES_TO_LINT via the config.
I detailed it in #2701 (comment), our pipeline runs our own change detection which in some cases differs from the way megalinter does it, so for consistency we use our list of files.

@nvuillam
Copy link
Member

@pfiaux I think I just forgot to take this case in account when switching to relative file paths :) ( #1877 )

So I think no need for a specific workaround variable... it's just a bug :D

I'll post here once merged so you'll be able to confirm with beta version that it's ok :)

nvuillam added a commit that referenced this issue Aug 26, 2023
* Fix v7 issue when using MEGALINTER_FILES_TO_LINT

Fixes #2744

* cspell
@nvuillam
Copy link
Member

@pfiaux please can you check with beta version?

@pfiaux
Copy link
Author

pfiaux commented Aug 28, 2023

Rolling it out now, did a quick test and it was working as expected so I think we're good. Thanks for the help 🙇

@nvuillam
Copy link
Member

@pfiaux great, so you'll be able to upgrade to v7 at the next release :) ( should be in 2 weeks maximum )

@ashokm
Copy link
Contributor

ashokm commented Sep 8, 2023

Is there an ETA on when the next release is?
I'm looking forward to the next release after the current v7.3.0 as we're also seeing that the CLOUDFORMATION_CFN_LINT (cfn-lint) linter keeps including 'cdk.out/' files in the linting, even though this directory is in the .gitignore file and should be ignored.
We did not see this problem in v6.

@nvuillam
Copy link
Member

nvuillam commented Sep 8, 2023

@ashokm 2 weeks maximum :)

@ashokm
Copy link
Contributor

ashokm commented Sep 8, 2023

I just tested against beta and I get the same issue as mentioned above, with the CLOUDFORMATION_CFN_LINT (cfn-lint) linter still including 'cdk.out/' files in the linting, even though this directory is in the .gitignore file and should be ignored :/

@nvuillam
Copy link
Member

nvuillam commented Sep 8, 2023

@ashokm if this is another problem, you should declare it in another issue :)

sanmai-NL pushed a commit to sanmai-NL/megalinter that referenced this issue Oct 4, 2023
* Fix v7 issue when using MEGALINTER_FILES_TO_LINT

Fixes oxsecurity#2744

* cspell
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants