feat(php): Support case insensitive function calls and classes #8356
📸 The pytest snapshots changed in your PR.
🚫 The whole benchmark suite is too slow: +10.5% (+1.105 s)
14 benchmarks, 10.5% slower on average. Individual deviations greater than 20% from the baseline are reported. An individual performance degradation of over 30% or a global degradation of over 7% is an error and will block the pull request. See run output for full results ('Show all checks' > 'Tests / semgrep benchmark tests' 'Details').
I think these changes to the Python output are fine. I am pretty sure they are caused by the update to …
The tests that are failing are the ones for which I have an approved pull request in semgrep-rules. I just can't land that PR until those tests pass on develop, which requires that I land this diff first.
@akuhlens Thanks for the awesome work! Just out of curiosity, what were the major benefits of introducing a bit encoding instead of using two boolean record fields? I might be missing some context, but it appears a bit over-engineered to me.
I probably wouldn't have done it that way at first either, but there was some concern on the team that adding more boolean flags to such a prevalent node in the AST might have performance implications. Since those concerns had been raised, I went ahead and provided the optimized representation. As a side note, the language-specific approach in this discussion ended up not working, which is why there is no naive diff.
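To illustrate the trade-off being discussed, here is a minimal OCaml sketch of packing the two flags into the bits of a single `int`. The module `Id_flags`, the cut-down `id_info` record, and the helpers `mark_case_insensitive` and `is_case_insensitive` are hypothetical, not Semgrep's actual definitions. In OCaml every record field occupies a full word, so each extra `bool` field would add a word to every identifier node, whereas here a new flag only claims another bit of the existing word.

```ocaml
(* Hypothetical sketch: pack boolean identifier properties into the bits of
   one int, so adding a flag does not add a field (a word) to every record.
   Names are illustrative, not Semgrep's real API. *)
module Id_flags : sig
  type t
  val empty : t
  val case_insensitive : t
  val hidden : t
  val add : t -> t -> t      (* set a flag *)
  val mem : t -> t -> bool   (* test whether a flag is set *)
end = struct
  type t = int
  let empty = 0
  let case_insensitive = 1 lsl 0   (* bit 0 *)
  let hidden = 1 lsl 1             (* bit 1 *)
  let add flags flag = flags lor flag
  let mem flags flag = (flags land flag) <> 0
end

(* A cut-down, hypothetical id_info: all flags share one mutable word. *)
type id_info = { id_name : string; mutable id_flags : Id_flags.t }

let mark_case_insensitive (info : id_info) : unit =
  info.id_flags <- Id_flags.add info.id_flags Id_flags.case_insensitive

let is_case_insensitive (info : id_info) : bool =
  Id_flags.mem info.id_flags Id_flags.case_insensitive
```

Because an `int` is an unboxed immediate value in OCaml, testing a flag costs one field load plus a `land`, roughly what reading a `bool` field costs; the abstract `Id_flags.t` also keeps callers from manipulating raw bits directly.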
You can update the semgrep-rules submodule in this repo to point to your experimental semgrep-rules, so that we can be sure …
I don't think this works if the metavariable binds to an expression rather than directly to an identifier.
For example, if you have the pattern `$X == $X` and the code `1 + bar() == 1 + Bar()`, I don't think your PR will return a match.
That's because in that case we call `MV.Structural.equal_mvalue a b`, and I don't think we do anything with case sensitivity there.
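The reviewer's point can be reproduced on a toy AST. In this hypothetical sketch (`expr`, `equal_expr`, `lhs`, and `rhs` are illustrative names, not Semgrep's), OCaml's polymorphic structural equality, like a derived `eq`, compares identifier strings byte for byte and therefore treats `1 + bar()` and `1 + Bar()` as different. A hand-written equality that consults a per-identifier case-insensitivity flag treats them as equal. The rule that both identifiers must carry the flag before case folding applies is an assumption of the sketch.

```ocaml
(* Hypothetical mini expression AST standing in for the generic AST.
   Id carries a name and a "case-insensitive" flag. *)
type expr =
  | Int of int
  | Id of string * bool
  | Call of expr * expr list
  | Plus of expr * expr

(* Case-aware equality: identifiers are case-folded only when both sides
   are flagged case-insensitive; everything else compares structurally. *)
let rec equal_expr (a : expr) (b : expr) : bool =
  match a, b with
  | Int x, Int y -> x = y
  | Id (x, ci1), Id (y, ci2) ->
      if ci1 && ci2 then String.lowercase_ascii x = String.lowercase_ascii y
      else String.equal x y
  | Call (f1, args1), Call (f2, args2) ->
      equal_expr f1 f2
      && List.length args1 = List.length args2   (* guards List.for_all2 *)
      && List.for_all2 equal_expr args1 args2
  | Plus (l1, r1), Plus (l2, r2) -> equal_expr l1 l2 && equal_expr r1 r2
  | _ -> false

(* 1 + bar()  vs  1 + Bar(), with the function names flagged. *)
let lhs = Plus (Int 1, Call (Id ("bar", true), []))
let rhs = Plus (Int 1, Call (Id ("Bar", true), []))

(* lhs = rhs (polymorphic equality) is false: it ignores the flag.
   equal_expr lhs rhs is true: it folds case on flagged identifiers. *)
```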
.../test_semgrep_core_parse_error/test_file_parser__failure__error_messages/settings1/error.txt
...s/test_semgrep_core_parse_error/test_file_parser__failure__error_messages/settings1/out.json
To be fair, this is an important improvement over what we had before; maybe making my `$X == $X` example match on complex expressions should be done in a separate PR.
@akuhlens Maybe fix the failing tests, and add a giant TODO somewhere (maybe in the `equal_bind_ast` case) noting that equality between metavariable expressions is still done in a case-sensitive way (because we use the derived `eq`). In theory we should also use `Generic_vs_generic` when we check for equality of metavariables... In any case, this looks like progress over what we had before.
@aryx Sorry, I am just seeing these last two comments. Maybe I need to adjust my notification settings to give GitHub notifications more visibility, or perhaps they were here when I replied to your comments and I just got caught up with the bug you noticed. Either way, I will take steps to make sure it doesn't happen again. I did see your comment in Slack on Monday but have been out sick with COVID until today. I will fix the other things you mentioned and leave a TODO and a commented-out test case for the derived-equality case.
🚫 Benchmark semgrep.bench.dropbox.std is too slow: +48.3% (+0.569 s)
🚫 Benchmark semgrep.bench.0c34.std is too slow: +45.7% (+0.563 s)
🚫 Benchmark semgrep.bench.lodash.std is too slow: +35.7% (+0.619 s)
🚫 Benchmark semgrep.bench.coolMenu.std is too slow: +47.1% (+0.551 s)
🚫 Benchmark semgrep.bench.grpc.std is too slow: +33.9% (+0.561 s)
🚫 The whole benchmark suite is too slow: +27.4% (+1.274 s)
14 benchmarks, 27.4% slower on average. Individual deviations greater than 20% from the baseline are reported. An individual performance degradation of over 30% or a global degradation of over 7% is an error and will block the pull request. See run output for full results ('Show all checks' > 'Tests / semgrep benchmark tests' 'Details').

🚫 Benchmark semgrep.bench.dropbox.std is too slow: +53.6% (+0.650 s)
🚫 Benchmark semgrep.bench.0c34.std is too slow: +50.6% (+0.640 s)
🚫 Benchmark semgrep.bench.lodash.std is too slow: +41.2% (+0.730 s)
🚫 Benchmark semgrep.bench.coolMenu.std is too slow: +53.1% (+0.646 s)
🚫 Benchmark semgrep.bench.grpc.std is too slow: +40.0% (+0.683 s)
🚫 The whole benchmark suite is too slow: +30.6% (+1.306 s)
14 benchmarks, 30.6% slower on average. Individual deviations greater than 20% from the baseline are reported. An individual performance degradation of over 30% or a global degradation of over 7% is an error and will block the pull request. See run output for full results ('Show all checks' > 'Tests / semgrep benchmark tests' 'Details').
…ep#8356)

## What
Adds matching support for languages that have case-insensitive identifiers and demonstrates it for PHP.

closes semgrep#7231

## How
Adds a boolean field to `id_info` and updates `Generic_vs_generic.ml` and `Matching_generic.ml` to respect it. I originally thought it would be easier to add a special case for PHP in matching, but making `Matching_generic.ml` language-specific becomes troublesome because `equal_ast_bound_code` is called from outside this submodule in contexts that could involve variables from multiple languages (or at least that was my reading of the situation and the types at play).

## Remaining Work To Do
This was shared as a draft to communicate my work and get feedback about how to add the bitfield to `id_info` and update the submodule.
- [x] Modify `id_info` to contain an `id_flags` field holding a bitfield instead of two separate boolean fields (`id_case_insensitive` and `id_hidden`).
- [x] Clean up code and document purpose.
- [x] Add changelog entry.
- [x] Submit pull request for the `semgrep-rules` submodule and update the submodule to point to the main branch again. (Not actually sure of the correct order to do this without breaking things.) [(ongoing here)](semgrep/semgrep-rules#3013)

## Testing
Adds a few test cases matching against identifiers and metavariables in a case-insensitive fashion and updates the `tests/semgrep-rules` tests that were disabled due to lack of support for this. Note, `tests/semgrep-rules` is currently pointing to a branch that I need to open a pull request for.
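The design described in the How section, recording case-insensitivity on the identifier rather than special-casing PHP inside the matcher, can be sketched as follows. This is a hypothetical OCaml illustration: `position`, `ident`, `make_php_ident`, and `match_ident` are invented names, and ASCII case folding is an assumption. The PHP front end decides once which identifiers are case-insensitive (in PHP, function and class names are, while variable names are not), and the language-independent matcher only reads the flag, so callers in multi-language contexts never need to know the source language.

```ocaml
(* Hypothetical: the syntactic position an identifier appears in. *)
type position = Variable | Function_name | Class_name

(* In PHP, function and class names are case-insensitive,
   while variable names ($x vs $X) are case-sensitive. *)
let php_case_insensitive = function
  | Function_name | Class_name -> true
  | Variable -> false

(* Hypothetical identifier node carrying its own flag. *)
type ident = { name : string; case_insensitive : bool }

(* What a PHP-to-generic conversion step might do: set the flag once. *)
let make_php_ident (pos : position) (name : string) : ident =
  { name; case_insensitive = php_case_insensitive pos }

(* Language-independent matching of a pattern name against a target
   identifier: it consults only the flag, never the language. *)
let match_ident ~(pattern : string) (target : ident) : bool =
  if target.case_insensitive then
    String.lowercase_ascii pattern = String.lowercase_ascii target.name
  else String.equal pattern target.name
```

For example, a pattern `strlen` would match a call to `StrLen`, but a pattern `$foo` would not match the variable `$Foo`.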
PR checklist:
- [x] Purpose of the code is [evident to future readers](https://semgrep.dev/docs/contributing/contributing-code/#explaining-code)
- [x] Tests included or PR comment includes a reproducible test plan
- [x] Documentation is up-to-date
- [x] A changelog entry was [added to changelog.d](https://semgrep.dev/docs/contributing/contributing-code/#adding-a-changelog-entry) for any user-facing change
- [x] Change has no security implications (otherwise, ping security team)

If you're unsure about any of this, please see:
- [Contribution guidelines](https://semgrep.dev/docs/contributing/contributing-code)!
- [One of the more specific guides located here](https://semgrep.dev/docs/contributing/contributing/)