Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: github-env: use tree-sitter queries #354

Merged
merged 10 commits into from
Dec 25, 2024
Merged

feat: github-env: use tree-sitter queries #354

merged 10 commits into from
Dec 25, 2024

Conversation

woodruffw
Copy link
Owner

@woodruffw woodruffw commented Dec 25, 2024

Very WIP.

This experimentally replaces the stack-of-nodes approach with hand-written TS queries, which can be more precise about matches and their sub-captures (e.g. allowing us to match only when the RHS is actually a variable expansion).

It also adds some examples of false positives that the previous approach flagged. These FPs aren't likely to occur in the wild, however, so the direct benefit for actual users is pretty small.

The larger impact here is indirect:

  • By switching to TS queries, we get full access to match and capture-level spanning information. We had this indirectly with the stack-of-nodes approach, but the query language makes it even simpler by giving us capture names 🙂
  • This unblocks a bugfix for [BUG]: False positive: github-env on trivial echos #234 which would be challenging to generalize with the previous approach -- we could do it one-off, but doing things at the TS query level gives us more generality.

At the same time, there are some downsides to this approach:

  • It's not as elegant (IMO) as the previous approach.
  • It's probably slower both at startup and runtime, since each TS query needs to be parsed, evaluated, etc.
  • Writing TS queries is a less-ideal DX, since they're a new DSL that isn't highlighted and aren't super easy to debug (I mostly use the tree-sitter playground).

CC @ubiratansoares for thoughts on this approach -- I think it makes things somewhat easier/more general overall, but it has downsides. I'm curious what you think about these tradeoffs 🙂

(I also still need to do the same for the PowerShell codepath.)

Signed-off-by: William Woodruff <william@yossarian.net>
@woodruffw woodruffw added the enhancement New feature or request label Dec 25, 2024
@woodruffw woodruffw self-assigned this Dec 25, 2024
Ok(Self {
bash_parser: RefCell::new(bash_parser),
pwsh_parser: RefCell::new(pwsh_parser),
bash_redirect_query_span_idx: bash_redirect_query
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NB: I cache these capture indices at the struct level because they vary between individual queries, and because extracting them is technically fallible (although infallible in practice, as long as there's no typo). But it's definitely a bit of an eyesore.

@woodruffw
Copy link
Owner Author

NB tree-sitter/tree-sitter#4034 -- that'll allow us to remove the streaming-iterator direct dep.

Signed-off-by: William Woodruff <william@yossarian.net>
Signed-off-by: William Woodruff <william@yossarian.net>
@woodruffw
Copy link
Owner Author

1a1eac7 (#354) does an initial version of this -- it's not quite as clean as I would have liked, but it's probably shorter than the equivalent would be if we didn't have the query doing pre-filtering/shape checking for us.

Signed-off-by: William Woodruff <william@yossarian.net>
Signed-off-by: William Woodruff <william@yossarian.net>
Copy link
Contributor

@ubiratansoares ubiratansoares left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @woodruffw! Thanks for tagging me for input!

I was not aware of TS query capabilities, always learning I guess 😄

Imho, this approach looks amazing. I'm sold tbh, regardless the downsides!

I do agree that TS queries might be not as intituitive to augment or troubleshoot compared to walking nodes directly, but they look like a solid take for tackling more false positives we may have in the future.

Perf-wise, I think we should not bother that much for now, afterall zizmor is already pretty fast elsewhere.

Since tree-sitter/tree-sitter#4034 was already merged, perhaps you want to wait until the next tree-sitter release to merge this. Otherwise, LGTM 🚀

Signed-off-by: William Woodruff <william@yossarian.net>
Signed-off-by: William Woodruff <william@yossarian.net>
Signed-off-by: William Woodruff <william@yossarian.net>
Signed-off-by: William Woodruff <william@yossarian.net>
@woodruffw woodruffw marked this pull request as ready for review December 25, 2024 19:10
@woodruffw woodruffw merged commit d1c9004 into main Dec 25, 2024
5 checks passed
@woodruffw woodruffw deleted the ww/redirect-query branch December 25, 2024 19:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants