Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Macros can break Tree-Sitter parsing #30

Closed
narnaud opened this issue Jun 19, 2024 · 6 comments · Fixed by #116
Closed

Macros can break Tree-Sitter parsing #30

narnaud opened this issue Jun 19, 2024 · 6 comments · Fixed by #116
Assignees
Labels
🪲 bug Something isn't working
Milestone

Comments

@narnaud
Copy link
Member

narnaud commented Jun 19, 2024

Example:

class AFX_EXT_CLASS CPrjDbase6 : public CDialog
{}; 

This is parsed entirely incorrectly:
image-2024-03-01-10-46-55-138

It's a known bug in tree-sitter-cpp, that doesn't seem to have an easy solution.
tree-sitter/tree-sitter-cpp#85

It would be good to be able to fix this from within Knut somehow.

@narnaud narnaud added the 🪲 bug Something isn't working label Jun 19, 2024
@narnaud narnaud added this to the Release 1.0.0 milestone Jun 19, 2024
@LeonMatthesKDAB
Copy link
Collaborator

I guess the only real solution here is to run the preprocessor in some capacity.
We could probably fake this by allowing you to add your own definitions which would allow us to at least emulate simple macros by doing a text-replace before we pass the code to tree-sitter...

e.g. AFX_EXT_CLASS would probably just be declared as an empty string, so wouldn't show up in tree-sitter.

What would probably be easier for the beginning is to simply run a text-replace in the scripts first that removes these kinds of macros before doing any tree-sitter transformations/queries.

@narnaud
Copy link
Member Author

narnaud commented Jul 9, 2024

That was my idea too: remove everything that is between class XXXXX Word: using a regexp, and re-adding after the script.
It will need to be done explicitely though, as I don't think we could do that for the user.

Something along those lines:

headerDocument.cleanExport()
..
..
..
headerDocument.restoreExport()

It's not really fixing the issue, but workarounding it :)

@LeonMatthesKDAB
Copy link
Collaborator

Ah, hm, I was thinking that we can just remove the macro when we're porting to e.g. Qt.
But of course you may want to run a script that doesn't do a full transform, so it would be no use there. Tricky 🤔

@narnaud
Copy link
Member Author

narnaud commented Jul 9, 2024

You can't remove the export macro as you still need it on Windows, even with Qt.
As a first step, to mitigate the issue, we could warn people when we detect an export macro (should be easy with a small regexp). A spdlog::warning or even spdlog::error when using API that breaks due to that (symbol / queryClass/queryMember).

That would be an acceptable fix for v1.

@LeonMatthesKDAB LeonMatthesKDAB self-assigned this Jul 16, 2024
@LeonMatthesKDAB
Copy link
Collaborator

New idea: We can actually include/exclude certain ranges from the parsing.
See: https://tree-sitter.github.io/tree-sitter/using-parsers#multi-language-documents
For many macros it should be enough to just skip them during parsing, so let's try that.
We could then even add them back as a custom TSLanguage if we wanted to.

@LeonMatthesKDAB
Copy link
Collaborator

TODO: Put the macros to filter out in a setting. We may need to create a CppSettings subclass for this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🪲 bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants