Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewrite Clang interface to use C++ libtooling API instead of libclang #1740

Closed
wants to merge 88 commits into from

Conversation

rinon
Copy link
Contributor

@rinon rinon commented Mar 3, 2020

The libclang API doesn't expose enough information about the AST for what bindgen really needs, which has spawned numerous workarounds and bugs in bindgen. This PR replaces the generic libclang interface with a customized C wrapper around the Clang C++ API, which can simply expose everything Clang knows about the AST. This should allow us to fix some long outstanding issues (e.g., #380, #778) as we can directly query for the correct struct layouts and C++ calling conventions.

This PR only replaces the libclang interface with new, functionally equivalent bindings that should function identically to the libclang-9 API. It does not fix any bugs or change functionality, and should result in no functional changes modulo differences in behavior between libclang-9 and previous versions.

Using the clang libtooling libraries rather than libclang means that we now link against the individual clang libraries for each module (e.g. libclangAST, libclangBasic, etc.). In most LLVM builds, these libraries are built and distributed as static libraries, but we support both static and dynamic linking, in case they are available as shared libraries. I haven't replicated the clang-sys crate's option for dynamic loading of libclang.so at run time, so we have to link against a particular set of clang libraries at build time. Since most LLVM distributions ship these libraries as static libraries, in most cases we couldn't defer library linking to run time, even if we implemented support for it.

This resolves #297.

rinon added 30 commits February 15, 2020 00:52
Bindgen builds and runs now, but does not produce any bindings yet.
We need to compare on variant contents, not just on variant equality
Bindgen currently assumes that it won't see these types from libclang.
Added some misc CXXBaseSpecifier accessors and fixed visiting CXXBaseSpecifiers
Needed for compatibility with latest Clang
We should return true to continue traversal at invalid nodes. We should also
recurse through Stmt nodes that aren't Exprs, since we don't handle them in
bindgen.
rinon added 3 commits March 19, 2020 17:46
We can't pass C++ objects by value from C++ to Rust. QualType has an
interface for converting back and forth to void*, so we can use that
instead.
@rinon
Copy link
Contributor Author

rinon commented Mar 26, 2020

I've been working on and discussing distribution of Windows C++ LLVM+Clang libraries on the LLVM mailing list. Haven't gotten many responses there yet, but I'm continuing to try. It's still odd to me that they ship the static C++ libraries for *NIX but not for Windows.

In the mean time, I've thought of a few other options. One option, as you suggested might be for rustc to distribute static versions of the clang and LLVM libs. I think this would substantially increase the size of the compiler packages, so I'm not sure if that would be viable. Another option is to host a Windows build of LLVM+clang suitable for linking C++ projects. This could be a fallback option in case the LLVM release maintainers are opposed to adding the libs to the official packages.

Just brainstorming, I wonder if it would be possible to maintain both a libclang and a clang tooling frontend interface, and just get less information if only libclang was available. That sounds like a maintenance nightmare to me, personally.

That maybe isn't that crazy as it sounds, if done right... We'd basically replace libclang.so with our own, and add new bindgen APIs... Then we can just use the existing code with all the is_loaded() shenanigans, potentially. So it wouldn't really look that much different than a native "improved" libclang. It really depends on which kinds of changes we want to do, if we want to add a bunch of extra stuff it might become a pain... But maybe that's a good path forward?

The more I think about this option the less it appeals. I'm just afraid of the maintenance burden of bugfixes for both APIs and a disjoint set of features and issues depending on how you link against clang. I'm happy to take on some of the maintenance of this new C++ API and associated features, but supporting both might be too much of a time sink. That said, I'm very interested in improving bindgen support for C++, so if all else fails this is still something I'm willing to try.

Just wanted to keep the PR updated, things are still moving on my side.

@bors-servo
Copy link

☔ The latest upstream changes (presumably d5cdad2) made this pull request unmergeable. Please resolve the merge conflicts.

@DemiMarie
Copy link

Is there anything people could do to get this merged?

@kulp
Copy link
Member

kulp commented Jun 11, 2022

Is there anything people could do to get this merged?

Fixing the existing merge conflicts (thus allowing the test suite to run again) would be a nice start.

kulp added a commit to kulp/rust-bindgen that referenced this pull request Jun 17, 2022
This reverts commit c462892.

This turned out not to be used, and causes trouble with rust-lang#1740.
kulp added a commit to kulp/rust-bindgen that referenced this pull request Jun 24, 2022
This reverts commit c462892.

This turned out not to be used, and causes trouble with rust-lang#1740.
@kulp kulp mentioned this pull request Jun 24, 2022
kulp added a commit to kulp/rust-bindgen that referenced this pull request Jul 22, 2022
This reverts commit c462892.

This turned out not to be used, and causes trouble with rust-lang#1740.
kulp added a commit to kulp/rust-bindgen that referenced this pull request Jul 22, 2022
This reverts commit c462892.

This turned out not to be used, and causes trouble with rust-lang#1740.
@amanjeev
Copy link
Member

amanjeev commented Aug 9, 2022

Hello everyone who has been involved with this PR thus far - we (@pvdrz and I) are beginning to work on some tasks in the project as a part of our contract. This PR is one of the first things we are asked to look at. We want to do a simple check if anyone is still working on this PR that we are unaware of. We do not want to step on anyone's progress and work. Please let us know if:

  • we can take over this PR
  • there is some pending working still yet to be pushed by anyone here
  • any details anyone would like to add

Thank you!

@rinon
Copy link
Contributor Author

rinon commented Aug 9, 2022

Hey @amanjeev, good to hear there's interest in reviving this. I basically dropped this because I couldn't find a good answer for how to handle windows. There seemed to be a fairly strong argument that we should be able to use the official LLVM windows builds, but those don't ship with a C++ libtooling DLL afaik. The fundamental reason for that is that windows has a hard cap on the number of exported symbols in a DLL that libtooling exceeds. So, where I left this was that we need to go and cut the number of symbols exported there down to fit in a DLL and then possibly maintain both a libclang and a libtooling bindgen interface for awhile until libtooling is widely available.

I'm happy to help out here as well, but I didn't see a good path forward. You're welcome to push on this, maybe you can get further than I have. Keep me posted on potential plans and I'll see what I can do to help.

@kulp
Copy link
Member

kulp commented Aug 9, 2022

We want to do a simple check if anyone is still working on this PR that we are unaware of.

@amanjeev thanks for checking in, and thanks @rinon for all the work you did to this point.

In #2229 I resolved a number of conflicts and got tests passing with LLVM 9 on macOS and Linux; if you find that branch useful, feel free to do whatever you like with it, including nothing. I am unlikely to work on it further myself.

As far as I am concerned, you can go forth and conquer!

@DemiMarie
Copy link

Regarding Windows: why does libtooling export so many symbols? This seems like it would be solved with proper use of -fvisibility=hidden. Perhaps bindgen should just statically link to libtooling on Windows?

@pvdrz
Copy link
Contributor

pvdrz commented Feb 10, 2023

given that we now have the experimental feature we could put this behind it and keep using libclang on windows until we find a good solution. However this would mean that we would be maintaining the new and old clang modules for a while.

@amanjeev
Copy link
Member

amanjeev commented Mar 9, 2023

given that we now have the experimental feature we could put this behind it and keep using libclang on windows until we find a good solution. However this would mean that we would be maintaining the new and old clang modules for a while.

I agree, putting this behind the experimental will be very helpful.

I finally got around to asking this in Rustlang Zulip. In preliminary messages it is noted that -

Hm... llvm have a script for reducing the number of exported symbols. It was very recently patched to strip out "about 5k symbols" in clang-repl. Not sure if/how that script affects libtooling (I'm not familiar with llvm's build process) but it should be possible to contribute any fixes to llvm.
Rust does ship an llvm-tools component though that seems to be for bins atm. I guess in theory we could have an llvm-libs component too but I'm not really the person to ask about that.

I wonder @rinon, if it is worth checking again. Meanwhile, I will see if there is more information on this from Rustlang community.

Update (April 27, 2023): Moved the message on Zulip to t-infra for this question of shipping llvm-libs. https://rust-lang.zulipchat.com/#narrow/stream/242791-t-infra/topic/LLVM.2C.20Libtooling.20DLL.2C.20Windows

@DemiMarie
Copy link

This PR should be reopened.

@DemiMarie
Copy link

@rinon: What is the reason for dynamically linking to libtooling on Windows? To me it seems that the simplest approach is to link statically.

@rinon
Copy link
Contributor Author

rinon commented Nov 4, 2023

Static linking is fine, but the windows LLVM binary distribution didn't include any static libs at the time and they still do not ship any C++ libraries on windows. So the issue comes down to an easily accessible way for users to build bindgen on windows without having to compile Clang and libtooling from sources.

Thinking about this again, it might be viable to ask the LLVM maintainers distributing the windows binaries to include C++ clang static libs? There was also the idea floated of including C++ libtooling as a binary distribution in rustup/toolchain libs.

@kulp
Copy link
Member

kulp commented Nov 4, 2023

This PR should be reopened.

I suspect (not having talked to @pvdrz though) that the PR was closed automatically by GitHub because the target branch was deleted pursuant to #2286 and #2669.

I think the PR would have to be re-created at this point.

Screenshot 2023-11-04 at 17 35 28

@DemiMarie
Copy link

Static linking is fine, but the windows LLVM binary distribution didn't include any static libs at the time and they still do not ship any C++ libraries on windows. So the issue comes down to an easily accessible way for users to build bindgen on windows without having to compile Clang and libtooling from sources.

Why is compiling clang and libtooling from source a problem? Could a script that downloads a prebuilt binary be used?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Consider rewriting the parsing back-end as a clang plugin instead of using libclang.
8 participants