Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Regression in cc_binary(linkshared=True) starting with 6.3.0 #20252

Closed
layus opened this issue Nov 17, 2023 · 9 comments
Closed

Regression in cc_binary(linkshared=True) starting with 6.3.0 #20252

layus opened this issue Nov 17, 2023 · 9 comments
Assignees
Labels

Comments

@layus
Copy link
Contributor

layus commented Nov 17, 2023

Description of the bug:

We have an admittedly very hacky cc_binary(linkshared=True) rule that used to produce a valid shared library to be used as a native python module. When updating Bazel (from 7.0.0-pre.20230123.5 to 7.0.0rc2), we discovered that python refused to load it because of undefined symbols. The binary also became 5 times bigger. That binary is based on several static libraries (.a), and a fat, kitchen-sink-and-everything shared library (.so) generated with cc_shared_library. Let's call it fatlib.

The only change is that the shared library now appears after the static libraries on the command line, while it used to appear first:

# Manually redacted diff
diff --git a/home/layus/1.params b/home/layus/2.params
index 022286d2ec7..34bcf860161 100644
--- a/home/layus/1.params
+++ b/home/layus/2.params
@@ -4,11 +4,11 @@
[ ... some rpath changes due to lib_Usomewhere_Sfatlib_Slibfatlib.so -> _Usomewhere_Sfatlib_Slibfatlib/libfatlib.so rename ]
@@ -17,13 +17,9 @@
 -rpath
 -Xlinker
 $ORIGIN/external/library/relative/path
--Lbazel-out/target_platform/bin/_solib_x86_64
+-Lbazel-out/target_platform/bin/_solib_x86_64/_U_Usomewhere_Sfatlib
 -Lbazel-out/target_platform/bin/_solib_x86_64/
 -Wl,--start-group
--l_Usomewhere_Sfatlib_Slibfatlib
 -Wl,-whole-archive
 bazel-out/target_platform/.../this_target_inputs.lo
 -Wl,-no-whole-archive
@@ -76,6 +72,10 @@
 bazel-out/target_platform/.../some/objects_lib.a
 bazel-out/target_platform/.../another/objects_lib.a
 external/.../libfoo.a
+-lfatlib
 -Wl,--end-group
 -lpthread
 -lrt

I have checked that running the command with the order fixed creates a valid native python module, and that it is the only change needed. I have also bisected bazel itself and found 946777a to be the culprit, and therefore the issues first appears with 6.3.0 for us.
However, this is a backport of many changes. On the main branch, the commit that makes it fail for us is a772452, which is indeed changing the logic around arguments order. I do not know why it used to work, nor why it fails with the new code. I also have found no way to force the order I need in this particular case.

The rule invocation looks like

cc_shared_library(
    name = "fatlib",
    roots = ...
)

custom_cc_binary(
    name = "python_native_module.so",
    src = [ srcs/** ],
    deps = [ other `cc_library` ],
    dynamic_deps = [ ":fatlib" ],
)

# somewhere.bzl
def custom_cc_binary(name, srcs = None, deps = None, dynamic_deps = None):
    native.cc_library(
        name = "intermediate_lib",
        srcs = srcs,
        deps = deps,
    )

    native.cc_binary(
        name = name,
        deps = ["intermediate_lib"],
        linkshared = True,
        dynamic_deps = dynamic_deps,
    )

/cc @oquenchil as these commits are yours, and you are probably the most knowledgeable person around this code.
/cc @sluongng and @fmeum just FYI

Which category does this issue belong to?

C++/Objective-C Rules

What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

I think the order change should be easy to reproduce, but for that order to have such an impact you probably need our weird setup.

Which operating system are you running Bazel on?

Ubuntu

What is the output of bazel info release?

No response

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

No response

What's the output of git remote get-url origin; git rev-parse master; git rev-parse HEAD ?

No response

Is this a regression? If yes, please try to identify the Bazel commit where the bug was introduced.

a772452 when bisecting along the master branch, but

946777a is the one that will appear when bisecting between 6.0.0 and 6.4.0

Have you found anything relevant by searching the web?

#19920 Seemed at first related, but this is a different issue.

See also my discussion about this issue on slack: https://bazelbuild.slack.com/archives/CA31HN1T3/p1700223322017839

Any other information, logs, or outputs that you want to share?

No response

@fmeum
Copy link
Collaborator

fmeum commented Nov 17, 2023

Cc @buildbreaker2021

@fmeum
Copy link
Collaborator

fmeum commented Nov 17, 2023

@bazel-io flag

@bazel-io bazel-io added the potential release blocker Flagged by community members using "@bazel-io flag". Should be added to a release blocker milestone label Nov 17, 2023
@iancha1992 iancha1992 added the team-Rules-CPP Issues for C++ rules label Nov 17, 2023
@iancha1992
Copy link
Member

@bazel-io fork 7.0.0

@bazel-io bazel-io removed the potential release blocker Flagged by community members using "@bazel-io flag". Should be added to a release blocker milestone label Nov 17, 2023
@oquenchil
Copy link
Contributor

The culprit commit is doing the right thing as far as I can tell.

The logic works by considering the exports of the cc_shared_library in dynamic_deps and the cc_libraries in the deps of the current target (the current target in your case being the cc_binary(linkshared=1) target. If none of the exports from fatlib are in the cc_library graph reachable from cc_binary.deps then it defaults to adding the cc_shared_library at the end.

To guide the logic into placing the fatlib shared library at the beginning what you can do is to list one of the roots at the beginning of cc_binary.deps.

cc_library(
    name = "fatlib_lib",
    srcs = ["fatlib_lib.cc"]
)
cc_shared_library(
    name = "fatlib",
    deps = ["fatlib_lib"] # Intentionally renamed roots to deps, roots is deprecated. Just a name change.
)
native.cc_binary(
    name = name,
    deps = [
         ":fatlib_lib",
         ":intermediate_lib",
    ],
    linkshared = True,
    dynamic_deps = [
          ":fatlib",
    ]
)

fatlib_lib won't be linked statically, the shared library will be used and it will be placed at the beginning.

That solution would be more in line with how cc_shared_library is meant to be used. If you don't like it, an alternative would be to not to use dynamic_deps from cc_binary(linkshared=True) and instead pull out the shared library artifact via a filegroup and cc_import. Place the cc_import target first in the list of deps of the cc_binary. See here.

cc_shared_library is meant to replace cc_binary(linkshared=True), so interactions between the two aren't very thoroughly tested, if the first solution happened to work in your case then it is mostly because the codepath is the same as for a regular cc_binary but it's not something that would be strictly supported.

In fact, that ship has probably sailed but I should have forbidden cc_binary(linkshared=1) with non-empty dynamic_deps.

@meteorcloudy
Copy link
Member

@layus Sounds like this is an intentional breaking change in Bazel 7, does @oquenchil 's suggestion work for you?

@layus
Copy link
Contributor Author

layus commented Nov 23, 2023

Thanks @oquenchil,

I am not saying that the code is doing the right or the wrong thing. I understand that the point is to provide a better interface, which is a much welcome change.

As a colleague of mine suggested, it may be worth explaining the underlying logic and assumptions of cc_shared_library. C/C++ is very old, and there are plenty of usages that a built-in rule set cannot cover by default. It would help if the opinionated view taken here was detailed. And the implementation explained somewhere. We both tried to understand the code of cc_shared_library, but stopped without any good insights.

The solution above seems very elegant and simple, but I would never have figured that by myself.

My use case relies more on cc_import rules, and I link shared libraries without any corresponding library. It would never have occurred to me to add the same library twice in both deps and dynamic_deps. Even more so given the number of times I hit issues with symbols defined multiple times.
The second solution seems more natural to me, but a bit less elegant.

cc_shared_library is meant to replace cc_binary(linkshared=True), so interactions between the two aren't very thoroughly tested, if the first solution happened to work in your case then it is mostly because the codepath is the same as for a regular cc_binary but it's not something that would be strictly supported.

With external rule sets, we do not always control all the code paths. I bet there will be a lot of interactions during the transition.

@ layus Sounds like this is an intentional breaking change in Bazel 7, does @ oquenchil 's suggestion work for you?

@meteorcloudy I re-implemented the whole thing in the meantime, which could be considered a good thing. So I have fixed the problem for my use case, and the solutions above probably work. Will see if I get time to test them.

However, if you want to know if it is an acceptable solution for me, I would say cc_shared_library introduces it's own model of shared libraries and requires users to get acquainted with it on top of their vanilla c/c++ knowledge. So it definitely deserves a lot of documentation. And this issue contains intel I could not find anywhere else. The solution is thus perfectly acceptable if it gets documented somewhere.

@meteorcloudy
Copy link
Member

@layus Thanks for confirming, then I'll remove this as a release blocker for 7.0.0.

For cc_shared_library, we do have the user documentation here: https://bazel.build/reference/be/c-cpp#cc_shared_library, but it's definitely going to be helpful to have more docs about the implementation details for more involved use cases . Feel free to send PRs to help us improve it!

@meteorcloudy
Copy link
Member

Since this is WAI, I'll close this one.

@meteorcloudy meteorcloudy closed this as not planned Won't fix, can't repro, duplicate, stale Nov 23, 2023
@layus
Copy link
Contributor Author

layus commented Nov 23, 2023

For cc_shared_library, we do have the user documentation here: bazel.build/reference/be/c-cpp#cc_shared_library, but it's definitely going to be helpful to have more docs about the implementation details for more involved use cases . Feel free to send PRs to help us improve it!

I would love to contribute documentation, but I am unable to do so because I do not understand how it works (yet). That is kind of the issue here really.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

9 participants