Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix rpi library order #9097

Closed
wants to merge 1 commit into from
Closed

Conversation

cyph84
Copy link
Contributor

@cyph84 cyph84 commented Aug 8, 2021

Fix order of linking libraries to prevent the "Could not get DISPMANX objects." error.

Should fix the remaining issue in #7314

@jeeb
Copy link
Member

jeeb commented Aug 8, 2021

I think you need to rebase this on top of master, since this seems like it is on top of 0.33.1 :)

Additionally, it seems like the commit contains two separate changes. One adds additional logging, another is the wscript change.

@cyph84 cyph84 force-pushed the fix-rpi-library-order branch from 051c35d to e2fdca0 Compare August 8, 2021 13:46
@cyph84
Copy link
Contributor Author

cyph84 commented Aug 8, 2021

Oops about the logging. Sorry was having a headache after debugging this one.

Rebased and re-pushed

@Dudemanguy
Copy link
Member

This would make mmal always required for rpi. Does simply moving the --rpi-mmal check before the --rpi check and removing the rpi dependency work? Sure it would look weird but if that fixes the problem, just add a comment in the code explaining why.

@cyph84
Copy link
Contributor Author

cyph84 commented Aug 17, 2021

@Dudemanguy tried that, and it still produces the wrong library order. Not sure how waf is determining link order here.

--rpi-mmal really does depend on --rpi. --rpi-mmal without --rpi gives me "No OpenGL video output found or enabled. Aborting. If you really mean to compile without OpenGL video outputs use --disable-gl. "

--rpi indeed does not depend on --rpi-mmal but the --gpu-context=rpi produces the same "Could not get DISPMANX objects." error and seems to be broken for quite a while (#5405).

In short --rpi-mmal seems to be the only working config to use the Broadcom libraries now anyway. Keeping them independent is nice, but has no practical benefits right now.

Edit: --gpu-context=rpi only works if --rpi-mmal is linked in for some reason. It just feels to me like all the broadcom libs is one giant cyclic dependency clusterfuck.

@Dudemanguy
Copy link
Member

I guess this is okay. It feels like a hack but the entire broadcom lib thing is a mess anyway so who knows who's really screwing up here (the libs? mpv? waf?). I don't believe any of the devs use rpi and/or have any willingness to dig into the issue deeper so this is probably the best way for now (no working mmal seems quite bad).

Anyways just a couple of minor nits. Could the --rpi-mmal part not be moved and just left where it is? Might as well just leave it in the hwaccel section where it belongs. Also could the commit message be more descriptive about what's actually going on here. The prefix should be build: (since this is a build-related change). Maybe something like build: check for mmal before brcmegl on rpi.

@cyph84 cyph84 force-pushed the fix-rpi-library-order branch from e2fdca0 to 0f9e235 Compare August 17, 2021 14:59
@cyph84
Copy link
Contributor Author

cyph84 commented Aug 17, 2021

Updated the commit message and rebased.

It's unfortunate that we still got to deal with this mess to get decent HD playback on the RPI. wm4 started this code but seemed to have lost interest and his hardware somewhere along the way.

@Dudemanguy
Copy link
Member

Did you want to fix the authorship? The author and committer are different.

libbrcmEGL.so was linked before libmmal.so which somehow made the
DISPMANX library unhappy, resulting in "Could not get DISPMANX objects."
error in vo_rpi preinit().

This commit restores the exquisite link order that was lost in
db09d77.
@cyph84 cyph84 force-pushed the fix-rpi-library-order branch from 0f9e235 to eed70b4 Compare August 17, 2021 16:17
@cyph84
Copy link
Contributor Author

cyph84 commented Aug 17, 2021

Ah yes thanks for pointing that out.

@cyph84
Copy link
Contributor Author

cyph84 commented Aug 19, 2021

I just found an old commit by wm4 trying to wrangle this same problem: 72a8120

Previous attempt to use pkg-config to configure the Broadcom MMAL/EGL caused problems due to the order they were linked in.

@avih
Copy link
Member

avih commented Aug 19, 2021

While I've never tried to build for the RPI, aren't all these "special fix for this or that" highly dependent on the actual distribution?

For instance, how can we tell that the changes in this PR don't break the build for any of the main RPI distributions? In which environments was this patch tested?

I mean, clearly the last change which you referenced in your commit message DID fix it for some people, right? so now it will break for them again?

@cyph84
Copy link
Contributor Author

cyph84 commented Aug 19, 2021

The build / code was already broken for multiple MPV releases. It seemed that with multiple attempts at cleaning up the build code, the same bug was fixed and re-introduced a couple of times.

I don't think it's distro specific, all RPI distros use the same Broadcom libraries. At some point, this need to load libraries in this specific order was introduced into the Broadcom libraries.

The official Raspbian distribution doesn't build with --vo=rpi so this should not affect them at all. With RPI4 apparently being able to do HD decode using more standard interfaces (Mesa / v4lm2m) interest in keeping this working is understandably waning.

@avih
Copy link
Member

avih commented Aug 19, 2021

So can you specify in which distributions this is going to fix things, which distributions will break due to this, and which distributions were and remain broken? (why not fix it for them too?)

As I mentioned, I'm not familiar withthe RPI ecosystem, but I think it would be best if we can ensure that mpv can build nicely out of the box for the main RPI distributions (which are those? I'm guessing at least Raspbian?)

@cyph84
Copy link
Contributor Author

cyph84 commented Aug 19, 2021

Yes I think I can do a test on Raspbian. Right now have tested it against Arch with the latest Broadcom libs.

It won't be possible to test against all distros out there.

@avih
Copy link
Member

avih commented Aug 19, 2021

It won't be possible to test against all distros out there.

So which distributions you suggest we should support?

I presume you're using Arch, so clearly you'd think we should support it, but what others?

Can you list the distributions you think are the most popular for the RPI?

@cyph84
Copy link
Contributor Author

cyph84 commented Aug 19, 2021

We should definitely support Raspbian (since its the offiically supported distro by the RPI Foundation) and the rest of the distros can adapt their builds from there.

@avih
Copy link
Member

avih commented Aug 19, 2021

the rest of the distros

Can you give a clear minimal list of the RPI distributions which you think cover at least 80% of RPI users?

@cyph84
Copy link
Contributor Author

cyph84 commented Aug 20, 2021

A quick google doesn't give me much info on market share of the various RPI distros unfortunately.

I think this is much less sensitive to distros and much more due to the release of RPI firmware and supporting libraries / headers (https://github.com/raspberrypi/firmware)

Various attempts to cleanup/fix this issue was also complicated by distros updating this raspberrypi-firmware package and probably waf versions and not cleaning pycache dirs used by waf (I've been bitten by this while trying to bisect this issue)

@avih
Copy link
Member

avih commented Aug 20, 2021

Please answer all the questions below. Feel free to combine answers, but do make sure that all questions are answered, where applicable.

A quick google doesn't give me much info on market share of the various RPI distros unfortunately.

Well, I'd really prefer we'd be able to declare support by some criteria. Distributions is what I had in mind, and personally I'd think these should be, if possible: Debian, Arch, Fedora (assuming all of them support the RPI, which I think they do). Prefrably the stable versions (except Arch), but if that's not possible then we'd still need to know where the code is expected to work.

But if you can't come up with such list or think that it's not a good criteria then we'd need a different one, though I'd still prefer to be able to declare supported distros too.

Regardless, I think Raspbian (stable? I'm unfamiliar with its release policy) should always be supported. Do you agree?

I think this is much less sensitive to distros and much more due to the release of RPI firmware and supporting libraries / headers (https://github.com/raspberrypi/firmware)

Are there release numbers/versions? are they listed someplace? Can we declare which releases are supported?

Are all RPI distributions expected to use the binaries from this reposiroty? Do we know which distros use which versions of the libs? Does it depend on the kernel version and libs versions independently? or are they tied together?

We have to assume the change in db09d77 which was made in 2019 did work, at least for some use cases, right?

Can we identify the use cases or releases where the current code works, and where it doesn't work?

Supporting "latest" is great, but this shouldn't come at the cost of breaking everything which isn't latest, especially if there's a considerable percentage of RPI users who don't have "latest". Which is really why we need to have some estimation of which versions are in use out there, and need to know what's the cutoff point of support in terms of whatever criteria we come up with, and preferably with this cutoff point automated/encoded as version numbers in wscript .

Is there an official (or close enough) thing which states the required libs order? If not, how did you reach the "order" in this PR? Preferably it would be more than blindly messing about till it works.

How is the link order derived from the construction of func in wscript? is this a waf thing or an mpv waf extension? is it documented and/or with existing examples elsewhere at the current wscript?

probably waf versions and not cleaning pycache dirs used by waf

Can you expand on this? I don't immediately see how this can relate to the linking order. Are there versions of waf which end up with a different link order for the same wscript? Do we know which versions do what? Is it documented?

What's the "pycache issue" and how does it relate to the link order? What's the solution to this issue? Should we add a wiki entry which describes it?

This PR does more than just "changing order". Currently we have RPI gpu context support, which, as far as I can tell, can be enabled independently, and rpi-mmal hw accelaration which depends on the gpu context. But in this PR you're basically making both depend on exactly the same conditions.

Why is that? Can we keep the the gpu context independent from the hwaccel? If not (why?), is there a point in keeping both rpi and rpi-mmal configure options?

@cyph84
Copy link
Contributor Author

cyph84 commented Aug 25, 2021

Regardless, I think Raspbian (stable? I'm unfamiliar with its release policy) should always be supported. Do you agree?

Yes. I think we should support Raspberry Pi OS based on Buster. This is the image you get when you download from https://www.raspberrypi.org/software/operating-systems/. The previous was Stretch (superseded on 2019-06-20 according to https://downloads.raspberrypi.org/raspios_full_armhf/release_notes.txt)

IMO we should just support Raspberry Pi OS because:

  1. First distro to get tested by the hardware engineers
  2. Every other distro refers to it to see how things are integrated

Are there release numbers/versions? are they listed someplace? Can we declare which releases are supported?

No version numbers, the packages are tagged with a date instead of version number.

The source for the userland libraries and headers (and pkgconfigs) are here https://github.com/raspberrypi/userland. From what I can tell this is the component that causes the most issues from mpv's git history.

A pre-compiled kernel, pre-compiled userland, GPU firmware (closed source, proprietary and running on a proprietary CPU architecture) and Linux kernel are versioned together here https://github.com/raspberrypi/firmware.

Additionally more adventurous users will use rpi-update to roll back and forth kernel, GPU firmware, and userland libs/headers. Very useful for bisecting problems.

We can support whatever is on Buster's repositories at this moment plus whatever rpi-update brings (to anticipate upcoming changes).

Are all RPI distributions expected to use the binaries from this reposiroty? Do we know which distros use which versions of the libs? Does it depend on the kernel version and libs versions independently? or are they tied together?

Kernel / userland are supposed to be backward compatible in theory...

All rpi distros use binaries from this repo, even installing them in the same non-standard location (/opt/vc/).

We have to assume the change in db09d77 which was made in 2019 did work, at least for some use cases, right?

Yes it probably did. Unfortunately I can't even find a pull request to give me more context for that change. I am not sure if all permutations of --rpi/--rpi-mmal were tested.

Furthermore it is when bisecting around this commit that I realized it is important to remove your pycache directories! I thought that commit did not cause a break when it actually did. So it might have seemed to work due to unclean pycache. Shrugs.

Can we identify the use cases or releases where the current code works, and where it doesn't work?

It's possible if you really want to root cause this to a particular https://github.com/raspberrypi/firmware or https://github.com/raspberrypi/userland commit. I would do this when I get more time.

Supporting "latest" is great, but this shouldn't come at the cost of breaking everything which isn't latest, especially if there's a considerable percentage of RPI users who don't have "latest". Which is really why we need to have some estimation of which versions are in use out there, and need to know what's the cutoff point of support in terms of whatever criteria we come up with, and preferably with this cutoff point automated/encoded as version numbers in wscript .

Unfortunately the relevant libraries do not have usable version numbers in their pkgconfig. No changes to these files with every release: https://github.com/raspberrypi/userland/commits/master/pkgconfig

Is there an official (or close enough) thing which states the required libs order? If not, how did you reach the "order" in this PR? Preferably it would be more than blindly messing about till it works.

By bisecting to db09d77 (while thoroughly confusing myself with the pycache issue) and then asking myself "what is the differences between these two binaries?"

I can't find anything on this mmal before egl rule. However I know it used to work because I have a build of mpv-0.24.0 linked against firmware 20170301 that worked with libEGL.so before libmmal_core.so.

697c438 and then 72a8120 strongly suggests the issue was introduced in Broadcom's libs shortly after that in 2017.

How is the link order derived from the construction of func in wscript? is this a waf thing or an mpv waf extension? is it documented and/or with existing examples elsewhere at the current wscript?

Sorry I am new to waf, and Python is a second language to me. I don't think other libraries are this sensitive to link order therefore I might be exploting an undocumented waf feature.

Can you expand on this? I don't immediately see how this can relate to the linking order. Are there versions of waf which end up with a different link order for the same wscript? Do we know which versions do what? Is it documented?

What's the "pycache issue" and how does it relate to the link order? What's the solution to this issue? Should we add a wiki entry which describes it?

waf seems to link differently depending on whether you clean your pycache dirs:

git co db09d77^1
./waf configure && ./waf clean && ./waf build
# ./build/mpv is good

git co db09d77
./waf configure && ./waf clean && ./waf build
# ./build/mpv is still good

git clean -xdf
./waf configure && ./waf clean && ./waf build
# ./build/mpv has bad link order

This PR does more than just "changing order". Currently we have RPI gpu context support, which, as far as I can tell, can be enabled independently, and rpi-mmal hw accelaration which depends on the gpu context. But in this PR you're basically making both depend on exactly the same conditions.

--rpi without --rpi--mmal builds, but as far as I can tell it gives the same error. (details in previous comment: #9097 (comment))

This is rather bizarre. I can find no reason why EGL/GLESv2 should depend on MMAL being linked in. The error is in the DISPMANX APIs which is supposed to be usable independently without MMAL or EGL/GLESv2.

Why is that? Can we keep the the gpu context independent from the hwaccel? If not (why?), is there a point in keeping both rpi and rpi-mmal configure options?

Just to clarify, MMAL is used for rendering as well as decode. I am not familiar enough with the vo=gpu abstractions to answer this question unfortunately.

@avih
Copy link
Member

avih commented Oct 26, 2021

Many apologies for not responding earlier. I did read your reply carefully at the time, and I did find it (finally) on point, so thank you for that.

However, the facts did leave many open questions (which is not your fault), and I didn't have the kind of energy needed to answer these questions in order to decide how to proceed, and it doesn't help than none of the mpv developers use a RPI on a regular basis...

So, this is still roughly where it stands, still. If someone wants to suggest a meaningful way forward while taking the facts above into account - it could help.

@cyph84
Copy link
Contributor Author

cyph84 commented Oct 26, 2021

It is fixed in #8840 Once that PR is merged, this would not be relevant.

Meson seems to have more explicit control over linking order and relationship between configuration flags and library tests. It's much more suited to deal with corner cases like this.

@Dudemanguy
Copy link
Member

Since meson support is now finally official, I think I will close this now. As mentioned in the comments, this PR is more of a workaround/hack for what is (probably) some strange waf bug. Of course, clean fixes to the waf build are welcome but I think for now the proper workaround for your usecase is "just build with meson".

@Dudemanguy Dudemanguy closed this Nov 14, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants