-
Notifications
You must be signed in to change notification settings - Fork 5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
video: bcm2708_fb: Try allocating on the ARM and passing to VPU #2875
Conversation
drivers/video/fbdev/bcm2708_fb.c
Outdated
} | ||
|
||
fb->cpuaddr = dma_alloc_coherent(info->device, image_size, | ||
&fb->dma_addr, GFP_ATOMIC); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is an atomic (in time) allocation necessary? Would GFP_KERNEL not do here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copy/paste from the same call in vc_mem_copy
.
GFP_KERNEL would probably do (from the limited amount I know of these flags).
I'm just noticing that the call in vc_mem_copy
page aligns the size, but I see no particular reason for doing so in the generic case.
I'm happy to merge now if you are, |
Give me a few minutes to confirm again that standard image with this updated kernel does still work. I had checked yesterday, but better safe than sorry. I had also tested new kernel but with the default cma=8M (too small for a 1080P frame buffer), and that correctly reverts back. |
Tested and all seems good. That only leaves the caveat I mentioned above. I can't see a good solution to that before we extend the CMA heap. |
Extend it a little now? |
Fine with me - is 16MB sufficient? |
That's the number I was imagining. |
Can do. 16MB should be sufficient for most use cases, although still possibly too low if someone is running a 4k display (~32MB for the FB). |
Multiple FB support or transposed displays or combinations of the two would also require quite a chunk of memory - there is a circumstances where three transpose buffers are needed for the new LCD's (720p * 3) and DPI displays. Is that going to be affected by this change? |
Depends when exactly the transpose buffers are allocated. This PR is shifting the main frame buffer that the ARM knows about from being a VPU to ARM allocation. |
So it seems CONFIG_CMA_SIZE_MBYTES=5 gets rounded up to 8 by some magic. Setting it to 16 does result in a 16MiB allocation reported by the kernel. Are we happy at 16MiB and not worry about those using 4k screens? They should revert to using the reloc heap instead, so a soft fail anyway at present. |
They always have the option of adding |
Very true. |
d816acb
to
bd23d63
Compare
Squashed, and defconfigs amended. |
I think the more serious failure is if these succeed but leave CMA with an awkwardly small amount remaining causing later failures. I wonder if the firmware should indicate whether cma allocation is preferred? (i.e. new firmware are large cma size)? |
The case I'm nervous about is something like kodi with firmware 3d driver that may experience a performance hit from the remoted allocations. As such we probably want a switch between old style gpu_mem and new style cma. With the switch in the gpu_mem position cma allocation of framebuffers is probably not the best option. |
@popcornmix This patch is only touching the frame buffer driver, not remoting VPU allocations - that's waiting on https://github.com/6by9/linux/tree/rpi-4.19.y-clean2 and needs firmware changes which are up as an MR. Remote allocations is behind a config.txt flag at present, and off by default. There's near zero performance overhead with this PR as it is all controlled from the fb driver. During FB setup it does the CMA allocation (from the newly enlarged heap) and tries passing it to the VPU. If the VPU accepts it then job's done. If not it frees the CMA allocation and lets the VPU allocate.
Hence my comment of do we worry about 4k screen users, or more 2560x1440 as that allocation will just succeed in a 16MiB CMA heap. How we handle switching from gpu_mem sizing to cma sizing is one to discuss. There are several combinations of things that could result in a system that fails to boot, particularly if we're considering the bare-metal world or RiscOS (they won't have the vcsm service). |
My initial reaction was to merge this as the awkward edge cases (framebuffer alloc that succeeds but prevents booting because there is insufficient cma remaining) would be temporary until we move to the big cma world. But my kodi example suggests that the edge cases may not be temporary so we might want to think through possible solutions when we aren't in the big cma world. It seems like with this PR, I can run fbset with carefully constructed (or just unlucky) dimensions that exhaust cma and cause the kernel to grind to a halt. The same can happen on boot with a larger than 1080p monitor configured. Some ideas: |
No apparent call to do that, and certainly not through the dma API.
That could be extracted, although again I suspect not through official APIs.
Can do that. Make sure we have 6MB excess, and hope no other big users have jumped in before us? If we've done it once, then stick with ARM allocations (unless it actually fails).
The firmware doesn't know for definite. It knows that it has been asked to do them should a suitable provider connect over VCHI, but there's no guarantee that will happen.
As above, the firmware doesn't know at this early stage whether it will actually have a suitable provider for remote allocations or not. I'm not in tomorrow morning, but it sounds like it's better to have a 15minute bash this through rather than banging things back and forth here. |
Conversation had with @popcornmix. On the firmware side we will be gaining the config.txt setting It does make the full implementation of this dependent on the patch set for remote memory allocations, but that's not a big deal. All happy with that plan? |
What is the intended use of |
Or is it just a boolean to put the choice into the hands of the user (and the default value compiled into the firmware)? |
After reading the VPU-side counterpart, I'm happy with this scheme. |
Currently the VPU allocates the contiguous buffer for the framebuffer. Try an alternate path first where we use dma_alloc_coherent and pass the buffer to the VPU. Should the VPU firmware not support that path, then free the buffer and revert to the old behaviour of using the VPU allocation. Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
As you've surmised, The intention is not to enable this by default until we have a sensible scheme in place to play off gpu_mem vs CMA. For the standard case we'd expect a largish CMA heap (256MB) with minimal reloc heap (<20MB). When using the firmware GLES driver there is a minor performance hit, therefore users may want to revert back to using a genuine reloc heap. |
Rebased on top of 4.19.27. |
kernel: Enable MT76 USB wifi modules See: raspberrypi/linux#2890 kernel: staging: bcm2835-codec: NULL component handle on queue_setup failure See: raspberrypi/linux#2898 kernel: f2fs: fix to skip verifying block address for non-regular inode See: raspberrypi/linux#2896 kernel: vcsm: rpi-4.19.y - gcc-v8 fixes See: raspberrypi/linux#2897 kernel: vc-sm-cma tidy ups, and one for bcm2835_codec See: raspberrypi/linux#2889 kernel: staging: vc_sm_cma: Remove erroneous misc_deregister See: raspberrypi/linux#2888 kernel: video: bcm2708_fb: Try allocating on the ARM and passing to VPU See: raspberrypi/linux#2875 firmware: smservice: Avoid a double free firmware: arm_loader/display: All the framebuffer allocation to be made on the ARM firmware: isp_tuners: Fix memory leak in error path firmware: testc: Fix dps parsing firmware: camera_subsystem: Fix hardware sync pulses off disable_camera_led firmware: Replace the remaining direct users of C(mask_gpu_interrupt[0|1]) firmware: arm_display: Kick transposer for the rainbow screen if required firmware: dispmanx: Do not allow transpose if using Full KMS See: raspberrypi/linux#2891
kernel: Enable MT76 USB wifi modules See: raspberrypi/linux#2890 kernel: staging: bcm2835-codec: NULL component handle on queue_setup failure See: raspberrypi/linux#2898 kernel: f2fs: fix to skip verifying block address for non-regular inode See: raspberrypi/linux#2896 kernel: vcsm: rpi-4.19.y - gcc-v8 fixes See: raspberrypi/linux#2897 kernel: vc-sm-cma tidy ups, and one for bcm2835_codec See: raspberrypi/linux#2889 kernel: staging: vc_sm_cma: Remove erroneous misc_deregister See: raspberrypi/linux#2888 kernel: video: bcm2708_fb: Try allocating on the ARM and passing to VPU See: raspberrypi/linux#2875 firmware: smservice: Avoid a double free firmware: arm_loader/display: All the framebuffer allocation to be made on the ARM firmware: isp_tuners: Fix memory leak in error path firmware: testc: Fix dps parsing firmware: camera_subsystem: Fix hardware sync pulses off disable_camera_led firmware: Replace the remaining direct users of C(mask_gpu_interrupt[0|1]) firmware: arm_display: Kick transposer for the rainbow screen if required firmware: dispmanx: Do not allow transpose if using Full KMS See: raspberrypi/linux#2891
Firmware update required to actually allow the allocation, but this is backwards compatible with the old firmware (reverts to using the VPU allocation).