video: bcm2708_fb: Try allocating on the ARM and passing to VPU #2875

6by9 · 2019-02-27T17:39:41Z

Firmware update required to actually allow the allocation, but this is backwards compatible with the old firmware (reverts to using the VPU allocation).

pelwell · 2019-02-27T19:46:44Z

drivers/video/fbdev/bcm2708_fb.c

+		}
+
+		fb->cpuaddr = dma_alloc_coherent(info->device, image_size,
+						 &fb->dma_addr, GFP_ATOMIC);


Is an atomic (in time) allocation necessary? Would GFP_KERNEL not do here?

Copy/paste from the same call in vc_mem_copy.
GFP_KERNEL would probably do (from the limited amount I know of these flags).

I'm just noticing that the call in vc_mem_copy page aligns the size, but I see no particular reason for doing so in the generic case.

drivers/video/fbdev/bcm2708_fb.c

pelwell · 2019-02-28T10:18:15Z

I'm happy to merge now if you are,

6by9 · 2019-02-28T10:28:18Z

Give me a few minutes to confirm again that standard image with this updated kernel does still work. I had checked yesterday, but better safe than sorry.

I had also tested new kernel but with the default cma=8M (too small for a 1080P frame buffer), and that correctly reverts back.
I guess we do now have the potential for the new kernel, cma=8MB, and eg a 720P frame buffer (3.6MB) that would boot and allocate the majority of the CMA heap at boot. Other functions could then be compromised. I don't know the best route out of that one. AFAIK there are no calls to check the size of the dma_coherent heap (there is cma_get_size right under the hood), so no easy way to check.
If we're increasing the default cma heap size in the very near future to ditch the reloc heap, then I guess we don't worry too much.

6by9 · 2019-02-28T11:04:03Z

Tested and all seems good.

That only leaves the caveat I mentioned above. I can't see a good solution to that before we extend the CMA heap.

popcornmix · 2019-02-28T11:26:35Z

Extend it a little now?

pelwell · 2019-02-28T11:37:01Z

Fine with me - is 16MB sufficient?

popcornmix · 2019-02-28T11:37:39Z

That's the number I was imagining.

6by9 · 2019-02-28T11:39:53Z

Extend it a little now?

Can do.
I've lost track as to where the default heap size is specified. It's not apparently CONFIG_CMA_SIZE_MBYTES as that is 5 (default is 8), but I also can't see it quickly in device tree.

16MB should be sufficient for most use cases, although still possibly too low if someone is running a 4k display (~32MB for the FB).

JamesH65 · 2019-02-28T12:15:30Z

Multiple FB support or transposed displays or combinations of the two would also require quite a chunk of memory - there is a circumstances where three transpose buffers are needed for the new LCD's (720p * 3) and DPI displays. Is that going to be affected by this change?

6by9 · 2019-02-28T13:49:56Z

Depends when exactly the transpose buffers are allocated.

This PR is shifting the main frame buffer that the ARM knows about from being a VPU to ARM allocation.
I suspect the transpose buffers are later mem_alloc calls (maybe wrapped as vc_image_relocatable_alloc), in which case those are independent.
Iff the vcsm-cma service has connected, then it will deal with the allocation. Otherwise it will fall back to the reloc heap (what remains of it). We may need to refactor when it does the alloc to ensure that vcsm-cma has connected first - that may be tricky as currently that is at ~7sec after boot, probably because it is probed as a platform driver via the VCHIQ driver coming up.

6by9 · 2019-02-28T15:27:29Z

So it seems CONFIG_CMA_SIZE_MBYTES=5 gets rounded up to 8 by some magic. Setting it to 16 does result in a 16MiB allocation reported by the kernel.

Are we happy at 16MiB and not worry about those using 4k screens? They should revert to using the reloc heap instead, so a soft fail anyway at present.
Likewise James' transpose buffers aren't an issue for now, but no need to be considered when we are trying to shrink the reloc heap for real.

pelwell · 2019-02-28T15:29:46Z

They always have the option of adding cma=32M to their cmdline.txt.

6by9 · 2019-02-28T15:32:14Z

They always have the option of adding cma=32M to their cmdline.txt.

Very true.
OK, I'll squash the two current patches, and add one patching the defconfigs.

6by9 · 2019-02-28T15:38:24Z

Squashed, and defconfigs amended.
Any further comments, or are we good to go?

popcornmix · 2019-02-28T15:39:10Z

They should revert to using the reloc heap instead, so a soft fail anyway at present.

I think the more serious failure is if these succeed but leave CMA with an awkwardly small amount remaining causing later failures.

I wonder if the firmware should indicate whether cma allocation is preferred? (i.e. new firmware are large cma size)?

popcornmix · 2019-02-28T15:40:19Z

The case I'm nervous about is something like kodi with firmware 3d driver that may experience a performance hit from the remoted allocations. As such we probably want a switch between old style gpu_mem and new style cma.

With the switch in the gpu_mem position cma allocation of framebuffers is probably not the best option.

6by9 · 2019-02-28T16:00:55Z

@popcornmix This patch is only touching the frame buffer driver, not remoting VPU allocations - that's waiting on https://github.com/6by9/linux/tree/rpi-4.19.y-clean2 and needs firmware changes which are up as an MR. Remote allocations is behind a config.txt flag at present, and off by default.

There's near zero performance overhead with this PR as it is all controlled from the fb driver. During FB setup it does the CMA allocation (from the newly enlarged heap) and tries passing it to the VPU. If the VPU accepts it then job's done. If not it frees the CMA allocation and lets the VPU allocate.

I think the more serious failure is if these succeed but leave CMA with an awkwardly small amount remaining causing later failures.

Hence my comment of do we worry about 4k screen users, or more 2560x1440 as that allocation will just succeed in a 16MiB CMA heap.
CMA is semi-reserved memory, so it can be allocated to other users if normal memory has been exhausted. Those users may get remapped should a bigger CMA allocation require it.
Setting the heap size higher therefore has fewer downsides to setting gpu_mem higher.

How we handle switching from gpu_mem sizing to cma sizing is one to discuss. There are several combinations of things that could result in a system that fails to boot, particularly if we're considering the bare-metal world or RiscOS (they won't have the vcsm service).
John's going to be happy as he's just gained 8MB of reloc heap. Likewise Kodi with firmware v3d has gained a small amount of extra reloc heap.

popcornmix · 2019-02-28T16:27:34Z

My initial reaction was to merge this as the awkward edge cases (framebuffer alloc that succeeds but prevents booting because there is insufficient cma remaining) would be temporary until we move to the big cma world.

But my kodi example suggests that the edge cases may not be temporary so we might want to think through possible solutions when we aren't in the big cma world.

It seems like with this PR, I can run fbset with carefully constructed (or just unlucky) dimensions that exhaust cma and cause the kernel to grind to a halt. The same can happen on boot with a larger than 1080p monitor configured.

Some ideas:
Measure free cma and fall back to firmware allocation is it seems too low.
If measuring free cma not possible, use initial value of cma.
If initial value of cma not available a larger allocation can be attempted (and freed) to check
The firmware could pass a flag to say if it's using cma allocations to the kernel (e.g. module parameter)
The kernel could query this information through a mailbox call.

6by9 · 2019-02-28T16:39:12Z

Measure free cma and fall back to firmware allocation is it seems too low.

No apparent call to do that, and certainly not through the dma API.
CMA only keeps a bitmask of allocated/free pages, not an overall count.

If measuring free cma not possible, use initial value of cma.

That could be extracted, although again I suspect not through official APIs.

If initial value of cma not available a larger allocation can be attempted (and freed) to check

Can do that. Make sure we have 6MB excess, and hope no other big users have jumped in before us? If we've done it once, then stick with ARM allocations (unless it actually fails).

The firmware could pass a flag to say if it's using cma allocations to the kernel (e.g. module parameter)

The firmware doesn't know for definite. It knows that it has been asked to do them should a suitable provider connect over VCHI, but there's no guarantee that will happen.

The kernel could query this information through a mailbox call.

As above, the firmware doesn't know at this early stage whether it will actually have a suitable provider for remote allocations or not.

I'm not in tomorrow morning, but it sounds like it's better to have a 15minute bash this through rather than banging things back and forth here.

6by9 · 2019-03-06T13:35:34Z

Conversation had with @popcornmix.
Merge the support for this as is without increasing the CMA heap.

On the firmware side we will be gaining the config.txt setting remote_gpu_mem if we're allocating from CMA via vcsm-cma.
Iff that is set, then the firmware will accept the CMA allocation made by this patch and use it as the frame buffer.
If it is not set then we assume we have lots of gpu_mem, reject the passed in buffer, and allocate our own from gpu_mem.

It does make the full implementation of this dependent on the patch set for remote memory allocations, but that's not a big deal.

All happy with that plan?

pelwell · 2019-03-06T13:44:58Z

What is the intended use of remote_gpu_mem? Will it cause a larger CMA area to be reserved on the ARM?

pelwell · 2019-03-06T13:47:35Z

Or is it just a boolean to put the choice into the hands of the user (and the default value compiled into the firmware)?

pelwell · 2019-03-06T14:03:15Z

After reading the VPU-side counterpart, I'm happy with this scheme.

Currently the VPU allocates the contiguous buffer for the framebuffer. Try an alternate path first where we use dma_alloc_coherent and pass the buffer to the VPU. Should the VPU firmware not support that path, then free the buffer and revert to the old behaviour of using the VPU allocation. Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>

6by9 · 2019-03-06T14:22:42Z

As you've surmised, remote_gpu_mem means that the VPU is expecting to be able to use CMA allocations instead of the reloc heap, therefore other config needs to be updated to allow this.

The intention is not to enable this by default until we have a sensible scheme in place to play off gpu_mem vs CMA. For the standard case we'd expect a largish CMA heap (256MB) with minimal reloc heap (<20MB). When using the firmware GLES driver there is a minor performance hit, therefore users may want to revert back to using a genuine reloc heap.

6by9 · 2019-03-06T14:24:11Z

Rebased on top of 4.19.27.

kernel: Enable MT76 USB wifi modules See: raspberrypi/linux#2890 kernel: staging: bcm2835-codec: NULL component handle on queue_setup failure See: raspberrypi/linux#2898 kernel: f2fs: fix to skip verifying block address for non-regular inode See: raspberrypi/linux#2896 kernel: vcsm: rpi-4.19.y - gcc-v8 fixes See: raspberrypi/linux#2897 kernel: vc-sm-cma tidy ups, and one for bcm2835_codec See: raspberrypi/linux#2889 kernel: staging: vc_sm_cma: Remove erroneous misc_deregister See: raspberrypi/linux#2888 kernel: video: bcm2708_fb: Try allocating on the ARM and passing to VPU See: raspberrypi/linux#2875 firmware: smservice: Avoid a double free firmware: arm_loader/display: All the framebuffer allocation to be made on the ARM firmware: isp_tuners: Fix memory leak in error path firmware: testc: Fix dps parsing firmware: camera_subsystem: Fix hardware sync pulses off disable_camera_led firmware: Replace the remaining direct users of C(mask_gpu_interrupt[0|1]) firmware: arm_display: Kick transposer for the rainbow screen if required firmware: dispmanx: Do not allow transpose if using Full KMS See: raspberrypi/linux#2891

pelwell reviewed Feb 27, 2019

View reviewed changes

drivers/video/fbdev/bcm2708_fb.c Outdated Show resolved Hide resolved

6by9 force-pushed the rpi-4.19.y-clean4 branch from d816acb to bd23d63 Compare February 28, 2019 15:37

6by9 force-pushed the rpi-4.19.y-clean4 branch from bd23d63 to 7df0535 Compare March 6, 2019 13:29

popcornmix force-pushed the rpi-4.19.y branch from 7f41587 to 789277d Compare March 6, 2019 14:06

6by9 force-pushed the rpi-4.19.y-clean4 branch from 7df0535 to 6b7ad24 Compare March 6, 2019 14:24

pelwell merged commit c0e09b3 into raspberrypi:rpi-4.19.y Mar 7, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

video: bcm2708_fb: Try allocating on the ARM and passing to VPU #2875

video: bcm2708_fb: Try allocating on the ARM and passing to VPU #2875

6by9 commented Feb 27, 2019

pelwell Feb 27, 2019

6by9 Feb 28, 2019

pelwell commented Feb 28, 2019

6by9 commented Feb 28, 2019

6by9 commented Feb 28, 2019

popcornmix commented Feb 28, 2019

pelwell commented Feb 28, 2019

popcornmix commented Feb 28, 2019

6by9 commented Feb 28, 2019

JamesH65 commented Feb 28, 2019

6by9 commented Feb 28, 2019

6by9 commented Feb 28, 2019

pelwell commented Feb 28, 2019

6by9 commented Feb 28, 2019

6by9 commented Feb 28, 2019

popcornmix commented Feb 28, 2019

popcornmix commented Feb 28, 2019

6by9 commented Feb 28, 2019

popcornmix commented Feb 28, 2019

6by9 commented Feb 28, 2019

6by9 commented Mar 6, 2019

pelwell commented Mar 6, 2019

pelwell commented Mar 6, 2019

pelwell commented Mar 6, 2019

6by9 commented Mar 6, 2019

6by9 commented Mar 6, 2019

video: bcm2708_fb: Try allocating on the ARM and passing to VPU #2875

video: bcm2708_fb: Try allocating on the ARM and passing to VPU #2875

Conversation

6by9 commented Feb 27, 2019

pelwell Feb 27, 2019

Choose a reason for hiding this comment

6by9 Feb 28, 2019

Choose a reason for hiding this comment

pelwell commented Feb 28, 2019

6by9 commented Feb 28, 2019

6by9 commented Feb 28, 2019

popcornmix commented Feb 28, 2019

pelwell commented Feb 28, 2019

popcornmix commented Feb 28, 2019

6by9 commented Feb 28, 2019

JamesH65 commented Feb 28, 2019

6by9 commented Feb 28, 2019

6by9 commented Feb 28, 2019

pelwell commented Feb 28, 2019

6by9 commented Feb 28, 2019

6by9 commented Feb 28, 2019

popcornmix commented Feb 28, 2019

popcornmix commented Feb 28, 2019

6by9 commented Feb 28, 2019

popcornmix commented Feb 28, 2019

6by9 commented Feb 28, 2019

6by9 commented Mar 6, 2019

pelwell commented Mar 6, 2019

pelwell commented Mar 6, 2019

pelwell commented Mar 6, 2019

6by9 commented Mar 6, 2019

6by9 commented Mar 6, 2019