Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pi 5 kernel8.img for 6.6.62 causes audio players to freeze with a looping sound #6504

Closed
theflyingape opened this issue Nov 30, 2024 · 36 comments

Comments

@theflyingape
Copy link

Describe the bug

config.txt

[pi5]
kernel=kernel8.img

Steps to reproduce the behaviour

ffplay sound.wav race condition with studdering sound until killed.
If I comment out kernel8.img, ffplay runs fine, but some other software will segfault without kernel8.

Device (s)

Raspberry Pi 5

System

uname -r
6.6.62+rpt-rpi-v8

If I rollback to 6.6.51, it kernel8.img and sound events run fine.

Logs

No response

Additional context

No response

@popcornmix
Copy link
Collaborator

What device are you outputting audio to? (e.g. hdmi?)
Is this RPiOS bookworm 64-bit? Lite or desktop image?

@theflyingape
Copy link
Author

Yes, latest Bookworm release and output to HDMI-0 -- both from LightDM with Wayland+labwc and from console mode.

@theflyingape
Copy link
Author

I can confirm rolling back the linux-image-6.6.62 --> linux-image-6.6.51 with the kernel8.img enabled works as expected.

@pelwell
Copy link
Contributor

pelwell commented Dec 2, 2024

I'm not a regular ffpmeg user, but it's working for me to HDMI0. There is no audible difference between kernel8.img and kernel_2712.img.

  1. What sample format and rate is sound.wav? If you've used several formats, does the behaviour differ between them?
  2. Does speaker-test -c 2 stutter?
  3. How about aplay sound.wav?

@theflyingape
Copy link
Author

Just tried sudo apt install linux-image-6.6.62+rpt-rpi-2712 linux-image-6.6.62+rpt-rpi-v8 on Pi 5, rebooted with and without kernel8.img, and sound stutters regardless of format (wav, mp3) which includes playing video (flv, mp4) and the studder stalls video playback. I also tested with and without wireplumber and pipewire services.

speaker-test does stutter and I cannot Ctrl-C to stop -- must wait 30-seconds.
aplay the same -- but interesting that ffplay, mplayer, mpv all work on Pi 4 (not Pi 5 as reported) but aplay fails.

Only on Pi4, I must stop wireplumber (not pipewire) else mplayer immediately reports "Audio device got stuck!"
On Pi 5, stopping all and sending it to ALSA device reports "Audio device got stuck!" repeatedly.

@pelwell
Copy link
Contributor

pelwell commented Dec 2, 2024

If you have a spare SD card, try a clean install of the latest 64-bit OS - it works with ffplay, speaker-test, aplay and mplayer (which I did have to apt install). Yes, speaker-test doesn't like Ctrl-C, and ffplay doesn't exit and the end of the file, but those are common problems.

@theflyingape
Copy link
Author

Happy to oblige -- used 2024-11-19-raspios-bookworm-arm64-full.img.xz
Works perfectly off the installation with the auto updates it applies.
However, after modifying config.txt:

# For more options and information see
# http://rptl.io/configtxt
# Some settings may impact device functionality. See link above for details

# Uncomment some or all of these to enable the optional hardware interfaces
#dtparam=i2c_arm=on
#dtparam=i2s=on
#dtparam=spi=on

# Enable audio (loads snd_bcm2835)
dtparam=audio=on

# Additional overlays and parameters are documented
# /boot/firmware/overlays/README

# Automatically load overlays for detected cameras
camera_auto_detect=1

# Automatically load overlays for detected DSI displays
display_auto_detect=1

# Automatically load initramfs files, if found
auto_initramfs=1

# Enable DRM VC4 V3D driver
dtoverlay=vc4-kms-v3d
max_framebuffers=2

# Don't have the firmware create an initial video= setting in cmdline.txt.
# Use the kernel's default instead.
disable_fw_kms_setup=1

# Run in 64-bit mode
arm_64bit=1

# Disable compensation for displays with overscan
disable_overscan=1

# Run as fast as firmware / board allows
arm_boost=1

[cm4]
# Enable host mode on the 2711 built-in XHCI USB controller.
# This line should be removed if the legacy DWC2 controller is required
# (e.g. for USB device mode) or if USB support is not required.
otg_mode=1

[cm5]
dtoverlay=dwc2,dr_mode=host

[all]

[pi5]
arm_freq=2760
dtparam=nvme
dtparam=pciex1_gen=3
gpu_freq=920
#over_voltage=3
# account for Mupen64Plus-Next limitation
kernel=kernel8.img

ffplay essentially gets stuck and speaker-test studders horribly @ ~60 seconds per period instead of constant pink noise for ~11

@popcornmix
Copy link
Collaborator

Can you confirm the issue occurs without the overclock?

@theflyingape
Copy link
Author

No problem, I had posted a "clean" config.txt from the start but certainly worth a try off the "clean" install.
So booted off the clean SD card install without any NVME and without any cpu/gpu changes, sound behaves badly as before with the kernel8.img on Pi 5. I comment out config.txt kernel=kernel8.img, it works perfectly fine.

I do see a quick flash on the console from power up with the kernel of a cma failure with allocate multiple times that does not occur from the regular kernel startup. I could not find it in any boot log, but was able to capture it with my phone:
Screenshot_20241202_140242

@popcornmix
Copy link
Collaborator

popcornmix commented Dec 2, 2024

That's interesting. Does changing:

dtoverlay=vc4-kms-v3d

to

dtoverlay=vc4-kms-v3d,cma-320

help?

Note: you can see those messages by running dmesg. Posting the output of dmesg here is probably useful.

@theflyingape
Copy link
Author

theflyingape commented Dec 2, 2024

Well that was a magic bullet -- I updated linux images on both the SD card clean install and then my NVME boot (with overclocking) and sound works for all my use-cases using the kernel8.img now.
How did we get here? :)
P.S., I had checked dmesgand there were no like cma entries in it.

@popcornmix
Copy link
Collaborator

You should see the cma messages in dmesg if you hit the issue (i.e. without the cma-320 parameter).

The question is, what needs so much cma on your system?
I wouldn't expect it from ffplay (with an audio file).
The nvme drive is something to consider.

Can you show output of grep Cma /proc/meminfo?
I get:

$ grep Cma /proc/meminfo 
CmaTotal:          65536 kB
CmaFree:           48992 kB

@popcornmix
Copy link
Collaborator

Post url from raspinfo | pastebinit.

@theflyingape
Copy link
Author

CmaTotal:         327680 kB
CmaFree:          245392 kB

... nets only: 82288 kB
Url: raspinfo

@theflyingape
Copy link
Author

I checked the Pi 4, it shows 524288 allocated using stock settings off the stock kernel.
On Pi 5, removed cma-320 using kernel8.img and it shows 65536 allocated with 0 free.
This keeps coming back to me as a regression in this kernel build, no?
Here's the dmesg | grep -i cma output:

[    0.000000] Reserved memory: created CMA memory pool at 0x0000000001c00000, size 64 MiB
[    0.000000] OF: reserved mem: initialized node linux,cma, compatible id shared-dma-pool
[    0.000000] OF: reserved mem: 0x0000000001c00000..0x0000000005bfffff (65536 KiB) map reusable linux,cma
[    0.000000] Memory: 8043824K/8380416K available (13504K kernel code, 2206K rwdata, 4292K rodata, 4864K init, 1086K bss, 271056K reserved, 65536K cma-reserved)
[    0.663863] cma: cma_alloc: linux,cma: alloc failed, req-size: 4 pages, ret: -12
[    0.671307] cma: cma_alloc: linux,cma: alloc failed, req-size: 16 pages, ret: -12
[    0.678843] cma: cma_alloc: linux,cma: alloc failed, req-size: 4 pages, ret: -12
[    0.686280] cma: cma_alloc: linux,cma: alloc failed, req-size: 16 pages, ret: -12
[    0.693814] cma: cma_alloc: linux,cma: alloc failed, req-size: 4 pages, ret: -12
[    0.701251] cma: cma_alloc: linux,cma: alloc failed, req-size: 16 pages, ret: -12
[    0.708776] cma: cma_alloc: linux,cma: alloc failed, req-size: 4 pages, ret: -12
[    0.716208] cma: cma_alloc: linux,cma: alloc failed, req-size: 16 pages, ret: -12
[    1.460346] cma: cma_alloc: linux,cma: alloc failed, req-size: 514 pages, ret: -12
[    1.468420] cma: cma_alloc: linux,cma: alloc failed, req-size: 514 pages, ret: -12

@popcornmix
Copy link
Collaborator

See: #6361

It's a deliberate change. We are not planning to revert that.
We need to work out why your setup needs the large amount of CMA, and if that is avoidable.

@popcornmix
Copy link
Collaborator

I updated linux images on both the SD card clean install and then my NVME boot

Just to confirm. Do you have the looping sound issue on an sdcard install (with kernel8), or only the NVME boot?

@theflyingape
Copy link
Author

In summary:
The effect is reproducible using the kernel8.img for Pi 5 -- regardless of boot device: SD card or NVMe.
The fix is assigning enough CMA (I am using cma-512) for Pi 5 using the kernel8.img.
Pi 4 does not need to use kernel8.img and defaults to 512M from a stock boot.

  1. Pi 5 using cma-512 is not required on the "regular" kernel, it boots and runs fine having 64M.
  2. Pi 5 kernel8 boot reports a series of cma_alloc failures to the console before the Plymouth splash screen starts, as there is no more free CMA memory from the default 64M assigned.

@popcornmix
Copy link
Collaborator

This is the surprising part.
For me, booting on a Pi5 with kernel8.img I get:

pi@pios:~ $ uname -a
Linux pios 6.6.62+rpt-rpi-v8 #1 SMP PREEMPT Debian 1:6.6.62-1+rpt1 (2024-11-25) aarch64 GNU/Linux
pi@pios:~ $ grep Cma /proc/meminfo 
CmaTotal:          65536 kB
CmaFree:           49040 kB

so most of the default 64M cma is still free. This is after booting to desktop.

I need about 16M cma. You need 82M cma. What is the difference.

@pelwell
Copy link
Contributor

pelwell commented Dec 3, 2024

I get identical numbers here using your config.txt as-is:

pi@raspberrypi:~$ grep Cma /proc/meminfo
CmaTotal:          65536 kB
CmaFree:           49040 kB
pi@raspberrypi:~$ uname -a
Linux raspberrypi 6.6.63-v8+ #1824 SMP PREEMPT Tue Dec  3 12:59:39 GMT 2024 aarch64 GNU/Linux

@theflyingape
Copy link
Author

So sad: I am assuming raspinfo was not revealing any between our Pi 5 boards and startup configurations then. Would attaching the initial ramdisk help? If I use the stock kernel, I get 64M allocated with 16032 kB free.

I personally do not mind any coupling of kernel8.img with a hardwired cma upsizing overlay in the config.txt

I could not speak to the differences between the kernel & kernel8 build as to why they would behave so differently on initial startup. I just know that some software emulation cores will not work using kernel and must have kernel8 to mitigate some complication with how its memory is allocated.

@pelwell
Copy link
Contributor

pelwell commented Dec 3, 2024

You've got our attention, and we do believe you - we're just having some difficulty finding the differentiating factor.

And it looks like you may not be alone: https://forums.raspberrypi.com/viewtopic.php?p=2263123

@popcornmix
Copy link
Collaborator

It's kind of tricky to get good information on where cma allocations come from. Can you try adding to cmdline.txt (on existing line):

ftrace=function ftrace_filter=cma_alloc trace_options=func_stack_trace trace_event="cma:*"

boot up (ideally with the clean sdcard image with kernel8 that has cma allocation failures)
and then, when booted post url from:

sudo cat /sys/kernel/debug/tracing/trace | pastebinit

@gitbls
Copy link

gitbls commented Dec 3, 2024

Here's a trace as requested by @popcornmix

https://paste.debian.net/hidden/39d6dcf6/

@theflyingape
Copy link
Author

sudo cat /sys/kernel/debug/tracing/trace | pastebinit

https://paste.debian.net/hidden/c65c5554/

@popcornmix
Copy link
Collaborator

Okay, that narrows it down. The call stack is:

=> nvme_setup_host_mem
 => nvme_probe
 => local_pci_probe
 => pci_device_probe
 => pci_host_probe
 => brcm_pcie_probe

and that seems to be called 15 times each wanting 4M of CMA.

Running that again but without kernel=kernel8.img might be interesting.
(Does it not allocate this at all, or is it perhaps 4x smaller?)

Was this an sdcard boot? If so can you confirm if removing dtparam=nvme also avoids the cma error (I assume it will from the call stack).

@gitbls
Copy link

gitbls commented Dec 3, 2024

Okay, that narrows it down. The call stack is:

=> nvme_setup_host_mem
 => nvme_probe
 => local_pci_probe
 => pci_device_probe
 => pci_host_probe
 => brcm_pcie_probe

and that seems to be called 15 times each wanting 4M of CMA.

Running that again but without kernel=kernel8.img might be interesting. (Does it not allocate this at all, or is it perhaps 4x smaller?)

Was this an sdcard boot? If so can you confirm if removing dtparam=nvme also avoids the cma error (I assume it will from the call stack).

Here is the log without kernel8: https://paste.debian.net/hidden/928f3ddc/

The system boots off a USB-connected SATA SSD (Kingston A400 ICIM).

The nvme drive is a data/work disk. No dtparam=nvme in config.sys, but there is dtparam=pciex1_gen=3. I can run without that and do runs with and without kernel8 after breakfast, etc. LMK if there's anything besides these 2 that would be helpful.

@pelwell
Copy link
Contributor

pelwell commented Dec 3, 2024

No dtparam=nvme in config.sys

The nvme dtparam is set automatically by the firmware if it detects the presence of a PCIe adaptor via hardware magic, or if NVME is in the boot order.

pelwell added a commit to pelwell/linux that referenced this issue Dec 3, 2024
Some NVME drives seem to request significant amounts of DMA coherent
memory - enough to exhaust our standard 64MB CMA allocation.

Try disabling the feature to see what effect it has - drives should
continue to function without it.

Link: raspberrypi#6504

Signed-off-by: Phil Elwell <phil@raspberrypi.com>
pelwell added a commit to pelwell/linux that referenced this issue Dec 3, 2024
Some NVME drives seem to request significant amounts of DMA coherent
memory - enough to exhaust our standard 64MB CMA allocation.

Try disabling the feature to see what effect it has - drives should
continue to function without it.

Link: raspberrypi#6504

Signed-off-by: Phil Elwell <phil@raspberrypi.com>
pelwell added a commit to pelwell/linux that referenced this issue Dec 3, 2024
Some NVME drives seem to request significant amounts of DMA coherent
memory - enough to exhaust our standard 64MB CMA allocation.

Try disabling the feature to see what effect it has - drives should
continue to function without it.

Link: raspberrypi#6504

Signed-off-by: Phil Elwell <phil@raspberrypi.com>
@pelwell
Copy link
Contributor

pelwell commented Dec 3, 2024

After a lot of head scratching and research I've come up with a highly technical potential workaround for the problem, which you'll find in #6510.

--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1936,6 +1936,7 @@ static void nvme_free_host_mem(struct nvme_dev *dev)
        dev->nr_host_mem_descs = 0;
 }
 
+#if 0
 static int __nvme_alloc_host_mem(struct nvme_dev *dev, u64 preferred,
                u32 chunk_size)
 {
@@ -2003,9 +2004,11 @@ static int __nvme_alloc_host_mem(struct nvme_dev *dev, u64 preferred,
        dev->host_mem_descs = NULL;
        return -ENOMEM;
 }
+#endif
 
 static int nvme_alloc_host_mem(struct nvme_dev *dev, u64 min, u64 preferred)
 {
+#if 0
        u64 min_chunk = min_t(u64, preferred, PAGE_SIZE * MAX_ORDER_NR_PAGES);
        u64 hmminds = max_t(u32, dev->ctrl.hmminds * 4096, PAGE_SIZE * 2);
        u64 chunk_size;
@@ -2018,6 +2021,7 @@ static int nvme_alloc_host_mem(struct nvme_dev *dev, u64 min, u64 preferred)
                        nvme_free_host_mem(dev);
                }
        }
+#endif
 
        return -ENOMEM;
 }

If you wait about 40 minutes for the autobuilds to complete, you'll be able install a trial kernel with sudo rpi-update pulls/6510.

@gitbls
Copy link

gitbls commented Dec 3, 2024

I Installed the rpi-update kernel and running with kernel=kernel8.img I'm not seeing any cma_alloc errors.

@pelwell
Copy link
Contributor

pelwell commented Dec 3, 2024

Do you notice any reduction in NVME speed?

@gitbls
Copy link

gitbls commented Dec 3, 2024

Ran some quick, smallish tests. Both are 8G Pi5 systems with a Kingston A400 SSD and a Kingston NV2 data disk. All tests done on the NV2. There is a difference in the NVME adapters however.

host pw: my workstation, installed with 2023-12-05 IMG. NVME adapter: Geekworm x1001
host pit: Test system, installed with 2024-11-19 IMG. NVME adapter: Geekworm N04

hdparm speeds

  • pw on pw pi (no rpi-update): Timing buffered disk reads: 2270 MB in 3.00 seconds = 756.15 MB/sec
  • pit disk on pw pi (rpi-updated): Timing buffered disk reads: 2310 MB in 3.00 seconds = 769.40 MB/sec
  • pit on pit pi (rpi-updated): Timing buffered disk reads: 2502 MB in 3.00 seconds = 833.60 MB/sec
  • pw disk on pit pi (no rpi-update): Timing buffered disk reads: 2492 MB in 3.00 seconds = 830.16 MB/sec

rpi-updated raw speed slightly better than non-rpi-updated on same hardware.

But, looks like the NVME adapter in pit is better than the one in pw 😮

Simple test (using the same 4.8GB IMG, copied locally). yea, it's a little small, but went for breadth

  • rm -f test.img
  • echo 1 | tee /proc/sys/vm/drop_caches
  • time cp imgfile.img test.img

Result

Summary:

  • real-world app (cp) as fast or faster with the rpi-update kernel
  • (bonus) as expected, cma logging not expensive

pw disk on pit (no rpi-update)

kernel8 logging on

real 0m8.479s
user 0m0.021s
sys 0m3.742s

real 0m8.026s
user 0m0.020s
sys 0m3.752s

kernel8 logging off

real 0m8.901s
user 0m0.009s
sys 0m3.833s

real 0m9.271s
user 0m0.021s
sys 0m3.925s

pit disk on pit (rpi-updated)

kernel8 logging off
real 0m7.637s
user 0m0.016s
sys 0m3.958s

real 0m8.711s
user 0m0.044s
sys 0m3.885s

pit disk on pw (rpi-updated)

kernel8 logging on
real 0m9.419s
user 0m0.028s
sys 0m4.017s

real 0m10.748s
user 0m0.021s
sys 0m3.819s

kernel8 logging off
real 0m9.325s
user 0m0.012s
sys 0m3.733s

real 0m9.375s
user 0m0.038s
sys 0m3.811s

pw disk on pw (no rpi-update)

kernel8 logging off
real 0m9.445s
user 0m0.023s
sys 0m3.969s

real 0m9.335s
user 0m0.021s
sys 0m3.824s

@theflyingape
Copy link
Author

Used kernel8 on all 3 geekbench6 runs off NVME boot disk:

  1. 6.6.62 with cma-128: 9225167
  2. 6.6.63 w/o cma-128: 9225361 - CMA FREE: 0 kB
  3. 6.6.63 pulls/6510: 9225534 - CMA FREE: 48784 kB

... shows only a small drop-off between overall scores 1 - 3

@popcornmix
Copy link
Collaborator

I believe geekbench doesn't do any file access within the benchmark (it loads the data files it uses before starting the timing), so I wouldn't expect it to be affected by pr6510.

Might be worth running the geekbench tests again to see if the scores are consistent. I do find that these small changes (the odd percent) tend to be in the noise and you need multiple runs to be sure if something is consistently better.

@theflyingape
Copy link
Author

Agreed with all of that and understand my Samsung SSD with Geekworm X1001 is different too.
I run mine 7x24 so it's running A/V all the time, so I'm more interested that it remains stable overnight and then I'll launch some retro games in-between. :)
Thanks for all the investigative work into root cause.

@pelwell
Copy link
Contributor

pelwell commented Dec 4, 2024

Thanks for the feedback - it all sounds pretty positive. I'll merge the PR as it stands - there should be no doubt about what it is doing - and wait for complaints about performance.

pelwell added a commit that referenced this issue Dec 4, 2024
Some NVME drives seem to request significant amounts of DMA coherent
memory - enough to exhaust our standard 64MB CMA allocation.

Try disabling the feature to see what effect it has - drives should
continue to function without it.

Link: #6504

Signed-off-by: Phil Elwell <phil@raspberrypi.com>
pelwell added a commit that referenced this issue Dec 5, 2024
Some NVME drives seem to request significant amounts of DMA coherent
memory - enough to exhaust our standard 64MB CMA allocation.

Try disabling the feature to see what effect it has - drives should
continue to function without it.

Link: #6504

Signed-off-by: Phil Elwell <phil@raspberrypi.com>
pelwell added a commit that referenced this issue Dec 5, 2024
Some NVME drives seem to request significant amounts of DMA coherent
memory - enough to exhaust our standard 64MB CMA allocation.

Try disabling the feature to see what effect it has - drives should
continue to function without it.

Link: #6504

Signed-off-by: Phil Elwell <phil@raspberrypi.com>
pelwell added a commit that referenced this issue Dec 6, 2024
Some NVME drives seem to request significant amounts of DMA coherent
memory - enough to exhaust our standard 64MB CMA allocation.

Try disabling the feature to see what effect it has - drives should
continue to function without it.

Link: #6504

Signed-off-by: Phil Elwell <phil@raspberrypi.com>
popcornmix pushed a commit that referenced this issue Dec 7, 2024
Some NVME drives seem to request significant amounts of DMA coherent
memory - enough to exhaust our standard 64MB CMA allocation.

Try disabling the feature to see what effect it has - drives should
continue to function without it.

Link: #6504

Signed-off-by: Phil Elwell <phil@raspberrypi.com>
popcornmix pushed a commit that referenced this issue Dec 7, 2024
Some NVME drives seem to request significant amounts of DMA coherent
memory - enough to exhaust our standard 64MB CMA allocation.

Try disabling the feature to see what effect it has - drives should
continue to function without it.

Link: #6504

Signed-off-by: Phil Elwell <phil@raspberrypi.com>
popcornmix pushed a commit that referenced this issue Dec 9, 2024
Some NVME drives seem to request significant amounts of DMA coherent
memory - enough to exhaust our standard 64MB CMA allocation.

Try disabling the feature to see what effect it has - drives should
continue to function without it.

Link: #6504

Signed-off-by: Phil Elwell <phil@raspberrypi.com>
popcornmix pushed a commit that referenced this issue Dec 10, 2024
Some NVME drives seem to request significant amounts of DMA coherent
memory - enough to exhaust our standard 64MB CMA allocation.

Try disabling the feature to see what effect it has - drives should
continue to function without it.

Link: #6504

Signed-off-by: Phil Elwell <phil@raspberrypi.com>
popcornmix pushed a commit that referenced this issue Dec 16, 2024
Some NVME drives seem to request significant amounts of DMA coherent
memory - enough to exhaust our standard 64MB CMA allocation.

Try disabling the feature to see what effect it has - drives should
continue to function without it.

Link: #6504

Signed-off-by: Phil Elwell <phil@raspberrypi.com>
popcornmix pushed a commit that referenced this issue Dec 16, 2024
Some NVME drives seem to request significant amounts of DMA coherent
memory - enough to exhaust our standard 64MB CMA allocation.

Try disabling the feature to see what effect it has - drives should
continue to function without it.

Link: #6504

Signed-off-by: Phil Elwell <phil@raspberrypi.com>
popcornmix pushed a commit that referenced this issue Dec 20, 2024
Some NVME drives seem to request significant amounts of DMA coherent
memory - enough to exhaust our standard 64MB CMA allocation.

Try disabling the feature to see what effect it has - drives should
continue to function without it.

Link: #6504

Signed-off-by: Phil Elwell <phil@raspberrypi.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants