How to do fast and battery efficient DMA-driven SPI transfers? #992
Comments
As you've discovered, the SPI interfaces, like the I2C, SDHOST and a few others, share the main VPU core clock. That's a fundamental hardware restriction, so when the core clock changes, so do the slaved clocks. The downstream SDHOST driver gives control of its clock divisor to the VPU, but the drivers for the other interfaces just calculate their divisors based on the turbo frequency, leading to the effect you are seeing. What is the minimum SPI clock rate you need to hit? It looks like the SPI clock divisor has to be even, but 250MHz/4 = 62.5MHz, which you could achieve by setting "core_freq=250". Setting the core clock maximum to the same value as the minimum has the effect of keeping the core clock constant, decoupling it from the ARM's turbo changes. With regard to the "warranty void" bit, from https://www.raspberrypi.org/documentation/configuration/config-txt/overclocking.md:
In other words, if the core can run at the chosen turbo frequency without requiring over-voltage then the over-voltage bit won't be set. So if you are prepared to lower your turbo frequency rather than raising it you should be able to use a CDIV of 4 to hit 66.75:
|
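To make the divisor arithmetic in the comment above concrete, here is a minimal Python sketch. It assumes only what the comment states: the SPI clock is the core clock divided by CDIV, and CDIV has to be even. The function name `spi_clock_hz` is made up for illustration and is not part of any driver.

```python
def spi_clock_hz(core_hz: float, cdiv: int) -> float:
    """Return the SPI clock for a given core clock and divisor.

    Assumption taken from the thread: SPI clock = core clock / CDIV,
    and CDIV has to be an even integer.
    """
    if cdiv < 2 or cdiv % 2 != 0:
        raise ValueError("CDIV is assumed to be an even integer >= 2")
    return core_hz / cdiv


# Tabulate a few even divisors for the idle (250 MHz) and turbo (400 MHz) core clocks.
for core_mhz in (250, 400):
    for cdiv in (4, 6, 8):
        rate_mhz = spi_clock_hz(core_mhz * 1e6, cdiv) / 1e6
        print(f"core {core_mhz} MHz, CDIV {cdiv}: SPI {rate_mhz:.2f} MHz")
```

With a fixed core clock of 250 MHz, CDIV=4 gives the 62.5 MHz mentioned above.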
Thanks @pelwell for the quick reply, super informative as always!
In order to sustain a 60fps update rate on fast moving content, a SPI frequency around 66MHz is needed. A
Is it possible to force the core frequency to a fixed 400 MHz, while letting the ARM CPU turbo up and down between 600MHz and 1200MHz by itself? I.e.
and run SPI with |
Unfortunately the firmware contains a restriction that |
Ah, gotcha. Perhaps there might be a way to allow some leeway to these, e.g.
Btw, how does SDHOST achieve this? Is this a software/firmware/fixed in hardware based thing? If it was possible to specify the SPI0 controller to seamlessly switch to Or even something like
which would result in Would it be feasible to release SPI0 CDIV over to the firmware(?) to control this way? Or is the behavior fixed in hardware? |
It would be possible to implement something like that - it's just a matter of which processor writes to the register. For the SDHOST interface there is a mailbox message to indicate the preferred SD clock speed, and the firmware calculates the correct divisor for the current core clock, taking care never to exceed the requested value. Sadly this shared clock control only happens in the downstream SDHOST driver - upstream keeps it simple, with the same result as you've seen with SPI - and to add it to SPI we'd have to patch the upstream SPI driver, which is something we try to avoid. |
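The divisor selection described above ("never exceed the requested value") could look roughly like the following if it were applied to SPI. This is only a sketch of the idea, not firmware code: `pick_divisor` is a hypothetical helper, the even-divisor rule is carried over from the earlier SPI comment, and the 67 MHz cap is an arbitrary example value.

```python
import math


def pick_divisor(core_hz: float, target_hz: float) -> int:
    """Smallest even divisor d such that core_hz / d does not exceed target_hz.

    Hypothetical helper: it mirrors the behaviour described for SDHOST
    (recompute the divisor whenever the core clock changes), with the
    even-divisor assumption taken from the SPI discussion.
    """
    if core_hz <= 0 or target_hz <= 0:
        raise ValueError("clock rates must be positive")
    cdiv = max(2, math.ceil(core_hz / target_hz))
    if cdiv % 2 != 0:
        cdiv += 1  # round up to the next even value so the cap still holds
    return cdiv


# Example: cap the SPI clock at 67 MHz (example value) for idle and turbo core clocks.
for core_mhz in (250, 400):
    cdiv = pick_divisor(core_mhz * 1e6, 67e6)
    print(f"core {core_mhz} MHz -> CDIV {cdiv}, SPI {core_mhz / cdiv:.2f} MHz")
```

Recomputing the divisor on each core clock change keeps the bus close to the requested rate in both states, instead of running well below it at idle.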
Thanks, that makes sense. Ran some power consumption numbers to estimate how much more power such a with
I got
and then
gave
The numbers may be a bit rough: I used a cheap USB power consumption tester, integrated the consumed current over 30 minutes of operation, and derived the average mA consumption from that. So if the limitation on
setting, it seems that it would consume around 10% to 20% more power compared to a hypothetical
option where the firmware was able to control SPI0 CDIV on the fly, rather than locking |
@popcornmix Any thoughts on allowing a single core frequency, either above or below 250MHz, to be specified, e.g. by detecting that |
The only issue is that the voltage will presumably be at its minimum when arm freq is at 600MHz, so core_freq may not be reliable when over 250MHz. But if we're willing to treat it as an overclock-style "it may work for you" option then I guess we could allow it. |
That was the idea - this will be for specialist applications, and we can document the concerns about guaranteeing the voltage is adequate. |
Any chance Raspberry Pi 4 would support specifying desired SPI CDIV values in /boot/config.txt separately for idle state and turbo state? I am assuming here that Pi 4 would also have two power states like the Pi 3, is that right? |
Got my hands on a Pi 4B, and I observe that it does still have two performance states, with idle clock speeds of 600MHz CPU & 250MHz SoC, whereas the turbo speeds are 1500MHz CPU & 500MHz SoC (vs 250<->400 scaling on the Pi 3B). Since the gap is now bigger, the Pi 4B needs the ability to set different SPI CDIVs for idle and turbo even more in order to get good SPI bus performance. |
Hello - I'd like to revisit this topic if possible? There are a lot of SPI-based peripherals that are used with the Pi, and Pi-based portable gaming devices in particular are extremely popular. Since its creation, fbcp-ili9341 has become ubiquitous for driving SPI-based displays on Pi gaming devices, with thousands of users. There are Kickstarter projects and commercial products out there that rely on fast SPI bus speeds on the Pi. The related bug item raspberrypi/userland#440 gained 124 thumbs up requests, which is likely, by a long margin, the most feedback that any single bug against the Pi has ever received. In #992 (comment) it was mentioned that the SDHOST clock divisor is already controlled by the code that governs the turbo up/down scaling. In a later comment #992 (comment) it was mentioned that it would be possible for the same behavior to be applied to the SPI bus, but that it was just not done due to implementation complexity. Given the large userbase, I would like to ask to revisit the possibility of tackling this complexity head-on? Ideally, SPI bus speeds would also be controllable in sync with the turbo state switches. Reiterating the proposal from #992 (comment), if one could set fields in
that would tell the firmware which SPI bus speeds it should be applying for both power states. Then for example a default value -1 would disable this feature, for compatibility. (Alternative specification might be to apply a desired bus speed that should not be exceeded: e.g. via That way use cases that dedicate the SPI bus for a single peripheral could specify appropriate targets for the SPI bus speed, instead of having to severely undershoot the bus, which leads to ~ -37.5% available SPI bandwidth on the Pi3B, and -50% available SPI bandwidth on the Pi4B. This would greatly improve the display performance of all these popular Pi projects. |
I am trying to do fast continuous SPI transfers out from a Pi (working on a Model 3 B and a Zero W) and while the transfers are running, I'd like to make the main CPU idle until the transfers are finished.
The BCM core has an idle frequency of 250 MHz, and when the CPU is under load, it boosts up to 400 MHz. It looks like the CPU frequency is linked to this same turbo, and it idles at 600 MHz, and turbos up to 1200 MHz on Pi 3, and 1000 MHz on Zero.
Originally I was doing SPI Polled Mode transfers, busy spinning the CPU in a loop pushing bytes out to FIFO, and reading back from it when the read bytes become available. This gave me a nice 400/6=66mbits/sec transfer rate with CDIV=6, but the issue was that this busy spinning kills the battery, and one hardware thread, so this was not feasible on the Zero.
Then after migrating to using DMA instead of Polled Mode, I see I get the same 400/6=66mbits/sec of transfer rate as long as I busy spin the CPU to wait until the DMA transfer is complete. However, after switching from busy spinning to actually sleeping the CPU to wait until the DMA completes, I get a drop of the BCM core frequency down to 250 MHz, and my transfer rates drop to 250/6=41.66mbits/sec, a dramatic -37.5% reduction in SPI throughput.
It seems that heavy SPI activity by itself in the absence of CPU activity does not cause the BCM core to trigger itself to turbo up to increase the SPI transfer speed, but the turbo is controlled only by activity on the main CPU core.
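The throughput figures above follow directly from the core clock and the fixed divisor. A small Python sketch of that arithmetic, assuming one bit per SPI clock cycle and ignoring any per-byte overhead (the helper name `spi_mbits_per_sec` is illustrative only):

```python
def spi_mbits_per_sec(core_mhz: float, cdiv: int) -> float:
    """Ideal SPI bit rate in Mbit/s, assuming one bit per SPI clock cycle."""
    return core_mhz / cdiv


turbo = spi_mbits_per_sec(400, 6)  # core at turbo, CDIV=6
idle = spi_mbits_per_sec(250, 6)   # core dropped to idle, same CDIV
change_pct = (idle - turbo) / turbo * 100

print(f"turbo: {turbo:.2f} Mbit/s, idle: {idle:.2f} Mbit/s, change: {change_pct:.1f}%")
```

Because CDIV stays at 6 in both states, the relative drop equals the ratio of the two core clocks (250/400), independent of the divisor chosen.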
Ideally, what I would like to achieve is to have the BCM core automatically trigger itself to turbo up whenever there exists heavy SPI activity in the FIFO (or perhaps when there are active DMA writes to the SPI TX or RX PER_MAPs ongoing?), ideally keeping the main CPU core frequency at idle, so the system would bump itself up from 600MHz/250MHz to 600MHz/400MHz when SPI transfers are ongoing.
If such a "600MHz/400MHz" turbo mode is not technically possible and the main CPU and BCM core clocks have to turbo at the same time, I'd then like the system to automatically turbo up to 1200MHz/400MHz when there is SPI activity going on, while user code could still run a usleep() or a futex/mutex wait for a signal/interrupt to occur.
Are either of the above technically feasible?
As a third fallback option, it would be possible in my application to manually control turbo via some kind of hinting, if such a method might be feasible. My DMA transfers can be anything from a few bytes up to 480 * 320 * 2 bytes in size at a time, and before I start a DMA transfer, I could add in a hint trigger to tell the system e.g. "please keep BCM core turbo up for the next 0.7/1.3/2.5 msecs". This kind of hinting would allow the BCM core to get a breather immediately when the application does not need to do any SPI transfers, dropping back to idle to save power.
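The length of such a hint could be estimated from the transfer size and the SPI clock. The sketch below is an idealized estimate only: it assumes one bit per SPI clock cycle and ignores DMA setup and any per-byte gaps on the bus, so real transfers would take somewhat longer. `transfer_time_ms` is a hypothetical helper, not an existing API.

```python
def transfer_time_ms(num_bytes: int, core_mhz: float, cdiv: int) -> float:
    """Ideal time in milliseconds to clock num_bytes out over SPI.

    Assumes one bit per SPI clock cycle and no per-byte or DMA overhead.
    """
    bits = num_bytes * 8
    spi_hz = core_mhz * 1e6 / cdiv
    return bits / spi_hz * 1000.0


# Largest transfer mentioned above, plus a small example update of 4 KiB.
full_frame_bytes = 480 * 320 * 2
for label, size in (("full frame", full_frame_bytes), ("4 KiB update", 4096)):
    at_turbo = transfer_time_ms(size, 400, 6)
    at_idle = transfer_time_ms(size, 250, 6)
    print(f"{label}: {at_turbo:.3f} ms at 400 MHz core, {at_idle:.3f} ms at 250 MHz core")
```

An application could compute this value per transfer and pass it along as the hint duration, so the core only stays at turbo for as long as the queued data needs.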
My application is about implementing a power and performance efficient display driver for SPI connected displays, you can find the fbcp-ili9341 project here:
A demo video of Quake running at 60fps here.
The transfer footprint of my application ranges between long periods of heavy activity, to short bursts of heavy activity, to long periods of no activity, depending on how much pixel animation there is on screen in particular content. Ideally I'd be able to turbo up the BCM core quickly when SPI transfers are performed, and drop it back to idle when there are none.
As a workaround to avoid busy spinning on the main CPU to make the SPI transfers keep up, I have added force_turbo=1 in /boot/config.txt, and in that way, I can keep the main CPU asleep but still have the core behind the SPI bus running at 400 MHz. This lets the CPU schedule other processes on the Pi Zero W to keep things running smooth. However this is not a feasible solution, as I understand booting with force_turbo=1 irrevocably sets a "warranty void" bit on the device, and it's likely excessive to have the main CPU core run at 1200MHz (1000MHz on Zero W) even if it is sleeping idle for most of those cycles.
Any thoughts on what would be the best way to proceed? Thanks in advance!