Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix buffer size for WS2812 PWM driver #17046

Merged
merged 8 commits into from
Aug 14, 2022

Conversation

yiancar
Copy link
Contributor

@yiancar yiancar commented May 9, 2022

In a recent build, I was trying to use more than 60 ws2812 using the PWM driver (per key RGB).
My STM32F072 run out of RAM. (heap linker error).
Apparently the buffer for the PWM overflow value is a 32bit register...
Did some math to figure out if it need to be this big:
High period for a 1 PWM frequency (CPUCLK/2) / (1000000000 / 800).
Essentially we want to find how many PWM ticks it will take to get 800ns

The above equation never seems to go above 50 ticks... 48MHz CLK is about 19 ticks.

As the DMA controller allows a transfer of a byte of memory into a word of peripheral we better fo that as well:)

Now our buffer is about 1.5Kb of RAM instead of 6.5 Kb.

Description

Changed DMA for PWM WS2812 driver from word to word TO byte to word.
Changed PWM buffer to a byte instead of a word as it was mostly empty:)

Types of Changes

  • Core
  • Bugfix
  • New feature
  • Enhancement/optimization
  • Keyboard (addition or update)
  • Keymap/layout/userspace (addition or update)
  • Documentation

Issues Fixed or Closed by This PR

  • STM32F072 runs out of RAM with more than 60ish WS2812 LEDs using the PWM driver.

Checklist

  • My code follows the code style of this project: C, Python
  • I have read the PR Checklist document and have made the appropriate changes.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING document.
  • I have added tests to cover my changes.
  • I have tested the changes and verified that they work and don't break anything (as well as I can manage).

@github-actions github-actions bot added the core label May 9, 2022
@drashna drashna requested a review from a team May 9, 2022 20:15
@sigprof
Copy link
Contributor

sigprof commented May 9, 2022

As the DMA controller allows a transfer of a byte of memory into a word of peripheral we better fo that as well:)

The problem is that not all STM32 chips have DMA controllers that support that kind of operation. Older chips (DMAv1 in ChibiOS) can zero-extend data exactly like you want if you specify different MSIZE and PSIZE values. However, some newer chips like F401/F411 (DMAv2 in ChibiOS) cannot do that — depending on DMA FIFO settings, they either force MSIZE and PSIZE to be the same, or perform multiple transfers (either collecting the larger destination data unit by performing multiple reads from the source, or splitting the larger source data unit into multiple writes to the destination). Therefore blindly changing MSIZE to BYTE would break the driver at least on F4x1 chips (and probably the whole F4xx series, or possibly even more).

So the suggested optimization would need to be conditional — the type of DMA controller implementation in the chip needs to be checked somehow. At least the most resource-restricted chips tend to have DMAv1, so the optimization would be possible for them.

For DMAv2 controllers the only possible optimization would be using 16-bit data if the chosen timer is 16-bit (in this case the DMA controller would perform 16-bit writes to the timer registers, which is not formally correct according to the datasheet, but seems to work in practice). Doing the same thing for a 32-bit timer, however, would break things (apparently a 16-bit write to a 32-bit register results in the 16-bit value getting duplicated to both halves of the 32-bit register).

Co-authored-by: Drashna Jaelre <drashna@live.com>
@yiancar
Copy link
Contributor Author

yiancar commented May 9, 2022

Good shout @sigprof. I can condition this no problem.
How about doing what was done for F1XX SPI like here:
https://github.com/qmk/qmk_firmware/blob/88fe5c16a5cdca5e3cf13ef3cd91f5f1e4898c37/platforms/chibios/chibios_config.h
Define DMAv2 in case of F401/F411. Then if DMAv2 is defined, in the WS2812 driver use 32bit transfers.

Any other suggestions are welcome:)

@yiancar
Copy link
Contributor Author

yiancar commented May 9, 2022

I did a quick skim of the F401 datasheet. As we are using double buffer mode it seems to be OK. Please look at 9.3.10 of https://www.st.com/resource/en/reference_manual/rm0368-stm32f401xbc-and-stm32f401xde-advanced-armbased-32bit-mcus-stmicroelectronics.pdf

@sigprof
Copy link
Contributor

sigprof commented May 10, 2022

I did a quick skim of the F401 datasheet. As we are using double buffer mode it seems to be OK. Please look at 9.3.10 of https://www.st.com/resource/en/reference_manual/rm0368-stm32f401xbc-and-stm32f401xde-advanced-armbased-32bit-mcus-stmicroelectronics.pdf

No, see this table — with memory width 8 and peripheral width 32 the controller will do 4 1-byte memory transfers, assemble a 32-bit word from those 4 bytes and perform a single 32-bit peripheral transfer

_20220510_103937

@sigprof
Copy link
Contributor

sigprof commented May 10, 2022

For comparison, https://www.st.com/resource/en/reference_manual/rm0091-stm32f0x1stm32f0x2stm32f0x8-advanced-armbased-32bit-mcus-stmicroelectronics.pdf has this:

_20220510_104818

Here the DMA controller zero-extends the data in case of width mismatch, which would be the desired behavior for the WS2812 driver. Unfortunately, the F401 DMA controller does not have an equivalent mode.

@yiancar
Copy link
Contributor Author

yiancar commented May 10, 2022

Cheers thanks for the clarification:)
I also remembered that this driver supports all timers.. but not all timers are 32 bit:)
So plan of action

  1. Use Chibios define STM32_TIMx_IS_32BITS to figure out what the peripheral is like. For MCUs with zero-extend do make the necessary transfer (byte to half word/word)
  2. Use the defines STM32F2XX STM32F4XX STM32F7XX to check if the DMA controller cannot do the zero extend and do so in software, either to a half word or a word. In that case use either a half word or a word buffer depending on the timer used above. I realize we cant intercept the DMA transfer and do the conversion. Also F2/F4/F7 come with at least 64Kb of RAM so it should not be an issue. (quick math for a TKL. Around 10Kb when using words, around 5Kb when using halfword and about 2.5Kb when using bytes. Well depending how everything is packed that is:)

How does that sound?

@yiancar
Copy link
Contributor Author

yiancar commented May 10, 2022

@sigprof let me know what you think of it now:)

@drashna drashna requested review from sigprof and a team May 11, 2022 23:19
Copy link
Contributor

@sigprof sigprof left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code with my changes seems to work on STM32F401; tested pin B15 with TIM1 (with 16-bit optimization enabled) and pin A2 with TIM5 (in this case using a 16-bit data buffer fails, 32-bit works as before). Still need to test on some other chips.

platforms/chibios/drivers/ws2812_pwm.c Outdated Show resolved Hide resolved
platforms/chibios/drivers/ws2812_pwm.c Outdated Show resolved Hide resolved
yiancar and others added 2 commits May 15, 2022 20:13
Co-authored-by: Sergey Vlasov <sigprof@gmail.com>
@zvecr zvecr changed the base branch from master to develop May 15, 2022 21:11
@drashna drashna requested a review from a team July 28, 2022 16:15
@yiancar
Copy link
Contributor Author

yiancar commented Aug 5, 2022

Hello!
Any other changes you would like?

@tzarc tzarc requested review from sigprof, drashna and a team August 13, 2022 14:07
@drashna
Copy link
Member

drashna commented Aug 14, 2022

Sorry, has a merge conflict

Copy link
Member

@drashna drashna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can confirm it doesn't break anything

@drashna drashna merged commit dfc92d8 into qmk:develop Aug 14, 2022
nolanseaton pushed a commit to nolanseaton/qmk_firmware that referenced this pull request Jan 23, 2023
Co-authored-by: Drashna Jaelre <drashna@live.com>
Co-authored-by: Sergey Vlasov <sigprof@gmail.com>
Co-authored-by: yiancar <yiancar@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants