STM targets are extremely inefficient at toggling GPIOs between input and output #3299
@Sissors: Thanks for the detailed description. We will have a look at it and see whether the current implementation can be enhanced. Please also feel free to propose enhancements; we would review them and organize the non-regression tests for the complete list of supported boards when needed.
As far as I know, you don't have to toggle between input and output in this case. Just configure the pin as an open-drain output: you can read the input data register, and it will reflect the state of the pin.
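A minimal sketch of that suggestion, assuming a hypothetical `MockGpio` struct whose plain integers stand in for the memory-mapped STM32 registers `MODER`, `OTYPER`, `IDR` and `ODR` (on real hardware they are volatile, and `IDR` is driven by the pin itself). The pin is set up once as an open-drain output; afterwards the bus is driven low by clearing the `ODR` bit, released by setting it, and sampled through `IDR`, so the direction never changes.

```cpp
#include <cassert>
#include <stdint.h>

// Hypothetical mock of the STM32 GPIO registers used here; the real
// GPIO_TypeDef has more fields and its members are volatile.
struct MockGpio {
    uint32_t MODER;   // 2 bits per pin: 00 input, 01 output
    uint32_t OTYPER;  // 1 bit per pin: 0 push-pull, 1 open-drain
    uint32_t IDR;     // input data (set by the hardware on a real MCU)
    uint32_t ODR;     // output data
};

// One-time setup: general-purpose output, open-drain, line released.
inline void od_init(MockGpio *gpio, unsigned pin) {
    assert(pin < 16);
    gpio->MODER = (gpio->MODER & ~(3u << (2 * pin))) | (1u << (2 * pin));
    gpio->OTYPER |= (1u << pin);
    gpio->ODR |= (1u << pin);  // released: an external pull-up sets the level
}

// Drive the line low: the open-drain transistor turns on.
inline void od_drive_low(MockGpio *gpio, unsigned pin) { gpio->ODR &= ~(1u << pin); }

// Release the line: the transistor turns off; pull-up or slave decides the level.
inline void od_release(MockGpio *gpio, unsigned pin) { gpio->ODR |= (1u << pin); }

// Sample the actual line level while the pin stays in output mode.
inline bool od_read(const MockGpio *gpio, unsigned pin) { return ((gpio->IDR >> pin) & 1u) != 0; }
```

With this setup a OneWire read slot is just `od_drive_low`, a short delay, `od_release`, then `od_read`; no `input()`/`output()` call is involved.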
Those are very good points; however, I do have some counterpoints:
Shouldn't mbed provide a "mux" function that takes care of turning on the power/clock, the multiplexing and whatnot? Also, why is a function that does something as trivial as changing one bit (at least it should be that trivial) in its own translation unit? That stops the compiler from inlining it (at least without LTO). And why assert on input parameters that are arguably always known at compile time? Static checking would be much safer and more efficient. This kind of thing really should use a split-header strategy rather than a split backend. Plus, isn't there a race condition here: what if two threads configure two different pins and one interrupts the other? Oh, and here too. Anyway, I put most of the blame on ARM for trying to oversimplify things and cutting corners in API design. See my full rant here: http://odinthenerd.blogspot.de/2016/12/rant-on-mbed-design-decisions.html
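The race condition raised here is the classic lost update on a read-modify-write of a shared register: `MODER` holds the mode of all 16 pins of a port, so two contexts changing two *different* pins still write the same word. The sketch below replays one bad interleaving step by step on a plain integer standing in for `MODER` (no real threads or hardware; the function names are hypothetical). Thread A's stale write-back silently undoes thread B's change.

```cpp
#include <cassert>
#include <stdint.h>

// Mask and "output" value (01) of one pin's 2-bit MODER field.
inline uint32_t moder_mask(unsigned pin) { return 3u << (2 * pin); }
inline uint32_t moder_output(unsigned pin) { return 1u << (2 * pin); }

// Replays a fixed interleaving of two read-modify-write sequences on a
// shared register value and returns what ends up in the register.
inline uint32_t replay_lost_update(uint32_t moder, unsigned pin_a, unsigned pin_b) {
    assert(pin_a < 16 && pin_b < 16 && pin_a != pin_b);
    uint32_t a = moder;                                  // A: read
    a = (a & ~moder_mask(pin_a)) | moder_output(pin_a);  // A: modify
    // ... A is preempted here (interrupt or thread switch) ...
    uint32_t b = moder;                                  // B: read (same old value)
    b = (b & ~moder_mask(pin_b)) | moder_output(pin_b);  // B: modify
    moder = b;                                           // B: write
    moder = a;                                           // A resumes: writes its stale copy
    return moder;
}
```

Starting from `0`, `replay_lost_update(0, 1, 2)` leaves pin 1 as an output and pin 2 still an input. Wrapping each sequence in a critical section, or using a hardware write that only touches the target bit, closes that window.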
There is protection at higher levels: https://github.com/ARMmbed/mbed-os/blob/master/drivers/DigitalInOut.h#L80 (I do wonder how much overhead that adds). While I can see your point, and agree with at least some of it, I seriously wonder whether you can keep the API accessible if you make everything 100% safe and efficient for every possible MCU architecture. If you are building an automotive-grade embedded system, blindly using the mbed API is a bad idea. But not every system is that critical. I am all for improvements, and sometimes the API does need an overhaul, even if it breaks older code. But its primary goal should be to stay accessible and platform independent; otherwise there is no reason to use the API at all.
OK, I stand corrected: it looks like that is safe as long as you always use the critical section. Is there documentation keeping the user from calling the wrong function? And oh my god, the overhead. This means that although 95% of all cores can change direction atomically, the vendors have no way of actually taking advantage of that, since the OS will lock anyway? A direction change should really be a bit-banded or dedicated register write, three or four instructions. This is about two orders of magnitude less efficient, which was your initial point, so I guess we agree on that ;) On my point about APIs: you have all the CMSIS SVD data, so you know which bit fields exist and which operations on them are atomic. Why not create a TMP (template metaprogramming) DSL to express register interactions, let the vendor translate the desired IO action into concrete SFR actions expressed in that DSL, and then use TMP to decide at compile time whether what the vendor did is atomic, locking only if needed? You could also move almost all of your asserts to compile time and have guaranteed efficiency and safety.
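On Cortex-M cores that have it (M3/M4; the Cortex-M0 in the F0 series does not), bit-banding is one way to get a single-store, inherently atomic write to one register bit. For the peripheral region `0x40000000`-`0x400FFFFF`, the ARMv7-M alias address is `0x42000000 + (reg_addr - 0x40000000) * 32 + bit * 4`. A small compile-time sketch; the constant `kGpioaModer = 0x40020000` is assumed to be the STM32F4 address of `GPIOA->MODER` and should be checked against the reference manual, and all function names are hypothetical.

```cpp
#include <cassert>
#include <stdint.h>

// ARMv7-M peripheral bit-band region and its alias region.
constexpr uint32_t kPeriphBase      = 0x40000000u;
constexpr uint32_t kPeriphEnd       = 0x400FFFFFu;
constexpr uint32_t kPeriphAliasBase = 0x42000000u;

// Alias word address for bit `bit` of the peripheral register at `reg_addr`.
// Writing 0 or 1 to that word clears or sets exactly that bit.
constexpr uint32_t bitband_alias(uint32_t reg_addr, unsigned bit) {
    return kPeriphAliasBase + (reg_addr - kPeriphBase) * 32u + bit * 4u;
}

// Assumed STM32F4 address of GPIOA->MODER (GPIOA base, offset 0x00).
constexpr uint32_t kGpioaModer = 0x40020000u;

// Input (00) and output (01) differ only in bit 2*pin of MODER, so, assuming
// the upper bit of the field is already 0, a direction switch is one aliased store.
constexpr uint32_t moder_dir_alias(uint32_t moder_addr, unsigned pin) {
    return bitband_alias(moder_addr, 2u * pin);
}

static_assert(kGpioaModer >= kPeriphBase && kGpioaModer <= kPeriphEnd,
              "register must lie inside the bit-band region");

// Runtime variant with range checks for addresses not known at compile time.
inline uint32_t checked_alias(uint32_t reg_addr, unsigned bit) {
    assert(reg_addr >= kPeriphBase && reg_addr <= kPeriphEnd && bit < 32);
    return bitband_alias(reg_addr, bit);
}
```

On hardware, the store itself would be `*reinterpret_cast<volatile uint32_t *>(moder_dir_alias(kGpioaModer, 5)) = 1;` for "pin 5 to output", which needs no lock.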
Picking up the conversation about API improvements: what do you think about allowing vendor code to add an extra parameter to their functions to declare them atomic, and then using tag dispatch to either lock or not, depending on what the vendor code specifies? Something like this could be done without SFINAE or templates:
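A minimal sketch of one way to read that idea, assuming each vendor port exports a tag type (`dir_kind`) next to its direction function, and the generic driver picks the locking strategy through ordinary overloading on that tag (plain C++98, no templates or SFINAE). Every name here is hypothetical, and the "critical section" is only a counter standing in for interrupt masking.

```cpp
#include <cassert>
#include <stdint.h>

// Hypothetical tags a vendor port could use to describe its direction write.
struct atomic_tag {};      // a single store, safe without locking
struct non_atomic_tag {};  // a read-modify-write, needs a critical section

// Stand-in for the critical-section API: counts how often a lock is taken.
static int g_lock_count = 0;
inline void enter_critical() { ++g_lock_count; }
inline void exit_critical() {}

typedef void (*dir_fn)(uint32_t *reg, unsigned pin, bool out);

// Generic driver side: overload resolution on the tag picks the strategy at compile time.
inline void guarded_dir(dir_fn fn, uint32_t *reg, unsigned pin, bool out, atomic_tag) {
    fn(reg, pin, out);  // vendor declared the operation atomic: no lock
}
inline void guarded_dir(dir_fn fn, uint32_t *reg, unsigned pin, bool out, non_atomic_tag) {
    enter_critical();
    fn(reg, pin, out);
    exit_critical();
}

namespace vendor_rmw {  // e.g. a port that must read-modify-write a MODER-like register
typedef non_atomic_tag dir_kind;
inline void dir(uint32_t *moder, unsigned pin, bool out) {
    assert(pin < 16);
    *moder = (*moder & ~(3u << (2 * pin))) | ((out ? 1u : 0u) << (2 * pin));
}
}

namespace vendor_atomic {  // e.g. a port with separate direction set/clear registers
typedef atomic_tag dir_kind;
inline void dir(uint32_t *set_clr, unsigned pin, bool out) {
    assert(pin < 32);
    set_clr[out ? 0 : 1] = 1u << pin;  // one store to either the "set" or the "clear" register
}
}
```

A call looks like `guarded_dir(&vendor_rmw::dir, &moder, 3, true, vendor_rmw::dir_kind());`. Because the tag is a type, the unneeded branch is never compiled into the call site, so an atomic port pays nothing for the lock.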
On compilers: are you moving towards requiring support for at least C++98 templates? I think a lazily evaluated, compile-time-optimizing IO library could be written with C++98 template metaprogramming. If you were to require C++11, we could just use kvasir.io as the foundation, but as I understand it, that is not coming for a while.
@odinthenerd, you mean like the FastIO libs? They are indeed more efficient, although some cases (like toggling between input and output here) can be a lot better even in the current setup. Personally, I dislike the requirement to use the manufacturers' driver libs. If it works, fine, but don't use workarounds just to be able to use those driver libs.

Now back to my 'problem'. Since yet again someone had issues with it, I decided to add OpenDrain mode specifically for STM devices (why not make it the default? Because push-pull outputs allow larger distances). So I coded a bit and tried it out: at first it didn't work; on the K22F it did work, and eventually it also worked on the STM. This is mainly because the DS1820 apparently has a higher drive strength than my F030 (at least when it pulls the line down while the F030 is pulling it up).

There is one small problem with using OpenDrain mode on STM devices: it isn't implemented... Here is the relevant code: https://github.com/ARMmbed/mbed-os/blob/master/targets/TARGET_STM/TARGET_STM32F4/pinmap.c#L164. It only sets the pull resistors; it never enables OpenDrain mode. So in a test program where an open-drain output is set to '1', pulling the pin low externally through my multimeter gives a current of 26 mA. If I set the registers correctly by hand, it is 0 mA. By the way, gpio->OTYPER |= (uint32_t)(1 << pin_index); is the statement required to enable OpenDrain mode.
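A minimal sketch of where that one statement could go, assuming a hypothetical `sketch_pin_mode` function and a `MockGpio` struct with plain integers standing in for the STM32 `OTYPER` and `PUPDR` registers. The mode enum only mirrors the idea of mbed's `PinMode`; its names are made up. Pull modes keep touching only `PUPDR`, as the current code does, while the open-drain and push-pull cases set or clear the pin's `OTYPER` bit.

```cpp
#include <cassert>
#include <stdint.h>

// Hypothetical mock of the two STM32 GPIO registers this fix touches.
struct MockGpio {
    uint32_t OTYPER;  // 1 bit per pin: 0 = push-pull, 1 = open-drain
    uint32_t PUPDR;   // 2 bits per pin: 00 none, 01 pull-up, 10 pull-down
};

// Hypothetical mode values for this sketch only.
enum SketchPinMode { SketchPullNone, SketchPullUp, SketchPullDown, SketchPushPull, SketchOpenDrain };

inline void sketch_pin_mode(MockGpio *gpio, unsigned pin_index, SketchPinMode mode) {
    assert(pin_index < 16);
    const uint32_t bit = (uint32_t)(1u << pin_index);
    const unsigned shift = 2u * pin_index;
    switch (mode) {
    case SketchOpenDrain:
        gpio->OTYPER |= bit;   // the statement from the report: select open-drain
        break;
    case SketchPushPull:
        gpio->OTYPER &= ~bit;  // back to push-pull
        break;
    default: {
        // Pull modes only change the PUPDR field, leaving the output type alone.
        const uint32_t pull = (mode == SketchPullUp) ? 1u : (mode == SketchPullDown) ? 2u : 0u;
        gpio->PUPDR = (gpio->PUPDR & ~(3u << shift)) | (pull << shift);
        break;
    }
    }
}
```

Using `1u` instead of `1` in the shift keeps the arithmetic unsigned; for pin indices below 16 both produce the same mask.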
Looks promising.
C++98 is currently the default.
Here is the relevant code: https://github.com/ARMmbed/mbed-os/blob/master/targets/TARGET_STM/TARGET_STM32F4/pinmap.c#L164. It only sets the pull resistors; it never enables OpenDrain mode. So in a test program where an open-drain output is set to '1', pulling the pin low externally through my multimeter gives a current of 26 mA. If I set the registers correctly by hand, it is 0 mA.
@0xc0170 I know, this has been on my to-do list for a while. It should be one of my next tasks.
Hello @Sissors @odinthenerd,
The FastIO bench results with this branch:

F410RB: Starting test bench
- Measuring fixed write pattern speed
- Measuring variable write pattern speed
- Measuring read speed
- Measuring toggling using operators speed
- Measuring switching between input and output

L030R8: Starting test bench
- Measuring fixed write pattern speed
- Measuring variable write pattern speed
- Measuring read speed
- Measuring toggling using operators speed
- Measuring switching between input and output
At least at first glance, it looks good to me. Any idea why switching between input and output, while showing a very nice speed improvement, isn't faster yet? Looking at the code, I don't see anything weird. Although of course FastIO simply has the advantage of being a template class, which you can never get from the (current) mbed API. Generally I only update the table when I get a new device, but I'll make an exception for you: I'll update it once this is merged into the main repository and remove the footnote under the table. Edit: By the way, STM's continuous support of their code base is really appreciated, at least by me.
From what I quickly tested, there are two major root causes for this: the mbed API not being a template class, indeed, and the critical-section protection in the input()/output() methods of the class.
Sure, I understand. And do you retest all devices when you add a new one?
Thanks!
I have not had time to make a detailed proposal for eliminating the need for the critical section, and looking at my schedule, I will not have time in the near future either, but if someone else wants to take up the cause, here is what needs to be done:
This is all a theoretical implementation from my head, so expect typos ;) If anything is unclear, feel free to contact me (holmes.odin@gmail.com or @odinthenerd on here or Twitter).
Yes, it seems to be much improved now :)
Thank you for your answer, @Sissors!
I did promise to do that :). It is updated now. I verified that it indeed now matches what @LMESTM posted: #3665 (comment), so the table now reflects this too. A 4-5x increase in input/output switching speed is nothing to sneeze at :). Still, there is plenty of reason to use FastIO if more is required ;). On a completely unrelated note: who broke the automatic clock setup code? My F401 board does not get a clock from the ST-Link module. No idea why; I probably broke something. But with old mbed it immediately starts up on its internal oscillator. (I don't know if you are still the only ones who implement automatic clock selection at runtime, but you were certainly the first, and it is really nice.) With new mbed it takes roughly 10 seconds before it finally decides to use its internal oscillator. So it still works, but someone set the timeout value higher again :).
Description
This is related to, for example, https://developer.mbed.org/questions/69344/Detection-problem-of-a-DS1820-on-a-NUCLE/: especially the slower STM devices can't handle a OneWire sensor using the mbed code. I just spent some time modifying the mbed code, after which it didn't work; I spent a lot of time walking through everything, and in the end ended up with the same code, which now did work (so I screwed something up in the first try). Conclusion: once toggling between input and output runs at a normal speed, they work properly.
In the past I already noticed that the mbed STM lib is simply horrible at switching between input and output: my FastIO lib was 100 times faster at this than the mbed lib. So what is the situation? Like pretty much any other MCU, STM targets have a few registers to set the direction of a pin. Switching the direction is either clearing or setting a single bit in a register (on STM32 the MODER register has two bits per pin, but input (00) and general-purpose output (01) differ only in the lower one), so: read the register, modify the bit, write the register back. That takes a few clock cycles; let's say 10 clock cycles with mbed overhead, which is fine.
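That read-modify-write can be sketched as a pure function on the register value; `moder_with_dir` is a hypothetical name, and the STM32 MODER encoding (2 bits per pin, 00 = input, 01 = output) is the only hardware assumption. On a real target the call would be `GPIOx->MODER = moder_with_dir(GPIOx->MODER, pin, out);`: one load, a mask-and-or, and one store.

```cpp
#include <cassert>
#include <stdint.h>

// STM32 MODER field values (2 bits per pin).
const uint32_t kModeInput  = 0u;  // 00
const uint32_t kModeOutput = 1u;  // 01

// Returns `moder` with pin `pin` switched to input or general-purpose output.
inline uint32_t moder_with_dir(uint32_t moder, unsigned pin, bool output) {
    assert(pin < 16);
    const unsigned shift = 2u * pin;
    moder &= ~(3u << shift);                                // clear both mode bits
    moder |= (output ? kModeOutput : kModeInput) << shift;  // write 00 or 01
    return moder;
}
```

Clearing both bits first also makes the function correct when the pin starts in alternate-function (10) or analog (11) mode; when it only ever toggles between input and output, just bit `2 * pin` actually changes.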
Now look at the current code: https://github.com/ARMmbed/mbed-os/blob/master/targets/TARGET_STM/TARGET_STM32F0/gpio_api.c#L71, where pin_function is called. (NOTE: when this is optimized, do realise that the gpio_init function does not actually initialize the pin; this is currently done by this gpio_dir function, so the pin_function call should be moved to the gpio_init code!)
pin_function (https://github.com/ARMmbed/mbed-os/blob/master/targets/TARGET_STM/TARGET_STM32F0/pinmap.c#L92) does a bunch of calculations regarding pin settings we don't care about, since we just want to switch between input and output.
Next, it enables the GPIO clock, every single time you switch the pin direction. Finally, it calls HAL_GPIO_Init.
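A minimal sketch of the split suggested in the note on gpio_init, assuming hypothetical stand-ins (`FakePort`, a fake RCC enable word, and a `SketchGpio` object loosely modeled on the idea of mbed's `gpio_t`): everything expensive, like enabling the port clock and computing masks, happens once in init, and the direction call is reduced to one read-modify-write. A counter shows the clock is enabled once, not on every switch.

```cpp
#include <cassert>
#include <stdint.h>

// Hypothetical stand-ins for a GPIO port and the RCC clock-enable register.
struct FakePort { uint32_t MODER; };
static uint32_t g_rcc_enr = 0;         // one clock-enable bit per port
static int g_clock_enable_calls = 0;   // how often the clock was enabled

// Hypothetical per-pin object: everything the direction call needs is precomputed.
struct SketchGpio {
    FakePort *port;
    uint32_t moder_mask;    // both MODER bits of this pin
    uint32_t moder_output;  // value 01 at this pin's position
};

// One-time work: enable the port clock and precompute the masks.
inline void sketch_gpio_init(SketchGpio *obj, FakePort *port, unsigned port_index, unsigned pin) {
    assert(pin < 16 && port_index < 32);
    g_rcc_enr |= 1u << port_index;
    ++g_clock_enable_calls;
    obj->port = port;
    obj->moder_mask = 3u << (2 * pin);
    obj->moder_output = 1u << (2 * pin);
}

// Hot path: a single read-modify-write, with no clock, pin-map or HAL work.
inline void sketch_gpio_dir(SketchGpio *obj, bool output) {
    const uint32_t cleared = obj->port->MODER & ~obj->moder_mask;
    obj->port->MODER = output ? (cleared | obj->moder_output) : cleared;
}
```

Toggling the direction a thousand times after one `sketch_gpio_init` still leaves `g_clock_enable_calls == 1`; the real gpio_dir would additionally need whatever locking the caller does not already provide.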
HAL_GPIO_Init (https://github.com/ARMmbed/mbed-os/blob/master/targets/TARGET_STM/TARGET_STM32F0/device/stm32f0xx_hal_gpio.c#L188) starts with some asserts, after which it has a while loop. I have no clue what exactly is being looped over, but it seems a bit excessive for toggling a pin direction. In this loop, it first sets the alternate function of the pin. Next, on lines 224-227, the actual direction of the pin is set. That would make us happy, but it isn't the end. In the next step, the output speed and similar settings are configured, followed by the interrupt mode (of course no interrupt is active, so that section is skipped, but it is yet another check).
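Assuming the loop has the shape used in STM32Cube HAL releases, `while ((GPIO_Init->Pin >> position) != 0)` with `position` incremented each pass, it walks over the `Pin` bit mask (which may hold several `GPIO_PIN_x` values OR-ed together) one bit position at a time and only configures positions whose bit is set. The simplified model below reproduces just that skeleton, under a hypothetical name, to count its cost for a single pin; it is not the HAL source.

```cpp
#include <cassert>
#include <stdint.h>

// Result of replaying the loop skeleton.
struct LoopCost {
    unsigned iterations;       // how many times the loop body ran
    unsigned pins_configured;  // positions whose bit was set in the mask
};

// Simplified model of the HAL_GPIO_Init loop: walk the pin mask one bit
// position at a time until no higher bits remain.
inline LoopCost hal_init_loop_cost(uint32_t pin_mask) {
    assert(pin_mask <= 0xFFFFu);  // 16 pins per port
    LoopCost cost = {0, 0};
    uint32_t position = 0;
    while ((pin_mask >> position) != 0u) {
        ++cost.iterations;
        if (pin_mask & (1u << position)) {
            ++cost.pins_configured;  // AF, mode, speed/type, pull and EXTI work happen here
        }
        ++position;
    }
    return cost;
}
```

For `GPIO_PIN_15` alone (`1u << 15`), the model runs 16 iterations to configure one pin, on top of the per-pin register work, which is a lot for what is a single-bit change.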
In the end, this is all a bit excessive when the only requirement is toggling a single bit; see for example https://developer.mbed.org/users/Sissors/code/FastIO/file/45b32f07e790/Devices/FastIO_NUCLEO_F030.h (scroll down).
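The template approach mentioned in the thread can be sketched for STM32 MODER as follows. `FastDirSketch` is a hypothetical name, not the FastIO API, and a plain `uint32_t` passed by reference stands in for `GPIOx->MODER`. Because the pin is a template parameter, the masks are compile-time constants, an out-of-range pin fails to compile instead of asserting at runtime, and every call can be inlined to a load, an AND/OR, and a store.

```cpp
#include <cassert>
#include <stdint.h>

// Hypothetical FastIO-style helper with the pin fixed at compile time (C++11).
template <unsigned Pin>
struct FastDirSketch {
    static_assert(Pin < 16, "STM32 GPIO ports have 16 pins");
    static const uint32_t kMask = 3u << (2 * Pin);    // both MODER bits of the pin
    static const uint32_t kOutput = 1u << (2 * Pin);  // 01 = general-purpose output

    static void output(uint32_t &moder) { moder = (moder & ~kMask) | kOutput; }
    static void input(uint32_t &moder) { moder &= ~kMask; }
    static bool is_output(uint32_t moder) { return (moder & kMask) == kOutput; }
};
```

Usage: `FastDirSketch<5>::output(moder);` and `FastDirSketch<5>::input(moder);`. `FastDirSketch<16>` is rejected by the `static_assert`. This version still does a read-modify-write, so it needs the same protection against concurrent access as any other shared-register update.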
I would change it myself, but since STM seems to maintain their mbed libs fairly well (kudos for that), and since I only have a few devices myself, it seems more like something for them to do :). Unless you want to send me every STM board you have ;).