pgm_read_byte() does not work well without delay() #3140
Comments
Could you please upload the elf file for the code which demonstrates the crash? That would really help. |
Elf file (1.8.0): http://www21.zippyshare.com/v/Vhm4iyHc/file.html (sorry could not find the option to attach files). |
OK, thanks. The compiler sees that the offset in the array is constant, so it can evaluate everything that happens in the pgm_read_byte macro at compile time. It sees that in this particular case shifting and masking are not needed (because the byte you are loading happens to be 4-byte aligned), so it optimizes away all the code inside pgm_read_byte and replaces it with a single l8ui instruction. That obviously doesn't work, because the cache port supports only 32-bit access. I guess we can hard-code the load/shift/mask part inside an inline assembly block. /cc @Makuna |
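For readers following along, the load/shift/mask sequence mentioned above can be sketched in plain C roughly like this. It is only an illustration: `read_byte_via_word` and `demo_words` are made-up names, not the core's actual macro, and the byte numbering assumes a little-endian layout as on the ESP8266. Because this is ordinary C, the compiler is still free to fold it into a single byte load when the address is a compile-time constant, which is exactly the behaviour described in the comment above.

```c
#include <stdint.h>

/* Hypothetical helper (not the core's real macro): fetch one byte using
 * only an aligned 32-bit load, then shift and mask. */
static inline uint8_t read_byte_via_word(const void *addr)
{
    uintptr_t a = (uintptr_t)addr;
    uintptr_t offset = a & 3u;                                  /* byte position inside the word */
    const uint32_t *word_ptr = (const uint32_t *)(a - offset);  /* aligned word address */
    uint32_t word = *word_ptr;                                  /* the 32-bit load */
    return (uint8_t)((word >> (offset * 8u)) & 0xFFu);          /* little-endian byte numbering */
}

/* Illustrative data standing in for a PROGMEM table. */
static const uint32_t demo_words[2] = { 0x44332211u, 0x88776655u };
```

With this data, offset 1 selects the second-lowest byte of the first word, and offset 4 selects the lowest byte of the second word.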
Ahhh, OK perfect! Could you also explain why the |
Would decorating the code with a #pragma that stops optimizations also work?
|
Ok also tried |
replace pgm_read_byte with...
I will do some testing tomorrow |
@Makuna, I think what you are trying to do is to trick the compiler while speaking C. I think this won't work, for the following reason. What the compiler is doing is fully valid given the architectural model it has. The backend is not aware that there are memory regions from which byte loads are not allowed. The only way to express this (I think) is to write this part in assembly. Every C expression you write that can be evaluated at compile time can be simplified by the compiler to a byte load. |
@igrr Actually it would be a bug if the compiler disregards sizes if defined; the original was written that the size is ambiguous giving the compiler the option to do the wrong thing. I want to try one other option, which is to just insert a cast. These aren't tricks, they are considered hints. Most developers I know would consider reverting to assembler to get the compiler to do the right thing as the last option to try. |
It seems that GCC requires the volatile keyword to force the hints and data size is not enough. There are many references to this when you search online about GCC over optimizing in similar cases across other domains. |
Hi,
I have two questions:
1. Why are these pgm_* implemented as #defines instead of inline functions?
2. Why can't the volatile keyword be used in declaring the intermediate
variables?
Apologies if the questions are noobish, I'm currently on the road, otherwise
I'd take a closer look :p
|
@devyte I don't understand the second question, as the change is to define one as volatile. Could you be more specific? As for the first, I don't remember the reason other than following the Arduino originals as closely as possible (they were also macros). It was almost two years ago when I wrote those ;-) Do note, the inline prefix does not guarantee that the function is inlined; it's just a hint to the compiler that it can be. |
Maybe I'm misunderstanding the intent of what is being discussed. Here's my
understanding of your discussion: you intend to change the macro content to
an assembler block, involving __asm__ __volatile__ or along those lines.
My question is: why can't we declare some intermediate variable using the
C/C++ volatile keyword? Something like:
volatile uint8_t mytemp = blah...;
I'm no expert on compilers, but I understand that this is a standard
keyword that directs the compiler to not assume anything, which translates
to "don't optimize here", and it should be portable across architectures.
|
Using an intermediate volatile variable would indeed work; the performance hit would probably be comparable to the use of an inline assembly block. I don't mind either solution. Regarding "should be portable across architectures"... we are dealing with a very architecture-specific thing (reading from flash as if it were RAM). Even for an xtensa-based ESP32 we do a different thing. So I would place portability as the last item on the list of possible requirements. |
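One possible reading of the volatile idea, as a hedged sketch: qualify the 32-bit word access itself as volatile, so the compiler has to perform the load at the width written instead of narrowing it to a byte load. The names `read_byte_volatile_word` and `sample_table` are hypothetical, and the claim that GCC will not narrow a volatile access is an assumption about compiler behaviour rather than something confirmed in this thread. Little-endian byte numbering is assumed, as on the ESP8266.

```c
#include <stdint.h>

/* Hypothetical variant: the word is read through a volatile-qualified
 * pointer, so the 32-bit access is expected to be kept as written. */
static inline uint8_t read_byte_volatile_word(const void *addr)
{
    uintptr_t a = (uintptr_t)addr;
    uintptr_t offset = a & 3u;                       /* byte position inside the word */
    const volatile uint32_t *word_ptr =
        (const volatile uint32_t *)(a - offset);     /* aligned word address */
    uint32_t word = *word_ptr;                       /* volatile 32-bit access */
    return (uint8_t)(word >> (offset * 8u));         /* cast keeps the low byte */
}

/* Illustrative data standing in for a PROGMEM table. */
static const uint32_t sample_table[1] = { 0xDDCCBBAAu };
```

Note that putting volatile on the result variable instead would only make the store of the result volatile; it would say nothing about how the value is loaded from flash.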
Did either of you note the reference to the pull request? |
I did :) Just for comparison, can you check what code is generated for two cases:
|
I guess using volatile would be best. That tells the compiler what to consider and still leaves all the optimization up to it. Inline assembly would force you to do what the compiler might do better anyway and, as I said, writing good inline assembly is hard. EDIT: I tried the following:
Same behavior as described above. |
Ok
Works as desired! |
Glad to see you confirmed it. I ran it through a series of sketches that test it. I suspect we should leave this open until the pull request gets merged in. |
@Makuna poke regarding #3140 (comment) (i.e. it would be great if you attach the generated assembly code to the PR; so that we can compare with the hand-coded assembly approach). |
So the latest code I posted works but I guess it is not because of the volatile at that place. Since the following does not:
But turning
into
makes it work. Puhh I hope I am not annoying someone with my noobish posts... |
@igrr If you wish to give me instructions on how to get that, I can run it and provide it. I always forget the details and have only used it once almost two years ago. |
@milkpirate You are placing the volatile in the wrong location. The issue is that the compiler is optimizing the statement |
@Makuna yeah I know. It doesn't make sense to me either, but that's how it works...
|
Similar behavior with
|
Hey. Just had a look at the release note of the espressif ESP8266_NONOS_SDK v2.1.0. It states:
Could that relate to the problems above? |
No, the problems above were fully addressed in this issue. They stem from the fact that our previous code for reading PROGMEM would in some cases compile to an 8- or 16-bit load instruction, which caused a bus fault when reading from cache (which only supports 32-bit loads). Note that this is not an alignment issue: the reads were aligned, just the read size was wrong. The issue in the SDK release notes is about a bug in the NMI interrupt vector, which was related to unaligned memory access. |
Will keep this open for reference, until the release. |
It looks like there is a problem with the current GitHub version of the ESP8266 core for Arduino. I have an application based on https://github.com/markruys/arduino-Max72xxPanel, which itself makes use of Adafruit's GFX library (https://github.com/adafruit/Adafruit-GFX-Library). You can see the problem by compiling the Ticker example from the Max72xx library after changing pinCS to 12. The sketch crashes inside the GFX library, where letters from a font are sent to the panel. The font itself is defined as:
static const unsigned char font[] PROGMEM = {
0x00, 0x00, 0x00, 0x00, 0x00,
0x3E, 0x5B, 0x4F, 0x5B, 0x3E,
//...
The crash occurs in Adafruit_GFX::drawChar:
uint8_t line = pgm_read_byte(&font[c * 5 + i]);
// c is unsigned char, i is int8_t
I tried to cast the result of c*5+i to uint16_t or uint32_t, without success. The library worked like a charm with the previous implementation of pgm_read_byte(). |
This is a reduced example which shows the crash:
#include <SPI.h>
#include <Adafruit_GFX.h>
#include <Max72xxPanel.h> // https://github.com/markruys/arduino-Max72xxPanel.git
Max72xxPanel matrix = Max72xxPanel(12, 1, 1);
void setup() {
matrix.setIntensity(7); // Use a value between 0 and 15 for brightness
Serial.begin(115200);
}
void loop() {
Serial.println("Before drawChar CRASH!");
matrix.drawChar(0, 1, 65, HIGH, LOW, 1); // <<< === Crash
Serial.println("After drawChar, Hooray!");
delay(10000); // for test
}
/* Those are the parts of the library, where pgm_read_byte is crashing!
* The old variant without assembly works fine!
void Adafruit_GFX::drawChar(int16_t x, int16_t y, unsigned char c, uint16_t color, uint16_t bg, uint8_t size) {
...
for(int8_t i=0; i<5; i++ ) { // Char bitmap = 5 columns
uint8_t line = pgm_read_byte(&font[c * 5 + i]);
}
...
}
static const unsigned char font[] PROGMEM = {
0x00, 0x00, 0x00, 0x00, 0x00,
0x3E, 0x5B, 0x4F, 0x5B, 0x3E,
0x3E, 0x6B, 0x4F, 0x6B, 0x3E,
...
0x00, 0x00, 0x00, 0x00, 0x00 // #255 NBSP
};
*/ |
Identified the problem: That kills different libraries, which assume that those two items are macros:
// Many (but maybe not all) non-AVR board installs define macros
// for compatibility with existing PROGMEM-reading AVR code.
// Do our own checks and defines here for good measure...
#ifndef pgm_read_byte
#define pgm_read_byte(addr) (*(const unsigned char *)(addr))
#endif
#ifndef pgm_read_word
#define pgm_read_word(addr) (*(const unsigned short *)(addr))
#endif
It's now possible to change those third-party libraries with some |
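To illustrate why such a check bites, here is a small host-side sketch. All names (`demo_read_byte`, `demo_font`, `demo_first_column`) are hypothetical stand-ins, not code from the core or from Adafruit GFX. The preprocessor knows nothing about functions, so an `#ifndef` test on a name that is only an inline function succeeds, and the library's fallback macro silently replaces the safe reader with a plain byte dereference.

```c
#include <stdint.h>

/* Stand-in for a core that provides the reader as an inline function only.
 * It returns a marker value so it is visible which version gets called. */
static inline uint8_t demo_read_byte(const void *addr)
{
    (void)addr;
    return 0xAA;   /* marker: "the core's function was used" */
}

/* Stand-in for a third-party library's compatibility check. No macro of
 * this name exists, so the fallback below is defined. */
#ifndef demo_read_byte
#define demo_read_byte(addr) (*(const unsigned char *)(addr))
#endif

static const unsigned char demo_font[] = { 0x3E, 0x5B, 0x4F, 0x5B, 0x3E };

static inline uint8_t demo_first_column(void)
{
    /* Expands to the raw dereference, not to the function above. */
    return demo_read_byte(&demo_font[0]);
}
```

On a host this just returns the table byte instead of the marker; on the ESP8266 the raw dereference of flash is the kind of access that the earlier comments describe as crashing.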
Created a pull request, fixing this issue |
Inline functions are better style than #define macros for several reasons.
There has to be a better way to maintain compatibility than to convert back
to macros.
|
what about a
with
That way the define would exist and would end up in an inlined function (if the inline function is considered better). |
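A rough sketch of what such a combination could look like, under stated assumptions: the names `sketch_read_byte` and `sketch_read_byte_inlined` are invented for illustration, and the function body is a placeholder rather than the real flash access. The macro exists, so `#ifndef` checks in third-party code see it, while the actual work happens in an inline function with a fixed return type.

```c
#include <stdint.h>

/* Hypothetical inline reader. A real version would perform the aligned
 * 32-bit load from addr; a placeholder word is used here instead. */
static inline uint8_t sketch_read_byte_inlined(const void *addr)
{
    (void)addr;                    /* placeholder: addr is not dereferenced */
    uint32_t word = 0x123456AAu;   /* stands in for the loaded, shifted word */
    return (uint8_t)word;          /* return type truncates to one byte */
}

/* The macro only forwards, but its existence satisfies #ifndef checks. */
#define sketch_read_byte(addr) sketch_read_byte_inlined(addr)

/* Stand-in for a third-party compatibility check: the fallback is skipped. */
#ifndef sketch_read_byte
#define sketch_read_byte(addr) (*(const unsigned char *)(addr))
#define SKETCH_FALLBACK_USED 1
#else
#define SKETCH_FALLBACK_USED 0
#endif

static const unsigned char sketch_value = 0x3E;
```

Here a call such as `sketch_read_byte(&sketch_value)` goes through the inline function, and the `uint8_t` return type takes care of truncating the result to a byte.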
Agree with @d-a-v suggestion except the 'always inline' part. That's a considerable chunk of code, I think we should let the compiler decide whether to inline it or not in each particular case. The other thing is that the variant proposed in the PR would fail to do truncation to target type if one writes |
I agree with @igrr. |
agree with those arguments! One more question to raise:
and for
I don't know, if that is important. |
Since the immediate is equal to zero, l32i.n can be used. It could have been written as |
Thanks, good to know. That was just one idea, when I searched for the reasons, why my application is now suddenly crashing. |
pgm_read_byte() and pgm_read_word() need to be macros, as some third party libraries and sketches testing macro existence.
I have the following code:
which results in:
But swapping Serial.println(data, HEX); and delay(250); then works. Why?
EDIT:
Shorter delays make it work as well, even a "not existing" one: