Fix random crashing of ClientContext::write(Stream) and write_P(PGM_P buf, size_t size) (#2504) #4530
… buf, size_t size) (esp8266#2504)
Looks good !
I understand the explanation and the code changes, but I haven't looked at the calling code, so I'm missing a bit of context. In any case, this looks sane to me. My one question: when they get out of sync, and before get_buffer() is called and brings them back in sync, what happens with available()? Is the current code for that correct?
You mean DataSource::available()? Its behaviour didn't change. get_buffer does not alter the available value (which is _size - _pos), because _pos is only incremented in DataSource::release_buffer(). release_buffer basically puts them in sync again. Let's assume get_buffer is called multiple times without calling release_buffer (this can happen in ClientContext::_write_some if tcp_write returns an error): Note: chunk_size is min(max chunk size, free tcp buffer, data_source available). data_source available does not change if release_buffer is not called. max chunk size is fixed. Only free tcp buffer changes.
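The bookkeeping described in this comment can be sketched roughly as follows. This is an illustration only, not the code from DataSource.h: `SourceState` and `next_chunk_size` are hypothetical names, and the sizes used with them are made up.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the counters discussed above (not the real DataSource).
struct SourceState {
    std::size_t size;  // total number of bytes to send (_size)
    std::size_t pos;   // number of bytes already released (_pos)

    // available() depends only on size and pos, so get_buffer() cannot change it.
    std::size_t available() const { return size - pos; }

    // Only a release advances pos.
    void release(std::size_t n) {
        assert(n <= available());
        pos += n;
    }
};

// chunk_size = min(max chunk size, free tcp buffer, data_source available)
std::size_t next_chunk_size(std::size_t max_chunk, std::size_t tcp_free,
                            const SourceState& src) {
    return std::min({max_chunk, tcp_free, src.available()});
}
```

With these definitions, repeated calls to `next_chunk_size` without a `release` can only differ because the free TCP buffer value differs, which matches the note above.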
However, I just realized, that the assert(_pos == _streamPos) in release_buffer is not optimal. |
…ix/clientcontext_write
…zmaki/Arduino into bugfix/clientcontext_write
I added code to allow partial release of the buffer. Although this is not used right now, it makes the code more future-proof IMHO. This can be useful if a buffer of a certain size X is requested but, for some reason, fewer bytes are used (e.g. by tcp_write). Then release_buffer can be called with less than X. Unused data already read from the stream is saved and returned by the next get_buffer call.
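A minimal sketch of the partial-release idea, under stated assumptions: `FakeStream` and `PartialReleaseSource` are hypothetical names invented for this example, the stream is a plain in-memory string, and the `_size`/`_pos` accounting of the real class is left out. It is not the code from the PR.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <utility>
#include <vector>

// Hypothetical forward-only stream standing in for Arduino's Stream.
class FakeStream {
public:
    explicit FakeStream(std::string data) : data_(std::move(data)) {}

    // Copies up to n bytes into dst and consumes them; there is no rewind.
    std::size_t readBytes(char* dst, std::size_t n) {
        n = std::min(n, data_.size() - offset_);
        std::memcpy(dst, data_.data() + offset_, n);
        offset_ += n;
        return n;
    }

private:
    std::string data_;
    std::size_t offset_ = 0;
};

// Sketch of a buffered source that tolerates partial release.
class PartialReleaseSource {
public:
    explicit PartialReleaseSource(FakeStream& stream) : stream_(stream) {}

    // Returns a pointer to `size` bytes. Bytes read earlier but not yet
    // released sit at the front of buffer_ and are reused, so only the
    // missing tail is pulled from the stream.
    const char* get_buffer(std::size_t size) {
        const std::size_t buffered = buffer_.size();
        if (size > buffered) {
            buffer_.resize(size);
            const std::size_t cb =
                stream_.readBytes(buffer_.data() + buffered, size - buffered);
            buffer_.resize(buffered + cb);  // keep only what was actually read
        }
        return buffer_.data();
    }

    // Releases the first `size` bytes; the rest stays for the next get_buffer().
    void release_buffer(std::size_t size) {
        assert(size <= buffer_.size());
        buffer_.erase(buffer_.begin(),
                      buffer_.begin() + static_cast<std::ptrdiff_t>(size));
    }

private:
    FakeStream& stream_;
    std::vector<char> buffer_;
};
```

For example, requesting 6 bytes and releasing only 2 leaves 4 bytes buffered; the next request for 6 bytes reuses those 4 and reads just 2 more from the stream.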
I dug further into the issue of tcp_write(...) returning ERR_MEM. It seems that I'm getting this issue if I'm running out of heap memory. |
@mongozmaki I once again reviewed your PR and it seems fine to me.
Which bug is it: the same one that made you write this PR, or another one?
I'll try to explain it a bit better. This PR is a fix to the problem if Now I come to the issue why I found this bug in the first place (which might or might not be another bug): However, the question still remains, why This happened especially when compiling with the "lwip2-Higher Bandwidth" option. I cannot reproduce this other issue reliably for now. All I can guess is that tcp_write returns ERR_MEM more often if free heap is low (which is amplified by the "lwip2-Higher Bandwidth" option because of larger buffers in memory). I will investigate further and hopefully reproduce this second issue in sample code. The debug message to check out is the line
In the end, this second issue could also just be an out-of-memory problem.
I've been peripherally following this and it looks like a good fix, thanks @mongozmaki . If you're getting ERR_MEM it could be that, even though you have enough total free space in the heap the UMM allocator can't find a large enough contiguous block to satisfy the request. Fragmentation, which can hurt on a system with no MMU and a very small shared heap used by everything. So if ERR_MEM is returned only when TCP is doing a malloc()/realloc()/calloc() that returns NULL, I'm not sure it's a bug so much as a "feature." The upper wrapper and app layers would need to handle it appropriately. |
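As an illustration of what handling this in an upper layer could look like, here is a minimal sketch that treats ERR_MEM as "try again later" and everything else as fatal. The names `kErrOk`, `kErrMem`, `SendResult`, `classify_write` and `write_with_retry` are hypothetical; the two constants only mirror lwIP's ERR_OK (0) and ERR_MEM (-1) so that the block compiles without lwIP headers. It is not how ClientContext handles the error.

```cpp
#include <functional>

// Stand-ins for lwIP's err_t codes so the sketch compiles on its own.
enum : int { kErrOk = 0, kErrMem = -1 };

enum class SendResult { Sent, RetryLater, Failed };

// Maps a tcp_write-style return code onto what the caller should do next.
SendResult classify_write(int err) {
    if (err == kErrOk) return SendResult::Sent;
    if (err == kErrMem) return SendResult::RetryLater;  // transient: come back later
    return SendResult::Failed;                          // anything else is fatal
}

// Calls try_write up to max_attempts times, invoking wait() between
// attempts that reported a transient error.
SendResult write_with_retry(const std::function<int()>& try_write,
                            int max_attempts,
                            const std::function<void()>& wait) {
    SendResult result = SendResult::RetryLater;
    for (int attempt = 0; attempt < max_attempts; ++attempt) {
        result = classify_write(try_write());
        if (result != SendResult::RetryLater) break;
        wait();
    }
    return result;
}
```

On a real device, `wait` would presumably yield to the network stack so that pending data can be acknowledged before the next attempt; that part is an assumption and is not shown here.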
Ok great! |
The explanation for lwIP's ERR_MEM: link. As the name ERR_MEM is misleading (it is totally unrelated to low heap), maybe you can add the explanation to your PR for further reference? ERR_MEM is to lwIP what errno==EAGAIN in O_NONBLOCK mode is to libc. Whatever error lwIP's tcp_write() is returning (fatal or ERR_MEM/EAGAIN),
@mongozmaki |
…zmaki/Arduino into bugfix/clientcontext_write
…ix/clientcontext_write
@d-a-v |
@d-a-v I created test code for the PR bugfix. It provokes ERR_MEM errors and causes the ESP to crash or omit some data. It basically streams a large HTML file which, after loading, checks itself for completeness. Start the code and open the ESP IP address in a browser. After some loading (512kB) it should display For better debugging of tcp_write errors, you can add a debug message in ClientContext::_write_some if an error occurs:
After applying my PR, the HTML page should always report complete. The ESP might crash eventually because it is running out of memory. |
@mongozmaki Sorry for the misunderstanding. I did not mean that you should add more comments to your PR, which already looks fine. About OOM, the message you get Give us some more time to check again, as it is an update to an important part of the WiFi layer.
Went through the code and it looks good. Only potential issue might be checking for new[] error when making a larger buffer (since no exceptions on ESP8266), but honestly having it crash sooner, here, may be easier to debug than passing up a NULL and waiting for the main app to use it w/o checking.
… buf, size_t size) (esp8266#2504) (esp8266#4530) * Fix random crashing of ClientContext::write(Stream) and write_P(PGM_P buf, size_t size) (esp8266#2504) * - Allow partial buffer release * - Refined comments (cherry picked from commit 3267443)
I experienced random crashes when sending large progmem files with ESP8266Webserver.send_P over the ESP8266 WiFi Access Point.
Affects any call of WiFiClient::write(Stream) and WiFiClient::write_P(...) if the underlying tcp_write returns an error (in my case ERR_MEM ("come back later")).
Might fix issue #2504 and maybe others.
The bug originates in BufferedStreamDataSource (DataSource.h) (used in ClientContext::_write_some()).
In ClientContext::_write_some:
If tcp_write(...) returns an error (e.g. ERR_MEM) (File: ClientContext.h:446), then DataSource::release_buffer is not called (which is fine).
However, if a BufferedStreamDataSource is used, the stream data was already read from the stream by DataSource::get_buffer(next_chunk) (File: ClientContext.h:441). The next time DataSource::get_buffer is called, data is read from the stream again (wrong data, because the stream has already advanced).
So from this point on, the stream reports fewer bytes left than the DataSource.
That leads to the assertion assert(cb == size) in BufferedStreamDataSource::get_buffer to fail eventually.
The solution is to remember the stream position and recognise if DataSource::get_buffer gets called multiple times without the corresponding release.
If the stream data was already read earlier, data isn’t read from the stream again.
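The desynchronisation and the position-tracking fix can be modelled with plain byte counters. The sketch below is an illustration under stated assumptions: `CountingStream`, `NaiveSource` and `TrackingSource` are hypothetical names, no actual data is buffered, and `get_buffer` returns `false` at the point where the real code's assert(cb == size) would fire.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stream reduced to a byte counter: reading consumes, no rewind.
struct CountingStream {
    std::size_t remaining;

    std::size_t read(std::size_t n) {
        if (n > remaining) n = remaining;
        remaining -= n;
        return n;
    }
};

// Pre-fix behaviour: every get_buffer() call reads from the stream again.
class NaiveSource {
public:
    NaiveSource(CountingStream& s, std::size_t size) : stream_(s), size_(size) {}
    std::size_t available() const { return size_ - pos_; }
    bool get_buffer(std::size_t n) { return stream_.read(n) == n; }
    void release_buffer(std::size_t n) { pos_ += n; }

private:
    CountingStream& stream_;
    std::size_t size_;
    std::size_t pos_ = 0;
};

// Fixed behaviour: remember how far the stream was read and only pull
// the bytes that are not buffered yet.
class TrackingSource {
public:
    TrackingSource(CountingStream& s, std::size_t size) : stream_(s), size_(size) {}
    std::size_t available() const { return size_ - pos_; }

    bool get_buffer(std::size_t n) {
        const std::size_t buffered = stream_pos_ - pos_;  // read but not released
        if (n <= buffered) return true;                   // nothing new to read
        const std::size_t want = n - buffered;
        const std::size_t cb = stream_.read(want);
        stream_pos_ += cb;
        return cb == want;
    }

    void release_buffer(std::size_t n) {
        assert(pos_ + n <= stream_pos_);
        pos_ += n;
    }

private:
    CountingStream& stream_;
    std::size_t size_;
    std::size_t pos_ = 0;
    std::size_t stream_pos_ = 0;  // bytes pulled from the stream so far
};
```

With a 100-byte stream, calling `get_buffer(40)` twice before a single `release_buffer(40)` leaves the naive source with 60 bytes available but only 20 in the stream, while the tracking source keeps both at 60.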
To reproduce, edit ClientContext::_write_some