Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

enduser_setup "Handling POST" failure on 310faf7 and dev #2931

Closed
KT819GM opened this issue Oct 1, 2019 · 49 comments
Closed

enduser_setup "Handling POST" failure on 310faf7 and dev #2931

KT819GM opened this issue Oct 1, 2019 · 49 comments

Comments

@KT819GM
Copy link

KT819GM commented Oct 1, 2019

Expected behavior

Calling enduser_setup.start(), connecting trough wifi to captive portal at 192.168.4.1, scanning for AP, entering correct password will connect you to that AP.

Actual behavior

You are returned to empty '192.168.4.1/setwifi'
some debug code on failure:

247: 	enduser_setup_check_station
270: 	status=0,chan=13
1469: 	enduser_setup_http_connectcb
1326: 	enduser_setup_http_recvcb
1043: 	enduser_setup_serve_status_as_json
870: 	enduser_setup_http_serve_header
380: 	deferred_close
360: 	handle_remote_close
247: 	enduser_setup_check_station
270: 	status=0,chan=13
1469: 	enduser_setup_http_connectcb
1326: 	enduser_setup_http_recvcb
1123: 	Handling POST
811: 	enduser_setup_http_handle_credentials
820: 	Password or SSID string not found
870: 	enduser_setup_http_serve_header
380: 	deferred_close
360: 	handle_remote_close
360: 	handle_remote_close

Test code

After enabling

#define LUA_USE_MODULES_ENDUSER_SETUP

fire

enduser_setup.start()

NodeMCU version

commit 043046d (HEAD -> dev, origin/dev)

Hardware

Generic ESP-07S

@HHHartmann
Copy link
Member

Looking at the log

820: 	Password or SSID string not found

says it all.

To further investigate I would need to know what browser and operating system you are using?
Can you also open the developer tools of the browser and add the (complete) header information and body of the POST request? (or record the POST request in any other way)
I have an idea what it could be but need more information.

@KT819GM
Copy link
Author

KT819GM commented Oct 2, 2019

Thank you for fast answer. Now the fun part:

Mac safari:

On the post request body is empty:

<body></body>

header

Summary
URL: http://192.168.4.1/setwifi
Status: 400 Bad request
Source: Network
Address: 192.168.4.1:80

Request
POST /setwifi HTTP/1.1
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Content-Type: application/x-www-form-urlencoded
Origin: http://192.168.4.1
Content-Length: 54
Accept-Language: en-us
Upgrade-Insecure-Requests: 1
Host: 192.168.4.1
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.0.1 Safari/605.1.15
Referer: http://192.168.4.1/
Accept-Encoding: gzip, deflate
Connection: keep-alive

Response
HTTP/1.1 400 Bad request
Connection: close
Content-Length: 0

Firefox on ubuntu:

request

Host: 192.168.4.1
User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:69.0) Gecko/20100101 Firefox/69.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Content-Type: application/x-www-form-urlencoded
Content-Length: 54
Connection: keep-alive
Referer: http://192.168.4.1/
Upgrade-Insecure-Requests: 1

response

HTTP/1.1 302 Moved
Location: /?trying=true
Content-Length: 0
Connection: close

As you now see, on Firefox all worked. As I understand, safari did not understood request and sadly it's same on macos and ios (ios is what important to me here). I'm willing to provide more info or to test what is needed further in range of my limited programming knowledge.

Thank you

@HHHartmann
Copy link
Member

I assume that on Firefox on Ubuntu log is shortened by the body, but it was there?

How did you capture the data? using the dump facility in the module

#define ENDUSER_SETUP_DEBUG_SHOW_HTTP_REQUEST 1

If so, we probably need to implement the missing fragmentation part of #2847.
I will test this weekend, as then I might have a mac available. #marcelstoer Marcel can you maybe have a look? you have a mac at hand as I recall.

@KT819GM
Copy link
Author

KT819GM commented Oct 2, 2019

edit: on firefox body was there, on safari it's empty.

This is log from ESP itself on failed attempt (macOS, safari) using:

#define ENDUSER_SETUP_DEBUG_ENABLE 1
#define ENDUSER_SETUP_DEBUG_SHOW_HTTP_REQUEST 1

I've stripped keepalive queries and removed DNS queries from the log:

90: 	close_once_sent
380: 	deferred_close
360: 	handle_remote_close
1469: 	enduser_setup_http_connectcb
1326: 	enduser_setup_http_recvcb
1358: 	GET /status.json?n=0.6682206927015609 HTTP/1.1
Host: 192.168.4.1
Connection: keep-alive
Accept: application/json
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.0.1 Safari/605.1.15
Accept-Language: en-us
Referer: http://192.168.4.1/
Accept-Encoding: gzip, deflate


1043: 	enduser_setup_serve_status_as_json
870: 	enduser_setup_http_serve_header
380: 	deferred_close
360: 	handle_remote_close
247: 	enduser_setup_check_station
270: 	status=0,chan=13
1469: 	enduser_setup_http_connectcb
1326: 	enduser_setup_http_recvcb
1358: 	GET /status.json?n=0.5953497358996975 HTTP/1.1
Host: 192.168.4.1
Connection: keep-alive
Accept: application/json
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.0.1 Safari/605.1.15
Accept-Language: en-us
Referer: http://192.168.4.1/
Accept-Encoding: gzip, deflate

1043: 	enduser_setup_serve_status_as_json
870: 	enduser_setup_http_serve_header
380: 	deferred_close
360: 	handle_remote_close
247: 	enduser_setup_check_station
270: 	status=0,chan=13
1469: 	enduser_setup_http_connectcb
1326: 	enduser_setup_http_recvcb
1358: 	POST /setwifi HTTP/1.1
Host: 192.168.4.1
Origin: http://192.168.4.1
Content-Type: application/x-www-form-urlencoded
Accept-Encoding: gzip, deflate
Connection: keep-alive
Upgrade-Insecure-Requests: 1
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.0.1 Safari/605.1.15
Referer: http://192.168.4.1/
Content-Length: 49
Accept-Language: en-us


1123: 	Handling POST
811: 	enduser_setup_http_handle_credentials
820: 	Password or SSID string not found
870: 	enduser_setup_http_serve_header
380: 	deferred_close
360: 	handle_remote_close
360: 	handle_remote_close
247: 	enduser_setup_check_station

This is same attempt, just from safari web inspector:

Summary
URL: http://192.168.4.1/setwifi
Status: 400 Bad request
Source: Network
Address: 192.168.4.1:80

Request
POST /setwifi HTTP/1.1
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Content-Type: application/x-www-form-urlencoded
Origin: http://192.168.4.1
Content-Length: 49
Accept-Language: en-us
Upgrade-Insecure-Requests: 1
Host: 192.168.4.1
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.0.1 Safari/605.1.15
Referer: http://192.168.4.1/
Accept-Encoding: gzip, deflate
Connection: keep-alive

Response
HTTP/1.1 400 Bad request
Connection: close
Content-Length: 0

Request Data
MIME Type: application/x-www-form-urlencoded
aplist: ssid
wifi_ssid: ssid
wifi_password: **********

I will be on trip for next 3 days, so I will have just mac and iphone with me, esp8266 also :) will install firefox also and will try to make logs as soon as I be stationary if they will be needed.

Thank you

@KT819GM
Copy link
Author

KT819GM commented Oct 13, 2019

Had my trip longer than expected. Anyways, having half of Sunday spent on this issue (trying to get usable debug info) I can state, that I will not be able to provide needed info to solve this problem. On mac/ios it seems AppleWebKit is doing something nasty as no matter what browser I try to use (firefox/chrome) it will always fail on empty setwifi as no credentials were provided for nodemcu. Thought - screw you apple, will check other browsers and it seems it's very unstable at the moment. Last test on windows / chrome have showed behaviour like:

  • it will not always load default html.
  • it will find ssid list but will not refresh webpage leaving it stuck on "searching for networks".

Also module will force software restart if will be launched when wifi credentials already available on flash:

1552: 	enduser_setup_ap_start
1581: 	SSID: test_unit_5763B3, CHAN: 5
219: 	enduser_setup_check_station_start
247: 	enduser_setup_check_station
302: 	AP_CHAN: 5, STA_CHAN: 12
312: 	Turning off Station due to different channel than AP
1552: 	enduser_setup_ap_start
1581: 	SSID: test_unit_5763B3, CHAN: 5
231: 	enduser_setup_check_station_stop
1952: 	enduser_setup_stop
1544: 	enduser_setup_ap_stop
206: 	enduser_setup_connected_callback
PANIC: unprotected error in call to Lua API (stdin:3: attempt to concatenate a nil value)

I would happily provide any needed info if exact steps will be asked for this issue to be solved. At the moment rolling back to 2.x.
Thank you

@mccahan
Copy link

mccahan commented Dec 2, 2019

Just to add some extra info here, I was able to reproduce this issue consistently under macOS Safari. I used Safari dev tools to copy the request as cURL and ran the same request from the command line and it functioned correctly.

I then ran both steps above while recording with Wireshark and the big difference between the two requests is that the Safari request comes in two TCP Segments:

image

The HTTP content from within the first segment matches the HTTP request output by ENDUSER_SETUP_DEBUG_SHOW_HTTP_REQUEST, so it doesn't reassemble the segments before attempting to process. As a result, it fails the content length check and wouldn't be able to find the SSID/password strings in the request body.

I'm currently trying to work around the issue by uploading a enduser_setup.html that makes a GET call to /update but having trouble getting it to save the configuration, it seems.

@HHHartmann
Copy link
Member

Great you are drilling down on this bug.
Iirc the get request does not save the configuration as mentioned in the documentation.

@marcelstoer marcelstoer added the bug label Jan 2, 2020
@KT819GM
Copy link
Author

KT819GM commented Jan 14, 2020

@mccahan, maybe you got any progress on this?

@rvalle
Copy link

rvalle commented Mar 24, 2020

@marcelstoer If I understand the bug correctly the HTTP dialog is broken from safari browsers due to content fragmentation, is that right?
If that is the case there must be a magic header or accept directive that will make this browsers talk to us in the way we need, or what?
I will see if I can reproduce it today.

@rvalle
Copy link

rvalle commented Mar 24, 2020

@marcelstoer something that I find probably an issue is the Keep-Alive header.
Meaning that the browser wants to keep the TCP connection open in order to minimise overhead in following requires. That doesn't sound like something that we would want from an ESP device. Even more, it is possible that some event handling is taking place at the close of the TCP session. Anybody looked into this?
It should be possible to disable keep alive.

@rvalle
Copy link

rvalle commented Mar 24, 2020

I can reproduce with safari on both mac and iphone.
Woow, the HTTP conversation is more complex that I expected.
On mac Safari fails but Firefox works.
So I have installed wireshark to see the differences on the wire for the setwifi post.
they are almost identical, most headers are.
I can notice that Safari is adding the header "Origin" which Firefox does not.
For some reason the module accepts Firefox which redirects with a 302 and does not like Safari which it replies 400.
But I cannot see any chunking or anything like that. I would say that the module does not like something in the particular request from safari, but at the same time is 99% equal to Firefox one.
am I lost?

@rvalle
Copy link

rvalle commented Mar 24, 2020

ahhh @mccahan I can now see the two tcp segments.
Not sure what a TCP "segment" is and who does not like it.
I presume you followed the code to stablish that this segments mess up the content length.

@rvalle
Copy link

rvalle commented Mar 24, 2020

TCP is TCP. If Safari is using it conforming the standard we should point the finger to the IP stack that we are using.

It seems to be lwIP, and I had a quick look and seems to be highly tunable.

But then there is also this issue open: #3040 which points that the IP stack is out of date.

Perhaps we should fix the problem by the root instead of working around it. However I have no idea about the difficulty that it implies. Anybody looking at #3040 ?

@marcelstoer
Copy link
Member

I wonder how the game might change if we could motivate Safari to use HTTP 1.0 instead of 1.1.

@mccahan
Copy link

mccahan commented Mar 24, 2020

Yeah, when I looked into it I saw that it stopped reading after the first packet and parsed that, so for browsers sending longer headers that exceeded the payload of a single TCP packet (Safari here, but potentially others) it never checks the buffer again to get the rest of the request.

It's not a case of forcing Safari to "play nice" somehow, or at least not in any way that I can think of; the ideal fix is in having that part of the code actually read in the entire request. Alternatively, my workaround was to modify the code so that it would accept the credentials as GET parameters, since the URL and query string come in as the first part of the HTTP request, and therefore should pretty much always be in the first packet. Unfortunately I'm not comfortable enough in C or this library to apply the proper fix, and I don't think I have the source around here of my workaround using a GET request.

@nwf
Copy link
Member

nwf commented Mar 25, 2020

I have not looked, so this is purely speculation, but: I suspect that this is not a bug in LwIP TCP handling, but rather a bug in enduser_setup's use of the network connection APIs: as read/recv and friends will complete upon receipt of packet, it's very easy to (accidentally) ignore the need to buffer things, because it will very often be that one's entire payload arrives in a single TCP packet. There's a fairly complex state machine in mqtt thanks to @stromnet that could serve as an example.

However, this could be a good motivation to replace our enduser_setup C implementation with one in Lua (like my #2819 effort with sntp) and use @TerryE's lovely pipe module to handle buffering. We do have most of a Lua http server already implemented and committed in-tree, but someone will have to give it some TLC and, more importantly, testing.

@marcelstoer
Copy link
Member

It's not a case of forcing Safari to "play nice" somehow

👍 It would simply be a workaround.

I have not looked, so this is purely speculation, but: I suspect that this is not a bug in LwIP TCP handling, but rather a bug in enduser_setup's use of the network connection APIs

I believe you're correct here.

@rvalle
Copy link

rvalle commented Mar 25, 2020

@nwf @marcelstoer

I have reviewed the http conversations of both firefox and safari and the HTTP layer of the protocol is identical, I belive @mccahan arrived to the same conclusion.

I also noticed just like @mccahan that the 2 segments at a lower TCP layer could be the cause.

Then I went to the LwIP bug database and found similar reports of Application layers failing due to improper handling of multiple TCP segments.

Such as this one:
MQTT broken with muti-segments data

That is when I considered the suspicion of LwIP solid, obviously if this is a malfunction of the IP stack we are not going to be the only ones affected.

I think we can try to make Safari play nice as a workaround, but we also need to asses the implications of keeping #3040 around for long. It that is the source of trouble and this kind of workaround is required at many other places... or not.. etc.

however:

@mccahan, I have not looked into the code yet, but I am not sure what you mean by TCP packet as you know TCP is a stream protocol, one have to be able to "read" information from TCP sockert until the other end closes the connection. As you know, it is common having to read multiple times and not to expect that one read is going to be sufficient for consuming the whole stream. Can you point me at the significant part of the code? sorry to be so lost here.

@rvalle
Copy link

rvalle commented Mar 25, 2020

@mccahan I found old documentation of the LwIP stack here:
https://www.nongnu.org/lwip/1_3_0/tcp_8c.html#f58be9006b4ddb720113f03d56bc6e52
I don't kown which version we have, but must be around there.

The tcp.receiv function sets a callback for receiving TCP information (async style) but does not take any parameter of how much information one is expecting... so buffereing is delegated to application.

I cannot see in the code anything like buffering to ensure the whole post is in a buffer before being processed. Maybe I lost something.

On top of that (both) browsers are sending the connection:keepalive on their post, so, I understand they won't close the tcp connection, something that would be handy for not having the identify when the post is finished.

Assuming that the whole post is coming on a single callback is not an option. Reading until the connection is closed and assembling could be if it is possible to make the browsers post with connection:close, which I dont think it is because HTTP1.1 sets keep-alive as default.

Then we would need an intermediate callback that does a bit of parsing to understand if the buffer is complete or assembling is needed. But that should be as simple as identifying 2 empty lines, one after the headers another after the body... or something like that.

In most cases the is complete for the rest a maximum buffer can be chosen and return x500 if the conversation is bigger.

@mccahan did you see the rest of the post arrive after the x500 was returned? sorry I don't have a debug build to play with yet.

@KT819GM
Copy link
Author

KT819GM commented Mar 25, 2020

Guys, one small thing - this bug is not present on FW prior 3.0. I didn't looked yet what has changed on transition from 2 to 3 regarding LwIP, TCP and so on, but possibly it could be bug derived from the transition? I think @TerryE could add his opinion about this?

@rvalle
Copy link

rvalle commented Mar 25, 2020

@KT819GM that is a good point, if it is still working could provide some more insights.

@KT819GM
Copy link
Author

KT819GM commented Mar 25, 2020

Yes, it's working 100%, literally, few moments ago I've attached 19 units to the network using iPhone and enduser_setup :)

@rvalle
Copy link

rvalle commented Mar 25, 2020

@KT819GM which version you use? latest 2.x?

@KT819GM
Copy link
Author

KT819GM commented Mar 25, 2020

2.2.0.0 build 2019 08 28 powered by Lua 5.1.4 on SDK 2.2.1(6ab97e9)

This one definitely works.

@rvalle
Copy link

rvalle commented Mar 25, 2020

@KT819GM... the POST handling was introduced later... according to git.

@HHHartmann
Copy link
Member

Maybe we could mitigate this by offering an alternative html page which uses the get route to save the credentials.

This would not store the credentials in the file and not allow additional data fields, but it should work for all browsers as the relevant data is right in the URL and not in the body.

@nwf
Copy link
Member

nwf commented Apr 10, 2020

Can we roll back to the version in 2.2.0.0 for the next drop to master, giving someone with motivation to fix this time thereafter?

@marcelstoer
Copy link
Member

Maybe we could mitigate this by offering an alternative html page which uses the get route to save the credentials.

👍 GET really isn't the correct method to modify data. However, I rather have something that works than something that's broken.

This would not store the credentials in the file and not allow additional data fields,

How are the two traits related? Why couldn't this be supported with GET as well?

but it should work for all browsers as the relevant data is right in the URL and not in the body

Can we rule out that browsers send multiple packets for a GET with many parameters?

@nwf
Copy link
Member

nwf commented Apr 10, 2020

I am suspicious that GET won't/doesn't really "work," either, for the same underlying cause. It's just more likely that the whole HTTP request leaves as one packet, but this isn't required. Still, as you say, better than certainly not working.

ETA: Right, the order is "read, comprehend, post". I see you've already asked the right question, @marcelstoer. :)

@mccahan
Copy link

mccahan commented Apr 10, 2020

I imagine for most users passing these credentials in the GET parameters should be fine - you have a significant number of characters before you cross the boundary into multiple segments, so for common cases of a passkey and SSID it should be fine. Safari's going to continue to pass requests that are too large and require multiple segments but that shouldn't affect any but the absolute most advanced use cases here.

It worked fine for me, and definitely an improvement over mobile Safari essentially not working at all with its long headers, assuming the underlying issue of not buffering the entire request isn't fixed in the short term.

@TerryE
Copy link
Collaborator

TerryE commented Apr 11, 2020

Can we roll back to the version in 2.2.0.0 for the next drop to master, giving someone with motivation to fix this time thereafter?

Not a good thing to do. We've made changes which use SDK 3.0 functionality. Trying to back them out is going to be a real PITA.

Or have I misread this and you are talking about just enduser_setup? In which case, I don't have a problem.

@HHHartmann
Copy link
Member

What I meant was to change the html or better offer an alternative html which uses the already existing route /update to push only ssid and password. From the docs:

/update GET Data submission target. Example: http://example.com/update?wifi_ssid=foobar&wifi_password=CorrectHorseBatteryStaple. Will redirect to / when complete. Note that will NOT update the eus_params.lua file i.e. it does NOT support sending arbitrary parameters. See also: /setwifi.

So I would supply an extra HTML file and a note about it in the documentation.

For arbitrary parameters a GET request is not the right thing as the URL might quickly get too long to guarantee transmission in one segment (and comply to standards)

Using GET in this way also does not comply to the REST standard. But we never claimed it to be a RESTful interface.

@HHHartmann
Copy link
Member

@KT819GM Just pushed PR #3069 with an htlm for you to try.

@KT819GM
Copy link
Author

KT819GM commented Apr 13, 2020

Tested, it works perfectly for me on both iPhone and Mac. Thank you!

@KT819GM KT819GM closed this as completed Apr 13, 2020
@HHHartmann
Copy link
Member

This is not fixed but only a workaround which first has to be completed and merged.

@HHHartmann HHHartmann reopened this Apr 13, 2020
@rvalle
Copy link

rvalle commented Apr 15, 2020

thanks for the workarround @HHHartmann will give it a try.

I have also read the limitations. I am not planing to use extra parameters, but:

Note that will NOT update the eus_params.lua file i.e. it does NOT support sending arbitrary parameters.

What does it mean by not updating the eus_params.lua file? isnt this file the deliverable of the end user setup process?

@HHHartmann
Copy link
Member

@rvalle The login information is stored in the ESP flash. Just like calling wifi.sta.config(station_config) with the save option.

@uDude
Copy link

uDude commented Sep 1, 2020

First time I've looked at this issue. Using the GET work around exposes credentials even on https as URLs are not encrypted.
About fragments and fragmentation... TCP is as we know built upon IP. IP datagrams on a network path are definitely fragmented if they exceed the path mtu. Packets may also be arbitrarily fragmented. Back in the day the first 'firewalls' were just packet filters. So to attempt telnet root logins you would intentionally fragment the word root into two or more packets, say 4 packets 't', 'o', 'o', 'r' sent in the reverse order shown, but on the TCP layer apply the sequence numbers for the correct order. So you bypass the packet filter firewall looking for 'root' in a packet and the destination host sees root since telnet is on the TCP layer and the network stack reassembles the packets ... there is more to it as sequencing can be handled at both layers.

What does this have to do with this problem? The IP header contains a flag MF (more-fragments). The sending device sets that flag in all fragments except the last one. On a desktop/server class machine they will buffer and perform reassembly or drop the packet. On an embedded device the challenges are greater. How much buffering should occur per connection? How do you avoid using all the heap (remember we do not have a nic with a ton of ram and dma). I can easily send a set of datagrams in fragments large enough to exhaust the heap.)
I need to look at lwip code as they will have algorithms to determine when to decide things like when to drop remaining fragments, or mayb just shove those frags up to the firmware layer to handle, or drop the whole mess, etc.
It is hopeful that we can tweak what happens during fragmentation a little. Though we may be touching 576...

I need to look today as so far this ia a rare good day for my dementia.

@uDude
Copy link

uDude commented Sep 1, 2020

Glancing at lwip/ip_frag.h i see

#ifndef IP_REASSEMBLY
#define IP_REASSEMBLY 0
#endif

We need to check what is defined at compile time. Note that reassembly has implications that the primaries on the project would have to decide. BSD (apple based on it) does do a little weird fragmenting.

@pjsg
Copy link
Member

pjsg commented Sep 1, 2020 via email

@uDude
Copy link

uDude commented Sep 1, 2020

It is definitely bsd fragmenting the post packet and lwip does not appear to be configured to handle the fragmentation...

a quick test would be for someone to build with DDEFINES setting IP_REASSEMBLY 1. Pbufs and setting the pbuf timeout would b wise, too.

@uDude
Copy link

uDude commented Sep 1, 2020

@pjsg the user seeing a stream just means the network stack reassmbles ip fragments and moving up the tcp layer ensures the tcp Packet order then hands it off to the application layer as a stream. If the ip or tcp stack code fails, broken stream.

@pjsg
Copy link
Member

pjsg commented Sep 1, 2020 via email

@HHHartmann
Copy link
Member

There is a post by @mccahan above showing a screenshot of a wireshark capture. Maybe he can help.
Above that post there are also logs, showing the first packet that is received and leads to the failure.
Please also note that on windows using Fiddler to send the request, the same issue can be provoked.

@nwf
Copy link
Member

nwf commented Sep 6, 2020

The underlying cause, is, I believe well understood: enduser_setup does not use the network APIs correctly:

as read/recv and friends will complete upon receipt of packet, it's very easy to (accidentally) ignore the need to buffer things, because it will very often be that one's entire payload arrives in a single TCP packet.

As shown in the packet trace screenshotted in #2931 (comment), it appears to be TCP segmentation not IP fragmentation that is going on here; this is a subtle distinction, but IP fragmentation is strongly discouraged these days and most TCP engines in modern IP stacks try rather assiduously to avoid it. If this were a matter of LwIP refusing IP reassembly, it is most likely that no data would be delivered to the application as the TCP layer would not see the frame to process and hand up the stack. But because it is TCP segmentation, the first packet can be handed up to the application and read/recv will return with just the data from that frame.

There really are only four(+ epsilon) options, none of them particularly pleasant:

  1. Accept that enduser_setup will work with only a difficult to predict subset of peers and deployment scenarios. This is the "do nothing" option that the community has taken for nearly a year now.
  2. Fix enduser_setup by adding a buffer state machine
    1.1. It may be possible to use the pipe module from C (with, I predict, much pain and suffering) to avoid needing to make one's own buffer state machine. I acknowledge this as a possibility, but fear these things.
  3. Fix the in-Lua httpserver (which uses, or should use, @TerryE's pipe for buffering) and move enduser_setup to be an in-Lua codebase.
  4. Remove enduser_setup from the tree if option 0 is perceived as unacceptable and nobody steps up to do either 1 or 2.

@nwf nwf added the help wanted label Sep 6, 2020
@uDude
Copy link

uDude commented Sep 9, 2020

No, but you're quite rigt not ip frags. All we fave is thescreenshoot the user above sent (wrongpacket expandedI do not hav a mac. Certain i coukd spot the prob givn gut pcap. Oh well aybe u can side load s safarj ongo andrkid.

@uDude
Copy link

uDude commented Sep 9, 2020

Dementia get inmway, sorry.

@rvalle
Copy link

rvalle commented Sep 9, 2020

When I looked into this I was considering option 1. In the particular scenario, 2 reads may be sufficient to download the whole tcp conversation in all cases. It is a very small buffer.

A quick parse can be done on the downloaded buffer to see if it is actually a full HTTP conversation leg or parcial (I believe first read retrieves only the headers).

Then, for this very particular case, the state machine can be quite simple: like checking if the separation between headers and body is there, or checking that length header and body length match, etc.

We have rolled out our prototype in the 0 scenario, and I regret it. We have to ask half of the customers or more to borrow an android phone or use a laptop for setup. Everything else works fine.

IMHO for the long run, I think 2 is the right thing to do. NodeMCU would seriously benefit from improvements of the http implementation.

@marcelstoer
Copy link
Member

Just landed on dev

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

9 participants