What sort of http file throughput speed should I expect for an esp8266? #209

Closed
jim80 opened this issue Jun 12, 2019 · 17 comments

Comments

@jim80

jim80 commented Jun 12, 2019

So, I basically modified the bitmapspooler example to "spool" a file and serve it over http chunked, which works. Testing it on a roughly 1/2 Meg jpg file it seems pretty slow - it loads in the browser almost a line at a time, taking in the region of 15 seconds in total.
I do appreciate that this is very limited hardware so I don't expect miracles from it, but I was hoping for something a bit faster!
I don't actually hope to serve largish bitmaps from it, but I do hope to serve a relatively meaty JavaScript bundle (React and a heap of CSS etc.), gzipped.

Does that seem in the right region to you?

If so, can it be made faster?

If not, and it should be faster than that, I'm wondering what I might be doing wrong?

I'd test on my esp32 to see how that goes there (faster, I'm sure), but I'm having some other issues with that:( !

@wilberforce
Contributor

What size 'chunks' are you sending? I have found 512 bytes seems to work - any more and it crashes. A 1/2 MB jpg file is actually quite large - are you sure that it can't be optimised to be smaller?

@jim80
Author

jim80 commented Jun 12, 2019

Just what the server asks for! From the docs (and as illustrated by the bitmap server example):

9 | Get response fragment. The server is ready to transmit another fragment of the response. The val1 argument contains the number of bytes that may be transmitted. The callback returns either a String or ArrayBuffer. When all data of the request has been returned, the callback returns undefined.

on esp8266, from (my) memory, it's in the region of 2000 bytes or so.
I'll try 512 and see if anything changes...

A 1/2 M jpg file was just what I had to hand - it's just a test file, not what I'd use in the real world. It was meant to be about that size just to give me some idea of the speed I'd get.
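
For readers following along, the fragment callback quoted above can be sketched as a small helper that hands out at most the number of bytes the server offers on each call and returns undefined once everything has been sent. This is only a sketch: `makeFragmentSource` and `next` are hypothetical names, not part of the Moddable SDK, and the 2000-byte figure is just the approximate value mentioned above.

```javascript
// Hypothetical helper (not a Moddable SDK API): wraps an ArrayBuffer and hands
// it out in pieces no larger than the byte count the server offers.
function makeFragmentSource(data) {
	let offset = 0;
	return {
		// maxBytes plays the role of the callback's val1 argument.
		next(maxBytes) {
			if (offset >= data.byteLength)
				return undefined;	// all data returned: ends the response
			const end = Math.min(offset + maxBytes, data.byteLength);
			const fragment = data.slice(offset, end);
			offset = end;
			return fragment;
		}
	};
}

// Usage: a 5000-byte payload with roughly 2000 bytes offered per call.
const source = makeFragmentSource(new ArrayBuffer(5000));
const sizes = [];
for (let fragment = source.next(2000); fragment !== undefined; fragment = source.next(2000))
	sizes.push(fragment.byteLength);
console.log(sizes.join(","));	// 2000,2000,1000
```

In a real server, the value returned by `next` would be what the callback returns for the "get response fragment" message.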

@jim80
Author

jim80 commented Jun 12, 2019

512 bytes made no difference I'm afraid.
When serving fragments as in example https://github.com/Moddable-OpenSource/moddable/blob/public/examples/network/http/httpserverbmp/main.js it seems the httpserver can happily serve around 2kbytes at a time, but just returning a string at one time it seems to crash out with disconnection errors at much less.
An HTML string I was trying to serve would easily crash it unless I used the chunk encoding code, in which case it transmitted it within a single chunk just fine! I have no idea why that might be...

Anyway, I've just noticed I'm not sending a content-length header, so I'll just drop that in and see if it makes any difference...

@jim80
Author

jim80 commented Jun 13, 2019

OK, so adding the content-length header has resulted in a speed up in the region of 4x, so that file loads in the browser in roughly 3 seconds. That is about what I'd expected/hoped for from an esp8266.
I suppose I've just learned a little more about http ;)
@wilberforce many thanks for your thoughts. If I'd not revisited this with your input I may not have spotted my mistake.

I'm still not quite sure how many bytes I can send at once without crashing http or using chunked encoding, but I'll have a play and a further read about it before posting another issue, if I need to.

@wilberforce
Contributor

but just returning a string at one time it seems to crash out with disconnection errors at much less.

This is how I found the 512 limit: by experimenting.

If the content is less than 512 - I just serve the string, if longer then chunk it.

I think we are on the same path here... I'm just a bit further along! I've stopped using the SPIFFS file system (I had it set up so I could do an HTTP PUT to update the content). Now I'm serving the content from the Resource class: the build packages up the files for me, so I don't have to upload any changes separately, as they are included in the firmware. I'm using Vue.js rather than React, and Bulma for the CSS framework, trying to use the most lightweight frameworks I can. The build process gzips the files and these are stored as resources in my build.
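
A minimal sketch of the approach described above: bodies up to the experimentally found limit go out in one piece, larger ones are sent as fragments, and pre-gzipped resources are flagged with a Content-Encoding header. `planResponse` and `WHOLE_BODY_LIMIT` are hypothetical names, the 512-byte limit is the value found by experiment in this thread rather than a documented constant, and the flat name/value header array is assumed to follow the format used in the Moddable httpserver examples.

```javascript
// Hypothetical helper (not a Moddable SDK API).
// 512 is the limit found by experiment in this thread, not a documented constant.
const WHOLE_BODY_LIMIT = 512;

// byteLength: size of the body as stored (after gzip, if gzipped is true).
function planResponse(byteLength, contentType, gzipped = false) {
	// Header names and values alternate in one flat array (assumed format).
	const headers = ["Content-Type", contentType, "Content-Length", String(byteLength)];
	if (gzipped)
		headers.push("Content-Encoding", "gzip");
	return {
		headers,
		// Bodies up to the limit are returned whole; longer ones are sent as fragments.
		fragmented: byteLength > WHOLE_BODY_LIMIT
	};
}

const small = planResponse(300, "text/html");
const large = planResponse(90113, "application/javascript", true);
console.log(small.fragmented, large.fragmented);	// false true
console.log(large.headers.join(" | "));
```

Sending Content-Length in both cases lets the client know the total size up front, which is the change that made the difference earlier in this thread.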

@barbiani

barbiani commented Jun 13, 2019

The arduino asyncwebserver does 7Mbps so we know that the hardware is capable. The regular webserver is slower, comparable in speed to Moddable. Unfortunately the asyncwebserver is not stable.

@wilberforce
Contributor

The arduino asyncwebserver does 7Mbps so we know that the hardware is capable

How was this measured?

@barbiani

Connected my desktop computer to the esp32 wifi and streamed data. It is very fast.

@wilberforce
Contributor

wilberforce commented Jun 13, 2019

[EDITED add esp32]
Ok. I have a gzipped app.js being served from Moddable using the webserver.

The gz file is 89kb.

I found this curl script to measure:
https://stackoverflow.com/questions/18215389/how-do-i-measure-request-and-response-times-at-once-using-curl

This is from windows using the simulator:

root@office-pc:~# curl -w "@curl-format.txt" -o /dev/null -s "http://192.168.15.16/app.js"
 time_namelookup:  0.000
       time_connect:  0.001
    time_appconnect:  0.000
   time_pretransfer:  0.001
      time_redirect:  0.000
 time_starttransfer:  0.003
                    ----------
         time_total:  0.018

and this is on the esp8266:

root@office-pc:~# curl -w "@curl-format.txt" -o /dev/null -s "http://192.168.15.15/app.js"
 time_namelookup:  0.000
       time_connect:  0.019
    time_appconnect:  0.000
   time_pretransfer:  0.019
      time_redirect:  0.000
 time_starttransfer:  0.121
                    ----------
         time_total:  0.482

esp32:

root@office-pc:~# curl -w "@curl-format.txt" -o /dev/null -s "http://192.168.15.35/app.js"
 time_namelookup:  0.000
       time_connect:  0.042
    time_appconnect:  0.000
   time_pretransfer:  0.043
      time_redirect:  0.000
 time_starttransfer:  0.282
                    ----------
         time_total:  0.477

or:
win sim:

root@office-pc:~# curl -o /dev/null  "http://192.168.15.16/app.js"
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 90113  100 90113    0     0  3148k      0 --:--:-- --:--:-- --:--:-- 3259k

esp 8266:

root@office-pc:~# curl -o /dev/null  "http://192.168.15.15/app.js"
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 90113  100 90113    0     0   147k      0 --:--:-- --:--:-- --:--:--  147k

esp32:

root@office-pc:~# curl -o /dev/null  "http://192.168.15.35/app.js"
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 90113  100 90113    0     0   120k      0 --:--:-- --:--:-- --:--:--  120k
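
As a rough cross-check, the time_total values from the first set of curl runs can be turned into average throughput for the 90113-byte file. This is only an arithmetic sketch with hypothetical helper names. The rates shown by curl's progress meter come from separate requests, so the two sets of figures need not match exactly.

```javascript
// Average throughput in bytes per second.
function bytesPerSecond(bytes, seconds) {
	return bytes / seconds;
}

// Unit conversions: KiB/s uses 1024 bytes, Mbit/s uses 10^6 bits.
function toKiBPerSecond(rate) {
	return rate / 1024;
}
function toMbitPerSecond(rate) {
	return (rate * 8) / 1e6;
}

const fileSize = 90113;	// bytes, from the curl output above
const totalSeconds = { "win sim": 0.018, "esp8266": 0.482, "esp32": 0.477 };	// time_total values above

for (const [name, seconds] of Object.entries(totalSeconds)) {
	const rate = bytesPerSecond(fileSize, seconds);
	console.log(`${name}: ${toKiBPerSecond(rate).toFixed(0)} KiB/s, ${toMbitPerSecond(rate).toFixed(2)} Mbit/s`);
}
```

Note that time_total includes connection setup and the wait for the first byte, so for a file this small the figure understates the steady-state transfer rate.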

@barbiani

barbiani commented Jun 13, 2019

The thread is for the 8266, but here is a comparison.

This is the asyncwebserver streaming from ram with esp32.

curl -o /dev/null http://192.168.4.1/
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 1875k  100 1875k    0     0  1011k      0  0:00:01  0:00:01 --:--:-- 1011k

The server callback gives a buffer and its size for you to fill with data and wait for the next call.

These are the buffer sizes on the first calls:

5634
1436
4308
1436
4308
1436
4308
1436
4308
1436
4308
1436
4308

@wilberforce
Contributor

This is the page load of my app:

win sim:
image

esp8266:
image

esp32:
image

@jim80
Author

jim80 commented Jun 13, 2019

Ah, the search for an easy to use performant espXX server! I've not used the arduino async one for a couple of years, as I remember it was good, but prone to crash. I checked a few months ago and development looked dead, sadly. I was surprised to not find it forked and further developed, maybe I didn't look hard enough.

@jim80
Author

jim80 commented Jun 13, 2019

@wilberforce : "I think we are on the same path here... I'm just a bit further along! " ....
Yup, I think so, thanks for the tip - I've just been coming round to that way of working being the way to go, great to have that thinking validated. The smallest framework I've used so far is "preact" - good for me since I'm comfortable in React, and as I recall it's something like 10k (or less) with react-compat, gzipped.
I've just finished a contract using ReasonML though, and I really like it - it works well with React and I was hoping to use it here, but the full React framework is a little bigger than I'd like.
Interestingly, it's another syntax for OCaml, which apparently does C interop very well, and has previously been made to work on microcontrollers (STM32? Not entirely sure without looking it up).
I do wonder if it could be made to work on the ESPs, but my C skills go no further than the odd Arduino sketch :(

@phoddie
Collaborator

phoddie commented Jul 10, 2019

A few notes:

  • On ESP8266 and ESP32 the networking is implemented using lwip. The Moddable SDK socket uses the lowest level lwip API (callback) which gives the most control.
  • lwip provides the number of bytes that can be written with the tcp_sndbuf API.
  • The Moddable SDK passes the current value of tcp_sndbuf to the script using the HTTP server as part of the Request.responseFragment callback message. Attempts to write more than that will likely fail. Writing less is fine; however, it may not use the network bandwidth efficiently.
  • The HTTP server waits for confirmation from lwip that it has delivered (at least part of) the data before sending more.
  • This behavior means that transmit performance is gated by the response from the remote HTTP client. It may well be possible to push performance further, but it isn't obvious how to do so reliably. Getting these things wrong can cause all network traffic on the microcontroller to be permanently blocked (I know from experience...). If you are aware of techniques that achieve better performance and remain stable, please share. I'm interested to learn more.
  • The httpserverchunked example may be a more straightforward starting point to explore HTTP server performance than the bitmap spooler.
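
One rough way to reason about the points above: if the server sends one batch of bytes and then waits for the client to acknowledge it before sending more, throughput is bounded by roughly the batch size divided by the round-trip time. The sketch below is a deliberately simplified model, not a description of how lwip or the Moddable SDK schedule writes. It ignores delayed ACKs, Wi-Fi retries, and partial acknowledgements, `maxBytesPerSecond` is a hypothetical name, and the round-trip times are illustrative assumptions rather than measurements.

```javascript
// Upper bound on throughput when each batch of bytes must be acknowledged
// before the next batch is sent (simplified stop-and-wait model).
function maxBytesPerSecond(bytesPerRound, roundTripMs) {
	return bytesPerRound / (roundTripMs / 1000);
}

const bytesPerRound = 2000;	// approximate ESP8266 value reported earlier in this thread
for (const roundTripMs of [5, 10, 20]) {	// hypothetical round-trip times
	const kib = maxBytesPerSecond(bytesPerRound, roundTripMs) / 1024;
	console.log(`${roundTripMs} ms round trip: at most ~${kib.toFixed(0)} KiB/s`);
}
```

Under this model, halving the time the client takes to respond doubles the achievable rate, which is one way to read the observation that performance is gated by the remote client.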

I'm closing this issue as its (interesting) discussion seems to have run its course. If there are new findings, please open a new issue focused on that. Thank you.

@phoddie phoddie closed this as completed Jul 10, 2019
@phoddie
Collaborator

phoddie commented Jul 11, 2019

One more note... Nagle's algorithm may be relevant here. It is enabled by default in lwip. As an experiment, you can easily disable it by adding the following line to the top of configureSocketTCP in modSocket.c:

  tcp_nagle_disable(xss->skt);

@phoddie
Collaborator

phoddie commented Jul 11, 2019

A little more...

The next push (later today) of the Moddable SDK includes support to disable Nagle's algorithm from script. It is disabled by default now in the HTTP server.

I adapted the httpserverchunked example for benchmarking. That is below. It generates a 10 MB stream of zero data. On an ESP8266 it runs at about 330 KB/sec on our office network downloading to browsers on macOS. On an ESP32, it runs at about 520 KB/sec. On both devices, there is some variation (both faster and slower). There wasn't any special effort to optimize the network environment, Wi-Fi antenna placement, etc. The benchmarks were run on a release build.

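import {Server} from "http";

let server = new Server({});
server.callback = function(message, value) {
	// Request received: record the path and set up per-connection state.
	if (Server.status === message) {
		this.path = value;
		this.remaining = 10 * 1024 * 1024;	// total bytes to stream (10 MB)
		this.buffer = new ArrayBuffer(1);	// placeholder, replaced on the first fragment
	}

	// body: true asks the server to request the body in fragments.
	if (Server.prepareResponse === message)
		return {headers: ["Content-type", "application/octet-stream"], body: true};

	// value is the number of bytes that may be sent in this fragment.
	if (Server.responseFragment === message) {
		if (this.remaining <= 0)
			return;		// returning undefined ends the response

		this.remaining -= value;

		// Reuse the previous buffer when its size matches the requested size.
		let buffer = this.buffer;
		if (buffer.byteLength !== value) {
			delete this.buffer;
			buffer = this.buffer = new ArrayBuffer(value);
		}
		return buffer;
	}
}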

N.B. The reuse of the ArrayBuffer is to minimize the load on the garbage collector. It makes a measurable difference when there are hundreds of relatively large blocks being allocated per second.

N.B. Disabling the Nagle algorithm doesn't make a huge difference, but it seems like the right thing to do in the server given the way modSocket implements writes.

@jim80
Author

jim80 commented Jul 11, 2019 via email
