Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use os.sendfile() when available #1448

Closed
scoder opened this issue Jun 3, 2015 · 9 comments
Closed

use os.sendfile() when available #1448

scoder opened this issue Jun 3, 2015 · 9 comments
Labels

Comments

@scoder
Copy link

scoder commented Jun 3, 2015

The StaticFileHandler should use os.sendfile() if available. While static files would commonly be sent by an external webserver (nginx etc.) in a production environment, sometimes files are generated on the fly and need to be written out through the IOLoop. sendfile() would reduce the overhead here.

I'd also like to have a general high-level RequestHandler.send_file(file_path, content_type=None) method for this case that the StaticFileHandler would call, but that other user code could conveniently use as well. If sendfile() is not available, it would simply fall back to forwarding chunks, but user code wouldn't have to care any more how exactly it works on the current system.

@bdarnell
Copy link
Member

bdarnell commented Jun 3, 2015

How much time does your profiling data say you're spending in this part of the code?

@scoder
Copy link
Author

scoder commented Jun 5, 2015

It wastes both time and memory. Sending files is something that can be done very efficiently with sendfile (and in fact directly in the IO loop driven by "socket free to send" events), whereas application level code that does this efficiently is quite tricky to write, so the normal way people would do it is to read the entire file into memory in chunks and call write() for each chunk, and then letting the IO loop send everything out afterwards. I'm therefore really more concerned about the memory impact than the time impact. And implementing something that reads the next chunk from the file when the socket says that it can send it is really low-level and involved and not something that an application should have to deal with. Plus, it could still block reading the file.

With sendfile on a non-blocking socket, the IO loop can tell the socket to send out the next bytes whenever it's ready to send, and the socket will tell the IO loop when it's done. Meaning that the file will be sent almost entirely without impacting the rest of the Python process, neither CPU-wise nor memory-wise.

@scoder
Copy link
Author

scoder commented Jun 5, 2015

Here is an example where sendfile() is used on a non-blocking socket:
https://code.google.com/p/pyftpdlib/source/browse/tags/release-0.7.0/pyftpdlib/ftpserver.py#1035

@bdarnell
Copy link
Member

bdarnell commented Jun 6, 2015

When sendfile can be used, it is undoubtedly better than the alternatives in several dimensions. But it cannot be used in all cases, so the non-sendfile version must be kept as well. So the question is a cost/benefit tradeoff: is the benefit of sendfile large enough to justify the cost of a second implementation (and the associated difficulties in testing and ensuring that both implementations are equivalent)?

The loop in StaticFileHandler is already fairly simple and memory-efficient (note that content is an iterator, not a list): it only reads one chunk more than what will fit in the socket buffer.

@lithammer
Copy link

Just wanted to mention this, but I'm sure you're already aware. But in Python 3.5, there's now socket.socket.sendfile() (which uses os.sendfile()).

@giampaolo
Copy link

giampaolo commented Nov 12, 2016

I ended up here by accident. Just wanted to point out that using sendfile() makes an actual difference. In pyftpdlib (I'm the author) transfers on localhost by using sendfile() vs. send() are nearly 3 times as fast so I think such an addition to Tornado should be seriously taken into consideration. Famous softwares such as proftpd, vsftpd, nginx and samba all implement sendfile support.

As for how to integrate this into Tornado IO loop you can take a look at how this is done in pyftpdlib:
https://github.com/giampaolo/pyftpdlib/blob/6165a40fd4f7c7cd2d549dbb816b3208091ffdff/pyftpdlib/handlers.py#L658
Roughly:

  • every time the socket is ready to be written you call sendfile() instead of socket.send (you'll never do file.read())
  • sendfile needs 2 fds (socket fd and file fd), the file offset and the max number of bytes to send on each call; here is the function signature
  • sendfile returns how many bytes were written; you have to keep track of that so that you can pass an updated "offset" parameter on each call
  • sendfile() returns 0 when EOF is reached (meaning you're done)
  • in case the file is not a "regular" file, sendfile() will fail with EINVAL; you have to take that into account when you call it the first time: if it fails with EINVAL you can fallback on using the plain send() method / logic
  • sendfile will raise exceptions similar to send() (EAGAIN, ECONNRESET etc), see here

Last: I am thinking about rewriting pyftpdlib by using Tornado because it provides an HTTP client, coroutines and support for easily integrating threads/processes to avoid blocking calls.
As such, I may be able to contribute a sendfile() patch for Tornado, but I'm not sure when as it's a lot of work (pyftpdlib side).

@honglei
Copy link

honglei commented Jun 17, 2020

Any News?

@glasses-on
Copy link

Any latest news on this? Any way we can use os.sendfile with some http server in Python?

@bdarnell
Copy link
Member

I think this is unlikely to happen so I'm going to go ahead and close this issue as "unplanned".

Now that we're unconditionally wrapping the asyncio event loop, the functionality is available at the lowest level via that interface. But at the HTTP level, supporting sendfile would involve breaking a lot of abstractions and would only work in limited circumstances (no TLS, etc). I don't think this is a good fit for Tornado.

That leaves IOStream, where I think there is some chance that a sendfile interface might make sense. I might be willing to accept a PR to add this. But overall my recommendation would be that if you care about sendfile you're probably better off working with asyncio protocols and transports rather than IOStream (which is designed to do a lot of buffering).

@bdarnell bdarnell closed this as not planned Won't fix, can't repro, duplicate, stale Sep 18, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

6 participants