
Keep-Alive missing for HTTP responses #279

Closed
basisbit opened this issue Feb 12, 2021 · 13 comments

Comments

@basisbit
Contributor

basisbit commented Feb 12, 2021

Describe the bug
The HTTP server implementation of OME advertises that it supports HTTP/1.1. However, it ignores the Keep-Alive header in HTTP requests, does not send a Connection: Keep-Alive header, and does not send keep-alive parameters (for example Keep-Alive: timeout=5, max=1000) in HTTP responses.
This makes it impossible to successfully use HLS/DASH streaming with small chunk lengths in environments with high latency between the edge and the web browser.

Hls.js and dash.js do not download .ts chunk files concurrently, but only one at a time. With the current implementation of OME, the round-trip time to the edge server is paid several times per chunk: for each chunk the browser opens a new TCP connection and waits for the reply, starts the TLS handshake with a ClientHello and waits for the reply containing the TLS ServerHello, sends the client key exchange and waits for the server's ChangeCipherSpec, and only then sends the HTTP GET and waits for the response.

If all of this latency-induced waiting plus the time to actually download the video chunk exceeds the chunk length, the video will always stutter, no matter the settings.

Expected behavior
Correctly implement HTTP/1.1

  • Add the missing keep-alive headers to the HTTP response (see the sketch after this list)
  • Clean up / close connections that have exceeded the keep-alive timeout; the existing socket support implementation may serve as a reference
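To make the first bullet concrete, here is a minimal sketch of the intended behavior. The types and helper names below are hypothetical, not OME's actual classes; the real change would live somewhere around the segment stream server's response path.

```cpp
// Hypothetical sketch: decide per request whether the TCP connection stays open,
// advertise that decision in the response, and expose the timeout/max parameters.
#include <chrono>
#include <cstdio>
#include <map>
#include <string>

struct HttpRequest {
    std::string version;            // "1.0" or "1.1"
    std::string connection_header;  // value of the request's Connection header, lower-cased
};

struct HttpResponse {
    std::map<std::string, std::string> headers;
    void SetHeader(const std::string& name, const std::string& value) { headers[name] = value; }
};

constexpr auto kKeepAliveTimeout     = std::chrono::seconds(5);
constexpr int  kKeepAliveMaxRequests = 1000;

// Returns true if the socket should stay open after this response is sent.
// Real code must match header values case-insensitively and also arm a per-connection
// idle timer so that stale sockets are closed after kKeepAliveTimeout.
bool ApplyKeepAlive(const HttpRequest& req, HttpResponse& res)
{
    // HTTP/1.1 is persistent by default unless the client sends "Connection: close";
    // HTTP/1.0 is persistent only if the client explicitly asked for keep-alive.
    const bool keep_alive =
        (req.version == "1.1" && req.connection_header != "close") ||
        (req.version == "1.0" && req.connection_header == "keep-alive");

    if (keep_alive) {
        res.SetHeader("Connection", "keep-alive");
        res.SetHeader("Keep-Alive",
                      "timeout=" + std::to_string(kKeepAliveTimeout.count()) +
                      ", max=" + std::to_string(kKeepAliveMaxRequests));
    } else {
        res.SetHeader("Connection", "close");
    }
    return keep_alive;
}

int main()
{
    HttpResponse res;
    const bool persist = ApplyKeepAlive({"1.1", ""}, res);
    std::printf("keep connection open: %d, Keep-Alive: %s\n", persist, res.headers["Keep-Alive"].c_str());
}
```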

Server (please complete the following information):

  • OS: Ubuntu 20.04
  • OvenMediaEngine Version: today's master
  • Branch: master

I think the problem lies in the implementation at https://github.com/AirenSoft/OvenMediaEngine/blob/master/src/projects/publishers/segment/segment_stream/segment_stream_server.cpp#L207

@getroot
Member

getroot commented Feb 13, 2021

Yes. OME does not implement the full specification of the HTTP protocol.

But I have a question. If the TS download time is longer than the length of the TS, and if it is a live stream, is it possible to play without stuttering? It is possible in VoD, but in live it will eventually stutter again once playback underflows more than the server's buffer can cover. Any comments on this would be appreciated.

@basisbit
Contributor Author

basisbit commented Feb 14, 2021

If the TS download time is longer than the length of the TS, and if it is a live stream, is it possible to play without stuttering?

Yes, it is possible if the code running in the web browser can download multiple .ts files at the same time. For example, if the playlist has 6 chunks of 2 s and the web browser is 4 chunks behind (between 8 and 10 seconds), then the web browser could download multiple chunks (5 and 6) at the same time (if the internet connection is not saturated downstream) and thus have the next chunks available when they are needed.

However, that is not what this issue is about, not at all. This issue is about the latency between the edge server and the client's web browser hitting each download of a .ts file three times, and thus severely reducing the time the web browser has left to actually download the next chunk.
If someone from Australia wants to watch a live stream from an edge server in Europe or the USA, then 900 ms of waiting time for each .ts file is easily avoidable just by supporting Keep-Alive from HTTP/1.1. That is 900 ms per chunk which the client could have used for downloading the file content instead. This also severely wastes CPU time on both the clients and the server for connection handshakes and TLS handling.
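To spell out where the 900 ms comes from (the round-trip time of roughly 300 ms is an assumed figure, typical for Australia to Europe, not a measurement from this thread): the TCP handshake costs one round trip, the TLS handshake at least one more (one for TLS 1.3, two for older TLS versions), and the HTTP GET itself a third, so each segment pays roughly 3 × 300 ms ≈ 900 ms before the first byte of media arrives.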
I'd consider this a very critical bug, especially because the server advertises support for HTTP/1.1.

@getroot
Member

getroot commented Feb 16, 2021

I'll review this to see if it's a feature that can be easily added in the current structure.

@basisbit
Contributor Author

basisbit commented Mar 4, 2021

Any chance of getting a first review of this topic? Without support for reusing TCP connections (a major part of the HTTP/1.1 standard), it is impossible to stream anything over HLS/DASH/LL-DASH at more than 400 Kb/s to clients who have >550 ms latency (under load) and some packet loss, so OME is currently not really usable for international video streaming from the usual cheap server locations.

What I have already tried to mitigate the issue:

  • changing the TCP congestion avoidance algorithm to Hybla
  • increasing initcwnd to 46 and similar optimizations
  • trying to get HTTP keep-alive to work by not closing the TCP connection within the segment stream server and adding the HTTP headers

However, each .ts file download still takes at least 1.5 seconds for the initial connection setup plus TLS 1.3, and then roughly two more seconds for TCP slow start to reach a reasonable download bitrate on a connection with, for example, 550 ms latency. If the transport then also has, for example, a 0.5% chance of packet loss, even a 6-second chunk size and 500 Kb/s CBR video quality will not play without stuttering after almost every chunk.
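For illustration, here is a back-of-the-envelope slow-start model behind those numbers. The initial congestion window of 10 segments, the 1460-byte MSS, per-RTT doubling, and the absence of loss are textbook assumptions, not measurements of OME:

```cpp
// Rough slow-start model for delivering one 6 s, 500 Kb/s segment (~375 KB)
// over a fresh connection with 550 ms RTT. Assumptions: initcwnd = 10 segments,
// MSS = 1460 bytes, congestion window doubles every RTT, no packet loss.
#include <cstdio>

int main()
{
    const double rtt_s = 0.55, mss = 1460.0, target_bytes = 6.0 * 500000.0 / 8.0;
    double cwnd = 10.0, sent = 0.0;
    int rtts = 0;
    while (sent < target_bytes)
    {
        sent += cwnd * mss;  // bytes delivered during this RTT
        cwnd *= 2.0;         // slow start: window doubles each round trip
        ++rtts;
    }
    std::printf("~%d RTTs (~%.1f s) just to deliver the segment\n", rtts, rtts * rtt_s);
    // Prints: ~5 RTTs (~2.8 s), on top of the connection-setup round trips,
    // and any early packet loss makes it considerably worse.
}
```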

If the web browser were allowed to reuse the TCP connection, playback would be just fine, even with much higher packet loss and higher latency (for example, a 3G cellular internet user in Australia trying to watch a stream from a server in Europe).

@getroot
Member

getroot commented Mar 5, 2021

@basisbit We will implement this. However, we are very busy these days working on other commercial projects, and for OME we are working on the highest-priority tasks (WebRTC input, socket performance improvements, Signed Callback). We will treat this as the next priority.

@basisbit
Contributor Author

Linking this to #268, as the long connection-establishment time seems to be one of the major reasons why LL-DASH from an OME server dies after a few seconds (approximately one or two chunk lengths) of playback when used over typical internet connections with not-so-good latency.

@basisbit
Contributor Author

basisbit commented Apr 14, 2021

Any update on this? It makes it very difficult to use OME for live video streaming to worldwide visitors, even when using servers in multiple regions to stream 1 Mb/s video streams...

Edit: I wonder if it would be easier to always put NginX or HAProxy in front of the HTTP port of OME, which would let server operators adjust any HTTP- or TLS-related options using the extensive documentation of those tools. Has anyone here already benchmarked how such a reverse proxy in front of OME would impact CPU and memory requirements?
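For anyone who wants to experiment with that workaround in the meantime, a minimal nginx sketch could look like the following. The hostname, certificate paths, and the assumption that OME's HTTP publisher listens on port 8080 on the same machine are placeholders; adapt them to your own Server.xml. Note that until OME itself supports keep-alive, the proxy-to-OME connections will still be closed per request; the benefit is on the client side.

```nginx
# Hypothetical reverse proxy terminating TLS and keeping client connections alive,
# while proxying segment requests to a local OME instance. Names and ports are examples.
upstream ome_http {
    server 127.0.0.1:8080;   # assumed OME HLS/DASH publisher port
    keepalive 16;            # keep idle upstream connections open for reuse (once OME supports it)
}

server {
    listen 443 ssl;
    server_name edge.example.com;

    ssl_certificate     /etc/ssl/edge.example.com.crt;
    ssl_certificate_key /etc/ssl/edge.example.com.key;

    keepalive_timeout 65s;              # keep client connections open between segment requests

    location / {
        proxy_pass http://ome_http;
        proxy_http_version 1.1;         # required for upstream keepalive
        proxy_set_header Connection ""; # do not forward "Connection: close" upstream
        proxy_set_header Host $host;
    }
}
```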

@dimiden
Member

dimiden commented Apr 15, 2021

@basisbit
I'm going to take on this task, but I'm working on something else these days, so I don't have time to develop it yet. I'll let you know when there is an update. Thank you for your patience.
Also, we don't have any benchmarking results with NginX or HAProxy. I'm curious as well, but I can't spare the time for it right now.

@dimiden
Member

dimiden commented Jul 13, 2021

Once again, I apologize for the delay in this task.

We also think we need Keep-Alive, and I would like to remind you that this task has the highest priority for me, but it is being delayed by other commercial projects. (These commercial projects are necessary for our "survival", so I can't lower their priority.)

I will try to complete the development before the end of this year.

@getroot
Member

getroot commented Mar 22, 2022

It took too long to release the HTTP/1.1 persistent connections feature. Many thanks to everyone who has been waiting for this feature.

I am working hard on supporting LLHLS these days.
Because LLHLS is based on HTTP/2.0, I have redesigned the existing HTTP module. HTTP/1.1 persistent connections fell out naturally from the restructuring for HTTP/2.0, and I was able to commit this feature today. I'm now ready to implement HTTP/2.0 and LLHLS.

The new HTTP module has only been tested in my environment so far, so many bugs may still be found. You should not use the master branch in a commercial environment yet.

I hope many people will test the new HTTP module. I am well aware that OME becomes a more stable system as many contributors test it in various environments. Thanks to all contributors.

@bchah
Collaborator

bchah commented Mar 22, 2022

Hi @getroot, thank you for this. I tried a new build and API calls are not working, showing this error:

[2022-03-22 20:18:59.388] E [SPAPIServer-T80:647] HttpServer | http_transaction.cpp:338 | Invalid parse status: 200

Is that enough of a clue?

@getroot
Member

getroot commented Mar 23, 2022

@bchah Thanks for the quick report. I fixed the problem and committed it.

@getroot
Member

getroot commented Jun 2, 2022

Since this feature has been released, I am closing this issue. If you need a discussion about Keep-Alive or persistent connections, please create a new issue.

@getroot getroot closed this as completed Jun 2, 2022