HTTP PUT output #149

Closed
colleenkhenry opened this issue Sep 13, 2016 · 107 comments
Labels: flag: seeking PR (we are actively seeking PRs for this; we do not currently expect the core team will resolve this) · status: archived (archived and locked; will not be updated) · type: enhancement (new feature or request)

Comments

@colleenkhenry

Is it possible to output the results of the muxer via HTTP PUT instead of writing to the file system?

Similar to this patch for the live webm dash muxer.

http://ffmpeg.org/pipermail/ffmpeg-devel/2016-September/199424.html

If it is not yet possible, it would be very very useful!

@kqyang
Contributor

kqyang commented Sep 13, 2016

No, that is not supported right now.

It can be supported with a new File implementation (https://github.com/google/shaka-packager/blob/master/packager/media/file/file.h) supporting the HTTP protocol, possibly called HttpFile.

We don't have the resources to work on it right now. Any help or contribution is welcome!

@kqyang kqyang added the labels "flag: seeking PR" and "type: enhancement" on Sep 13, 2016
@vaage vaage added this to the Backlog milestone Feb 14, 2018
@kqyang kqyang modified the milestones: Backlog, v2.2 Apr 10, 2018
@kqyang kqyang modified the milestones: v2.2, v2.3 Apr 27, 2018
@kqyang kqyang modified the milestones: v2.3, v2.4 Aug 23, 2018
@JuliusThms

@colleenkhenry I have written a bash script for uploading / syncing manifests and segments via HTTP in the right order. If any of you are interested, let me know.

@colleenkhenry
Author

Thanks @JuliusThms

However, I think for best latency we would want something that uploads the segments as they are being created (this is for live). Maybe I'm wrong, but I suspect that requires direct integration?

@JuliusThms

JuliusThms commented Sep 5, 2018

It's working for live when using the right functions and logic to have the files uploaded immediately after they are written. But you're right; a direct adapter as suggested by @kqyang would be much smarter and less complex. I'm thinking about implementing it; I'll keep you updated.

@kqyang
Contributor

kqyang commented Sep 6, 2018

@JuliusThms Thanks for jumping onto this issue! Let us know if you decide to do it and don't hesitate to reach out to us if you have any questions!

@colleenkhenry
Author

Thanks guys!

@amotl
Contributor

amotl commented Oct 15, 2018

Dear @kqyang, @colleenkhenry and all contributors,

@JuliusThms asked us to have a look at this, so we are making an attempt to implement HttpFile support as outlined in this discussion, especially to improve live-mode publishing performance. To do so, we humbly ask for guidance from the core developers.

Introduction

We are aiming to map the File::Write call to the HTTP domain as cleanly as possible. As File::Write might be called multiple times per file by BufferWriter::WriteToFile, the implementation should support buffered writing by mapping it to incremental uploads.

About HTTP PUT

HTTP PUT is exclusively designed for requests submitting the full representation of a resource:

An origin server that allows PUT on a given target resource MUST send a 400 (Bad Request) response to a PUT request that contains a Content-Range header field (Section 4.2 of [RFC7233]), since the payload is likely to be partial content that has been mistakenly PUT as a full representation.

-- via: HTTP/1.1 Semantics and Content, section 4.3.4. PUT

About HTTP PATCH

The sabre/dav HTTP PATCH documentation shows how to implement HTTP PATCH requests using the "[X-]Update-Range: append" header, which would perfectly match the requirement of incremental uploads originating from buffered writes.

Using curl

The existing http_key_fetcher.cc guided us towards the curl library, saving us from having to introduce another dependency to the project. So we will try to issue the HTTP PATCH requests through libcurl.
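For illustration, a single incremental upload could look roughly like the sketch below; the function name, URL handling, and exact header spelling are placeholders for this discussion, not code from the branch:

    #include <curl/curl.h>
    #include <string>

    // Minimal sketch: append one chunk of data to a remote resource via
    // HTTP PATCH with libcurl. UploadChunk and the header spelling are
    // illustrative placeholders, not Shaka Packager APIs.
    bool UploadChunk(const std::string& url, const void* data, size_t length) {
      CURL* curl = curl_easy_init();
      if (!curl)
        return false;

      struct curl_slist* headers = nullptr;
      headers = curl_slist_append(headers, "X-Update-Range: append");
      headers = curl_slist_append(headers, "Content-Type: application/octet-stream");

      curl_easy_setopt(curl, CURLOPT_URL, url.c_str());
      curl_easy_setopt(curl, CURLOPT_CUSTOMREQUEST, "PATCH");
      curl_easy_setopt(curl, CURLOPT_POSTFIELDS, static_cast<const char*>(data));
      curl_easy_setopt(curl, CURLOPT_POSTFIELDSIZE, static_cast<long>(length));
      curl_easy_setopt(curl, CURLOPT_HTTPHEADER, headers);

      const CURLcode res = curl_easy_perform(curl);

      curl_slist_free_all(headers);
      curl_easy_cleanup(curl);
      return res == CURLE_OK;
    }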

State of the onion

First steps

We added a basic stub for HttpFile support (http_file.h, http_file.cc). You can follow the development on the http-upload branch, where you can also see the delta against the master branch from when we forked off: master...http-upload.

Request for comments

We would be happy to receive feedback, recommendations or objections about whether we are on the right track with this.

Cheers,
Andreas.

@kqyang
Contributor

kqyang commented Oct 15, 2018

@amotl Thanks for helping out on this issue!

I have to admit that I have never heard of HTTP PATCH before. The proposal looks good, apart from a concern about HTTP PATCH: it looks like PATCH was added to the HTTP protocol in 2010 (RFC 5789). Do you know what its support status is in common cloud providers (AWS, Google Cloud, etc.)?

@rkuroiwa

Maybe we should have two modes: one mode that uses HTTP PATCH so that there's no/minimal buffering, and a mode that doesn't do HTTP PATCH. The latter could buffer the data on File::Write and send one PUT/POST on File::Close.
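A rough sketch of that buffered mode, assuming a hypothetical HttpFile with a simple in-memory buffer (buffer_, url_ and UploadAll() are illustrative placeholders, not packager code):

    // Buffer everything File::Write delivers and send a single request on Close.
    int64_t HttpFile::Write(const void* buffer, uint64_t length) {
      buffer_.append(static_cast<const char*>(buffer), length);
      return length;
    }

    bool HttpFile::Close() {
      // One PUT (or POST) carrying the full representation of the file.
      const bool ok = UploadAll(url_, buffer_);
      delete this;  // File implementations delete themselves on Close().
      return ok;
    }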

@colleenkhenry
Author

colleenkhenry commented Oct 15, 2018 via email

@kqyang
Contributor

kqyang commented Oct 16, 2018

The latter could buffer the data on File::Write and send one PUT/POST on File::Close.

We are going to support CMAF low latency next quarter, which requires the data to be written ASAP, so we cannot buffer the data and write it once at the end.

@colleenkhenry is right: HTTP PUT with chunked transfer should work. Unfortunately, with libcurl, the data has to be transferred using a read callback. It seems that we need to run libcurl in a separate thread that consumes the data cached by File::Write; for that, we can use the existing IoCache class.

Here is sample code sketching how it can be done:

    bool HttpFile::Open() {
      // Run the actual libcurl transfer on a worker thread, since
      // curl_easy_perform() blocks until the upload finishes.
      base::WorkerPool::PostTask(
          FROM_HERE,
          base::Bind(&HttpFile::CurlPut, base::Unretained(this)),
          true /* task_is_slow */);
      return true;
    }

    // libcurl read callback: feeds the upload from the data that Write()
    // has placed into the IoCache.
    static size_t CurlReadCallback(void* dest, size_t size, size_t nmemb,
                                   void* userp) {
      IoCache* cache = static_cast<IoCache*>(userp);
      return cache->Read(dest, size * nmemb);
    }

    int64_t HttpFile::Write(const void* buffer, uint64_t length) {
      if (internal_file_error_.load(std::memory_order_relaxed))
        return internal_file_error_.load(std::memory_order_relaxed);

      const uint64_t bytes_written = cache_.Write(buffer, length);
      position_ += bytes_written;
      if (position_ > size_)
        size_ = position_;

      return bytes_written;
    }

    bool HttpFile::Close() {
      cache_.Close();
      delete this;
      return true;
    }

    void HttpFile::CurlPut() {
      // `curl` is the CURL* easy handle owned by this HttpFile.
      curl_easy_setopt(curl, CURLOPT_READFUNCTION, CurlReadCallback);
      curl_easy_setopt(curl, CURLOPT_READDATA, &cache_);
      curl_easy_setopt(curl, CURLOPT_UPLOAD, 1L);
      struct curl_slist* chunk = NULL;
      chunk = curl_slist_append(chunk, "Transfer-Encoding: chunked");
      curl_easy_setopt(curl, CURLOPT_HTTPHEADER, chunk);
      // .. set the URL and remaining options, then call curl_easy_perform().
    }

It should be more efficient than HTTP PATCH, as only one HTTP connection needs to be set up.

@amotl
Contributor

amotl commented Oct 16, 2018

Hi there,

thanks a bunch for all of your suggestions!

"HTTP PUT with chunked transfer" will probably be the most efficient way of uploading to HTTP, thanks @colleenkhenry for suggesting this and even more @kqyang for providing an implementation based on IoCache already - we would never have known about this. Nevertheless, we also like the idea by @rkuroiwa about having different transport modes.

As we were already halfway through the implementation using HTTP PATCH, we wanted to finish it before taking the next steps. We also added corresponding documentation, which already mentions the other HTTP transport flavors; see http_upload.rst.

We are looking forward to continuing work on this feature. Until then, we will be happy to receive your feedback. Thanks again for your support!

With kind regards,
Andreas.

@colleenkhenry
Author

Thank you so much for working on this @amotl. I think this will be a really awesome feature when combined with @kqyang's low-latency CMAF work.

@rkuroiwa

@amotl Thanks for doing this.
I think we want to get something working first, and then iterate on it. It's not like we're going to throw out the HTTP PATCH implementation :) (unless... I don't know, maybe the world starts thinking it's a bad practice). So the next step, after the HTTP PATCH implementation, might be to support chunked transfer mode.

Frankly, I don't know much about HTTP; I completely forgot about chunked transfer encoding, thanks for reminding me @colleenkhenry!

@amotl
Contributor

amotl commented Oct 19, 2018

Hi there,

we just added the HTTP PUT transfer mode suggested by @colleenkhenry and sketched out by @kqyang, structured so as to generally support multiple transfer modes as suggested by @rkuroiwa. The corresponding documentation at http_upload.rst was updated accordingly. Thanks for your help on this!

While the HTTP PATCH transfer mode already seems to work flawlessly, we are struggling with the HTTP PUT transfer mode. We apologize for our limited knowledge of C++ and humbly ask whether one of you could take a more detailed look at the code (http_file.h and http_file.cc). While it already issues requests towards the designated HTTP sink, its payloads are empty and, on top of that, it segfaults after a few seconds. Things like proper object lifecycle and/or concurrency handling are obviously not quite right yet.

@kqyang: The flow between HTTP PATCH and HTTP PUT is a bit different. While there will be a single HTTP connection in both cases, the former issues a full HTTP request per chunk: as you can see, each call to HttpFile::Write calls HttpFile::Request when using the PATCH_APPEND transfer mode. On the other hand, when using PUT_CHUNKED, it simply stuffs the buffer into the IoCache object there, as outlined by your sketch. This should be okay, as the libcurl handle has been configured by HttpFile::SetupRequestData to shuffle the data using read_callback. However, I assume that something in the interaction between the two goes haywire at runtime.
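In rough pseudocode, the dispatch described above is something like the following (a paraphrase of this description; the enum and member names are not necessarily the branch's exact code):

    int64_t HttpFile::Write(const void* buffer, uint64_t length) {
      if (transfer_mode_ == TransferMode::PATCH_APPEND) {
        // One full HTTP request per chunk, appended on the server side.
        return Request(buffer, length);
      }
      // PUT_CHUNKED: hand the data to the IoCache; the read_callback that
      // SetupRequestData() registered with libcurl drains it on the upload thread.
      return cache_.Write(buffer, length);
    }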

The code is still sitting in the http-upload branch of our fork. Please be merciful about its current shape: we promise to clean it up thoroughly if there's a chance to include it in the upstream code base. We will be happy to receive further guidance on doing that.

Thanks again for looking after this, we appreciate your efforts.

Cheers,
Andreas.

@rkuroiwa

rkuroiwa commented Oct 26, 2018

I figured out some bugs. I am going to try to explain them in words, though it will be a bit wordy.

A: Expect: 100-continue
libcurl automatically adds Expect: 100-continue to the HTTP headers. This (IIUC) means that it expects status 100 as an ACK for the chunked transfer, so if the server responds with 200, it closes the connection.
This can be disabled with
chunk = curl_slist_append(chunk, "Expect:");
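Put together with the earlier sample, the header setup for the chunked PUT then becomes something like the following (SetupChunkedUploadHeaders is a hypothetical helper, not packager code):

    // Header setup combining the two fixes from this thread: chunked upload
    // enabled, and libcurl's automatic "Expect: 100-continue" suppressed
    // (an empty value removes the header). Returns the list so the caller
    // can free it with curl_slist_free_all() after the transfer.
    struct curl_slist* SetupChunkedUploadHeaders(CURL* curl) {
      struct curl_slist* chunk = NULL;
      chunk = curl_slist_append(chunk, "Transfer-Encoding: chunked");
      chunk = curl_slist_append(chunk, "Expect:");
      curl_easy_setopt(curl, CURLOPT_HTTPHEADER, chunk);
      return chunk;
    }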

B: HttpFile::Close() deletes the instance immediately.
When curl_easy_perform() is called, libcurl uses the current thread to call the "read callback". This means curl_easy_perform() blocks until all data has been "read". This is why @kqyang recommended putting it on a worker pool (i.e. on a separate thread).
I wrote a small program (actually a unit test) that does

Open();
Write();
Close();

The problem with this is that

  1. Open() returns immediately since all it does is post a task.
  2. Write() returns immediately as well since it writes the data to a buffer (IoCache).
  3. Close() returns immediately as well since it only flags IoCache as closed, and deletes the instance (delete this).

The problem with the last step is that it deletes the instance. That makes the entire instance, including the curl context and the IoCache instance, invalid while it is still trying to transfer data.

A way to solve this problem is to use a base::WaitableEvent like ThreadedIoFile's task_exit_event_. For example, after curl_easy_perform() completes, right before returning from HttpFile::Request(), signal the event.
And in HttpFile::Close(), wait for the event.
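A minimal sketch of that fix (task_exit_event_ mirrors ThreadedIoFile; the exact structure here is illustrative, not the branch's code):

    // Upload thread: perform the blocking transfer, then signal completion.
    void HttpFile::Request() {
      // ... libcurl setup as in the earlier sample; `curl` is the handle
      // owned by this HttpFile ...
      curl_easy_perform(curl);
      task_exit_event_.Signal();  // tell Close() the upload thread is done
    }

    bool HttpFile::Close() {
      cache_.Close();            // read callback sees end-of-data and the upload drains
      task_exit_event_.Wait();   // don't free state the transfer is still using
      delete this;
      return true;
    }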

Now I hacked it to the point where curl prints
* Signaling end of chunked upload via terminating chunk.
But I'm not sure if this hack has broken the PATCH method. Since one is "synchronous" and the other is "asynchronous" it might be easier to manage if they were separated.

@amotl
Contributor

amotl commented Oct 26, 2018

Dear Rintaro,

thank you so much for taking the time to investigate the nitty-gritty details of C++ object lifecycles and multithreading. I also think we should probably implement the different HTTP transfer modes separately from each other, due to their differing "synchronous" vs. "asynchronous" nature. It was just an attempt to make them work together in the same code; if we should split them apart, then let's try to do it.

Now I hacked it to the point where curl prints
* Signaling end of chunked upload via terminating chunk.

This sounds promising!

But I'm not sure if this hack has broken the PATCH method.

Never mind ;]! It will find its way back if still required; maybe we should just settle on the HTTP PUT with chunked transfer mode alone if everything turns out well.

Is there a way to pick up your current work? I will be back at the keyboard next week and will be happy to continue the work, filling in more gaps if required ;].

Together with @JuliusThms, we are planning to implement a small Lua backend for Nginx which should receive the chunks and write the contents to the filesystem properly.

Thanks again for your efforts!

Cheers,
Andreas.

@colleenkhenry
Author

colleenkhenry commented Oct 26, 2018 via email

@amotl
Contributor

amotl commented Oct 26, 2018

Thanks Colleen, we will definitely have a look!

@rkuroiwa

I uploaded it to the http-fix-attempt branch.

My text editor automatically formats code, so the first commit is formatting and adding the unit test file; nothing to do with the fix.

Please look at the second commit.
rkuroiwa@5bab0bd
There is some noise, but the change itself isn't very big.

@amotl
Contributor

amotl commented Jul 13, 2019

Dear Colleen,

@JuliusThms is still having performance/latency issues with this in their specific production environment. Nevertheless, we should continue to rebase this onto master and finally conclude the merge into mainline, because the baseline functionality is already fully working and might help others push streams around more efficiently.

The most likely reason things stalled here is that nobody asked, and we have probably been lacking the right opportunity to finish work on this piece.

With kind regards,
Andreas.

@colleenkhenry
Author

colleenkhenry commented Jul 13, 2019 via email

@kqyang
Contributor

kqyang commented Jul 15, 2019

@amotl I have created another issue #620 to track the HLS live packaging performance improvement.

It would be appreciated if you could submit a pull request to merge the changes you have made back to master. It would also make it easier for other people to try out and evaluate the feature.

@amotl
Contributor

amotl commented Jul 16, 2019

Dear @kqyang, @colleenkhenry and @fwoelffel,

I hear you and recognize your interest in bringing this feature into mainline. While the majority of the implementation is already finished, thanks in part to support from all of you, I would need someone to support this work (getting back into the topic, refreshing the sandbox, rebasing onto master, maybe adding some bells and whistles), so I will talk to @JuliusThms in this regard. Thanks also for diverting the performance-related discussion into #620, @kqyang.

With kind regards,
Andreas.

@kqyang
Contributor

kqyang commented Jan 30, 2020

@amotl Are you still interested in sending a pull request on your work on HTTP file support?

@amotl
Contributor

amotl commented Feb 10, 2020

Dear Kongqun,

thanks for following up on this task. I will be happy to get back to work on this and will ask @JuliusThms about support for that.

With kind regards,
Andreas.

@termoose
Contributor

It would be great to have @amotl's contributions merged upstream. I added the possibility to set a custom User-Agent string for HTTP PUT requests: termoose@439c30e

As for HTTPS, it seems the code in your branch (the part that's commented out) is mostly finished, just missing command-line arguments for specifying certificate paths? I will run some tests on this and update my branch if I find something that works.

@amotl
Contributor

amotl commented Mar 19, 2020

Dear Ole Andre,

thanks for picking this up!

With kind regards,
Andreas.

@kqyang
Contributor

kqyang commented Mar 19, 2020

@amotl @JuliusThms What is your plan on merging it upstream?

@termoose
Contributor

I added support for HTTPS; it has been successfully tested with Akamai MSL4 entry points on Widevine-encrypted segments. Akamai does not require a specific client certificate, so I don't have any entry points available to test that feature against. The code is the one that @amotl already added from playready_key_encryption_flags.cc, with pretty much no modifications.

I opted out of fetching FLAGS_ values for each HTTP PUT, assuming each fetch is a map lookup that can be avoided per request. Command-line arguments are instead initialised once in the constructor.

I submitted the CLA mentioned in the contribution guidelines and updated the AUTHORS and CONTRIBUTORS files accordingly; I guess @amotl should do the same before a merge.
My branch is a fork of https://github.com/3QSDN/shaka-packager, so it should be merged back in there, CLA details added, and then merged into the Google repository. I can also cherry-pick any changes you have, @amotl, and submit a merge request for my branch directly if you prefer!

https://github.com/termoose/shaka-packager/tree/http-upload

@amotl
Contributor

amotl commented Mar 20, 2020

Dear Ole Andre,

I added support for HTTPS, it has been successfully tested [...]

Thank you so much!

I submitted the CLA mentioned in the contribution guidelines and updated the AUTHORS and CONTRIBUTORS accordingly, I guess @amotl should do the same before a merge.

Done with 8ddbbf0. Thanks.

I can also submit a merge request for my branch directly if you prefer!

Please go ahead! Thanks again for your efforts adding HTTPS support and thanks in advance for bringing this upstream!

With kind regards,
Andreas.

@termoose
Contributor

I cherry-picked it in, great!

There is another thing I'd like to add before publishing the PR: the ability to define multiple outputs (primary/backup), with either a comma-separated list or a new command-line flag such as --mpd_output_backup. I will try to implement this and run some tests.

@kqyang
Contributor

kqyang commented Mar 24, 2020

@termoose Thanks for working on it! I'd prefer to keep the PR focused on a single feature; it will also make discussion and code review easier :). For multiple outputs (primary/backup), can you file a separate issue and a separate PR instead?

@termoose
Contributor

For sure, I will make that a separate PR! As for the Google CLA, I'm still waiting on our legal department; it should be resolved shortly.

@kqyang kqyang modified the milestones: v2.5, v2.6 Aug 20, 2020
TheModMaker pushed a commit that referenced this issue Feb 2, 2021
Issue #149

Co-authored-by: Andreas Motl <andreas.motl@elmyra.de>
Co-authored-by: Rintaro Kuroiwa <rkuroiwa@google.com>
Co-authored-by: Ole Andre Birkedal <o.birkedal@sportradar.com>
@kqyang kqyang modified the milestones: v2.6, v2.5 Feb 3, 2021
@amotl
Contributor

amotl commented Feb 28, 2021

Hi again,

the patch from #737 has been integrated by @TheModMaker and, according to the flags set by @kqyang, the feature might be included in the upcoming 2.5 release if we are lucky. Thank you to everyone who collaborated on this!

With kind regards,
Andreas.

@kqyang
Contributor

kqyang commented Feb 28, 2021

There is one more clean-up CL from @TheModMaker to be pushed soon; then we can claim that it is done.

Thanks everyone for helping land the feature.

And yes, it will be included in the v2.5 release, which will ship at the end of March.

@termoose
Contributor

termoose commented Mar 1, 2021

Thank you for taking this over, @kqyang and @TheModMaker; really looking forward to the v2.5 release! 🎈

@shaka-bot shaka-bot added the status: archived Archived and locked; will not be updated label May 1, 2021
@shaka-project shaka-project locked and limited conversation to collaborators May 1, 2021