Support for transcoding by remote workers #947

Closed
ghost opened this issue Aug 22, 2018 · 54 comments
Labels
Component: Transcoding Priority: Roadmap Feature planned to be developed in the annual roadmap Type: Feature Request ✨

Comments

@ghost

ghost commented Aug 22, 2018

I'm trying to run a peertube instance with a large volume of 1080p60 content and the transcode bottleneck is significant. One potentially easy way to improve this when additional CPU power is available on the local network could be to scale out the transcode with a worker daemon.

I'm opening this in the hope of a discussion for a future enhancement, and to keep track of that work.

In my case I'm imagining the workers being located in the same data center, but it seems like this could be extended to geographically diverse or even crowdsourced transcode sites.

Personally I'd be happy with a solution where the remote workers run a userspace daemon that manages ffmpeg with little to no local storage, accessing the peertube server's disks either over an existing network storage protocol or existing media streaming protocol.

I'm curious what takes other folks have. I know this is unlikely to be interesting for people running peertube on a small VPS, unless the transcode workers can be elsewhere on cheaper hosting (like at your house, on DSL).

@rigelk
Collaborator

rigelk commented Aug 23, 2018

A simpler way to implement is for clients to transcode first and then let the server decide if the video is properly transcoded.

But adding workers is another level of implementation. It's also something we need to define, because we have multiple possibilities: parallel transcoding of chunks of the same video, or transcoding of different videos. It also means the workers have to register with the PeerTube instance, and the instance has to monitor their progress on the transcoding tasks it gives them. Arguably, that's way more work than just making the client and server agree on whether or not the video needs additional transcoding.
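To make the registration and monitoring requirement concrete, here is a minimal in-memory sketch of the "one job per video and resolution" model. Every name in it (`WorkerRegistry`, `TranscodeJob`, `claim`, `report`) is hypothetical and not taken from PeerTube; a real implementation would also need persistence, authentication, and timeouts for workers that disappear.

```typescript
// Hypothetical types: none of these names come from PeerTube itself.
type JobState = 'pending' | 'running' | 'done'

interface TranscodeJob {
  id: number
  videoId: string
  resolution: number
  state: JobState
  workerId?: string
  progress: number // 0..100, as last reported by the worker
}

class WorkerRegistry {
  private workers: string[] = []
  private jobs: TranscodeJob[] = []
  private nextId = 1

  // A worker must register before it can be handed any job.
  register (workerId: string): void {
    if (this.workers.indexOf(workerId) === -1) this.workers.push(workerId)
  }

  // One job per (video, resolution): the "different videos" model,
  // not chunk-level parallelism within a single video.
  enqueue (videoId: string, resolution: number): TranscodeJob {
    const job: TranscodeJob = { id: this.nextId++, videoId, resolution, state: 'pending', progress: 0 }
    this.jobs.push(job)
    return job
  }

  // Hand the oldest pending job to a registered worker, or undefined if none is pending.
  claim (workerId: string): TranscodeJob | undefined {
    if (this.workers.indexOf(workerId) === -1) throw new Error('unknown worker ' + workerId)
    const pending = this.jobs.filter(j => j.state === 'pending')
    if (pending.length === 0) return undefined
    const job = pending[0]
    job.state = 'running'
    job.workerId = workerId
    return job
  }

  // Record progress; only the worker that owns the job may report on it.
  report (workerId: string, jobId: number, progress: number): void {
    const owned = this.jobs.filter(j => j.id === jobId && j.workerId === workerId)
    if (owned.length === 0) throw new Error('job not owned by this worker')
    const job = owned[0]
    job.progress = Math.min(100, Math.max(0, progress))
    if (job.progress === 100) job.state = 'done'
  }
}
```

Chunk-level parallelism would need an extra split and concatenate step on top of this, which is part of why the two models have to be chosen between explicitly.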

@ghost
Author

ghost commented Aug 23, 2018

I'm interested in client-side transcodes too, but I thought that seemed much harder?

  • it means running ffmpeg or similar, efficiently, in the browser or requiring all uploads to be done via a native app

  • verifying that a transcode was correct seems non-trivial, if we want to do more than just check that the format and header look okay

@rigelk
Collaborator

rigelk commented Aug 23, 2018

Well, they are two different kinds of "hard" 😛

I'm not sure we want to check everything either; checking for faststart and proper codec use via ffprobe should be enough.
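A rough sketch of what such a check could look like follows. The codec check assumes the server has already run `ffprobe -print_format json -show_streams` and parsed the result (`codec_type` and `codec_name` are fields of that JSON output). The faststart check does not go through ffprobe here: as an assumption about how one might do it, it walks the top-level MP4 boxes and tests whether `moov` comes before `mdat`. The allow-list and function names are hypothetical.

```typescript
// Subset of the parsed `ffprobe -print_format json -show_streams` output used here.
interface ProbeStream { codec_type: string, codec_name: string }
interface ProbeResult { streams: ProbeStream[] }

// Hypothetical allow-list; an instance would choose its own accepted codecs.
const ALLOWED_CODECS: Record<string, string[]> = { video: ['h264'], audio: ['aac'] }

// True if there is at least one video stream and every audio/video stream uses an allowed codec.
function hasAllowedCodecs (probe: ProbeResult): boolean {
  const videoStreams = probe.streams.filter(s => s.codec_type === 'video')
  if (videoStreams.length === 0) return false
  return probe.streams.every(s => {
    const allowed = ALLOWED_CODECS[s.codec_type]
    // Stream types without an allow-list entry (subtitles, data) are ignored.
    return allowed === undefined || allowed.indexOf(s.codec_name) !== -1
  })
}

// Walk the top-level MP4 boxes and report whether `moov` appears before `mdat`.
function isFastStart (bytes: Uint8Array): boolean {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength)
  let offset = 0
  while (offset + 8 <= bytes.byteLength) {
    let size = view.getUint32(offset) // box size, big-endian
    const type = String.fromCharCode(
      view.getUint8(offset + 4), view.getUint8(offset + 5),
      view.getUint8(offset + 6), view.getUint8(offset + 7)
    )
    if (type === 'moov') return true
    if (type === 'mdat') return false
    if (size === 1) {
      // A size of 1 means a 64-bit "largesize" follows the type field.
      if (offset + 16 > bytes.byteLength) break
      size = view.getUint32(offset + 8) * 4294967296 + view.getUint32(offset + 12)
    }
    if (size < 8) break // size 0 ("box extends to end of file") or malformed input
    offset += size
  }
  return false
}
```

Only the first few kilobytes of an upload are usually needed for the box walk, so it can run before the whole file is stored.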

@utack

utack commented Jan 9, 2019

Maybe this helps you
https://github.com/tdaede/dve
Might however not be a method that is usable, as it includes usage of ssh

@Nutomic
Contributor

Nutomic commented Jan 9, 2019

@utack It can only generate mkv containers, which isn't very helpful for Peertube. Plus it hasn't been maintained in years.

@njourdane
Contributor

I totally support this issue.

There is a real use case where a federation of video producers wants to create many PeerTube instances but share a single transcoding server, in order to reduce instance hosting costs.

@Nutomic
Contributor

Nutomic commented Nov 28, 2019

This should actually be pretty easy to implement with ssh. I'm sure there is some ssh library that could be used. Basically, Peertube would do this:

scp video-to-transcode.mp4 transcoding-host:~
ssh transcoding-host ffmpeg ...
scp transcoding-host:transcoded-video.mp4 .
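As a sketch of how those three steps could be assembled programmatically, the function below builds the argument lists for each command without running anything. The names `buildRemoteTranscode` and `shellQuote` are hypothetical. It assumes simple file names, a POSIX shell on the remote host, and that input and output have different base names; the ffmpeg arguments are quoted because ssh passes the remote command through the remote shell.

```typescript
interface RemoteStep { cmd: string, args: string[] }

// Last path component, e.g. "/data/in.mp4" -> "in.mp4".
function baseName (path: string): string {
  const parts = path.split('/')
  return parts[parts.length - 1]
}

// Quote one argument for a POSIX shell by wrapping it in single quotes.
function shellQuote (arg: string): string {
  return "'" + arg.replace(/'/g, "'\\''") + "'"
}

// Build the upload, remote ffmpeg, and download steps as argv arrays.
// Files are staged in the remote user's home directory.
function buildRemoteTranscode (host: string, input: string, output: string, ffmpegArgs: string[]): RemoteStep[] {
  const remoteIn = baseName(input)
  const remoteOut = baseName(output)
  const remoteCmd = ['ffmpeg', '-i', remoteIn].concat(ffmpegArgs, [remoteOut]).map(shellQuote).join(' ')
  return [
    { cmd: 'scp', args: [input, host + ':' + remoteIn] },
    { cmd: 'ssh', args: [host, remoteCmd] },
    { cmd: 'scp', args: [host + ':' + remoteOut, output] }
  ]
}
```

Returning argument arrays rather than one command string means the local side can spawn each step without going through a local shell at all.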

@rigelk
Collaborator

rigelk commented Nov 28, 2019

scp video-to-transcode.mp4 transcoding-host:~
ssh transcoding-host ffmpeg ...
scp transcoding-host:transcoded-video.mp4 .

That would require translating all the logic we have in https://github.com/Chocobozzz/PeerTube/blob/develop/server/helpers/ffmpeg-utils.ts to bash, so not so easy. An intermediate step, easier than translating to bash, before your solution becomes possible would be to decouple that logic into a standalone module (keeping it in TypeScript), so that we could install it on the remote worker and just send it the job parameters. Plex does this with https://github.com/wnielson/Plex-Remote-Transcoder, for instance, because it has a reusable binary for the transcoding part.

@Nutomic
Contributor

Nutomic commented Nov 28, 2019

Doesn't the library execute an ffmpeg command via bash anyway? It should be possible to get that command out, maybe with a patch.

The other option is to install nodejs on the worker, and have it execute the same library.

@olragon

olragon commented Dec 24, 2019

Doesn't the library execute an ffmpeg command via bash anyway? It should be possible to get that command out, maybe with a patch.

The other option is to install nodejs on the worker, and have it execute the same library.

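const ffmpeg = require('fluent-ffmpeg');

// The 'start' event receives the full ffmpeg command line that was spawned.
ffmpeg('/path/to/file.avi')
  .on('start', function(commandLine) {
    console.log('Spawned Ffmpeg with command: ' + commandLine);
  });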

https://github.com/fluent-ffmpeg/node-fluent-ffmpeg#start-ffmpeg-process-started

@JohnXLivingston
Contributor

In this forum post https://framacolibri.org/t/adding-a-new-resolution-to-an-existing-video-playlist/6247 , plhardy made this script with remote transcoding: https://framagit.org/artlog/piretubehack
It is still a work in progress.

@kontrollanten
Contributor

Could a first step be to just do the transcoding in the browser? https://github.com/ffmpegwasm/ffmpeg.wasm seems to be a pretty easy way to solve that. It's not a perfect user experience, but at least it's a way to avoid heavy load on the server side.

@ghost
Author

ghost commented Dec 2, 2020 via email

@johanfleury
Contributor

johanfleury commented Dec 26, 2020

Hey, just adding my use case to the discussion: I run PeerTube in Kubernetes with a limited amount of resources and, at the moment, ffmpeg always ends up OOM-killed as it tries to grab as much memory as it thinks is available (regardless of its cgroup’s limit).

Having remote workers would allow me to spawn pods with much higher memory and CPU limits, or even use dedicated VMs to do the transcoding.

@zblace

zblace commented Jan 10, 2021

Would it be possible, in theory, to plan for a kind of 'proxy' transcoding via a partner server? For example, if I first upload my (public-domain) video in high resolution to Wikimedia Commons or Archive.org, could it then be shared to my (too weak to transcode) PeerTube server?

@johanfleury
Contributor

@rigelk, I was looking at #3383 and while I think this is a good first step (thanks for your work), I was wondering if this couldn’t be implemented as plugins?

I was thinking of something similar to what Nextcloud does with server side encryption: part of it is implemented in its core, part is delegated to plugins and NC ships with a default one. In PeerTube, the default plugin could do transcoding in a subprocess, the same way it’s currently implemented.

One of the reasons I think plugins would be a better solution is that they would be easier to share and to maintain, as PeerTube already provides everything needed to discover, install and upgrade plugins. Also, in environments like Docker or Kubernetes, I feel like it’ll be easier to install a plugin than to extend the container image or mount a script in the container instance.

A few issues occurred to me while writing this comment:

  • What if there are multiple transcoding plugins enabled at the same time?
  • Should PeerTube ship with that default plugin, or should we let the administrator install it?

@rigelk
Collaborator

rigelk commented Jan 16, 2021

@johanfleury thanks for noticing 🙂

Also, in environments like Docker or Kubernetes, I feel like it’ll be easier to install a plugin rather than to extend the container image or to mount a script in the container instance.

Sadly true.

What if there are multiple transcoding plugins enabled at the same time?

Very interesting question! We don't have locks on plugin hooks, but that would deserve its own issue. Could you open it, citing your example?

I was thinking of something similar to what Nextcloud does with server side encryption: part of it is implemented in its core, part is delegated to plugins and NC ships with a default one. In PeerTube, the default plugin could do transcoding in a subprocess, the same way it’s currently implemented.

I don't think moving the core transcoding to a plugin makes as much sense as you think. Developing within/for PeerTube is easier in great part thanks to proper type inference, which is lost anyway once you execute an ffmpeg command. My work only allows a plugin to replace the ffmpeg executable, effectively giving plugins their own context in the language that they want. They can, but are not forced to, rely on NodeJS; at that stage, parsing ffmpeg options is more interesting than having type definitions from PeerTube core 🙂

@pierreozoux

pierreozoux commented Apr 2, 2021

Big instances would need this feature on the server side, that's for sure.
(Using browsers to do so is interesting on paper, but... let's be realistic and pragmatic here :) )
(Asking the end user to do the right thing is a workaround, but if we want nice software that regular people want to use, we have to help them by re-encoding.)

I think that #3661 (ObjectStore Support) would help a lot in that regard.
Then, you don't have to pass around the video, but can reference it.

And then in terms of implementation :) I have 3 ideas, and each has its pros and cons I guess.

1. Delayed jobs

In rails there is this concept. Usually, when you have a web app, you need 2 things:

  • serve http requests
  • do some processing in the backend (like sending an email, making a backup..)

In Rails, there is this concept of delayed jobs, and it is usually implemented with Sidekiq. If you know Discourse, that is what they have: an app server and Sidekiq.

The nice thing is that you can scale both independently.

As a dev, you just create a job, and then this process kicks in when available. And it has all the logic of retrying failed ones and so on.

Here are some potential libraries:
https://github.com/topics/job-queue?l=javascript

And this doesn't need to only be for transcoding, it can be for email, or.. Anything that would make the http request slower and can be done async.
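To illustrate the delayed-job idea with retries, here is a deliberately tiny, synchronous, in-memory sketch. `RetryQueue` and its methods are hypothetical names, not the API of Sidekiq or of any JavaScript job-queue library; real queues persist jobs in a store such as Redis and run handlers asynchronously in separate worker processes, which is what lets the web server and the workers scale independently.

```typescript
interface QueuedJob<T> { type: string, payload: T, attemptsLeft: number }
type Handler<T> = (payload: T) => void

class RetryQueue<T> {
  private pending: QueuedJob<T>[] = []
  readonly failed: QueuedJob<T>[] = []

  // The web process only enqueues; it never runs the slow work itself.
  add (type: string, payload: T, attempts = 3): void {
    this.pending.push({ type, payload, attemptsLeft: attempts })
  }

  // Worker side: process jobs until the queue is empty.
  // A throwing handler sends the job to the back of the queue until its attempts run out.
  drain (handlers: Record<string, Handler<T>>): number {
    let completed = 0
    while (this.pending.length > 0) {
      const job = this.pending.shift() as QueuedJob<T>
      try {
        const handler = handlers[job.type]
        if (handler === undefined) throw new Error('no handler for ' + job.type)
        handler(job.payload)
        completed++
      } catch (err) {
        job.attemptsLeft--
        if (job.attemptsLeft > 0) this.pending.push(job)
        else this.failed.push(job)
      }
    }
    return completed
  }
}
```

The same queue can carry transcoding, email, or any other slow task, keyed by the job `type`.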

2. kubernetes native

Kubernetes is becoming the cloud API. And Kubernetes has a concept of jobs as well.
So it would be possible to create native kubernetes jobs from the peertube app, and monitor them and so on.
The problem with that approach is that not everybody has kubernetes :)

3. Event driven

I'm quite fascinated by the "serverless" movement (or Amazon Lambda). I think it has 3 components:

  • function as a service
  • event driven
  • same code complexity whether you serve no users, 1 user, or 1 million users (with "cloud" costs that scale accordingly)

If you use S3 or the Ceph RADOS Gateway, you can get notifications.

This is the classic "serverless" use case. User uploads a video to the objectStore (in the incoming folder for instance), once upload is done, an event is fired, and then a worker works on this event, and moves the video from the incoming folder to the public folder for instance.
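The flow just described can be sketched as a pure function that turns an "object created" notification into a transcode task. The event shape, the `incoming/` and `public/` prefixes, and the name `onObjectCreated` are all hypothetical simplifications; real S3 or Ceph notifications carry more fields and would need to be mapped into this shape first.

```typescript
// Hypothetical, simplified notification: which bucket and key were just written.
interface ObjectCreatedEvent { bucket: string, key: string }
interface TranscodeTask { bucket: string, sourceKey: string, targetKey: string }

const INCOMING_PREFIX = 'incoming/'
const PUBLIC_PREFIX = 'public/'

// Decide what a worker should do for one notification, or undefined to ignore it.
function onObjectCreated (event: ObjectCreatedEvent): TranscodeTask | undefined {
  // Ignore anything outside incoming/, so the worker never reacts to its own output in public/.
  if (event.key.indexOf(INCOMING_PREFIX) !== 0) return undefined
  const name = event.key.slice(INCOMING_PREFIX.length)
  if (name === '') return undefined // the folder marker itself
  return { bucket: event.bucket, sourceKey: event.key, targetKey: PUBLIC_PREFIX + name }
}
```

Filtering on the prefix matters: without it, writing the transcoded file back to the same bucket would fire another event and loop.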

I find it a really elegant solution, as these 2 functions, "upload" and "transcode", are really separate, well identified, and have a clear "API" for talking to each other.

But again, not everybody uses an object store (and not all object stores have notifications), so you would have to maintain 2 code paths, one for small servers and one for bigger ones (sometimes it is easier to build one product on one infrastructure than to build a very configurable solution).

Given all of this, I think the first option is the way to go.

Hope it helps!

@mikekasprzak

Just adding my support to this, though admittedly we haven't started using PeerTube yet, and my use case might be more for live streaming.

Context: I run a community/event for game makers. Lately Twitch has been doing some really stupid stuff, banning some of our users and events like ours for laughable reasons. We do have the option of recommending that folks "switch-to-YouTube", but I'd like to provide a 3rd option.

In our case, we have fixed times throughout the year when we're busy: when we run events. If I can spin-up some helper encoding servers for a few weeks, then shut them down, that would go a long way towards helping us scale.

@Benau

Benau commented May 25, 2021

#include <iostream>
#include <cstdio>
#include <cstdlib>
#include <string>
#include <thread>
#include <sstream>
#include <stdarg.h>
#include <syslog.h>

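// Remote host settings; edit these for your own setup.
std::string addr = "ffmpeg@x.x.x.x";    // user@host of the transcoding machine
std::string scp_port = "12345";         // SSH port, shared by ssh and scp
std::string work_dir = "/home/ffmpeg";  // remote directory where files are staged
std::string ssh_cmd = std::string("ssh -p ") + scp_port + " " + addr;
std::string scp_path = addr + ":" + work_dir + "/";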

std::string g_real_output;

std::string getBasename(const std::string& filename)
{
    for (int i = int(filename.size()) - 1; i >= 0; --i)
    {
        if (filename[i] == '/' || filename[i] == '\\')
            return filename.substr(i + 1);
    }
    return filename;
}

std::string handleOutput(const std::string& out)
{
    if (!g_real_output.empty())
        return out;
    if (out.size() > 3 && (out[0] == '/' || out.substr(0, 2) == "./" || out.substr(0, 3) == "../") &&
        out.find('.') != std::string::npos)
    {
        g_real_output = out;
        syslog(LOG_INFO, "out %s", g_real_output.c_str());
        return work_dir + "/" + getBasename(out);
    }
    return out;
}

int main(int argc, char* argv[])
{
    std::string input_file;
    int ret = -1;

    std::string cmd;
    FILE* output = NULL;
    std::stringstream stm;
    std::string output_file;

    for (int i = 1; i < argc; i++)
    {
        std::string arg = argv[i];
        size_t dot = arg.find_last_of(".");
        if (dot != std::string::npos && dot != arg.size() - 1)
        {
            if (arg.substr(dot + 1) == "png" || arg.substr(dot + 1) == "jpg")
            {
                syslog(LOG_INFO, "system_ffmpeg %s", argv[i]);
                goto system_ffmpeg;
            }
        }
    }

    for (int i = 1; i < argc; i++)
    {
        FILE* file = NULL;
        std::string argv_add;
        if ((file = fopen(argv[i], "r")) != NULL)
        {
            fclose(file);
            input_file = argv[i];
            syslog(LOG_INFO, "input %s", input_file.c_str());
            system((std::string("scp -P ") + scp_port + " " + input_file + " " + scp_path).c_str());
            input_file = getBasename(input_file);
            argv_add = work_dir + "/" + input_file;
        }
        else
            argv_add = handleOutput(argv[i]);
        if (argv_add.find('(') != std::string::npos)
        {
            argv_add.insert(0, "\"");
            argv_add += '"';
        }
        cmd += argv_add;
        if (i != argc -1)
            cmd += " ";
    }

    syslog(LOG_INFO, "cmdline %s", cmd.c_str());

    ret = system((ssh_cmd + " \'ffmpeg " + cmd + "\'").c_str());
    if (input_file.empty())
        return ret;

    output = popen((ssh_cmd + " " + std::string("ls ") + work_dir).c_str(), "r");
    if (output)
    {
        constexpr std::size_t MAX_LINE_SZ = 1024 ;
        char line[MAX_LINE_SZ] ;
        while(fgets(line, MAX_LINE_SZ, output)) stm << line;
        pclose(output);
    }

    while (std::getline(stm, output_file, '\n'))
    {
        if (output_file.empty() || output_file == input_file)
            continue;
        for (int i = 1; i < argc; i++)
        {
            if (output_file == argv[i] || output_file == getBasename(g_real_output))
            {
                std::string real_output = g_real_output.empty() ? std::string(" .") : std::string(" ") + g_real_output;
                ret = system((std::string("scp -P ") + scp_port + " " + addr + ":" + work_dir + "/" + output_file +
                    real_output).c_str());
                break;
            }
        }
    }
    return ret;

system_ffmpeg:
    cmd = "ffmpeg ";
    for (int i = 1; i < argc; i++)
    {
        std::string argv_add = argv[i];
        if (argv_add.find('(') != std::string::npos)
        {
            argv_add.insert(0, "\"");
            argv_add += '"';
        }
        cmd += argv_add;
        if (i != argc -1)
            cmd += " ";
    }
    return system(cmd.c_str());
}

It only calls the remote ffmpeg (scp, then encode) for video files; png and jpg outputs (thumbnails) use the local ffmpeg instead.

You need to edit PeerTube so it can call an ffmpeg binary with a custom name (or swap remote_ffmpeg and ffmpeg, if PeerTube is the only user of ffmpeg on your server).

I use this on my server with transcoding disabled, and I manually run the transcoding job when I see new uploads.

@Tr4sK

Tr4sK commented Jun 22, 2022

@JacksonChen666 good for you ?

@JacksonChen666
Contributor

@Tr4sK LGTM IG

@Agorise

Agorise commented Jun 29, 2022

I run a site with 180 TB of videos. We have got to get remote transcoding soon if possible. This GitHub thread seems to be a dead end, and @Chocobozzz has removed this feature from the to-do list in Roadmap 2022. Did the feature get scrapped? Please advise as to where I can follow along with remote transcoding. Please and thanx in advance :)

@Nalem14

Nalem14 commented Sep 14, 2022

Hi, I saw this in the PeerTube docs:
image
https://docs.joinpeertube.org/contribute-architecture?id=the-cachejob-queue

But I also read this in the Bull repo:
image
OptimalBits/bull#1213

Am I mistaken?

@Kinuseka

Kinuseka commented Dec 9, 2022

https://github.com/joshuaboniface/rffmpeg

This is an interesting project. Maybe peertube can utilize it?

@JacksonChen666
Contributor

rffmpeg is a remote FFmpeg wrapper used to execute FFmpeg commands on a remote server via SSH.

(emphasis added by me)

maybe it could work, but it also seems limited and made for a specific use case (transcoding media server things) after skimming the README.

@WingsLikeEagles
Contributor

This is really important to implement for anything bigger than simple home use. I want to recommend this to several organizations, but the lack of this feature kills the idea. Without being able to offload transcoding, it is just not workable for large organizations.

@JacksonChen666
Contributor

I want to recommend this to several organizations, but the lack of [remote transcoding] kills the idea. Without being able to offload transcoding, it is just not workable for large organizations.

why is it that it's just not possible because of the lack of remote transcoding? do the large organizations have a very special need for their videos to transcode faster instead of longer? would they be OK with transcoding taking longer or scaling up the CPU?

@boomaker

I want to recommend this to several organizations, but the lack of [remote transcoding] kills the idea. Without being able to offload transcoding, it is just not workable for large organizations.

why is it that it's just not possible because of the lack of remote transcoding? do the large organizations have a very special need for their videos to transcode faster instead of longer? would they be OK with transcoding taking longer or scaling up the CPU?

It's not only that. It's actually impossible to scale PeerTube horizontally because of the whole transcoding process.
I.e. if deployed in Kubernetes, the transcoding and all the temporary files live in one of the containers: data could be lost, and the Redis job batching does not track which instance the files are located on.

@Kinuseka

This should be the top priority on the next 6.0.0 update.

  • Even if transcoding runs on lower-priority threads, the web server and API use CPU on higher-priority threads. Federation of videos also takes up CPU time, and when you are following a huge instance, this often hinders the transcoding of videos.

  • Remote transcoding makes the server instance highly scalable: you can take advantage of multiple machines at once rather than being at the mercy of a single machine.

  • Horizontal scaling is generally preferred for a smoother user experience. It is also much better maintenance-wise.

@Chocobozzz
Owner

Chocobozzz commented Jan 11, 2023

We received funding for this feature, so we'll implement it in a few months. We'll officially announce it and detail the use cases soon.

@Chocobozzz
Owner

Implemented in #5769

PeerTube runner doc on https://docs.joinpeertube.org/maintain/tools#peertube-runner
The admin documentation for the runner registration process is still missing

@johanfleury
Contributor

This looks awesome, thank you @Chocobozzz for all the hard work you do!

@Chocobozzz
Owner

Video demo available on https://peertube2.cpy.re/w/oJwHHYwt4oKjKhLNh2diAY

@johanfleury
Contributor

Hey @Chocobozzz I’m working on an implementation of the runner for Kubernetes, and I just have a quick feedback/question. Do you think the POST /api/v1/runners/jobs/request endpoint could be made into a streaming endpoint?

I was thinking of something like Mastodon’s timelines API.

It would prevent the runner server from having to poll the API every now and then.

@Chocobozzz
Owner

Hi, there is an undocumented socket.io endpoint available on instance.tld/runners

You can see the implementation on the server: https://github.com/Chocobozzz/PeerTube/blob/develop/server/lib/peertube-socket.ts#L60
And the client implementation: https://github.com/Chocobozzz/PeerTube/blob/develop/packages/peertube-runner/server/server.ts#L117
