Support for transcoding by remote workers #947
A simpler way to implement this is for clients to transcode first and then let the server decide if the video is properly transcoded. But adding workers is another level of implementation. It's also something we need to define, because we have multiple possibilities: parallel transcoding of chunks of the same video, or transcoding of different videos. It also means the workers have to register at the PeerTube instance, and the instance has to monitor their progress with the transcoding tasks it gives them. Arguably, that's way more work than just making the client and server agree on whether or not the video needs additional transcoding. |
I'm interested in client-side transcodes too, but I thought that seemed much harder?
|
Well, they are two different kinds of "hard" 😛 I'm not sure we want to check everything either; checking for faststart and proper codec use via ffprobe should be enough. |
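As a sketch of that check, a server could run `ffprobe -v error -print_format json -show_format -show_streams upload.mp4` and validate the parsed output. The accepted codec and container lists below are illustrative assumptions, not PeerTube's actual policy, and a real faststart check would additionally need to verify that the `moov` atom precedes `mdat`, which this output alone does not expose:

```typescript
// Sketch only: decide whether a client-transcoded upload is acceptable,
// given ffprobe's JSON output. Codec/container choices are assumptions.
interface ProbeStream { codec_type: string; codec_name: string }
interface ProbeResult { format: { format_name: string }; streams: ProbeStream[] }

function isProperlyTranscoded(probe: ProbeResult): boolean {
  // ffprobe reports MP4 containers as a comma-separated alias list.
  if (!probe.format.format_name.split(',').includes('mp4')) return false;
  const video = probe.streams.filter(s => s.codec_type === 'video');
  const audio = probe.streams.filter(s => s.codec_type === 'audio');
  // Require at least one H.264 video stream; any audio streams must be AAC.
  return video.length > 0 &&
    video.every(s => s.codec_name === 'h264') &&
    audio.every(s => s.codec_name === 'aac');
}
```

If the check fails, the server would fall back to transcoding the upload itself, so a misbehaving client costs nothing but the wasted upload bandwidth.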
Maybe this helps you |
@utack It can only generate mkv containers, which isn't very helpful for Peertube. Plus it hasn't been maintained in years. |
I totally support this issue. There is a real use-case where a federation of video producers wants to create many PeerTube instances but use a single transcoding server in order to reduce instance hosting costs. |
This should actually be pretty easy to implement with ssh. I'm sure there is some ssh library that could be used. Basically, PeerTube would do this:
|
That would require translating all the logic we have in https://github.com/Chocobozzz/PeerTube/blob/develop/server/helpers/ffmpeg-utils.ts to bash, so it's not so easy. An intermediate step before your solution becomes possible, easier than translating to bash, would be to decouple that logic into a standalone module (keeping it in TypeScript), so that we could install it on the remote worker and just send it the job parameters. Plex does this with https://github.com/wnielson/Plex-Remote-Transcoder, for instance, because it has a re-usable binary for the transcoding part. |
Doesn't the library execute an ffmpeg command via bash anyway? It should be possible to get that command out, maybe with a patch. The other option is to install Node.js on the worker and have it execute the same library. |
https://github.com/fluent-ffmpeg/node-fluent-ffmpeg#start-ffmpeg-process-started
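A hedged sketch of what "getting the command out" could look like: fluent-ffmpeg's `start` event reports the full command line it spawns, and a wrapper could rewrap that string so it executes on a remote worker instead of locally. The host, port, and quoting strategy below are illustrative assumptions, not anything PeerTube ships:

```typescript
// Sketch: rewrap a captured ffmpeg command line so it runs over ssh.
// Assumes the worker shares paths with the server (e.g. via network storage).
function wrapForSsh(ffmpegCmd: string, host: string, port = 22): string {
  // POSIX-safe single-quote escaping: close the quote, emit \', reopen.
  const quoted = ffmpegCmd.replace(/'/g, `'\\''`);
  return `ssh -p ${port} ${host} '${quoted}'`;
}
```

The command string from the `start` event would be passed through `wrapForSsh` and handed to `child_process.exec` instead of being run directly.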
In this forum post https://framacolibri.org/t/adding-a-new-resolution-to-an-existing-video-playlist/6247 , plhardy made this script with remote transcoding: https://framagit.org/artlog/piretubehack |
Might it be a first step to just do the transcoding in the browser? https://github.com/ffmpegwasm/ffmpeg.wasm seems to be a pretty easy way to solve that. It's not a perfect user experience, but at least it's a way to avoid heavy load on the server side. |
Transcoding in wasm is going to be especially slow until browsers and wasm have SIMD support, and they might need improvements in multithreading too. Setting aside efficiency concerns, there are also security and correctness implications in taking pre-encoded video from clients. Most of all, though, I don't know if folks are going to want to leave one browser tab open for hours or possibly days in order to get a video published.
|
Hey, just adding my use-case to the discussion: I run PeerTube in Kubernetes with a limited amount of resources and, at the moment, ffmpeg always ends up OOM-killed as it tries to get as much memory as it thinks possible (regardless of its cgroup's limit). Having remote workers would allow me to spawn pods with much higher memory and CPU limits, or even use dedicated VMs to do the transcoding. |
Would it be theoretically and technically possible to plan for a kind of 'proxy' transcoding via a partner server? For example, if my (public-domain) video is first uploaded in high resolution to Wikimedia Commons or Archive.org, could it then be shared to my (too weak to transcode) PeerTube server? |
@rigelk, I was looking at #3383 and while I think this is a good first step (thanks for your work), I was wondering if this couldn't be implemented as plugins? I was thinking of something similar to what Nextcloud does with server-side encryption: part of it is implemented in its core, part is delegated to plugins, and NC ships with a default one. In PeerTube, the default plugin could do transcoding in a subprocess, the same way it's currently implemented. One of the reasons I think using plugins would be a better solution is that it would be easier to share, and it would ease maintenance, as PeerTube already provides everything needed to discover, install and upgrade plugins. Also, in environments like Docker or Kubernetes, I feel it'll be easier to install a plugin than to extend the container image or to mount a script in the container instance. A few issues arose while I was writing this comment:
|
@johanfleury thanks for noticing 🙂
Sadly true.
Very interesting question! We don't have locks on plugin hooks, but that would deserve its own issue. Could you open it, citing your example?
I don't think moving the core transcoding to a plugin makes as much sense as you think. Developing within/for PeerTube is easier in great part thanks to proper type inference, which is lost anyway once you execute an ffmpeg command. My work only allows a plugin to replace the ffmpeg executable, effectively giving plugins their own context in the language that they want. They can, but are not forced to, rely on Node.js, for which at that stage parsing ffmpeg options is more interesting than having type definitions from PeerTube core 🙂 |
Having this feature on the server side would be needed by big instances, that's for sure. I think that #3661 (ObjectStore Support) would help a lot in that regard. And then in terms of implementation :) I have 3 ideas, each with its pros and cons I guess.

1. Delayed jobs

Usually, when you have a web app, you need 2 things:

In Rails, there is this concept of delayed jobs, usually implemented with Sidekiq. If you know Discourse, that is what they have: an app server and Sidekiq. The nice thing is that you can scale both independently. As a dev, you just create a job, and then a worker process picks it up when available. And it has all the logic for retrying failed jobs and so on. Here are some potential libraries:

And this doesn't need to be only for transcoding; it can be for email, or anything that would make the HTTP request slower and can be done async.

2. Kubernetes native

Kubernetes is becoming the cloud API, and Kubernetes has a concept of jobs as well.

3. Event driven

I'm quite fascinated by the "serverless" movement (or Amazon Lambda). I think it has 3 components:

If we use S3 or the Ceph RADOS Gateway, you can get notifications. This is the classic "serverless" use case: the user uploads a video to the object store (in an incoming folder, for instance); once the upload is done, an event is fired, a worker picks up this event and moves the video from the incoming folder to the public folder. I find it a really elegant solution, as the two functions "upload" and "transcode" are really separate, well identified, and have a clear "API" between them. But again, not everybody uses an object store (and not all object stores have notifications), so then you have to maintain two logics, one for small servers and one for bigger ones (sometimes it is easier to build one product on one infra than to build a very configurable solution). Given all of this, I think the first option is the way to go. Hope it helps! |
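The delayed-jobs idea (option 1) can be sketched with a tiny in-memory queue with retry logic. A real deployment would use a Redis-backed library such as Bull, which PeerTube already uses for local jobs; the class and names here are purely illustrative:

```typescript
// Sketch of option 1: jobs are enqueued by the web app and drained by a
// separate worker process, with failed jobs retried a bounded number of times.
type Job<T> = { payload: T; attempts: number };

class TinyQueue<T> {
  private jobs: Job<T>[] = [];
  constructor(private handler: (payload: T) => Promise<void>,
              private maxAttempts = 3) {}

  add(payload: T): void {
    this.jobs.push({ payload, attempts: 0 });
  }

  // A worker would call this in a loop, independently of the web app,
  // which is what lets the two scale separately.
  async drain(): Promise<number> {
    let done = 0;
    while (this.jobs.length > 0) {
      const job = this.jobs.shift()!;
      try {
        await this.handler(job.payload);
        done++;
      } catch {
        // Re-enqueue for retry until the attempt budget is exhausted.
        if (++job.attempts < this.maxAttempts) this.jobs.push(job);
      }
    }
    return done;
  }
}
```

The key property is the decoupling: the HTTP request only calls `add`, so transcoding (or email, or anything async) never slows down the request path.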
Just adding my support to this, though admittedly we haven't started using PeerTube yet, and my use case might be more for live streaming. Context: I run a community/event for game makers. Lately Twitch has been doing some really stupid stuff, banning some of our users and events like ours for laughable reasons. We do have the option of recommending that folks switch to YouTube, but I'd like to provide a third option. In our case, we have fixed times throughout the year when we're busy: when we run events. If I could spin up some helper encoding servers for a few weeks and then shut them down, that would go a long way towards helping us scale. |
// remote_ffmpeg: drop-in ffmpeg replacement that copies input files to a
// remote worker over scp, runs ffmpeg there via ssh, and copies the result
// back. Image outputs (png/jpg thumbnails) fall back to local ffmpeg.
#include <iostream>
#include <cstdio>
#include <cstdlib>
#include <thread>
#include <sstream>
#include <stdarg.h>
#include <syslog.h>
std::string addr = "ffmpeg@x.x.x.x";
std::string scp_port = "12345";
std::string work_dir = "/home/ffmpeg";
std::string ssh_cmd = std::string("ssh -p ") + scp_port + " " + addr;
std::string scp_path = addr + ":" + work_dir + "/";
std::string g_real_output;
// Return the path's final component, handling both '/' and '\\' separators.
std::string getBasename(const std::string& filename)
{
for (int i = int(filename.size()) - 1; i >= 0; --i)
{
if (filename[i] == '/' || filename[i] == '\\')
return filename.substr(i + 1);
}
return filename;
}
// Remember the first path-like argument as the real output destination and
// rewrite it to point inside the remote work directory.
std::string handleOutput(const std::string& out)
{
if (!g_real_output.empty())
return out;
if (out.size() > 3 && (out[0] == '/' || out.substr(0, 2) == "./" || out.substr(0, 3) == "../") &&
out.find('.') != std::string::npos)
{
g_real_output = out;
syslog(LOG_INFO, "out %s", g_real_output.c_str());
return work_dir + "/" + getBasename(out);
}
return out;
}
int main(int argc, char* argv[])
{
std::string input_file;
int ret = -1;
std::string cmd;
FILE* output = NULL;
std::stringstream stm;
std::string output_file;
for (int i = 1; i < argc; i++)
{
std::string arg = argv[i];
size_t dot = arg.find_last_of(".");
if (dot != std::string::npos && dot != arg.size() - 1)
{
if (arg.substr(dot + 1) == "png" || arg.substr(dot + 1) == "jpg")
{
syslog(LOG_INFO, "system_ffmpeg %s", argv[i]);
goto system_ffmpeg;
}
}
}
for (int i = 1; i < argc; i++)
{
FILE* file = NULL;
std::string argv_add;
if (file = fopen(argv[i], "r"))
{
fclose(file);
input_file = argv[i];
syslog(LOG_INFO, "input %s", input_file.c_str());
system((std::string("scp -P ") + scp_port + " " + input_file + " " + scp_path).c_str());
input_file = getBasename(input_file);
argv_add = work_dir + "/" + input_file;
}
else
argv_add = handleOutput(argv[i]);
if (argv_add.find('(') != std::string::npos)
{
argv_add.insert(0, "\"");
argv_add += '"';
}
cmd += argv_add;
if (i != argc -1)
cmd += " ";
}
syslog(LOG_INFO, "cmdline %s", cmd.c_str());
ret = system((ssh_cmd + " \'ffmpeg " + cmd + "\'").c_str());
if (input_file.empty())
return ret;
output = popen((ssh_cmd + " " + std::string("ls ") + work_dir).c_str(), "r");
if (output)
{
constexpr std::size_t MAX_LINE_SZ = 1024;
char line[MAX_LINE_SZ];
while (fgets(line, MAX_LINE_SZ, output)) stm << line;
pclose(output);
}
while (std::getline(stm, output_file, '\n'))
{
if (output_file.empty() || output_file == input_file)
continue;
for (int i = 1; i < argc; i++)
{
if (output_file == argv[i] || output_file == getBasename(g_real_output))
{
std::string real_output = g_real_output.empty() ? std::string(" .") : std::string(" ") + g_real_output;
ret = system((std::string("scp -P ") + scp_port + " " + addr + ":" + work_dir + "/" + output_file +
real_output).c_str());
break;
}
}
}
return ret;
system_ffmpeg:
cmd = "ffmpeg ";
for (int i = 1; i < argc; i++)
{
std::string argv_add = argv[i];
if (argv_add.find('(') != std::string::npos)
{
argv_add.insert(0, "\"");
argv_add += '"';
}
cmd += argv_add;
if (i != argc -1)
cmd += " ";
}
return system(cmd.c_str());
}
It only calls remote ffmpeg (scp + ssh) for video files; png (thumbnail) jobs use local ffmpeg instead. You need to edit PeerTube to allow an ffmpeg binary with a custom name (or you can swap remote_ffmpeg and ffmpeg if ffmpeg is only used by PeerTube on your server). I use this on my server with transcoding disabled, and I manually run the transcoding job when I see new videos uploaded. |
@JacksonChen666 good for you ? |
@Tr4sK LGTM IG |
I run a site with 180TB of videos. We have got to get remote transcoding soon if possible. This GitHub thread seems to be a dead end, and @Chocobozzz has removed this feature from the To Do in Roadmap 2022. Did the feature get scrapped? Please advise as to where I can follow along with remote transcoding. Please and thanks in advance :) |
Hi, I have seen that in the PeerTube doc, but also read that on the Bull repo. Am I making a mistake? |
https://github.com/joshuaboniface/rffmpeg This is an interesting project. Maybe PeerTube can utilize it? |
(emphasis added by me) Maybe it could work, but after skimming the README it also seems limited and made for a specific use case (transcoding for media servers). |
This is really important to implement for anything bigger than simple home use. I want to recommend this to several organizations, but the lack of this feature kills the idea. Without being able to offload transcoding, it is just not workable for large organizations. |
Why is it just not workable because of the lack of remote transcoding? Do the large organizations have a very special need for their videos to transcode faster? Would they be OK with transcoding taking longer, or with scaling up the CPU? |
It's not only that. It's actually impossible to scale PeerTube horizontally because of the transcoding process. |
This should be the top priority for the next 6.0.0 update.
|
We had a funding for this feature so we'll implement it in a few months. We'll officially announce it and detail the use cases soon. |
Implemented in #5769 PeerTube runner doc on https://docs.joinpeertube.org/maintain/tools#peertube-runner |
This looks awesome, thank you @Chocobozzz for all the hard work you do! |
Video demo available on https://peertube2.cpy.re/w/oJwHHYwt4oKjKhLNh2diAY |
Hey @Chocobozzz I’m working on an implementation of the runner for Kubernetes, and I just have a quick feedback/question. Do you think the I was thinking of something like Mastodon’s timelines API. It would prevent the runner server from having to poll the API every now and then. |
Hi, there is an undocumented socket.io endpoint available on You can see the implementation on the server: https://github.com/Chocobozzz/PeerTube/blob/develop/server/lib/peertube-socket.ts#L60 |
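The push model that reply describes, versus the runner polling the jobs API, can be sketched with a stand-in event channel. Node's built-in `EventEmitter` substitutes for the actual socket.io endpoint here, and the event name and function names are hypothetical, not PeerTube's real API:

```typescript
import { EventEmitter } from 'node:events';

// Stand-in for the server's socket.io namespace; the event name is made up.
const runnerChannel = new EventEmitter();

// Runner side: instead of polling the jobs API on an interval,
// wake up only when the server signals that jobs are available.
function subscribeRunner(onAvailable: () => void): void {
  runnerChannel.on('available-jobs', onAvailable);
}

// Server side: emit once whenever a new transcoding job is queued.
function notifyRunners(): void {
  runnerChannel.emit('available-jobs');
}
```

On a wake-up signal, the runner would still fetch the actual job through the documented HTTP API; the socket only replaces the polling timer.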
I'm trying to run a PeerTube instance with a large volume of 1080p60 content, and the transcode bottleneck is significant. One potentially easy way to improve this when additional CPU power is available on the local network could be to scale out the transcode with a worker daemon.
I'm opening this in the hope of a discussion for a future enhancement, and to keep track of that work.
In my case I'm imagining the workers being located in the same data center, but it seems like this could be extended to geographically diverse or even crowdsourced transcode sites.
Personally I'd be happy with a solution where the remote workers run a userspace daemon that manages ffmpeg with little to no local storage, accessing the peertube server's disks either over an existing network storage protocol or existing media streaming protocol.
I'm curious what takes other folks have. I know this is unlikely to be interesting for people running PeerTube on a small VPS, unless the transcode workers can be elsewhere on cheaper hosting (like at your house, on DSL).