Add support for saving video files to object storage #4290
Conversation
I already started working on the new documentation: https://framagit.org/pingiun/documentation/-/blob/pingiun-master-patch-80848/admin-remote-storage.md
Elegant! Impressive work.
Can we add a video state called "processing" (or something similar) until the upload to s3 is done? That would mean that the video isn't available until the files are in the bucket. A future request would be the possibility to set the s3 download base URL. That way it would be possible to set the file URL to a CDN, to reduce the costs.
This might actually be the most elegant solution, good idea.
Isn't it already possible to do this by caching the /w/xyz and streaming-playlists portion of the URL using varnish or nuster or similar?
Today that's possible, but as I understand this PR it will change the video file URL so that the browser will download it directly from S3 (correct me if I'm wrong, @pingiun).
This is correct
Then there needs to be a way to cache the video. This isn't a good idea. Cache mirrors are cheap. I can get high-bandwidth servers for a fairly cheap price. Direct S3 would be many times more expensive.
With these changes it's still possible to use the old method of s3 storage via s3fs. Adding the suggestion from @kontrollanten should be fairly simple though, I can try to add that to this PR.
@SimplyCorbett Can you take a look at 26cabf2 and see if this change adds the right support for you?
Changed in 03c6d12
Do you have any ideas about how to solve this? I'm thinking about the following solution
What do you think about that?
Hello and thanks @pingiun. Since it's still a draft, here are some code remarks:
Chocobozzz#4290 (comment) The remarks in question:
- Try to use the objectStorage prefix instead of the s3 prefix for your function/variable/config names
- Prefer to use a tree for the config: s3.streaming_playlists_bucket -> object_storage.streaming_playlists.bucket
- Use uppercase for config: S3.STREAMING_PLAYLISTS_BUCKETINFO.bucket -> OBJECT_STORAGE.STREAMING_PLAYLISTS.BUCKET (maybe BUCKET_NAME instead of BUCKET)
- I suggest renaming moveJobsRunning to pendingMovingJobs (or better, create a dedicated videoJobInfo table with pendingMove & videoId columns so we could also use this table to track pending transcoding jobs)
- https://github.com/Chocobozzz/PeerTube/pull/4290/files#diff-3e26d41ca4bda1de8e1747af70ca2af642abcc1e9e0bfb94239ff2165acfbde5R19 uses a string instead of an integer
- I think we should store the origin object storage URL in fileUrl, without base_url injection. Instead, inject the base_url at "runtime" so admins can easily change this configuration without running a script to update DB URLs
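To illustrate the naming suggestions above, here is a minimal sketch of what an uppercase, tree-shaped constants object could look like. The field names and values are illustrative only and are not taken from the actual PeerTube configuration code.

```typescript
// Hypothetical sketch of the suggested OBJECT_STORAGE constant tree.
// In the real code the values would be read from the parsed YAML configuration;
// here they are hard-coded placeholders.
const OBJECT_STORAGE = {
  STREAMING_PLAYLISTS: {
    BUCKET_NAME: 'peertube-streaming-playlists', // object_storage.streaming_playlists.bucket
    PREFIX: ''                                   // optional key prefix inside the bucket
  },
  VIDEOS: {
    BUCKET_NAME: 'peertube-videos',
    PREFIX: ''
  }
}
```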
@Chocobozzz thanks for the good suggestions, I implemented them all in 105cc1f. Do you have an idea of the places where the file should be locally available? This is my first time working on this codebase so not everything is in my head yet. I'm not sure in what circumstances transcodes can run, for example, so I don't exactly know how I can make sure the lifecycle of local files stays correct.
@pingiun was this tested with files larger than 5 gigabytes?
Thanks for telling me about this limit, I didn't know before that the official SDK would not split up large files when sending them. I added basic support for this in ee4c80a, but it might require some more thinking about aborting multipart requests when something goes wrong, so parts aren't left in the bucket. Another option would be to require the user to have a lifecycle rule set up in the bucket that deletes old object parts. As can be seen in the commit, I made a hacky addition to the ReadStream to make streaming to the API work. I also created a PR at the official s3 sdk repo so that we might change this back in the future.
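For reference, here is a minimal sketch of how a multipart upload could be aborted when something goes wrong, so that incomplete parts are not left behind in the bucket. It uses the multipart commands from @aws-sdk/client-s3; the function, bucket and key names are placeholders, not the code used in this PR.

```typescript
import {
  S3Client,
  CreateMultipartUploadCommand,
  UploadPartCommand,
  CompleteMultipartUploadCommand,
  AbortMultipartUploadCommand
} from '@aws-sdk/client-s3'

// Sketch only: `parts` would be produced by reading the local file in chunks
// (every part except the last must be at least 5 MiB).
async function uploadMultipart (client: S3Client, bucket: string, key: string, parts: Buffer[]) {
  const { UploadId } = await client.send(new CreateMultipartUploadCommand({ Bucket: bucket, Key: key }))

  try {
    const uploaded: { ETag?: string, PartNumber: number }[] = []

    for (let i = 0; i < parts.length; i++) {
      const res = await client.send(new UploadPartCommand({
        Bucket: bucket, Key: key, UploadId, PartNumber: i + 1, Body: parts[i]
      }))
      uploaded.push({ ETag: res.ETag, PartNumber: i + 1 })
    }

    await client.send(new CompleteMultipartUploadCommand({
      Bucket: bucket, Key: key, UploadId, MultipartUpload: { Parts: uploaded }
    }))
  } catch (err) {
    // Abort so incomplete parts are not kept (and billed) by the provider
    await client.send(new AbortMultipartUploadCommand({ Bucket: bucket, Key: key, UploadId }))
    throw err
  }
}
```

A bucket lifecycle rule for aborting incomplete multipart uploads (as mentioned above) would still be useful for the case where the process dies before the abort call can run.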
I think you can put the files in https://github.com/Chocobozzz/PeerTube/blob/develop/config/production.yaml.example#L78
I'm not sure why the tests are failing; the same thing happens when I run the tests locally.
I think you may have misunderstood: I meant which places in the code. So which parts of the code need to have the files stored locally, and need a change to support files that may not exist locally?
Oh sorry, then I think everywhere we use
The server build fails because of the
Thanks, I didn't catch that. I see that module was only added in Node 15.0 so I will try to remove the usage
Uses two config keys to support URL generation that doesn't go directly to the (S3-compatible) object storage. This can be used to generate URLs to any cache server or CDN.
I'll update this PR, thanks @pingiun
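To make the idea above concrete, here is a minimal sketch of generating the public file URL at request time, so an admin can later point the base URL at a CDN or cache server without rewriting URLs stored in the database. The parameter names (baseUrl, endpoint, bucketName) are hypothetical and not the actual PeerTube config keys.

```typescript
// Hypothetical helper: build the public file URL at "runtime" instead of storing it.
// `baseUrl` would come from a config key pointing at a CDN or cache server in front of
// the bucket; `endpoint` and `bucketName` from the object storage config.
function buildFileUrl (objectKey: string, options: { baseUrl?: string, endpoint: string, bucketName: string }): string {
  if (options.baseUrl) {
    // Admin configured a CDN/cache server: serve files through it
    return `${options.baseUrl.replace(/\/$/, '')}/${objectKey}`
  }

  // Fall back to the direct (S3-compatible) endpoint
  return `${options.endpoint.replace(/\/$/, '')}/${options.bucketName}/${objectKey}`
}

// Example:
// buildFileUrl('hls/abc/0.m3u8', { baseUrl: 'https://cdn.example.com', endpoint: 'https://s3.example.com', bucketName: 'videos' })
```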
PR is ready. @pingiun could you open an MR for the documentation?
```typescript
const createMultipartCommand = new CreateMultipartUploadCommand({
  Bucket: bucketInfo.BUCKET_NAME,
  Key: key
```
We should be setting ACL here, as uploads are private by default for a lot of S3 providers. Something like this:
```typescript
  Key: key,
  ACL: "public-read"
```
I'm happy to follow this up in a separate PR. For now I'm just setting the bucket ACL to public.
You can review this PR and I'll resolve the conversations directly on develop :)
I only tested with Backblaze B2 before; this does indeed seem to be necessary for AWS S3.
Reflecting on this a bit more, the ACL can likely be handled by the bucket policy.
We could follow up to specifically handle private buckets, pre-signing each URL before handing it off to the client.
Here's an example policy that disables file listing but allows retrieving objects:
```json
{
  "Version": "2008-10-17",
  "Statement": [
    {
      "Sid": "AddPerm",
      "Effect": "Allow",
      "Principal": "*",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::my-bucket/*"
    }
  ]
}
```
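As a sketch of the private-bucket follow-up mentioned above: with a bucket that blocks public reads, each file URL could be pre-signed before being handed to the client. The example uses @aws-sdk/s3-request-presigner; the function name and the one-hour expiry are arbitrary choices, not something from this PR.

```typescript
import { S3Client, GetObjectCommand } from '@aws-sdk/client-s3'
import { getSignedUrl } from '@aws-sdk/s3-request-presigner'

// Sketch: generate a temporary download URL for an object in a private bucket.
async function presignDownloadUrl (client: S3Client, bucket: string, key: string): Promise<string> {
  const command = new GetObjectCommand({ Bucket: bucket, Key: key })

  // The URL is only valid for an hour; the client must ask for a fresh one afterwards
  return getSignedUrl(client, command, { expiresIn: 3600 })
}
```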
@Chocobozzz it appears from my testing that you changed something so that local files are needed after they have been deleted. I now see the following in my logs:
So files are uploaded, deleted and then downloaded again. This wasn't happening in my code and doesn't seem very efficient (especially for large uploads).
Another thing that no longer works is updating the endpoint URL: videos will still be loaded from the old URL because it's saved in the database. My code always generated the URL from scratch (following your suggestion).
In my comment I only talked about
Thanks, it should be fixed in 1f6125b
A nice extension to this could be pre-signing the upload URL and passing it to the client, so they can upload directly to object storage without needing to proxy via PeerTube. PeerTube could wait until the upload is finished and then pull it from object storage for transcoding.
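Sketching that idea with the same presigner package (again with hypothetical names, and not something implemented in this PR):

```typescript
import { S3Client, PutObjectCommand } from '@aws-sdk/client-s3'
import { getSignedUrl } from '@aws-sdk/s3-request-presigner'

// Sketch: hand the client a short-lived URL it can PUT the raw upload to,
// so the file doesn't have to be proxied through the PeerTube server first.
async function presignUploadUrl (client: S3Client, bucket: string, key: string): Promise<string> {
  const command = new PutObjectCommand({ Bucket: bucket, Key: key })
  return getSignedUrl(client, command, { expiresIn: 15 * 60 }) // 15 minutes
}
```

One caveat: a single pre-signed PUT is capped at 5 GB on S3, so very large uploads would still need pre-signed multipart parts or a resumable scheme.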
Of course, this makes sense. I created the MR for the documentation here: https://framagit.org/framasoft/peertube/documentation/-/merge_requests/98
Perfect, I'll also update the MR when I have time. Thanks @pingiun
Description
I've reworked this PR a bit so here is the new description. I've kept the old description included below.
This PR adds a job that normally runs after transcoding (unless disabled) which uploads video files to an S3-compatible object store. This improves on the current situation because files are uploaded in a job instead of during a request, and the native object storage APIs are used, which makes sure files aren't being rewritten. Some model fields are added to keep track of the moving state and to make sure the last part of the job is only run once.
The `movingJobsRunning` field in the video model is incremented whenever a transcode job is created and only decremented once the uploads for the created videoFile are done. The new job can then detect when the last videoFile has been uploaded, because `movingJobsRunning` will be 0, and the cleanup part of the job can run. This uploads the master HLS playlist, removes the files from the local disk, and sets the video to published.

The `storage` field on videoFile and videoStreamingPlaylist is used to create the correct URL to send in the API. The client then correctly loads the files from object storage (as long as CORS is correctly configured on the bucket). The `streaming_playlists_url_template` config field is used for custom URL generation; this is explained in the comment in default.yaml.
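As a rough illustration of that counter pattern (not the actual PeerTube implementation; the store and finalize callback below are hypothetical stand-ins for the model and cleanup code):

```typescript
// Hypothetical sketch of the "last upload triggers cleanup" pattern described above.
type JobInfoStore = {
  // Atomically decrement the counter for this video and return the new value
  decrementPendingMove (videoId: number): Promise<number>
}

async function onVideoFileMoved (
  videoId: number,
  jobInfo: JobInfoStore,
  finalize: (videoId: number) => Promise<void> // uploads master playlist, deletes local files, publishes
) {
  const pendingMove = await jobInfo.decrementPendingMove(videoId)

  // Other files for this video are still being uploaded: nothing more to do yet
  if (pendingMove > 0) return

  // This was the last file, so the cleanup/publish step runs exactly once
  await finalize(videoId)
}
```

The decrement needs to be atomic (e.g. an SQL `UPDATE ... RETURNING`) so two concurrent upload jobs can't both observe 0.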
Old PR text before rework

This adds a job that runs after all transcodes are done and uploads all the video files to object storage (S3-compatible). New fields are added to the video, videoFile, and videoStreamingPlaylist models to keep track of transcodes and where files are stored.

The `transcodeJobsRunning` field in the video model is incremented whenever a transcode job is created and decremented whenever one finishes. The job that moves files to object storage is added after each transcode job is finished, but only runs when `transcodeJobsRunning` is 0 so that all files can be uploaded at once.

The `storage` field on videoFile and videoStreamingPlaylist is used to create the correct URL to send in the API. The client then correctly loads the files from object storage (as long as CORS is correctly configured on the bucket).

Currently the old files are not removed, because I wanted to ask for feedback on how to best do this. In principle the job that moves files to object storage could delete the files right after they are uploaded, because we are sure that no transcoding jobs are running anymore. However, it might be the case that a user is currently watching a video that is streaming from those files, and I think removing the files might cause the stream to drop.
In addition to this I also need to implement a mechanism to get the files back to the local filesystem whenever local operations need to be done. I think this is only the case when transcoding but I'm not sure about that.
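For that mechanism, a minimal sketch of pulling a file back from object storage to the local filesystem could look like the following (hypothetical function name; it deliberately avoids 'stream/promises' so it also works on Node versions before 15):

```typescript
import { createWriteStream } from 'fs'
import { pipeline, Readable } from 'stream'
import { promisify } from 'util'
import { S3Client, GetObjectCommand } from '@aws-sdk/client-s3'

const pipelinePromise = promisify(pipeline)

// Sketch: download an object to a local path, e.g. before running a new transcode.
async function downloadToLocalFile (client: S3Client, bucket: string, key: string, destination: string) {
  const { Body } = await client.send(new GetObjectCommand({ Bucket: bucket, Key: key }))

  // In Node the Body is a Readable stream; pipe it straight to disk
  await pipelinePromise(Body as Readable, createWriteStream(destination))
}
```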
TODO (need help):
Related issues
Fixes #3661
Has this been tested?