
Add support for saving video files to object storage #4290

Merged: 24 commits into Chocobozzz:develop on Aug 17, 2021

Conversation

@pingiun (Contributor) commented Jul 30, 2021

Description

I've reworked this PR a bit, so here is the new description. I've kept the old description below.

This PR adds a job that normally runs after transcoding (unless disabled) and uploads video files to an S3-compatible object store. This improves on the current situation because files are uploaded in a job instead of during a request, and the native APIs are used, which ensures files aren't rewritten. Some model fields are added to keep track of the moving state and to make sure the last part of the job runs only once.

The movingJobsRunning field in the video model is incremented whenever a transcode job is created and only decremented once the uploads for the created videoFile are done. The new job can then detect when the last videoFile has been uploaded, because movingJobsRunning will be 0, and the cleanup part of the job can run. This uploads the master HLS playlist, removes the files from the local disk, and sets the video to published.
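A minimal sketch of that bookkeeping, under illustrative names (the Video shape and the helper functions below are hypothetical, not the PR's actual code):

```ts
// Hypothetical sketch of the movingJobsRunning bookkeeping described above.
interface Video {
  uuid: string
  movingJobsRunning: number
}

// Called whenever a transcode job is created for this video.
function onTranscodeJobCreated (video: Video): void {
  video.movingJobsRunning += 1
}

// Called once the upload of the resulting videoFile is done.
// Returns true when this was the last pending upload, meaning the
// cleanup step (upload the master HLS playlist, remove local files,
// publish the video) may now run exactly once.
function onVideoFileUploaded (video: Video): boolean {
  video.movingJobsRunning -= 1
  return video.movingJobsRunning === 0
}
```

In the real job the counter would live in the database and be updated atomically, so two concurrent uploads cannot both observe 0.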

The storage field on videoFile and videoStreamingPlaylist is used to create the correct URL to send in the API. The client then correctly loads the files from object storage (as long as CORS is correctly configured on the bucket). The streaming_playlists_url_template config field is used for custom URL generation; this is explained in the comment in default.yaml.

Old PR text before rework

This adds a job that runs after all transcodes are done and uploads all the video files to object storage (S3-compatible). New fields are added to the video, videoFile, and videoStreamingPlaylist models to keep track of transcodes and of where files are stored.

The transcodeJobsRunning field in the video model is incremented whenever a transcode job is created and decremented whenever one finishes. The job that moves files to object storage is added after each transcode job finishes, but only runs when transcodeJobsRunning is 0, so that all files can be uploaded at once.

The storage field on videoFile and videoStreamingPlaylist is used to create the correct URL to send in the API. The client then correctly loads the files from object storage (as long as CORS is correctly configured on the bucket).

Currently the old files are not removed, because I wanted to ask for feedback on how best to do this. In principle the job that moves files to object storage could delete the files right after they are uploaded, because we are sure that no transcoding jobs are running anymore. However, it might be the case that a user is currently watching a video that is streaming from those files, and I think removing the files might cause the stream to drop.

In addition to this I also need to implement a mechanism to get the files back to the local filesystem whenever local operations need to be done. I think this is only the case when transcoding, but I'm not sure about that.

TODO (need help):

  • Remove old files from the filesystem
  • Mechanism to retrieve files from object storage whenever new transcodes are needed

Related issues

Fixes #3661

Has this been tested?

  • 👍 yes, I added tests to the test suite
  • 💭 no, because this PR is a draft and still needs work
  • 🙅 no, because this PR does not update server code
  • 🙋 no, because I need help

@pingiun (Author) commented Jul 30, 2021

I already started working on the new documentation: https://framagit.org/pingiun/documentation/-/blob/pingiun-master-patch-80848/admin-remote-storage.md

@kontrollanten (Contributor) commented:
Elegant! Impressive work.

> Currently the old files are not removed, because I wanted to ask for feedback on how best to do this. In principle the job that moves files to object storage could delete the files right after they are uploaded, because we are sure that no transcoding jobs are running anymore. However, it might be the case that a user is currently watching a video that is streaming from those files, and I think removing the files might cause the stream to drop.

Can we add a video state called "processing" (or something similar) until the upload to S3 is done? That would mean that the video isn't available until the files are in the bucket.

A future request would be the possibility to set the S3 download base URL. That way it would be possible to set the file URL to a CDN, to reduce costs.

@pingiun (Author) commented Jul 30, 2021

> Can we add a video state called "processing" (or something similar) until the upload to S3 is done? That would mean that the video isn't available until the files are in the bucket.

This might actually be the most elegant solution, good idea.

@SimplyCorbett commented Jul 30, 2021

> A future request would be the possibility to set the S3 download base URL. That way it would be possible to set the file URL to a CDN, to reduce costs.

Isn't it already possible to do this by caching the /w/xyz and streaming-playlists portion of the URL using varnish or nuster or similar?

@kontrollanten (Contributor) commented:

> > A future request would be the possibility to set the S3 download base URL. That way it would be possible to set the file URL to a CDN, to reduce costs.
>
> Isn't it already possible to do this by caching the /w/xyz and streaming-playlists portion of the URL using varnish or nuster or similar?

Today that's possible, but as I understand this PR, it will change the video file URL so that the browser will download it directly from S3 (correct me if I'm wrong, @pingiun).

@pingiun (Author) commented Jul 31, 2021

> Today that's possible, but as I understand this PR, it will change the video file URL so that the browser will download it directly from S3 (correct me if I'm wrong, @pingiun).

This is correct.

@SimplyCorbett commented Jul 31, 2021

> > Today that's possible, but as I understand this PR, it will change the video file URL so that the browser will download it directly from S3 (correct me if I'm wrong, @pingiun).
>
> This is correct.

Then there needs to be a way to cache the video. This isn't a good idea. Cache mirrors are cheap. I can get high-bandwidth servers for a fairly cheap price.

Direct S3 would be many times more expensive.

@pingiun (Author) commented Aug 1, 2021

> > > Today that's possible, but as I understand this PR, it will change the video file URL so that the browser will download it directly from S3 (correct me if I'm wrong, @pingiun).
> >
> > This is correct.
>
> Then there needs to be a way to cache the video. This isn't a good idea. Cache mirrors are cheap. I can get high-bandwidth servers for a fairly cheap price.
>
> Direct S3 would be many times more expensive.

With these changes it's still possible to use the old method of S3 storage via s3fs.

Adding the suggestion from @kontrollanten should be fairly simple though; I can try to add that to this PR.

@pingiun (Author) commented Aug 1, 2021

@SimplyCorbett Can you take a look at 26cabf2 and see if this change adds the right support for you?

@pingiun (Author) commented Aug 2, 2021

> I think a base URL should be enough for most cases.

Changed in 03c6d12

@kontrollanten (Contributor) commented:
> Mechanism to retrieve files from object storage whenever new transcodes are needed

Do you have any ideas about how to solve this? I'm thinking about the following solution:

  1. In handleHLSJob, a check is done whether the video file exists in the PeerTube tmp folder. If not, it is downloaded from S3 (see the sketch after this list).
  2. When handleHLSJob is done, a new job, cleanupTmpVideo, is created.
  3. The cleanupTmpVideo job checks whether there are any video transcoding jobs in the queue; if not, it removes the tmp file.

What do you think about that?
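A rough TypeScript sketch of step 1, for discussion; ensureLocalVideoFile and downloadFromObjectStorage are hypothetical names, and the actual object-storage download is abstracted away:

```ts
import { promises as fs } from 'fs'
import { join } from 'path'

// Step 1 of the proposal: make sure the video file is on local disk
// before transcoding, downloading it from object storage if missing.
async function ensureLocalVideoFile (
  tmpDir: string,
  filename: string,
  downloadFromObjectStorage: (key: string, destination: string) => Promise<void>
): Promise<string> {
  const localPath = join(tmpDir, filename)

  try {
    await fs.access(localPath) // already in the PeerTube tmp folder
  } catch {
    await downloadFromObjectStorage(filename, localPath)
  }

  return localPath
}
```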

@Chocobozzz (Owner) commented Aug 3, 2021

Hello and thanks @pingiun

Since it's still a draft, here are some code remarks:

  • Try to use an objectStorage prefix instead of an s3 prefix for your function/variable/config names
  • Prefer a tree structure for the config: s3.streaming_playlists_bucket -> object_storage.streaming_playlists.bucket
  • Use uppercase for config constants: S3.STREAMING_PLAYLISTS_BUCKETINFO.bucket -> OBJECT_STORAGE.STREAMING_PLAYLISTS.BUCKET (maybe BUCKET_NAME instead of BUCKET)
  • I suggest renaming moveJobsRunning to pendingMovingJobs (or better, creating a dedicated videoJobInfo table with pendingMove & videoId columns, so we could also use this table to track pending transcoding jobs; see the sketch after this list)
  • https://github.com/Chocobozzz/PeerTube/pull/4290/files#diff-3e26d41ca4bda1de8e1747af70ca2af642abcc1e9e0bfb94239ff2165acfbde5R19 uses a string instead of an integer
  • I think we should store the origin object storage URL in fileUrl, without base_url injection. Instead, inject the base_url at "runtime" so admins can easily change this configuration without running a script to update DB URLs
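A sketch of what the suggested videoJobInfo table could look like with sequelize-typescript (which PeerTube's models use); the pendingMove and videoId columns follow the remark above, while the table options and the extra pendingTranscode counter are assumptions:

```ts
import { AllowNull, Column, Default, Model, Table } from 'sequelize-typescript'

// Hypothetical model for the suggested videoJobInfo table: one row
// per video, with a counter per kind of pending job.
@Table({ tableName: 'videoJobInfo' })
export class VideoJobInfoModel extends Model {

  @AllowNull(false)
  @Default(0)
  @Column
  pendingMove: number

  @AllowNull(false)
  @Default(0)
  @Column
  pendingTranscode: number

  // Foreign key to the video this row tracks
  @AllowNull(false)
  @Column
  videoId: number
}
```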

pingiun added a commit to pingiun/PeerTube that referenced this pull request Aug 3, 2021
@pingiun (Author) commented Aug 3, 2021

@Chocobozzz thanks for the good suggestions, I implemented them all in 105cc1f

Do you have an idea of the places where the file should be locally available? This is my first time working on this codebase, so not everything is in my head yet. I'm not sure under what circumstances transcodes can run, for example, so I don't know exactly how to make sure the lifecycle of local files stays correct.

@kukhariev (Contributor) commented:
@pingiun has this been tested with files larger than 5 gigabytes?

@pingiun (Author) commented Aug 3, 2021

> @pingiun has this been tested with files larger than 5 gigabytes?

Thanks for telling me about this limit; I didn't know that the official SDK would not split up large files automatically.

I added basic support for this in ee4c80a, but it might require some more thinking about aborting multipart requests when something goes wrong, so parts aren't left in the bucket. Another option would be to ask the user to set up a lifecycle rule in the bucket that deletes old object parts.

As can be seen in the commit, I made a hacky addition to the ReadStream to make streaming to the API work. I also created a PR at the official S3 SDK repo so that we might change this back in the future.
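For reference, a simplified sketch of that multipart flow with abort-on-failure, so a failed upload doesn't leave orphaned parts in the bucket. The chunking is reduced to an in-memory array here; the real code streams the file, and this is not the PR's actual implementation:

```ts
import {
  AbortMultipartUploadCommand,
  CompleteMultipartUploadCommand,
  CreateMultipartUploadCommand,
  S3Client,
  UploadPartCommand
} from '@aws-sdk/client-s3'

async function multipartUpload (client: S3Client, bucket: string, key: string, chunks: Buffer[]) {
  const { UploadId } = await client.send(
    new CreateMultipartUploadCommand({ Bucket: bucket, Key: key })
  )

  try {
    const parts: { ETag?: string, PartNumber: number }[] = []

    for (let i = 0; i < chunks.length; i++) {
      const { ETag } = await client.send(new UploadPartCommand({
        Bucket: bucket,
        Key: key,
        UploadId,
        PartNumber: i + 1, // part numbers start at 1
        Body: chunks[i]    // every part except the last must be at least 5 MiB
      }))

      parts.push({ ETag, PartNumber: i + 1 })
    }

    await client.send(new CompleteMultipartUploadCommand({
      Bucket: bucket,
      Key: key,
      UploadId,
      MultipartUpload: { Parts: parts }
    }))
  } catch (err) {
    // Abort so the provider discards the parts uploaded so far,
    // instead of silently keeping (and billing for) them.
    await client.send(new AbortMultipartUploadCommand({ Bucket: bucket, Key: key, UploadId }))
    throw err
  }
}
```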

pingiun force-pushed the add-object-storage-support branch from ea8917b to ee4c80a on August 3, 2021
Chocobozzz mentioned this pull request on Aug 5, 2021
Outdated review threads (resolved) on: server/controllers/api/videos/upload.ts, server/initializers/checker-after-init.ts, server/initializers/migrations/0660-object-storage.ts, server/lib/object-storage.ts, server/lib/video-paths.ts, server/models/video/video-file.ts, server/models/video/video-job-info.ts
@Chocobozzz (Owner) commented:
> Do you have an idea of the places where the file should be locally available?

I think you can put the files in https://github.com/Chocobozzz/PeerTube/blob/develop/config/production.yaml.example#L78

@pingiun (Author) commented Aug 5, 2021

I'm not sure why the tests are failing; the same thing happens when I run the tests locally.

> I think you can put the files in https://github.com/Chocobozzz/PeerTube/blob/develop/config/production.yaml.example#L78

I think you may have misunderstood; I meant which places in the code. That is, which parts of the code need the files to be stored locally, and need a change to support files that may not exist locally?

@Chocobozzz (Owner) commented Aug 6, 2021

> I think you may have misunderstood; I meant which places in the code.

Oh sorry, then I think everywhere we use getVideoFilePath (we use it to generate torrents and thumbnails, for transcoding, and for playlist generation, which should only happen on video upload/update and in scripts).

> I'm not sure why the tests are failing; the same thing happens when I run the tests locally.

The server build fails because of the stream/promises import: https://github.com/Chocobozzz/PeerTube/pull/4290/checks?check_run_id=3251005770#step:10:3480

@pingiun (Author) commented Aug 6, 2021

> > I'm not sure why the tests are failing; the same thing happens when I run the tests locally.
>
> The server build fails because of the stream/promises import: https://github.com/Chocobozzz/PeerTube/pull/4290/checks?check_run_id=3251005770#step:10:3480

Thanks, I didn't catch that. I see that module was only added in Node 15.0, so I will try to remove the usage.
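A small sketch of the likely workaround, assuming the code only needs an awaitable pipeline: promisifying the callback-style pipeline from 'stream' behaves the same and works on Node versions older than 15:

```ts
import { pipeline } from 'stream'
import { promisify } from 'util'

// Equivalent to `import { pipeline } from 'stream/promises'`,
// but available before Node 15.
const pipelinePromise = promisify(pipeline)
```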

pingiun marked this pull request as ready for review on August 6, 2021
pingiun added 3 commits on August 6, 2021:

Uses two config keys to support URL generation that doesn't go directly to the (S3-compatible) storage. Can be used to generate URLs to any cache server or CDN.
pingiun requested a review from Chocobozzz on August 6, 2021
@Chocobozzz (Owner) commented:
I'll update this PR, thanks @pingiun

Outdated review threads (resolved) on: config/default.yaml, server/lib/video.ts, server/models/video/video-job-info.ts
Chocobozzz force-pushed the add-object-storage-support branch from 39eb385 to 2b71af9 on August 16, 2021
@Chocobozzz (Owner) commented:
The PR is ready.

@pingiun Could you open an MR for the documentation?

Chocobozzz merged commit 0305db2 into Chocobozzz:develop on Aug 17, 2021

A review comment on this snippet:

```ts
const createMultipartCommand = new CreateMultipartUploadCommand({
  Bucket: bucketInfo.BUCKET_NAME,
  Key: key
```

We should be setting ACL here, as uploads are private by default for a lot of S3 providers.

Something like this (suggested change):

```diff
- Key: key
+ Key: key,
+ ACL: "public-read"
```

I'm happy to follow this up in a separate PR. For now I'm just setting the bucket ACL to public.

@Chocobozzz (Owner) replied:
You can review this PR and I'll resolve the conversations directly on develop :)

@pingiun (Author) replied Aug 17, 2021:

I only tested with Backblaze B2 before; this does indeed seem to be necessary for AWS S3.

@hugomd replied Aug 18, 2021:

Reflecting on this a bit more, the ACL can likely be handled by the bucket policy.

We could follow up to specifically handle private buckets, pre-signing each URL before handing it off to the client.

Here's an example policy that disables file listing but allows retrieving objects:

```json
{
  "Version": "2008-10-17",
  "Statement": [
    {
      "Sid": "AddPerm",
      "Effect": "Allow",
      "Principal": "*",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::my-bucket/*"
    }
  ]
}
```

@pingiun (Author) commented Aug 17, 2021

@Chocobozzz it appears through my testing that you changed something so that local files are needed after they've been deleted. I now see the following in my logs:

```
[kubernetes.docker.internal:9000] 2021-08-17 17:21:46.663 info: Moving video 011c85e3-0c5e-459f-9f8b-4870b746ca41 in job 8.
[kubernetes.docker.internal:9000] 2021-08-17 17:21:48.332 info: Fetching HLS file hls_011c85e3-0c5e-459f-9f8b-4870b746ca41/7c6c9838-4d4f-4280-bf82-435b96d53465-360-fragmented.mp4 from object storage to /data/tmp/ecf8ea4b-217e-417f-9ed1-864c34dc47e1.mp4.
[kubernetes.docker.internal:9000] 2021-08-17 17:21:48.631 info: Creating torrent /data/torrents/e44ecb3d-b22f-4aad-a848-f4fa19cde296-360-hls.torrent.
[kubernetes.docker.internal:9000] 2021-08-17 17:21:48.639 info: Running cleanup after moving files to object storage (video 011c85e3-0c5e-459f-9f8b-4870b746ca41 in job 8)
[kubernetes.docker.internal:9000] 2021-08-17 17:21:48.842 info: Publishing video 011c85e3-0c5e-459f-9f8b-4870b746ca41.
```

So files are uploaded, deleted, and then downloaded again. This wasn't happening in my code and doesn't seem very efficient (especially for large uploads).

@pingiun (Author) commented Aug 17, 2021

Another thing that doesn't work anymore is updating the endpoint URL: videos will still be loaded from the old URL because it's saved in the database. My code would always generate the URL from scratch (following your suggestion).

@Chocobozzz (Owner) commented Aug 18, 2021

> Another thing that doesn't work anymore is updating the endpoint URL: videos will still be loaded from the old URL because it's saved in the database.

In my comment I only talked about base_url, not endpoint. IMHO, when admins change the endpoint, they do not expect PeerTube to automatically consider that previously uploaded objects use the new endpoint, because they would need to migrate the old objects first; that's part of a migration process in which PeerTube should be included. It's the same if admins disable object storage at some point: we should still serve old files from object storage.

However, the base_url should stay dynamic, because a proxy that just caches files in front of an S3 storage could easily be changed.
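A sketch of what that runtime injection could look like; buildServedFileUrl is a hypothetical name, and the premise (fileUrl stores the origin object-storage URL, base_url only replaces the origin when the URL is served) follows the remark above:

```ts
// If a base_url (e.g. a caching proxy or CDN in front of the bucket)
// is configured, swap the origin of the stored URL; otherwise serve
// the origin object-storage URL as-is.
function buildServedFileUrl (storedFileUrl: string, baseUrl?: string): string {
  if (!baseUrl) return storedFileUrl

  const served = new URL(storedFileUrl)
  const base = new URL(baseUrl)

  // Keep the object key/path, replace scheme and host with the proxy's
  served.protocol = base.protocol
  served.host = base.host

  return served.toString()
}
```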

> So files are uploaded, deleted, and then downloaded again. This wasn't happening in my code and doesn't seem very efficient (especially for large uploads).

Thanks, it should be fixed in 1f6125b

@hugomd I fixed your comment in 0d4a3c6

@hugomd commented Aug 18, 2021

> @hugomd I fixed your comment in 0d4a3c6

Thanks for that, I'll pull it in and see how I go with a private bucket!

@hugomd commented Aug 18, 2021

A nice extension to this could be pre-signing the upload URL and passing it to the client, so they can upload directly to object storage without needing to proxy via PeerTube.

PeerTube could wait until the upload is finished, and then pull the file from object storage for transcoding.
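A minimal sketch of that idea with the AWS SDK's request presigner; the function name, bucket, key, and expiry are illustrative, not an existing PeerTube API:

```ts
import { PutObjectCommand, S3Client } from '@aws-sdk/client-s3'
import { getSignedUrl } from '@aws-sdk/s3-request-presigner'

// Pre-sign a PUT URL the client can upload to directly,
// without proxying the bytes through the PeerTube server.
async function createUploadUrl (client: S3Client, bucket: string, key: string): Promise<string> {
  const command = new PutObjectCommand({ Bucket: bucket, Key: key })

  // Valid for one hour; the client PUTs the raw file body to this URL
  return getSignedUrl(client, command, { expiresIn: 3600 })
}
```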

@pingiun (Author) commented Aug 18, 2021

> In my comment I only talked about base_url, not endpoint. IMHO, when admins change the endpoint, they do not expect PeerTube to automatically consider that previously uploaded objects use the new endpoint, because they would need to migrate the old objects first; that's part of a migration process in which PeerTube should be included. It's the same if admins disable object storage at some point: we should still serve old files from object storage.
>
> However, the base_url should stay dynamic, because a proxy that just caches files in front of an S3 storage could easily be changed.

Of course, this makes sense.

I created the MR for documentation here: https://framagit.org/framasoft/peertube/documentation/-/merge_requests/98

@Chocobozzz (Owner) commented:
Perfect, I'll also update the MR when I have time. Thanks @pingiun
