[content-service] Refactor upload to GCS #9836

Merged
merged 6 commits into from
May 12, 2022

@aledbf (Member) commented May 6, 2022

Description

Remove chunk and CRC creation from the backup upload. This is one of the sources of CPU/RAM utilization in ws-daemon.

How to test

  • Check tests
cd test && \
go test -timeout 300s \
  -run ^TestObjectUpload$ github.com/gitpod-io/gitpod/content-service/pkg/storage -v

Or use Docker:

go test -timeout 300s \
  -run ^TestObjectUpload$ github.com/gitpod-io/gitpod/content-service/pkg/storage -v -args -with-docker
=== RUN   TestObjectUpload
=== RUN   TestObjectUpload/valid_1M_backup
 - - [06/May/2022:15:55:09 -0400] "POST /storage/v1/b?alt=json&prettyPrint=false&project=fake-project HTTP/1.1" 200 161
 - - [06/May/2022:15:55:09 -0400] "GET /storage/v1/b/gitpod-dev-user-fake-owner/o/workspaces%2Ffake-workspace%2Ffull.tar?alt=json&prettyPrint=false&projection=full HTTP/1.1" 404 59
 - - [06/May/2022:15:55:09 -0400] "POST /upload/storage/v1/b/gitpod-dev-user-fake-owner/o?alt=json&name=workspaces%2Ffake-workspace%2Ffull.tar&prettyPrint=false&projection=full&uploadType=multipart HTTP/1.1" 200 536
 - - [06/May/2022:15:55:09 -0400] "GET /gitpod-dev-user-fake-owner/workspaces/fake-workspace/full.tar HTTP/1.1" 200 1001984
=== RUN   TestObjectUpload/valid_100M_backup
 - - [06/May/2022:15:55:09 -0400] "POST /storage/v1/b?alt=json&prettyPrint=false&project=fake-project HTTP/1.1" 200 161
 - - [06/May/2022:15:55:10 -0400] "GET /storage/v1/b/gitpod-dev-user-fake-owner/o/workspaces%2Ffake-workspace%2Ffull.tar?alt=json&prettyPrint=false&projection=full HTTP/1.1" 404 59
 - - [06/May/2022:15:55:10 -0400] "POST /upload/storage/v1/b/gitpod-dev-user-fake-owner/o?alt=json&name=workspaces%2Ffake-workspace%2Ffull.tar&prettyPrint=false&projection=full&uploadType=resumable HTTP/1.1" 200 337
 - - [06/May/2022:15:55:10 -0400] "POST /upload/resumable/0a59d243db09ba73ba636e0dc4c23eb3 HTTP/1.1" 200 474
 - - [06/May/2022:15:55:10 -0400] "POST /upload/resumable/0a59d243db09ba73ba636e0dc4c23eb3 HTTP/1.1" 200 474
 - - [06/May/2022:15:55:10 -0400] "POST /upload/resumable/0a59d243db09ba73ba636e0dc4c23eb3 HTTP/1.1" 200 474
 - - [06/May/2022:15:55:10 -0400] "POST /upload/resumable/0a59d243db09ba73ba636e0dc4c23eb3 HTTP/1.1" 200 474
 - - [06/May/2022:15:55:10 -0400] "POST /upload/resumable/0a59d243db09ba73ba636e0dc4c23eb3 HTTP/1.1" 200 474
 - - [06/May/2022:15:55:10 -0400] "POST /upload/resumable/0a59d243db09ba73ba636e0dc4c23eb3 HTTP/1.1" 200 538
 - - [06/May/2022:15:55:10 -0400] "GET /gitpod-dev-user-fake-owner/workspaces/fake-workspace/full.tar HTTP/1.1" 200 100001792
--- PASS: TestObjectUpload (0.97s)
    --- PASS: TestObjectUpload/valid_1M_backup (0.02s)
    --- PASS: TestObjectUpload/valid_100M_backup (0.95s)
PASS
ok      github.com/gitpod-io/gitpod/content-service/pkg/storage 0.979s

Release Notes

[content-service] Refactor upload to GCS

@aledbf (Member Author) commented May 6, 2022

❯ go test -timeout 60s -run ^TestObjectUpload$ github.com/gitpod-io/gitpod/content-service/pkg/storage -count=1 -v
=== RUN   TestObjectUpload
=== RUN   TestObjectUpload/valid_1M_backup
 - - [06/May/2022:18:47:27 -0400] "POST /storage/v1/b?alt=json&prettyPrint=false&project=fake-project HTTP/1.1" 200 161
 - - [06/May/2022:18:47:27 -0400] "GET /storage/v1/b/gitpod-dev-user-fake-owner/o/workspaces%2Ffake-workspace%2Ffull.tar?alt=json&prettyPrint=false&projection=full HTTP/1.1" 404 59
 - - [06/May/2022:18:47:27 -0400] "POST /upload/storage/v1/b/gitpod-dev-user-fake-owner/o?alt=json&name=workspaces%2Ffake-workspace%2Ffull.tar&prettyPrint=false&projection=full&uploadType=multipart HTTP/1.1" 200 536
 - - [06/May/2022:18:47:27 -0400] "GET /gitpod-dev-user-fake-owner/workspaces/fake-workspace/full.tar HTTP/1.1" 200 1050112
=== RUN   TestObjectUpload/valid_100M_backup
 - - [06/May/2022:18:47:27 -0400] "POST /storage/v1/b?alt=json&prettyPrint=false&project=fake-project HTTP/1.1" 200 161
 - - [06/May/2022:18:47:29 -0400] "GET /storage/v1/b/gitpod-dev-user-fake-owner/o/workspaces%2Ffake-workspace%2Ffull.tar?alt=json&prettyPrint=false&projection=full HTTP/1.1" 404 59
 - - [06/May/2022:18:47:29 -0400] "POST /upload/storage/v1/b/gitpod-dev-user-fake-owner/o?alt=json&name=workspaces%2Ffake-workspace%2Ffull.tar&prettyPrint=false&projection=full&uploadType=resumable HTTP/1.1" 200 337
 - - [06/May/2022:18:47:29 -0400] "POST /upload/resumable/69e4d23e240f9dcb137c28eb4b9fe2ef HTTP/1.1" 200 474
 - - [06/May/2022:18:47:29 -0400] "POST /upload/resumable/69e4d23e240f9dcb137c28eb4b9fe2ef HTTP/1.1" 200 538
 - - [06/May/2022:18:47:29 -0400] "GET /gitpod-dev-user-fake-owner/workspaces/fake-workspace/full.tar HTTP/1.1" 200 104859136
=== RUN   TestObjectUpload/valid_1GB_backup
 - - [06/May/2022:18:47:30 -0400] "POST /storage/v1/b?alt=json&prettyPrint=false&project=fake-project HTTP/1.1" 200 160
 - - [06/May/2022:18:47:33 -0400] "GET /storage/v1/b/gitpod-dev-user-fake-owner/o/workspaces%2Ffake-workspace%2Ffull.tar?alt=json&prettyPrint=false&projection=full HTTP/1.1" 404 59
 - - [06/May/2022:18:47:33 -0400] "POST /upload/storage/v1/b/gitpod-dev-user-fake-owner/o?alt=json&name=workspaces%2Ffake-workspace%2Ffull.tar&prettyPrint=false&projection=full&uploadType=resumable HTTP/1.1" 200 337
 - - [06/May/2022:18:47:33 -0400] "POST /upload/resumable/4f9700cc631a2df2cae7cf8ad7c097a2 HTTP/1.1" 200 474
 - - [06/May/2022:18:47:33 -0400] "POST /upload/resumable/4f9700cc631a2df2cae7cf8ad7c097a2 HTTP/1.1" 200 474
 - - [06/May/2022:18:47:33 -0400] "POST /upload/resumable/4f9700cc631a2df2cae7cf8ad7c097a2 HTTP/1.1" 200 474
 - - [06/May/2022:18:47:33 -0400] "POST /upload/resumable/4f9700cc631a2df2cae7cf8ad7c097a2 HTTP/1.1" 200 474
 - - [06/May/2022:18:47:34 -0400] "POST /upload/resumable/4f9700cc631a2df2cae7cf8ad7c097a2 HTTP/1.1" 200 474
 - - [06/May/2022:18:47:34 -0400] "POST /upload/resumable/4f9700cc631a2df2cae7cf8ad7c097a2 HTTP/1.1" 200 474
 - - [06/May/2022:18:47:35 -0400] "POST /upload/resumable/4f9700cc631a2df2cae7cf8ad7c097a2 HTTP/1.1" 200 474
 - - [06/May/2022:18:47:35 -0400] "POST /upload/resumable/4f9700cc631a2df2cae7cf8ad7c097a2 HTTP/1.1" 200 474
 - - [06/May/2022:18:47:36 -0400] "POST /upload/resumable/4f9700cc631a2df2cae7cf8ad7c097a2 HTTP/1.1" 200 474
 - - [06/May/2022:18:47:37 -0400] "POST /upload/resumable/4f9700cc631a2df2cae7cf8ad7c097a2 HTTP/1.1" 200 474
 - - [06/May/2022:18:47:38 -0400] "POST /upload/resumable/4f9700cc631a2df2cae7cf8ad7c097a2 HTTP/1.1" 200 474
 - - [06/May/2022:18:47:39 -0400] "POST /upload/resumable/4f9700cc631a2df2cae7cf8ad7c097a2 HTTP/1.1" 200 474
 - - [06/May/2022:18:47:40 -0400] "POST /upload/resumable/4f9700cc631a2df2cae7cf8ad7c097a2 HTTP/1.1" 200 474
 - - [06/May/2022:18:47:41 -0400] "POST /upload/resumable/4f9700cc631a2df2cae7cf8ad7c097a2 HTTP/1.1" 200 474
 - - [06/May/2022:18:47:42 -0400] "POST /upload/resumable/4f9700cc631a2df2cae7cf8ad7c097a2 HTTP/1.1" 200 474
 - - [06/May/2022:18:47:43 -0400] "POST /upload/resumable/4f9700cc631a2df2cae7cf8ad7c097a2 HTTP/1.1" 200 474
 - - [06/May/2022:18:47:44 -0400] "POST /upload/resumable/4f9700cc631a2df2cae7cf8ad7c097a2 HTTP/1.1" 200 539
 - - [06/May/2022:18:47:45 -0400] "GET /gitpod-dev-user-fake-owner/workspaces/fake-workspace/full.tar HTTP/1.1" 200 1073743360
--- PASS: TestObjectUpload (19.70s)
    --- PASS: TestObjectUpload/valid_1M_backup (0.01s)
    --- PASS: TestObjectUpload/valid_100M_backup (2.41s)
    --- PASS: TestObjectUpload/valid_1GB_backup (17.27s)
PASS
ok  	github.com/gitpod-io/gitpod/content-service/pkg/storage	19.733s
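The resumable-upload log lines above line up with a fixed chunk size. The 100M and 1GB cases issue 2 and 17 `POST /upload/resumable/...` requests, which matches ceil(objectSize / 64 MiB), assuming the value referenced later in this thread is a 64 MiB chunk size. The earlier run's 6 requests for the 100M backup are likewise consistent with 16 MiB chunks. A small sketch of that arithmetic, with `chunkRequests` as a hypothetical helper and object sizes taken from the GET responses:

```go
package main

import "fmt"

// MiB is one mebibyte.
const MiB = 1 << 20

// chunkRequests returns how many chunk requests a resumable upload of size
// bytes needs when each request carries at most chunkSize bytes: a ceiling
// division. It ignores the session-creating request and any retries.
func chunkRequests(size, chunkSize int64) int64 {
	if size <= 0 {
		return 0
	}
	return (size + chunkSize - 1) / chunkSize
}

func main() {
	// Object sizes are taken from the GET responses in the test logs.
	fmt.Println(chunkRequests(100001792, 16*MiB))  // 6  (first run, 100M)
	fmt.Println(chunkRequests(104859136, 64*MiB))  // 2  (second run, 100M)
	fmt.Println(chunkRequests(1073743360, 64*MiB)) // 17 (second run, 1GB)
}
```

The 1GB object is 1536 bytes over 16 × 64 MiB, which is why a 17th, very small request appears at the end of that log.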

@aledbf force-pushed the aledbf/gcs-upload branch 4 times, most recently from c7f149c to c8c1a40 May 7, 2022 23:37
@aledbf marked this pull request as ready for review May 7, 2022 23:38
@aledbf requested a review from a team May 7, 2022 23:38
@aledbf requested a review from Furisto as a code owner May 7, 2022 23:38
@aledbf requested review from a team May 7, 2022 23:38
@aledbf requested review from csweichel and geropl as code owners May 7, 2022 23:38
@aledbf (Member Author) commented May 7, 2022

/hold

@github-actions bot added the team: IDE, team: delivery, and team: workspace labels May 7, 2022
@aledbf force-pushed the aledbf/gcs-upload branch from 7434300 to 4f5697d May 8, 2022 14:03
@csweichel (Contributor) left a comment

Took the liberty to push a commit which simplifies the content upload to GCloud further.

I'm not all that hopeful that this change will actually decrease memory use in practice, because of
[image]

If anything, the code that existed prior (albeit a ton more complicated) would give us some control over the amount of buffering we do.

@aledbf (Member Author) commented May 9, 2022

> Took the liberty to push a commit which simplifies the content upload to GCloud further.

That was on purpose :)
I want to ensure we run the upload in a dedicated goroutine and remove any doubt of interference with other parts of ws-daemon

@aledbf (Member Author) commented May 9, 2022

> If anything, the code that existed prior (albeit a ton more complicated) would give us some control over the amount of buffering we do.

but we are setting a value https://github.com/gitpod-io/gitpod/pull/9836/files#diff-c42db953f8fdf32db52a02ec07b01ac4193206a1fe63ffa86201a31dc7e83bd8R374

@csweichel (Contributor) commented May 9, 2022

> That was on purpose :)
> I want to ensure we run the upload in a dedicated goroutine and remove any doubt of interference with other parts of ws-daemon

Ooops. Dropped that commit again.

@csweichel force-pushed the aledbf/gcs-upload branch from bd91014 to 4f5697d May 9, 2022 18:54
@csweichel (Contributor) commented

> > If anything, the code that existed prior (albeit a ton more complicated) would give us some control over the amount of buffering we do.
>
> but we are setting a value https://github.com/gitpod-io/gitpod/pull/9836/files#diff-c42db953f8fdf32db52a02ec07b01ac4193206a1fe63ffa86201a31dc7e83bd8R374

We are indeed - good point. Max memory usage this should produce then is numberOfWorkspaces * 64MiB.
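As a back-of-the-envelope check of that bound: if each concurrent upload buffers at most one 64 MiB chunk, peak upload buffering on a node grows linearly with the number of workspaces backing up at once. The sketch below (`maxUploadBuffer` and the sample workspace counts are hypothetical) only encodes that multiplication; it bounds upload buffers, not total ws-daemon memory.

```go
package main

import "fmt"

// uploadChunkSize is the per-upload buffer assumed in the comment above.
const uploadChunkSize int64 = 64 << 20 // 64 MiB

// maxUploadBuffer returns the worst-case bytes held in upload buffers when
// `workspaces` backups upload concurrently and each buffers one full chunk.
func maxUploadBuffer(workspaces int) int64 {
	return int64(workspaces) * uploadChunkSize
}

func main() {
	for _, n := range []int{5, 10, 30} {
		fmt.Printf("%d workspaces -> %d MiB\n", n, maxUploadBuffer(n)>>20)
	}
	// 5 workspaces -> 320 MiB
	// 10 workspaces -> 640 MiB
	// 30 workspaces -> 1920 MiB
}
```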

@aledbf (Member Author) commented May 9, 2022

> Max memory usage this should produce then is numberOfWorkspaces * 64MiB.

I can't get it to use more than 200MB after the refactoring (I even tried uploading 10GB).

@csweichel (Contributor) left a comment

🚀

@akosyakov (Member) left a comment

lgtm

@aledbf force-pushed the aledbf/gcs-upload branch from 4f5697d to 3af1791 May 10, 2022 12:56
@aledbf force-pushed the aledbf/gcs-upload branch from 3af1791 to 4fb08c0 May 10, 2022 13:55
@geropl (Member) commented May 10, 2022

@aledbf I bet you considered this, but: What's the trade-off in terms of upload speed?

@aledbf (Member Author) commented May 10, 2022

/werft run

👍 started the job as gitpod-build-aledbf-gcs-upload.18
(with .werft/ from main)

@aledbf (Member Author) commented May 10, 2022

> What's the trade-off in terms of upload speed?

Good question. Right now we lose nodes due to memory utilization. That said, I don't find the current behavior particularly fast either.

[Screenshot from 2022-05-10 11-11-52]
[Screenshot from 2022-05-10 11-11-31]

(last 7 hours)

@aledbf force-pushed the aledbf/gcs-upload branch from 4fb08c0 to 19cef8d May 10, 2022 15:35
@geropl (Member) left a comment

Code changes LGTM, did not test. 👍

@roboquat merged commit bbacba4 into main May 12, 2022
@roboquat deleted the aledbf/gcs-upload branch May 12, 2022 11:09
@roboquat added the deployed: IDE label May 17, 2022
Labels
deployed: IDE, release-note, size/XXL, team: delivery, team: IDE, team: workspace

7 participants