Implementation of a Primary/Archive setup for object storage. #13397
Conversation
Not an exhaustive review, by any means, but looks like a reasonable approach to start with.
There's a little bit of risk of disk exhaustion if too many tasks run concurrently on a worker and download too many files before they can be closed out and removed, but I guess we'll have other alarms go off if that happens. We have disk alarms, right?
I don't see the "reconciliation" task in here yet - are you imagining this automated, or potentially an Admin page button to help with the occasional manual overrides?
In regular operation it's not really any more concerning than our current exposure on the upload nodes (which write the uploaded file out to temporary storage) unless we're in a "catchup" scenario.
I'm imagining a weekly (ish) automated run with reporting via metrics.
I hope to get that done rapidly.
OK, this is ready for full review. The metric from https://github.com/pypi/warehouse/pull/13397/files#diff-d1bab9cf44a4f8cead347679b4d5efc2949fe9a7cbaf97a381e26d90c08dff2dR47-R58, combined with the logs from pypi/infra@2b101c5...6c0e47f, should be enough to give us visibility into failover, at least to start. I plan to postpone implementing the auto reconciliation task for the time being, as I am 1) exhausted, 2) out of time for this, and 3) interested to see what failure modes arise before trying to imagine them.
Wow, this is looking awesome!
I had one non-blocking question - if it works, it works! 😁
This implements a Primary/Archive setup for our object storage.
Uploads are written synchronously to the Primary object store (Backblaze B2 as the target) and then copied to the Archive object store via an asynchronous task.
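
As a rough illustration of that flow, here is a minimal sketch assuming a simple storage interface with `get()`/`store()` methods; the function name, parameters, and interface calls below are illustrative and not necessarily the identifiers used in this PR.

```python
import shutil
import tempfile


def archive_file(db, primary, archive, file_record):
    """Copy one file from the Primary store to the Archive store, then mark it archived."""
    # NOTE: `primary`, `archive`, and `file_record` are hypothetical stand-ins for
    # the project's storage services and File row; the interface is assumed.
    with tempfile.NamedTemporaryFile() as tmp:
        # Download the object from the Primary store (e.g. Backblaze B2) to local disk.
        with primary.get(file_record.path) as src:
            shutil.copyfileobj(src, tmp)
        tmp.flush()

        # Re-upload the temporary copy to the Archive object store.
        archive.store(file_record.path, tmp.name)

    # Record that the file now exists in the Archive bucket.
    file_record.archived = True
    db.add(file_record)
```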
A task is implemented here that reports the number of files not yet archived, as tracked by the new database column File.archived.
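
A hedged sketch of what that reporting task could look like, assuming a SQLAlchemy session, the `File` model with its new `archived` column, and a statsd-style metrics client exposing `gauge()`; the metric name here is illustrative, not necessarily the one used in the PR.

```python
from warehouse.packaging.models import File  # File.archived is the new column


def report_unarchived_files(db, metrics):
    """Emit a gauge of how many files have not yet been copied to the Archive store."""
    unarchived = (
        db.query(File)
        .filter(File.archived == False)  # noqa: E712 -- SQLAlchemy column expression
        .count()
    )
    metrics.gauge("warehouse.packaging.files.not_archived", unarchived)
```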
Additionally, our CDN configuration is updated in pypi/infra@2b101c5...6c0e47f to report a log event to our metrics provider whenever a file is fetched from Archive storage rather than Primary. This will be useful for determining whether we missed any files during migration, and in the long run for catching missed archive tasks.
One notable piece missing from the design is a task to automatically reconcile state between the two buckets. I plan to use the visibility from the task and logs to design something that handles real failure modes, rather than trying to preempt them.