Refactor storage #643
Conversation
src/storage/s3.rs (Outdated)

```rust
pub(super) fn store_batch(&self, batch: &[Blob]) -> Result<(), Error> {
    let mut rt = tokio::runtime::Runtime::new().unwrap();
```
Is creating a new tokio runtime every time we store files the best way to do things? Tokio already relies on fairly global state, so an existing runtime could just be reused.
Without creating the runtime, how would I use `block_on`? Or would I use `await` instead?
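For context, one way to follow the reviewer's suggestion is to create a single runtime lazily and share it across calls. This is only a sketch, assuming a `once_cell` dependency and a tokio version whose `Runtime::block_on` takes `&self` (tokio 1.x; older versions required `&mut self`); the types below are stand-ins, not the PR's actual code:

```rust
use once_cell::sync::Lazy;
use tokio::runtime::Runtime;

// One runtime shared by every storage call, created lazily on first use.
static RUNTIME: Lazy<Runtime> = Lazy::new(|| {
    Runtime::new().expect("failed to create tokio runtime")
});

// Stand-ins for the PR's types.
struct Blob;
struct Error;
struct S3Backend;

impl S3Backend {
    fn store_batch(&self, batch: &[Blob]) -> Result<(), Error> {
        // Reuse the shared runtime instead of constructing a new one per call.
        RUNTIME.block_on(async {
            let _ = batch; // the real code would drive the uploads here
            Ok(())
        })
    }
}
```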
Status update: this has most of the code changes I want, but it's missing any tests for S3, which kind of defeats the point. It sounds like rusoto_mock is not going to be flexible enough for our needs, so the next best thing is minio. We have minio set up with docker-compose already, but it needs to be tied into the test suite somehow, probably by having […]. Given the scale of that task, I plan to make it a separate PR so it's not lost in the details of the storage refactor. This should also help with testing on other platforms, since developers will no longer need a local postgres database.
Status update to the status update: it ended up being really easy to add min.io, so I just did it here. Note that this will require contributors to have min.io running before running tests; they can do this the same way as on CI: […]
The general structure of the code looks good, and I left some small feedback on how to organize it.
This PR is already too big, but in a followup I'd love to see a change to how the storage is initialized: instead of having a `Storage::new()` that picks on its own between S3 and the database, there should be individual constructors for database storage and S3 storage, and the choice of which to instantiate should be moved to the CLI. Then, each function that interacts with the storage should receive its own `&Storage`, and tests should call `env.storage()` or `env.storage_s3()` (preferring the former) to get the prepopulated instance.
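As a rough illustration of that suggestion (all names and shapes here are hypothetical, not from the PR):

```rust
// Hypothetical sketch of per-backend constructors with the choice made by
// the caller; the real crate's types will differ.
pub struct DatabaseBackend { /* connection pool, etc. */ }
pub struct S3Backend { /* client and bucket, etc. */ }

pub enum Storage {
    Database(DatabaseBackend),
    S3(S3Backend),
}

impl Storage {
    // One explicit constructor per backend; neither picks on its own.
    pub fn new_database(backend: DatabaseBackend) -> Self {
        Storage::Database(backend)
    }

    pub fn new_s3(backend: S3Backend) -> Self {
        Storage::S3(backend)
    }
}

// The choice between backends moves out to the CLI / startup code:
fn build_storage(use_s3: bool) -> Storage {
    if use_s3 {
        Storage::new_s3(S3Backend {})
    } else {
        Storage::new_database(DatabaseBackend {})
    }
}
```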
The only difference was between `pub(crate)` and private, so probably not worth it.
Co-Authored-By: Chase Wilson <buckshot1233@gmail.com>
This was only used for a one-time migration and is no longer necessary.
`join_all` polls every future when any of them wakes up; `FuturesOrdered` has fancier tracking of which one triggered the wakeup, so it only polls what it needs to.
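A minimal sketch of using `FuturesOrdered` from the `futures` crate for this; `Blob`, `Error`, and `upload_one` are illustrative stand-ins, not the PR's code:

```rust
use futures::stream::{FuturesOrdered, StreamExt};

// Stand-ins for the PR's real types.
struct Blob;
struct Error;

// Hypothetical per-blob upload; stands in for the real S3 PutObject call.
async fn upload_one(_blob: Blob) -> Result<(), Error> {
    Ok(())
}

async fn upload_batch(batch: Vec<Blob>) -> Result<(), Error> {
    // Unlike join_all, which re-polls every future whenever any of them is
    // woken, FuturesOrdered polls only the future that triggered the wakeup.
    let mut uploads: FuturesOrdered<_> = batch.into_iter().map(upload_one).collect();
    while let Some(result) = uploads.next().await {
        result?;
    }
    Ok(())
}
```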
This could definitely use some more work, but I thought I'd get an initial draft up to see if this is the right approach. Thanks to @pietroalbini for doing the initial work :)

- `S3Backend` and `DatabaseBackend` instead of selecting each backend at runtime for each file.
- Split `add_path_into_database` into 3 functions (sketched after this list):
  - `add_path_into_database`, so I don't have to change a bunch of unrelated code. This just delegates to `store_all` and then turns the returned HashMap into JSON to be stored in the DB.
  - `store_all`, which takes the folder to upload and what path to upload it to. `store_all` reads all the files in the folder into memory, then divides them into batches of size `MAX_CONCURRENT_UPLOADS` and forwards those batches on to `store_batch`.
  - `store_batch`, which differs based on whether we're uploading to S3 or storing in the database.
- `min.io` and Postgres using the same temporary tables we've been using.
- `move_to_s3` function.
- `tokio::Runtime` for each batch of files.

I tried to reuse code wherever possible, so @Mark-Simulacrum's hard work on simultaneous uploads is still here.
However, I don't yet have tests for simultaneous uploads.
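A rough sketch of that three-function split; every type, signature, helper, and the batch-size value below is an illustrative stand-in, not the PR's exact code:

```rust
use std::path::{Path, PathBuf};

// Stand-ins for the PR's real types.
struct Blob { path: PathBuf, content: Vec<u8> }
struct Error;
enum Storage { S3, Database }

const MAX_CONCURRENT_UPLOADS: usize = 1000; // illustrative value

/// Thin wrapper kept so unrelated call sites don't change: delegate to
/// store_all, then serialize the returned file list into JSON for the DB.
fn add_path_into_database(storage: &Storage, prefix: &str, path: &Path) -> Result<String, Error> {
    let files = storage.store_all(prefix, path)?;
    Ok(file_list_to_json(files))
}

impl Storage {
    /// Read every file under `root` into memory, then forward them to
    /// store_batch in chunks of MAX_CONCURRENT_UPLOADS.
    fn store_all(&self, prefix: &str, root: &Path) -> Result<Vec<PathBuf>, Error> {
        let blobs = read_all_files(prefix, root)?; // hypothetical helper
        for batch in blobs.chunks(MAX_CONCURRENT_UPLOADS) {
            self.store_batch(batch)?;
        }
        Ok(blobs.into_iter().map(|b| b.path).collect())
    }

    /// Backend-specific: upload to S3 or insert into Postgres.
    fn store_batch(&self, batch: &[Blob]) -> Result<(), Error> {
        match self {
            Storage::S3 => { /* concurrent PutObject calls */ let _ = batch; Ok(()) }
            Storage::Database => { /* INSERT INTO files ... */ let _ = batch; Ok(()) }
        }
    }
}

fn read_all_files(_prefix: &str, _root: &Path) -> Result<Vec<Blob>, Error> {
    Ok(Vec::new()) // directory walk omitted
}

fn file_list_to_json(files: Vec<PathBuf>) -> String {
    format!("{:?}", files) // real code would use serde_json
}
```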