Middleware to shard objects across a configurable number of buckets #351

Closed
timuralp opened this issue Mar 31, 2021 · 3 comments

Comments

@timuralp
Collaborator

Some object store providers impose rate limits or perform poorly once the number of objects in a bucket exceeds a certain threshold. It would be great to have a middleware in S3Proxy that shards content across a configurable number of buckets and presents a single virtual bucket to the end user.

@timuralp
Collaborator Author

A scheme like this would depend on hashing the object name and placing it in the appropriate shard. There are a few notable issues with this middleware:

  1. reconfiguring the number of shards would result in existing data being inaccessible
  2. creating the shards as part of create container may be expensive, but lazy-creating them could fail otherwise successful requests (or if the bucket creation takes time, some requests would fail while the shard bucket is created)
  3. unclear how to reshard existing virtual buckets or move to this scheme with an existing bucket
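
To illustrate the first issue, here is a minimal sketch of name-based shard selection. It is written in Python for brevity rather than in S3Proxy's Java, and the SHA-256 hash, the `shard_index` and `moved_fraction` helpers, and the sample object names are all assumptions made for the example, not the scheme the middleware uses.

```python
import hashlib


def shard_index(object_name: str, num_shards: int) -> int:
    """Map an object name to a shard index in [0, num_shards)."""
    digest = hashlib.sha256(object_name.encode("utf-8")).digest()
    # Take the first 8 bytes as an unsigned integer so the mapping is stable
    # across processes (Python's built-in hash() is randomized per run).
    return int.from_bytes(digest[:8], "big") % num_shards


def moved_fraction(names, old_shards: int, new_shards: int) -> float:
    """Fraction of names whose shard changes when the shard count changes."""
    moved = sum(
        1
        for name in names
        if shard_index(name, old_shards) != shard_index(name, new_shards)
    )
    return moved / len(names)


if __name__ == "__main__":
    names = [f"object-{i}" for i in range(1000)]
    # Going from 8 to 9 shards relocates most names, so a lookup under the
    # new configuration would miss most of the existing objects.
    print(moved_fraction(names, 8, 9))
```

With plain modulo placement, changing the shard count from 8 to 9 is expected to remap roughly eight out of nine names, which is why reconfiguration makes existing data unreachable without a migration step.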

@gaul
Owner

gaul commented Mar 31, 2021

Resharding is technically and operationally difficult. Perhaps you could defer it to an external mechanism, for example requiring users to duplicate objects into twice as many buckets? This would require careful use of the hash function so that the mapping works. I believe the Swift ring only supports a fixed range of growth, e.g., starting with 256 virtual buckets in 8 physical buckets that can only grow to the virtual size. You might want to read about extendible hashing or more sophisticated approaches.
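
One way the "twice as many buckets" idea could work, sketched in Python under the assumption that placement is a hash of the name modulo the shard count: when the count doubles from N to 2N, an object in old shard i can only land in new shard i or i + N. Copying each old shard into those two new shards would therefore make every object reachable. The hash choice and the helper names below are hypothetical.

```python
import hashlib


def name_hash(object_name: str) -> int:
    """Stable 64-bit hash of an object name (SHA-256 is an assumption)."""
    digest = hashlib.sha256(object_name.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def shard_index(object_name: str, num_shards: int) -> int:
    return name_hash(object_name) % num_shards


def doubled_targets(old_index: int, old_shards: int) -> tuple:
    """New shard indices that can hold objects from one old shard
    after the shard count doubles."""
    return (old_index, old_index + old_shards)


if __name__ == "__main__":
    old_shards = 8
    for i in range(1000):
        name = f"object-{i}"
        old = shard_index(name, old_shards)
        new = shard_index(name, 2 * old_shards)
        # h % 2N equals either h % N or h % N + N, so this always holds.
        assert new in doubled_targets(old, old_shards)
    print("every object stays within its two candidate shards")
```

The property relies on modulo arithmetic, so it would not hold for an arbitrary change in shard count or for a placement function that is not a simple modulus.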

If you want to support listing buckets in lexicographical order it will limit your ability to distribute objects evenly between buckets. Maybe you can just raise HTTP 501 NotImplemented and distribute using a hash of the name instead?

I worry that creating many buckets could trigger rate limiting on some providers. It might make more sense to defer this to an external program to provision.

I recommend putting a superblock or other kind of metadata in each physical bucket that encodes the overall virtual scheme, e.g., number of buckets, hashing scheme, and a version number.
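
A superblock along these lines could be a small JSON document stored under a reserved key in every physical bucket. The sketch below is only an assumption about its shape: the field names, the JSON encoding, and the `check_consistent` helper are hypothetical and are not taken from S3Proxy.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Superblock:
    version: int        # format version of the superblock itself
    num_shards: int     # number of physical buckets behind the virtual bucket
    hash_scheme: str    # identifier of the placement function, e.g. "sha256-mod"


def encode(superblock: Superblock) -> bytes:
    """Serialize the superblock to bytes suitable for storing as an object."""
    return json.dumps(asdict(superblock), sort_keys=True).encode("utf-8")


def decode(data: bytes) -> Superblock:
    """Parse bytes previously produced by encode()."""
    return Superblock(**json.loads(data.decode("utf-8")))


def check_consistent(superblocks: list) -> Superblock:
    """Verify that all shards agree on the scheme and none are missing."""
    if not superblocks:
        raise ValueError("no superblocks found")
    if len(set(superblocks)) != 1:
        raise ValueError("shards disagree about the virtual bucket layout")
    if len(superblocks) != superblocks[0].num_shards:
        raise ValueError("number of shards found does not match the superblock")
    return superblocks[0]


if __name__ == "__main__":
    sb = Superblock(version=1, num_shards=4, hash_scheme="sha256-mod")
    copies = [decode(encode(sb)) for _ in range(sb.num_shards)]
    print(check_consistent(copies))
```

Reading the superblock from each shard at startup would let the proxy refuse to serve a virtual bucket whose configuration no longer matches the data on disk.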

@timuralp
Collaborator Author

timuralp commented Apr 8, 2021

Great points!

> Resharding is technically and operationally difficult. Perhaps you could defer it to an external mechanism, for example requiring users to duplicate objects into twice as many buckets?

Agreed. I will defer the resharding to the consumer. I think the way this could work is by setting up a second "virtual" bucket with the new desired number of shards and doing server side copy from one "virtual bucket" to the other.

> If you want to support listing buckets in lexicographical order it will limit your ability to distribute objects evenly between buckets. Maybe you can just raise HTTP 501 NotImplemented and distribute using a hash of the name instead?

What do you think about listing all the shards in parallel and stitching the responses? This would amplify the number of requests by the number of shards, which may be undesirable.
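
A rough sketch of the parallel-list-and-stitch idea, in Python and against a fake in-memory backend. It assumes each shard returns its keys already sorted, ignores pagination and markers, and uses a hypothetical `list_shard` callable in place of a real blobstore call.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def list_virtual_bucket(shards, list_shard):
    """List every shard concurrently and merge the sorted results.

    shards: names of the physical buckets.
    list_shard: callable returning the sorted keys of one physical bucket.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(shards))) as pool:
        # One listing request per shard: this is the request amplification.
        listings = list(pool.map(list_shard, shards))
    # heapq.merge performs a k-way merge of already sorted inputs.
    return list(heapq.merge(*listings))


if __name__ == "__main__":
    # Fake backend standing in for real buckets.
    backend = {
        "photos-0": ["a.jpg", "d.jpg"],
        "photos-1": ["b.jpg", "e.jpg"],
        "photos-2": ["c.jpg"],
    }
    print(list_virtual_bucket(sorted(backend), lambda shard: backend[shard]))
```

A real implementation would also have to merge paginated responses, which means tracking a continuation point per shard for each page returned to the client.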

> I worry that creating many buckets could trigger rate limiting on some providers. It might make more sense to defer this to an external program to provision.

I think both could be possible: CreateContainer could attempt to create the buckets in parallel, but nothing would preclude the end user from creating the shard buckets themselves, as long as the naming scheme is documented.

> I recommend putting a superblock or other kind of metadata in each physical bucket that encodes the overall virtual scheme, e.g., number of buckets, hashing scheme, and a version number.

Good idea.

timuralp added a commit to timuralp/s3proxy that referenced this issue Apr 8, 2021
Adds the sharded bucket middleware, which allows for splitting objects
across multiple backend buckets for a given virtual bucket. The
middleware should be configured as:
s3proxy.sharded-blobstore.<bucket name>.shards=<number of shards>
s3proxy.sharded-blobstore.<bucket name>.prefix=<prefix>.

All shards are named <prefix>-<index>, where index is an
integer from 0 to <number of shards> - 1. If the <prefix> is not
supplied, the <bucket name> is used as the prefix.

Listing the virtual bucket and multipart uploads are not supported. When
listing all containers, the shards are elided from the result.

Fixes gaul#325
Fixes gaul#351
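
For readers provisioning the shard buckets themselves, the naming rule in the commit message can be sketched as follows. This is a Python illustration, not the middleware's Java code; the `shard_names` helper and the example bucket names are hypothetical, while the property keys follow the format given above.

```python
PROPERTY_PREFIX = "s3proxy.sharded-blobstore."


def shard_names(properties: dict, bucket: str) -> list:
    """Return the physical bucket names for one virtual bucket.

    Shards are named <prefix>-<index> for index 0 .. shards - 1, and the
    prefix falls back to the virtual bucket name when it is not configured.
    """
    shards = int(properties[f"{PROPERTY_PREFIX}{bucket}.shards"])
    prefix = properties.get(f"{PROPERTY_PREFIX}{bucket}.prefix", bucket)
    return [f"{prefix}-{index}" for index in range(shards)]


if __name__ == "__main__":
    properties = {
        "s3proxy.sharded-blobstore.photos.shards": "3",
        "s3proxy.sharded-blobstore.logs.shards": "2",
        "s3proxy.sharded-blobstore.logs.prefix": "logs-shard",
    }
    print(shard_names(properties, "photos"))
    print(shard_names(properties, "logs"))
```
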
timuralp added a commit to timuralp/s3proxy that referenced this issue Apr 8, 2021
timuralp added a commit to timuralp/s3proxy that referenced this issue May 3, 2021
timuralp added a commit to timuralp/s3proxy that referenced this issue May 8, 2021
timuralp added a commit to timuralp/s3proxy that referenced this issue May 8, 2021
@gaul gaul closed this as completed in 0d8f9aa Jun 5, 2021