-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support for sub-directories in object stores #472
Conversation
d0b4959
to
5fb7156
Compare
src/config/context.rs
Outdated
@@ -85,6 +84,8 @@ pub fn build_object_store( | |||
bucket, | |||
cache_properties, | |||
}) => { | |||
// Bucket from the config may contain some path so extract the proper bucket | |||
let bucket = bucket.split('/').next().unwrap_or(bucket); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went with overloading the bucket config param so that it can also hold a path to stay backward-compatible.
But I think the preferable way to do this is to ditch the type
and bucket
fields and replace them with a location
field which can store all the relevant info (i.e. type, bucket and optionally path) at the expense of breaking backward-compatibility.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could keep the bucket
and have a separate prefix
field that is optional and defaults to no prefix where it doesn't exist?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought that a separate prefix
field wasn't worth the increased config complexity, whereas the location
approach felt more consolidated than either alternative.
To me it seems like it's natural to think about the full url location
as one thing from the user perspective, and leave the parsing of the individual details (type, bucket, path) to Seafowl. This would also neatly get rid of the endpoint
field on S3, as this is covered by location
too. Moreover, currently we don't support passing the bucket/path in the endpoint
, though in principle there's no reason not to. GlareDB uses the location
+ storage options map approach and does away with config file completely due to the conciseness of this approach, though this wouldn't work for Seafowl as we have a bunch of other configurations as well.
To clarify, I definitely do prefer a separate path
(or prefix
) field over overloading the bucket
field, I just think location
would be even better from a usability standpoint. Let me know what are your preferences.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, I made the change to use the prefix
field for now as that's good enough for a merge I believe, but we can revisit whether it makes sense to go with the location
approach later on.
src/config/context.rs
Outdated
@@ -85,6 +84,8 @@ pub fn build_object_store( | |||
bucket, | |||
cache_properties, | |||
}) => { | |||
// Bucket from the config may contain some path so extract the proper bucket | |||
let bucket = bucket.split('/').next().unwrap_or(bucket); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could keep the bucket
and have a separate prefix
field that is optional and defaults to no prefix where it doesn't exist?
// In principle for this test we could use any object store since we only exercise the | ||
// prefix/log store uri logic |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you're worried about this, you could break out table_prefix
into a separate function that operates on (schema::ObjectStore, Url) -> Url
and test it without having to create an object store (don't know how much it's worth it).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not really worried, intended more like a clarification. The test also exercises the log store uri, which does need some object store.
src/object_store/wrapped.rs
Outdated
// In case when there's an endpoint specified, the bucket is not the host but the | ||
// first element in the path, so we need to discard it when creating a prefix. | ||
let mut segments = | ||
self.root_uri.path_segments().unwrap().collect::<Vec<_>>(); | ||
|
||
// Add the last path segment which is the table UUID | ||
let uuid_str = table_uuid.to_string(); | ||
segments.push(&uuid_str); | ||
|
||
// Remove the first path segment which is actually the bucket | ||
segments.remove(0); | ||
Path::from(segments.join("/")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe having a prefix
as a separate config field could simplify this logic or avoid having to do this on every query.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, in this case it would.
ObjectStoreType::InMemory, | ||
ObjectStoreType::Local, | ||
ObjectStoreType::S3(None), | ||
ObjectStoreType::S3(Some("/path/to/folder")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this work with or without the leading slash?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, works in both cases, the test_table_location_s3
actually covers that.
c28c067
to
ae09ec3
Compare
…g the bucket field
This will allow configuring a specific path alongside a bucket where to store the data in.
Closes #470.