-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Decouple FileFormat from datafusion_data_access #2572
Conversation
file_meta: local::local_unpartitioned_file(file), | ||
partition_values: vec![], | ||
range: None, | ||
impl From<FileMeta> for PartitionedFile { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This not only makes for reduced verbosity, but avoids a confusing situation where there are two local_unpartitioned_file, one which returns FileMeta and one which returns PartitionedFile
) | ||
.await?; | ||
Ok(exec) | ||
get_exec_format(&format, &store_root, file_name, projection, limit).await |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a drive-by-fix to extract the common scanning logic that was duplicated for each format
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(for other readers, it is common logic to all tests)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes look reasonable to me but I am not very familiar with this part of the code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me; I had a question about &dyn ObjectStore
vs &Arc<dyn ObjectStore>
Also since several other systems are starting to use the object store, and the parquet reader, it might make sense for someone from those projects to take a quick look and make sure this doesn't mess up what they are doing.
async fn infer_schema(&self, readers: ObjectReaderStream) -> Result<SchemaRef>; | ||
async fn infer_schema( | ||
&self, | ||
store: &dyn ObjectStore, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
store: &dyn ObjectStore, | |
store: &Arc<dyn ObjectStore>, |
Seems reasonable to me |
cc @richox |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM pending a small conflict to resolve and switching back to &Arc<dyn ObjectStore>
👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's 🚀
.github/pull_request_template.md
Outdated
@@ -25,3 +25,23 @@ If there are user-facing changes then we may require documentation to be updated | |||
<!-- | |||
If there are any breaking changes to public APIs, please add the `api change` label. | |||
--> | |||
|
|||
# Does this PR break compatibility with Ballista? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like it doesn't belong in the PR 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's from the merge commit where ballista was removed
Which issue does this PR close?
Part of #2489
Rationale for this change
The IOx ObjectStore does not have an equivalent notion of ObjectReader, decoupling FileFormat from such notions is therefore a necessary precondition of any migration. I also think it makes the API significantly easier to use
What changes are included in this PR?
Reworks FileFormat to be in terms of FileMeta and ObjectStore, instead of ObjectReader and ObjectReaderStream. These concepts translate cleanly, with FileMeta -> ObjectMeta.
Are there any user-facing changes?
Yes, FileFormat is a public API