[WIP] Make ListingTableUrl allow direct construction #12981
Thank you @OussamaSaoudi -- this looks great. I have a small concern about this causing a regression for certain users. Let me know what you think.
where
    P: AsRef<str>,
{
impl DataFilePaths for Vec<&str> {
I think the previous implementation would have allowed Vec<String> (and anything else that implemented AsRef<str>) as well, and after this change Vec<String> would no longer work.
I wonder if we could change the signature to something like
impl<P> DataFilePaths for Vec<P>
where
    P: Into<ListingTableUrl>,
which should still work for anything that allows AsRef<str> but also allow Vec<ListingTableUrl> 🤔
Hmm, I think this may fail since AsRef<str> doesn't imply Into<ListingTableUrl>. It has to go through ListingTableUrl::parse, which is fallible.
I'll give it some more thought and play with the types. Perhaps I'll try to change how SessionContext::read_parquet handles AsRef<str> and ListingTableUrl.
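To make the point about fallibility concrete, here is a minimal sketch. TableUrl, ParseError and this DataFilePaths trait are simplified, hypothetical stand-ins written for illustration, not DataFusion's real types. It shows a blanket impl over AsRef<str> that has to return a Result because each string goes through a fallible parse, which an Into bound could not express.

```rust
// Hypothetical, simplified stand-ins; these are NOT DataFusion's real types.
#[derive(Debug, Clone, PartialEq)]
pub struct TableUrl(pub String);

#[derive(Debug, PartialEq)]
pub struct ParseError(pub String);

impl TableUrl {
    /// Fallible constructor, standing in for `ListingTableUrl::parse`.
    /// The toy rule here (reject empty input) is an assumption for the demo.
    pub fn parse(s: &str) -> Result<Self, ParseError> {
        if s.is_empty() {
            Err(ParseError("empty path".to_string()))
        } else {
            Ok(TableUrl(s.to_string()))
        }
    }
}

/// Stand-in for `DataFilePaths`: conversion can fail, so it returns `Result`.
pub trait DataFilePaths {
    fn to_urls(self) -> Result<Vec<TableUrl>, ParseError>;
}

/// Blanket impl over `AsRef<str>`: covers `Vec<&str>` and `Vec<String>` alike,
/// but every element must pass through the fallible `parse`.
impl<P: AsRef<str>> DataFilePaths for Vec<P> {
    fn to_urls(self) -> Result<Vec<TableUrl>, ParseError> {
        self.iter().map(|p| TableUrl::parse(p.as_ref())).collect()
    }
}
```

With this shape both vec!["a.parquet"] and vec![String::from("a.parquet")] are accepted, and the first element that fails to parse turns the whole call into an Err.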
Setting back to draft for now
Thanks @OussamaSaoudi
I'm not sure there's a clean way to implement this change without breaking existing code. Here are a few things I considered:
- I briefly considered changing the as_str impl of ListingTableChanges to make it succeed with a parse, but this is both a breaking change and pretty silly.
- I tried a blanket implementation using TryInto, with ListingTableUrl::parse as the TryInto<ListingTableUrl> impl. However, I hit some issues:
impl<P> DataFilePaths for Vec<P>
where P: TryInto<ListingTableUrl, Error=DataFusionError>
The problem is the associated type for TryInto. A single type (e.g. ListingTableUrl) can implement TryInto<ListingTableUrl, Error=Infallible>, or TryInto<ListingTableUrl, Error=DataFusionError> (e.g. through parse). This of course leads to conflicting types for my blanket impl approach.
- I looked through SessionContext::read_parquet. We can't change the use of DataFilePaths::to_urls() since that's baked into the API:
pub async fn read_parquet<P: DataFilePaths>(
&self,
table_paths: P,
options: ParquetReadOptions<'_>,
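The TryInto point in the second bullet can be sketched with toy types. TableUrl, ParseError, to_urls and identity below are hypothetical names used only for illustration, not DataFusion code. The sketch relies on the standard library's blanket impls, which already give every type a reflexive TryInto<Self> with Error = Infallible, so a bound that fixes the error type to something else accepts strings but not the URL type itself.

```rust
use std::convert::{Infallible, TryFrom, TryInto};

// Hypothetical, simplified stand-ins; these are NOT DataFusion's real types.
#[derive(Debug, Clone, PartialEq)]
pub struct TableUrl(pub String);

#[derive(Debug, PartialEq)]
pub struct ParseError(pub String);

/// Fallible string conversion, standing in for a parse-based `TryInto` impl.
impl TryFrom<&str> for TableUrl {
    type Error = ParseError;

    fn try_from(s: &str) -> Result<Self, Self::Error> {
        if s.is_empty() {
            Err(ParseError("empty path".to_string()))
        } else {
            Ok(TableUrl(s.to_string()))
        }
    }
}

/// A bound with a fixed error type, mirroring the attempted blanket impl.
pub fn to_urls<P>(paths: Vec<P>) -> Result<Vec<TableUrl>, ParseError>
where
    P: TryInto<TableUrl, Error = ParseError>,
{
    paths.into_iter().map(|p| p.try_into()).collect()
}

/// The reflexive conversion comes from std's blanket impls and its error
/// type is `Infallible`, not `ParseError`.
pub fn identity(url: TableUrl) -> Result<TableUrl, Infallible> {
    url.try_into()
}

// Note: `to_urls(vec![TableUrl("a".to_string())])` would not compile here,
// because `<TableUrl as TryInto<TableUrl>>::Error` is `Infallible`.
```

In this toy model to_urls(vec!["a.parquet"]) type-checks, while a Vec<TableUrl> is rejected by the Error = ParseError bound, which is one way to read the mismatch described above.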
I would've preferred to separate the concerns of parsing and fetching parquet. An approach that parses strings into ListingTableUrl before calling read_parquet would be my preferred solution.
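A rough sketch of what that separation could look like, again with hypothetical stand-ins (TableUrl, ParseError, parse_all and read_tables are illustrative names, not DataFusion's API): the caller parses strings once, handles the error there, and the reader only ever sees already-parsed URLs.

```rust
// Hypothetical, simplified stand-ins; these are NOT DataFusion's real types.
#[derive(Debug, Clone, PartialEq)]
pub struct TableUrl(pub String);

#[derive(Debug, PartialEq)]
pub struct ParseError(pub String);

impl TableUrl {
    /// Fallible constructor, standing in for `ListingTableUrl::parse`.
    pub fn parse(s: &str) -> Result<Self, ParseError> {
        if s.is_empty() {
            Err(ParseError("empty path".to_string()))
        } else {
            Ok(TableUrl(s.to_string()))
        }
    }
}

/// Caller-side step: parse every string up front and stop at the first error.
pub fn parse_all<S: AsRef<str>>(paths: &[S]) -> Result<Vec<TableUrl>, ParseError> {
    paths.iter().map(|p| TableUrl::parse(p.as_ref())).collect()
}

/// Reader-side step: accepts only already-parsed URLs, so it never re-parses.
/// It returns the number of URLs it was given as a placeholder for real work.
pub fn read_tables(urls: &[TableUrl]) -> usize {
    urls.len()
}
```

A caller would write something like `let urls = parse_all(&["a.parquet", "b.parquet"])?;` followed by `read_tables(&urls)`, and a caller that already holds parsed URLs skips the first step entirely.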
If DataFusion is not looking for breaking changes (understandably), I can go ahead and close this PR :)
Which issue does this PR close?
Closes #12581
Rationale for this change
Users of datafusion may have object_store paths. However, the current read_parquet forces users to format their paths as a string, only for it to be re-parsed into the original paths. Here I provide a way for datafusion users to simply pass an object_store path that they want to use.
What changes are included in this PR?
This PR allows users to construct ListingTableUrl directly with a Path instead of getting the path by parsing the Url path. This is useful when the user already has an object_store::path::Path, and doesn't want to convert their path to a string only for it to be parsed again in try_new.
This PR also changes the impl<P> for DataFilePaths where P: AsRef<str>. Previously, passing a Vec<ListingTableUrl> to read_parquet would convert each ListingTableUrl to &str and parse it back to Vec<ListingTableUrl>. This causes issues with escape characters. Instead, I split the trait impl into cases for Vec<&str> and Vec<ListingTableUrl>.
Are these changes tested?
I add one test to make sure that passing Vec<ListingTableUrl> works as expected, and does not cause errors due to escape characters.
Are there any user-facing changes?