Remove get_scan_files and ExecutionPlan::file_scan_config (#7357) #7487
Looks good.
Let's get @not-my-profile's view on that, i.e. if he has a good reason to keep this API.
The question is if @yahoNanJing's motivation still applies:
As I said in #7425 (comment) I'm not familiar enough with datafusion to determine that.
I think that #7485 is a decent solution if we decide to keep the function.
It appears that
It appears to only be used as an argument for https://github.com/apache/arrow-ballista/blob/948d0777f972144cc242d0398fd61fadf34cec73/ballista/scheduler/src/cluster/mod.rs#L679, which seems like it could definitely be achieved in a different way.
I looked more into the ballista code and I think the files are used for more than just their file count. Specifically the URL is used here:
I am thinking, however, that we could effectively port get_files_for_scan into Ballista... Let me give that a try
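For illustration only, here is a minimal sketch of what a helper of this kind could look like once it lives outside DataFusion. Everything in it is an assumption: `PlanNode`, `ScanFile`, and `collect_scan_files` are hypothetical toy types and names, not DataFusion's `ExecutionPlan` or Ballista's actual code. It only shows the general idea of walking a plan tree and gathering the file groups (including URLs) from each scan leaf.

```rust
// Toy stand-ins for a physical plan tree. These are NOT DataFusion types;
// they exist only to illustrate the traversal.
#[derive(Debug, Clone, PartialEq)]
struct ScanFile {
    url: String,
    size_bytes: u64,
}

enum PlanNode {
    /// Leaf that reads files, grouped by partition.
    FileScan { file_groups: Vec<Vec<ScanFile>> },
    /// Any other operator; it only forwards to its children.
    Other { children: Vec<PlanNode> },
}

/// Walk the plan depth-first (left to right) and collect the file groups
/// of every scan leaf, one entry per scan.
fn collect_scan_files(plan: &PlanNode) -> Vec<Vec<Vec<ScanFile>>> {
    let mut out = Vec::new();
    let mut stack = vec![plan];
    while let Some(node) = stack.pop() {
        match node {
            PlanNode::FileScan { file_groups } => out.push(file_groups.clone()),
            // Push children in reverse so the leftmost child is popped first.
            PlanNode::Other { children } => stack.extend(children.iter().rev()),
        }
    }
    out
}
```

A caller that needs more than the file count (for example the file URLs) can read them from the returned groups without the plan trait itself exposing any scan-specific method.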
I moved this API into Ballista apache/datafusion-ballista#877 so I think it is ok to remove from DataFusion. @yahoNanJing please advise if there are reasons to avoid this
Which issue does this PR close?
Closes #7357
Rationale for this change
This was added by @yahoNanJing in #5572. It was then evolved in #7175 to try to avoid strongly coupling to AvroExec. However, this is causing issues trying to split apart the crates (#7357) and it is unclear how best to adapt the design #7425 (comment).
Given that this API does not appear to be used, I wonder if we can do the simplest possible thing and just remove it.
What changes are included in this PR?
Removes the API
Are these changes tested?
Are there any user-facing changes?