Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change FileFormat to depends on TaskContext rather than SessionState #5843

Closed
wants to merge 1 commit into from

Conversation

alamb
Copy link
Contributor

@alamb alamb commented Apr 3, 2023

Which issue does this PR close?

Part of #1754

Rationale for this change

I am trying to extract the physical_plan code into its own crate; and to do so I need to remove the circular dependencies between core --> datasource --> execution --> datasource

Since SessionState is staying in datafusion core crate, in order to pull out datasource into its own crate (and only depend on datafusion_execution) it can not depend on SessionState

Conveniently, TaskContext is designed to be exactly the part of SessionState needed for execution

See more details in #1754 (comment)

What changes are included in this PR?

  1. Change FileFormat to depends on TaskContext rather than SessionState

Are these changes tested?

Covered by existing tests

Are there any user-facing changes?

Technically the FileFormat / ListingTable is a public API; However, I don't think the FileFormat is widely used outside of DataFusion so I expect this to have minimal impact on users

I expect the change of TableProvider to use TaskContext to be the most disruptive and I am saving it for last.

@alamb alamb marked this pull request as draft April 3, 2023 11:34
@github-actions github-actions bot added the core Core DataFusion crate label Apr 3, 2023
@alamb alamb marked this pull request as ready for review April 3, 2023 11:41
@alamb alamb marked this pull request as draft April 3, 2023 11:41
@@ -48,7 +48,7 @@ impl FileFormat for AvroFormat {

async fn infer_schema(
&self,
_state: &SessionState,
_task_ctx: &TaskContext,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The point of this PR is to change the SessionState references in these traits to TaskContext

@@ -1176,7 +1178,14 @@ impl QueryPlanner for DefaultQueryPlanner {
}
}

/// Execution context for registering data sources and executing queries
/// Holds state needed for planning queries within the context of a Session.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tried to clarify SessionState vs TaskContext

@tustvold
Copy link
Contributor

tustvold commented Apr 3, 2023

Perhaps I am missing something here, but given FileFormat and the closely related ListingTable are part of query planning, and not execution, I'm not sure why they would depend on TaskContext which is the execution state?

@alamb
Copy link
Contributor Author

alamb commented Apr 4, 2023

Perhaps I am missing something here, but given FileFormat and the closely related ListingTable are part of query planning, and not execution, I'm not sure why they would depend on TaskContext which is the execution state?

The update here is that @tustvold and I talked through this, and I think we were using different terms / ideas of what "planning" and "execution" mean.

I will attempt to add some more docs to make this clearer

@alamb
Copy link
Contributor Author

alamb commented Jun 12, 2023

I am taking a different direction for now, so closing this PR -- see #1754 for latest plan

@alamb alamb closed this Jun 12, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants