Implement DynamicTableProvider in DataFusion Core #10986

Closed
goldmedal opened this issue Jun 18, 2024 · 4 comments · Fixed by #11035
@goldmedal
Contributor

Is your feature request related to a problem or challenge?

I had some discussions with @alamb about supporting a dynamic file data source (e.g., select ... from 'data.parquet', as in #4805) in the core, as mentioned in #4850 (comment). However, after #10745 we found that it's not a good idea to move so many dependencies (e.g., S3-related ones) into the core crate.

Describe the solution you'd like

As @alamb proposed in #10745 (comment), we can focus first on the logic that interprets table names as potential object store locations. Implement a struct DynamicTableProvider and a trait called UrlLookup to get ObjectStore at runtime.

struct DynamicTableProvider {
  // ...
  /// Callback used to look up the object store for a given URL
  url_lookup: Arc<dyn UrlLookup>,
}

/// Trait for looking up the correct object store instance based on URL
pub trait UrlLookup {
  fn lookup(&self, url: &Url) -> Result<Arc<dyn ObjectStore>>;
}

By default, DynamicTableProvider only supports querying local file paths like file:///.... The implementation of dynamic file queries in datafusion-cli might also be based on DynamicTableProvider but will load the common object storage dependency by default.
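
To make the "local files only by default" idea concrete, here is a minimal, dependency-free sketch of what a default lookup could look like. It is not the implementation from the PR: `ObjectStore` and `LocalStore` below are stand-ins for the real `object_store` types, the URL is passed as `&str` instead of `&Url`, the error type is a plain `String`, and `LocalOnlyLookup` is a hypothetical name.

```rust
use std::fmt::Debug;
use std::sync::Arc;

/// Stand-in for `object_store::ObjectStore` (assumption: trimmed to one method).
pub trait ObjectStore: Debug + Send + Sync {
    fn name(&self) -> &str;
}

/// Stand-in for a local filesystem store (hypothetical).
#[derive(Debug)]
pub struct LocalStore;

impl ObjectStore for LocalStore {
    fn name(&self) -> &str {
        "local"
    }
}

/// Same shape as the proposed trait, with `&str` in place of `&Url`
/// and `String` as the error type so the sketch needs no external crates.
pub trait UrlLookup: Send + Sync {
    fn lookup(&self, url: &str) -> Result<Arc<dyn ObjectStore>, String>;
}

/// Hypothetical default lookup: only `file://` URLs resolve to a store.
pub struct LocalOnlyLookup;

impl UrlLookup for LocalOnlyLookup {
    fn lookup(&self, url: &str) -> Result<Arc<dyn ObjectStore>, String> {
        if url.starts_with("file://") {
            Ok(Arc::new(LocalStore))
        } else {
            // Remote schemes (s3://, gs://, ...) would be handled by a
            // different UrlLookup, e.g. one supplied by datafusion-cli.
            Err(format!("unsupported URL scheme in '{url}'"))
        }
    }
}
```

With this shape, the core crate would only ever see `Arc<dyn UrlLookup>`, so a caller such as datafusion-cli could plug in a lookup that knows about remote stores without the core crate depending on them.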

Describe alternatives you've considered

No response

Additional context

No response

@goldmedal goldmedal added the enhancement New feature or request label Jun 18, 2024
@goldmedal
Contributor Author

take

@alamb
Contributor

alamb commented Jun 18, 2024

Thank you @goldmedal

@goldmedal
Contributor Author

Hi @alamb,

I created a draft PR for this issue in #11035. After some experiments, I think passing only ObjectStore isn't enough for creating a TableProvider at runtime. We need to build the schema from a full SessionState.

Although there are many issues that need to be fixed, could you take a look at this PR to check if this idea makes sense when you're available?

Thanks.

@goldmedal
Contributor Author

I have finished the PR, but I think two follow-up issues need to be filed:
