Support SymlinkTextInputFormat FileFormat #2208

Open
tustvold opened this issue Apr 12, 2022 · 0 comments
Labels
enhancement New feature or request

Comments

@tustvold
Contributor

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

Hive-compatible metastores, such as AWS Glue (#2206), do not store the individual files within a partition, and instead rely on listing the files in object storage at query time.

This becomes problematic when interacting with data that is either:

  • Not partitioned in the way that Hive expects
  • Managed by a system that rewrites data, leaving Parquet files behind that no longer form part of the most recent snapshot (e.g. Delta Lake / IOx)

Describe the solution you'd like

Much like we currently support a FileFormat of CSV or Parquet, I would like to support a FileFormat of SymlinkTextInputFormat. This is just a newline-delimited list of files, stored in object storage alongside a table or partition.
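As a rough illustration of how little there is to the format itself, here is a minimal sketch of reading such a manifest. It assumes one target path or URI per line and that blank lines can be ignored; `parse_symlink_manifest` is a hypothetical helper, not an existing DataFusion API.

```rust
/// Parse the contents of a symlink manifest into the list of target files.
///
/// Assumptions (not confirmed against the Hive implementation): each
/// non-empty line is one file path or URI, and surrounding whitespace
/// is not significant.
fn parse_symlink_manifest(contents: &str) -> Vec<String> {
    contents
        .lines()                          // split on \n (also handles \r\n)
        .map(str::trim)                   // drop stray whitespace around each entry
        .filter(|line| !line.is_empty())  // skip blank lines
        .map(str::to_owned)
        .collect()
}
```

The resulting list would then stand in for the object store listing that would otherwise happen at query time.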

The best documentation for this functionality I can find is here, and there is documentation here on how it is used to enable interoperation between Presto and Delta Lake.

I'm not entirely sure how the query engine determines the format of the symlink targets, but I guess it must use the file suffix?
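If the suffix guess is right, the detection could look something like the sketch below. This is only an illustration of that guess, not how Hive or Presto are known to behave; `GuessedFormat` and `guess_format` are hypothetical names.

```rust
/// Formats this sketch knows how to recognise (hypothetical type).
#[derive(Debug, PartialEq)]
enum GuessedFormat {
    Csv,
    Parquet,
    Unknown,
}

/// Guess the format of a symlink target from its file suffix.
fn guess_format(path: &str) -> GuessedFormat {
    // Only look at the final path segment, so dots in directory
    // names are not mistaken for an extension.
    let file_name = path.rsplit('/').next().unwrap_or(path);
    match file_name.rsplit_once('.') {
        Some((_, ext)) if ext.eq_ignore_ascii_case("parquet") => GuessedFormat::Parquet,
        Some((_, ext)) if ext.eq_ignore_ascii_case("csv") => GuessedFormat::Csv,
        // No extension, or one we do not recognise.
        _ => GuessedFormat::Unknown,
    }
}
```

An alternative would be to take the target format from the table definition rather than from the file names, which would avoid relying on suffixes at all.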

Describe alternatives you've considered

We could choose not to support this.

Additional context

I am not hugely familiar with the precise inner workings of the Hive ecosystem, as I've only interacted with tooling that uses it under the hood. I could therefore be mistaken on some aspects; if so, please feel free to correct me 😄

@tustvold tustvold added the enhancement New feature or request label Apr 12, 2022