Allow to configure Spark configuration for metatables #447

Closed
yruslan opened this issue Jul 30, 2024 · 0 comments · Fixed by #452
Labels
enhancement New feature or request

Comments

@yruslan
Collaborator

yruslan commented Jul 30, 2024

Background

Sometimes the S3 'magic' committer needs to be used for some tables in the metastore, while other tables should be written using the default Spark configuration.

It would be helpful if a custom Spark configuration could be specified for individual tables in the metastore.

The original Spark configuration should be restored after the write.

Feature

Allow specifying a custom Spark configuration for individual metastore tables.

Example

pramen.metastore {
  tables = [
    {
      name = "my_table1"
      format = "parquet"
      path = "s3://bucket1/path1"
    },
    {
      name = "my_table2"
      format = "parquet"
      path = "s3a://bucket2/path2"
      spark.conf {
        spark.sql.sources.commitProtocolClass = "org.apache.spark.internal.io.cloud.PathOutputCommitProtocol"
        spark.sql.parquet.output.committer.class = "org.apache.spark.internal.io.cloud.BindingParquetOutputCommitter"
      }
    }
    }
  ]
}
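
Below is a minimal sketch (not the actual Pramen implementation) of how the spark.conf sub-config of a table definition could be flattened into plain Spark options, assuming Typesafe Config is used to read the HOCON file; the function name sparkConfOverrides is hypothetical.

import com.typesafe.config.Config

// Scala 2.13+; older versions would use scala.collection.JavaConverters instead.
import scala.jdk.CollectionConverters._

// Flattens the optional `spark.conf` sub-config of a metastore table definition
// into a Map of Spark options. Nested HOCON keys such as
// spark.sql.sources.commitProtocolClass come back from entrySet() as
// fully-qualified dotted paths, which is the form Spark expects.
def sparkConfOverrides(tableConfig: Config): Map[String, String] = {
  if (tableConfig.hasPath("spark.conf")) {
    val conf = tableConfig.getConfig("spark.conf")
    conf.entrySet().asScala
      .map(entry => entry.getKey -> conf.getString(entry.getKey))
      .toMap
  } else {
    Map.empty[String, String]
  }
}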

Proposed Solution

  • Add the ability to specify a per-table Spark configuration in the metastore table definition
  • Make sure the original Spark configuration is restored after the write (see the sketch below)
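
A minimal sketch of the apply-and-restore behavior described above, not the actual Pramen implementation; the helper names withSparkConfOverrides and writeTable are hypothetical.

import org.apache.spark.sql.{DataFrame, SparkSession}

// Applies the per-table overrides, runs the write, and restores the previous
// values (or unsets keys that were not set before) in a finally block, so a
// failed write still leaves the session configuration untouched.
def withSparkConfOverrides[T](spark: SparkSession, overrides: Map[String, String])(action: => T): T = {
  val previous: Map[String, Option[String]] =
    overrides.keys.map(key => key -> spark.conf.getOption(key)).toMap

  overrides.foreach { case (key, value) => spark.conf.set(key, value) }
  try {
    action
  } finally {
    previous.foreach {
      case (key, Some(value)) => spark.conf.set(key, value)
      case (key, None)        => spark.conf.unset(key)
    }
  }
}

// Example: write a table with the 'magic' committer settings from its metastore
// definition, then fall back to the default configuration for subsequent writes.
def writeTable(spark: SparkSession, df: DataFrame, path: String, overrides: Map[String, String]): Unit = {
  withSparkConfOverrides(spark, overrides) {
    df.write.mode("overwrite").parquet(path)
  }
}

This pattern covers SQL-level settings such as the commit protocol class, which Spark reads from the session configuration at write time.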