Allow to configure Spark configuration for metatables #447

Closed
yruslan opened this issue Jul 30, 2024 · 0 comments · Fixed by #452
Labels
enhancement New feature or request

Comments

@yruslan
Collaborator

yruslan commented Jul 30, 2024

Background

Sometimes the S3 'magic' committer needs to be used for some tables in the metastore, while other tables should be written using the default Spark configuration.

It would be helpful if a custom Spark configuration could be specified for individual tables in the metastore.

The original Spark configuration should be restored after the write.

Feature

Allow specifying a custom Spark configuration for individual metastore tables.

Example

pramen.metastore {
  tables = [
    {
      name = "my_table1"
      format = "parquet"
      path = "s3://bucket1/path1"
    },
    {
      name = "my_table2"
      format = "parquet"
      path = "s3a://bucket2/path2"
      spark.conf {
        spark.sql.sources.commitProtocolClass = "org.apache.spark.internal.io.cloud.PathOutputCommitProtocol"
        spark.sql.parquet.output.committer.class = "org.apache.spark.internal.io.cloud.BindingParquetOutputCommitter"
      }
    }
    }
  ]
}
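
Below is a minimal sketch (not the actual Pramen implementation) of how the spark.conf sub-config of a table definition could be flattened into plain Spark options, assuming Typesafe Config is used to read the HOCON file; the function name sparkConfOverrides is hypothetical.

import com.typesafe.config.Config

// Scala 2.13+; older versions would use scala.collection.JavaConverters instead.
import scala.jdk.CollectionConverters._

// Flattens the optional `spark.conf` sub-config of a metastore table definition
// into a Map of Spark options. Nested HOCON keys such as
// spark.sql.sources.commitProtocolClass come back from entrySet() as
// fully-qualified dotted paths, which is the form Spark expects.
def sparkConfOverrides(tableConfig: Config): Map[String, String] = {
  if (tableConfig.hasPath("spark.conf")) {
    val conf = tableConfig.getConfig("spark.conf")
    conf.entrySet().asScala
      .map(entry => entry.getKey -> conf.getString(entry.getKey))
      .toMap
  } else {
    Map.empty[String, String]
  }
}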

Proposed Solution

  • Add the ability to specify a per-table Spark configuration in the metastore table definition
  • Make sure the original Spark configuration is restored after the write (see the sketch below)
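
A minimal sketch of the apply-and-restore behavior described above, not the actual Pramen implementation; the helper names withSparkConfOverrides and writeTable are hypothetical.

import org.apache.spark.sql.{DataFrame, SparkSession}

// Applies the per-table overrides, runs the write, and restores the previous
// values (or unsets keys that were not set before) in a finally block, so a
// failed write still leaves the session configuration untouched.
def withSparkConfOverrides[T](spark: SparkSession, overrides: Map[String, String])(action: => T): T = {
  val previous: Map[String, Option[String]] =
    overrides.keys.map(key => key -> spark.conf.getOption(key)).toMap

  overrides.foreach { case (key, value) => spark.conf.set(key, value) }
  try {
    action
  } finally {
    previous.foreach {
      case (key, Some(value)) => spark.conf.set(key, value)
      case (key, None)        => spark.conf.unset(key)
    }
  }
}

// Example: write a table with the 'magic' committer settings from its metastore
// definition, then fall back to the default configuration for subsequent writes.
def writeTable(spark: SparkSession, df: DataFrame, path: String, overrides: Map[String, String]): Unit = {
  withSparkConfOverrides(spark, overrides) {
    df.write.mode("overwrite").parquet(path)
  }
}

This pattern covers SQL-level settings such as the commit protocol class, which Spark reads from the session configuration at write time.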