@xylaaaaa xylaaaaa commented Dec 31, 2025

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:
Currently, Doris supports multiple Iceberg catalog types (HMS, REST, Hadoop, Glue, DLF, S3Tables) but lacks support for JDBC Catalog. This PR adds support for Iceberg JDBC Catalog, which allows users to store Iceberg metadata in relational databases like PostgreSQL, MySQL, and SQLite.

Key Changes:

  • Added IcebergJdbcMetaStoreProperties class to handle JDBC catalog configurations
  • Added IcebergJdbcExternalCatalog class for JDBC catalog operations
  • Integrated JDBC catalog with existing Iceberg framework (factory registration, Gson serialization)
  • Added support for all storage systems (S3, HDFS, OSS, etc.) with JDBC catalog
  • Added proper S3FileIO configuration for S3-compatible storage

Benefits:

  • Provides an alternative metadata storage option using relational databases
  • Supports database transactions for better concurrency control
  • Easier deployment compared to HMS for users without existing Hive infrastructure
  • Better integration with existing database infrastructure

Release note

Features

  • [Iceberg] Support Iceberg JDBC Catalog for metadata storage in relational databases (PostgreSQL, MySQL, SQLite)

Check List (For Author)

  • Test
    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason

Manual Test Steps:

  1. Set up a PostgreSQL instance with a dedicated user and database
  2. Create a JDBC catalog with a PostgreSQL backend and S3 storage
  3. Test database and table operations (CREATE DATABASE/TABLE, INSERT, SELECT)
  4. Verify that metadata is correctly stored in PostgreSQL tables
  5. Verify that data files are correctly written to S3 storage
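
As a rough illustration of step 2, the properties that eventually reach Iceberg for a PostgreSQL-backed JDBC catalog might look like the map built below. The keys `catalog-impl`, `uri`, `warehouse`, `jdbc.user`, and `jdbc.password` are Iceberg's own catalog property names. The class name, host, credentials, and bucket are placeholders, and the Doris-side `CREATE CATALOG` property names introduced by this PR are not shown because they are not listed in this description.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class JdbcCatalogPropertiesSketch {

    // Returns true when the value is null, empty, or whitespace only.
    private static boolean isBlank(String value) {
        return value == null || value.trim().isEmpty();
    }

    // Builds Iceberg-level properties for a JDBC catalog.
    // Assumption: the URI and warehouse location are treated as mandatory here.
    static Map<String, String> buildProperties(String jdbcUri, String user,
            String password, String warehouse) {
        if (isBlank(jdbcUri)) {
            throw new IllegalArgumentException("uri is required");
        }
        if (isBlank(warehouse)) {
            throw new IllegalArgumentException("warehouse is required");
        }
        Map<String, String> props = new LinkedHashMap<>();
        props.put("catalog-impl", "org.apache.iceberg.jdbc.JdbcCatalog");
        props.put("uri", jdbcUri);
        props.put("warehouse", warehouse);
        // Credentials are optional; properties under the "jdbc." prefix are
        // forwarded to the JDBC driver by Iceberg's JdbcCatalog.
        if (!isBlank(user)) {
            props.put("jdbc.user", user);
        }
        if (!isBlank(password)) {
            props.put("jdbc.password", password);
        }
        return props;
    }

    public static void main(String[] args) {
        // All values below are placeholders, not taken from the PR.
        Map<String, String> props = buildProperties(
                "jdbc:postgresql://localhost:5432/iceberg_meta",
                "iceberg_user",
                "example-password",
                "s3://example-bucket/warehouse");
        // Mask the password when printing.
        props.forEach((k, v) ->
                System.out.println(k + " = " + ("jdbc.password".equals(k) ? "****" : v)));
    }
}
```

After creating the catalog this way, steps 4 and 5 amount to checking that the PostgreSQL database behind `uri` holds the catalog tables and that data files appear under the `warehouse` location.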
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

Copilot AI review requested due to automatic review settings December 31, 2025 02:24
@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

Copilot AI left a comment

Pull request overview

This PR adds support for Iceberg JDBC Catalog, enabling metadata storage in relational databases (PostgreSQL, MySQL, SQLite) as an alternative to HMS or other catalog types. The implementation follows the existing pattern used by other Iceberg catalog types and integrates seamlessly with Doris's catalog framework.

Key Changes:

  • Added JDBC catalog type with comprehensive property handling and S3-compatible storage support
  • Integrated JDBC catalog into factory registration and serialization infrastructure
  • Added unit tests for property validation and JDBC-specific configuration passthrough

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| IcebergJdbcMetaStoreProperties.java | Core implementation handling JDBC catalog properties, URI configuration, and storage integration |
| IcebergJdbcExternalCatalog.java | External catalog class following standard pattern for JDBC catalog type |
| IcebergJdbcMetaStorePropertiesTest.java | Unit tests validating property handling, passthrough, and required field checks |
| IcebergPropertiesFactory.java | Registered "jdbc" catalog type in the factory |
| IcebergExternalCatalogFactory.java | Added JDBC catalog creation in the factory switch statement |
| GsonUtils.java | Registered IcebergJdbcExternalCatalog for JSON serialization |
| IcebergScanNode.java | Added JDBC catalog type to supported scan sources |
| IcebergExternalCatalog.java | Added ICEBERG_JDBC constant definition |

Comment on lines 157 to 191
private static void toFileIOProperties(List<StorageProperties> storagePropertiesList,
        Map<String, String> fileIOProperties, Configuration conf) {
    for (StorageProperties storageProperties : storagePropertiesList) {
        if (storageProperties instanceof AbstractS3CompatibleProperties) {
            toS3FileIOProperties((AbstractS3CompatibleProperties) storageProperties, fileIOProperties);
        } else if (storageProperties.getHadoopStorageConfig() != null) {
            conf.addResource(storageProperties.getHadoopStorageConfig());
        }
    }
}

private static void toS3FileIOProperties(AbstractS3CompatibleProperties s3Properties,
        Map<String, String> options) {
    // Set S3FileIO as the FileIO implementation for S3-compatible storage
    options.put(CatalogProperties.FILE_IO_IMPL, "org.apache.iceberg.aws.s3.S3FileIO");

    if (StringUtils.isNotBlank(s3Properties.getEndpoint())) {
        options.put(S3FileIOProperties.ENDPOINT, s3Properties.getEndpoint());
    }
    if (StringUtils.isNotBlank(s3Properties.getUsePathStyle())) {
        options.put(S3FileIOProperties.PATH_STYLE_ACCESS, s3Properties.getUsePathStyle());
    }
    if (StringUtils.isNotBlank(s3Properties.getRegion())) {
        options.put(AwsClientProperties.CLIENT_REGION, s3Properties.getRegion());
    }
    if (StringUtils.isNotBlank(s3Properties.getAccessKey())) {
        options.put(S3FileIOProperties.ACCESS_KEY_ID, s3Properties.getAccessKey());
    }
    if (StringUtils.isNotBlank(s3Properties.getSecretKey())) {
        options.put(S3FileIOProperties.SECRET_ACCESS_KEY, s3Properties.getSecretKey());
    }
    if (StringUtils.isNotBlank(s3Properties.getSessionToken())) {
        options.put(S3FileIOProperties.SESSION_TOKEN, s3Properties.getSessionToken());
    }
}

Copilot AI Dec 31, 2025

The methods toFileIOProperties and toS3FileIOProperties are duplicated from IcebergRestProperties with only minor differences (the problematic FILE_IO_IMPL line). Consider extracting these common methods to the AbstractIcebergProperties base class to eliminate code duplication and ensure consistency across all Iceberg catalog implementations. This would make maintenance easier and prevent similar issues in the future.
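
A minimal sketch of what that extraction could look like is shown below. It is self-contained and uses hypothetical stand-ins (`S3Props`, `BaseIcebergProps`, `RestProps`, `JdbcProps`) rather than the real Doris classes; the string keys are the literal values behind Iceberg's `S3FileIOProperties` and `AwsClientProperties` constants used in the diff. The sketch deliberately leaves out the `FILE_IO_IMPL` entry, which is discussed separately in this review.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SharedFileIoSketch {

    /** Hypothetical stand-in for Doris's AbstractS3CompatibleProperties. */
    static final class S3Props {
        final String endpoint;
        final String region;
        final String accessKey;
        final String secretKey;

        S3Props(String endpoint, String region, String accessKey, String secretKey) {
            this.endpoint = endpoint;
            this.region = region;
            this.accessKey = accessKey;
            this.secretKey = secretKey;
        }
    }

    /** Hypothetical stand-in for AbstractIcebergProperties; owns the shared mapping. */
    abstract static class BaseIcebergProps {

        // Copies a value into the options map only when it is non-blank.
        private static void putIfNotBlank(Map<String, String> options, String key, String value) {
            if (value != null && !value.trim().isEmpty()) {
                options.put(key, value);
            }
        }

        // Single shared implementation, so every catalog type maps S3 settings identically.
        protected static void toS3FileIOProperties(S3Props s3, Map<String, String> options) {
            putIfNotBlank(options, "s3.endpoint", s3.endpoint);
            putIfNotBlank(options, "client.region", s3.region);
            putIfNotBlank(options, "s3.access-key-id", s3.accessKey);
            putIfNotBlank(options, "s3.secret-access-key", s3.secretKey);
        }

        // Builds the FileIO-related options for this catalog.
        Map<String, String> fileIoOptions(S3Props s3) {
            Map<String, String> options = new LinkedHashMap<>();
            toS3FileIOProperties(s3, options);
            return options;
        }
    }

    /** Hypothetical REST catalog properties; inherits the mapping unchanged. */
    static final class RestProps extends BaseIcebergProps { }

    /** Hypothetical JDBC catalog properties; inherits the same mapping. */
    static final class JdbcProps extends BaseIcebergProps { }

    public static void main(String[] args) {
        // Placeholder values for illustration only.
        S3Props s3 = new S3Props("http://localhost:9000", "us-east-1", "example-ak", "");
        System.out.println(new RestProps().fileIoOptions(s3).keySet());
        System.out.println(new JdbcProps().fileIoOptions(s3).keySet());
    }
}
```

With the helper in one place, a catalog-specific difference such as an explicit FileIO choice would have to be added deliberately in a subclass instead of drifting in through a copied method.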

Comment on lines 170 to 172
// Set S3FileIO as the FileIO implementation for S3-compatible storage
options.put(CatalogProperties.FILE_IO_IMPL, "org.apache.iceberg.aws.s3.S3FileIO");

Copilot AI Dec 31, 2025

Setting FILE_IO_IMPL unconditionally to S3FileIO when S3 storage is detected may break functionality for non-S3 storage types. Looking at IcebergRestProperties, the toS3FileIOProperties method does not set FILE_IO_IMPL and relies on Iceberg's automatic FileIO selection via CatalogUtil.buildIcebergCatalog. This line should be removed to maintain consistency with other catalog implementations and allow Iceberg to handle FileIO selection automatically based on the storage properties provided.

Suggested change (remove the following lines):
// Set S3FileIO as the FileIO implementation for S3-compatible storage
options.put(CatalogProperties.FILE_IO_IMPL, "org.apache.iceberg.aws.s3.S3FileIO");
