11 changes: 11 additions & 0 deletions docs/lakehouse/catalogs/hive-catalog.mdx
@@ -474,6 +474,7 @@ Hive transactional tables are supported from version 3.x onwards. For details, r
'glue.secret_key' = '<sk>'
);
```

When Glue service authentication information differs from S3 authentication information, you can specify S3 authentication information separately in the following way.
```sql
CREATE CATALOG hive_glue_on_s3_catalog PROPERTIES (
@@ -489,6 +490,16 @@ Hive transactional tables are supported from version 3.x onwards. For details, r
's3.secret_key' = '<sk>'
);
```

Use an IAM Assumed Role to obtain S3 access credentials (since 3.1.2):
```sql
CREATE CATALOG `glue_hive_iamrole` PROPERTIES (
'type' = 'hms',
'hive.metastore.type' = 'glue',
'glue.region' = 'us-east-1',
'glue.endpoint' = 'https://glue.us-east-1.amazonaws.com',
'glue.role_arn' = '<role_arn>'
);
```
</TabItem>
</Tabs>
</details>
50 changes: 50 additions & 0 deletions docs/lakehouse/catalogs/iceberg-catalog.mdx
@@ -137,6 +137,43 @@ The current Iceberg dependency is version 1.6.1, which is compatible with higher
>
> You can check whether the source type has timezone information in the Extra column of the `DESCRIBE table_name` statement. If it shows `WITH_TIMEZONE`, it indicates that the source type is a timezone-aware type. (Supported since 3.1.0).

## Namespace Mapping

Iceberg's metadata hierarchy is Catalog -> Namespace -> Table. A Namespace can have multiple levels (Nested Namespace).

```
      ┌─────────┐
      │ Catalog │
      └────┬────┘
     ┌─────┴─────┐
  ┌──▼──┐     ┌──▼──┐
  │ NS1 │     │ NS2 │
  └──┬──┘     └──┬──┘
     │           │
┌────▼───┐    ┌──▼──┐
│ Table1 │    │ NS3 │
└────────┘    └──┬──┘
          ┌──────┴───────┐
     ┌────▼───┐     ┌────▼───┐
     │ Table2 │     │ Table3 │
     └────────┘     └────────┘
```


Starting from version 3.1.2, Doris supports Nested Namespace mapping for Iceberg Rest Catalog.

In the above example, tables will be mapped to Doris metadata according to the following logic:

| Catalog | Database | Table |
| --- | --- | --- |
| Catalog | NS1 | Table1 |
| Catalog | NS2.NS3 | Table2 |
| Catalog | NS2.NS3 | Table3 |

Support for Nested Namespace must be explicitly enabled. For details, refer to [Iceberg Rest Catalog](../metastores/iceberg-rest.md).
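
The mapping rule above can be illustrated with a short Python sketch. This shows only the flattening logic and is not Doris's implementation: the helper name `to_doris_identifier`, the `DorisIdentifier` type, and the representation of a namespace as a list of levels are all assumptions made for the example.

```python
from typing import NamedTuple, Sequence


class DorisIdentifier(NamedTuple):
    catalog: str
    database: str
    table: str


def to_doris_identifier(catalog: str, namespace: Sequence[str], table: str) -> DorisIdentifier:
    """Flatten a (possibly nested) Iceberg namespace into one Database name.

    Hypothetical helper: the levels are joined with '.', which matches the
    `NS2.NS3` form shown in the mapping table.
    """
    if not namespace:
        # Assumption of this sketch: at least one namespace level is given.
        raise ValueError("expected at least one namespace level")
    return DorisIdentifier(catalog, ".".join(namespace), table)


if __name__ == "__main__":
    examples = [(["NS1"], "Table1"), (["NS2", "NS3"], "Table2"), (["NS2", "NS3"], "Table3")]
    for levels, tbl in examples:
        print(to_doris_identifier("Catalog", levels, tbl))
```

A single-level namespace passes through unchanged, so tables that are not nested keep the Database name they had before.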

## Examples

### Hive Metastore
@@ -469,6 +506,7 @@ The current Iceberg dependency is version 1.6.1, which is compatible with higher
'glue.secret_key' = '<sk>'
);
```

When Glue service authentication credentials differ from S3 authentication credentials, you can specify S3 authentication credentials separately using the following method.
```sql
CREATE CATALOG `iceberg_glue_on_s3_catalog_` PROPERTIES (
@@ -485,6 +523,18 @@ The current Iceberg dependency is version 1.6.1, which is compatible with higher
's3.secret_key' = '<sk>'
);
```

Use an IAM Assumed Role to obtain S3 access credentials (since 3.1.2):
```sql
CREATE CATALOG `glue_iceberg_iamrole` PROPERTIES (
'type' = 'iceberg',
'iceberg.catalog.type' = 'glue',
'warehouse' = 's3://bucket/warehouse',
'glue.region' = 'us-east-1',
'glue.endpoint' = 'https://glue.us-east-1.amazonaws.com',
'glue.role_arn' = '<role_arn>'
);
```
</TabItem>
</Tabs>
</details>
230 changes: 182 additions & 48 deletions docs/lakehouse/metastores/aws-glue.md

Large diffs are not rendered by default.

59 changes: 59 additions & 0 deletions docs/lakehouse/metastores/iceberg-rest.md
@@ -19,6 +19,7 @@ This document describes the supported parameters when connecting to and accessin
| iceberg.rest.oauth2.credential | | `oauth2` credentials used to access `server-uri` to obtain token | - | No |
| iceberg.rest.oauth2.server-uri | | URI address for obtaining `oauth2` token, used in conjunction with `iceberg.rest.oauth2.credential` | - | No |
| iceberg.rest.vended-credentials-enabled | | Whether to enable `vended-credentials` functionality. When enabled, it will obtain storage system access credentials such as `access-key` and `secret-key` from the rest server, eliminating the need for manual specification. Requires rest server support for this capability. | `false` | No |
| iceberg.rest.nested-namespace-enabled | | (Supported since version 3.1.2) Whether to enable support for Nested Namespace. If `true`, Nested Namespaces are flattened and displayed as Database names, such as `parent_ns.child_ns`. Some Rest Catalog services, such as AWS Glue, do not support Nested Namespace; for those, keep this parameter set to `false`. | `false` | No |

> Note:
>
@@ -28,6 +29,27 @@ This document describes the supported parameters when connecting to and accessin
>
> 3. For AWS Glue Rest Catalog, please refer to the [AWS Glue documentation](./aws-glue.md)

## Nested Namespace

Since 3.1.2, to fully access Nested Namespace, in addition to setting `iceberg.rest.nested-namespace-enabled` to `true` in the Catalog properties, you also need to enable the following global parameter:

```sql
SET GLOBAL enable_nested_namespace=true;
```

Assuming the Catalog is "ice", Namespace is "ns1.ns2", and Table is "tbl1", you can access Nested Namespace in the following ways:

```sql
mysql> USE ice.ns1.ns2;
mysql> SELECT k1 FROM ice.`ns1.ns2`.tbl1;
mysql> SELECT tbl1.k1 FROM `ns1.ns2`.tbl1;
mysql> SELECT `ns1.ns2`.tbl1.k1 FROM ice.`ns1.ns2`.tbl1;
mysql> SELECT ice.`ns1.ns2`.tbl1.k1 FROM tbl1;
mysql> REFRESH CATALOG ice;
mysql> REFRESH DATABASE ice.`ns1.ns2`;
mysql> REFRESH TABLE ice.`ns1.ns2`.tbl1;
```
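
Because the flattened Database name contains a dot, the statements above wrap it in backticks whenever it is used as one part of a qualified name. The Python sketch below builds such a name on the client side. `quote_part` and `qualified_table_name` are hypothetical helpers, and the rule used here (backticks only around parts that contain a dot) is an assumption for illustration, not a description of Doris's parser.

```python
def quote_part(part: str) -> str:
    """Wrap an identifier part in backticks if it contains a dot."""
    if "`" in part:
        # Escaping embedded backticks is out of scope for this sketch.
        raise ValueError("backticks inside identifiers are not handled by this sketch")
    return f"`{part}`" if "." in part else part


def qualified_table_name(catalog: str, database: str, table: str) -> str:
    """Join catalog, database and table into a dotted, quoted name."""
    return ".".join(quote_part(p) for p in (catalog, database, table))


print(qualified_table_name("ice", "ns1.ns2", "tbl1"))
```

With the names from the example above, this prints the same form used in the `SELECT` and `REFRESH TABLE` statements.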

## Example Configurations

- Rest Catalog service without authentication
@@ -111,6 +133,43 @@ This document describes the supported parameters when connecting to and accessin
);
```

- Connecting to Snowflake Open Catalog (Since 3.1.2)

```sql
-- Enable vended-credentials
CREATE CATALOG snowflake_open_catalog PROPERTIES (
'type' = 'iceberg',
'warehouse' = '<catalog_name>',
'iceberg.catalog.type' = 'rest',
'iceberg.rest.uri' = 'https://<open_catalog_account>.snowflakecomputing.com/polaris/api/catalog',
'iceberg.rest.security.type' = 'oauth2',
'iceberg.rest.oauth2.credential' = '<client_id>:<client_secret>',
'iceberg.rest.oauth2.scope' = 'PRINCIPAL_ROLE:<principal_role>',
'iceberg.rest.vended-credentials-enabled' = 'true',
's3.endpoint' = 'https://s3.us-west-2.amazonaws.com',
's3.region' = 'us-west-2',
'iceberg.rest.nested-namespace-enabled' = 'true'
);
```

```sql
-- Disable vended-credentials
CREATE CATALOG snowflake_open_catalog PROPERTIES (
'type' = 'iceberg',
'warehouse' = '<catalog_name>',
'iceberg.catalog.type' = 'rest',
'iceberg.rest.uri' = 'https://<open_catalog_account>.snowflakecomputing.com/polaris/api/catalog',
'iceberg.rest.security.type' = 'oauth2',
'iceberg.rest.oauth2.credential' = '<client_id>:<client_secret>',
'iceberg.rest.oauth2.scope' = 'PRINCIPAL_ROLE:<principal_role>',
's3.access_key' = '<ak>',
's3.secret_key' = '<sk>',
's3.endpoint' = 'https://s3.us-west-2.amazonaws.com',
's3.region' = 'us-west-2',
'iceberg.rest.nested-namespace-enabled' = 'true'
);
```

- Connecting to Apache Gravitino Rest Catalog

```sql
111 changes: 95 additions & 16 deletions docs/lakehouse/storages/s3.md
@@ -15,18 +15,18 @@ This document describes the parameters required for accessing AWS S3. These para

## Parameter Overview

| Property Name | Legacy Name | Description | Default Value | Required |
|------------------------------|-------------|-------------------------------------------------|---------------|----------|
| s3.endpoint | | S3 service access endpoint, e.g., s3.us-east-1.amazonaws.com | None | No |
| s3.access_key | | AWS Access Key for authentication | None | No |
| s3.secret_key | | AWS Secret Key for authentication | None | No |
| s3.region | | S3 region, e.g., us-east-1. Highly recommended to configure | None | Yes |
| s3.use_path_style | | Whether to use path-style access | FALSE | No |
| s3.connection.maximum | | Maximum number of connections for high concurrency scenarios | 50 | No |
| s3.connection.request.timeout| | Request timeout in milliseconds for connection acquisition | 3000 | No |
| s3.connection.timeout | | Connection establishment timeout in milliseconds | 1000 | No |
| s3.role_arn | | Role ARN when using Assume Role mode | None | No |
| s3.external_id | | External ID used with s3.role_arn | None | No |
| Property Name | Legacy Name | Description | Default | Required |
|------------------------------|-------------|--------------------------------------------------|---------|----------|
| s3.endpoint | | S3 service access endpoint, e.g., s3.us-east-1.amazonaws.com | None | No |
| s3.access_key | | AWS Access Key for authentication | None | No |
| s3.secret_key | | AWS Secret Key for authentication | None | No |
| s3.region | | S3 region, e.g., us-east-1. Strongly recommended | None | Yes |
| s3.use_path_style | | Whether to use path-style access | FALSE | No |
| s3.connection.maximum | | Maximum number of connections for high concurrency scenarios | 50 | No |
| s3.connection.request.timeout| | Request timeout (milliseconds), controls connection acquisition timeout | 3000 | No |
| s3.connection.timeout | | Connection establishment timeout (milliseconds) | 1000 | No |
| s3.role_arn | | Role ARN specified when using Assume Role mode | None | No |
| s3.external_id | | External ID used with s3.role_arn | None | No |

## Authentication Configuration

@@ -41,7 +41,7 @@ Doris supports the following two methods to access S3:
"s3.region"="us-east-1"
```

2. Assume Role
2. Assume Role Mode

Suitable for cross-account and temporary authorization access. Automatically obtains temporary credentials through role authorization.

@@ -52,13 +52,13 @@ Doris supports the following two methods to access S3:
"s3.region"="us-east-1"
```

> If both Access Key and Role ARN are configured, Access Key mode takes priority.
> If both Access Key and Role ARN are configured, Access Key mode takes precedence.
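
The precedence rule can be written as a small decision function. This Python sketch only mirrors the documented rule and is not how Doris resolves credentials internally; `select_s3_auth_mode` and its return values are hypothetical names chosen for the example.

```python
from typing import Mapping


def select_s3_auth_mode(props: Mapping[str, str]) -> str:
    """Return which credential mode a set of S3 properties would use.

    Sketch only: Access Key mode wins when both kinds of properties
    are present, as the note above states.
    """
    has_keys = bool(props.get("s3.access_key")) and bool(props.get("s3.secret_key"))
    if has_keys:
        return "access_key"
    if props.get("s3.role_arn"):
        return "assume_role"
    # Assumption: this sketch does not model any other credential source.
    return "unspecified"
```

For example, a property set that contains `s3.access_key`, `s3.secret_key`, and `s3.role_arn` together yields `access_key`, so the role settings are not used.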

## Accessing S3 Directory Bucket

> This feature is supported since version 3.1.0.

Amazon S3 Express One Zone (also known as Directory Bucket) provides higher performance but has a different endpoint format.
Amazon S3 Express One Zone (also known as Directory Bucket) provides higher performance, but has a different endpoint format.

* Regular bucket: s3.us-east-1.amazonaws.com
* Directory Bucket: s3express-usw2-az1.us-west-2.amazonaws.com
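
To tell the two endpoint formats apart in client-side tooling, a simple prefix check is often enough. The Python sketch below assumes that Directory Bucket endpoints start with `s3express-`, which is inferred from the example endpoint above rather than from a formal specification; `is_directory_bucket_endpoint` is a hypothetical helper.

```python
from urllib.parse import urlparse


def is_directory_bucket_endpoint(endpoint: str) -> bool:
    """Guess whether an endpoint belongs to S3 Express One Zone."""
    # Accept values with or without a scheme, e.g. "https://host" or "host".
    host = urlparse(endpoint).hostname if "://" in endpoint else endpoint
    # Assumption: the "s3express-" prefix identifies Directory Bucket endpoints.
    return bool(host) and host.lower().startswith("s3express-")
```
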
@@ -71,5 +71,84 @@ Example:
"s3.access_key"="ak",
"s3.secret_key"="sk",
"s3.endpoint"="s3express-usw2-az1.us-west-2.amazonaws.com",
"s3.region"="us-west
"s3.region"="us-west-2"
```

## Permission Policies

Depending on the use case, permissions can be categorized into **read-only** and **read-write** policies.

### 1. Read-only Permissions

Only allows reading objects from S3. Suitable for LOAD, TVF, querying EXTERNAL CATALOG, and other scenarios.

```json
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"s3:GetObject",
"s3:GetObjectVersion"
],
"Resource": "arn:aws:s3:::<your-bucket>/<your-prefix>/*"
},
{
"Effect": "Allow",
"Action": [
"s3:ListBucket",
"s3:GetBucketLocation"
],
"Resource": "arn:aws:s3:::<your-bucket>"
}
]
}
```
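
A policy of this shape can also be generated programmatically, which avoids hand-editing mistakes such as stray commas. The following is a minimal Python sketch using only the standard `json` module; `read_only_policy` is a hypothetical helper, and the bucket and prefix values in the usage lines are example values.

```python
import json


def read_only_policy(bucket: str, prefix: str) -> dict:
    """Build the read-only policy document for one bucket and prefix."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Object-level read access, limited to the given prefix.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:GetObjectVersion"],
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}/*",
            },
            {
                # Bucket-level actions apply to the bucket ARN itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
                "Resource": f"arn:aws:s3:::{bucket}",
            },
        ],
    }


if __name__ == "__main__":
    # "my-bucket" and "warehouse" are example values, not names from the docs.
    print(json.dumps(read_only_policy("my-bucket", "warehouse"), indent=2))
```

Serializing with `json.dumps` always yields syntactically valid JSON, so the output can be pasted into the IAM console or passed to other tooling without further cleanup.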

### 2. Read-write Permissions

Based on read-only permissions, additionally allows deleting, creating, and modifying objects. Suitable for EXPORT, OUTFILE, and EXTERNAL CATALOG write-back scenarios.

```json
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"s3:PutObject",
"s3:GetObject",
"s3:GetObjectVersion",
"s3:DeleteObject",
"s3:DeleteObjectVersion",
"s3:AbortMultipartUpload",
"s3:ListMultipartUploadParts"
],
"Resource": "arn:aws:s3:::<your-bucket>/<your-prefix>/*"
},
{
"Effect": "Allow",
"Action": [
"s3:ListBucket",
"s3:GetBucketLocation",
"s3:GetBucketVersioning",
"s3:GetLifecycleConfiguration"
],
"Resource": "arn:aws:s3:::<your-bucket>"
}
]
}
```

### Notes

1. Placeholder Replacement

- `<your-bucket>` → Your S3 bucket name.
- `<your-prefix>` → The object key prefix that the policy should cover.
- `<account-id>` → Your AWS account ID (12-digit number).

2. Principle of Least Privilege

- If only querying, do not grant write permissions.

@@ -74,7 +74,7 @@ The `EXPORT` command is used to export data from a specified table to files at a

- `timeout`: Timeout for export job, default is 2 hours, unit is seconds.

- `compress_type`: (Supported since 2.1.5) When specifying the export file format as Parquet / ORC files, you can specify the compression method used by Parquet / ORC files. Parquet file format can specify compression methods as SNAPPY, GZIP, BROTLI, ZSTD, LZ4, and PLAIN, with default value SNAPPY. ORC file format can specify compression methods as PLAIN, SNAPPY, ZLIB, and ZSTD, with default value ZLIB. This parameter is supported starting from version 2.1.5. (PLAIN means no compression)
- `compress_type`: (Supported since 2.1.5) When the export file format is Parquet or ORC, specifies the compression method used for the files. Parquet supports SNAPPY, GZIP, BROTLI, ZSTD, LZ4, and PLAIN, with SNAPPY as the default. ORC supports PLAIN, SNAPPY, ZLIB, and ZSTD, with ZLIB as the default. PLAIN means no compression. Starting from version 3.1.1, a compression algorithm can also be specified for the CSV format; the supported values are "plain", "gz", "bz2", "snappyblock", "lz4block", and "zstd".

:::caution Note
To use the `delete_existing_files` parameter, you also need to add the configuration `enable_delete_existing_files = true` in fe.conf and restart FE; only then does `delete_existing_files` take effect. Setting `delete_existing_files = true` is a dangerous operation and is recommended only in test environments.