26 changes: 16 additions & 10 deletions docs/faq/lakehouse-faq.md
@@ -126,17 +126,23 @@ ln -s /etc/pki/ca-trust/extracted/openssl/ca-bundle.trust.crt /etc/ssl/certs/ca-

## Hive Catalog

1. Error accessing Iceberg table via Hive Metastore: `failed to get schema` or `Storage schema reading not supported`
1. Accessing an Iceberg or Hive table through the Hive Catalog reports an error: `failed to get schema` or `Storage schema reading not supported`

Place the relevant `iceberg` runtime jar files in Hive's lib/ directory.

Configure in `hive-site.xml`:

```
metastore.storage.schema.reader.impl=org.apache.hadoop.hive.metastore.SerDeStorageSchemaReader
```

After configuration, restart the Hive Metastore.
You can try the following methods:

* Place the Iceberg runtime jar files in Hive's lib/ directory.

* Configure in `hive-site.xml`:

```
metastore.storage.schema.reader.impl=org.apache.hadoop.hive.metastore.SerDeStorageSchemaReader
```

After the configuration is complete, restart the Hive Metastore.

* Add `"get_schema_from_table" = "true"` to the Catalog properties.

This parameter is supported since versions 2.1.10 and 3.0.6.
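
As a sketch of the third method (the catalog name and metastore address below are placeholders, not taken from this FAQ), the property can be supplied when creating the catalog:

```sql
-- Hypothetical catalog; replace the metastore address with your own
CREATE CATALOG hive_catalog PROPERTIES (
    'type' = 'hms',
    'hive.metastore.uris' = 'thrift://<metastore_host>:9083',
    'get_schema_from_table' = 'true'
);
```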

2. Error connecting to Hive Catalog: `Caused by: java.lang.NullPointerException`

16 changes: 15 additions & 1 deletion docs/lakehouse/catalogs/hive-catalog.md
@@ -48,7 +48,8 @@ CREATE CATALOG [IF NOT EXISTS] catalog_name PROPERTIES (
'fs.defaultFS' = '<fs_defaultfs>', -- optional
{MetaStoreProperties},
{StorageProperties},
{CommonProperties}
{CommonProperties},
{OtherProperties}
);
```

@@ -78,6 +79,12 @@ CREATE CATALOG [IF NOT EXISTS] catalog_name PROPERTIES (

The CommonProperties section is for entering common attributes. Please see the "Common Properties" section in the [Catalog Overview](../catalog-overview.md).

* `{OtherProperties}`

The OtherProperties section is for entering other properties specific to the Hive Catalog.

* `get_schema_from_table`: Defaults to `false`. By default, Doris obtains the table schema from the Hive Metastore. In some cases this causes compatibility issues, such as the error `Storage schema reading not supported`. Setting this parameter to `true` makes Doris obtain the schema directly from the Table object instead. Note that this method ignores the default value information of columns. This property is supported since versions 2.1.10 and 3.0.6.
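For illustration, a minimal sketch of enabling this property at catalog creation (the catalog name and metastore address are placeholder values):

```sql
-- Placeholder names; substitute your own metastore address
CREATE CATALOG hive_hms_catalog PROPERTIES (
    'type' = 'hms',
    'hive.metastore.uris' = 'thrift://<metastore_host>:9083',
    'get_schema_from_table' = 'true'
);
```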

### Supported Hive Versions

Supports Hive 1.x, 2.x, 3.x, and 4.x.
@@ -348,6 +355,13 @@ AS SELECT col1, pt1 AS col2, pt2 AS pt1 FROM test_ctas.part_ctas_src WHERE col1

### Related Parameters

* Session variables

| Parameter name | Default value | Description | Since version |
| ----------| ---- | ---- | --- |
| `hive_parquet_use_column_names` | `true` | When reading Parquet data from a Hive table, Doris by default matches columns by name: for each column of the Hive table it reads the column with the same name from the Parquet file. When this variable is `false`, Doris reads columns from the Parquet file by their position in the Hive table, regardless of column names, similar to the `parquet.column.index.access` setting in Hive. This parameter only applies to top-level column names and has no effect inside a Struct. | 2.1.6+, 3.0.3+ |
| `hive_orc_use_column_names` | `true` | Similar to `hive_parquet_use_column_names`, but for ORC data in Hive tables; comparable to the `orc.force.positional.evolution` setting in Hive. | 2.1.6+, 3.0.3+ |
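
For example, to switch to position-based column matching for the current session (a sketch using the variables listed above):

```sql
-- Match Parquet/ORC columns by position instead of by name for this session
SET hive_parquet_use_column_names = false;
SET hive_orc_use_column_names = false;
```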

* BE

| Parameter Name | Default Value | Description |
@@ -128,17 +128,23 @@ ln -s /etc/pki/ca-trust/extracted/openssl/ca-bundle.trust.crt /etc/ssl/certs/ca-

## Hive Catalog

1. Error accessing an Iceberg table via the Hive Metastore: `failed to get schema` or `Storage schema reading not supported`
1. Accessing an Iceberg or Hive table through the Hive Catalog reports an error: `failed to get schema` or `Storage schema reading not supported`

Place the Iceberg runtime jar files in Hive's lib/ directory.
You can try the following methods:

Configure in `hive-site.xml`:
* Place the Iceberg runtime jar files in Hive's lib/ directory.

```
metastore.storage.schema.reader.impl=org.apache.hadoop.hive.metastore.SerDeStorageSchemaReader
```
* Configure in `hive-site.xml`:

```
metastore.storage.schema.reader.impl=org.apache.hadoop.hive.metastore.SerDeStorageSchemaReader
```

After the configuration is complete, restart the Hive Metastore.

* Add `"get_schema_from_table" = "true"` to the Catalog properties.

After the configuration is complete, restart the Hive Metastore.
This parameter is supported since versions 2.1.10 and 3.0.6.

2. Error connecting to the Hive Catalog: `Caused by: java.lang.NullPointerException`

@@ -48,7 +48,8 @@ CREATE CATALOG [IF NOT EXISTS] catalog_name PROPERTIES (
'fs.defaultFS' = '<fs_defaultfs>', -- optional
{MetaStoreProperties},
{StorageProperties},
{CommonProperties}
{CommonProperties},
{OtherProperties}
);
```

@@ -80,6 +81,12 @@ CREATE CATALOG [IF NOT EXISTS] catalog_name PROPERTIES (

The CommonProperties section is for entering common attributes. See the "Common Properties" section in the [Catalog Overview](../catalog-overview.md).

* `{OtherProperties}`

The OtherProperties section is for entering other properties specific to the Hive Catalog.

* `get_schema_from_table`: Defaults to `false`. By default, Doris obtains the table schema from the Hive Metastore. In some cases this causes compatibility issues, such as the error `Storage schema reading not supported`. Setting this parameter to `true` makes Doris obtain the schema directly from the Table object instead. Note that this method ignores the default value information of columns. This property is supported since versions 2.1.10 and 3.0.6.

### Supported Hive Versions

Supports Hive 1.x, 2.x, 3.x, and 4.x.
@@ -357,10 +364,17 @@ AS SELECT col1,pt1 as col2,pt2 as pt1 FROM test_ctas.part_ctas_src WHERE col1>0;

### Related Parameters

* BE
* Session variables

| Parameter name | Default value | Description | Since version |
| ----------| ---- | ---- | --- |
| `hive_parquet_use_column_names` | `true` | When reading Parquet data from a Hive table, Doris by default matches columns by name: for each column of the Hive table it reads the column with the same name from the Parquet file. When this variable is `false`, Doris reads columns from the Parquet file by their position in the Hive table, regardless of column names, similar to the `parquet.column.index.access` setting in Hive. This parameter only applies to top-level column names and has no effect inside a Struct. | 2.1.6+, 3.0.3+ |
| `hive_orc_use_column_names` | `true` | Similar to `hive_parquet_use_column_names`, but for ORC data in Hive tables; comparable to the `orc.force.positional.evolution` setting in Hive. | 2.1.6+, 3.0.3+ |

* BE configuration

| Parameter Name | Default Value | Description |
| ---- | ---- | ---- |
| Parameter Name | Description | Default Value |
| ---- | ---- | ---- |
| `hive_sink_max_file_size` | The maximum data file size. When the amount of written data exceeds this size, the current file is closed and a new file is rolled over for continued writing. | 1GB |
| `table_sink_partition_write_max_partition_nums_per_writer` | The maximum number of partitions each instance on a BE node can write to. | 128 |
| `table_sink_non_partition_write_scaling_data_processed_threshold` | The data volume threshold at which non-partitioned tables start scaling-write. For every additional `table_sink_non_partition_write_scaling_data_processed_threshold` of data, the data is dispatched to a new writer (instance). The scaling-write mechanism adjusts the number of writers (instances) to the data volume, increasing the number of writers as the data volume grows to improve concurrent write throughput. When the data volume is small, it also saves resources and minimizes the number of files produced. | 25MB |
@@ -128,17 +128,23 @@ ln -s /etc/pki/ca-trust/extracted/openssl/ca-bundle.trust.crt /etc/ssl/certs/ca-

## Hive Catalog

1. Error accessing an Iceberg table via the Hive Metastore: `failed to get schema` or `Storage schema reading not supported`
1. Accessing an Iceberg or Hive table through the Hive Catalog reports an error: `failed to get schema` or `Storage schema reading not supported`

Place the Iceberg runtime jar files in Hive's lib/ directory.
You can try the following methods:

Configure in `hive-site.xml`:
* Place the Iceberg runtime jar files in Hive's lib/ directory.

```
metastore.storage.schema.reader.impl=org.apache.hadoop.hive.metastore.SerDeStorageSchemaReader
```
* Configure in `hive-site.xml`:

```
metastore.storage.schema.reader.impl=org.apache.hadoop.hive.metastore.SerDeStorageSchemaReader
```

After the configuration is complete, restart the Hive Metastore.

* Add `"get_schema_from_table" = "true"` to the Catalog properties.

After the configuration is complete, restart the Hive Metastore.
This parameter is supported since versions 2.1.10 and 3.0.6.

2. Error connecting to the Hive Catalog: `Caused by: java.lang.NullPointerException`

@@ -128,17 +128,23 @@ ln -s /etc/pki/ca-trust/extracted/openssl/ca-bundle.trust.crt /etc/ssl/certs/ca-

## Hive Catalog

1. Error accessing an Iceberg table via the Hive Metastore: `failed to get schema` or `Storage schema reading not supported`
1. Accessing an Iceberg or Hive table through the Hive Catalog reports an error: `failed to get schema` or `Storage schema reading not supported`

Place the Iceberg runtime jar files in Hive's lib/ directory.
You can try the following methods:

Configure in `hive-site.xml`:
* Place the Iceberg runtime jar files in Hive's lib/ directory.

```
metastore.storage.schema.reader.impl=org.apache.hadoop.hive.metastore.SerDeStorageSchemaReader
```
* Configure in `hive-site.xml`:

```
metastore.storage.schema.reader.impl=org.apache.hadoop.hive.metastore.SerDeStorageSchemaReader
```

After the configuration is complete, restart the Hive Metastore.

* Add `"get_schema_from_table" = "true"` to the Catalog properties.

After the configuration is complete, restart the Hive Metastore.
This parameter is supported since versions 2.1.10 and 3.0.6.

2. Error connecting to the Hive Catalog: `Caused by: java.lang.NullPointerException`

26 changes: 16 additions & 10 deletions versioned_docs/version-2.1/faq/lakehouse-faq.md
@@ -126,17 +126,23 @@ ln -s /etc/pki/ca-trust/extracted/openssl/ca-bundle.trust.crt /etc/ssl/certs/ca-

## Hive Catalog

1. Error accessing Iceberg table via Hive Metastore: `failed to get schema` or `Storage schema reading not supported`
1. Accessing an Iceberg or Hive table through the Hive Catalog reports an error: `failed to get schema` or `Storage schema reading not supported`

Place the relevant `iceberg` runtime jar files in Hive's lib/ directory.

Configure in `hive-site.xml`:

```
metastore.storage.schema.reader.impl=org.apache.hadoop.hive.metastore.SerDeStorageSchemaReader
```

After configuration, restart the Hive Metastore.
You can try the following methods:

* Place the Iceberg runtime jar files in Hive's lib/ directory.

* Configure in `hive-site.xml`:

```
metastore.storage.schema.reader.impl=org.apache.hadoop.hive.metastore.SerDeStorageSchemaReader
```

After the configuration is complete, restart the Hive Metastore.

* Add `"get_schema_from_table" = "true"` to the Catalog properties.

This parameter is supported since versions 2.1.10 and 3.0.6.

2. Error connecting to Hive Catalog: `Caused by: java.lang.NullPointerException`

26 changes: 16 additions & 10 deletions versioned_docs/version-3.0/faq/lakehouse-faq.md
@@ -126,17 +126,23 @@ ln -s /etc/pki/ca-trust/extracted/openssl/ca-bundle.trust.crt /etc/ssl/certs/ca-

## Hive Catalog

1. Error accessing Iceberg table via Hive Metastore: `failed to get schema` or `Storage schema reading not supported`
1. Accessing an Iceberg or Hive table through the Hive Catalog reports an error: `failed to get schema` or `Storage schema reading not supported`

Place the relevant `iceberg` runtime jar files in Hive's lib/ directory.

Configure in `hive-site.xml`:

```
metastore.storage.schema.reader.impl=org.apache.hadoop.hive.metastore.SerDeStorageSchemaReader
```

After configuration, restart the Hive Metastore.
You can try the following methods:

* Place the Iceberg runtime jar files in Hive's lib/ directory.

* Configure in `hive-site.xml`:

```
metastore.storage.schema.reader.impl=org.apache.hadoop.hive.metastore.SerDeStorageSchemaReader
```

After the configuration is complete, restart the Hive Metastore.

* Add `"get_schema_from_table" = "true"` to the Catalog properties.

This parameter is supported since versions 2.1.10 and 3.0.6.

2. Error connecting to Hive Catalog: `Caused by: java.lang.NullPointerException`
