From 9c0d6034eb80ad529b17bb6445d36f689b47b921 Mon Sep 17 00:00:00 2001
From: Ti Chi Robot
Date: Fri, 3 Nov 2023 16:26:09 +0800
Subject: [PATCH] add snappy restriction note (#15241) (#15253)

---
 dumpling-overview.md                          | 4 ++++
 storage-engine/titan-overview.md              | 1 +
 ticdc/ticdc-sink-to-kafka.md                  | 2 +-
 tidb-lightning/tidb-lightning-data-source.md  | 5 +++--
 tidb-lightning/troubleshoot-tidb-lightning.md | 6 +++++-
 tikv-configuration-file.md                    | 4 ++++
 tune-tikv-memory-performance.md               | 2 +-
 7 files changed, 19 insertions(+), 5 deletions(-)

diff --git a/dumpling-overview.md b/dumpling-overview.md
index 578bf51e71357..21ff053e115da 100644
--- a/dumpling-overview.md
+++ b/dumpling-overview.md
@@ -177,6 +177,10 @@ You can use the `--compress <format>` option to compress the CSV and SQL data an
 - This option can save disk space, but it also slows down the export speed and increases CPU consumption. Use this option with caution in scenarios where the export speed is critical.
 - For TiDB Lightning v6.5.0 and later versions, you can use compressed files exported by Dumpling as the data source without additional configuration.
 
+> **Note:**
+>
+> The Snappy compressed file must be in the [official Snappy format](https://github.com/google/snappy). Other variants of Snappy compression are not supported.
+
 ### Format of exported files
 
 - `metadata`: The start time of the exported files and the position of the master binary log.
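As a usage sketch for the `--compress` option this hunk documents (not part of the patch): the connection options, the output directory, and the `snappy` value are placeholder assumptions that depend on your Dumpling version.

```shell
# Export CSV files compressed with Snappy. Per the hunk above, files
# exported by Dumpling can be used by TiDB Lightning v6.5.0 and later
# as the data source without additional configuration.
# Host, port, user, and output directory are placeholders.
tiup dumpling -u root -P 4000 -h 127.0.0.1 \
  --filetype csv \
  --compress snappy \
  -o /data/dumpling-export
```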
diff --git a/storage-engine/titan-overview.md b/storage-engine/titan-overview.md
index 67fde1da882e6..ae7adb16d208a 100644
--- a/storage-engine/titan-overview.md
+++ b/storage-engine/titan-overview.md
@@ -54,6 +54,7 @@ A blob file mainly consists of blob records, meta blocks, a meta index block, an
 > + The Key-Value pairs in the blob file are stored in order, so that when the Iterator is implemented, the sequential reading performance can be improved via prefetching.
 > + Each blob record keeps a copy of the user key corresponding to the value. This way, when Titan performs Garbage Collection (GC), it can query the user key and identify whether the corresponding value is outdated. However, this process introduces some write amplification.
 > + BlobFile supports compression at the blob record level. Titan supports multiple compression algorithms, such as [Snappy](https://github.com/google/snappy), [LZ4](https://github.com/lz4/lz4), and [Zstd](https://github.com/facebook/zstd). Currently, the default compression algorithm Titan uses is LZ4.
+> + The Snappy compressed file must be in the [official Snappy format](https://github.com/google/snappy). Other variants of Snappy compression are not supported.
 
 ### TitanTableBuilder
 
diff --git a/ticdc/ticdc-sink-to-kafka.md b/ticdc/ticdc-sink-to-kafka.md
index 6326751c64d4b..e253226a21db9 100644
--- a/ticdc/ticdc-sink-to-kafka.md
+++ b/ticdc/ticdc-sink-to-kafka.md
@@ -58,7 +58,7 @@ The following are descriptions of sink URI parameters and values that can be con
 | `max-message-bytes` | The maximum size of data that is sent to Kafka broker each time (optional, `10MB` by default). From v5.0.6 and v4.0.6, the default value has changed from `64MB` and `256MB` to `10MB`. |
 | `replication-factor` | The number of Kafka message replicas that can be saved (optional, `1` by default). This value must be greater than or equal to the value of [`min.insync.replicas`](https://kafka.apache.org/33/documentation.html#brokerconfigs_min.insync.replicas) in Kafka. |
 | `required-acks` | A parameter used in the `Produce` request, which notifies the broker of the number of replica acknowledgements it needs to receive before responding. Value options are `0` (`NoResponse`: no response, only `TCP ACK` is provided), `1` (`WaitForLocal`: responds only after local commits are submitted successfully), and `-1` (`WaitForAll`: responds after all replicated replicas are committed successfully. You can configure the minimum number of replicated replicas using the [`min.insync.replicas`](https://kafka.apache.org/33/documentation.html#brokerconfigs_min.insync.replicas) configuration item of the broker). (Optional, the default value is `-1`). |
-| `compression` | The compression algorithm used when sending messages (value options are `none`, `lz4`, `gzip`, `snappy`, and `zstd`; `none` by default). |
+| `compression` | The compression algorithm used when sending messages (value options are `none`, `lz4`, `gzip`, `snappy`, and `zstd`; `none` by default). Note that the Snappy compressed file must be in the [official Snappy format](https://github.com/google/snappy). Other variants of Snappy compression are not supported. |
 | `protocol` | The protocol with which messages are output to Kafka. The value options are `canal-json`, `open-protocol`, `canal`, `avro` and `maxwell`. |
 | `auto-create-topic` | Determines whether TiCDC creates the topic automatically when the `topic-name` passed in does not exist in the Kafka cluster (optional, `true` by default). |
 | `enable-tidb-extension` | Optional. `false` by default. When the output protocol is `canal-json`, if the value is `true`, TiCDC sends [WATERMARK events](/ticdc/ticdc-canal-json.md#watermark-event) and adds the [TiDB extension field](/ticdc/ticdc-canal-json.md#tidb-extension-field) to Kafka messages. From v6.1.0, this parameter is also applicable to the `avro` protocol. If the value is `true`, TiCDC adds [three TiDB extension fields](/ticdc/ticdc-avro-protocol.md#tidb-extension-fields) to the Kafka message. |
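A usage sketch for the `compression` sink URI parameter changed above, following the invocation style of this doc; the server address, topic name, and changefeed ID are placeholders, not part of the patch.

```shell
# Create a changefeed whose Kafka sink compresses messages with Snappy.
# Downstream consumers must be able to decode the official Snappy format.
cdc cli changefeed create \
  --server=http://127.0.0.1:8300 \
  --changefeed-id="kafka-snappy-task" \
  --sink-uri="kafka://127.0.0.1:9092/topic-name?protocol=canal-json&compression=snappy"
```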
diff --git a/tidb-lightning/tidb-lightning-data-source.md b/tidb-lightning/tidb-lightning-data-source.md
index 5a7c8bf7dada3..47b58ac7a8a40 100644
--- a/tidb-lightning/tidb-lightning-data-source.md
+++ b/tidb-lightning/tidb-lightning-data-source.md
@@ -23,7 +23,7 @@ When TiDB Lightning is running, it looks for all files that match the pattern of
 | Schema file | Contains the `CREATE DATABASE` DDL statement | `${db_name}-schema-create.sql` |
 | Data file | If the data file contains data for a whole table, the file is imported into a table named `${db_name}.${table_name}` | \${db_name}.\${table_name}.\${csv\|sql\|parquet} |
 | Data file | If the data for a table is split into multiple data files, each data file must be suffixed with a number in its filename | \${db_name}.\${table_name}.001.\${csv\|sql\|parquet} |
-| Compressed file | If the file contains a compression suffix, such as `gzip`, `snappy`, or `zstd`, TiDB Lightning will decompress the file before importing it. | \${db_name}.\${table_name}.\${csv\|sql\|parquet}.{compress} |
+| Compressed file | If the file contains a compression suffix, such as `gzip`, `snappy`, or `zstd`, TiDB Lightning will decompress the file before importing it. Note that the Snappy compressed file must be in the [official Snappy format](https://github.com/google/snappy). Other variants of Snappy compression are not supported. | \${db_name}.\${table_name}.\${csv\|sql\|parquet}.{compress} |
 
 TiDB Lightning processes data in parallel as much as possible. Because files must be read in sequence, the data processing concurrency is at the file level (controlled by `region-concurrency`). Therefore, when the imported file is large, the import performance is poor. It is recommended to limit the size of the imported file to no greater than 256 MiB to achieve the best performance.
 
@@ -295,7 +295,8 @@ TiDB Lightning currently supports compressed files exported by Dumpling or compr
 > - Because TiDB Lightning cannot concurrently decompress a single large compressed file, the size of the compressed file affects the import speed. It is recommended that a source file is no greater than 256 MiB after decompression.
 > - TiDB Lightning only imports individually compressed data files and does not support importing a single compressed file with multiple data files included.
 > - TiDB Lightning does not support `parquet` files compressed through another compression tool, such as `db.table.parquet.snappy`. If you want to compress `parquet` files, you can configure the compression format for the `parquet` file writer.
-> - TiDB Lightning v6.4.0 and later versions only support `.bak` files and the following compressed data files: `gzip`, `snappy`, and `zstd`. Other types of files cause errors. For those unsupported files, you need to modify the file names in advance, or move those files out of the import data directory to avoid such errors.
+> - TiDB Lightning v6.4.0 and later versions only support the following compressed data files: `gzip`, `snappy`, and `zstd`. Other types of files cause errors. If an unsupported compressed file exists in the directory where the source data files are stored, the task reports an error. You can move those unsupported files out of the import data directory to avoid such errors.
+> - The Snappy compressed file must be in the [official Snappy format](https://github.com/google/snappy). Other variants of Snappy compression are not supported.
 
 ## Match customized files
 
diff --git a/tidb-lightning/troubleshoot-tidb-lightning.md b/tidb-lightning/troubleshoot-tidb-lightning.md
index 3d29cd9272720..2cb040e945a0f 100644
--- a/tidb-lightning/troubleshoot-tidb-lightning.md
+++ b/tidb-lightning/troubleshoot-tidb-lightning.md
@@ -207,4 +207,8 @@ TiDB does not support all MySQL character sets. Therefore, TiDB Lightning report
 
 ### `invalid compression type ...`
 
-- TiDB Lightning v6.4.0 and later versions only support `.bak` files and the following compressed data files: `gzip`, `snappy`, and `zstd`. Other types of files cause errors. For those unsupported files, you need to modify the file names in advance, or move those files out of the import data directory to avoid such errors. For more details, see [Compressed files](/tidb-lightning/tidb-lightning-data-source.md#compressed-files).
+- TiDB Lightning v6.4.0 and later versions only support the following compressed data files: `gzip`, `snappy`, and `zstd`. Other types of compressed files cause errors. If an unsupported compressed file exists in the directory where the source data files are stored, the task reports an error. You can move those unsupported files out of the import data directory to avoid such errors. For more details, see [Compressed files](/tidb-lightning/tidb-lightning-data-source.md#compressed-files).
+
+> **Note:**
+>
+> The Snappy compressed file must be in the [official Snappy format](https://github.com/google/snappy). Other variants of Snappy compression are not supported.
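A possible cleanup step for the `invalid compression type ...` error above, based on the doc's advice to move unsupported files out of the import directory. This is a sketch only: `/data/source` and `/data/skipped` are placeholder paths, and the suffix allowlist is an illustrative assumption, so adjust it to the suffixes your data set actually uses.

```shell
# Keep only data files with suffixes Lightning recognizes; move
# everything else aside so the import task does not report an error.
# The -name patterns are an illustrative allowlist, not exhaustive.
mkdir -p /data/skipped
find /data/source -type f \
  ! -name '*.csv' ! -name '*.sql' ! -name '*.parquet' \
  ! -name '*.gzip' ! -name '*.snappy' ! -name '*.zstd' \
  -exec mv {} /data/skipped/ \;
```

Note that `mv` flattens matches into `/data/skipped`, so a nested source layout may need a different target scheme.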
diff --git a/tikv-configuration-file.md b/tikv-configuration-file.md
index 4051986971738..517b91ac9a124 100644
--- a/tikv-configuration-file.md
+++ b/tikv-configuration-file.md
@@ -1602,6 +1602,10 @@ Configuration items related to `rocksdb.defaultcf.titan`.
 + Optional values: `"no"`, `"snappy"`, `"zlib"`, `"bzip2"`, `"lz4"`, `"lz4hc"`, `"zstd"`
 + Default value: `"lz4"`
 
+> **Note:**
+>
+> The Snappy compressed file must be in the [official Snappy format](https://github.com/google/snappy). Other variants of Snappy compression are not supported.
+
 ### `blob-cache-size`
 
 + The cache size of a Blob file
diff --git a/tune-tikv-memory-performance.md b/tune-tikv-memory-performance.md
index ffb8d02184921..5ec3f97a7cad6 100644
--- a/tune-tikv-memory-performance.md
+++ b/tune-tikv-memory-performance.md
@@ -148,7 +148,7 @@ max-manifest-file-size = "20MB"
 block-size = "64KB"
 
 # The compression mode of each level of RocksDB data. The optional values include no, snappy, zlib,
-# bzip2, lz4, lz4hc, and zstd.
+# bzip2, lz4, lz4hc, and zstd. Note that the Snappy compressed file must be in the [official Snappy format](https://github.com/google/snappy). Other variants of Snappy compression are not supported.
 # "no:no:lz4:lz4:lz4:zstd:zstd" indicates there is no compression for level0 and level1; the lz4 compression algorithm is used
 # from level2 to level4; the zstd compression algorithm is used from level5 to level6.
 # "no" means no compression. "lz4" is a compression algorithm with moderate speed and compression ratio. The