From cac4651033e40c2ff3d5473b5ecb9cc0a8f919e5 Mon Sep 17 00:00:00 2001 From: ZhouJinsong Date: Tue, 19 Sep 2023 19:30:08 +0800 Subject: [PATCH] [AMORO-1907]Correct and supplement documentation (#1995) * Modify docs for version 0.5.1 * fix some spelling errors * fix flink optimizer group configuration error * fix some spelling error --- docs/admin-guides/deployment.md | 62 +++++++++++++----- docs/admin-guides/managing-catalogs.md | 2 +- docs/admin-guides/managing-optimizers.md | 26 +++++--- docs/concepts/table-watermark.md | 4 +- docs/engines/flink/flink-ddl.md | 12 ++-- docs/engines/flink/flink-dml.md | 8 +-- docs/engines/flink/flink-get-started.md | 8 +-- docs/engines/spark/spark-conf.md | 2 +- docs/engines/spark/spark-ddl.md | 4 +- docs/engines/trino.md | 2 +- ...nce.png => freshness_cost_performance.png} | Bin docs/user-guides/cdc-ingestion.md | 4 +- docs/user-guides/configurations.md | 2 +- 13 files changed, 87 insertions(+), 49 deletions(-) rename docs/images/concepts/{fressness_cost_performance.png => freshness_cost_performance.png} (100%) diff --git a/docs/admin-guides/deployment.md b/docs/admin-guides/deployment.md index db8acb52c0..d2c050f0b9 100644 --- a/docs/admin-guides/deployment.md +++ b/docs/admin-guides/deployment.md @@ -23,13 +23,13 @@ You can choose to download the stable release package from [download page](../.. ## Download the distribution -All released package can be downaloded from [download page](../../download/). +All released package can be downloaded from [download page](../../download/). You can download amoro-x.y.z-bin.zip (x.y.z is the release number), and you can also download the runtime packages for each engine version according to the engine you are using. Unzip it to create the amoro-x.y.z directory in the same directory, and then go to the amoro-x.y.z directory. ## Source code compilation -You can build based on the master branch without compiling Trino. The compilation method and the directory of results are described below +You can build based on the master branch without compiling Trino. The compilation method and the directory of results are described below: ```shell git clone https://github.com/NetEase/amoro.git @@ -38,7 +38,7 @@ base_dir=$(pwd) mvn clean package -DskipTests -pl '!Trino' cd dist/target/ ls -amoro-x.y.z-bin.zip # AMS release pakcage +amoro-x.y.z-bin.zip # AMS release package dist-x.y.z-tests.jar dist-x.y.z.jar archive-tmp/ @@ -53,14 +53,14 @@ maven-archiver/ cd ${base_dir}/spark/v3.1/spark-runtime/target ls -amoro-spark-3.1-runtime-0.4.0.jar # Spark v3.1 runtime package) -amoro-spark-3.1-runtime-0.4.0-tests.jar -amoro-spark-3.1-runtime-0.4.0-sources.jar -original-amoro-spark-3.1-runtime-0.4.0.jar +amoro-spark-3.1-runtime-x.y.z.jar # Spark v3.1 runtime package) +amoro-spark-3.1-runtime-x.y.z-tests.jar +amoro-spark-3.1-runtime-x.y.z-sources.jar +original-amoro-spark-3.1-runtime-x.y.z.jar ``` -If you need to compile the Trino module at the same time, you need to install jdk17 locally and configure `toolchains.xml` in the user's ${user.home}/.m2/ directory, then run mvn -package -P toolchain to compile the entire project. +If you need to compile the Trino module at the same time, you need to install jdk17 locally and configure `toolchains.xml` in the user's `${user.home}/.m2/` directory, +then run `mvn package -P toolchain` to compile the entire project. ```xml @@ -80,14 +80,14 @@ package -P toolchain to compile the entire project. 
## Configuration
 
-If you want to use AMS in a production environment, it is recommended to modify `{ARCTIC_HOME}/conf/config.yaml` by referring to the following configuration steps.
+If you want to use AMS in a production environment, it is recommended to modify `{AMORO_HOME}/conf/config.yaml` by referring to the following configuration steps.
 
 ### Configure the service address
 
- The `ams.server-bind-host` configuration specifies the host to which AMS is bound. The default value, `0.0.0.0,` indicates binding to all network interfaces.
-- The `ams.server-expose-host` configuration specifies the host exposed by AMS that the compute engine and optimizer use to connect to AMS. You can configure a specific IP address on the machine or an IP prefix. When AMS starts up, it will find the first host that matches this prefix.
-- The `ams.thrift-server.table-service.bind-port` configuration specifies the binding port of the Thrift Server that provides the table service. The compute engine accesses AMS through this port, and the default value is 1260.
-- The `ams.thrift-server.optimizing-service.bind-port` configuration specifies the binding port of the Thrift Server that provides the optimizing service. The optimizers accesses AMS through this port, and the default value is 1261.
+- The `ams.server-expose-host` configuration specifies the host exposed by AMS that the computing engines and optimizers use to connect to AMS. You can configure a specific IP address on the machine, or an IP prefix. When AMS starts up, it will find the first host that matches this prefix.
+- The `ams.thrift-server.table-service.bind-port` configuration specifies the binding port of the Thrift Server that provides the table service. The computing engines access AMS through this port, and the default value is 1260.
+- The `ams.thrift-server.optimizing-service.bind-port` configuration specifies the binding port of the Thrift Server that provides the optimizing service. The optimizers access AMS through this port, and the default value is 1261.
- The `ams.http-server.bind-port` configuration specifies the port to which the HTTP service is bound. The Dashboard and Open API are bound to this port, and the default value is 1630.
 
```yaml
@@ -106,12 +106,12 @@ ams:
```
 
{{< hint info >}}
-make sure the port is not used before configuring it
+Make sure the port is not used before configuring it.
{{< /hint >}}
 
### Configure system database
 
-Users can use MySQL/PostgreSQL as the system database instead of Derby.
+You can use MySQL/PostgreSQL as the system database instead of the default Derby.
 
Create an empty database in MySQL/PostgreSQL, then AMS will automatically create table structures in this MySQL/PostgreSQL database when it first started.
 
@@ -150,7 +150,7 @@ ams:
    zookeeper-address: 127.0.0.1:2181,127.0.0.1:2182,127.0.0.1:2183 # ZooKeeper server address.
```
 
-### Configure containers
+### Configure optimizer containers
 
To scale out the optimizer through AMS, container configuration is required. If you choose to manually start an external optimizer, no additional container configuration is required.
AMS will initialize a container named `external` by default to store all externally started optimizers.
@@ -204,4 +204,32 @@ You can also restart/stop AMS with the following command:
 
```shell
bin/ams.sh restart/stop
-```
\ No newline at end of file
+```
+
+## Upgrade AMS
+
+### Upgrade system databases
+
+You can find all the upgrade SQL scripts under `{ARCTIC_HOME}/conf/mysql/` with name pattern `upgrade-a.b.c-to-x.y.z.sql`. 
+Execute the upgrade SQL scripts one by one to your system database based on your starting and target versions.
+
+### Replace all libs and plugins
+
+Replace all contents in the original `{ARCTIC_HOME}/lib` directory with the contents in the lib directory of the new installation package.
+Replace all contents in the original `{ARCTIC_HOME}/plugin` directory with the contents in the plugin directory of the new installation package.
+
+{{< hint info >}}
+Back up the old content before replacing it, so that you can roll back the upgrade operation if necessary.
+{{< /hint >}}
+
+### Configure new parameters
+
+The old configuration file `{ARCTIC_HOME}/conf/config.yaml` is usually compatible with the new version, but the new version may introduce new parameters. Compare the configuration files of the old and new versions, and reconfigure the parameters if necessary.
+
+### Restart AMS
+
+Restart AMS with the following command:
+```shell
+bin/ams.sh restart
+```
+
diff --git a/docs/admin-guides/managing-catalogs.md b/docs/admin-guides/managing-catalogs.md
index 25429cabeb..de53b6488a 100644
--- a/docs/admin-guides/managing-catalogs.md
+++ b/docs/admin-guides/managing-catalogs.md
@@ -54,7 +54,7 @@ Common properties include:
 We recommend users to create a Catalog following the guidelines below:
 
 - If you want to use it in conjunction with HMS, choose `External Catalog` for the `Type` and `Hive Metastore` for the `Metastore`, and choose the table format based on your needs, Mixed-Hive or Iceberg.
-- If you want to use Mixed-Iceberg provided by amoro, choose `Internal Catalog` for the `Type` and `Mixed-Iceberg` for the table format.
+- If you want to use Mixed-Iceberg provided by Amoro, choose `Internal Catalog` for the `Type` and `Mixed-Iceberg` for the table format.
 
 ## Delete catalog
 When a user needs to delete a Catalog, they can go to the details page of the Catalog and click the Remove button at the bottom of the page to perform the deletion.
diff --git a/docs/admin-guides/managing-optimizers.md b/docs/admin-guides/managing-optimizers.md
index 40f8f13494..51944b6048 100644
--- a/docs/admin-guides/managing-optimizers.md
+++ b/docs/admin-guides/managing-optimizers.md
@@ -17,10 +17,10 @@ The optimizer is the execution unit for performing self-optimizing tasks on a ta
 * Optimizer: The specific unit that performs optimizing tasks, usually with multiple concurrent units.
 
 ## Optimizer container
-Before using self-optimizing, you need to configure the container information in the configuration file. Opimizer container represents a specific set of runtime environment configuration, and the scheduling scheme of optimizer in that runtime environment. container includes three types: flink, local, and external.
+Before using self-optimizing, you need to configure the container information in the configuration file. An optimizer container represents a specific runtime environment configuration and the scheduling scheme for optimizers in that environment. There are three types of containers: flink, local, and external.
 
 ### Local container
-Local conatiner is a way to start Optimizer by local process and supports multi-threaded execution of Optimizer tasks. It is recommended to be used only in demo or local deployment scenarios. If the environment variable for jdk is not configured, the user can configure java_home to point to the jdk root directory. If already configured, this configuration item can be ignored. 
+Local container is a way to start Optimizer by local process and supports multi-threaded execution of Optimizer tasks. It is recommended to be used only in demo or local deployment scenarios. If the environment variable for jdk is not configured, the user can configure java_home to point to the jdk root directory. If already configured, this configuration item can be ignored. ```yaml containers: @@ -42,8 +42,8 @@ in the "export.{env_arg}" property of the container's properties. The commonly u with the hadoop compatible package flink-shaded-hadoop-2-uber-x.y.z.jar, you need to download it and copy it to the FLINK_HOME/lib directory. The flink-shaded-hadoop-2-uber-2.7.5-10.0.jar is generally sufficient and can be downloaded at: https://repo.maven.apache.org/maven2/org/apache/flink/flink-shaded-hadoop-2-uber/2.7.5-10.0/flink-shaded-hadoop-2-uber-2.7.5-10.0.jar -- HADOOP_CONF_DIR, which holds the configuration files for the hadoop cluster (including hdfs-site.xml, core-site.xml, yarn-site.xml ). If the hadoop cluster has kerberos authentication enabled, you need to prepare an additional krb5.conf and a keytab file for the user to submit tasks -- JVM_ARGS, you can configure flink to run additional configuration parameters, here is an example of configuring krb5.conf, specify the address of krb5.conf to be used by Flink when committing via -Djava.security.krb5.conf=/opt/krb5.conf +- HADOOP_CONF_DIR, which holds the configuration files for the hadoop cluster (including hdfs-site.xml, core-site.xml, yarn-site.xml ). If the hadoop cluster has kerberos authentication enabled, you need to prepare an additional `krb5.conf` and a keytab file for the user to submit tasks +- JVM_ARGS, you can configure flink to run additional configuration parameters, here is an example of configuring krb5.conf, specify the address of krb5.conf to be used by Flink when committing via `-Djava.security.krb5.conf=/opt/krb5.conf` - HADOOP_USER_NAME, the username used to submit tasks to yarn - FLINK_CONF_DIR, the directory where flink_conf.yaml is located @@ -87,9 +87,16 @@ The optimizer group supports the following properties: | Property | Container type | Required | Default | Description | |---------------------|----------------|----------|---------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | scheduling-policy | All | No | quota | The scheduler group scheduling policy, the default value is `quota`, it will be scheduled according to the quota resources configured for each table, the larger the table quota is, the more optimizer resources it can take. There is also a configuration `balanced` that will balance the scheduling of each table, the longer the table has not been optimized, the higher the scheduling priority will be. | -| flink-conf.* | flink | No | N/A | Any configuration for `flink on yarn` mode, like `flink-conf.taskmanager.memory.process.size` or `flink-conf.jobmanager.memory.process.size`. The value in `conf/flink-conf.yaml` will be used if not setted here. 
You can find more supported property in [Flink Configuration](https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/) |
+| flink-conf.* | flink | No | N/A | Any configuration for `flink on yarn` mode, like `flink-conf.taskmanager.memory.process.size` or `flink-conf.jobmanager.memory.process.size`. The value in `conf/flink-conf.yaml` will be used if not set here. You can find more supported properties in [Flink Configuration](https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/) |
| memory | local | Yes | N/A | The memory size of the local optimizer Java process. |
 
+{{< hint info >}}
+To better utilize the resources of the Flink optimizer, it is recommended to add the following configuration to the Flink optimizer group:
+* Set `flink-conf.taskmanager.memory.managed.size` to `32mb`, as the Flink optimizer has no computation logic and does not need to occupy managed memory.
+* Set `flink-conf.taskmanager.memory.network.max` to `32mb`, as there is no need for communication between operators in the Flink optimizer.
+* Set `flink-conf.taskmanager.memory.network.min` to `32mb`, as there is no need for communication between operators in the Flink optimizer.
+{{< /hint >}}
+
 ### Edit optimizer group
 
 You can click the `edit` button on the `Optimizer Groups` page to modify the configuration of the Optimizer group.
@@ -115,7 +122,7 @@ You can click the `Release` button on the `Optimizer` page to release the optimi
 ![release optimizer](../images/admin/optimizer_release.png)
 
 {{< hint info >}}
-Currently, only pptimizer scaled through the dashboard can be released on dashboard.
+Currently, only optimizers scaled through the dashboard can be released on the dashboard.
 {{< /hint >}}
 
 ### Deploy external optimizer
@@ -124,8 +131,11 @@ You can submit optimizer in your own Flink task development platform or local Fl
 
 ```shell
 ./bin/flink run-application -t yarn-application \
- -Djobmanager.memory.process.size=1024m \
- -Dtaskmanager.memory.process.size=2048m \
+ -Djobmanager.memory.process.size=1024mb \
+ -Dtaskmanager.memory.process.size=2048mb \
+ -Dtaskmanager.memory.managed.size=32mb \
+ -Dtaskmanager.memory.network.max=32mb \
+ -Dtaskmanager.memory.network.min=32mb \
 -c com.netease.arctic.optimizer.flink.FlinkOptimizer \
 ${ARCTIC_HOME}/plugin/optimize/OptimizeJob.jar \
 -a 127.0.0.1:1261 \
diff --git a/docs/concepts/table-watermark.md b/docs/concepts/table-watermark.md
index f5ffe020b5..2c4ad010f4 100644
--- a/docs/concepts/table-watermark.md
+++ b/docs/concepts/table-watermark.md
@@ -18,7 +18,7 @@ However, in high-freshness streaming data warehouses, massive small files and fr
 freshness, the greater the impact on performance. To achieve the required performance, users must incur higher costs. Thus,
 for streaming data warehouses, data freshness, query performance, and cost form a tripartite paradox.
 
-Fressness, cost and performance
+Freshness, cost and performance
 
 Amoro offers a resolution to the tripartite paradox for users by utilizing AMS management functionality and a self-optimizing
 mechanism. Unlike traditional data warehouses, Lakehouse tables are utilized in a multitude of data pipelines, AI, and BI scenarios. Measuring data freshness is
@@ -58,4 +58,4 @@ greater flexibility:
 SHOW TBLPROPERTIES test_db.test_log_store ('watermark.base');
 ```
 
-You can learn about how to use Watermark in detail by referring to [Managing tables](../managing-tables/). 
\ No newline at end of file +You can learn about how to use Watermark in detail by referring to [Managing tables](../using-tables/). \ No newline at end of file diff --git a/docs/engines/flink/flink-ddl.md b/docs/engines/flink/flink-ddl.md index ed6e8448b8..4d0a4743b2 100644 --- a/docs/engines/flink/flink-ddl.md +++ b/docs/engines/flink/flink-ddl.md @@ -189,16 +189,16 @@ Not supported at the moment | BIGINT | BIGINT | | FLOAT | FLOAT | | DOUBLE | DOUBLE | -| DECIAML(p, s) | DECIAML(p, s) | +| DECIMAL(p, s) | DECIMAL(p, s) | | DATE | DATE | | TIMESTAMP(6) | TIMESTAMP | -| VARBINARY | BYNARY | +| VARBINARY | BINARY | | ARRAY | ARRAY | | MAP | MAP | | ROW | STRUCT | -### Mixed-Iceberg daata types +### Mixed-Iceberg data types | Flink Data Type | Mixed-Iceberg Data Type | |-----------------------------------|-------------------------| | CHAR(p) | STRING | @@ -211,13 +211,13 @@ Not supported at the moment | BIGINT | LONG | | FLOAT | FLOAT | | DOUBLE | DOUBLE | -| DECIAML(p, s) | DECIAML(p, s) | +| DECIMAL(p, s) | DECIMAL(p, s) | | DATE | DATE | | TIMESTAMP(6) | TIMESTAMP | -| TIMESTAMP(6) WITH LCOAL TIME ZONE | TIMESTAMPTZ | +| TIMESTAMP(6) WITH LOCAL TIME ZONE | TIMESTAMPTZ | | BINARY(p) | FIXED(p) | | BINARY(16) | UUID | -| VARBINARY | BYNARY | +| VARBINARY | BINARY | | ARRAY | ARRAY | | MAP | MAP | | ROW | STRUCT | diff --git a/docs/engines/flink/flink-dml.md b/docs/engines/flink/flink-dml.md index 826d2afd84..94f00e3fc5 100644 --- a/docs/engines/flink/flink-dml.md +++ b/docs/engines/flink/flink-dml.md @@ -87,7 +87,7 @@ The following Hint Options are supported: | properties.pulsar.admin.adminUrl | (none) | String | Required if LogStore is pulsar, otherwise not required | Pulsar admin 的 HTTP URL,如:http://my-broker.example.com:8080 | | properties.* | (none) | String | No | Parameters for Logstore:
For Logstore with Kafka ('log-store.type'='kafka' default value), all other parameters supported by the Kafka Consumer can be set by prefixing properties. to the parameter name, for example, 'properties.batch.size'='16384'. The complete parameter information can be found in the [Kafka official documentation](https://kafka.apache.org/documentation/#consumerconfigs);
For LogStore set to Pulsar ('log-store.type'='pulsar'), all relevant configurations supported by Pulsar can be set by prefixing properties. to the parameter name, for example: 'properties.pulsar.client.requestTimeoutMs'='60000'. For complete parameter information, refer to the [Flink-Pulsar-Connector documentation](https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/connectors/datastream/pulsar) | | log.consumer.changelog.modes | all-kinds | String | No | The type of RowKind that will be generated when reading log data, supports: all-kinds, append-only.
all-kinds: will read cdc data, including +I/-D/-U/+U;
append-only: will only generate Insert data, recommended to use this configuration when reading without primary key. | -| log-store.kafka.compatible.enabled | false | Boolean | No | Compatible with LogStore Kafka's deprecated Source API; this parameter must be set to true when Flink tasks reading logstore kafka need to be upgraded to Amoro version 0.4.1 and above with state, otherwise a state incompatibility exception will be encountered. | +| log-store.kafka.compatible.enabled | false | Boolean | No | Compatible with LogStore Kafka's deprecated Source API; this parameter must be set to true when Flink tasks reading Logstore kafka need to be upgraded to Amoro version 0.4.1 and above with state, otherwise a state incompatibility exception will be encountered. | > **Notes** > @@ -175,10 +175,10 @@ Hint Options | Key | Default Value | Type | Required | Description | |--------------------------------------------------|---------------|----------|----------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| arctic.emit.mode | auto | String | No | Data writing modes currently supported are: file, log, and auto. For example: 'file' means data is only written to the filestore. 'log' means data is only written to the logstore. 'file,log' means data is written to both the filestore and the logstore. 'auto' means data is written only to the filestore if the logstore for the Amoro table is disabled. If the logstore for the Amoro table is enabled, it means data is written to both the filestore and the logstore. It is recommended to use 'auto'. | -| arctic.emit.auto-write-to-logstore.watermark-gap | (none) | Duration | No | This feature is only enabled when 'arctic.emit.mode'='auto'. If the watermark of the Amoro writers is greater than the current system timestamp minus a specific value, the writers will also write data to the logstore. The default setting is to enable the logstore writer immediately after the job starts. The value for this feature must be greater than 0. | +| arctic.emit.mode | auto | String | No | Data writing modes currently supported are: file, log, and auto. For example: 'file' means data is only written to the Filestore. 'log' means data is only written to the Logstore. 'file,log' means data is written to both the Filestore and the Logstore. 'auto' means data is written only to the Filestore if the Logstore for the Amoro table is disabled. If the Logstore for the Amoro table is enabled, it means data is written to both the Filestore and the Logstore. It is recommended to use 'auto'. 
| +| arctic.emit.auto-write-to-logstore.watermark-gap | (none) | Duration | No | This feature is only enabled when 'arctic.emit.mode'='auto'. If the watermark of the Amoro writers is greater than the current system timestamp minus a specific value, the writers will also write data to the Logstore. The default setting is to enable the Logstore writer immediately after the job starts. The value for this feature must be greater than 0. | | log.version | v1 | String | No | The log data format currently has only one version, so it can be left empty | -| sink.parallelism | (none) | String | No | The parallelism for writing to the filestore and logstore is determined separately. The parallelism for submitting the file operator is always 1. | +| sink.parallelism | (none) | String | No | The parallelism for writing to the Filestore and Logstore is determined separately. The parallelism for submitting the file operator is always 1. | | write.distribution-mode | hash | String | No | The distribution modes for writing to the Amoro table include: none and hash. | | write.distribution.hash-mode | auto | String | No | The hash strategy for writing to an Amoro table only takes effect when write.distribution-mode=hash. The available options are: primary-key, partition-key, primary-partition-key, and auto. primary-key: Shuffle by primary key partition-key: Shuffle by partition key primary-partition-key: Shuffle by primary key and partition key auto: If the table has both a primary key and partitions, use primary-partition-key; if the table has a primary key but no partitions, use primary-key; if the table has partitions but no primary key, use partition-key. Otherwise, use none. | | properties.pulsar.admin.adminUrl | (none) | String | If the LogStore is Pulsar and it is required for querying, it must be filled in, otherwise it can be left empty. | The HTTP URL for Pulsar Admin is in the format: http://my-broker.example.com:8080. | diff --git a/docs/engines/flink/flink-get-started.md b/docs/engines/flink/flink-get-started.md index ddf621a3c8..8fdeef56a7 100644 --- a/docs/engines/flink/flink-get-started.md +++ b/docs/engines/flink/flink-get-started.md @@ -34,9 +34,9 @@ Version Description: | Connector Version | Flink Version | Dependent Iceberg Version | | ----------------- |---------------| ----------------- | -| 0.5.0 | 1.12.x | 1.1.0 | -| 0.5.0 | 1.14.x | 1.1.0 | -| 0.5.0 | 1.15.x | 1.1.0 | +| 0.5.0 | 1.12.x | 1.3.0 | +| 0.5.0 | 1.14.x | 1.3.0 | +| 0.5.0 | 1.15.x | 1.3.0 | The Amoro project can be self-compiled to obtain the runtime jar. 
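If you prefer to build the runtime jar yourself, a minimal sketch is shown below. It reuses the build commands from the deployment guide above; the `flink/v1.15/flink-runtime` module path is an assumption modeled on the Spark runtime layout and may differ in your checkout.

```shell
# A sketch for obtaining the Flink runtime jar from source.
# The module path below is an assumption based on the Spark runtime layout
# shown in the deployment guide; check your checkout for the actual module.
git clone https://github.com/NetEase/amoro.git
cd amoro
mvn clean package -DskipTests -pl '!Trino'
ls flink/v1.15/flink-runtime/target/
# look for a jar named like amoro-flink-runtime-1.15-x.y.z.jar
```

The resulting jar can then be copied to the Flink `lib` directory in place of the downloaded `amoro-flink-runtime-*.jar`.
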
 
@@ -66,7 +66,7 @@ tar -zxvf flink-${FLINK_VERSION}-bin-scala_2.12.tgz
 cd flink-${FLINK_VERSION}
 # Download Flink Hadoop dependency
 wget ${FLINK_CONNECTOR_URL}/flink-shaded-hadoop-2-uber/${HADOOP_VERSION}-10.0/flink-shaded-hadoop-2-uber-${HADOOP_VERSION}-10.0.jar
-# Download Flink Aoro Connector
+# Download Flink Amoro Connector
 wget ${AMORO_CONNECTOR_URL}/amoro-flink-runtime-${FLINK_MAJOR_VERSION}/${AMORO_VERSION}/amoro-flink-runtime-${FLINK_MAJOR_VERSION}-${AMORO_VERSION}.jar
 
 # Copy the necessary JAR files to the lib directory
diff --git a/docs/engines/spark/spark-conf.md b/docs/engines/spark/spark-conf.md
index 0f279c4845..18f165ed60 100644
--- a/docs/engines/spark/spark-conf.md
+++ b/docs/engines/spark/spark-conf.md
@@ -1,5 +1,5 @@
 ---
-title: "Spark Conriguration"
+title: "Spark Configuration"
 url: spark-configuration
 aliases:
   - "spark/configuration"
diff --git a/docs/engines/spark/spark-ddl.md b/docs/engines/spark/spark-ddl.md
index 53ae2b6331..3b9acbecef 100644
--- a/docs/engines/spark/spark-ddl.md
+++ b/docs/engines/spark/spark-ddl.md
@@ -66,8 +66,8 @@ Supported transformations are:
 
 * years(ts): partition by year
 * months(ts): partition by month
-* days(ts) or date(ts): equivalent to dateint partitioning
-* hours(ts) or date_hour(ts): equivalent to dateint and hour partitioning
+* days(ts) or date(ts): equivalent to dateint partitioning
+* hours(ts) or date_hour(ts): equivalent to dateint and hour partitioning
 * bucket(N, col): partition by hashed value mod N buckets
 * truncate(L, col): partition by value truncated to L
 
diff --git a/docs/engines/trino.md b/docs/engines/trino.md
index 38761e65aa..1e4fcabe13 100644
--- a/docs/engines/trino.md
+++ b/docs/engines/trino.md
@@ -44,7 +44,7 @@ SELECT * FROM "{TABLE_NAME}"
 #### Query BaseStore of Table
 Directly querying the BaseStore in a table with a primary key is supported. The BaseStore stores the stock data of the table, which is usually generated by batch job or optimization.
-The queried data is static and the query efficiency is very high, but the timeliness is not good. The syntax is as follows:
+The queried data is static, and the query efficiency is very high, but the timeliness is not good. The syntax is as follows:
 
 ```sql
 SELECT * FROM "{TABLE_NAME}#BASE"
diff --git a/docs/images/concepts/fressness_cost_performance.png b/docs/images/concepts/freshness_cost_performance.png
similarity index 100%
rename from docs/images/concepts/fressness_cost_performance.png
rename to docs/images/concepts/freshness_cost_performance.png
diff --git a/docs/user-guides/cdc-ingestion.md b/docs/user-guides/cdc-ingestion.md
index 5e970e6d16..2ef2ef6f97 100644
--- a/docs/user-guides/cdc-ingestion.md
+++ b/docs/user-guides/cdc-ingestion.md
@@ -18,7 +18,7 @@ The following example will show how MySQL CDC data is written to an Iceberg tabl
 
 **Requirements**
 
-Please add [Flink Connector MySQL CDC](https://repo1.maven.org/maven2/com/ververica/flink-connector-mysql-cdc/2.3.0/flink-connector-mysql-cdc-2.3.0.jar) and [Iceberg](https://repo1.maven.org/maven2/org/apache/iceberg/iceberg-flink-1.14/1.1.0/iceberg-flink-1.14-1.1.0.jar) Jars to the lib directory of the Flink engine package.
+Please add [Flink SQL Connector MySQL CDC](https://repo1.maven.org/maven2/com/ververica/flink-sql-connector-mysql-cdc/2.3.0/flink-sql-connector-mysql-cdc-2.3.0.jar) and [Iceberg](https://repo1.maven.org/maven2/org/apache/iceberg/iceberg-flink-1.14/1.1.0/iceberg-flink-1.14-1.1.0.jar) Jars to the lib directory of the Flink engine package.
 
```sql
 CREATE TABLE products (
@@ -58,7 +58,7 @@ The following example will show how MySQL CDC data is written to a Mixed-Iceberg
 
 **Requirements**
 
-Please add [Flink Connector MySQL CDC](https://repo1.maven.org/maven2/com/ververica/flink-connector-mysql-cdc/2.3.0/flink-connector-mysql-cdc-2.3.0.jar) and [Amoro](../../../download/) Jars to the lib directory of the Flink engine package.
+Please add [Flink SQL Connector MySQL CDC](https://repo1.maven.org/maven2/com/ververica/flink-sql-connector-mysql-cdc/2.3.0/flink-sql-connector-mysql-cdc-2.3.0.jar) and [Amoro](../../../download/) Jars to the lib directory of the Flink engine package.
 
 ```sql
 CREATE TABLE products (
diff --git a/docs/user-guides/configurations.md b/docs/user-guides/configurations.md
index 4adbbead2d..676508edb1 100644
--- a/docs/user-guides/configurations.md
+++ b/docs/user-guides/configurations.md
@@ -54,7 +54,7 @@ Data-cleaning configurations are applicable to both Iceberg Format and Mixed str
 | snapshot.base.keep.minutes | 720(12 hours) | Table-Expiration keeps the latest snapshots of BaseStore within a specified time in minutes |
 | clean-orphan-file.enabled | false | Enables periodically clean orphan files |
 | clean-orphan-file.min-existing-time-minutes | 2880(2 days) | Cleaning orphan files keeps the files modified within a specified time in minutes |
-| clean-dangling-delete-files.enabled | true | Whether to enable cleaning of dangling delete files |
+| clean-dangling-delete-files.enabled | true | Whether to enable cleaning of dangling delete files |
 
 ## Mixed Format configurations
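
The data-cleaning keys listed above are ordinary table properties, so they can be adjusted per table. A minimal sketch using Spark SQL follows; the catalog, database, and table names are placeholders, and the property keys are taken from the configuration table above.

```sql
-- A sketch, assuming a configured Amoro catalog in Spark SQL:
-- tighten orphan-file cleaning on a single table.
ALTER TABLE local_catalog.test_db.test_table SET TBLPROPERTIES (
    'clean-orphan-file.enabled' = 'true',
    'clean-orphan-file.min-existing-time-minutes' = '1440'
);

-- Check the effective values.
SHOW TBLPROPERTIES local_catalog.test_db.test_table;
```

Defaults from the table above apply whenever a key is left unset.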