
[AMORO-1907] Correct and supplement documentation #1995

Merged 6 commits on Sep 19, 2023.
Changes from 4 commits
62 changes: 45 additions & 17 deletions docs/admin-guides/deployment.md
@@ -23,13 +23,13 @@ You can choose to download the stable release package from [download page](../..

## Download the distribution

All released package can be downaloded from [download page](../../download/).
All released package can be downloaded from [download page](../../download/).
You can download amoro-x.y.z-bin.zip (x.y.z is the release number), and you can also download the runtime packages for each engine version according to the engine you are using.
Unzip it to create the amoro-x.y.z directory in the same directory, and then go to the amoro-x.y.z directory.
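
For example, assuming release x.y.z (substitute the actual release number you downloaded):

```shell
# download amoro-x.y.z-bin.zip from the download page first, then:
unzip amoro-x.y.z-bin.zip   # creates the amoro-x.y.z directory next to the zip
cd amoro-x.y.z
```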

## Source code compilation

You can build based on the master branch without compiling Trino. The compilation method and the directory of results are described below
You can build based on the master branch without compiling Trino. The compilation method and the directory of results is described below

```shell
git clone https://github.com/NetEase/amoro.git
base_dir=$(pwd)
mvn clean package -DskipTests -pl '!Trino'
cd dist/target/
ls
amoro-x.y.z-bin.zip # AMS release pakcage
amoro-x.y.z-bin.zip # AMS release package
dist-x.y.z-tests.jar
dist-x.y.z.jar
archive-tmp/
maven-archiver/

cd ${base_dir}/spark/v3.1/spark-runtime/target
ls
amoro-spark-3.1-runtime-0.4.0.jar # Spark v3.1 runtime package
amoro-spark-3.1-runtime-0.4.0-tests.jar
amoro-spark-3.1-runtime-0.4.0-sources.jar
original-amoro-spark-3.1-runtime-0.4.0.jar
amoro-spark-3.1-runtime-x.y.z.jar # Spark v3.1 runtime package
amoro-spark-3.1-runtime-x.y.z-tests.jar
amoro-spark-3.1-runtime-x.y.z-sources.jar
original-amoro-spark-3.1-runtime-x.y.z.jar
```

If you need to compile the Trino module at the same time, you need to install jdk17 locally and configure `toolchains.xml` in the user's ${user.home}/.m2/ directory, then run mvn
package -P toolchain to compile the entire project.
If you need to compile the Trino module at the same time, you need to install jdk17 locally and configure `toolchains.xml` in the user's `${user.home}/.m2/` directory,
then run `mvn package -P toolchain` to compile the entire project.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- ... -->
```

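For reference, a minimal `toolchains.xml` with a JDK 17 entry looks roughly like the following sketch; the vendor and `jdkHome` path are assumptions to adjust for your local installation:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<toolchains>
  <toolchain>
    <type>jdk</type>
    <provides>
      <version>17</version>
      <vendor>sun</vendor>
    </provides>
    <configuration>
      <jdkHome>/usr/lib/jvm/java-17</jdkHome>
    </configuration>
  </toolchain>
</toolchains>
```
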
## Configuration

If you want to use AMS in a production environment, it is recommended to modify `{ARCTIC_HOME}/conf/config.yaml` by referring to the following configuration steps.
If you want to use AMS in a production environment, it is recommended to modify `{AMORO_HOME}/conf/config.yaml` by referring to the following configuration steps.

### Configure the service address

- The `ams.server-bind-host` configuration specifies the host to which AMS is bound. The default value, `0.0.0.0`, indicates binding to all network interfaces.
- The `ams.server-expose-host` configuration specifies the host exposed by AMS that the compute engine and optimizer use to connect to AMS. You can configure a specific IP address on the machine or an IP prefix. When AMS starts up, it will find the first host that matches this prefix.
- The `ams.thrift-server.table-service.bind-port` configuration specifies the binding port of the Thrift Server that provides the table service. The compute engine accesses AMS through this port, and the default value is 1260.
- The `ams.thrift-server.optimizing-service.bind-port` configuration specifies the binding port of the Thrift Server that provides the optimizing service. The optimizers accesses AMS through this port, and the default value is 1261.
- The `ams.server-expose-host` configuration specifies the host exposed by AMS that the computing engines and optimizers use to connect to AMS. You can configure a specific IP address on the machine, or an IP prefix. When AMS starts up, it will find the first host that matches this prefix.
- The `ams.thrift-server.table-service.bind-port` configuration specifies the binding port of the Thrift Server that provides the table service. The computing engines access AMS through this port, and the default value is 1260.
- The `ams.thrift-server.optimizing-service.bind-port` configuration specifies the binding port of the Thrift Server that provides the optimizing service. The optimizers access AMS through this port, and the default value is 1261.
- The `ams.http-server.bind-port` configuration specifies the port to which the HTTP service is bound. The Dashboard and Open API are bound to this port, and the default value is 1630.

```yaml
ams:
  # ...
```
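
Assembled from the defaults listed above, the relevant section of `config.yaml` would look roughly like this (a sketch; the exact layout may differ between versions):

```yaml
ams:
  server-bind-host: "0.0.0.0"       # bind to all network interfaces
  server-expose-host: "127.0.0.1"   # a concrete IP, or an IP prefix to match at startup
  thrift-server:
    table-service:
      bind-port: 1260               # used by the compute engines
    optimizing-service:
      bind-port: 1261               # used by the optimizers
  http-server:
    bind-port: 1630                 # dashboard and Open API
```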

{{< hint info >}}
make sure the port is not used before configuring it
Make sure the port is not used before configuring it.
{{< /hint >}}

### Configure system database

Users can use MySQL/PostgreSQL as the system database instead of Derby.
You can use MySQL/PostgreSQL as the system database instead of the default Derby.

Create an empty database in MySQL/PostgreSQL, then AMS will automatically create table structures in this MySQL/PostgreSQL database when it starts for the first time.
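
For example, with MySQL and an illustrative database and user both named `amoro` (names and password are placeholders):

```sql
CREATE DATABASE amoro;
-- a dedicated user lets AMS create its tables on first startup
CREATE USER 'amoro'@'%' IDENTIFIED BY 'amoro-password';
GRANT ALL PRIVILEGES ON amoro.* TO 'amoro'@'%';
```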

@@ -150,7 +150,7 @@

```yaml
ams:
  # ...
  zookeeper-address: 127.0.0.1:2181,127.0.0.1:2182,127.0.0.1:2183 # ZooKeeper server address.
```

### Configure containers
### Configure optimizer containers

To scale out the optimizer through AMS, container configuration is required.
If you choose to manually start an external optimizer, no additional container configuration is required. AMS will initialize a container named `external` by default to store all externally started optimizers.
@@ -204,4 +204,32 @@ You can also restart/stop AMS with the following command:

```shell
bin/ams.sh restart/stop
```

## Upgrade AMS

### Upgrade system databases

You can find all the upgrade SQL scripts under `{ARCTIC_HOME}/conf/mysql/` with name pattern `upgrade-a.b.c-to-x.y.z.sql`.
Execute the upgrade SQL scripts one by one to your system database based on your starting and target versions.
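
For example, upgrading a MySQL system database from a hypothetical 0.5.0 to 0.6.0 (script names and connection details are placeholders):

```shell
cd ${ARCTIC_HOME}/conf/mysql/
mysql -h 127.0.0.1 -u amoro -p amoro < upgrade-0.5.0-to-0.6.0.sql
# apply each intermediate script in order until the target version is reached
```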

### Replace all libs and plugins

Replace all contents in the original `{ARCTIC_HOME}/lib` directory with the contents in the lib directory of the new installation package.
Replace all contents in the original `{ARCTIC_HOME}/plugin` directory with the contents in the plugin directory of the new installation package.

{{< hint info >}}
Back up the old content before replacing it, so that you can roll back the upgrade operation if necessary.
{{< /hint >}}
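
A sketch of the replacement including the backup step (assuming the new package is unpacked at `/tmp/amoro-new`, a placeholder path):

```shell
mv ${ARCTIC_HOME}/lib ${ARCTIC_HOME}/lib.bak         # keep the old libs for rollback
mv ${ARCTIC_HOME}/plugin ${ARCTIC_HOME}/plugin.bak   # keep the old plugins for rollback
cp -r /tmp/amoro-new/lib ${ARCTIC_HOME}/lib
cp -r /tmp/amoro-new/plugin ${ARCTIC_HOME}/plugin
```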

### Configure new parameters

The old configuration file `{ARCTIC_HOME}/conf/config.yaml` is usually compatible with the new version, but the new version may introduce new parameters. Try to compare the configuration files of the old and new versions, and reconfigure the parameters if necessary.

### Restart AMS

Restart AMS with the following commands:
```shell
bin/ams.sh restart
```

2 changes: 1 addition & 1 deletion docs/admin-guides/managing-catalogs.md
@@ -54,7 +54,7 @@ Common properties include:
We recommend that users create a Catalog following the guidelines below:

- If you want to use it in conjunction with HMS, choose `External Catalog` for the `Type` and `Hive Metastore` for the `Metastore`, and choose the table format based on your needs, Mixed-Hive or Iceberg.
- If you want to use Mixed-Iceberg provided by amoro, choose `Internal Catalog` for the `Type` and `Mixed-Iceberg` for the table format.
- If you want to use Mixed-Iceberg provided by Amoro, choose `Internal Catalog` for the `Type` and `Mixed-Iceberg` for the table format.

## Delete catalog
When a user needs to delete a Catalog, they can go to the details page of the Catalog and click the Remove button at the bottom of the page to perform the deletion.
26 changes: 18 additions & 8 deletions docs/admin-guides/managing-optimizers.md
@@ -17,10 +17,10 @@ The optimizer is the execution unit for performing self-optimizing tasks on a ta
* Optimizer: The specific unit that performs optimizing tasks, usually with multiple concurrent units.

## Optimizer container
Before using self-optimizing, you need to configure the container information in the configuration file. Opimizer container represents a specific set of runtime environment configuration, and the scheduling scheme of optimizer in that runtime environment. container includes three types: flink, local, and external.
Before using self-optimizing, you need to configure the container information in the configuration file. Optimizer container represents a specific set of runtime environment configuration, and the scheduling scheme of optimizer in that runtime environment. container includes three types: flink, local, and external.

### Local container
Local conatiner is a way to start Optimizer by local process and supports multi-threaded execution of Optimizer tasks. It is recommended to be used only in demo or local deployment scenarios. If the environment variable for jdk is not configured, the user can configure java_home to point to the jdk root directory. If already configured, this configuration item can be ignored.
Local container is a way to start Optimizer by local process and supports multi-threaded execution of Optimizer tasks. It is recommended to be used only in demo or local deployment scenarios. If the environment variable for jdk is not configured, the user can configure java_home to point to the jdk root directory. If already configured, this configuration item can be ignored.

```yaml
containers:
  # ...
```

@@ -42,8 +42,8 @@ in the "export.{env_arg}" property of the container's properties. The commonly used
with the hadoop compatible package flink-shaded-hadoop-2-uber-x.y.z.jar, you need to download it and copy it to the
FLINK_HOME/lib directory. The flink-shaded-hadoop-2-uber-2.7.5-10.0.jar is generally sufficient and can be downloaded
at: https://repo.maven.apache.org/maven2/org/apache/flink/flink-shaded-hadoop-2-uber/2.7.5-10.0/flink-shaded-hadoop-2-uber-2.7.5-10.0.jar
- HADOOP_CONF_DIR, which holds the configuration files for the hadoop cluster (including hdfs-site.xml, core-site.xml, yarn-site.xml ). If the hadoop cluster has kerberos authentication enabled, you need to prepare an additional krb5.conf and a keytab file for the user to submit tasks
- JVM_ARGS, you can configure flink to run additional configuration parameters, here is an example of configuring krb5.conf, specify the address of krb5.conf to be used by Flink when committing via -Djava.security.krb5.conf=/opt/krb5.conf
- HADOOP_CONF_DIR, which holds the configuration files for the hadoop cluster (including hdfs-site.xml, core-site.xml, yarn-site.xml ). If the hadoop cluster has kerberos authentication enabled, you need to prepare an additional `krb5.conf` and a keytab file for the user to submit tasks
- JVM_ARGS, you can configure flink to run additional configuration parameters, here is an example of configuring krb5.conf, specify the address of krb5.conf to be used by Flink when committing via `-Djava.security.krb5.conf=/opt/krb5.conf`
- HADOOP_USER_NAME, the username used to submit tasks to yarn
- FLINK_CONF_DIR, the directory where `flink-conf.yaml` is located (a container entry illustrating these variables is sketched below)

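Taken together, a flink container entry carrying these environment settings might look like the following sketch (key names such as `type` and `flink-home` are assumptions to check against your release's config.yaml; the values are placeholders for your cluster):

```yaml
containers:
  - name: flinkContainer
    type: flink                     # assumed key naming
    properties:
      flink-home: /opt/flink
      export.HADOOP_CONF_DIR: /etc/hadoop/conf
      export.HADOOP_USER_NAME: amoro
      export.JVM_ARGS: -Djava.security.krb5.conf=/opt/krb5.conf
      export.FLINK_CONF_DIR: /opt/flink/conf
```
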
@@ -87,9 +87,16 @@ The optimizer group supports the following properties:
| Property | Container type | Required | Default | Description |
|---------------------|----------------|----------|---------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| scheduling-policy | All | No | quota | The scheduler group scheduling policy, the default value is `quota`, it will be scheduled according to the quota resources configured for each table, the larger the table quota is, the more optimizer resources it can take. There is also a configuration `balanced` that will balance the scheduling of each table, the longer the table has not been optimized, the higher the scheduling priority will be. |
| flink-conf.* | flink | No | N/A | Any configuration for `flink on yarn` mode, like `flink-conf.taskmanager.memory.process.size` or `flink-conf.jobmanager.memory.process.size`. The value in `conf/flink-conf.yaml` will be used if not setted here. You can find more supported property in [Flink Configuration](https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/) |
| flink-conf.* | flink | No | N/A | Any configuration for `flink on yarn` mode, like `flink-conf.taskmanager.memory.process.size` or `flink-conf.jobmanager.memory.process.size`. The value in `conf/flink-conf.yaml` will be used if not set here. You can find more supported property in [Flink Configuration](https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/) |
| memory | local | Yes | N/A | The memory size of the local optimizer Java process. |

{{< hint info >}}
To better utilize the resources of Flink Optimizer, it is recommended to add the following configuration to the Flink Optimizer Group:
* Set `flink-conf.taskmanager.memory.managed.size` to `32mb` as Flink optimizer does not have any computation logic, it does not need to occupy managed memory.
* Set `flink-conf.taskmanager.memory.network.max` to `32mb` as there is no need for communication between operators in Flink Optimizer.
* Set `flink-conf.taskmanager.memory.network.min` to `32mb` as there is no need for communication between operators in Flink Optimizer.
{{< /hint >}}

### Edit optimizer group

You can click the `edit` button on the `Optimizer Groups` page to modify the configuration of the Optimizer group.
@@ -115,7 +115,7 @@ You can click the `Release` button on the `Optimizer` page to release the optimi
![release optimizer](../images/admin/optimizer_release.png)

{{< hint info >}}
Currently, only pptimizer scaled through the dashboard can be released on dashboard.
Currently, only optimizer scaled through the dashboard can be released on dashboard.
{{< /hint >}}

### Deploy external optimizer
@@ -124,8 +124,11 @@ You can submit optimizer in your own Flink task development platform or local Fl

```shell
./bin/flink run-application -t yarn-application \
-Djobmanager.memory.process.size=1024m \
-Dtaskmanager.memory.process.size=2048m \
-Djobmanager.memory.process.size=1024mb \
-Dtaskmanager.memory.process.size=2048mb \
-Dtaskmanager.memory.managed.size=32mb \
-Dtaskmanager.memory.network.max=32mb \
-Dtaskmanager.memory.network.min=32mb \
-c com.netease.arctic.optimizer.flink.FlinkOptimizer \
${ARCTIC_HOME}/plugin/optimize/OptimizeJob.jar \
-a 127.0.0.1:1261 \
  ...
```
4 changes: 2 additions & 2 deletions docs/concepts/table-watermark.md
@@ -18,7 +18,7 @@ However, in high-freshness streaming data warehouses, massive small files and fr
freshness, the greater the impact on performance. To achieve the required performance, users must incur higher costs. Thus, for streaming data
warehouses, data freshness, query performance, and cost form a tripartite paradox.

<img src="../images/concepts/fressness_cost_performance.png" alt="Fressness, cost and performance" width="60%" height="60%">
<img src="../images/concepts/freshness_cost_performance.png" alt="Freshness, cost and performance" width="60%" height="60%">

Amoro offers a resolution to the tripartite paradox for users by utilizing AMS management functionality and a self-optimizing mechanism. Unlike
traditional data warehouses, Lakehouse tables are utilized in a multitude of data pipelines, AI, and BI scenarios. Measuring data freshness is
@@ -58,4 +58,4 @@ greater flexibility:

```sql
SHOW TBLPROPERTIES test_db.test_log_store ('watermark.base');
```

You can learn about how to use Watermark in detail by referring to [Managing tables](../managing-tables/).
You can learn about how to use Watermark in detail by referring to [Managing tables](../using-tables/).
12 changes: 6 additions & 6 deletions docs/engines/flink/flink-ddl.md
@@ -189,16 +189,16 @@ Not supported at the moment
| BIGINT | BIGINT |
| FLOAT | FLOAT |
| DOUBLE | DOUBLE |
| DECIAML(p, s) | DECIAML(p, s) |
| DECIMAL(p, s) | DECIMAL(p, s) |
| DATE | DATE |
| TIMESTAMP(6) | TIMESTAMP |
| VARBINARY | BYNARY |
| VARBINARY | BINARY |
| ARRAY<T> | ARRAY<T> |
| MAP<K, V> | MAP<K, V> |
| ROW | STRUCT |


### Mixed-Iceberg daata types
### Mixed-Iceberg data types
| Flink Data Type | Mixed-Iceberg Data Type |
|-----------------------------------|-------------------------|
| CHAR(p) | STRING |
Expand All @@ -211,13 +211,13 @@ Not supported at the moment
| BIGINT | LONG |
| FLOAT | FLOAT |
| DOUBLE | DOUBLE |
| DECIAML(p, s) | DECIAML(p, s) |
| DECIMAL(p, s) | DECIMAL(p, s) |
| DATE | DATE |
| TIMESTAMP(6) | TIMESTAMP |
| TIMESTAMP(6) WITH LCOAL TIME ZONE | TIMESTAMPTZ |
| TIMESTAMP(6) WITH LOCAL TIME ZONE | TIMESTAMPTZ |
| BINARY(p) | FIXED(p) |
| BINARY(16) | UUID |
| VARBINARY | BYNARY |
| VARBINARY | BINARY |
| ARRAY<T> | ARRAY<T> |
| MAP<K, V> | MAP<K, V> |
| ROW | STRUCT |
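
To illustrate the mapping, a hypothetical Flink DDL using several of the types above (catalog, database, and column names are made up):

```sql
CREATE TABLE mixed_catalog.db.sample (
    id       BIGINT,                             -- maps to LONG
    name     VARCHAR(100),                       -- maps to STRING
    price    DECIMAL(10, 2),                     -- maps to DECIMAL(10, 2)
    birthday DATE,                               -- maps to DATE
    op_time  TIMESTAMP(6) WITH LOCAL TIME ZONE,  -- maps to TIMESTAMPTZ
    tags     ARRAY<STRING>                       -- maps to ARRAY<STRING>
);
```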