docs: add an FAQ item for apisix high latency due to etcd #7906

Merged
merged 1 commit into from
Sep 15, 2022

Conversation

Contributor

@hansedong hansedong commented Sep 13, 2022

Signed-off-by: hansedong admin@yinxiaoluo.com

Description

Add an FAQ item for apisix high latency due to Etcd

Checklist

  • I have explained the need for this PR and the problem it solves
  • I have explained the changes or the new features added to this PR
  • I have added tests corresponding to this change
  • I have updated the documentation to reflect this change
  • I have verified that this change is backward compatible (If not, please discuss on the APISIX mailing list first)

@hansedong hansedong changed the title from "docs: add FAQ for apisix high latency due to etcd" to "docs: add an FAQ item for apisix high latency due to etcd" Sep 13, 2022
Contributor

@tokers tokers left a comment

Thanks for the contribution!

Please wrap the version numbers in backticks. Also, rename Etcd to ETCD.

docs/en/latest/FAQ.md (three outdated, resolved review threads)

```

At present, Etcd officially maintains two main branches, 3.4 and 3.5.
Contributor

Suggested change
At present, Etcd officially maintains two main branches, 3.4 and 3.5.
At present, Etcd officially maintains two main branches, `3.4` and `3.5`.


1. Change the communication method between APISIX and Etcd from HTTPS to HTTP (not recommended).
2. Fallback version to 3.4.20 (not recommended).
3. Clone the Etcd source code and compile the release-3.5 branch directly (this branch has fixed the problem of HTTP2 connections, but the new version has not been released yet). This method is recommended.
Contributor

Changing the ETCD source code is also not a recommended approach, since users may deploy ETCD via an image.

@hansedong
Contributor Author

@tokers I made some changes, please help to do a review, thanks.

Comment on lines 637 to 641
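These latency problems seriously affect the service stability of APISIX, and the main reason such problems occur is that ETCD provides two modes of operation: HTTP (HTTPS) and gRPC. And APISIX operates ETCD over the HTTP (HTTPS) protocol.
In this scenario, ETCD has a bug related to HTTP/2: if ETCD is operated over HTTPS (HTTP is not affected), the upper limit of HTTP/2 connections is the Golang default of `250`.
Therefore, when there are many APISIX data plane nodes, once the number of connections between all APISIX nodes and ETCD exceeds this limit, APISIX API responses will be very slow.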
Member

Suggested change
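These latency problems seriously affect the service stability of APISIX, and the main reason such problems occur is that ETCD provides two modes of operation: HTTP (HTTPS) and gRPC. And APISIX operates ETCD over the HTTP (HTTPS) protocol.
In this scenario, ETCD has a bug related to HTTP/2: if ETCD is operated over HTTPS (HTTP is not affected), the upper limit of HTTP/2 connections is the Golang default of `250`.
Therefore, when there are many APISIX data plane nodes, once the number of connections between all APISIX nodes and ETCD exceeds this limit, APISIX API responses will be very slow.
These latency problems seriously affect the service stability of APISIX. Such problems occur mainly because ETCD provides two modes of operation, HTTP (HTTPS) and gRPC, while APISIX operates ETCD over the HTTP (HTTPS) protocol.
In the scenario above, ETCD has a bug related to HTTP/2: if ETCD is operated over HTTPS (HTTP is not affected), the upper limit of HTTP/2 connections is the Golang default of `250`.
Therefore, when there are many APISIX data plane nodes, once the number of connections between all APISIX nodes and ETCD exceeds this limit, APISIX API responses will become very slow.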


```

At present, ETCD officially maintains 2 main versions, `3.4` and `3.5`.
Member

Suggested change
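At present, ETCD officially maintains 2 main versions, `3.4` and `3.5`.
At present, ETCD officially maintains these two main versions, `3.4` and `3.5`. In the `3.4` series, the recently released `3.4.20` version has fixed this problem. As for the `3.5` version, the ETCD maintainers have in fact been preparing to release `3.5.5` for a long time, but as of now (2022.09.13) it has still not been released. So, if you are using an ETCD version lower than `3.5.5`, you can refer to the following ways to solve this problem: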

Comment on lines 659 to 660
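The `3.4` series, for its part, already has the recently released `3.4.20`, which fixes this problem.
As for the `3.5` version, the ETCD maintainers have in fact been preparing to release `3.5.5` for a long time, but as of now (2022.09.13) it has not been released either. So, if you are using an ETCD version lower than `3.5.5`, there are several ways to solve this problem: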
Member

Suggested change
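The `3.4` series already has the recently released `3.4.20`, which fixes this problem.
As for the `3.5` version, the ETCD maintainers have in fact been preparing to release `3.5.5` for a long time, but as of now (2022.09.13) it has not been released either. So, if you are using an ETCD version lower than `3.5.5`, there are several ways to solve this problem:

The `3.4` series, for its part, already has the recently released `3.4.20`, which fixes this problem.
As for the `3.5` version, the ETCD maintainers have in fact been preparing to release `3.5.5` for a long time, but as of now (2022.09.13) it has not been released either. So, if you are using an ETCD version lower than `3.5.5`, there are several ways to solve this problem:

1. Change the communication method between APISIX and ETCD, from HTTPS to HTTP.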
Member

Suggested change
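1. Change the communication method between APISIX and ETCD, from HTTPS to HTTP.
1. Change the communication method between APISIX and ETCD from HTTPS to HTTP.

As for the `3.5` version, the ETCD maintainers have in fact been preparing to release `3.5.5` for a long time, but as of now (2022.09.13) it has not been released either. So, if you are using an ETCD version lower than `3.5.5`, there are several ways to solve this problem:

1. Change the communication method between APISIX and ETCD, from HTTPS to HTTP.
2. Roll back the version to `3.4.20`.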
Member

Suggested change
2. Roll back the version to `3.4.20`
2. Roll back the ETCD version to `3.4.20`


```

At present, ETCD officially maintains two main branches, `3.4` and `3.5`.
Member

Suggested change
At present, ETCD officially maintains two main branches, `3.4` and `3.5`.
ETCD officially maintains two main versions, `3.4` and `3.5`. In the `3.4` series, the recently released `3.4.20` version has fixed this issue. As for the `3.5` version, the ETCD maintainers have long been preparing to release `3.5.5`, but it has not been released as of now (2022.09.13). So, if you are using an ETCD version lower than `3.5.5`, you can solve this problem in one of the following ways:

Comment on lines 659 to 660
The `3.4` branch has the recently released `3.4.20` which fixes this issue.
As for the `3.5` branch, in fact, the official is preparing to release the `3.5.5` version a long time ago, but it has not been released so far. So, if you are using a version of ETCD less than `3.5.5`, there are several ways to solve this problem:
Member

Suggested change
The `3.4` branch has the recently released `3.4.20` which fixes this issue.
As for the `3.5` branch, in fact, the official is preparing to release the `3.5.5` version a long time ago, but it has not been released so far. So, if you are using a version of ETCD less than `3.5.5`, there are several ways to solve this problem:

As for the `3.5` branch, in fact, the official is preparing to release the `3.5.5` version a long time ago, but it has not been released so far. So, if you are using a version of ETCD less than `3.5.5`, there are several ways to solve this problem:

1. Change the communication method between APISIX and ETCD from HTTPS to HTTP.
2. Fallback version to `3.4.20`.
Member

Suggested change
2. Fallback version to `3.4.20`.
2. Roll back the ETCD version to `3.4.20`.

@@ -627,6 +627,58 @@ curl http://127.0.0.1:9180/apisix/admin/routes/health-info \

:::

## What are the high-latency problems related to APISIX and ETCD, and how can they be fixed?
Contributor

Please change ETCD to etcd; see https://etcd.io/ for the correct spelling.

Member

@hansedong I think this comment asks you to correct the spelling of etcd not only in the title, but everywhere it appears in this content.

Contributor Author

@SylviaBABY It's fixed, thanks for pointing out the problem.

make GOOS=linux GOARCH=amd64
```

The compiled binary is in the bin directory. Use it to replace the ETCD binary in your server environment, then restart ETCD.
Contributor

Suggested change
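The compiled binary is in the bin directory. Use it to replace the ETCD binary in your server environment, then restart ETCD.
The compiled binary is in the `bin` directory. Use it to replace the ETCD binary in your server environment, then restart ETCD.


The compiled binary is in the bin directory. Use it to replace the ETCD binary in your server environment, then restart ETCD.

For related issues or PRs, refer to: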
Contributor

Suggested change
For related issues or PRs, refer to
For more information, refer to

Comment on lines 677 to 680
- https://github.com/etcd-io/etcd/issues/14185
- https://github.com/apache/apisix/issues/7078
- https://github.com/apache/apisix/issues/7353
- https://github.com/etcd-io/etcd/pull/14169
Contributor

Suggested change
- https://github.com/etcd-io/etcd/issues/14185
- https://github.com/apache/apisix/issues/7078
- https://github.com/apache/apisix/issues/7353
- https://github.com/etcd-io/etcd/pull/14169
- [when etcd node have many http long polling connections, it may cause etcd to respond slowly to http requests.](https://github.com/etcd-io/etcd/issues/14185)
- [bug: when apisix starts for a while, its communication with etcd starts to time out](https://github.com/apache/apisix/issues/7078)
- [the prometheus metrics API is tool slow](https://github.com/apache/apisix/issues/7353)
- [Support configuring `MaxConcurrentStreams` for http2](https://github.com/etcd-io/etcd/pull/14169)


The compiled binary is in the bin directory. Replace the ETCD binary in your server environment with it, and then restart ETCD:

Related issues or PRs can refer to:
Contributor

Suggested change
Related issues or PRs can refer to
For more information, please refer to:

2. Fallback version to `3.4.20`.
3. Clone the ETCD source code and compile the `release-3.5` branch directly (this branch has fixed the problem of HTTP/2 connections, but the new version has not been released yet).
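
The version thresholds mentioned in this thread (`3.4.20` for the `3.4` series, `3.5.5` for the `3.5` series) can be turned into a quick check. The following is a minimal sketch and not part of the PR: the function name `is_affected` is hypothetical, it assumes a plain `MAJOR.MINOR.PATCH` version string (optionally prefixed with `v`), and it makes no claim about any series other than `3.4` and `3.5`.

```python
from typing import Optional

# First patched release per series, as stated in this thread.
# 3.5.5 had not yet been released when the thread was written (2022.09.13).
FIRST_FIXED = {(3, 4): 20, (3, 5): 5}


def is_affected(version: str) -> Optional[bool]:
    """Return True if the version predates the fix, False if it includes it,
    or None for a series that this thread does not cover."""
    major, minor, patch = (int(part) for part in version.lstrip("v").split("."))
    fixed_patch = FIRST_FIXED.get((major, minor))
    if fixed_patch is None:
        return None  # series not discussed here; no claim either way
    return patch < fixed_patch


if __name__ == "__main__":
    for candidate in ("3.4.19", "3.4.20", "3.5.4", "3.5.5"):
        print(candidate, is_affected(candidate))
```

A result of `True` suggests that one of the workarounds listed above may be needed; pre-release suffixes such as `-rc.1` are not handled by this sketch.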

The way to recompile ETCD is as follows:
Contributor

Suggested change
The way to recompile ETCD is as follows
The way to recompile etcd is as follows:

These problems related to higher latency seriously affect the service stability of APISIX, and the reason why such problems occur is mainly because ETCD provides two modes of operation: HTTP (HTTPS) and gRPC. And APISIX uses the HTTP (HTTPS) protocol to operate ETCD.
In this scenario, ETCD has a bug about HTTP/2: if ETCD is operated over HTTPS (HTTP is not affected), the upper limit of HTTP/2 connections is the default `250` in Golang. Therefore, when the number of APISIX data plane nodes is large, once the number of connections between all APISIX nodes and ETCD exceeds this upper limit, the response of APISIX API interface will be very slow.
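
The arithmetic behind the paragraph above can be sketched as follows. This is an illustration only: the function name and the per-node connection count are hypothetical, since the thread does not say how many connections each APISIX node keeps open to ETCD. The only figure taken from the text is the limit of `250`.

```python
# Limit quoted in the text above (described there as the Golang default).
DEFAULT_HTTP2_LIMIT = 250


def exceeds_limit(apisix_nodes: int, connections_per_node: int,
                  limit: int = DEFAULT_HTTP2_LIMIT) -> bool:
    """Return True when the combined connections from all APISIX nodes
    exceed the limit, the condition under which the text says responses
    become very slow."""
    if apisix_nodes < 0 or connections_per_node < 0:
        raise ValueError("counts must be non-negative")
    return apisix_nodes * connections_per_node > limit


if __name__ == "__main__":
    # Hypothetical deployment: 20 connections per node (an assumed figure).
    for nodes in (5, 12, 13, 50):
        print(nodes, exceeds_limit(nodes, 20))
```

Under that assumed figure, the limit is crossed between 12 and 13 nodes, which illustrates why the problem shows up only once the number of data plane nodes grows.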

In Golang, the default upper limit of HTTP/2 connections is `250`, the code is as follows:
Contributor

Suggested change
In Golang, the default upper limit of HTTP/2 connections is `250`, the code is as follows
In Golang, the default upper limit of HTTP/2 connections is `250`, the code is as follows:

@hansedong
Contributor Author

@hf400159 Fixed as you suggested. Thank you very much for taking the time to point out the problem so patiently.

Signed-off-by: hansedong <admin@yinxiaoluo.com>
@SylviaBABY SylviaBABY requested review from guitu168 and tokers and removed request for guitu168 September 15, 2022 02:02
@SylviaBABY SylviaBABY merged commit ed437af into apache:master Sep 15, 2022
Liu-Junlin pushed a commit to Liu-Junlin/apisix that referenced this pull request Nov 4, 2022
Signed-off-by: hansedong <admin@yinxiaoluo.com>
