diff --git a/docs/sql-manual/sql-functions/aggregate-functions/approx-count-distinct.md b/docs/sql-manual/sql-functions/aggregate-functions/approx-count-distinct.md
index cad7c8110b880..f20a6f46820f1 100644
--- a/docs/sql-manual/sql-functions/aggregate-functions/approx-count-distinct.md
+++ b/docs/sql-manual/sql-functions/aggregate-functions/approx-count-distinct.md
@@ -24,19 +24,20 @@ specific language governing permissions and limitations
 under the License.
 -->
-## APPROX_COUNT_DISTINCT
 ### Description
 #### Syntax
-`APPROX_COUNT_DISTINCT (expr)`
+`APPROX_COUNT_DISTINCT(expr)`
+An aggregate function that returns an approximation of the result of `COUNT(DISTINCT col)`.
-Returns an approximate aggregation function similar to the result of COUNT (DISTINCT col).
+It is implemented with the HyperLogLog algorithm, which estimates column cardinality in a fixed amount of memory. The algorithm relies on an assumption about the distribution of trailing zeros, so its accuracy depends on the data distribution. With the fixed number of buckets that Doris uses, the relative standard error is 0.8125%.
-It combines COUNT and DISTINCT faster and uses fixed-size memory, so less memory can be used for columns with high cardinality.
+For a more detailed analysis, see the [related paper](https://algo.inria.fr/flajolet/Publications/FlFuGaMe07.pdf).
-### example
-```
+### Example
+
+```sql
 MySQL > select approx_count_distinct(query_id) from log_statis group by datetime;
 +-----------------+
 | approx_count_distinct(`query_id`) |
@@ -44,6 +45,6 @@ MySQL > select approx_count_distinct(query_id) from log_statis group by datetime
 | 17721 |
 +-----------------+
 ```
-### keywords
-APPROX_COUNT_DISTINCT
+### Keywords
+
 APPROX_COUNT_DISTINCT
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/aggregate-functions/approx-count-distinct.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/aggregate-functions/approx-count-distinct.md
index bc23f856d98ae..dcd0b86a50649 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/aggregate-functions/approx-count-distinct.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/sql-manual/sql-functions/aggregate-functions/approx-count-distinct.md
@@ -24,19 +24,20 @@ specific language governing permissions and limitations
 under the License.
 -->
-## APPROX_COUNT_DISTINCT
-### description
+### Description
 #### Syntax
 `APPROX_COUNT_DISTINCT(expr)`
+An aggregate function that returns an approximation of the result of `COUNT(DISTINCT col)`.
-An aggregate function that returns an approximation of the result of COUNT(DISTINCT col).
+It is implemented with the HyperLogLog algorithm, which estimates column cardinality in a fixed amount of memory. The algorithm relies on an assumption about the distribution of trailing zeros, so its accuracy depends on the data distribution. With the fixed number of buckets that Doris uses, the relative standard error is 0.8125%.
-It is faster than combining COUNT and DISTINCT and uses a fixed amount of memory, so it needs less memory for high-cardinality columns.
+For a more detailed analysis, see the [related paper](https://algo.inria.fr/flajolet/Publications/FlFuGaMe07.pdf).
-### example
-```
+### Example
+
+```sql
 MySQL > select approx_count_distinct(query_id) from log_statis group by datetime;
 +-----------------+
 | approx_count_distinct(`query_id`) |
@@ -44,5 +45,6 @@ MySQL > select approx_count_distinct(query_id) from log_statis group by datetime
 | 17721 |
 +-----------------+
 ```
-### keywords
-APPROX_COUNT_DISTINCT
+
+### Keywords
+
 APPROX_COUNT_DISTINCT
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/sql-manual/sql-functions/aggregate-functions/approx-count-distinct.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/sql-manual/sql-functions/aggregate-functions/approx-count-distinct.md
index bc23f856d98ae..dcd0b86a50649 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/sql-manual/sql-functions/aggregate-functions/approx-count-distinct.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/sql-manual/sql-functions/aggregate-functions/approx-count-distinct.md
@@ -24,19 +24,20 @@ specific language governing permissions and limitations
 under the License.
 -->
-## APPROX_COUNT_DISTINCT
-### description
+### Description
 #### Syntax
 `APPROX_COUNT_DISTINCT(expr)`
+An aggregate function that returns an approximation of the result of `COUNT(DISTINCT col)`.
-An aggregate function that returns an approximation of the result of COUNT(DISTINCT col).
+It is implemented with the HyperLogLog algorithm, which estimates column cardinality in a fixed amount of memory. The algorithm relies on an assumption about the distribution of trailing zeros, so its accuracy depends on the data distribution. With the fixed number of buckets that Doris uses, the relative standard error is 0.8125%.
-It is faster than combining COUNT and DISTINCT and uses a fixed amount of memory, so it needs less memory for high-cardinality columns.
+For a more detailed analysis, see the [related paper](https://algo.inria.fr/flajolet/Publications/FlFuGaMe07.pdf).
-### example
-```
+### Example
+
+```sql
 MySQL > select approx_count_distinct(query_id) from log_statis group by datetime;
 +-----------------+
 | approx_count_distinct(`query_id`) |
@@ -44,5 +45,6 @@ MySQL > select approx_count_distinct(query_id) from log_statis group by datetime
 | 17721 |
 +-----------------+
 ```
-### keywords
-APPROX_COUNT_DISTINCT
+
+### Keywords
+
 APPROX_COUNT_DISTINCT
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/sql-manual/sql-functions/aggregate-functions/approx-count-distinct.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/sql-manual/sql-functions/aggregate-functions/approx-count-distinct.md
index bc23f856d98ae..dcd0b86a50649 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/sql-manual/sql-functions/aggregate-functions/approx-count-distinct.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/sql-manual/sql-functions/aggregate-functions/approx-count-distinct.md
@@ -24,19 +24,20 @@ specific language governing permissions and limitations
 under the License.
 -->
-## APPROX_COUNT_DISTINCT
-### description
+### Description
 #### Syntax
 `APPROX_COUNT_DISTINCT(expr)`
+An aggregate function that returns an approximation of the result of `COUNT(DISTINCT col)`.
-An aggregate function that returns an approximation of the result of COUNT(DISTINCT col).
+It is implemented with the HyperLogLog algorithm, which estimates column cardinality in a fixed amount of memory. The algorithm relies on an assumption about the distribution of trailing zeros, so its accuracy depends on the data distribution. With the fixed number of buckets that Doris uses, the relative standard error is 0.8125%.
-It is faster than combining COUNT and DISTINCT and uses a fixed amount of memory, so it needs less memory for high-cardinality columns.
+For a more detailed analysis, see the [related paper](https://algo.inria.fr/flajolet/Publications/FlFuGaMe07.pdf).
-### example
-```
+### Example
+
+```sql
 MySQL > select approx_count_distinct(query_id) from log_statis group by datetime;
 +-----------------+
 | approx_count_distinct(`query_id`) |
@@ -44,5 +45,6 @@ MySQL > select approx_count_distinct(query_id) from log_statis group by datetime
 | 17721 |
 +-----------------+
 ```
-### keywords
-APPROX_COUNT_DISTINCT
+
+### Keywords
+
 APPROX_COUNT_DISTINCT
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/sql-manual/sql-functions/aggregate-functions/approx-count-distinct.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/sql-manual/sql-functions/aggregate-functions/approx-count-distinct.md
index bc23f856d98ae..dcd0b86a50649 100644
--- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/sql-manual/sql-functions/aggregate-functions/approx-count-distinct.md
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/sql-manual/sql-functions/aggregate-functions/approx-count-distinct.md
@@ -24,19 +24,20 @@ specific language governing permissions and limitations
 under the License.
 -->
-## APPROX_COUNT_DISTINCT
-### description
+### Description
 #### Syntax
 `APPROX_COUNT_DISTINCT(expr)`
+An aggregate function that returns an approximation of the result of `COUNT(DISTINCT col)`.
-An aggregate function that returns an approximation of the result of COUNT(DISTINCT col).
+It is implemented with the HyperLogLog algorithm, which estimates column cardinality in a fixed amount of memory. The algorithm relies on an assumption about the distribution of trailing zeros, so its accuracy depends on the data distribution. With the fixed number of buckets that Doris uses, the relative standard error is 0.8125%.
-It is faster than combining COUNT and DISTINCT and uses a fixed amount of memory, so it needs less memory for high-cardinality columns.
+For a more detailed analysis, see the [related paper](https://algo.inria.fr/flajolet/Publications/FlFuGaMe07.pdf).
-### example
-```
+### Example
+
+```sql
 MySQL > select approx_count_distinct(query_id) from log_statis group by datetime;
 +-----------------+
 | approx_count_distinct(`query_id`) |
@@ -44,5 +45,6 @@ MySQL > select approx_count_distinct(query_id) from log_statis group by datetime
 | 17721 |
 +-----------------+
 ```
-### keywords
-APPROX_COUNT_DISTINCT
+
+### Keywords
+
 APPROX_COUNT_DISTINCT
diff --git a/versioned_docs/version-2.0/sql-manual/sql-functions/aggregate-functions/approx-count-distinct.md b/versioned_docs/version-2.0/sql-manual/sql-functions/aggregate-functions/approx-count-distinct.md
index cad7c8110b880..f20a6f46820f1 100644
--- a/versioned_docs/version-2.0/sql-manual/sql-functions/aggregate-functions/approx-count-distinct.md
+++ b/versioned_docs/version-2.0/sql-manual/sql-functions/aggregate-functions/approx-count-distinct.md
@@ -24,19 +24,20 @@ specific language governing permissions and limitations
 under the License.
 -->
-## APPROX_COUNT_DISTINCT
 ### Description
 #### Syntax
-`APPROX_COUNT_DISTINCT (expr)`
+`APPROX_COUNT_DISTINCT(expr)`
+An aggregate function that returns an approximation of the result of `COUNT(DISTINCT col)`.
-Returns an approximate aggregation function similar to the result of COUNT (DISTINCT col).
+It is implemented with the HyperLogLog algorithm, which estimates column cardinality in a fixed amount of memory. The algorithm relies on an assumption about the distribution of trailing zeros, so its accuracy depends on the data distribution.
With the fixed number of buckets that Doris uses, the relative standard error is 0.8125%.
-It combines COUNT and DISTINCT faster and uses fixed-size memory, so less memory can be used for columns with high cardinality.
+For a more detailed analysis, see the [related paper](https://algo.inria.fr/flajolet/Publications/FlFuGaMe07.pdf).
-### example
-```
+### Example
+
+```sql
 MySQL > select approx_count_distinct(query_id) from log_statis group by datetime;
 +-----------------+
 | approx_count_distinct(`query_id`) |
@@ -44,6 +45,6 @@ MySQL > select approx_count_distinct(query_id) from log_statis group by datetime
 | 17721 |
 +-----------------+
 ```
-### keywords
-APPROX_COUNT_DISTINCT
+### Keywords
+
 APPROX_COUNT_DISTINCT
diff --git a/versioned_docs/version-2.1/sql-manual/sql-functions/aggregate-functions/approx-count-distinct.md b/versioned_docs/version-2.1/sql-manual/sql-functions/aggregate-functions/approx-count-distinct.md
index cad7c8110b880..f20a6f46820f1 100644
--- a/versioned_docs/version-2.1/sql-manual/sql-functions/aggregate-functions/approx-count-distinct.md
+++ b/versioned_docs/version-2.1/sql-manual/sql-functions/aggregate-functions/approx-count-distinct.md
@@ -24,19 +24,20 @@ specific language governing permissions and limitations
 under the License.
 -->
-## APPROX_COUNT_DISTINCT
 ### Description
 #### Syntax
-`APPROX_COUNT_DISTINCT (expr)`
+`APPROX_COUNT_DISTINCT(expr)`
+An aggregate function that returns an approximation of the result of `COUNT(DISTINCT col)`.
-Returns an approximate aggregation function similar to the result of COUNT (DISTINCT col).
+It is implemented with the HyperLogLog algorithm, which estimates column cardinality in a fixed amount of memory. The algorithm relies on an assumption about the distribution of trailing zeros, so its accuracy depends on the data distribution. With the fixed number of buckets that Doris uses, the relative standard error is 0.8125%.
-It combines COUNT and DISTINCT faster and uses fixed-size memory, so less memory can be used for columns with high cardinality.
+For a more detailed analysis, see the [related paper](https://algo.inria.fr/flajolet/Publications/FlFuGaMe07.pdf).
-### example
-```
+### Example
+
+```sql
 MySQL > select approx_count_distinct(query_id) from log_statis group by datetime;
 +-----------------+
 | approx_count_distinct(`query_id`) |
@@ -44,6 +45,6 @@ MySQL > select approx_count_distinct(query_id) from log_statis group by datetime
 | 17721 |
 +-----------------+
 ```
-### keywords
-APPROX_COUNT_DISTINCT
+### Keywords
+
 APPROX_COUNT_DISTINCT
diff --git a/versioned_docs/version-3.0/sql-manual/sql-functions/aggregate-functions/approx-count-distinct.md b/versioned_docs/version-3.0/sql-manual/sql-functions/aggregate-functions/approx-count-distinct.md
index cad7c8110b880..f20a6f46820f1 100644
--- a/versioned_docs/version-3.0/sql-manual/sql-functions/aggregate-functions/approx-count-distinct.md
+++ b/versioned_docs/version-3.0/sql-manual/sql-functions/aggregate-functions/approx-count-distinct.md
@@ -24,19 +24,20 @@ specific language governing permissions and limitations
 under the License.
 -->
-## APPROX_COUNT_DISTINCT
 ### Description
 #### Syntax
-`APPROX_COUNT_DISTINCT (expr)`
+`APPROX_COUNT_DISTINCT(expr)`
+An aggregate function that returns an approximation of the result of `COUNT(DISTINCT col)`.
-Returns an approximate aggregation function similar to the result of COUNT (DISTINCT col).
+It is implemented with the HyperLogLog algorithm, which estimates column cardinality in a fixed amount of memory. The algorithm relies on an assumption about the distribution of trailing zeros, so its accuracy depends on the data distribution. With the fixed number of buckets that Doris uses, the relative standard error is 0.8125%.
-It combines COUNT and DISTINCT faster and uses fixed-size memory, so less memory can be used for columns with high cardinality.
+For a more detailed analysis, see the [related paper](https://algo.inria.fr/flajolet/Publications/FlFuGaMe07.pdf).
-### example
-```
+### Example
+
+```sql
 MySQL > select approx_count_distinct(query_id) from log_statis group by datetime;
 +-----------------+
 | approx_count_distinct(`query_id`) |
@@ -44,6 +45,6 @@ MySQL > select approx_count_distinct(query_id) from log_statis group by datetime
 | 17721 |
 +-----------------+
 ```
-### keywords
-APPROX_COUNT_DISTINCT
+### Keywords
+
 APPROX_COUNT_DISTINCT
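
As a sanity check on the accuracy figure stated in the description, the sketch below compares the approximate and exact counts on the example's `log_statis` table, then recomputes the error bound from the usual HyperLogLog estimate of roughly 1.04 / sqrt(m). The register count m = 16384 (2^14) is an assumption, chosen because it reproduces the documented 0.8125%; it is not stated in the diff. The column aliases are illustrative only.

```sql
-- Compare the approximate count with the exact count for each group.
-- Table and column names come from the documented example; the aliases are illustrative.
SELECT
    datetime,
    COUNT(DISTINCT query_id)        AS exact_cnt,
    APPROX_COUNT_DISTINCT(query_id) AS approx_cnt
FROM log_statis
GROUP BY datetime;

-- Theoretical relative standard error of HyperLogLog: about 1.04 / sqrt(m).
-- ASSUMPTION: m = 16384 registers (2^14), the value consistent with the
-- documented 0.8125%; the diff itself does not state the register count.
SELECT 1.04 / SQRT(16384) AS relative_standard_error;  -- 1.04 / 128 = 0.008125
```

If that assumption holds, the two counts in the first query should usually agree to within a few multiples of 0.8125% on high-cardinality groups, while the approximate version keeps memory use fixed.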