[feat](skew & kurt) New aggregate function skew & kurt #40945
Thank you for your contribution to Apache Doris. Since 2024-03-18, the Document has been moved to doris-website.
run buildall
TPC-H: Total hot run time: 41559 ms
TPC-DS: Total hot run time: 193943 ms
ClickBench: Total hot run time: 31.75 s
TeamCity be ut coverage result:
4ae074e to c5fa11c
run buildall
TPC-H: Total hot run time: 41954 ms
TPC-DS: Total hot run time: 194990 ms
ClickBench: Total hot run time: 32.71 s
TeamCity be ut coverage result:
namespace doris::vectorized {

enum class StatisticsFunctionKind : uint8_t { skewPop, kurtPop };
Better to use UPPER CASE.
done
namespace doris::vectorized {

enum class StatisticsFunctionKind : uint8_t { skewPop, kurtPop };
Better to use UPPER CASE.
Renamed to STATISTICS_FUNCTION_KIND.
template <typename T, std::size_t _level>
struct StatFuncOneArg {
    using Type1 = T;
These are the same type, so there is no need for two type aliases.
done
}
}

void reset() { return; }
This function is useful: it should reset all of m to its initial values.
done
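The reset request above can be illustrated with a minimal sketch. It assumes, following the ClickHouse code the PR references, that the aggregate state stores raw power sums in an array `m`; the name `MomentsSketch` and its members are hypothetical stand-ins and are not the PR's actual `VarMoments` definition.

```cpp
#include <cstddef>

// Hypothetical stand-in for the aggregate state discussed in this thread.
// m[k] holds the running sum of x^k, so m[0] is the row count.
template <typename T, std::size_t Level>
struct MomentsSketch {
    T m[Level + 1] {};

    // Accumulate one value into every power sum.
    void add(T x) {
        T power = 1;
        for (std::size_t k = 0; k <= Level; ++k) {
            m[k] += power;
            power *= x;
        }
    }

    // Combine two partial states (power sums are additive).
    void merge(const MomentsSketch& rhs) {
        for (std::size_t k = 0; k <= Level; ++k) {
            m[k] += rhs.m[k];
        }
    }

    // Restore every accumulator to its initial value so the state can be reused.
    void reset() {
        for (std::size_t k = 0; k <= Level; ++k) {
            m[k] = T {};
        }
    }
};
```

A `reset()` that returns without clearing `m` would let sums from a previous group leak into the next one whenever the engine reuses the state, which is presumably why the reviewer asked for it.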
using ResultType = Float64;
using Data = VarMoments<ResultType, _level>;

static constexpr UInt32 num_args = 1;
This variable does not seem to be used?
removed
using ColVecT1 = ColumnVectorOrDecimal<T1>;
using ColVecT2 = ColumnVectorOrDecimal<T2>;
using ResultType = typename StatFunc::ResultType;
using ColVecResult = ColumnVector<ResultType>;
This could be written more simply, since the return type of both functions is ColumnFloat64.
fixed
fe/fe-core/src/main/java/org/apache/doris/nereids/trees/expressions/functions/agg/Skew.java
implements UnaryExpression, ExplicitlyCastableSignature, AlwaysNullable {

public static final List<FunctionSignature> SIGNATURES = ImmutableList.of(
        FunctionSignature.ret(DoubleType.INSTANCE).args(FloatType.INSTANCE),
Could an FE member check the argument order? See #39352.
This is now the same as #39352.
void add(AggregateDataPtr __restrict place, const IColumn** columns, ssize_t row_num,
         Arena*) const override {
    if constexpr (NullableInput) {
NULL values should be skipped.
This function is created with creator_without_type::create_ignore_nullable, so aggregate_function_null will not be used, since the return type is always nullable.
fixed
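The point of this thread, that the aggregate itself has to skip NULL rows because no null-handling wrapper sits in front of it, can be sketched as follows. The column and state types here (`ColumnSketch`, `CountSumState`, `add_row_sketch`) are hypothetical simplifications for illustration and are not Doris's `IColumn` or `ColumnNullable` API; only the `if constexpr (NullableInput)` shape is taken from the snippet under review.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical simplified nullable column: values plus a null map (non-zero = NULL).
struct ColumnSketch {
    std::vector<double> values;
    std::vector<std::uint8_t> null_map;
};

// Hypothetical aggregate state, kept tiny so the null handling stays visible.
struct CountSumState {
    double sum = 0;
    std::size_t count = 0;
};

// Add one row to the state. When the input is nullable, NULL rows are skipped
// so they contribute neither to the count nor to the sum.
template <bool NullableInput>
void add_row_sketch(CountSumState& state, const ColumnSketch& col, std::size_t row) {
    if constexpr (NullableInput) {
        if (col.null_map[row] != 0) {
            return;
        }
    }
    state.sum += col.values[row];
    ++state.count;
}
```

With `NullableInput` set to `false` the null-map check is compiled out entirely, so non-nullable inputs pay no cost for it.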
"'getPopulation' method"); | ||
}

T getPopulation() const {
Rename to get_population.
fixed
run buildall
TeamCity be ut coverage result:
TPC-H: Total hot run time: 41903 ms
TPC-DS: Total hot run time: 195972 ms
LGTM
42228ba
run buildall
TeamCity be ut coverage result:
TPC-H: Total hot run time: 41536 ms
TPC-DS: Total hot run time: 191403 ms
ClickBench: Total hot run time: 33.05 s
run p0
LGTM
PR approved by at least one committer and no changes requested.
`skew`, `skew_pop`, and `skewness` calculate the [skewness](https://en.wikipedia.org/wiki/Skewness#Pearson.27s_moment_coefficient_of_skewness) of a data distribution. `kurt`, `kurt_pop`, and `kurtosis` calculate the [kurtosis](https://en.wikipedia.org/wiki/Kurtosis) of a data distribution.
The implementation follows ClickHouse/ClickHouse#5200, with the result type changed to AlwaysNullable because Doris does not support NaN.
The formula used to calculate skew is `3rd central moment / (variance^{1.5})`.
The formula used to calculate kurt is `4th central moment / (variance^{2}) - 3`.
When the value of any result is NaN, Doris returns NULL.
doc: apache/doris-website#1127
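As a rough illustration of the two formulas above, the following sketch computes both statistics from raw power sums and maps a NaN result to an empty `std::optional`, standing in for SQL NULL. All names (`PowerSumsSketch`, `accumulate_sketch`, `skew_pop_sketch`, `kurt_pop_sketch`, `nan_to_null`) are hypothetical, and the expansion of central moments from raw sums is an assumption about the approach rather than a copy of the PR's code.

```cpp
#include <cmath>
#include <optional>
#include <vector>

// Hypothetical container for raw power sums: n = count, s1..s4 = sums of x^1..x^4.
struct PowerSumsSketch {
    double n = 0, s1 = 0, s2 = 0, s3 = 0, s4 = 0;
};

inline PowerSumsSketch accumulate_sketch(const std::vector<double>& xs) {
    PowerSumsSketch p;
    for (double x : xs) {
        p.n += 1;
        p.s1 += x;
        p.s2 += x * x;
        p.s3 += x * x * x;
        p.s4 += x * x * x * x;
    }
    return p;
}

// NaN becomes "no value", mirroring the NaN -> NULL rule in the description.
inline std::optional<double> nan_to_null(double v) {
    if (std::isnan(v)) {
        return std::nullopt;
    }
    return v;
}

// Population skewness: third central moment / variance^1.5.
inline std::optional<double> skew_pop_sketch(const PowerSumsSketch& p) {
    const double mean = p.s1 / p.n;
    const double e2 = p.s2 / p.n;
    const double variance = e2 - mean * mean;
    const double m3 = p.s3 / p.n - 3 * mean * e2 + 2 * mean * mean * mean;
    return nan_to_null(m3 / std::pow(variance, 1.5));
}

// Population excess kurtosis: fourth central moment / variance^2 - 3.
inline std::optional<double> kurt_pop_sketch(const PowerSumsSketch& p) {
    const double mean = p.s1 / p.n;
    const double e2 = p.s2 / p.n;
    const double variance = e2 - mean * mean;
    const double m4 = p.s4 / p.n - 4 * mean * (p.s3 / p.n) + 6 * mean * mean * e2 -
                      3 * mean * mean * mean * mean;
    return nan_to_null(m4 / (variance * variance) - 3);
}
```

In this sketch an empty input or a constant column such as `{1, 1, 1}` produces `0 / 0`, which is NaN and therefore comes back as an empty optional; a symmetric input such as `{1, 2, 3}` gives a skewness of approximately 0 and an excess kurtosis of approximately -1.5.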
Versions: [x] dev, [x] 3.0, [ ] 2.1, [ ] 2.0. Languages: [x] Chinese, [x] English. ref apache/doris#40945