Add math operators and functions to work with multidimensional vectors #27933
Added operators, tupleHammingDistance has been refactored
Excellent work!
Some comments; most of them suggest style changes.
{
    try
    {
        ColumnWithTypeAndName left{left_elements.empty() ? nullptr : left_elements[i], left_types[i], {}};
Are there real cases where it's empty?
Just copy-pasted from tupleHammingDistance.
... Actually, I do not know (and I agree: it is strange if there is a real case). I can delete it everywhere. If the old tests with tupleHammingDistance do not fail, then we can conclude that there are no such cases :)
Yes, there are!!! SELECT tupleHammingDistance(materialize((1, 2)), (1, 4));
-- I do not know why, but it fails if only the third operand of the ternary operator is kept.
namespace
{
    struct Max2Name { static constexpr auto name = "max2"; };
    using FunctionMax2 = FunctionMathBinaryFloat64<BinaryFunctionVectorized<Max2Name, max>>;
Maybe this should be updated, as there is a conversion to Float64, as far as I understand.
Yes, it returns Float64 for all input types.
SELECT toTypeName(max2(2, 1))
┌─toTypeName(max2(2, 1))─┐
│ Float64 │
└────────────────────────┘
Thanks. I think it is not good for such an exact function to always return the Float64 type, but I do not know a convenient way to fix it.
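To make the precision concern concrete, here is a minimal standalone C++ sketch. It is not the ClickHouse implementation: max2AsFloat64 and max2CommonType are hypothetical names, and using std::common_type for the result is only one possible direction (mixed signed/unsigned arguments would need extra care).

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for the current behaviour: both arguments are
// converted to double (Float64) before they are compared.
inline double max2AsFloat64(double a, double b)
{
    return a > b ? a : b;
}

// Hypothetical alternative (an assumption, not what the PR does):
// compare and return in the common arithmetic type of the arguments.
template <typename A, typename B>
constexpr std::common_type_t<A, B> max2CommonType(A a, B b)
{
    using R = std::common_type_t<A, B>;
    return static_cast<R>(a) > static_cast<R>(b) ? static_cast<R>(a) : static_cast<R>(b);
}
```

With the Float64 variant, a 64-bit integer above 2^53 may not survive the round trip through double, while the common-type variant returns it unchanged.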
if (tuple_size == 0)
    return DataTypeUInt8().createColumnConstWithDefaultValue(input_rows_count);

const auto & p_column = arguments[1];
Are there use cases for a non-constant p, what do you think? Also, should we limit the possible values of p, like 0 < p < inf, or make it an integer?
SELECT LpNorm((3, 1, 4), 0), LpNorm((3, 1, 4), inf);
┌─LpNorm((3, 1, 4), 0)─┬─LpNorm((3, 1, 4), inf)─┐
│ inf │ 1 │
└──────────────────────┴────────────────────────┘
I decided not to limit p. It definitely should NOT be restricted to integers (I will add a test where p is a float). It makes real mathematical sense only when p >= 1, but somebody will probably want to use it for other purposes...
However, LpNorm with inf is weird, as it is not the same as LinfNorm...
Let's make it 1 <= p < inf?
Ok.
Restrictions added in 53d7649.
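The effect of the restriction discussed in this thread can be sketched in standalone C++. This is an illustration under assumptions, not the code from the commit: lpNormUnchecked and lpNormChecked are hypothetical names, a tuple is modelled as std::vector<double>, and an out-of-range p is reported with an exception.

```cpp
#include <cassert>
#include <cmath>
#include <stdexcept>
#include <vector>

// Unrestricted formula: (sum of |x_i|^p)^(1/p), with no checks on p.
inline double lpNormUnchecked(const std::vector<double> & v, double p)
{
    double sum = 0.0;
    for (double x : v)
        sum += std::pow(std::abs(x), p);
    return std::pow(sum, 1.0 / p);
}

// Same formula, but p outside 1 <= p < inf is rejected (NaN is rejected too).
inline double lpNormChecked(const std::vector<double> & v, double p)
{
    if (!(p >= 1.0) || std::isinf(p))
        throw std::invalid_argument("p must satisfy 1 <= p < inf");
    return lpNormUnchecked(v, p);
}
```

On a platform with IEEE 754 doubles, the unchecked version gives inf for p = 0 and 1 for p = inf on (3, 1, 4), which matches the query output quoted earlier in this thread and shows why the unrestricted behaviour looked odd.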
Btw, did you consider different naming variants for Maybe
New names are better in that they do not start with a capital letter. The current names are good in that they read straightforwardly (Lp-norm, not the norm in the Lp metric). UPD: added aliases.
Interesting bug or feature: (1, 2) * NULL is NULL, not a tuple of NULLs.
+ for Strings and * for a String and a Number are not added here, as they can be implemented soon. LpNorm cannot accept Decimal arguments because of the pow function.
Error is related to If it's difficult to fix we can omit support of
Any plan for supporting
Internal documentation ticket: DOCSUP-16593.
I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en
Changelog category:
Changelog entry:
This fully closes #4509 and even more.
Detailed description / Documentation draft:
Function tupleHammingDistance has been refactored. The summation is now performed in a single variable instead of a column to get the result.
operator +, tuplePlus, vectorSum: tuple-wise addition. Arguments: (Tuple, Tuple). Returns: Tuple.
operator -, tupleMinus, vectorDifference: tuple-wise subtraction. Arguments: (Tuple, Tuple). Returns: Tuple.
tupleMultiply: tuple-wise multiplication (compatibility). Arguments: (Tuple, Tuple). Returns: Tuple.
tupleDivide: tuple-wise division (compatibility). Arguments: (Tuple, Tuple). Returns: Tuple.
unary operator -, tupleNegate: tuple-wise negation. Arguments: (Tuple). Returns: Tuple.
operator *, tupleMultiplyByNumber: multiplies each element of a tuple by a number. Arguments: (Tuple, Number). Returns: Tuple.
operator /, tupleDivideByNumber: divides each element of a tuple by a number. Arguments: (Tuple, Number). Returns: Tuple.
operator *, dotProduct, scalarProduct: the dot (aka scalar) product of vectors. Arguments: (Tuple, Tuple). Returns: Number.
L1Norm, normL1: calculates the sum of the absolute values of the coordinates. Arguments: (Tuple). Returns: Number.
L2Norm, normL2: calculates the square root of the sum of the squares of the coordinates. Arguments: (Tuple). Returns: Number.
LinfNorm, normLinf: calculates the maximum absolute value among the coordinates. Arguments: (Tuple). Returns: Number.
LpNorm, normLp: calculates the p-th root of the sum of the p-th powers of the absolute values of the coordinates. Arguments: (Tuple, Number). Returns: Number. LpNorm should be reviewed very carefully.
L1Distance, distanceL1: finds the distance between two points (as tuples) using the 1-norm. Arguments: (Tuple, Tuple). Returns: Number.
L2Distance, distanceL2: finds the distance between two points (as tuples) using the 2-norm. Arguments: (Tuple, Tuple). Returns: Number.
LinfDistance, distanceLinf: finds the distance between two points (as tuples) using the infinity-norm. Arguments: (Tuple, Tuple). Returns: Number.
LpDistance, distanceLp: finds the distance between two points (as tuples) using the p-norm. Arguments: (Tuple, Tuple, Number). Returns: Number.
L1Normalize, normalizeL1: finds the unit vector of a given vector (tuple) according to the 1-norm. Arguments: (Tuple). Returns: Tuple.
L2Normalize, normalizeL2: finds the unit vector of a given vector (tuple) according to the 2-norm. Arguments: (Tuple). Returns: Tuple.
LinfNormalize, normalizeLinf: finds the unit vector of a given vector (tuple) according to the infinity-norm. Arguments: (Tuple). Returns: Tuple.
LpNormalize, normalizeLp: finds the unit vector of a given vector (tuple) according to the p-norm. Arguments: (Tuple, Number). Returns: Tuple.
cosineDistance: calculates the cosine of the angle between two vectors and subtracts it from one. Arguments: (Tuple, Tuple). Returns: Number.
max2: finds the maximum of two numbers (developed for the LinfNorm function, and it is simply good to have). Arguments: (Number, Number). Returns: Number. Maybe this should be updated, as there is a conversion to Float64, as far as I understand.
min2: finds the minimum of two numbers (compatibility). Arguments: (Number, Number). Returns: Number.
Examples for each of them can be found in the test related to this pull request.
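As a rough illustration of the tuple-wise operations and the fixed norms listed above, here is a standalone C++ sketch. It is written under assumptions and is not the ClickHouse code: a tuple is modelled as std::vector<double>, the function names only mirror the SQL names, and size mismatches are handled with assert.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

// Tuple-wise addition (stand-in for tuplePlus).
inline Vec tuplePlus(const Vec & u, const Vec & v)
{
    assert(u.size() == v.size());
    Vec result(u.size());
    for (std::size_t i = 0; i < u.size(); ++i)
        result[i] = u[i] + v[i];
    return result;
}

// Dot (scalar) product of two vectors.
inline double dotProduct(const Vec & u, const Vec & v)
{
    assert(u.size() == v.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < u.size(); ++i)
        sum += u[i] * v[i];
    return sum;
}

// Sum of absolute values of the coordinates.
inline double l1Norm(const Vec & v)
{
    double sum = 0.0;
    for (double x : v)
        sum += std::abs(x);
    return sum;
}

// Square root of the sum of squared coordinates.
inline double l2Norm(const Vec & v)
{
    return std::sqrt(dotProduct(v, v));
}

// Maximum absolute value among the coordinates (0 for an empty vector).
inline double linfNorm(const Vec & v)
{
    double result = 0.0;
    for (double x : v)
        result = std::max(result, std::abs(x));
    return result;
}
```

For example, under these definitions l1Norm of (3, -1, 4) is 8 and linfNorm of the same vector is 4.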
For the Lp functions, only p not less than 1 makes sense, since otherwise the function is not a norm. However, there are no restrictions, so the user can pass even a negative number as the parameter. UPD: added the restriction 1 <= p < inf.
LxDistance(u, v) := LxNorm(u - v)
LxNormalize(u) := u / LxNorm(u)
cosineDistance(u, v) := 1 - (u * v) / (L2Norm(u) * L2Norm(v))
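The three definitions above can be sketched in standalone C++ for the 2-norm case. This is a minimal sketch under assumptions, not the ClickHouse implementation: a tuple is modelled as std::vector<double>, euclideanNorm plays the role of L2Norm, and a zero vector is not handled specially (dividing by its norm yields NaN or inf).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

// Dot product, written as u * v in the definitions.
inline double dot(const Vec & u, const Vec & v)
{
    assert(u.size() == v.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < u.size(); ++i)
        sum += u[i] * v[i];
    return sum;
}

// Plays the role of L2Norm.
inline double euclideanNorm(const Vec & u)
{
    return std::sqrt(dot(u, u));
}

// L2Distance(u, v) := L2Norm(u - v)
inline double l2Distance(const Vec & u, const Vec & v)
{
    assert(u.size() == v.size());
    Vec diff(u.size());
    for (std::size_t i = 0; i < u.size(); ++i)
        diff[i] = u[i] - v[i];
    return euclideanNorm(diff);
}

// L2Normalize(u) := u / L2Norm(u)
inline Vec l2Normalize(const Vec & u)
{
    const double norm = euclideanNorm(u);
    Vec result(u.size());
    for (std::size_t i = 0; i < u.size(); ++i)
        result[i] = u[i] / norm;
    return result;
}

// cosineDistance(u, v) := 1 - (u * v) / (L2Norm(u) * L2Norm(v))
inline double cosineDistance(const Vec & u, const Vec & v)
{
    return 1.0 - dot(u, v) / (euclideanNorm(u) * euclideanNorm(v));
}
```

Orthogonal vectors such as (1, 0) and (0, 1) give a cosine distance of 1, and parallel vectors give a value close to 0.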
Operator overloads that can be added: + ((String, String) -> String) to concatenate strings, * ((String, Integer) -> String) to concatenate one string multiple times.
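The semantics proposed for these two string operators could look roughly like the following standalone C++ sketch. The names concatStrings and repeatString are hypothetical and nothing here is part of this pull request; named functions are used instead of real operator overloads to keep the example simple.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// (String, String) -> String: what "+" would do.
inline std::string concatStrings(const std::string & a, const std::string & b)
{
    return a + b;
}

// (String, Integer) -> String: what "*" would do, repeating s count times.
inline std::string repeatString(const std::string & s, std::size_t count)
{
    std::string result;
    result.reserve(s.size() * count);
    for (std::size_t i = 0; i < count; ++i)
        result += s;
    return result;
}
```

A repeat count of zero yields an empty string in this sketch; how negative counts should behave in SQL is left open here.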