Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CH] Aligning the NULL and NaN compare semantics of Spark and CH #2390

Closed
exmy opened this issue Jul 19, 2023 · 0 comments
Closed

[CH] Aligning the NULL and NaN compare semantics of Spark and CH #2390

exmy opened this issue Jul 19, 2023 · 0 comments
Labels
enhancement New feature or request stale stale

Comments

@exmy
Copy link
Contributor

exmy commented Jul 19, 2023

Spark and CH have different compare semantics for NULL and NaN value:

  • In Spark, NULL is least and Nan is greatest
  • IN CH, NaN and NULLs are alway last in the array. nan_direction_hint can control its least or greatest.
    • if nan_direction_hint == -1, NaN and NULLs are considered as least than everything other;
    • if nan_direction_hint == 1, NaN and NULLs are considered as greatest than everything other.

example:
sort array [null,0.5,1.0,2.1,NaN] in ascending, in descending

ascending descending
Spark [null,0.5,1.0,2.1,NaN] [NaN,2.1,1.0,0.5,null]
CH [0.5,1.0,2.1,nan,NULL] [2.1,1.0,0.5,nan,NULL]

The array_max/array_min functions have the same issue.

It seems there is no other way to align them except by modifying the CH kernel code, and the modify is unlikely to be accepted by CH community.

The initial change:

diff --git a/src/Columns/ColumnVector.h b/src/Columns/ColumnVector.h
index 7b40d01c..a06c985d 100644
--- a/src/Columns/ColumnVector.h
+++ b/src/Columns/ColumnVector.h
@@ -99,8 +99,8 @@ struct FloatCompareHelper
                 return 0;
 
             return isnan_a
-                ? nan_direction_hint
-                : -nan_direction_hint;
+                ? -nan_direction_hint
+                : nan_direction_hint;
         }
 
         return (T(0) < (a - b)) - ((a - b) < T(0));
diff --git a/src/Functions/array/arraySort.cpp b/src/Functions/array/arraySort.cpp
index a853289e..86457fed 100644
--- a/src/Functions/array/arraySort.cpp
+++ b/src/Functions/array/arraySort.cpp
@@ -22,7 +22,7 @@ struct Less
     bool operator()(size_t lhs, size_t rhs) const
     {
         if constexpr (positive)
-            return column.compareAt(lhs, rhs, column, 1) < 0;
+            return column.compareAt(lhs, rhs, column, -1) < 0;
         else
             return column.compareAt(lhs, rhs, column, -1) > 0;
     }
@exmy exmy added the enhancement New feature or request label Jul 19, 2023
@exmy exmy changed the title [CH] Aligning the NULL and NaN sorting semantics of Spark and CH [CH] Aligning the NULL and NaN compare semantics of Spark and CH Jul 19, 2023
@github-actions github-actions bot added the stale stale label Nov 18, 2023
@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Nov 26, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request stale stale
Projects
None yet
Development

No branches or pull requests

1 participant