Specify a well-defined sorting order for float and double types #342

Currently parquet-format specifies the sort order for floating point numbers as follows:
```
   *   FLOAT - signed comparison of the represented value
   *   DOUBLE - signed comparison of the represented value
```
The problem is that comparison of floating point numbers is only a partial ordering, with surprising behaviour in specific corner cases. For example, according to IEEE 754, -0 is neither less than nor greater than +0, and any ordered comparison involving NaN returns false. This ordering is not suitable for statistics. Additionally, the Java implementation already uses a different (total) ordering that handles these cases correctly, but differently from the C++ implementations, which leads to interoperability problems.
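To make the corner cases concrete, here is a small Python sketch (not taken from any Parquet implementation) showing how plain IEEE 754 comparison behaves, and why a min statistic computed with it can depend on the order in which values are seen. The variable names are illustrative only.

```python
import math

nan = float("nan")

# -0.0 and +0.0 compare equal, so neither one is "less" than the other,
# even though they are distinct bit patterns.
neg_zero_less = -0.0 < 0.0
zeros_equal = -0.0 == 0.0

# Every ordered comparison involving NaN evaluates to False.
nan_less = nan < 1.0
nan_greater = nan > 1.0
nan_equal = nan == nan

# A min computed with "<" depends on where the NaN appears in the input:
# the running minimum is only replaced when "item < current" is True.
min_nan_first = min([nan, 1.0, 2.0])   # NaN is never displaced
min_nan_later = min([1.0, nan, 2.0])   # NaN is never selected

print(neg_zero_less, zeros_equal, nan_less, nan_greater, nan_equal)
print(math.isnan(min_nan_first), min_nan_later)
```

Two writers that see the same values in a different order could therefore record different min/max statistics, which is the kind of ambiguity this issue is about.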

TypeDefinedOrder for doubles and floats should be deprecated and a new TotalFloatingPointOrder should be introduced. The default for writing doubles and floats would be the new TotalFloatingPointOrder. This ordering should be effective and easy to implement in all programming languages.
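This issue does not define the exact semantics of TotalFloatingPointOrder. As one possible illustration, the sketch below assumes an ordering along the lines of the IEEE 754 `totalOrder` predicate, implemented by mapping each double to an unsigned integer key. The function name `total_order_key` is hypothetical, and the resulting order (negative NaN, -inf, ..., -0, +0, ..., +inf, positive NaN) is an assumption, not the specification's decision.

```python
import math
import struct

def total_order_key(x: float) -> int:
    """Map a double to an unsigned 64-bit key whose integer order is total.

    Assumed semantics (IEEE 754 totalOrder style), not the Parquet spec.
    """
    # Reinterpret the 8 bytes of the double as an unsigned 64-bit integer.
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    if bits >> 63:
        # Sign bit set: flip all bits so larger magnitudes sort lower.
        return bits ^ 0xFFFFFFFFFFFFFFFF
    # Sign bit clear: set it so all non-negative values sort above negatives.
    return bits | 0x8000000000000000

values = [1.5, float("inf"), -0.0, 0.0, -2.0, float("-inf")]
ordered = sorted(values, key=total_order_key)
print(ordered)

# NaNs get a definite position that depends on their sign bit.
pos_nan = math.copysign(float("nan"), 1.0)
neg_nan = math.copysign(float("nan"), -1.0)
print(total_order_key(pos_nan) > total_order_key(float("inf")))
print(total_order_key(neg_nan) < total_order_key(float("-inf")))
```

The same bit manipulation is a few integer operations in Java or C++, which fits the goal of an ordering that is easy to implement in every language.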

**Reporter**: [Zoltan Ivanfi](https://issues.apache.org/jira/secure/ViewProfile.jspa?name=zi) / @zivanfi
**Assignee**: [Micah Kornfield](https://issues.apache.org/jira/secure/ViewProfile.jspa?name=emkornfield) / @emkornfield
#### Related issues:
- [Impala shouldn't write column indexes for float columns until PARQUET-1222 is resolved](https://issues.apache.org/jira/browse/IMPALA-7304) (Blocked)
- [Implement specification-compliant floating point comparison](https://issues.apache.org/jira/browse/IMPALA-6539) (blocks)
- [[parquet-mr] Implement specification-compliant floating point comparison](https://github.com/apache/parquet-java/issues/2135) (blocks)
- [[C++] Implement specification-compliant floating point comparison](https://github.com/apache/arrow/issues/42801) (blocks)
- [[C++][Dataset] Handle NaNs correctly in Parquet predicate push-down](https://issues.apache.org/jira/browse/ARROW-12264) (is related to)
- [Ignore float/double statistics in case of NaN](https://github.com/apache/parquet-java/issues/2145) (is related to)
- [Clarify ambiguous min/max stats for FLOAT/DOUBLE](https://github.com/apache/parquet-format/issues/348) (is related to)

<sub>**Note**: *This issue was originally created as [PARQUET-1222](https://issues.apache.org/jira/browse/PARQUET-1222). Please see the [migration documentation](https://issues.apache.org/jira/browse/PARQUET-2502) for further details.*</sub>
