Proposal: Make floats non-NaN by default #11234

Open
@topolarity

Introduction

The idea behind this proposal comes from observing that NaN has some surprising commonalities with null pointers:

  1. It represents an invalid value using a specialized bit sequence
  2. Some functions expect to receive this invalid value, while others assume they never will (for optimal performance)

Combined with the arithmetic/comparison behavior of NaNs, these properties lead to a number of footguns in real-life code.

Examples include sorting algorithms failing on NaN data, every NaN colliding in a hash map, NaNs propagating virally in streaming outputs, invalid image filtering when operating on NaNs, NaN values persisting after filtering, parsers failing on NaN inputs, and formatting/display unintentionally exposing NaN to the user.

Footguns abound when there is disagreement about whether NaN needs to be handled correctly.
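As a small illustration of that disagreement, here is a sketch in today's Zig of a maximum routine written under the assumption that its input never contains NaN. The name naiveMax is hypothetical and is not taken from any of the bugs mentioned above. Because every ordered comparison involving NaN is false, the result depends on where the NaN happens to sit in the input.

```zig
const std = @import("std");

/// Hypothetical helper: returns the largest element, assuming (without
/// checking) that `values` is non-empty and contains no NaN.
fn naiveMax(values: []const f32) f32 {
    var best = values[0];
    for (values[1..]) |v| {
        // `v > best` is false whenever either operand is NaN, so a leading
        // NaN is never replaced and a later NaN is silently skipped.
        if (v > best) best = v;
    }
    return best;
}
```

With a NaN in first position the function returns NaN; with the same NaN in the middle of the slice it returns the largest finite value, and the caller gets no signal in either case.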

Proposal

Option A: Replace f32 with error{NaN}!f32

  • All floating point operations (+-*/%) yield error{NaN}!f32
  • Arithmetic is overloaded on error{NaN}!f32
  • error{NaN}!f32 can be unwrapped with try, catch, and if like any other error union
  • Comparison of f32 yields bool. Comparison of error{NaN}!f32 yields error{NaNOperand}!bool

Other error unions, such as error{Foo}!f32 are not treated specially (no arithmetic, no special layout, etc.).

"NaN-boxing" is to be supported via getNaNPayload and setNaNPayload
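Option A can be loosely approximated in today's Zig with ordinary functions, which may help when judging its ergonomics. The sketch below is only an emulation: the helper names checked and sub are hypothetical, and the proposal itself would build this behavior into the arithmetic operators rather than require wrapper calls.

```zig
const std = @import("std");

/// Hypothetical helper: converts a raw float into the proposed
/// `error{NaN}!f32` shape by turning NaN into an error.
fn checked(x: f32) error{NaN}!f32 {
    if (std.math.isNan(x)) return error.NaN;
    return x;
}

/// Hypothetical stand-in for the proposed overloaded `-`:
/// the result is either a non-NaN float or `error.NaN`.
fn sub(a: f32, b: f32) error{NaN}!f32 {
    return checked(a - b);
}
```

A caller then unwraps the result like any other error union, for example `const d = sub(x, y) catch 0.0;` to substitute a default, or `try sub(x, y)` to propagate the NaN as an error.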

Option B: Make comparisons of floats return error{NaNOperand}!bool

This is a minimal change to the language that would force users to explicitly account for NaN in floating point comparisons, which is the central oversight in the above-mentioned bugs.
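To get a feel for Option B without a language change, the comparison can be sketched as a plain function in today's Zig. The name lessThan is hypothetical; under the proposal the built-in `<` operator itself would have this result type.

```zig
const std = @import("std");

/// Hypothetical stand-in for the proposed `<` on floats: the caller must
/// decide what a NaN operand means before a `bool` is available.
fn lessThan(a: f32, b: f32) error{NaNOperand}!bool {
    if (std.math.isNan(a) or std.math.isNan(b)) return error.NaNOperand;
    return a < b;
}
```

A call site then has to spell out its NaN policy, for example `lessThan(x, y) catch true` to treat a NaN operand as "less than", or `try lessThan(x, y)` to propagate the error to the caller.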

API Impacts

This means that "nan-safe" functions can be given a type that reflects their special handling of NaN. Meanwhile, highly-optimized routines that don't handle NaN correctly can be given a type that reflects their assumptions:

// Returns median, ignoring any NaN values
pub fn median(in: []const error{NaN}!f32) f32 { ... }
// Assumes that inputs do not include NaN
pub fn convolve(a: []const f32, b: []const f32, out: []f32) void { ... }

Example

/// Insertion sort. NaN values are sorted to the end
fn sort_inplace(vec: []error{NaN}!f32) void {
    for (vec) |maybe_key, i| {
        // If maybe_key is NaN, treat it as greater than everything (i.e. don't move it)
        if (maybe_key) |key| {
            var j = i;
            // `error{NaN}!f32` forces us to explicitly handle the NaN case here.
            // Checking `j > 0` first avoids indexing vec[j - 1] when j == 0.
            while (j > 0 and ((vec[j - 1] > key) catch true)) { // If vec[j - 1] is NaN, treat it as greater
                vec[j] = vec[j - 1];
                j -= 1;
            }
            vec[j] = key;
        } else |_| {}
    }
}

Meanwhile, the code for a non-NaN-safe version of this function would look exactly like it does today.

Supplemental Ideas

These related ideas can be accepted/rejected separately from the main proposal:

  1. Size Optimization for ?f32: Define ?f32 to be stored in a typical float, by assigning it a NaN payload with a special value. This is similar to R's "NA" value, except that ?f32 would not support arithmetic or comparison (except with null), meaning that NA/NaN propagation is not an issue. It behaves like any other optional.

  2. @assertFinite/@assertNonNaN built-ins: The UB-introducing assumptions of @setFloatMode(.Optimized) are that inputs/outputs are non-Inf and non-NaN. The other fast-math optimization flags make different performance/accuracy trade-offs, but do not directly introduce poison/undefined into the program. @assertFinite would allow the programmer to make these dangerous assumptions explicit in their code, where it's obvious exactly which operands are affected.

(1) can be particularly important for performance when operating on large, structured data, since it affects how many values fit into a cache line. This is why the technique is common in statistical software, including R and Pandas.
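The cache-line argument can be made concrete with a small sketch in today's Zig. The helper name perCacheLine and the 64-byte line size are assumptions chosen for illustration; actual line sizes vary by CPU.

```zig
const std = @import("std");

/// Hypothetical helper: how many values of type T fit in one cache line,
/// assuming a 64-byte line.
fn perCacheLine(comptime T: type) usize {
    return 64 / @sizeOf(T);
}
```

Today `?f32` stores its null flag alongside the float, so `perCacheLine(?f32)` comes out smaller than `perCacheLine(f32)`, which is 16. Storing null in a reserved NaN payload, as idea (1) suggests, would make the two equal.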

Edit: Updated 4/11 to use error{NaN}!f32 instead of ?f32 + add supplemental ideas
