Proposal: Custom hardware numeric formats #7157

Closed
ghost opened this issue Nov 18, 2020 · 4 comments


ghost commented Nov 18, 2020

There is a wide variety of numeric formats supported by different hardware, and new ones may come into use in the future (I personally have my eye on posits). Our current strategy is to hardcode every format we intend to support. That means that whenever an unsupported format rises to prominence, or someone tinkering with an FPGA wants to compile Zig for their custom hardware, either we make breaking changes to the compiler or someone forks it and fragments efforts. There are two solutions: either we try to anticipate every hardware format that numbers can possibly have (impossible, inelegant), or we provide a way of implementing formats in userspace. I propose the latter.

To be clear: I am NOT proposing operator overloading. In fact this proposal takes great care to make that impossible. The key insight is that we don't need functions -- we merely need to expose the capabilities of the existing hardware. This means that when defining a format, we need only provide three things:

  • A set of registers, divided into saved, temporary, and argument
  • A method to convert comptime_int or comptime_float or both to a bit pattern
  • A mapping from arithmetic operators to assembly instructions

For convenience in representation, it might also be helpful to define b{n}, an n-width bit vector. Such a type would support slicing by comptime-known bit indices (with assignment), concatenation, repetition, equality checking, coercion from signed or unsigned integers (but not vice versa), and bitwise logical operations -- but no arithmetic. Also, optionally, a more convenient coercion syntax: (T)val, in which the parentheses are required. Also optionally, types anyint and anyfloat to describe generic values that may only be specialised to integers or reals, and for which @TypeOf returns the specialisation type if the value is defined. None of this is necessary for this proposal -- if these things would create more problems than they solve, then to hell with them.
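
For illustration, b{n} usage might look like this (all of it proposed syntax -- none of it compiles today):

// Proposed syntax only, for illustration
const x = (b8)0b1010_0110;  // coercion from an unsigned integer
const hi = x[4..8];         // slicing by comptime-known bit indices yields a b4
const y = hi ++ (b1)0 ** 4; // concatenation and repetition
const z = x & y;            // bitwise logic is allowed; `x + y` would be a compile error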

Here is one possible syntax, probably not the best:

// Implementing a posit with n = 8, es = 1
// https://posithub.org/docs/BeatingFloatingPoint.pdf

const p8_1 = numtype(b8) { // Hardcoded assumption of fixed bit width -- do not like, ideas welcome
    // `p8_1` registers (hypothetical, of course)
    // (represented here as strings -- in actual use this would most likely be done in codegen,
    // so we'd have access to the `Register` type, so don't take this at face value;
    // we might even make `.reg` a tagged union that can take strings, registers, or vectors)
    // (while we're at it, let's support indexing into vector registers
    // -- this is necessary on systems that use the same file for FP and vector ops, like ARM)
    .reg = switch (builtin.arch) {
        .enb_posit => .{
            .tmp = .{ "p8.b", "p9.b", "p10.b" }, // ...
            .sav = .{ "p16.b", "p17.b" }, // ...
            .arg = .{ "p0.b", "p1.b" }, // ...
        },
        else => @compileError("p8_1 is not supported on this target"),
    },

    // Convert from `comptime_float`
    .@"comptime_float" = fn (i: comptime_float) b8 {
        // This is expensive, but that's ok because it only runs at comptime

        // We might as well make these explicit, for easy rewriting
        // -- we cannot go whole hog and parameterise as we depend on hardware for specific formats
        const n = 8;
        const es = 1;
        const useed = 1 << (1 << es); // useed = 2^(2^es), i.e. 4 here

        // Check for special cases and impossible values
        if (i == 0) return 0;
        if (i != i) return 1 << (n - 1); // NaN never compares equal to itself
        if (i == +inf or i == -inf) @compileError("p8_1 does not support infinities");

        // Split into an absolute value and a sign -- negation is trivial with posits
        const sign = i < 0;
        const abs = if (sign) -i else i;

        // Split the absolute value into a traditional exponent and mantissa -- this makes things easier
        const exp = (comptime_int)@floor(@log2(abs));
        const frac = if (exp >= 0) abs / (1 << exp) else abs * (1 << -exp); // abs / 2^exp

        // Now we calculate the fields, and insert them -- keep an index to track available space
        var acc: b8 = undefined;
        var idx = n;

        // First, the sign -- since in this algorithm we invert at the end, this is always 0
        acc[n - 1..n] = 0;
        idx -= 1;

        // Next, the regime
        {
            const k = @divFloor(exp, 1 << es); // exp = k * 2^es + e
            const m: anyint = blk: if (k < 0) {
                if (k < 2 - n) @compileError("p8_1 literal out of range (too small)");
                break :blk (b1)0 ** -k ++ (b1)1;
            } else {
                if (k > n - 2) @compileError("p8_1 literal out of range (too big)");
                if (k == n - 2) break :blk (std.meta.BitVector(n - 1))(-1);
                break :blk (b1)1 ** (k + 1) ++ (b1)0;
            };

            acc[idx - @bitSizeOf(@TypeOf(m)) .. idx] = m;
            idx -= @bitSizeOf(@TypeOf(m));
        }

        // Then, the exponent
        {
            const e: std.meta.Int(es) = @mod(exp, 1 << es);
            // There might not be space for the exponent -- we must check for that
            const m: anyint = if (idx < es) blk: {
                if (e % (1 << (es - idx)) != 0) @compileError("p8_1 literal cannot be represented (exponent truncated)");
                break :blk (std.meta.BitVector(idx))(e >> (es - idx));
            } else (std.meta.BitVector(es))e;

            acc[idx - @bitSizeOf(@TypeOf(m)) .. idx] = m;
            idx -= @bitSizeOf(@TypeOf(m));
        }

        // Finally, the fraction
        {
            const f: anyint = blk: {
                var j = frac - 1;
                var l = 0;
                break :blk while (j % 1 != 0) { j *= 2; l += 1; } else (std.meta.Int(l))j;
            };
            // As with the exponent, we must check for truncation
            const m = if (@bitSizeOf(@TypeOf(f)) > idx) blk: {
                if (f % (1 << idx) != 0) @compileError("p8_1 literal cannot be represented (fraction truncated)");
                break :blk (std.meta.BitVector(idx))(f >> (@bitSizeOf(@TypeOf(f)) - idx));
            } else (std.meta.BitVector(idx))(f << (idx - @bitSizeOf(@TypeOf(f))));

            acc[0..idx] = m;
            idx = 0;
        }

        // Lastly, we take the two's complement of a negative value
        return if (sign) -%(u8)acc else acc;
    },

    // Operators (must be naked inline functions
    // -- we may choose to require them to be pure assembly, if we don't mind new syntax)

    // `+` must take two arguments of this type, and return a value of the same
    .@"+" = inline fn callconv(.naked) (_a: @This(), _b: @This()) @This() {
        // Assembly syntax taken from #5241
        return asm {
            "?reg" a = _a, // `?reg` is taken from `.reg`
            "?reg" b = _b,
            "?reg" d: @This(),
            "?status",
        } ( "padd ?(d), ?(a), ?(b)" ) : d;
    },

    // `<` must take two arguments of this type, and return a `bool`
    .@"<" = inline fn callconv(.naked) (_a: @This(), _b: @This()) bool {
        return asm {
            "?reg" a = _a,
            "?reg" b = _b,
            "?reg" c: u8,
            "?reg" d: u8,
            "lt" lt: bool,
            "?status",
        } (
            // Routines may also define software implementations
            \\ pmov ?(c), ?(a)
            \\ pmov ?(d), ?(b)
            \\ comp ?(c), ?(d)
        ) : lt;
    },

    // All the operators must be defined -- we end here for brevity
};
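
To sanity-check the conversion routine, here is a hand trace (my own worked example) for the value 3.25:

// i = 3.25: sign = false, abs = 3.25
// exp = floor(log2(3.25)) = 1, frac = 3.25 / 2^1 = 1.625
// sign bit:                             "0"                  (idx: 8 -> 7)
// regime:   k = floor(1 / 2^es) = 0  -> "10"                 (idx: 7 -> 5)
// exponent: e = 1 mod 2^es = 1       -> "1"                  (idx: 5 -> 4)
// fraction: frac - 1 = 0.625 = 0b0.101 -> "101" -> "1010"    (idx: 4 -> 0)
// result: 0 | 10 | 1 | 1010 = 0b0101_1010 = 0x5A
// check: useed^0 * 2^1 * (1 + 0b1010 / 16) = 2 * 1.625 = 3.25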

By requiring a register map and a bit width, and more or less forcing inline assembly for operations, we ensure that the defined type will be supported in hardware. No operator overloading here.

For smooth interop with SIMD, numtypes may not be subdivided -- they represent scalars only. Strictly speaking, there is nothing to stop an author wrapping a matrix or other tensor type this way; however, such a type will not interoperate with Zig's SIMD. Defining .reg as a tagged union, as mentioned above, may ease integration -- further thought is needed on this subject.

With this proposal, we future-proof the language against new numeric formats, without breaking it wide open. Zig remains tight and transparent, while becoming more flexible and adaptable.

@Vexu Vexu added the proposal This issue suggests modifications. If it also has the "accepted" label then it is planned. label Nov 18, 2020
@Vexu Vexu added this to the 0.8.0 milestone Nov 18, 2020

s5bug commented Nov 18, 2020

Perhaps this goes hand-in-hand with my idea to use a Computable Number implementation for comptime_float instead of an f128. This would make the algorithm above for coercing to a constant-size float/posit value potentially much cheaper, at the cost of comptime float math being a little bit more expensive.
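
Roughly, I mean "precision in, approximation out". A minimal sketch of the shape (the function name and the 2^p scaling convention here are invented for illustration):

const std = @import("std");

// A computable number yields floor(x * 2^p) for any requested precision p,
// instead of being stored as a single fixed f128 approximation.
fn sqrt2Scaled(p: u4) u64 {
    // Binary search for floor(sqrt(2 * 4^p)), i.e. floor(sqrt(2) * 2^p)
    const target = @as(u64, 2) << (2 * @as(u6, p));
    var lo: u64 = 0;
    var hi: u64 = @as(u64, 2) << p;
    while (lo < hi) {
        const mid = (lo + hi + 1) / 2;
        if (mid * mid <= target) {
            lo = mid;
        } else {
            hi = mid - 1;
        }
    }
    return lo;
}

test "precision on demand" {
    try std.testing.expect(sqrt2Scaled(0) == 1); // floor(sqrt(2))
    try std.testing.expect(sqrt2Scaled(7) == 181); // floor(sqrt(2) * 128)
}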

Thoughts?


mrakh commented Nov 18, 2020

Pretty much everything out there has used either IEEE-754 or fixed-point math for the past three or so decades. The introduction of posits, while promising, is mostly just an academic exercise at this point. It seems more pragmatic to me to hardcode any new formats as they pop up than to build a framework around something that rarely (if ever) changes. I suppose a custom float type with a user-defined number of exponent and mantissa bits could be useful for future revisions of the IEEE-754 standard?
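
For what it's worth, the container half of that is already expressible in userspace today; it's only the arithmetic that needs compiler or software support. A sketch, with invented names:

const std = @import("std");

// comptime can already generate the *container* for a custom float;
// arithmetic on it is the part that must be hardcoded or done in software.
fn CustomFloat(comptime exponent_bits: u16, comptime fraction_bits: u16) type {
    return packed struct {
        // packed struct fields are laid out LSB-first, so the sign lands
        // in the most significant bit
        fraction: std.meta.Int(.unsigned, fraction_bits),
        exponent: std.meta.Int(.unsigned, exponent_bits),
        sign: u1,
    };
}

test "a bfloat16-shaped container" {
    const BF16 = CustomFloat(8, 7);
    try std.testing.expect(@bitSizeOf(BF16) == 16);
}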


ghost commented Nov 19, 2020

The proper way to support custom numeric types is operator overloading. There are very good use cases for that in the here-and-now (vector math, complex numbers, rationals). On the other hand, if you consider operator overloading harmful, then you have to live with method chaining for your posits, just as you do for vectors and rationals. And in the extremely unlikely event that posits or LNS or whatever else goes mainstream, there should be no problem in retrofitting support for it into the language, since Zig has no plans of being feature-frozen after 1.0 (#350, #6466). In the meantime, let's not complicate the language and compiler with speculative features.
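
For comparison, the method-chaining status quo, sketched with a bare-bones rational type (rationals because they already work today; a userspace posit would read the same way):

const std = @import("std");

// A minimal userspace numeric type: no operators, just methods
const Rational = struct {
    num: i64,
    den: i64,

    fn add(a: Rational, b: Rational) Rational {
        return .{ .num = a.num * b.den + b.num * a.den, .den = a.den * b.den };
    }
    fn mul(a: Rational, b: Rational) Rational {
        return .{ .num = a.num * b.num, .den = a.den * b.den };
    }
};

test "method chaining instead of operators" {
    // (1/2 + 1/3) * 3 == 5/2, written as chained calls
    const r = (Rational{ .num = 1, .den = 2 })
        .add(.{ .num = 1, .den = 3 })
        .mul(.{ .num = 3, .den = 1 });
    try std.testing.expect(r.num * 2 == r.den * 5);
}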


ghost commented Jul 9, 2021

Ewww.

@ghost ghost closed this as completed Jul 9, 2021