Numeric literal semantics (#144)

This proposal provides semantics for numeric literals. * Numeric literals have a type derived from their value, and can be converted to any type that can represent that value. * Simple operations such as arithmetic that involve only literals also produce values of literal types. * Literals implicitly convert to types that can represent them. * The Carbon prelude provides: * An arbitrary-precision integer type `BigInt`. * A rational number type `Rational(T:! Type)` with constraints on `T` not yet determined. * A family of integer literal types, `IntLiteral(N:! BigInt)`. * A family of real literal types, `RealLiteral(N:! Rational(BigInt))`. Co-authored-by: Chandler Carruth <chandlerc@gmail.com> Co-authored-by: Jon Meow <46229924+jonmeow@users.noreply.github.com>
carbon-language · Sep 24, 2021 · b9619db · b9619db
1 parent 3b11b8d
commit b9619db
Showing 1 changed file with 318 additions and 0 deletions.
diff --git a/proposals/p0144.md b/proposals/p0144.md
@@ -0,0 +1,318 @@
+# Numeric literal semantics
+
+<!--
+Part of the Carbon Language project, under the Apache License v2.0 with LLVM
+Exceptions. See /LICENSE for license information.
+SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+-->
+
+[Pull request](https://github.com/carbon-language/carbon-lang/pull/144)
+
+## Table of contents
+
+<!-- toc -->
+
+## Table of contents
+
+-   [Problem](#problem)
+-   [Background](#background)
+-   [Proposal](#proposal)
+-   [Details](#details)
+    -   [Prelude support](#prelude-support)
+    -   [Implicit conversions](#implicit-conversions)
+    -   [Examples](#examples)
+-   [Alternatives considered](#alternatives-considered)
+    -   [Use an ordinary integer or floating-point type for literals](#use-an-ordinary-integer-or-floating-point-type-for-literals)
+    -   [Use same type for all literals](#use-same-type-for-all-literals)
+    -   [Allow leading `-` in literal tokens](#allow-leading---in-literal-tokens)
+
+<!-- tocstop -->
+
+## Problem
+
+When a numeric literal appears in a program, we need to understand its
+semantics:
+
+-   What type does it have?
+-   What value is produced by operations on it?
+-   When can it validly be used to initialize an object?
+
+## Background
+
+In C++, numeric literals have either an integral type or a floating-point type.
+C++ provides permission for implementations to add extended integral types, but
+in practice (for bad reasons relating to `intmax_t`) implementations do not do
+so, so there are a small finite set of types that any given numeric literal
+might have:
+
+-   `int`, `long`, `long long`, or `unsigned` versions of these
+-   `float`, `double`, or `long double`
+
+The choice of type is determined solely by the literal.
+
+The C++ approach is error-prone and problematic:
+
+-   Lossy conversions from literals in initializers are permitted.
+-   Lossy operations on literals are permitted; for example, on a typical
+    implementation, `1 << 60` has value `0` because `1` is a 32-bit type.
+-   Attempting to naturally express some values has undefined behavior; for
+    example, `int x = -2147483648;` typically results in undefined behavior even
+    when -2147483648 is a valid `int` value.
+-   Integer literals with value 0 have special semantics that are lost when the
+    integer is passed to a function: "perfect" forwarding doesn't work for such
+    literals.
+-   The built-in types are privileged: only the types listed above have
+    literals. There is no syntax for a 64-bit integer literal, only for (for
+    example) a `long int` literal, which may or may not 64 bits wide.
+-   The type of a literal can be unpredictable in portable code, as it can
+    depend on which type a particular value happens to fit into.
+
+## Proposal
+
+Numeric literals have a type derived from their value, and can be converted to
+any type that can represent that value.
+
+Simple operations such as arithmetic that involve only literals also produce
+values of literal types.
+
+## Details
+
+Numeric literals have a type derived from their value. Two integer literals have
+the same type if and only if they represent the same integer. Two real number
+literals have the same type if and only if they represent the same real number.
+
+That is:
+
+-   For every integer, there is a type representing literals with that integer
+    value.
+-   For every rational number, there is a type representing literals with that
+    real value.
+-   The types for real numbers are distinct from the types for integers, even
+    for real numbers that represent integers. `var x: i32 = 1.0;` is invalid.
+
+Primitive operators are available between numeric literals, and produce values
+with numeric literal types. For example, the type of `1 + 2` is the same as the
+type of `3`.
+
+Numeric types can provide conversions to support initialization from numeric
+literals. Because the value of the literal is carried in the type, a type-level
+decision can be made as to whether the conversion is valid.
+
+The integer types defined in the standard library permit conversion from integer
+literal types whose values are representable in the integer type. The
+floating-point types defined in the Carbon library permit conversion from
+integer and rational literal types whose values are between the minimum and
+maximum finite value representable in the floating-point type.
+
+### Prelude support
+
+The following types are defined in the Carbon prelude:
+
+-   An arbitrary-precision integer type.
+
+    ```
+    class BigInt;
+    ```
+
+-   A rational type, parameterized by a type used for its numerator and
+    denominator.
+
+    ```
+    class Rational(T:! Type);
+    ```
+
+    The exact constraints on `T` are not yet decided.
+
+-   A type representing integer literals.
+
+    ```
+    class IntLiteral(N:! BigInt);
+    ```
+
+-   A type representing floating-point literals.
+
+    ```
+    class FloatLiteral(X:! Rational(BigInt));
+    ```
+
+All of these types are usable during compilation. `BigInt` supports the same
+operations as `Int(n)`. `Rational(T)` supports the same operations as
+`Float(n)`.
+
+The types `IntLiteral(n)` and `FloatLiteral(x)` also support primitive integer
+and floating-point operations such as arithmetic and comparison, but these
+operations are typically heterogeneous: for example, an addition between
+`IntLiteral(n)` and `IntLiteral(m)` produces a value of type
+`IntLiteral(n + m)`.
+
+### Implicit conversions
+
+`IntLiteral(n)` converts to any sufficiently large integer type, as if by:
+
+```
+impl [template N:! BigInt, template M:! BigInt]
+    IntLiteral(N) as ImplicitAs(Int(M))
+    if N >= Int(M).MinValue as BigInt and N <= Int(M).MaxValue as BigInt {
+  ...
+}
+impl [template N:! BigInt, template M:! BigInt]
+    IntLiteral(N) as ImplicitAs(Unsigned(M))
+    if N >= Int(M).MinValue as BigInt and N <= Int(M).MaxValue as BigInt {
+  ...
+}
+```
+
+The above is for exposition purposes only; various parts of this syntax are not
+yet decided.
+
+Similarly, `IntLiteral(x)` and `FloatLiteral(x)` convert to any sufficiently
+large floating-point type, and produce the nearest representable floating-point
+value. Conversions in which `x` lies exactly half-way between two values are
+rejected, as
+[previously decided](/docs/design/lexical_conventions/numeric_literals.md#ties).
+Conversions in which `x` is outside the range of finite values of the
+floating-point type are also represented, rather than saturating to the finite
+range or producing an infinity.
+
+### Examples
+
+```carbon
+// This is OK: the initializer is of the integer literal type with value
+// -2147483648 despite being written as a unary `-` applied to a literal.
+var x: i32 = -2147483648;
+
+// This initializes y to 2^60.
+var y: i64 = 1 << 60;
+
+// This forms a rational literal whose value is one third, and converts it to
+// the nearest representable value of type `f64`.
+var z: f64 = 1.0 / 3.0;
+
+// This is an error: 300 cannot be represented in type `i8`.
+var c: i8 = 300;
+
+fn f[template T:! Type](v: T) {
+  var x: i32 = v * 2;
+}
+
+// OK: x = 2_000_000_000.
+f(1_000_000_000);
+
+// Error: 4_000_000_000 can't be represented in type `i32`.
+f(2_000_000_000);
+
+// No storage required for the bound when it's of integer literal type.
+struct Span(template T:! Type, template BoundT:! Type) {
+  var begin: T*;
+  var bound: BoundT;
+}
+
+// Returns 1, because 1.3 can implicitly convert to f32, even though conversion
+// to f64 might be a more exact match.
+fn G() -> i32 {
+  match (1.3) {
+    case _: f32 => { return 1; }
+    case _: f64 => { return 2; }
+  }
+}
+
+// Can only be called with a literal 0.
+fn PassMeZero(_: IntLiteral(0));
+
+// Can only be called with integer literals in the given range.
+fn ConvertToByte[template N:! BigInt](_: IntLiteral(N)) -> i8
+    if N >= -128 and N <= 127 {
+  return N as i8;
+}
+
+// Given any int literal, produces a literal whose value is one higher.
+fn OneHigher(L: IntLiteral(template _:! BigInt)) -> auto {
+  return L + 1;
+}
+// Error: 256 can't be represented in type `i8`.
+var v: i8 = OneHigher(255);
+```
+
+## Alternatives considered
+
+### Use an ordinary integer or floating-point type for literals
+
+We could decide on a fixed-width type based on the form of the literal, for
+example using a type suffix with some rules to determine what type to pick for
+unsuffixed literals.
+
+Advantages:
+
+-   This follows what C++ does.
+-   Can determine the type of a floating-point number without requiring
+    contextual information.
+
+Disadvantages:
+
+-   Surprising behavior when applying an operator to a literal would result in
+    overflow. Even if we diagnose this, a diagnostic that `-2147483648` is
+    invalid because it overflows is surprising.
+-   Creates additional literal syntax that users will need to understand.
+-   May select types that don't match the programmer's expectations.
+-   Whatever types we pick are privileged.
+
+### Use same type for all literals
+
+We could give literals a single, arbitrary-precision type (say, `Integer` for
+integer literals and `Rational` for real literals).
+
+Advantages:
+
+-   Only introduces two new types, not an unbounded parameterized family of
+    types.
+-   Writing a function that takes any integer literal can be done with more
+    obvious syntax and less syntactic overhead. Instead of:
+    ```
+    fn OneHigher(L: IntLiteral(template _:! BigInt));
+    ```
+    we could write
+    ```
+    fn OneHigher(template L:! Integer);
+    ```
+    However, with this proposal, a function taking any integer expression that
+    can be evaluated to a constant can be written as
+    ```
+    fn F(template N:! BigInt);
+    ```
+    and such a function would accept all integer literals, as well as
+    non-literal constants.
+
+Disadvantages:
+
+-   Our mechanism for specifying the behavior of operations such as arithmetic
+    is based on interface implementations, which are looked up by type.
+    Supporting `impl` selection based on values would introduce substantial
+    complexity.
+-   If we introduce an arbitrary-precision integer type, it would be
+    inconsistent to support it only during compilation. However, if we allow its
+    use at runtime, programs may use it accidentally, with an invisible
+    performance cost. For example, `var x: auto = 123;` would result in `x`
+    having an infinite-precision type, possibly involving invisible dynamic
+    allocation.
+    -   Under this proposal, the type of `x` is a type that can only represent
+        the value `123`; as such, `x` is effectively immutable. The
+        arbitrary-precision integer type introduced in this proposal can only be
+        used explicitly by programs naming it.
+
+### Allow leading `-` in literal tokens
+
+We could treat a leading `-` character as part of a numeric literal token, so
+that -- for example -- `-123` would be a single `-123` token rather than a unary
+negation applied to a literal `123`.
+
+Advantages:
+
+-   This would narrowly solve the problem that `INT_MIN` cannot be written
+    directly, without any of the other implications of this proposal.
+
+Disadvantages:
+
+-   Makes the behavior of unary `-` less uniform.
+-   Prevents the introduction of infix or postfix operators that bind more
+    tightly than unary `-`, such as an infix exponentiation operator: `-2**2`
+    may be expected to evaluate to -4, not to +4.