Skip to content

Commit

Permalink
Numeric literal semantics (#144)
Browse files Browse the repository at this point in the history
This proposal provides semantics for numeric literals.

* Numeric literals have a type derived from their value, and can be converted to any type that can represent that value.

* Simple operations such as arithmetic that involve only literals also produce values of literal types.

* Literals implicitly convert to types that can represent them.

* The Carbon prelude provides:
    * An arbitrary-precision integer type `BigInt`.
    * A rational number type `Rational(T:! Type)` with constraints on `T` not yet determined.
    * A family of integer literal types, `IntLiteral(N:! BigInt)`.
    * A family of real literal types, `RealLiteral(N:! Rational(BigInt))`.

Co-authored-by: Chandler Carruth <chandlerc@gmail.com>
Co-authored-by: Jon Meow <46229924+jonmeow@users.noreply.github.com>
  • Loading branch information
3 people authored Sep 24, 2021
1 parent 3b11b8d commit b9619db
Showing 1 changed file with 318 additions and 0 deletions.
318 changes: 318 additions & 0 deletions proposals/p0144.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,318 @@
# Numeric literal semantics

<!--
Part of the Carbon Language project, under the Apache License v2.0 with LLVM
Exceptions. See /LICENSE for license information.
SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
-->

[Pull request](https://github.com/carbon-language/carbon-lang/pull/144)

## Table of contents

<!-- toc -->

## Table of contents

- [Problem](#problem)
- [Background](#background)
- [Proposal](#proposal)
- [Details](#details)
- [Prelude support](#prelude-support)
- [Implicit conversions](#implicit-conversions)
- [Examples](#examples)
- [Alternatives considered](#alternatives-considered)
- [Use an ordinary integer or floating-point type for literals](#use-an-ordinary-integer-or-floating-point-type-for-literals)
- [Use same type for all literals](#use-same-type-for-all-literals)
- [Allow leading `-` in literal tokens](#allow-leading---in-literal-tokens)

<!-- tocstop -->

## Problem

When a numeric literal appears in a program, we need to understand its
semantics:

- What type does it have?
- What value is produced by operations on it?
- When can it validly be used to initialize an object?

## Background

In C++, numeric literals have either an integral type or a floating-point type.
C++ provides permission for implementations to add extended integral types, but
in practice (for bad reasons relating to `intmax_t`) implementations do not do
so, so there are a small finite set of types that any given numeric literal
might have:

- `int`, `long`, `long long`, or `unsigned` versions of these
- `float`, `double`, or `long double`

The choice of type is determined solely by the literal.

The C++ approach is error-prone and problematic:

- Lossy conversions from literals in initializers are permitted.
- Lossy operations on literals are permitted; for example, on a typical
implementation, `1 << 60` has value `0` because `1` is a 32-bit type.
- Attempting to naturally express some values has undefined behavior; for
example, `int x = -2147483648;` typically results in undefined behavior even
when -2147483648 is a valid `int` value.
- Integer literals with value 0 have special semantics that are lost when the
integer is passed to a function: "perfect" forwarding doesn't work for such
literals.
- The built-in types are privileged: only the types listed above have
literals. There is no syntax for a 64-bit integer literal, only for (for
example) a `long int` literal, which may or may not 64 bits wide.
- The type of a literal can be unpredictable in portable code, as it can
depend on which type a particular value happens to fit into.

## Proposal

Numeric literals have a type derived from their value, and can be converted to
any type that can represent that value.

Simple operations such as arithmetic that involve only literals also produce
values of literal types.

## Details

Numeric literals have a type derived from their value. Two integer literals have
the same type if and only if they represent the same integer. Two real number
literals have the same type if and only if they represent the same real number.

That is:

- For every integer, there is a type representing literals with that integer
value.
- For every rational number, there is a type representing literals with that
real value.
- The types for real numbers are distinct from the types for integers, even
for real numbers that represent integers. `var x: i32 = 1.0;` is invalid.

Primitive operators are available between numeric literals, and produce values
with numeric literal types. For example, the type of `1 + 2` is the same as the
type of `3`.

Numeric types can provide conversions to support initialization from numeric
literals. Because the value of the literal is carried in the type, a type-level
decision can be made as to whether the conversion is valid.

The integer types defined in the standard library permit conversion from integer
literal types whose values are representable in the integer type. The
floating-point types defined in the Carbon library permit conversion from
integer and rational literal types whose values are between the minimum and
maximum finite value representable in the floating-point type.

### Prelude support

The following types are defined in the Carbon prelude:

- An arbitrary-precision integer type.

```
class BigInt;
```

- A rational type, parameterized by a type used for its numerator and
denominator.

```
class Rational(T:! Type);
```

The exact constraints on `T` are not yet decided.

- A type representing integer literals.

```
class IntLiteral(N:! BigInt);
```

- A type representing floating-point literals.

```
class FloatLiteral(X:! Rational(BigInt));
```

All of these types are usable during compilation. `BigInt` supports the same
operations as `Int(n)`. `Rational(T)` supports the same operations as
`Float(n)`.

The types `IntLiteral(n)` and `FloatLiteral(x)` also support primitive integer
and floating-point operations such as arithmetic and comparison, but these
operations are typically heterogeneous: for example, an addition between
`IntLiteral(n)` and `IntLiteral(m)` produces a value of type
`IntLiteral(n + m)`.

### Implicit conversions

`IntLiteral(n)` converts to any sufficiently large integer type, as if by:

```
impl [template N:! BigInt, template M:! BigInt]
IntLiteral(N) as ImplicitAs(Int(M))
if N >= Int(M).MinValue as BigInt and N <= Int(M).MaxValue as BigInt {
...
}
impl [template N:! BigInt, template M:! BigInt]
IntLiteral(N) as ImplicitAs(Unsigned(M))
if N >= Int(M).MinValue as BigInt and N <= Int(M).MaxValue as BigInt {
...
}
```

The above is for exposition purposes only; various parts of this syntax are not
yet decided.

Similarly, `IntLiteral(x)` and `FloatLiteral(x)` convert to any sufficiently
large floating-point type, and produce the nearest representable floating-point
value. Conversions in which `x` lies exactly half-way between two values are
rejected, as
[previously decided](/docs/design/lexical_conventions/numeric_literals.md#ties).
Conversions in which `x` is outside the range of finite values of the
floating-point type are also represented, rather than saturating to the finite
range or producing an infinity.

### Examples

```carbon
// This is OK: the initializer is of the integer literal type with value
// -2147483648 despite being written as a unary `-` applied to a literal.
var x: i32 = -2147483648;
// This initializes y to 2^60.
var y: i64 = 1 << 60;
// This forms a rational literal whose value is one third, and converts it to
// the nearest representable value of type `f64`.
var z: f64 = 1.0 / 3.0;
// This is an error: 300 cannot be represented in type `i8`.
var c: i8 = 300;
fn f[template T:! Type](v: T) {
var x: i32 = v * 2;
}
// OK: x = 2_000_000_000.
f(1_000_000_000);
// Error: 4_000_000_000 can't be represented in type `i32`.
f(2_000_000_000);
// No storage required for the bound when it's of integer literal type.
struct Span(template T:! Type, template BoundT:! Type) {
var begin: T*;
var bound: BoundT;
}
// Returns 1, because 1.3 can implicitly convert to f32, even though conversion
// to f64 might be a more exact match.
fn G() -> i32 {
match (1.3) {
case _: f32 => { return 1; }
case _: f64 => { return 2; }
}
}
// Can only be called with a literal 0.
fn PassMeZero(_: IntLiteral(0));
// Can only be called with integer literals in the given range.
fn ConvertToByte[template N:! BigInt](_: IntLiteral(N)) -> i8
if N >= -128 and N <= 127 {
return N as i8;
}
// Given any int literal, produces a literal whose value is one higher.
fn OneHigher(L: IntLiteral(template _:! BigInt)) -> auto {
return L + 1;
}
// Error: 256 can't be represented in type `i8`.
var v: i8 = OneHigher(255);
```

## Alternatives considered

### Use an ordinary integer or floating-point type for literals

We could decide on a fixed-width type based on the form of the literal, for
example using a type suffix with some rules to determine what type to pick for
unsuffixed literals.

Advantages:

- This follows what C++ does.
- Can determine the type of a floating-point number without requiring
contextual information.

Disadvantages:

- Surprising behavior when applying an operator to a literal would result in
overflow. Even if we diagnose this, a diagnostic that `-2147483648` is
invalid because it overflows is surprising.
- Creates additional literal syntax that users will need to understand.
- May select types that don't match the programmer's expectations.
- Whatever types we pick are privileged.

### Use same type for all literals

We could give literals a single, arbitrary-precision type (say, `Integer` for
integer literals and `Rational` for real literals).

Advantages:

- Only introduces two new types, not an unbounded parameterized family of
types.
- Writing a function that takes any integer literal can be done with more
obvious syntax and less syntactic overhead. Instead of:
```
fn OneHigher(L: IntLiteral(template _:! BigInt));
```
we could write
```
fn OneHigher(template L:! Integer);
```
However, with this proposal, a function taking any integer expression that
can be evaluated to a constant can be written as
```
fn F(template N:! BigInt);
```
and such a function would accept all integer literals, as well as
non-literal constants.

Disadvantages:

- Our mechanism for specifying the behavior of operations such as arithmetic
is based on interface implementations, which are looked up by type.
Supporting `impl` selection based on values would introduce substantial
complexity.
- If we introduce an arbitrary-precision integer type, it would be
inconsistent to support it only during compilation. However, if we allow its
use at runtime, programs may use it accidentally, with an invisible
performance cost. For example, `var x: auto = 123;` would result in `x`
having an infinite-precision type, possibly involving invisible dynamic
allocation.
- Under this proposal, the type of `x` is a type that can only represent
the value `123`; as such, `x` is effectively immutable. The
arbitrary-precision integer type introduced in this proposal can only be
used explicitly by programs naming it.

### Allow leading `-` in literal tokens

We could treat a leading `-` character as part of a numeric literal token, so
that -- for example -- `-123` would be a single `-123` token rather than a unary
negation applied to a literal `123`.

Advantages:

- This would narrowly solve the problem that `INT_MIN` cannot be written
directly, without any of the other implications of this proposal.

Disadvantages:

- Makes the behavior of unary `-` less uniform.
- Prevents the introduction of infix or postfix operators that bind more
tightly than unary `-`, such as an infix exponentiation operator: `-2**2`
may be expected to evaluate to -4, not to +4.

0 comments on commit b9619db

Please sign in to comment.