diff --git a/docs/design/expressions/README.md b/docs/design/expressions/README.md index 2eb454590df9b..9654c3e1a5c4b 100644 --- a/docs/design/expressions/README.md +++ b/docs/design/expressions/README.md @@ -11,7 +11,7 @@ SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception ## Table of contents - [Overview](#overview) -- [Implicit conversions](#implicit-conversions) +- [Conversions and casts](#conversions-and-casts) @@ -29,12 +29,15 @@ fn Foo(a: i32*) -> i32 { Here, the parameter type `i32*`, the return type `i32`, and the operand `*a` of the `return` statement are all expressions. -## Implicit conversions +## Conversions and casts When an expression appears in a context in which an expression of a specific type is expected, [implicit conversions](implicit_conversions.md) are applied to convert the expression to the target type. +Expressions can also be converted to a specific type using an +[`as` expression](as_expressions.md). + ``` fn Bar(n: i32); fn Baz(n: i64) { diff --git a/docs/design/expressions/as_expressions.md b/docs/design/expressions/as_expressions.md new file mode 100644 index 0000000000000..c775ed9e9077a --- /dev/null +++ b/docs/design/expressions/as_expressions.md @@ -0,0 +1,178 @@ +# `as` expressions + + + + + +## Table of contents + +- [Overview](#overview) +- [Precedence and associativity](#precedence-and-associativity) +- [Built-in types](#built-in-types) + - [Data types](#data-types) + - [Compatible types](#compatible-types) +- [Extensibility](#extensibility) +- [Alternatives considered](#alternatives-considered) +- [References](#references) + + + +## Overview + +An expression of one type can be explicitly cast to another type by using an +`as` expression: + +``` +var n: i32 = Get(); +var f: f32 = n as f32; +``` + +An `as` expression can be used to perform any implicit conversion, either when +the context does not imply a destination type or when it is valuable to a reader +of the code to make the conversion explicit. In addition, `as` expressions can +perform safe conversions that nonetheless should not be performed implicitly, +such as lossy conversions or conversions that lose capabilities or change the +way a type would be interpreted. + +As guidelines, an `as` conversion should be permitted when: + +- The conversion is _complete_: it produces a well-defined output value for + each input value. +- The conversion is _unsurprising_: the resulting value is the expected value + in the destination type. + +For example: + +- A conversion from `fM` to `iN` is not complete, because it is not defined + for input values that are out of the range of the destination type, such as + infinities or, if `N` is too small, large finite values. +- A conversion from `iM` to `iN`, where `N` < `M`, is either not complete or + not unsurprising, because there is more than one possible expected behavior + for an input value that is not within the destination type, and those + behaviors are not substantially the same -- we could perform two's + complement wrapping, saturate, or produce undefined behavior analogous to + arithmetic overflow. +- A conversion from `iM` to `fN` can be unsurprising, because even though + there may be a choice of which way to round, the possible values are + substantially the same. + +It is possible for user-defined types to [extend](#extensibility) the set of +valid explicit casts that can be performed by `as`. Such extensions are expected +to follow these guidelines. + +## Precedence and associativity + +`as` expressions are non-associative. + +``` +var b: bool = true; +// OK +var n: i32 = (b as i1) as i32; +var m: auto = b as (bool as Hashable); +// Error, ambiguous +var m: auto = b as T as U; +``` + +The `as` operator has lower precedence than operators that visually bind +tightly: + +- prefix symbolic operators + - dereference (`*a`) + - negation (`-a`) + - complement (`~a`) +- postfix symbolic operators + - pointer type formation (`T*`), + - function call (`a(...)`), + - array indexing (`a[...]`), and + - member access (`a.m`). + +The `as` operator has higher precedence than assignment and comparison. It is +unordered with respect to binary arithmetic, bitwise operators, and unary `not`. + +``` +// OK +var x: i32* as Comparable; +// OK, `x as (U*)` not `(x as U)*`. +var y: auto = x as U*; + +var a: i32; +var b: i32; +// OK, `(a as i64) < ((*x) as i64)`. +if (a as i64 < *x as i64) {} +// Ambiguous: `(a + b) as i64` or `a + (b as i64)`? +var c: i32 = a + b as i64; +// Ambiguous: `(a as i64) + b` or `a as (i64 + b)`? +var d: i32 = a as i64 + b; + +// OK, `(-a) as f64`, not `-(a as f64)`. +// Unfortunately, the former is undefined if `a` is `i32.MinValue()`; +// the latter is not. +var u: f64 = -a as f64; + +// OK, `i32 as (GetType())`, not `(i32 as GetType)()`. +var e: i32 as GetType(); +``` + +## Built-in types + +### Data types + +In addition to the [implicit conversions](implicit_conversions.md#data-types), +the following numeric conversions are supported by `as`: + +- `iN`, `uN`, or `fN` -> `fM`, for any `N` and `M`. Values that cannot be + exactly represented are suitably rounded to one of the two nearest + representable values. Very large finite values may be rounded to an + infinity. NaN values are converted to NaN values. + +- `bool` -> `iN` or `uN`. `false` converts to `0` and `true` converts to `1` + (or to `-1` for `i1`). + +Conversions from numeric types to `bool` are not supported with `as`; instead of +using `as bool`, such conversions can be performed with `!= 0`. + +Lossy conversions between `iN` or `uN` and `iM` or `uM` are not supported with +`as`, and similarly conversions from `fN` to `iM` are not supported. + +**Future work:** Add mechanisms to perform these conversions. + +### Compatible types + +The following conversion is supported by `as`: + +- `T` -> `U` if `T` is + [compatible](../generics/terminology.md#compatible-types) with `U`. + +**Future work:** We may need a mechanism to restrict which conversions between +adapters are permitted and which code can perform them. Some of the conversions +permitted by this rule may only be allowed in certain contexts. + +## Extensibility + +Explicit casts can be defined for user-defined types such as +[classes](../classes.md) by implementing the `As` interface: + +``` +interface As(Dest:! Type) { + fn Convert[me: Self]() -> Dest; +} +``` + +The expression `x as U` is rewritten to `x.(As(U).Convert)()`. + +## Alternatives considered + +- [Do not distinguish between safe and unsafe casts](/docs/proposals/p0845.md#merge-as-and-assume_as) +- [Do not distinguish between `as` and implicit conversions](/docs/proposals/p0845.md#as-only-performs-implicit-conversions) +- [Allow `iN as bool`](/docs/proposals/p0845.md#integer-to-bool-conversions) +- [Disallow `bool as iN`](/docs/proposals/p0845.md#bool-to-integer-conversions) + +## References + +- [Implicit conversions in C++](https://en.cppreference.com/w/cpp/language/implicit_conversion) +- Proposal + [#845: `as` expressions](https://github.com/carbon-language/carbon-lang/pull/845). diff --git a/docs/design/expressions/implicit_conversions.md b/docs/design/expressions/implicit_conversions.md index e591252e0815d..460829e973cac 100644 --- a/docs/design/expressions/implicit_conversions.md +++ b/docs/design/expressions/implicit_conversions.md @@ -21,7 +21,7 @@ SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception - [Pointer conversions](#pointer-conversions) - [Pointer conversion examples](#pointer-conversion-examples) - [Type-of-types](#type-of-types) -- [Semantics](#semantics) +- [Consistency with `as`](#consistency-with-as) - [Extensibility](#extensibility) - [Alternatives considered](#alternatives-considered) - [References](#references) @@ -245,37 +245,33 @@ can be implicitly converted to the type-of-type `TT2` if `T` [satisfies the requirements](../generics/details.md#subtyping-between-type-of-types) of `TT2`. -## Semantics +## Consistency with `as` An implicit conversion of an expression `E` of type `T` to type `U`, when -permitted, always has the same meaning as the explicit cast expression `E as U`. -Moreover, such an implicit conversion is expected to exactly preserve the value. -For example, `(E as U) as T`, if valid, should be expected to result in the same -value as produced by `E`. - -**Note:** The explicit cast expression syntax has not yet been decided. The use -of `E as T` in this document is provisional. +permitted, always has the same meaning as the +[explicit cast expression `E as U`](as_expressions.md). Moreover, because such +an implicit conversion is expected to exactly preserve the value, +`(E as U) as T`, if valid, should be expected to result in the same value as +produced by `E` even if the `as T` cast cannot be performed as an implicit +conversion. ## Extensibility Implicit conversions can be defined for user-defined types such as -[classes](../classes.md) by implementing the `ImplicitAs` interface: +[classes](../classes.md) by implementing the `ImplicitAs` interface, which +extends +[the `As` interface used to implement `as` expressions](as_expressions.md#extensibility): ``` -interface As(Dest:! Type) { - fn Convert[me: Self]() -> Dest; +interface ImplicitAs(Dest:! Type) extends As(Dest) { + // Inherited from As(Dest): + // fn Convert[me: Self]() -> Dest; } -interface ImplicitAs(Dest:! Type) extends As(Dest) {} ``` When attempting to implicitly convert an expression `x` to type `U`, the expression is rewritten to `x.(ImplicitAs(U).Convert)()`. -**Note:** The `As` interface is intended to be used as the implementation -vehicle for explicit casts: `x as U` would be rewritten as -`x.(As(U).Convert)()`. However, the explicit cast expression syntax has not yet -been decided, so this rewrite is provisional. - Note that implicit conversions are not transitive. Even if an `impl A as ImplicitAs(B)` and an `impl B as ImplicitAs(C)` are both provided, an expression of type `A` cannot be implicitly converted to type `C`. Allowing diff --git a/docs/design/generics/details.md b/docs/design/generics/details.md index ccd829f9a4f15..d08dffa27c0b9 100644 --- a/docs/design/generics/details.md +++ b/docs/design/generics/details.md @@ -1572,7 +1572,7 @@ addition to using the same data representation, they both implement one interface, `Hashable`, and use the same implementation for that interface. The one difference between them is that `Song as Hashable` may be implicitly converted to `Song`, which implements interface `Printable`, and -`PlayableSong as Hashable` may be implicilty converted to `PlayableSong`, which +`PlayableSong as Hashable` may be implicitly converted to `PlayableSong`, which implements interface `Media`. This means that it is safe to convert between `HashMap(Song, Int) == HashMap(Song as Hashable, Int)` and `HashMap(PlayableSong, Int) == HashMap(PlayableSong as Hashable, Int)` (though diff --git a/docs/design/lexical_conventions/words.md b/docs/design/lexical_conventions/words.md index 52e5823f2ff37..98e7546022933 100644 --- a/docs/design/lexical_conventions/words.md +++ b/docs/design/lexical_conventions/words.md @@ -37,6 +37,7 @@ The following words are interpreted as keywords: - `alias` - `and` - `api` +- `as` - `auto` - `base` - `break` diff --git a/proposals/p0845.md b/proposals/p0845.md new file mode 100644 index 0000000000000..dce8cdbab9dc9 --- /dev/null +++ b/proposals/p0845.md @@ -0,0 +1,401 @@ +# `as` expressions + + + +[Pull request](https://github.com/carbon-language/carbon-lang/pull/845) + + + +## Table of contents + +- [Problem](#problem) +- [Background](#background) +- [Proposal](#proposal) +- [Rationale based on Carbon's goals](#rationale-based-on-carbons-goals) +- [Future work](#future-work) + - [Provide a mechanism for unsafe conversions](#provide-a-mechanism-for-unsafe-conversions) + - [Casting operator for conversions with domain restrictions](#casting-operator-for-conversions-with-domain-restrictions) +- [Alternatives considered](#alternatives-considered) + - [Allow `as` to perform some unsafe conversions](#allow-as-to-perform-some-unsafe-conversions) + - [Allow `as` to perform two's complement truncation](#allow-as-to-perform-twos-complement-truncation) + - [`as` only performs implicit conversions](#as-only-performs-implicit-conversions) + - [Integer to bool conversions](#integer-to-bool-conversions) + - [Bool to integer conversions](#bool-to-integer-conversions) + + + +## Problem + +We would like to provide a notation for the following operations: + +- Requesting a type conversion in order to select an operation to perform, or + to resolve an ambiguity between possible operations: + ``` + fn Ratio(a: i32, b: i32) -> f64 { + // Note that a / b would invoke a different / operation. + return a / (b as f64); + } + ``` +- Specifying the type that an expression will have or will be converted into, + for documentation purposes. + ``` + class Thing { + var id: i32; + } + fn PrintThing(t: Thing) { + // 'as i32' reminds the reader what type we're printing. + Print(t.id as i32); + } + ``` +- Specifying the type that an expression is expected to have, potentially + after implicit conversions, as a form of static assertion. + ``` + fn Munge() { + // I expect this expression to produce a Widget but I'm getting compiler + // errors and I'd like to narrow down why. + F(Some().Complex().Expression() as Widget); + } + ``` + +In general, the developer wants to specify that an expression should be +considered to produce a value of a particular type, and that type might be more +general than the type of the expression, the same as the type of the expression, +or perhaps might represent a different way of viewing the value. + +The first of the above problems is especially important in Carbon due to the use +of facet types for generics. Explicit conversions of types to interfaces will be +necessary in order to select the meaning of operations, because the same member +name on different facet types for the same underlying type will in general have +different meanings. + +For this proposal, the following are out of scope: + +- Requesting a type conversion that changes the value, such as by truncation. +- Converting a value to a narrower type or determining whether such a + conversion is possible -- `try_as` or `as?` operations. + +## Background + +C++ provides a collection of different kinds of casts and conversions from an +expression `x` to a type `T`: + +- Copy-initialization: `T v = x;` +- Direct-initialization: `T v(x);` +- Named casts: + - `static_cast(x)` + - `const_cast(x)` + - `reinterpret_cast(x)` + - `dynamic_cast(x)` +- C-style casts: `T(x)` or equivalently `(T)x` + - These can do anything that `static_cast`, `const_cast`, and + `reinterpret_cast` can do, but ignore access control on base classes. +- List-initialization: `T{x}` + - This can do anything that implicit conversion can do, and can also + initialize a single -- real or notional -- subobject of `T`. + - Narrowing conversions are disallowed. + +These conversions are all different, and each of them has some surprising or +unsafe behavior. + +Swift provides four forms of type casting operation: + +- `x as T` performs a conversion from subtype to supertype. + - `pattern as T` in a pattern matching context converts a pattern that + matches a subtype to a pattern that matches a supertype, by performing a + runtime type test. This effectively results in a checked supertype to + subtype conversion. +- `x as! T` performs a conversion from supertype to subtype, with the + assumption that the value inhabits the subtype. +- `x as? T` performs a conversion from supertype to subtype, producing an + `Optional`. +- `T(x)` and similar construction expressions are used to convert between + types without a subtyping relationship, such as between integer and + floating-point types. + +In Swift, `x as T` is always unsurprising and safe. + +Rust provides the following: + +- `x as T` performs a conversion to type `T`. + - When there is no corresponding value, some specified value is produced: + this conversion will perform 2's complement truncation on integers and + will saturate when converting large floating-point values to integers. + - Conversions between distinct pointer types, and between pointers and + integers, are permitted. Rust treats accesses through pointers as + unsafe, but not pointer arithmetic or casting. + +This cast can perform some conversions with surprising results, such as integer +truncation. It can also have surprising performance implications, because it +defines the behavior of converting an out-of-range value -- for example, when +converting from floating-point to integer -- in ways that aren't supported +across all modern targets. + +Haskell and Scala support type ascription notation, `x : T`. This has also been +proposed for Rust. This notation constrains the type checker to find a type for +the expression `x` that is consistent with `T`, and is used: + +- for documentation purposes, +- to guide the type checker to select a particular meaning of the code in the + presence of ambiguity, and +- as a diagnostic tool when attempting to understand type inference failures. + +## Proposal + +Carbon provides a binary `as` operator. + +`x as T` performs an unsurprising and safe conversion from `x` to type `T`. + +- This can be used to perform any implicit conversion explicitly. As in Swift, + this can therefore be used to convert from subtype to supertype. +- This can also be used to perform an unsurprising and safe conversion that + cannot be performed implicitly because it's lossy, such as from `i32` to + `f32`. + +This operator does not perform conversions with domain restrictions, such as +converting from `f32` to `i64`, where sufficiently large values can't be +converted. It does not perform operations in which there are multiple different +reasonable interpretations, such as converting from `i64` to `i32`, where a +two's complement truncation might sometimes be reasonable but where the intent +is more likely that it is an error to convert a value that does not fit into an +`i32`. + +See changes to the design for details. + +## Rationale based on Carbon's goals + +- [Code that is easy to read, understand, and write](/docs/project/goals.md#code-that-is-easy-to-read-understand-and-write) + - Providing only unsurprising built-in `as` conversions, and encouraging + user-defined types to do the same, makes code easier to understand. +- [Practical safety and testing mechanisms](/docs/project/goals.md#practical-safety-and-testing-mechanisms) + - Syntactically distinguishing between always-safe `as` conversions and + potentially-unsafe conversions being performed by other syntax makes it + clearer which code should be the subject of more scrutiny when reasoning + about safety. +- [Interoperability with and migration from existing C++ code](/docs/project/goals.md#interoperability-with-and-migration-from-existing-c-code) + - The `As` interface provides the same functionality as single-argument + `explicit` constructors and `explicit` conversion functions in C++, and + can be used to expose those operations for interoperability purposes and + as a replacement for those operations during migration. + +## Future work + +### Provide a mechanism for unsafe conversions + +We need to provide additional conversions beyond those proposed for `as`. In +particular, to supply the same set of conversions as C++, we would need at least +the following conversions that don't match the rules for `as`: + +Conversions with a domain restriction: + +- Conversions from pointer-to-supertype to pointer-to-subtype. +- Conversions from floating-point to integer types that assume the input is + in-range. +- (Not in C++.) Conversions between any two integer types that assume the + input is in-range. + +Conversions that modify some values: + +- Truncating conversions between any two integer types. + +Conversions that reinterpret values: + +- Conversions between arbitrary pointer types. +- Conversions between integers and pointers. +- Bit-casts between arbitrary, sufficiently-trivial types of the same size. + +Special cases: + +- Some analogue of `dynamic_cast`. +- Some analogue of `const_cast`. + +We will need to decide which of these we wish to provide -- in particular, +depending on our plans for mutability and RTTI, `const_cast` and `dynamic_cast` +may or may not be appropriate. + +For the operations we do supply, we could provide either named functions or +dedicated language syntax. While this proposal suggests that the `as` operator +should not be the appropriate language syntax for the above cases, that decision +should be revisited once we have more information from examining the +alternatives. + +#### Casting operator for conversions with domain restrictions + +We could provide an additional casting operator, such as `assume_as` or +`unsafe_as`, to model conversions that have a domain restriction, such as +`i64 -> i32` or `f32 -> i64` or `Base*` -> `Derived*`. + +Advantage: + +- Provides additional important but unsafe functionality. +- Gives this functionality the appearance of being a central language feature. +- Separates safe conversions from unsafe ones. + +Disadvantage: + +- Increases complexity. +- The connection between these conversions may not be obvious, and the kind + and amount of unsafety in practice differs substantially between them. + +If we don't follow this direction, we will need to provide these operations by +another mechanism, such as named function calls. + +## Alternatives considered + +### Allow `as` to perform some unsafe conversions + +We could provide a single type-casting operator that can perform some +conversions that have a domain restriction, treating values out of range as +programming errors. + +One particularly appealing option would be to permit `as` to convert freely +between integer and floating-point types, but not permit it to convert from +supertype to subtype. + +Advantage: + +- Developers many not want to be reminded about the possibility of overflow in + conversions to integer types. +- This would make `as` more consistent with arithmetic operations, which will + likely have no overt indication that they're unsafe in the presence of + integer overflow. +- If we don't do this, then code mixing differently-sized types will need to + use a syntactic notation other than `as`, even if all conversions remain + in-bounds. If such code is common, as it is in C++ (for example, when mixing + `int` and `size_t`), developers may become accustomed to using that "assume + in range" notation and not consider it to be a warning sign, thereby eroding + the advantage of using a distinct notation. + +Disadvantage: + +- If we allow this conversion, there would be no clear foundation for which + conversions can be performed by `as` and which cannot in general. +- An `as` expression would be less suitable for selecting which operation to + perform if it can be unsafe. +- Under maintenance, every usage of `as` would need additional scrutiny + because it's not in general a safe operation. +- This risks being surprising to developers coming from C and C++ where + integer type conversions are always safe. + +The choice to not provide these operations with `as` is experimental, and should +be revisited when we have more information about the design of integer types and +their behavior. + +### Allow `as` to perform two's complement truncation + +We could allow `as` to convert between any two integer types, performing a two's +complement conversion between these types. + +Advantage: + +- Familiar to developers from C++ and various other systems programming + languages. + +Disadvantage: + +- Makes `as` conversions have behavior that diverges from the behavior of + arithmetic, where we expect at least signed overflow to be considered a + programming error rather than being guaranteed to wrap around. +- Introducing a common and easy notation for conversion with wraparound means + that this notation will also be used in the -- likely much more common -- + case of wanting to truncate a value that is already known to be in-bounds. + Compared to having distinct notation for these two operations: + - This removes the ability to distinguish between programming errors due + to overflow and intentional wraparound by using the same syntax for + both, both for readers of the code and for automated checks in debugging + builds. + - This removes the ability to optimize on the basis of knowing that a + value is expected to be in-bounds when performing a narrowing + conversion. + +The choice to not provide these operations with `as` is experimental, and should +be revisited when we have more information about the design of integer types and +their behavior. + +### `as` only performs implicit conversions + +We could limit `as` to performing only implicit conversions. This would mean +that `as` cannot perform lossy conversions. + +Advantage: + +- One fewer set of rules for developers to be aware of. + +Disadvantage: + +- Converting between integer and floating-point types is common, and providing + built-in syntax for it seems valuable. + +### Integer to bool conversions + +We could allow a conversion of integer types (and perhaps even floating-point +types) to `bool`, converting non-zero values to `true` and converting zeroes to +`false`. + +Advantage: + +- This treatment of non-zero values as being "truthy" and zero values as being + "falsy" is familiar to developers of various other languages. +- Uniform treatment of types that can be notionally converted to a Boolean + value may be useful in templates and generics in some cases. + +Disadvantage: + +- The lossy treatment of all non-zero values as being "truthy" is somewhat + arbitrary and can be confusing. +- An `as bool` conversion is less clear to a reader than a `!= 0` test. +- An `as bool` conversion is more verbose than a `!= 0` test. + +### Bool to integer conversions + +We could disallow conversions from `bool` to `iN` types. + +Advantage: + +- More clearly demarcates the intended semantics of `bool` as a truth value + rather than as a number. +- Avoids making a choice as to whether `true` should map to 1 (zero-extension) + or -1 (sign-extension). + - But there is a strong established convention of using 1. +- Such conversions are a known source of bugs, especially when performed + implicitly. `as` conversions will likely be fairly common and routine in + Carbon code due to their use in generics. As such, they may be written + without much thought and not given much scrutiny in code review. + ``` + var found: bool = false; + var total_found: i32 = 0; + for (var (key: i32, value: i32) in list) { + if (key == expected) { + found = true; + total_found += value; + } + } + // Include an explicit `as i64` to emphasize that we're widening the + // total at this point. + // Bug: meant to pass `total_found` not `found` here. + add_to_total(found as i64); + ``` + +Disadvantage: + +- Removes a sometimes-useful operation for which there isn't a similarly terse + alternative expression form. + - But we could add a member function `b.AsBit()` if we wanted. +- Does not expose the intended connection between the `bool` type and bits. + +We could disallow conversion from `bool` to `i1`. + +Advantage: + +- Avoids a surprising behavior where this conversion converts `true` to -1 + whereas all others convert `true` to 1. + +Disadvantage: + +- Results in non-uniform treatment of conversion from `bool`, and an awkward + special case that may get in the way of generics. +- A conversion from `bool` that produces -1 for a `true` value is useful when + producing a mask, for example in `(b as i1) as u32`.