diff --git a/docs/design/README.md b/docs/design/README.md new file mode 100644 index 0000000000000..20ee3ca6c925b --- /dev/null +++ b/docs/design/README.md @@ -0,0 +1,896 @@ +# Language design overview + + + +## Table of contents + + + +- [Context and disclaimer](#context-and-disclaimer) + - [Example code](#example-code) +- [Basic syntax](#basic-syntax) + - [Code and comments](#code-and-comments) + - [Files, libraries, and packages](#files-libraries-and-packages) + - [Names and scopes](#names-and-scopes) + - [Naming conventions](#naming-conventions) + - [Aliases](#aliases) + - [Name lookup](#name-lookup) + - [Name lookup for common types](#name-lookup-for-common-types) + - [Expressions](#expressions) + - [Functions](#functions) + - [Blocks and statements](#blocks-and-statements) + - [Variables](#variables) + - [Lifetime and move semantics](#lifetime-and-move-semantics) + - [Control flow](#control-flow) + - [`if`/`else`](#ifelse) + - [`while`, `break`, and `continue`](#while-break-and-continue) + - [`return`](#return) +- [Types](#types) + - [Primitive types](#primitive-types) + - [Composite types](#composite-types) + - [Tuples](#tuples) + - [Variants](#variants) + - [Pointers and references](#pointers-and-references) + - [Arrays and slices](#arrays-and-slices) + - [User-defined types](#user-defined-types) + - [Structs](#structs) + - [Allocation, construction, and destruction](#allocation-construction-and-destruction) + - [Assignment, copying, and moving](#assignment-copying-and-moving) + - [Comparison](#comparison) + - [Implicit and explicit conversion](#implicit-and-explicit-conversion) + - [Inline type composition](#inline-type-composition) + - [Unions](#unions) +- [Pattern matching](#pattern-matching) + - [`match` control flow](#match-control-flow) + - [Pattern matching in local variables](#pattern-matching-in-local-variables) + - [Pattern matching as function overload resolution](#pattern-matching-as-function-overload-resolution) +- [Type 
abstractions](#type-abstractions) + - [Interfaces](#interfaces) + - [Generics](#generics) + - [Templates](#templates) + - [Types with template parameters](#types-with-template-parameters) + - [Functions with template parameters](#functions-with-template-parameters) + - [Overloading](#overloading) +- [Metaprogramming](#metaprogramming) +- [Execution abstractions](#execution-abstractions) + - [Abstract machine and execution model](#abstract-machine-and-execution-model) + - [Lambdas](#lambdas) + - [Co-routines](#co-routines) +- [Bidirectional interoperability with C/C++](#bidirectional-interoperability-with-cc) + + + +## Context and disclaimer + +Eventually, this document hopes to provide a high-level overview of the design +of the Carbon language. It should summarize the key points across the different +aspects of the language design and link to more detailed and comprehensive +design documents to expand on specific aspects of the design. That means it +isn't and doesn't intend to be complete or stand on its own. Notably, it doesn't +attempt to provide detailed and comprehensive justification for design +decisions. Those should instead be provided by the dedicated and focused designs +linked to from here. However, it should provide an overarching view of the +design and a good basis for diving into specific details. + +However, these are extremely early days for Carbon. Currently, this document +tries to capture two things: + +1. Initial musings about what _might_ make sense as a basis for Carbon. These + are largely informed by idle discussions between C++ and Clang developers + over the years, and should not be given any particular weight. +2. A summary and snapshot of in-progress efforts to flesh out and motivate + specific designs for parts of the language. 
+ +The utility of capturing these at this early stage is primarily to give everyone +a reasonably consistent set of terminology and context as we begin fleshing out +concrete (and well justified) designs for each part of the language. In some +cases, it captures ideas that may be interesting to explore, but isn't meant to +overly anchor on them. Any ideas here need to be fully explored and justified +with a detailed analysis. The context of #1 (directly evolving C++, experience +building Clang, and experience working on C++ codebases including Clang and LLVM +themselves) is also important. It is both an important signal and a bias. + +### Example code + +In order to keep example code consistent, we are making choices that may change +later. In particular, where `$` is shown in examples, it is a placeholder: `$` +is a well-known bad symbol due to international keyboard layouts, and will be +cleaned up during evolution. + +## Basic syntax + +### Code and comments + +> References: [Lexical conventions](lexical_conventions.md) +> +> **TODO:** References need to be evolved. + +- All source code is UTF-8 encoded text. For simplicity, no other encoding is + supported. +- Line comments look like `// ...`. However, they are required to be the only + non-whitespace on the line for readability. +- Block comments look like `//\{ ... //\}`, with each marker on its own line. + Nested block comments are supported using named regions. For example: + + ```carbon + live code + //\{ + commented code + //\{ nested block + commented code in nested block + //\} nested block + //\} + live code + ``` + +### Files, libraries, and packages + +> References: [Files, libraries and packages](files_libraries_and_packages.md) +> +> **TODO:** References need to be evolved. + +Carbon code is organized into files, libraries, and packages: + +- A **file** is the unit of compilation. +- A **library** can be made up of multiple files, and is the unit whose public + interface can be imported. 
+- A **package** is a collection of one or more libraries, typically ones with + a single common source and with some close association. + +A file belongs to precisely one library, and a library belongs to precisely one +package. + +Files have a `.6c` extension. They must start with a declaration of their +package and library. They may import other libraries from within their +package as well as libraries from other packages. For example: + +```carbon +// This is a file in the "Eucalyptus" library of the "Koala" package. +package Koala library Eucalyptus; + +// Import the "Wombat" library from the "Widget" package. +import Widget library Wombat; + +// Import the "Container" library from the "Koala" package. +import library Container; +``` + +### Names and scopes + +> References: [Lexical conventions](lexical_conventions.md) +> +> **TODO:** References need to be evolved. + +Various constructs introduce a named entity in Carbon. These can be functions, +types, variables, or other kinds of entities that we'll cover. A name in Carbon +is always formed out of an "identifier", or a sequence of letters, numbers, and +underscores which starts with a letter. As a regular expression, this would be +`/[a-zA-Z][a-zA-Z0-9_]*/`. Eventually we may add support for more Unicode +characters as well. + +#### Naming conventions + +> References: [Naming conventions](naming_conventions.md) +> +> **TODO:** References need to be evolved. + +Our current proposed naming conventions are: + +- `UpperCamelCase` for names of compile-time resolved constants, whether they + participate in the type system or not. +- `lower_snake_case` for keywords and names of run-time resolved values. + +As a matter of style and consistency, we will follow these conventions where +possible and encourage convergence. + +For example: + +- An integer that is a compile-time constant sufficient to use in the + construction of a compile-time array size, such as a template function + parameter, might be named `N`. 
+- A generic function parameter's value can't be used during type-checking, but + might still be named `N`, since it will be a constant available to the + compiler at code generation time. +- Functions and most types will be in `UpperCamelCase`. +- A type where only run-time type information queries are available would end + up as `lower_snake_case`. +- A keyword like `import` uses `lower_snake_case`. + +#### Aliases + +> References: [Aliases](aliases.md) +> +> **TODO:** References need to be evolved. + +Carbon provides a facility to declare a new name as an alias for a value. This +is a fully general facility because everything is a value in Carbon, including +types. + +For example: + +```carbon +alias MyInt = Int; +``` + +This creates an alias called `MyInt` for whatever `Int` resolves to. Code +textually after this can refer to `MyInt`, and it will transparently refer to +`Int`. + +#### Name lookup + +> References: [Name lookup](name_lookup.md) +> +> **TODO:** References need to be evolved. + +Names are always introduced into some scope which defines where they can be +referenced. Many of these scopes are themselves named. `namespace` is used to +introduce a dedicated named scope, and we traverse nested names in a uniform way +with `.`-separated names. Unqualified name lookup will always find a file-local +result, including aliases. + +For example: + +```carbon +package Koala library Eucalyptus; + +namespace Leaf { + namespace Vein { + fn Count() -> Int; + } +} +``` + +`Count` may be referred to as: + +- `Count` from within the `Vein` namespace. +- `Vein.Count` from within the `Leaf` namespace. +- `Leaf.Vein.Count` from within this file. +- `Koala.Leaf.Vein.Count` from any arbitrary location. + +Note that libraries do **not** introduce a scope; they share the scope of their +package. + +##### Name lookup for common types + +> References: [Name lookup](name_lookup.md) +> +> **TODO:** References need to be evolved. 
+ +Common types that we expect to be used universally will be provided for every +file, including `Int` and `Bool`. These will likely be defined in a special +"prelude" package. + +### Expressions + +> References: [Lexical conventions](lexical_conventions.md) and +> [operators](operators.md) +> +> **TODO:** References need to be evolved. + +Expressions describe some computed value. The simplest example would be a +literal number like `42`: an expression that computes the integer value 42. + +Some common expressions in Carbon include: + +- Literals: `42`, `3.1419`, `"Hello World!"` +- Operators: + + - Increment and decrement: `++i`, `--j` + - These do not return any result. + - Unary negation: `-x` + - Arithmetic: `1 + 2`, `3 - 4`, `2 * 5`, `6 / 3` + - Bitwise: `2 & 3`, `2 | 4`, `3 ^ 1`, `~7` + - Bit shift: `1 << 3`, `8 >> 1` + - Comparison: `2 == 2`, `3 != 4`, `5 < 6`, `7 > 6`, `8 <= 8`, `8 >= 8` + - Logical: `a and b`, `c or d` + +- Parenthesized expressions: `(7 + 8) * (3 - 1)` + +### Functions + +> References: [Functions](functions.md) and +> [syntactic conventions](syntactic_conventions.md) +> +> **TODO:** References need to be evolved. + +Functions are the core unit of behavior. For example: + +```carbon +fn Sum(Int: a, Int: b) -> Int; +``` + +Breaking this apart: + +- `fn` is the keyword used to indicate a function. +- Its name is `Sum`. +- It accepts two `Int` parameters, `a` and `b`. +- It returns an `Int` result. + +You would call this function like `Sum(1, 2)`. + +### Blocks and statements + +> References: [Blocks and statements](blocks_and_statements.md) +> +> **TODO:** References need to be evolved. + +The body or definition of a function is provided by a block of code containing +statements. The body of a function is also a new, nested scope inside the +function's scope, meaning that parameter names are available. + +Statements within a block are terminated by a semicolon. Each statement can, +among other things, be an expression. 
+ +For example, here is a function definition using a block of statements, one of +which is nested: + +```carbon +fn Foo() { + Bar(); + { + Baz(); + } +} +``` + +### Variables + +> References: [Variables](variables.md) and +> [syntactic conventions](syntactic_conventions.md) +> +> **TODO:** References need to be evolved. + +Blocks introduce nested scopes and can contain local variable declarations that +work similarly to function parameters. + +For example: + +```carbon +fn Foo() { + var Int: x = 42; +} +``` + +Breaking this apart: + +- `var` is the keyword used to indicate a variable. +- Its name is `x`. +- Its type is `Int`. +- It is initialized with the value `42`. + +### Lifetime and move semantics + +> References: TODO +> +> **TODO:** References need to be evolved. + +### Control flow + +> References: [Control flow](control_flow.md) +> +> **TODO:** References need to be evolved. + +Blocks of statements are generally executed sequentially. However, statements +are the primary place where this flow of execution can be controlled. + +#### `if`/`else` + +> References: [Control flow](control_flow.md) +> +> **TODO:** References need to be evolved. + +`if` and `else` are common flow control keywords, which can result in +conditional execution of statements. + +For example: + +```carbon +fn Foo(Int: x) { + if (x < 42) { + Bar(); + } else if (x > 77) { + Baz(); + } +} +``` + +Breaking the `Foo` function apart: + +- `Bar()` is invoked if `x` is less than `42`. +- `Baz()` is invoked if `x` is greater than `77`. +- Nothing happens if `x` is between `42` and `77`. + +#### `while`, `break`, and `continue` + +> References: [Control flow](control_flow.md) +> +> **TODO:** References need to be evolved. + +Loops will be supported with a low-level primitive `while` statement. `break` +will be a way to exit the `while` directly, while `continue` will skip the rest +of the current loop iteration. 
+ +For example: + +```carbon +fn Foo() { + var Int: x = 0; + while (x < 42) { + if (ShouldStop()) break; + if (ShouldSkip(x)) { + ++x; + continue; + } + Bar(x); + ++x; + } +} +``` + +Breaking the `Foo` function apart: + +- The while body is normally executed for all values of `x` in [0, 42). + - The increment of x at the end causes this. +- If `ShouldStop()` returns true, the `break` causes the `while` to exit + early. +- If `ShouldSkip()` returns true, the `continue` causes the `while` to restart + early. +- Otherwise, `Bar(x)` is called for values of `x` in [0, 42). + +#### `return` + +> References: [Control flow](control_flow.md) +> +> **TODO:** References need to be evolved. + +The `return` statement ends the flow of execution within a function, returning +execution to the caller. If the function returns a value to the caller, that +value is provided by an expression in the return statement. This allows us to +complete the definition of our `Sum` function from earlier as: + +```carbon +fn Sum(Int: a, Int: b) -> Int { + return a + b; +} +``` + +## Types + +> References: [Primitive types](primitive_types.md), [tuples](tuples.md), and +> [structs](structs.md) +> +> **TODO:** References need to be evolved. + +Carbon's core types are broken down into three categories: + +- Primitive types +- Composite types +- User-defined types + +The first two are intrinsic and directly built in the language. The last aspect +of types allows for defining new types. + +Expressions compute values in Carbon, and these values are always strongly typed +much like in C++. However, an important difference from C++ is that types are +themselves modeled as values; specifically, compile-time constant values. +However, in simple cases this doesn't make much difference. + +### Primitive types + +> References: [Primitive types](primitive_types.md) +> +> **TODO:** References need to be evolved. 
+ +These types are fundamental to the language as they are neither formed from nor +modify other types. They also have semantics that are defined from first +principles rather than in terms of other operations. These will be made +available through the [prelude package](#name-lookup-for-common-types). + +Primitive types fall into the following categories: + +- `Void` - a type with only one possible value: empty. +- `Bool` - a boolean type with two possible values: `True` and `False`. +- `Int` and `UInt` - signed and unsigned 64-bit integer types. + - Standard sizes are available, both signed and unsigned, including + `Int8`, `Int16`, `Int32`, `Int128`, and `Int256`. + - Overflow in either direction is an error. +- `Float64` - a floating point type with semantics based on IEEE-754. + - Standard sizes are available, including `Float16`, `Float32`, and + `Float128`. + - [`BFloat16`](primitive_types.md#bfloat16) is also provided. +- `String` - a byte sequence treated as containing UTF-8 encoded text. + - `StringView` - a read-only reference to a byte sequence treated as + containing UTF-8 encoded text. + +### Composite types + +#### Tuples + +> References: [Tuples](tuples.md) +> +> **TODO:** References need to be evolved. + +The primary composite type involves simple aggregation of other types as a +tuple. In formal type theory, tuples are product types. + +An example use of tuples is: + +```carbon +fn DoubleBoth(Int: x, Int: y) -> (Int, Int) { + return (2 * x, 2 * y); +} +``` + +Breaking this example apart: + +- The return type is a tuple of two `Int` types. +- The expression uses tuple syntax to build a tuple of two `Int` values. + +Both of these are expressions using the tuple syntax +`(<expression>, <expression>)`. The only difference is the type of the tuple +expression: one is a tuple of types, the other a tuple of values. 
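For readers coming from C++, the two roles of the tuple syntax can be compared with `std::tuple`. The sketch below is only a rough C++17 analogue and not Carbon code. It assumes `Int` can be modeled as `std::int64_t`, which follows the 64-bit description of `Int` above. Note that C++ needs a separate spelling, `std::tuple<...>`, for the tuple type, whereas the Carbon example uses the same parenthesized syntax for both the type and the value.

```cpp
#include <cstdint>
#include <tuple>

// Rough C++17 analogue of the Carbon `DoubleBoth` example.
// Assumption: Carbon's `Int` is modeled here as std::int64_t.
std::tuple<std::int64_t, std::int64_t> DoubleBoth(std::int64_t x,
                                                  std::int64_t y) {
  // The return type above names the tuple *type*; make_tuple builds the
  // tuple *value* from the two doubled arguments.
  return std::make_tuple(2 * x, 2 * y);
}
```

With this definition, `DoubleBoth(3, 4)` yields a tuple holding `6` and `8`.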
+ +Element access uses subscript syntax: + +```carbon +fn DoubleTuple((Int, Int): x) -> (Int, Int) { + return (2 * x[0], 2 * x[1]); +} +``` + +Tuples also support multiple indices and slicing to restructure tuple elements: + +```carbon +// This reverses the tuple using multiple indices. +fn Reverse((Int, Int, Int): x) -> (Int, Int, Int) { + return x[2, 1, 0]; +} + +// This slices the tuple by extracting elements [0, 2). +fn RemoveLast((Int, Int, Int): x) -> (Int, Int) { + return x[0 .. 2]; +} +``` + +#### Variants + +> **TODO:** Needs a feature design and a high level summary provided inline. + +#### Pointers and references + +> **TODO:** Needs a feature design and a high level summary provided inline. + +#### Arrays and slices + +> **TODO:** Needs a feature design and a high level summary provided inline. + +### User-defined types + +#### Structs + +> References: [Structs](structs.md) +> +> **TODO:** References need to be evolved. + +`struct`s are a way for users to define their own data structures or named +product types. + +For example: + +```carbon +struct Widget { + var Int: x; + var Int: y; + var Int: z; + + var String: payload; +} +``` + +Breaking apart `Widget`: + +- `Widget` has three `Int` members: `x`, `y`, and `z`. +- `Widget` has one `String` member: `payload`. +- Given an instance `dial`, a member can be referenced with `dial.payload`. + +More advanced `struct`s may be created: + +```carbon +struct AdvancedWidget { + // Do a thing! + fn DoSomething(AdvancedWidget: self, Int: x, Int: y); + + // A nested type. + struct Nestedtype { + // ... + } + + private var Int: x; + private var Int: y; +} + +fn Foo(AdvancedWidget: thing) { + thing.DoSomething(1, 2); +} +``` + +Breaking apart `AdvancedWidget`: + +- `AdvancedWidget` has a public object method `DoSomething`. + - `DoSomething` explicitly indicates how the `AdvancedWidget` is passed to + it, and there is no automatic scoping - `self` must be specified as the + first input. 
The `self` name is also a keyword that explains how to + invoke this method on an object. + - `DoSomething` accepts `AdvancedWidget` _by value_, which is easily + expressed here along with other constraints on the object parameter. +- `AdvancedWidget` has two private data members: `x` and `y`. + - Private methods and data members are restricted to use by + `AdvancedWidget` only, providing a layer of easy validation of the most + basic interface constraints. +- `Nestedtype` is a nested type, and can be accessed as + `AdvancedWidget.Nestedtype`. + +##### Allocation, construction, and destruction + +> **TODO:** Needs a feature design and a high level summary provided inline. + +##### Assignment, copying, and moving + +> **TODO:** Needs a feature design and a high level summary provided inline. + +##### Comparison + +> **TODO:** Needs a feature design and a high level summary provided inline. + +##### Implicit and explicit conversion + +> **TODO:** Needs a feature design and a high level summary provided inline. + +##### Inline type composition + +> **TODO:** Needs a feature design and a high level summary provided inline. + +#### Unions + +> **TODO:** Needs a detailed design and a high level summary provided inline. + +## Pattern matching + +> References: [Pattern matching](pattern_matching.md) +> +> **TODO:** References need to be evolved. + +The most prominent mechanism to manipulate and work with types in Carbon is +pattern matching. This may seem like a deviation from C++, but in fact this is +largely about building a clear, coherent model for a fundamental part of C++: +overload resolution. + +### `match` control flow + +> References: [Pattern matching](pattern_matching.md) +> +> **TODO:** References need to be evolved. + +`match` is a control flow similar to `switch` of C/C++ and mirrors similar +constructs in other languages, such as Swift. + +An example `match` is: + +```carbon +fn Bar() -> (Int, (Float, Float)); + +fn Foo() -> Float { + match (Bar()...) 
{ + case (42, (Float: x, Float: y)) => { + return x - y; + } + case (Int: p, (Float: x, Float: _)) if (p < 13) => { + return p * x; + } + case (Int: p, auto: _) if (p > 3) => { + return p * Pi; + } + default => { + return Pi; + } + } +} +``` + +Breaking apart this `match`: + +- It accepts a value that will be inspected; in this case, the result of the + call to `Bar()`. + - It then will find the _first_ `case` that matches this value, and + execute that block. + - If none match, then it executes the default block. +- Each `case` pattern contains a value pattern, such as `(Int: p, auto: _)`, + followed by an optional boolean predicate introduced by the `if` keyword. + - The value pattern must first match, and then the predicate must also + evaluate to true for the overall `case` pattern to match. + - Using `auto` for a type will always match. + +Value patterns may be composed of the following: + +- An expression, such as `42`, whose value must be equal to match. +- An optional type, such as `Int`, followed by a `:` and an identifier to bind + the value. + - The special identifier `_` may be used to discard the value once + matched. +- A destructuring pattern containing a sequence of value patterns, such as + `(Float: x, Float: y)`, which match against tuples and tuple-like values by + recursively matching on their elements. +- An unwrapping pattern containing a nested value pattern which matches + against a variant or variant-like value by unwrapping it. + +### Pattern matching in local variables + +> References: [Pattern matching](pattern_matching.md) +> +> **TODO:** References need to be evolved. + +Value patterns may be used when declaring local variables to conveniently +destructure them and do other type manipulations. However, the patterns must +match at compile time, so a boolean predicate cannot be used directly. 
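C++17 structured bindings offer a loose point of comparison for this kind of destructuring declaration. The following is a C++ sketch rather than Carbon: the body of `Bar` and the values it returns are hypothetical, and `Int` and `Float` are assumed to map to `std::int64_t` and `double`. Unlike the value patterns described above, C++17 structured bindings cannot nest or discard elements, so the inner pair is bound to a name and simply left unused.

```cpp
#include <cstdint>
#include <tuple>
#include <utility>

// Hypothetical stand-in for a function returning (Int, (Float, Float)).
// The returned values are arbitrary and chosen only for illustration.
std::tuple<std::int64_t, std::pair<double, double>> Bar() {
  return std::make_tuple(std::int64_t{42}, std::make_pair(1.5, 2.5));
}

std::int64_t Foo() {
  // Destructure the result at the point of declaration. `rest` receives the
  // inner pair; C++17 has no discard pattern, so it is marked as unused.
  [[maybe_unused]] auto [p, rest] = Bar();
  return p;
}
```

As with the Carbon form, the shape of the binding is checked at compile time: a binding list with the wrong number of names fails to compile rather than failing at run time.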
+ +An example use is: + +```carbon +fn Bar() -> (Int, (Float, Float)); +fn Foo() -> Int { + var (Int: p, auto: _) = Bar(); + return p; +} +``` + +To break this apart: + +- The `Int` returned by `Bar()` matches and is bound to `p`, then returned. +- The `(Float, Float)` returned by `Bar()` matches and is discarded by + `auto: _`. + +### Pattern matching as function overload resolution + +> References: [Pattern matching](pattern_matching.md) +> +> **TODO:** References need to be evolved. Needs a detailed design and a high +> level summary provided inline. + +## Type abstractions + +### Interfaces + +> **TODO:** Needs a feature design and a high level summary provided inline. + +### Generics + +> **TODO:** Needs a feature design and a high level summary provided inline. + +### Templates + +> References: [Templates](templates.md) +> +> **TODO:** References need to be evolved. + +Carbon templates follow the same fundamental paradigm as C++ templates: they are +instantiated when called, resulting in late type checking, duck typing, and lazy +binding. Although generics are generally preferred, templates enable translation +of code between C++ and Carbon, and address some cases where the type checking +rigor of generics is problematic. + +#### Types with template parameters + +> References: [Templates](templates.md) +> +> **TODO:** References need to be evolved. + +User-defined types may have template parameters. The resulting type-function may +be used to instantiate the parameterized definition with the provided arguments +in order to produce a complete type. For example: + +```carbon +struct Stack(Type:$$ T) { + var Array(T): storage; + + fn Push(T: value); + fn Pop() -> T; +} +``` + +Breaking apart the template use in `Stack`: + +- `Stack` is a parameterized type accepting a type `T`. +- `T` may be used within the definition of `Stack` anywhere a normal type + would be used, and will only be type checked on instantiation. 
+- `var Array(T)` instantiates a parameterized type `Array` when `Stack` is + instantiated. + +#### Functions with template parameters + +> References: [Templates](templates.md) +> +> **TODO:** References need to be evolved. + +Both implicit and explicit function parameters in Carbon can be marked as +_template_ parameters. When called, the arguments to these parameters trigger +instantiation of the function definition, fully type checking and resolving that +definition after substituting in the provided (or computed if implicit) +arguments. The runtime call then passes the remaining arguments to the resulting +complete definition. + +```carbon +fn Convert[Type:$$ T](T: source, Type:$$ U) -> U { + var U: converted = source; + return converted; +} + +fn Foo(Int: i) -> Float { + // Instantiates with the `T` implicit argument set to `Int` and the `U` + // explicit argument set to `Float`, then calls with the runtime value `i`. + return Convert(i, Float); +} +``` + +Here we deduce one type parameter and explicitly pass another. It is not +possible to explicitly pass a deduced type parameter; instead the call site +should cast or convert the argument to control the deduction. In this particular +example, the explicit type is passed after a runtime parameter. While this makes +that type unavailable to the declaration of _that_ runtime parameter, it still +is a _template_ parameter and available to use as a type in the remaining parts +of the function declaration. + +#### Overloading + +> References: [Templates](templates.md) +> +> **TODO:** References need to be evolved. + +An important feature of templates in C++ is the ability to customize how they +end up specialized for specific arguments. Because template parameters (whether +as type parameters or function parameters) are pattern matched, we expect to +leverage pattern matching techniques to provide "better match" definitions that +are selected analogously to specializations in C++ templates. 
When expressed +through pattern matching, this may enable things beyond just template parameter +specialization, but that is an area that we want to explore cautiously. + +> **TODO:** lots more work to flesh this out needs to be done... + +## Metaprogramming + +> References: [Metaprogramming](metaprogramming.md) +> +> **TODO:** References need to be evolved. Needs a detailed design and a high +> level summary provided inline. + +Carbon provides metaprogramming facilities that look similar to regular Carbon +code. These are structured, and do not offer arbitrary inclusion or +preprocessing of source text such as C/C++ does. + +## Execution abstractions + +Carbon provides some higher-order abstractions of program execution, as well as +the critical underpinnings of such abstractions. + +### Abstract machine and execution model + +> **TODO:** Needs a feature design and a high level summary provided inline. + +### Lambdas + +> **TODO:** Needs a feature design and a high level summary provided inline. + +### Co-routines + +> **TODO:** Needs a feature design and a high level summary provided inline. + +## Bidirectional interoperability with C/C++ + +> References: +> [Bidirectional interoperability with C/C++](interoperability/README.md) +> +> **TODO:** References need to be evolved. Needs a detailed design and a high +> level summary provided inline. diff --git a/docs/design/aliases.md b/docs/design/aliases.md new file mode 100644 index 0000000000000..fd638ee081849 --- /dev/null +++ b/docs/design/aliases.md @@ -0,0 +1,49 @@ +# Aliases + + + +## Table of contents + + + +- [TODO](#todo) +- [Overview](#overview) + - [Alternatives](#alternatives) + + + +## TODO + +This is a skeletal design, added to support [the overview](README.md). It should +not be treated as accepted by the core team; rather, it is a placeholder until +we have more time to examine this detail. Please feel welcome to rewrite and +update as appropriate. 
+ +## Overview + +Naming is one of the things that most often requires careful management over +time -- things tend to get renamed and moved around. + +Carbon provides a name aliasing facility to declare a new name as an alias for +a value. This is a fully general facility because everything is a value in +Carbon, including types. + +For example: + +``` +alias MyInt = Int; +``` + +This creates an alias called `MyInt` for whatever `Int` resolves to. Code +textually after this can refer to `MyInt`, and it will transparently refer to +`Int`. + +### Alternatives + +The syntax here is not at all in a good state yet. We've considered a few +alternatives, but they all end up being confusing in some way. We need to figure +out a good and clean syntax that can be used here. diff --git a/docs/design/blocks_and_statements.md b/docs/design/blocks_and_statements.md new file mode 100644 index 0000000000000..29be2319873f1 --- /dev/null +++ b/docs/design/blocks_and_statements.md @@ -0,0 +1,51 @@ +# Blocks and statements + + + +## Table of contents + + + +- [TODO](#todo) +- [Overview](#overview) + + + +## TODO + +This is a skeletal design, added to support [the overview](README.md). It should +not be treated as accepted by the core team; rather, it is a placeholder until +we have more time to examine this detail. Please feel welcome to rewrite and +update as appropriate. + +## Overview + +The body or definition of a function is provided by a block of code containing +statements, much like in C or C++. The body of a function is also a new, nested +scope inside the function's scope (meaning that parameter names are available). +Statements within a block are terminated by a semicolon. Each statement can, +among other things, be an expression. 
Here is a trivial example of a function +definition using a block of statements: + +``` +fn Foo() { + Bar(); + Baz(); +} +``` + +Statements can also themselves be a block of statements, which provide scopes +and nesting: + +``` +fn Foo() { + Bar(); + { + Baz(); + } +} +``` diff --git a/docs/design/control_flow.md b/docs/design/control_flow.md new file mode 100644 index 0000000000000..ffb1da57081e3 --- /dev/null +++ b/docs/design/control_flow.md @@ -0,0 +1,80 @@ +# Control flow + + + +## Table of contents + + + +- [TODO](#todo) +- [Overview](#overview) +- [Open questions](#open-questions) + - [`if` blocks](#if-blocks) + - [`break` and `continue`](#break-and-continue) + + + +## TODO + +This is a skeletal design, added to support [the overview](README.md). It should +not be treated as accepted by the core team; rather, it is a placeholder until +we have more time to examine this detail. Please feel welcome to rewrite and +update as appropriate. + +At least summarize `if` and `else` to cover basics. Especially important to +surface the idea of using basic conditionals as both expressions and statements +to avoid needing conditional operators. + +Looping is an especially interesting topic to explore as there are lots of +challenges posed by the C++ loop structure. Even C++ itself has been seeing +significant interest and pressure to improve its looping facilities. + +## Overview + +Blocks of statements are generally executed linearly. However, statements are +the primary place where this flow of execution can be controlled. Carbon's +control flow constructs are mostly similar to those in C, C++, and other +languages. + +``` +fn Foo(Int: x) { + if (x < 42) { + Bar(); + } else if (x > 77) { + Baz(); + } +} +``` + +Loops will at least be supported with a low-level primitive `while` statement, +with `break` and `continue` statements which work the same as in C++. + +Last but not least, for the basics we need to include the `return` statement. 
+This statement ends the flow of execution within a function, returning it to the +caller. If the function returns a value to the caller, that value is provided by +an expression in the return statement. This allows us to complete the definition +of our `Sum` function from earlier as: + +``` +fn Sum(Int: a, Int: b) -> Int { + return a + b; +} +``` + +## Open questions + +### `if` blocks + +It is an open question whether a block is required or a single statement may be +nested in an `if` statement. Similarly, it is an open question whether `else if` +is a single keyword versus a nested `if` statement, and if it is a single +construct whether it should be spelled `elif` or something else. + +### `break` and `continue` + +If and how to support a "labeled break" or "labeled continue" is still a point +of open discussion. diff --git a/docs/design/files_libraries_and_packages.md b/docs/design/files_libraries_and_packages.md new file mode 100644 index 0000000000000..556704ad80a21 --- /dev/null +++ b/docs/design/files_libraries_and_packages.md @@ -0,0 +1,43 @@ +# Files, libraries, and packages + + + +## Table of contents + + + +- [TODO](#todo) +- [Alternatives](#alternatives) + - [File extensions](#file-extensions) + + + +## TODO + +This is a skeletal design, added to support [the overview](README.md). It should +not be treated as accepted by the core team; rather, it is a placeholder until +we have more time to examine this detail. Please feel welcome to rewrite and +update as appropriate. + +## Alternatives + +### File extensions + +The use of `6c` as a short file extension or top-level CLI (with subcommands +below it similar to `git` or `go`) has some drawbacks. There are several other +possible extensions / commands: + +- `cb`: This collides with several acronyms and may not be especially + memorable as referring to Carbon. 
+- `c6`: This seems a weird incorrect ordering of the atomic number and has a + bad (if _extremely_ obscure) Internet slang association (NSFW, use caution + if searching, as with too much Internet slang). +- `carbon`: This is an obvious and unsurprising choice, but also quite long. + +This seems fairly easy for us to change as we go along, but we should at some +point do a formal proposal to gather other options and let the core team try to +find the set that they feel is close enough to be a bikeshed. diff --git a/docs/design/functions.md b/docs/design/functions.md new file mode 100644 index 0000000000000..2b2c89f3ef532 --- /dev/null +++ b/docs/design/functions.md @@ -0,0 +1,63 @@ +# Functions + + + +## Table of contents + + + +- [TODO](#todo) +- [Basic functions](#basic-functions) + + + +## TODO + +This is a skeletal design, added to support [the overview](README.md). It should +not be treated as accepted by the core team; rather, it is a placeholder until +we have more time to examine this detail. Please feel welcome to rewrite and +update as appropriate. + +## Basic functions + +Programs written in Carbon, much like those written in other languages, are +primarily divided up into "functions" (or "procedures", "subroutines", or +"subprograms"). These are the core unit of behavior for the programming +language. Let's look at a simple example to understand how these work: + +``` +fn Sum(Int: a, Int: b) -> Int; +``` + +This declares a function called `Sum` which accepts two `Int` parameters, the +first called `a` and the second called `b`, and returns an `Int` result. C++ +might declare the same thing: + +``` +std::int64_t Sum(std::int64_t a, std::int64_t b); + +// Or with trailing return type syntax: +auto Sum(std::int64_t a, std::int64_t b) -> std::int64_t; +``` + +Let's look at how some specific parts of this work. The function declaration is +introduced with a keyword `fn` followed by the name of the function `Sum`. 
This +declares that name in the surrounding scope and opens up a new scope for this +function. We declare the first parameter as `Int: a`. The `Int` part is an +expression (here referring to a constant) that computes the type of the +parameter. The `:` marks the end of the type expression and introduces the +identifier for the parameter, `a`. The parameter names are introduced into the +function's scope and can be referenced immediately after they are introduced. +The return type is indicated with `-> Int`, where again `Int` is just an +expression computing the desired type. The return type can be completely omitted +in the case of functions which do not return a value. + +Calling functions involves a new form of expression: `Sum(1, 2)` for example. +The first part, `Sum`, is an expression referring to the name of the function. +The second part, `(1, 2)` is a parenthesized list of arguments to the function. +The juxtaposition of one expression with parentheses forms the core of a call +expression, similar to a postfix operator. diff --git a/docs/design/interoperability/README.md b/docs/design/interoperability/README.md new file mode 100644 index 0000000000000..2377a775d3940 --- /dev/null +++ b/docs/design/interoperability/README.md @@ -0,0 +1,28 @@ +# Bidirectional interoperability with C/C++ + + + +## Table of contents + + + +- [TODO](#todo) +- [Overview](#overview) + + + +## TODO + +This is a skeletal design, added to support [the overview](README.md). It should +not be treated as accepted by the core team; rather, it is a placeholder until +we have more time to examine this detail. Please feel welcome to rewrite and +update as appropriate. + +## Overview + +See the draft in +[PR 80](https://github.com/carbon-language/carbon-lang/pull/80). 
diff --git a/docs/design/lexical_conventions.md b/docs/design/lexical_conventions.md new file mode 100644 index 0000000000000..6960d0b0a85e1 --- /dev/null +++ b/docs/design/lexical_conventions.md @@ -0,0 +1,25 @@ +# Lexical conventions + + + +## Table of contents + + + +- [TODO](#todo) + + + +## TODO + +This is a skeletal design, added to support [the overview](README.md). It should +not be treated as accepted by the core team; rather, it is a placeholder until +we have more time to examine this detail. Please feel welcome to rewrite and +update as appropriate. + +See [PR 17](https://github.com/carbon-language/carbon-lang/pull/17) for context +-- that proposal may replace this. diff --git a/docs/design/metaprogramming.md b/docs/design/metaprogramming.md new file mode 100644 index 0000000000000..bb8e4e969ddba --- /dev/null +++ b/docs/design/metaprogramming.md @@ -0,0 +1,33 @@ +# Metaprogramming + + + +## Table of contents + + + +- [TODO](#todo) +- [Overview](#overview) + + + +## TODO + +This is a skeletal design, added to support [the overview](README.md). It should +not be treated as accepted by the core team; rather, it is a placeholder until +we have more time to examine this detail. Please feel welcome to rewrite and +update as appropriate. + +See +[carbon-proposals PR 89](https://github.com/carbon-language/carbon-proposals/pull/89) +for context -- that proposal may replace this. + +## Overview + +Carbon provides metaprogramming facilities that look similar to regular Carbon +code. These are structured, and do not offer inclusion or arbitrary +preprocessing of source text such as C/C++ does. 
diff --git a/docs/design/name_lookup.md b/docs/design/name_lookup.md new file mode 100644 index 0000000000000..ef39f1db7379f --- /dev/null +++ b/docs/design/name_lookup.md @@ -0,0 +1,91 @@ +# Name lookup + + + +## Table of contents + + + +- [TODO](#todo) +- [Overview](#overview) + - [Unqualified name lookup](#unqualified-name-lookup) + - [Alternatives](#alternatives) + - [Name lookup for common, standard types](#name-lookup-for-common-standard-types) +- [Open questions](#open-questions) + - [Shadowing](#shadowing) + + + +## TODO + +This is a skeletal design, added to support [the overview](README.md). It should +not be treated as accepted by the core team; rather, it is a placeholder until +we have more time to examine this detail. Please feel welcome to rewrite and +update as appropriate. + +## Overview + +Names are always introduced into some scope which defines where they can be +referenced. Many of these scopes are themselves named. Carbon has a special +facility for introducing a dedicated named scope just like C++, but we traverse +nested names in a uniform way with `.`-separated names: + +``` +namespace Foo { + namespace Bar { + alias ??? MyInt = Int; + } +} + +fn F(Foo.Bar.MyInt: x); +``` + +Carbon packages are also namespaces so to get to an imported name from the +`Abseil` package you would write `Abseil.Foo`. The "top-level" file scope is +that of the Carbon package containing the file, meaning that there is no +"global" scope. Dedicated namespaces can be reopened within a package, but there +is no way to reopen a package without being a library and file _within_ that +package. + +Note that libraries (unlike packages) do **not** introduce a scope, they share +the scope of their package. This is based on the observation that in practice, a +fairly coarse scoping tends to work best, with some degree of global registry to +establish a unique package name. 
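For readers coming from C++, the nested `Foo.Bar` namespace example earlier in this overview corresponds roughly to the sketch below. It assumes Carbon's `Int` maps to `std::int64_t`, as in the C++ comparison in the functions document. `Twice` is a hypothetical function added only so the alias has a use.

```cpp
#include <cstdint>

namespace Foo {
namespace Bar {
// C++ analogue of an alias for `Int` (assumes `Int` is a 64-bit signed type).
using MyInt = std::int64_t;
}  // namespace Bar
}  // namespace Foo

// C++ spells the qualified name with `::` where Carbon uses `.`
// (`Foo.Bar.MyInt` in the Carbon example).
Foo::Bar::MyInt Twice(Foo::Bar::MyInt x) { return 2 * x; }
```

The nesting is the same in both languages; only the separator used to traverse the nested names differs.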
+ +### Unqualified name lookup + +Unqualified name lookup in Carbon will always find a file-local result, other +than the implicit "prelude" of importing and aliasing the fundamentals of the +standard library. There will be an explicit mention of the name in the file that +declares the name in the current or enclosing scope, which must also precede the +reference. + +#### Alternatives + +This implies that other names within your own package but not declared within +the file must be found via the package name. It isn't clear if this is the +desirable end state. We need to consider alternatives where names from the same +library or any library in the same package are made immediately visible within +the package scope for unqualified name lookup. + +### Name lookup for common, standard types + +The Carbon standard library is in the `Carbon` package. A very small subset of +this standard library is provided implicitly in every file's scope. This is +called the "prelude" package. + +Names in the prelude package will be available without scoping names. For +example, `Bool` will be the commonly used name in code, even though the +underlying type may be `Carbon::Bool`. Also, no `import` will be necessary to +use `Bool`. + +## Open questions + +### Shadowing + +We can probably disallow the use of shadowed unqualified names, but the actual +design for such needs to be thought through. diff --git a/docs/design/naming_conventions.md b/docs/design/naming_conventions.md new file mode 100644 index 0000000000000..860762db78b51 --- /dev/null +++ b/docs/design/naming_conventions.md @@ -0,0 +1,65 @@ +# Naming conventions + + + +## Table of contents + + + +- [TODO](#todo) +- [Overview](#overview) + + + +## TODO + +This is a skeletal design, added to support [the overview](README.md). It should +not be treated as accepted by the core team; rather, it is a placeholder until +we have more time to examine this detail. Please feel welcome to rewrite and +update as appropriate. 
+ +## Overview + +We would like to have widespread and consistent naming conventions across Carbon +code to the extent possible. This is for the same core reason as naming +conventions are provided in most major style guides. Even migrating existing C++ +code at-scale presents a significant opportunity to converge even more broadly +and we're interested in pursuing this if viable. + +Our current proposed naming convention, which we at least are attempting to +follow within Carbon documentation in order to keep code samples as consistent +as possible, is: + +- `UpperCamelCase` for names of compile-time resolved constants, such that + they can participate in the type system and type checking of the program. + Compile-time constants fall into two categories: + - _Template_ constants that can be used in type checking, including + literals. + - _Generic_ constants whose value is not used in type checking, but will + be used as part of code generation. +- `lower_snake_case` for names of run-time resolved values. + +As an example, an integer that is a compile-time constant sufficient to use in +the construction of a compile-time array size might be named `N`, where an integer +that is not available as part of the type system would be named `n`, even if it +happened to be immutable or only take on a single value. Functions and most +types will be in `UpperCamelCase`, but a type where only run-time type +information queries are available would end up as `lower_snake_case`. + +We only use `UpperCamelCase` and `lower_snake_case` (skipping other variations +on both snake-case and camel-case naming conventions) because these two have the +most significant visual separation. For example, the value of adding +`lowerCamelCase` for another set seems low given the small visual difference +provided; in particular, one-word identifiers would have no difference.
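To make the `N` versus `n` distinction above concrete, here is a rough C++ analogue (C++ is used because Carbon syntax is still unsettled). `MakeZeros` and `MakeZerosDynamic` are hypothetical names chosen for illustration; the point is only that `N` participates in the type while `n` does not.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// `N` is a compile-time constant: it is part of the returned type.
template <std::size_t N>
std::array<int, N> MakeZeros() {
  return std::array<int, N>{};
}

// `n` is a run-time value: the returned type is the same for every `n`.
std::vector<int> MakeZerosDynamic(std::size_t n) {
  return std::vector<int>(n, 0);
}
```

`MakeZeros<3>()` and `MakeZeros<4>()` return different types, whereas every call to `MakeZerosDynamic` returns the same type regardless of `n`.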
+ +The rationale for the specific division between the two isn't a huge or +fundamental concept, but it stems from a convention in Ruby where constants are +named with a leading capital letter. The idea is that it mirrors the English +language capitalization of proper nouns: the name of a constant refers to a +_specific_ value that is precisely resolved at compile time, not just to _some_ +value. For example, there are many different _shires_ in Britain, but Frodo +comes from the _Shire_ -- a specific fictional region. diff --git a/docs/design/operators.md b/docs/design/operators.md new file mode 100644 index 0000000000000..8655f31fde29f --- /dev/null +++ b/docs/design/operators.md @@ -0,0 +1,26 @@ +# Operators + + + +## Table of contents + + + +- [TODO](#todo) + + + +## TODO + +This is a skeletal design, added to support [the overview](README.md). It should +not be treated as accepted by the core team; rather, it is a placeholder until +we have more time to examine this detail. Please feel welcome to rewrite and +update as appropriate. + +See +[carbon-proposals PR 5](https://github.com/carbon-language/carbon-proposals/pull/5) +for context -- that proposal may replace this. 
diff --git a/docs/design/pattern_matching.md b/docs/design/pattern_matching.md new file mode 100644 index 0000000000000..67049b22f0d62 --- /dev/null +++ b/docs/design/pattern_matching.md @@ -0,0 +1,125 @@ +# Pattern matching + + + +## Table of contents + + + +- [TODO](#todo) +- [Overview](#overview) + - [Pattern match control flow](#pattern-match-control-flow) + - [Pattern matching in local variables](#pattern-matching-in-local-variables) +- [Open questions](#open-questions) + - [Slice or array nested value pattern matching](#slice-or-array-nested-value-pattern-matching) + - [Generic/template pattern matching](#generictemplate-pattern-matching) + - [Pattern matching as function overload resolution](#pattern-matching-as-function-overload-resolution) + + + +## TODO + +This is a skeletal design, added to support [the overview](README.md). It should +not be treated as accepted by the core team; rather, it is a placeholder until +we have more time to examine this detail. Please feel welcome to rewrite and +update as appropriate. + +## Overview + +The most prominent mechanism to manipulate and work with types in Carbon is +pattern matching. This may seem like a deviation from C++, but in fact this is +largely about building a clear, coherent model for a fundamental part of C++: +overload resolution. + +### Pattern match control flow + +The most powerful and easiest to explain form of pattern matching is a +dedicated control flow construct that subsumes the `switch` of C and C++ into +something much more powerful, `match`. This is not a novel construct, and is +widely used in existing languages (Swift and Rust among others) and is currently +under active investigation for C++.
Carbon's `match` can be used as follows: + +``` +fn Bar() -> (Int, (Float, Float)); +fn Foo() -> Float { + match (Bar()) { + case (42, (Float: x, Float: y)) => { + return x - y; + } + case (Int: p, (Float: x, Float: _)) if (p < 13) => { + return p * x; + } + case (Int: p, auto: _) if (p > 3) => { + return p * Pi; + } + default => { + return Pi; + } + } +} +``` + +There is a lot going on here. First, let's break down the core structure of a +`match` statement. It accepts a value that will be inspected, here the result of +the call to `Bar()`. It then will find the _first_ `case` that matches this +value, and execute that block. If none match, then it executes the default +block. + +Each `case` contains a pattern. The first part is a value pattern +(`(Int: p, auto: _)` for example) followed by an optional boolean predicate +introduced by the `if` keyword. The value pattern has to match, and then the +predicate has to evaluate to true for the overall pattern to match. Value +patterns can be composed of the following: + +- An expression (`42` for example), whose value must be equal to match. +- An optional type (`Int` for example), followed by a `:` and either an + identifier to bind to the value or the special identifier `_` to discard the + value once matched. +- A destructuring pattern containing a sequence of value patterns + (`(Float: x, Float: y)`) which match against tuples and tuple like values by + recursively matching on their elements. +- An unwrapping pattern containing a nested value pattern which matches + against a variant or variant-like value by unwrapping it. + +In order to match a value, whatever is specified in the pattern must match. +Using `auto` for a type will always match, making `auto: _` the wildcard +pattern. + +### Pattern matching in local variables + +Value patterns may be used when declaring local variables to conveniently +destructure them and do other type manipulations. 
However, the patterns must +match at compile time which is why the boolean predicate cannot be used +directly. + +``` +fn Bar() -> (Int, (Float, Float)); +fn Foo() -> Int { + var (Int: p, auto: _) = Bar(); + return p; +} +``` + +This extracts the first value from the result of calling `Bar()` and binds it to +a local variable named `p` which is then returned. + +## Open questions + +### Slice or array nested value pattern matching + +An open question is how to effectively fit a "slice" or "array" pattern into +nested value pattern matching, or whether we shouldn't do so. + +### Generic/template pattern matching + +An open question is going beyond a simple "type" to things that support generics +and/or templates. + +### Pattern matching as function overload resolution + +Need to flesh out specific details of how overload selection leverages the +pattern matching machinery, what (if any) restrictions are imposed, etc. diff --git a/docs/design/primitive_types.md b/docs/design/primitive_types.md new file mode 100644 index 0000000000000..b9373b138335e --- /dev/null +++ b/docs/design/primitive_types.md @@ -0,0 +1,102 @@ +# Primitive types + + + +## Table of contents + + + +- [TODO](#todo) +- [Overview](#overview) + - [Integers](#integers) + - [Floats](#floats) + - [BFloat16](#bfloat16) +- [Open questions](#open-questions) + - [Primitive types as code vs built-in](#primitive-types-as-code-vs-built-in) + - [String view vs owning string](#string-view-vs-owning-string) + - [Syntax for wrapping operations](#syntax-for-wrapping-operations) + - [Non-power-of-two sizes](#non-power-of-two-sizes) + + + +## TODO + +This is a skeletal design, added to support [the overview](README.md). It should +not be treated as accepted by the core team; rather, it is a placeholder until +we have more time to examine this detail. Please feel welcome to rewrite and +update as appropriate. 
+ +## Overview + +These types are fundamental to the language as they aren't either formed from or +modifying other types. They also have semantics that are defined from first +principles rather than in terms of other operations. These will be made +available through the [prelude package](README.md#name-lookup-for-common-types). + +- `Void` - a type with only one possible value: empty. +- `Bool` - a boolean type with two possible values: `True` and `False`. +- `Int` and `UInt` - signed and unsigned 64-bit integer types. + - Standard sizes are available, both signed and unsigned, including + `Int8`, `Int16`, `Int32`, `Int128`, and `Int256`. + - Overflow in either direction is an error. +- `Float64` - a floating point type with semantics based on IEEE-754. + - Standard sizes are available, including `Float16`, `Float32`, and + `Float128`. + - [`BFloat16`](primitive_types.md#bfloat16) is also provided. +- `String` - a byte sequence treated as containing UTF-8 encoded text. + - `StringView` - a read-only reference to a byte sequence treated as + containing UTF-8 encoded text. + +### Integers + +Integer types can be either signed or unsigned, much like in C++. Signed +integers are represented using 2's complement and notionally modeled as +unbounded natural numbers. Overflow in either direction is an error. That +includes unsigned integers, differing from C++. The default size for both is +64-bits: `Int` and `UInt`. Specific sizes are also available, for example: +`Int8`, `Int16`, `Int32`, `Int128`, `UInt256`. Arbitrary powers of two above `8` +are supported for both (although perhaps we'll want to avoid _huge_ values for +implementation simplicity). + +### Floats + +Floating point types are based on the binary floating point formats provided by +IEEE-754. `Float16`, `Float32`, `Float64` and `Float128` correspond exactly to +those sized IEEE-754 formats, and have the semantics defined by IEEE-754. 
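The integer overflow rule in the Integers section above differs from C++, where unsigned arithmetic wraps silently. As a sketch of what treating unsigned overflow as an error might look like when emulated in C++, consider the following. `CheckedAdd` is a hypothetical helper, and returning `std::nullopt` merely stands in for whatever error mechanism Carbon ends up using, which this document does not specify.

```cpp
#include <cstdint>
#include <limits>
#include <optional>

// Returns a + b, or std::nullopt if the sum does not fit in 64 bits.
std::optional<std::uint64_t> CheckedAdd(std::uint64_t a, std::uint64_t b) {
  if (a > std::numeric_limits<std::uint64_t>::max() - b) {
    return std::nullopt;  // Plain C++ unsigned addition would wrap here.
  }
  return a + b;
}
```

The check is done before the addition so that the sketch itself never relies on wrapping behavior.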
+ +### BFloat16 + +Carbon also supports the +[`BFloat16`](https://en.wikipedia.org/wiki/Bfloat16_floating-point_format) +format, a 16-bit truncation of a "binary32" IEEE-754 format floating point +number. + +## Open questions + +### Primitive types as code vs built-in + +There are open questions about the extent to which these types should be defined +in Carbon code rather than special. Clearly they can't be directly implemented +without help, but it might still be useful to force the programmer-observed +interface to reside in code. However, this can cause difficulty with avoiding +the need to import things gratuitously. + +### String view vs owning string + +The right model of a string view vs. an owning string is still very much +unsettled. + +### Syntax for wrapping operations + +Open question around allowing special syntax for wrapping operations (even on +signed types) and/or requiring such syntax for wrapping operations on unsigned +types. + +### Non-power-of-two sizes + +Supporting non-power-of-two sizes is likely needed to have a clean model for +bitfields, but requires more details to be worked out around memory access. diff --git a/docs/design/structs.md b/docs/design/structs.md new file mode 100644 index 0000000000000..d120b341e7d73 --- /dev/null +++ b/docs/design/structs.md @@ -0,0 +1,113 @@ +# Structs + + + +## Table of contents + + + +- [TODO](#todo) +- [Overview](#overview) +- [Open questions](#open-questions) + - [`self` type](#self-type) + - [Default access control level](#default-access-control-level) + + + +## TODO + +This is a skeletal design, added to support [the overview](README.md). It should +not be treated as accepted by the core team; rather, it is a placeholder until +we have more time to examine this detail. Please feel welcome to rewrite and +update as appropriate. + +## Overview + +Beyond simple tuples, Carbon of course allows defining named product types.
This +is the primary mechanism for users to extend the Carbon type system and +fundamentally is deeply rooted in C++ and its history (C and Simula). We simply +call them `struct`s rather than other terms as it is both familiar to existing +programmers and accurately captures their essence: they are a mechanism for +structuring data: + +``` +struct Widget { + var Int: x; + var Int: y; + var Int: z; + + var String: payload; +} +``` + +Most of the core features of structures from C++ remain present in Carbon, but +often using different syntax: + +``` +struct AdvancedWidget { + // Do a thing! + fn DoSomething(AdvancedWidget: self, Int: x, Int: y); + + // A nested type. + struct NestedType { + // ... + } + + private var Int: x; + private var Int: y; +} + +fn Foo(AdvancedWidget: thing) { + thing.DoSomething(1, 2); +} +``` + +Here we provide a public object method and two private data members. The method +explicitly indicates how the object parameter is passed to it, and there is no +automatic scoping - you have to use `self` here. The `self` name is also a +keyword, though, that explains how to invoke this method on an object. This +member function accepts the object _by value_, which is easily expressed here +along with other constraints on the object parameter. Private members work the +same as in C++, providing a layer of easy validation of the most basic interface +constraints. + +The type itself is a compile-time constant value. All name access is done with +the `.` notation. Constant members (including member types and member functions +which do not need an implicit object parameter) can be accessed via that +constant: `AdvancedWidget.NestedType`. Other members and member functions +needing an object parameter (or "methods") must be accessed from an object of +the type. + +Some things in C++ are notably absent or orthogonally handled: + +- No need for `static` functions, they simply don't take an initial `self` + parameter. 
+- No `static` variables because there are no global variables. Instead, can + have scoped constants. + +## Open questions + +### `self` type + +Requiring the type of `self` makes method declarations quite verbose. Unclear +what is the best way to mitigate this, there are many options. One is to have a +special `Self` type. + +It may be interesting to consider separating the `self` syntax from the rest of +the parameter pattern as it doesn't seem necessary to inject all of the special +rules (covariance vs. contravariance, special pointer handling) for `self` into +the general pattern matching system. + +### Default access control level + +The default access control level, and the options for access control, are pretty +large open questions. Swift and C++ (especially w/ modules) provide a lot of +options and a pretty wide space to explore here. If the default isn't right most +of the time, access control runs the risk of becoming a significant ceremony +burden that we may want to alleviate with grouped access regions instead of +per-entity specifiers. Grouped access regions have some other advantages in +terms of pulling the public interface into a specific area of the type. diff --git a/docs/design/syntactic_conventions.md b/docs/design/syntactic_conventions.md new file mode 100644 index 0000000000000..4d250147007f6 --- /dev/null +++ b/docs/design/syntactic_conventions.md @@ -0,0 +1,75 @@ +# Syntactic conventions + + + +## Table of contents + + + +- [TODO](#todo) +- [Overview](#overview) +- [Alternatives](#alternatives) + - [Types before or after name](#types-before-or-after-name) + + + +## TODO + +This is a skeletal design, added to support [the overview](README.md). It should +not be treated as accepted by the core team; rather, it is a placeholder until +we have more time to examine this detail. Please feel welcome to rewrite and +update as appropriate. + +## Overview + +Right now we expect variable syntax like: `Int: x`. 
+ +There are probably other syntactic conventions that can be added here, too. + +## Alternatives + +### Types before or after name + +While we are currently keeping types first, matching C++, there is significant +uncertainty around the right approach here. While adding the colon improves the +grammar by unambiguously marking the transition from type to a declared +identifier, in essentially every other language with a colon in a similar +position, the identifier is first and the type follows. However, that ordering +would be very _inconsistent_ with C++. + +One very important consideration here is the fundamental approach to type +inference. Languages which use the syntax `<identifier>: <type>` typically allow +completely omitting the colon and the type to signify inference. With C++, +inference is achieved with a placeholder keyword `auto`, and Carbon is currently +being consistent there as well with `auto: <identifier>`. For languages which +simply allow omission, this seems an intentional incentive to encourage +inference. On the other hand, there has been strong advocacy in the C++ +community to not overly rely on inference and to write the explicit type +whenever convenient. Being consistent with the _ordering_ of identifier and type +may ultimately be less important than being consistent with the incentives and +approach to type inference. What should be the default that we teach? Teaching +to avoid inference unless it specifically helps readability by avoiding a +confusing or unhelpfully complex type name, and incentivizing that by requiring +`auto` or another placeholder, may cause as much or more inconsistency with +languages that use `<identifier>: <type>` as retaining the C++ ordering. + +That said, all of this is largely unknown. It will require a significant +exploration of the trade-offs and consistency differences. It should also factor +in further development of pattern matching generally and whether that has an +influence on one or another approach.
Last but not least, while this may seem +like something that people will get used to with time, it may be worthwhile to +do some user research to understand the likely reaction distribution, strength +of reaction, and any quantifiable impact these options have on measured +readability. We have only found one _very_ weak source of research that focused +on the _order_ question (rather than type inference vs. explicit types or other +questions in this space). That was a very limited PhD student's study of Java +programmers that seemed to indicate improved latency for recalling the type of a +given variable name with types on the left (as in C++). However, those results +are _far_ from conclusive. + +**TODO**: Get a useful link to this PhD research (a few of us got a copy from +the professor directly). diff --git a/docs/design/templates.md b/docs/design/templates.md new file mode 100644 index 0000000000000..2763faa663f93 --- /dev/null +++ b/docs/design/templates.md @@ -0,0 +1,111 @@ +# Templates + + + +## Table of contents + + + +- [TODO](#todo) +- [Overview](#overview) + - [Types with template parameters](#types-with-template-parameters) + - [Functions with template parameters](#functions-with-template-parameters) + - [Overloading](#overloading) + - [Constraining templates with interfaces](#constraining-templates-with-interfaces) + + + +## TODO + +This is a skeletal design, added to support [the overview](README.md). It should +not be treated as accepted by the core team; rather, it is a placeholder until +we have more time to examine this detail. Please feel welcome to rewrite and +update as appropriate. + +## Overview + +Carbon templates follow the same fundamental paradigm as C++ templates: they are +instantiated, resulting in late type checking, duck typing, and lazy binding. +They both enable interoperability between Carbon and C++ and address some +(hopefully limited) use cases where the type checking rigor imposed by generics +isn't helpful. 
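As a reminder of what late type checking and duck typing mean in C++ terms, the sketch below shows a template whose body is only checked when it is instantiated. `Describe`, `Widget`, and `Name` are hypothetical names for illustration and are not part of any Carbon or C++ library.

```cpp
#include <string>

// No constraint is declared on T. Whether `value.Name()` is valid is only
// checked when the template is instantiated with a concrete type.
template <typename T>
auto Describe(const T& value) {
  return value.Name();
}

// Any type that happens to provide a suitable `Name()` member works.
struct Widget {
  std::string Name() const { return "widget"; }
};
```

Calling `Describe` with a type that lacks a `Name()` member is rejected only at the point of instantiation, which is the behavior the generics system is meant to improve on with earlier checking.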
+ +### Types with template parameters + +When parameterizing a user-defined type, the parameters can be marked as +template parameters. The resulting type-function will instantiate the +parameterized definition with the provided arguments to produce a complete type +when used. Note that only the parameters marked as having this template behavior +are subject to full instantiation -- other parameters will be type checked and +bound early to the extent possible. For example: + +``` +struct Stack(Type:$$ T) { + var Array(T): storage; + + fn Push(T: value); + fn Pop() -> T; +} +``` + +This both defines a parameterized type (`Stack`) and uses one (`Array`). Within +the definition of the type, the template type parameter `T` can be used in all +of the places a normal type would be used, and it will only be type checked on +instantiation. + +### Functions with template parameters + +Both implicit and explicit function parameters in Carbon can be marked as +template parameters. When called, the arguments to these parameters trigger +instantiation of the function definition, fully type checking and resolving that +definition after substituting in the provided (or computed if implicit) +arguments. The runtime call then passes the remaining arguments to the resulting +complete definition. + +``` +fn Convert[Type:$$ T](T: source, Type:$$ U) -> U { + var U: converted = source; + return converted; +} + +fn Foo(Int: i) -> Float { + // Instantiates with the `T` implicit argument set to `Int` and the `U` + // explicit argument set to `Float`, then calls with the runtime value `i`. + return Convert(i, Float); +} +``` + +Here we deduce one type parameter and explicitly pass another. It is not +possible to explicitly pass a deduced type parameter, instead the call site +should cast or convert the argument to control the deduction. The explicit type +is passed after a runtime parameter.
While this makes that type unavailable to +the declaration of _that_ runtime parameter, it still is a template parameter +and available to use as a type even within the remaining parts of the function +declaration. + +### Overloading + +An important feature of templates in C++ is the ability to customize how they +end up specialized for specific types. Because template parameters (whether as +type parameters or function parameters) are pattern matched, we expect to +leverage pattern matching techniques to provide "better match" definitions that +are selected analogously to specializations in C++ templates. When expressed +through pattern matching, this may enable things beyond just template parameter +specialization, but that is an area that we want to explore cautiously. + +### Constraining templates with interfaces + +Because we consider only specific _parameters_ to be templated and they could be +individually migrated to a constrained interface using the +[generics system](README.md#generics), constraining templates themselves may be +less critical. Instead, we expect parameterized types and functions may use a +mixture of generic parameters and templated parameters based on where they are +constrained. + +However, if there are still use cases, we would like to explore applying the +interface constraints of the generics system directly to template parameters +rather than create a new constraint system. 
diff --git a/docs/design/tuples.md b/docs/design/tuples.md new file mode 100644 index 0000000000000..d3b64bfdf06c6 --- /dev/null +++ b/docs/design/tuples.md @@ -0,0 +1,124 @@ +# Tuples + + + +## Table of contents + + + +- [TODO](#todo) +- [Overview](#overview) + - [Indices as compile-time constants](#indices-as-compile-time-constants) +- [Open questions](#open-questions) + - [Slicing ranges](#slicing-ranges) + - [Single-value tuples](#single-value-tuples) + - [Function pattern match](#function-pattern-match) + - [Type vs tuple of types](#type-vs-tuple-of-types) + + + +## TODO + +This is a skeletal design, added to support [the overview](README.md). It should +not be treated as accepted by the core team; rather, it is a placeholder until +we have more time to examine this detail. Please feel welcome to rewrite and +update as appropriate. + +## Overview + +The primary composite type involves simple aggregation of other types as a tuple +(called a "product type" in formal type theory): + +``` +fn DoubleBoth(Int: x, Int: y) -> (Int, Int) { + return (2 * x, 2 * y); +} +``` + +This function returns a tuple of two integers represented by the type +`(Int, Int)`. The expression to return it uses a special tuple syntax to build a +tuple within an expression: `(<expression>, <expression>)`. This is actually the +same syntax in both cases. The return type is a tuple expression, and the first +and second elements are expressions referring to the `Int` type. The only +difference is the type of these expressions. Both are tuples, but one is a tuple +of types. + +Element access uses subscript syntax: + +``` +fn Bar(Int: x, Int: y) -> Int { + var (Int, Int): t = (x, y); + return t[0] + t[1]; +} +``` + +Tuples also support multiple indices and slicing to restructure tuple elements: + +``` +fn Baz(Int: x, Int: y, Int: z) -> (Int, Int) { + var (Int, Int, Int): t1 = (x, y, z); + var (Int, Int, Int): t2 = t1[(2, 1, 0)]; + return t2[0 .. 
2]; +} +``` + +This code first reverses the tuple, and then extracts a slice using a half-open +range of indices. + +### Indices as compile-time constants + +In the example `t1[(2, 1, 0)]`, we will likely want to restrict these indices to +compile-time constants. Without that, run-time indexing would need to suddenly +switch to a variant-style return type to handle heterogeneous tuples. This would +be both surprising and complex for little or no value. + +## Open questions + +### Slicing ranges + +The intent of `0 .. 2` is to be syntax for forming a sequence of indices based +on the half-open range [0, 2). There are a bunch of questions we'll need to +answer here: + +- Is this valid anywhere? Only some places? +- What _is_ the sequence? + - If it is a tuple of indices, maybe that solves the above issue, and, + unlike a function call, indexing with multiple indices is different from + indexing with a tuple of indices. +- Do we need syntax for a closed range (`...` perhaps, unclear if that ends up + _aligned_ or in _conflict_ with other likely uses of `...` in pattern + matching)? +- All of these syntaxes are also very close to `0.2`. Is that similarity of + syntax OK? + - Do we want to require the `..` to be surrounded by whitespace to + minimize that collision? + +### Single-value tuples + +This remains an area of active investigation. There are serious problems with +all approaches here. Without the collapse of one-tuples to scalars, we need to +distinguish between a parenthesized expression (`(42)`) and a one-tuple (in +Python or Rust, `(42,)`), and if we distinguish them then we cannot model a +function call as simply a function name followed by a tuple of arguments; one of +`f(0)` and `f(0,)` becomes a special case. With the collapse, we either break +genericity by forbidding `(42)[0]` from working, or it isn't clear what it means +to access a nested tuple's first element from a parenthesized expression: +`((1, 2))[0]`. 
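As a point of comparison only, the sketch below shows how C++'s `std::tuple` behaves on these questions: one-tuples stay distinct from scalars, and element indices are template arguments, so they are always compile-time constants. This is not a proposal for Carbon, and `FirstOfOne` and `FirstOfNested` are hypothetical names used purely for illustration.

```cpp
#include <tuple>
#include <type_traits>

// A parenthesized expression is just the scalar: `(42)` has type `int`.
static_assert(std::is_same_v<decltype((42)), int>);

// A one-element tuple is a distinct type; it never collapses to `int`.
static_assert(!std::is_same_v<std::tuple<int>, int>);

// The index is a template argument, so it must be a compile-time constant.
constexpr int FirstOfOne() {
  std::tuple<int> one(42);
  return std::get<0>(one);
}

// Without collapse, a one-tuple holding a pair needs two accesses to reach
// the pair's first element, so there is no ambiguity about what `[0]` means.
constexpr int FirstOfNested() {
  auto nested = std::make_tuple(std::make_tuple(1, 2));
  return std::get<0>(std::get<0>(nested));
}

static_assert(FirstOfOne() == 42);
static_assert(FirstOfNested() == 1);
```

The cost of this model is the one described above: a call such as `f(0)` is not simply a name followed by a tuple, because `(0)` is not a tuple.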
+ +### Function pattern match + +There are some interesting corner cases we need to expand on to fully and more +precisely talk about the exact semantic model of function calls and their +pattern match here, especially to handle variadic patterns and forwarding of +tuples as arguments. We are hoping for a purely type system answer here without +needing templates to be directly involved outside the type system as happens in +C++ variadics. + +### Type vs tuple of types + +Is `(Int, Int)` a type, a tuple of types, or is there even a difference between +the two? Is different syntax needed for these cases? diff --git a/docs/design/variables.md b/docs/design/variables.md new file mode 100644 index 0000000000000..f52a922385c96 --- /dev/null +++ b/docs/design/variables.md @@ -0,0 +1,76 @@ +# Variables + + + +## Table of contents + + + +- [TODO](#todo) +- [Overview](#overview) + - [Declaring constants](#declaring-constants) +- [Alternatives](#alternatives) + - [Declaring constants](#declaring-constants-1) + - [Global variables](#global-variables) + + + +## TODO + +This is a skeletal design, added to support [the overview](README.md). It should +not be treated as accepted by the core team; rather, it is a placeholder until +we have more time to examine this detail. Please feel welcome to rewrite and +update as appropriate. + +## Overview + +Blocks introduce nested scopes and can contain local variable declarations that +work similarly to function parameters. + +For example: + +``` +fn Foo() { + var Int: x = 42; +} +``` + +This introduces a local variable named `x` into the block's scope. It has the +type `Int` and is initialized with the value `42`. These variable declarations +(and function declarations) have a lot more power than what we're covering just +yet, but this gives you the basic idea. + +While there can be global constants, there are no global variables. + +### Declaring constants + +Constants will use template-like syntax for declarations. 
For example, a simple +integer constant looks like: + +```carbon +var Int:$$ MyVal = 42; +``` + +## Alternatives + +### Declaring constants + +There is other syntax that could be used for declaring constants. There are +serious problems with the use of `const` in C++ as part of the type system. +Another alternative is `let` from Swift, although there are some questions +around how intuitive it is for this to introduce a constant. Another candidate +is `val` from Kotlin. Another thing we need to contend with is the surprise of +const and reference (semantic) types. At present we are leaning towards the +template-like syntax for consistency within Carbon. + +### Global variables + +We are exploring several different ideas for how to design less bug-prone +patterns to replace the important use cases programmers still have for global +variables. We may be unable to fully address them, at least for migrated code, +and be forced to add some limited form of global variables back. We may also +discover that their convenience outweighs any improvements afforded. 
diff --git a/proposals/README.md b/proposals/README.md index 3d0ec2da45492..98df483ac4097 100644 --- a/proposals/README.md +++ b/proposals/README.md @@ -30,5 +30,6 @@ request: - [0051 - Goals](p0051.md) - [0074 - Change comment/decision timelines in proposal process](p0074.md) - [Decision](p0074-decision.md) +- [0083 - In-progress design overview](p0083.md) diff --git a/proposals/p0083.md b/proposals/p0083.md new file mode 100644 index 0000000000000..28d77e41687b6 --- /dev/null +++ b/proposals/p0083.md @@ -0,0 +1,157 @@ +# In-progress design overview + + + +[Pull request](https://github.com/carbon-language/carbon-lang/pull/83) + +## Table of contents + + + +- [Problem](#problem) +- [Goals](#goals) +- [Background](#background) +- [Proposal](#proposal) +- [Alternatives considered](#alternatives-considered) + - [Single-file design overview](#single-file-design-overview) + - [No in-progress design overview](#no-in-progress-design-overview) + - [No overview of designs](#no-overview-of-designs) + + + +## Problem + +We need some vaguely consistent shared understanding of the language design as a +whole. The goal is merely consistency in discussions, in-progress syntax, and +general approach. It in no way needs reliably to reflect the end state, either +desired or realized. Instead, it should evolve as each area matures and becomes +concrete, while providing an overarching overview that connects the language +together, and the consistent background that we all can refer back to during +discussions. + +## Goals + +This is intended to offer a reasonable starting point for: + +- Example code. +- Conceptualizing Carbon at a high level. +- Reasonable, but not necessarily final, approaches to features in README.md. + - If any idea is obviously bad, we can clean it up here. + +This proposal is not intended to achieve: + +- A whole language design. + - This is way too much work for a single proposal; this is a skeletal + framework only. 
+ - As we work on feature-specific designs, we may decide to use other + approaches. That's fine: we only need somewhere to start. + - The summaries in README.md may be expected to change over time. +- Feature-specific files aren't intended to be well-written or comprehensive. + They are a quick jot of prior thoughts. + - We want to avoid getting stuck on language details that we should + consider more carefully regardless. If you're passionate about a + feature, please feel free to start a new proposal for it. + - Each and every aspect of the suggested overview should be subject to + careful examination and justification before it becomes a settled plan + of record. + +## Background + +Many of the ideas here stem from discussions between several of the initial +people working on Carbon over several years. That doesn't make them good, but +may give some context on where they came from. They are also heavily informed by +the experience several of us have both working on the Clang C++ frontend and +several C++ codebases including those of LLVM and Clang themselves. + +## Proposal + +See [the language design overview document](/docs/design/README.md). + +## Alternatives considered + +### Single-file design overview + +We also considered putting the full design overview in one file, as in +[PR 22](https://github.com/carbon-language/carbon-lang/pull/22). This is versus +the hierarchy proposed here. + +Pros: + +- All proposed changes are in one place. +- Easier for people to skim rationale and considered changes. + +Cons: + +- Encourages more single-file designs. + - A principle of the multi-file approach is that complex features may have + subdirectories with their own README.md and files for sub-features, + similar to the relationship between this design overview and features. + - Single-file designs may be harder to evolve long-term, as the volume of + information contained impedes reading. 
+ +### No in-progress design overview + +The primary alternative is to avoid even having a draft or in-progress design of +this form until each constituent component is more thoroughly conceived and +considered. + +Pros: + +- Avoids anchoring design on approaches that haven't yet been fully explored. + - Avoids getting stuck on discussing details where a proposal isn't + fleshed out. + +Cons: + +- The lack of an overview can lead to significant confusion and + inconsistencies in discussion, hindering fleshing out details. + - An overview offers basic shaping of the language as a whole, even as it + evolves. + +The compromise chosen is to have the in-progress design and simply work to +resist both anchoring and distraction stemming from it. We want to get the +benefits we can here while minimizing the cost. + +### No overview of designs + +The overview will result in content duplication from individual designs. At the +time of this proposal, this may be significant because individual designs are +not fleshed out, and thus duplication should be expected to reduce over +time. However, it should be expected to remain as the duplication is fundamental +to having an overview. + +This duplication could be addressed by removing the overview. Instead, +design/README.md could be restricted to listing existing designs, with no +additional content. + +The proposed approach assumes that the proposed overviews offer significant +value for ramp-up. + +Pros: + +- Eliminates content duplication. +- A simple index is easier to maintain long-term, with less to become stale. + - It could be fully automated. + +Cons: + +- No quick way to get a high-level understanding. + - The overview is the only step before "reading every design". + - For example, we summarize common control flow keywords, so that readers + don't need to identify which documents they come from and what exists. +- Harder to show relationships between various features. 
+ - While examples can show how designs relate, it may not be as obvious + from a simple link, even when reading the associated design. + - For example, lexical conventions come up as references for three + otherwise distinct sections. If we had a simple index of files, we + should expect users to need to read individual designs to understand + relationships. + - For example, we explain in brief the relationships between categories of + types. + - There's disagreement about whether the text of README.md offers any + utility: + [comment thread](https://github.com/carbon-language/carbon-lang/pull/83/files/25437de9e61b3a15e8ddde67b6297f1795922355..97da855dbe6023930e02473af46abea03af991e7#r444487049)