Classes

Overview

A Carbon class is a user-defined record type. A class has members that are referenced by their names, in contrast to a Carbon tuple which defines a product type whose members are referenced positionally.

Classes are the primary mechanism for users to extend the Carbon type system and are deeply rooted in C++ and its history (C and Simula). We call them classes rather than other terms as that is both familiar to existing programmers and accurately captures their essence: they define the types of objects with (optional) support for methods, encapsulation, and so on.

Carbon supports both named, or "nominal", and unnamed, anonymous, or "structural", class types. Nominal class types are all distinct, but structural types are equal if they have the same sequence of member types and names. Structural class literals may be used to initialize or assign values to nominal class variables.

A class type defines the interpretation of the bytes of a value of that type, including the size, data members, and layout. It defines the operations that may be performed on those values, including what methods may be called. A class type may directly have constant members. The type itself is a compile-time immutable constant value.

Use cases

The use cases for classes include both cases motivated by C++ interop, and cases that we expect to be included in idiomatic Carbon-only code.

This design currently only attempts to address the "data classes" and "encapsulated types" use cases. Addressing the "interface as base class", "interop with C++ multiple inheritance" and "mixin" use cases is future work.

Data classes

Data classes are types that consist of data fields that are publicly accessible and directly read and manipulated by client code. They have few if any methods, and generally are not involved in inheritance at all.

Examples include:

  • a key and value pair returned from a SortedMap or HashMap
  • a 2D point that might be used in a rendering API

Properties:

  • Operations like copy, move, destroy, unformed, and so on are defined field-wise.
  • Anonymous class types and literals should match data class semantics.

Expected in idiomatic Carbon-only code.

Background: Kotlin has a dedicated concise syntax for defining data classes that avoids boilerplate. Python has a data class library, proposed in PEP 557, that fills a similar role.

Encapsulated types

There are several categories of types that support encapsulation. This is done by making their data fields private so access and modification of values are all done through methods defined on the type.

Without inheritance

The most common encapsulated types are those that do not participate in inheritance. These types can neither be inherited from (they are "final") nor extend other types.

Examples of this use case include:

  • strings, containers, iterators
  • types with invariants such as Date
  • RAII types that are movable but not copyable like C++'s std::unique_ptr or a file handle
  • non-movable types like Mutex

We expect two kinds of methods on these types: public methods defining the API for accessing and manipulating values of the type, and private helper methods used as an implementation detail of the public methods.

These types are expected in idiomatic Carbon-only code.

With inheritance and subtyping

The subtyping you get with inheritance is that you may assign the address of an object of a derived type to a pointer to its base type. For this to work, the compiler needs implementation strategies that allow operations performed through the pointer to the base type to work independently of which derived type it actually points to. These strategies include:

  • Arranging for the data layout of derived types to start with the data layout of the base type as a prefix.
  • Putting a pointer to a table of function pointers, a vtable, as the first data member of the object. This allows methods to be virtual and have a derived-type-specific implementation, an override, that is used even when invoking the method on a pointer to a base type.
  • Non-virtual methods implemented on a base type should be applicable to all derived types. In general, derived types should not attempt to overload or override non-virtual names defined in the base type.

Note that these subtyping implementation strategies generally rely on encapsulation, but encapsulation is not a strict requirement in all cases.

This subtyping relationship also creates safety concerns, which Carbon should protect against. Slicing problems can arise when the source or target of an assignment is a dereferenced pointer to the base type. It is also incorrect to delete an object with a non-virtual destructor through a pointer to a base type.

Polymorphic types

Carbon will fully support single-inheritance type hierarchies with polymorphic types.

Polymorphic types support dynamic dispatch using a vtable and may have data members, but they are limited to single inheritance. Individual methods opt in to using dynamic dispatch, so types will have a mix of "virtual" and non-virtual methods. Polymorphic types support traditional object-oriented single inheritance: a mix of subtyping with implementation and code reuse.

We exclude complex multiple inheritance schemes, virtual inheritance, and so on from this use case. This is to avoid the complexity and overhead they bring, particularly since the use of these features in C++ is generally discouraged. The rule is that every type has at most one base type with data members for subtyping purposes. Carbon will support additional base types as long as they don't have data members or don't support subtyping.

Background: The "Nothing is Something" talk by Sandi Metz and the Composition Over Inheritance Principle describe design patterns to use instead of multiple inheritance to support types that vary over multiple axes.

In rare cases where the complex multiple inheritance schemes of C++ are truly needed, they can be effectively approximated using a combination of these simpler building blocks.

Polymorphic types support a number of different kinds of methods:

  • They will have virtual methods:
    • Polymorphic types will typically include virtual destructors.
    • Virtual methods may have default implementations or be abstract (or pure virtual). In the latter case, they must be implemented in any derived class that can be instantiated.
    • Virtual methods may be protected or private, intended to be called by methods in the base type but implemented in the descendant.
  • They may have non-virtual public or private helper methods, like encapsulated types without inheritance. These avoid the overhead of a virtual function call, and can be written when the base class has sufficient data members.
  • They may have protected helper methods, typically non-virtual, provided by the base type to be called by the descendant.

Note that there are two uses for protected methods: those implemented in the base and called in the descendant, and the other way around. "The End Of Object Inheritance & The Beginning Of A New Modularity" talk by Augie Fackler and Nathaniel Manista discusses design patterns that split up types to reduce the number of kinds of calls between base and derived types, and make sure calls only go in one direction.

We expect polymorphic types in idiomatic Carbon-only code, at least for the medium term. Extending this design to support polymorphic types is future work.

Interface as base class

We distinguish the specific case of polymorphic base classes that have no data members:

  • From an implementation perspective, the lack of data members removes most of the problems with supporting multiple inheritance.
  • They are about decoupling two pieces of code instead of collaborating.
  • As a use case, they are used primarily for subtyping and much less for implementation reuse than other polymorphic types.
  • This case overlaps with the interface concept introduced for Carbon generics.

Removing support for data fields greatly simplifies supporting multiple inheritance. For example, it removes the need for a mechanism to figure out the offset of those data fields in the object. Similarly we don't need C++'s virtual inheritance to avoid duplicating those fields. Some complexities still remain, such as pointers changing values when casting to a secondary parent type, but these seem manageable given the benefits of supporting this useful case of multiple inheritance.

While an interface base class is generally for providing an API that allows decoupling two pieces of code, a polymorphic type is a collaboration between a base and derived type to provide some functionality. This is a bit like the difference between a library and a framework, where you might use many of the former but only one of the latter.

Interface base classes are primarily used for subtyping. The extent of implementation reuse is generally limited by the lack of data members, and the decoupling role they play is usually about defining an API as a set of public pure-virtual methods. Compared to other polymorphic types, they more rarely have methods with implementations (virtual or not), or have methods with restricted access. The main use case is when there is a method that is implemented in terms of pure-virtual methods. Those pure-virtual methods may be marked as protected to ensure they are only called through the non-abstract API, but can still be implemented in descendants.

While it is typical for this case to be associated with single-level inheritance hierarchies, there are some cases where there is an interface at the root of a type hierarchy and polymorphic types as interior branches of the tree. The case of interfaces extending or requiring other interfaces would also be modeled by deeper inheritance hierarchies.

An interface as base class needs to either have a virtual destructor or forbid deallocation.

There is significant overlap between interface base classes and Carbon interfaces. Both represent APIs as a collection of method names and signatures to implement. The subset of interfaces that support dynamic dispatch are called object-safe, following Rust:

  • They don't have a Self in the signature of a method in a contravariant position like a parameter.
  • They don't have free associated facets or other associated items used in a method signature.

The restrictions on object-safe interfaces match the restrictions on base class methods. The main difference is the representation in memory. A type extending a base class with virtual methods includes a pointer to the table of methods in the object value itself, while a type implementing an interface would store the pointer alongside the pointer to the value in a DynPtr(MyInterface). Of course, the interface option also allows the method table to be passed at compile time.

Note: This presumes that we include some concept of final methods in interfaces to match non-virtual functions in base classes.

We expect idiomatic Carbon-only code to generally use Carbon interfaces instead of interface base classes. We may still support interface base classes long term if we determine that the ability to put the pointer to the method implementations in the object value is important for users, particularly with a single parent as in the polymorphic type case. Extending this design to support interface base classes is future work.

Background: C++ abstract base classes that don't have data members and Java interfaces model this case.

Non-polymorphic inheritance

While it is not common, there are cases where C++ code uses inheritance without dynamic dispatch or a vtable. Instead, methods are never overridden, and derived types only add data and methods. There are some cases where this is done in C++ but would be done differently in Carbon:

  • For implementation reuse without subtyping, Carbon code should use mixins or composition. Carbon won't support private inheritance.
  • Carbon will allow data members to have size zero, so the empty-base optimization is unnecessary.
  • For cases where the derived type does not add any data members, in Carbon you can potentially use adapter types instead of inheritance.

However, there are still some cases where non-virtual inheritance makes sense. One is a parameterized type where a prefix of the data is the same independent of the parameter. An example of this is containers with a small-buffer optimization, as described in the talk CppCon 2016: Chandler Carruth "High Performance Code 201: Hybrid Data Structures". By moving the data and methods that don't depend on the buffer size to a base class, we reduce the instantiation overhead for monomorphization. The base type is also useful for reducing instantiation for consumers of the container, as long as they only need to access methods defined in the base.

Another case for non-virtual inheritance is for different node types within a data structure that have some data members in common. This is done in LLVM's map, red-black tree, and list data structure types. In a linked list, the base type might have the next and previous pointers, which is enough for a sentinel node, and there would also be a derived type with the actual data member. The base type can define operations like "splice" that only operate on the pointers, not the data, and this is in fact enforced by the type system. Only the derived node type needs to be parameterized by the element type, saving on instantiation costs as before.

Many of the concerns around non-polymorphic inheritance are the same as for the non-virtual methods of polymorphic types. Assignment and destruction are examples of operations that need particular care to be sure they are only done on values of the correct type, rather than through a subtyping relationship. This means having some extrinsic way of knowing when it is safe to downcast before performing one of those operations, or performing them on pointers that were never upcast to the base type.

Interop with C++ multiple inheritance

While Carbon won't support all the C++ forms of multiple inheritance, Carbon code will still need to interoperate with C++ code that does. Of particular concern are the std::iostream family of types. Most uses of those types are the input and output variations or could be migrated to use those variations, not the harder bidirectional cases.

Much of the complexity of this interoperation could be alleviated by adopting the restriction that Carbon code can't directly access the fields of a virtual base class. In the cases where such access is needed, the workaround is to access them through C++ functions.

We do not expect idiomatic Carbon-only code to use multiple inheritance. Extending this design to support interoperating with C++ types using multiple inheritance is future work.

Mixins

A mixin is a declaration of data, methods, and interface implementations that can be added to another type, called the "main type". The methods of a mixin may also use data, methods, and interface implementations provided by the main type. Mixins are designed around implementation reuse rather than subtyping, and so don't need to use a vtable.

A mixin might be an implementation detail of a data class or an encapsulated type. A mixin might partially implement an interface as base class.

Examples: intrusive linked list, intrusive reference count

In both of these examples, the mixin needs the ability to convert between a pointer to the mixin's data (like a "next" pointer or reference count) and a pointer to the containing object with the main type.

Mixins are expected in idiomatic Carbon-only code. Extending this design to support mixins is future work.

Background: Mixins are typically implemented using the curiously recurring template pattern in C++, but other languages support them directly.

Background

See how other languages tackle this problem:

Members

The members of a class are named, and are accessed with the . notation. For example:

var p: Point2D = ...;
// Data member access
p.x = 1;
p.y = 2;
// Method call
Print(p.DistanceFromOrigin());

Tuples are used for cases where accessing the members positionally is more appropriate.

Data members have an order

The data members of a class, or fields, have an order that matches the order they are declared in. This determines the order of those fields in memory, and the order that the fields are destroyed when a value goes out of scope or is deallocated.

Struct types

Structural data classes, or struct types, are convenient for defining data classes in an ad-hoc manner. They would commonly be used:

  • as the return type of a function that returns multiple values and wants those values to have names so a tuple is inappropriate
  • as an initializer for other class variables or values
  • as a type parameter to a container

Note that struct types are examples of data class types and are still classes. The "nominal data classes" section describes another way to define a data class type. Also note that there is no struct keyword; "struct" is just convenient shorthand terminology for a structural data class.

Literals

Structural data class literals, or struct literals, are written using this syntax:

var kvpair: auto = {.key = "the", .value = 27};

This produces a struct value with two fields:

  • The first field is named "key" and has the value "the". The type of the field is set to the type of the value, and so is String.
  • The second field is named "value" and has the value 27. The type of the field is set to the type of the value, and so is i32.

Note: A trailing comma (,) may optionally be included after the last field:

var kvpair: auto = {.key = "the", .value = 27,};

Open question: To keep the literal syntax from being ambiguous with compound statements, Carbon will adopt some combination of:

  • looking ahead after a { to see if it is followed by .name;
  • not allowing a struct literal at the beginning of a statement;
  • only allowing { to introduce a compound statement in contexts introduced by a keyword where they are required, like requiring { ... } around the cases of an if...else statement.

Type expression

The type of kvpair in the last example would be represented by this expression:

{.key: String, .value: i32}

This syntax is intended to parallel the literal syntax, and so uses commas (,) to separate fields instead of a semicolon (;) terminator. This choice also reflects the expected use inline in function signature declarations.

Struct types may only have data members, so the type declaration is just a list of field names and types. The result of a struct type expression is an immutable compile-time type value.

Note: As with struct literal expressions, a trailing comma (,) may optionally be included after the last field:

{.key: String, .value: i32,}

Also note that {} represents both the empty struct literal and its type.
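Combining the literal and type expression syntax, a function can return a struct type so that its results have names, which is the first use listed under "Struct types". This is a minimal sketch: `MinMax` is a hypothetical function introduced only for illustration, and `Assert` is used as in the other examples in this document.

```carbon
// Hypothetical example: returns two named results without declaring a
// nominal class for them.
fn MinMax(a: i32, b: i32) -> {.min: i32, .max: i32} {
  if (a < b) {
    return {.min = a, .max = b};
  }
  return {.min = b, .max = a};
}

var r: auto = MinMax(7, 3);
// Callers read the results by name rather than by position.
Assert(r.min == 3);
Assert(r.max == 7);
```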

Assignment and initialization

When initializing or assigning a variable of a data class type, such as a struct type, from a struct value on the right-hand side, the order of the fields does not have to match, only the names.

var different_order: {.x: i32, .y: i32} = {.y = 2, .x = 3};
Assert(different_order.x == 3);
Assert(different_order.y == 2);

Initialization and assignment occur field-by-field. The order of fields is determined from the target on the left side of the =. This rule matches what we expect for classes with encapsulation more generally.
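Assignment uses the same by-name matching as initialization. A small sketch, where `pt` is a hypothetical variable; the exact sequence of operations is still the open question below, so the comment only restates the field-order rule above.

```carbon
var pt: {.x: i32, .y: i32} = {.x = 0, .y = 0};
// The right-hand side lists `.y` first, but fields are matched by name, and
// the field order comes from `pt`, the target of the assignment.
pt = {.y = 5, .x = 4};
Assert(pt.x == 4);
Assert(pt.y == 5);
```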

Open question: What operations and in what order happen for assignment and initialization?

  • Is assignment just destruction followed by initialization? Is that destruction completed for the whole object before initializing, or is it interleaved field-by-field?
  • When initializing to a literal value, is a temporary containing the literal value constructed first or are the fields initialized directly? The latter approach supports types that can't be moved or copied, such as mutex.
  • Perhaps some operations are not ordered with respect to each other?

Operations performed field-wise

Generally speaking, the operations that are available on a data class value, such as a value with a struct type, are dependent on those operations being available for all the types of the fields.

For example, two values of the same data class type may be compared for equality or inequality if equality is supported for every member of the type:

var p: auto = {.x = 2, .y = 3};
Assert(p == {.x = 2, .y = 3});
Assert(p != {.x = 2, .y = 4});
Assert({.x = 2, .y = 4} != {.x = 5, .y = 3});

Equality and inequality comparisons are also allowed between different data class types when:

  • At least one is a struct type.
  • They have the same set of field names, though the order may be different.
  • Equality comparison is defined between the pairs of member types with the same field names.

For example, since comparison between i32 and u32 is defined, equality comparison between values of types {.x: i32, .y: i32} and {.y: u32, .x: u32} is as well. Equality and inequality comparisons compare fields using the field order of the left-hand operand and stop once the outcome of the comparison is determined. However, the comparison order and short-circuiting are generally expected to affect only the performance characteristics of the comparison and not its meaning.
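The mixed-type case described above might look like this sketch. The variables `a` and `b` are hypothetical; the example relies on the `i32`/`u32` equality comparison that the paragraph states is defined.

```carbon
var a: {.x: i32, .y: i32} = {.x = 1, .y = 2};
var b: {.y: u32, .x: u32} = {.y = 2, .x = 1};
// Same set of field names, but a different field order and different field
// types. Fields are matched by name and compared in the field order of the
// left-hand operand `a`.
Assert(a == b);
// `.x` matches, `.y` differs (2 vs. 3), so the values are unequal.
Assert(a != {.y = 3, .x = 1});
```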

Ordering comparisons, such as < and <=, use the order of the fields to do a lexicographical comparison. The argument types must have a matching order of the field names. Otherwise, the restrictions on ordering comparisons between different data class types are analogous to equality comparisons:

  • At least one is a struct type.
  • Ordering comparison is defined between the pairs of member types with the same field names.
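A short sketch of the lexicographic rule, using only struct literals. The last comparison is commented out because, under the rule above, operands whose field names appear in different orders cannot be ordered.

```carbon
// `.x` is compared first; `.y` is only consulted when the `.x` values tie.
Assert({.x = 1, .y = 9} < {.x = 2, .y = 0});
Assert({.x = 1, .y = 2} < {.x = 1, .y = 3});
Assert({.x = 1, .y = 2} <= {.x = 1, .y = 2});
// ❌ Would be rejected: the field orders differ, so there is no single
// lexicographic order to follow.
// Assert({.x = 1, .y = 2} < {.y = 3, .x = 1});
```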

Implicit conversion from a struct type to a data class type is allowed when the set of field names is the same and implicit conversion is defined between the pairs of member types with the same field names. Since calling a function effectively initializes each of the function's parameters from the caller's arguments, a call is valid when those initializations, including any such implicit conversions, are all valid.
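As a sketch of argument passing under this rule: `Sum` and `small` are hypothetical names, and the example assumes an implicit, lossless conversion from `i32` to `i64` is defined, so that each field of the argument can initialize the matching field of the parameter.

```carbon
// Hypothetical function whose parameter has 64-bit fields.
fn Sum(p: {.x: i64, .y: i64}) -> i64 {
  return p.x + p.y;
}

var small: {.x: i32, .y: i32} = {.x = 1, .y = 2};
// Initializing `p` from `small` converts field by field, matching by name.
// Assumption: `i32` implicitly converts to `i64`.
Assert(Sum(small) == 3);
```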

A data class has an unformed state if all its members do. Treatment of unformed state follows proposal #257.

Destruction is performed field-wise in reverse order.

Extending user-defined operations on the fields to an operation on an entire data class is future work.

References: The rules for assignment, comparison, and implicit conversion for argument passing were decided in question-for-leads issue #710.

Nominal class types

The declarations for nominal class types will have:

  • an optional abstract or base prefix
  • class introducer
  • the name of the class
  • {, an open curly brace
  • a sequence of declarations
  • }, a close curly brace

Declarations should generally match declarations that can be declared in other contexts, for example variable declarations with var will define instance variables:

class TextLabel {
  var x: i32;
  var y: i32;

  var text: String = "default";
}

The main difference here is that "default" is a default instead of an initializer, and will be ignored if another value is supplied for that field when constructing a value. Defaults must be constants whose value can be determined at compile time.
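A sketch of how the default interacts with construction, reusing the `TextLabel` declaration above and the struct-value construction described under "Construction". The variables `a` and `b` are hypothetical.

```carbon
class TextLabel {
  var x: i32;
  var y: i32;
  var text: String = "default";
}

// `text` is omitted, so its default is used.
var a: TextLabel = {.x = 0, .y = 0};
Assert(a.text == "default");

// A supplied value replaces the default.
var b: TextLabel = {.x = 0, .y = 0, .text = "custom"};
Assert(b.text == "custom");
```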

Forward declaration

To support circular references between class types, we allow forward declaration of types. Forward declarations end with semicolon ; after the name of the class, instead of the block of declarations in curly braces {...}. A type that is forward declared is considered incomplete until the end of a definition with the same name.

// Forward declaration of `GraphNode`.
class GraphNode;

class GraphEdge {
  var head: GraphNode*;
  var tail: GraphNode*;
}

class GraphNode {
  var edges: Vector(GraphEdge*);
}
// `GraphNode` is first complete here.

Open question: What is specifically allowed and forbidden with an incomplete type has not yet been decided.

Self

A class definition may provisionally include references to its own name in limited ways. These limitations arise from the type not being complete until the end of its definition is reached.

class IntListNode {
  var data: i32;
  var next: IntListNode*;
}

An equivalent definition of IntListNode, since the Self keyword is an alias for the current type, is:

class IntListNode {
  var data: i32;
  var next: Self*;
}

Self refers to the innermost type declaration:

class IntList {
  class IntListNode {
    var data: i32;
    // `Self` is `IntListNode`, not `IntList`.
    var next: Self*;
  }
  var first: IntListNode*;
}

Construction

Any function with access to all the data fields of a class can construct one by converting a struct value to the class type:

var tl1: TextLabel = {.x = 1, .y = 2};
var tl2: auto = {.x = 1, .y = 2} as TextLabel;

Assert(tl1.x == tl2.x);

fn ReturnsATextLabel() -> TextLabel {
  return {.x = 1, .y = 2};
}
var tl3: TextLabel = ReturnsATextLabel();

fn AcceptsATextLabel(tl: TextLabel) -> i32 {
  return tl.x + tl.y;
}
Assert(AcceptsATextLabel({.x = 2, .y = 4}) == 6);

Note that a nominal class, unlike a struct type, can define default values for fields, and so may be initialized with a struct value that omits some or all of those fields.

Assignment

Assigning a struct value to a class variable is also allowed in a function with access to all the data fields of a class. Assignment always overwrites all of the field members.

var tl: TextLabel = {.x = 1, .y = 2};
Assert(tl.text == "default");

// ✅ Allowed: assigns all fields
tl = {.x = 3, .y = 4, .text = "new"};

// ✅ Allowed: This statement is evaluated in two steps:
// 1. {.x = 5, .y = 6} is converted into a new TextLabel value,
//    using the default for field `text`.
// 2. That TextLabel value, which has values for all fields, is
//    assigned to tl.
tl = {.x = 5, .y = 6};
Assert(tl.text == "default");

Open question: This behavior might be surprising because there is an ambiguity about whether to use the default value or the previous value for a field. We could require all fields to be specified when assigning, and only use field defaults when initializing a new value.

// ❌ Forbidden: should tl.text == "default" or "new"?
tl = {.x = 5, .y = 6};

Member functions

Member functions can either be class functions or methods. Class functions are members of the type, while methods can only be called on instances.

Class functions

A class function is like a C++ static member function, and is declared like a function at file scope. The declaration can include a definition of the function body, or that definition can be provided out of line after the class definition is finished. A common use is for constructor functions.

class Point {
  fn Origin() -> Self {
    return {.x = 0, .y = 0};
  }
  fn CreateCentered() -> Self;

  var x: i32;
  var y: i32;
}

fn Point.CreateCentered() -> Self {
  return {.x = ScreenWidth() / 2, .y = ScreenHeight() / 2};
}

Class functions are members of the type, and may be accessed using dot (.) member access on either the type or any instance.

var p1: Point = Point.Origin();
var p2: Point = p1.CreateCentered();

Methods

Method declarations are distinguished from class function declarations by having a self parameter in square brackets [...] before the explicit parameter list in parens (...). There is no implicit member access in methods, so inside the method body members are accessed through the self parameter. Methods may be written lexically inline or after the class declaration.

class Circle {
  fn Diameter[self: Self]() -> f32 {
    return self.radius * 2;
  }
  fn Expand[addr self: Self*](distance: f32);

  var center: Point;
  var radius: f32;
}

fn Circle.Expand[addr self: Self*](distance: f32) {
  self->radius += distance;
}

var c: Circle = {.center = Point.Origin(), .radius = 1.5};
Assert(Math.Abs(c.Diameter() - 3.0) < 0.001);
c.Expand(0.5);
Assert(Math.Abs(c.Diameter() - 4.0) < 0.001);

  • Methods are called using the dot (.) member syntax, c.Diameter() and c.Expand(...).
  • Diameter computes and returns the diameter of the circle without modifying the Circle instance. This is signified using [self: Self] in the method declaration.
  • c.Expand(...) does modify the value of c. This is signified using [addr self: Self*] in the method declaration.

The pattern 'addr self: type' means "first take the address of the argument, which must be an l-value, and then match pattern 'self: type' against it".

If the method declaration also includes deduced compile-time parameters, the self parameter must be in the same list in square brackets [...]. The self parameter may appear in any position in that list, as long as it appears after any names needed to describe its type.

Deferred member function definitions

When defining a member function lexically inline, the body is deferred and processed as if it appeared immediately after the end of the outermost enclosing class, like in C++.

For example, given a class with inline function definitions:

class Point {
  fn Distance[self: Self]() -> f32 {
    return Math.Sqrt(self.x * self.x + self.y * self.y);
  }

  fn Make(x: f32, y: f32) -> Point {
    return {.x = x, .y = y};
  }

  var x: f32;
  var y: f32;
}

These are all parsed as if they were defined outside the class scope:

class Point {
  fn Distance[self: Self]() -> f32;
  fn Make(x: f32, y: f32) -> Point;

  var x: f32;
  var y: f32;
}

fn Point.Distance[self: Self]() -> f32 {
  return Math.Sqrt(self.x * self.x + self.y * self.y);
}

fn Point.Make(x: f32, y: f32) -> Point {
  return {.x = x, .y = y};
}

Name lookup in classes

Member access is an expression, and its details are covered in the design for member access expressions. Because function definitions are deferred, name lookup in classes works the same regardless of whether a function is defined inline. The class body forms a scope for name lookup, and function definitions can perform unqualified name lookup within that scope.

For example:

class Square {
  fn GetArea[self: Self]() -> f32 {
    // ✅ OK: performs name lookup on `self`.
    return self.size * self.size;
    // ❌ Error: finds `Square.size`, but an instance is required.
    return size * size;
    // ❌ Error: an instance is required.
    return Square.size * Square.size;
    // ✅ OK: performs instance binding with `self`.
    return self.(Square.size) * self.(Square.size);
    // ✅ OK: uses unqualified name lookup to find `Square.size`, then performs
    // instance binding with `self`.
    return self.(size) * self.(size);
  }

  fn GetDoubled[self: Self]() -> Square {
    // ✅ OK: performs name lookup on `Square` for `Make`.
    return Square.Make(self.size);
    // ✅ OK: performs unqualified name lookup within class scope for `Make`.
    return Make(self.size);
    // ✅ OK: performs name lookup on `self` for `Make`.
    return self.Make(self.size);
  }

  fn Make(size: f32) -> Square;

  var size: f32;
}

The example's name lookups refer to Make and size, which are declared after the member accesses that use them; this is valid because of deferred member function definitions.

However, function signatures must still complete lookup without deferring. For example:

class List {
  // ❌ Error: `Iterator` has not yet been defined.
  fn Iterate() -> Iterator;

  class Iterator {
    ...
  }

  // ✅ OK: The definition of Iterator is now available.
  fn Iterate() -> Iterator;
}

An out-of-line function definition's parameters, return type, and body are evaluated as if in-scope. For example:

// ✅ OK: The return type performs unqualified name lookup into `List` for
// `Iterator`.
fn List.Iterate() -> Iterator {
  ...
}

Nominal data classes

We will mark data classes with an impl as Data {} line.

class TextLabel {
  var x: i32;
  var y: i32;

  var text: String;

  // This line makes `TextLabel` a data class, which defines
  // a number of operations field-wise.
  impl as Data {}
}

The fields of data classes must all be public. That line will add field-wise implementations and operations of all interfaces that a struct with the same fields would get by default.

The word Data here refers to an empty interface in the Carbon prologue. That interface would then be part of our strategy for defining how other interfaces are implemented for data classes.

References: Rationale for this approach is given in proposal #722.

Member type

Additional types may be defined in the scope of a class definition.

class StringCounts {
  class Node {
    var key: String;
    var count: i32;
  }
  var counts: Vector(Node);
}

The inner type is a member of the type, and is given the name StringCounts.Node. This case is called a member class since the type is a class, but other kinds of type declarations, like choice types, are allowed.

Let

Other constants, including types, can be defined using a let declaration:

class MyClass {
  let Pi:! f32 = 3.141592653589793;
  let IndexType:! type = i32;
}

The :! indicates that this is defining a compile-time constant, and so does not affect the storage of instances of that class.
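C++ expresses the same idea with static constexpr data members and member type aliases. In this minimal C++ analogue, `std::is_empty` confirms that neither constant adds storage to instances of the class:

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// C++ analogue: static constexpr data members and member type aliases are
// compile-time constants of the class, not per-instance storage.
struct MyClass {
  static constexpr float Pi = 3.141592653589793f;
  using IndexType = std::int32_t;
};

// `std::is_empty` is true because MyClass has no non-static data members.
static_assert(std::is_empty<MyClass>::value, "constants add no instance storage");
static_assert(std::is_same<MyClass::IndexType, std::int32_t>::value,
              "IndexType is an alias for std::int32_t");
```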

Alias

You may declare aliases of the names of class members. This allows members to be renamed in multiple steps, or to have alternate names.

class StringPair {
  var key: String;
  var value: String;
  alias first = key;
  alias second = value;
}

var sp1: StringPair = {.key = "K", .value = "1"};
var sp2: StringPair = {.first = "K", .second = "2"};
Assert(sp1.first == sp2.key);
Assert(&sp1.first == &sp1.key);

Future work: This needs to be connected to the broader design of aliases, once that lands.

Inheritance

Carbon supports inheritance using a class hierarchy, on an opt-in basis. Classes by default are final, which means they may not be extended. To declare a class as allowing extension, use either the base class or abstract class introducer:

base class MyBaseClass { ... }

A base class may be extended to get a derived class:

base class MiddleDerived {
  extend base: MyBaseClass;
  ...
}
class FinalDerived {
  extend base: MiddleDerived;
  ...
}
// ❌ Forbidden: class Illegal { extend base: FinalDerived; ... }
// may not extend `FinalDerived` since not declared `base` or `abstract`.

An abstract class or abstract base class is a base class that may not be instantiated.

abstract class MyAbstractClass { ... }
// ❌ Forbidden: var a: MyAbstractClass = ...;
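C++ reverses the default: classes are extensible unless marked final, and a class is abstract only when it has a pure virtual function, not because of a keyword on the class. Below is a C++ analogue of the declarations above, with names reused for illustration, using the standard `std::is_final` and `std::is_abstract` traits to check these properties:

```cpp
#include <cassert>
#include <type_traits>

// C++ classes are extensible by default, and `final` opts out; Carbon makes
// final the default and uses `base` or `abstract` to opt in.
class MyBaseClass {
 public:
  virtual ~MyBaseClass() = default;
};

class MiddleDerived : public MyBaseClass {};

class FinalDerived final : public MiddleDerived {};
// Error if uncommented: a `final` class cannot be used as a base.
//   class Illegal : public FinalDerived {};

// In C++, a class is abstract because it has a pure virtual function, not
// because of a keyword on the class itself.
class MyAbstractClass {
 public:
  virtual ~MyAbstractClass() = default;
  virtual int Value() const = 0;
};
// Error if uncommented: cannot declare a variable of abstract type.
//   MyAbstractClass a;
```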

Future work: For now, the Carbon design only supports single inheritance. In the future, Carbon will support multiple inheritance with limitations on all base classes except the one listed first.

Terminology: We say MiddleDerived and FinalDerived are derived classes, transitively extending or derived from MyBaseClass. Similarly FinalDerived is derived from or extends MiddleDerived. MiddleDerived is FinalDerived's immediate base class, and both MiddleDerived and MyBaseClass are base classes of FinalDerived. Base classes that are not abstract are called extensible classes.

A derived class has all the members of the class it extends, including data members and methods, though it may not be able to access them if they were declared private.

Virtual methods

A base class may define virtual methods. These are methods whose implementation may be overridden in a derived class.

Only methods defined in the scope of the class definition may be virtual, not any defined in out-of-line interface impl declarations. Interface methods may be implemented using virtual methods when the impl is inline, and calls to those methods by way of the interface will do virtual dispatch just like a direct call to the method does.

Class functions may not be declared virtual.

Virtual modifier keywords

A method is declared as virtual by using a virtual modifier keyword in its declaration before fn.

base class MyBaseClass {
  virtual fn Overridable[self: Self]() -> i32 { return 7; }
}

This matches C++, and makes it relatively easy for authors of derived classes to find the functions that can be overridden.

If no keyword is specified, the default for methods is that they are non-virtual. This means:

  • they can't override methods in bases of this class;
  • they can't be overridden in derived classes; and
  • they have an implementation in the current class, and that implementation must work for all derived classes.

There are three virtual modifier keywords:

  • virtual - This marks a method as not present in bases of this class and having an implementation in this class. That implementation may be overridden in derived classes.
  • abstract - This marks a method that must be overridden in a derived class since it has no implementation in this class. This is short for "abstract virtual" but is called "pure virtual" in C++. Only abstract classes may have unimplemented abstract methods.
  • impl - This marks a method that overrides a method marked virtual or abstract in the base class with an implementation specific to -- and defined within -- this class. The method is still virtual and may be overridden again in subsequent derived classes if this is a base class. See method overriding in Wikipedia. Requiring a keyword when overriding allows the compiler to diagnose when the derived class accidentally uses the wrong signature or spelling and so doesn't match the base class. We intentionally use the same keyword here as for implementing interfaces, to emphasize that they are similar operations.

| Keyword on method in C | Allowed in abstract class C | Allowed in base class C | Allowed in final class C | In B, where C extends B | In D, where D extends C |
| --- | --- | --- | --- | --- | --- |
| virtual | ✅ | ✅ | ❌ | not present | abstract, impl, or not mentioned |
| abstract | ✅ | ❌ | ❌ | not present, virtual, abstract, or impl | abstract or impl; may not be mentioned if D is not final |
| impl | ✅ | ✅ | ✅ | virtual, abstract, or impl | abstract or impl |

Since validating a method with a virtual modifier keyword involves looking for methods with the same name in the base class, virtual methods must be declared after the extend base declaration when present in a class definition. This simplifies the compiler, and follows the information accumulation principle.
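For readers coming from C++, a rough mapping is: Carbon virtual corresponds to C++ virtual, abstract to a pure virtual function (= 0), and impl to a function marked override. The C++ sketch below is illustrative only; like impl, the override specifier makes the compiler look for a matching virtual function in a base class and reject a mismatched signature.

```cpp
#include <cassert>

class MyBaseClass {
 public:
  virtual ~MyBaseClass() = default;
  // Like Carbon `virtual`: has an implementation that may be overridden.
  virtual int Overridable() const { return 7; }
  // Like Carbon `abstract` (C++ "pure virtual"): no implementation here, so
  // this C++ class is abstract.
  virtual int MustOverride() const = 0;
};

class MyDerivedClass final : public MyBaseClass {
 public:
  // Like Carbon `impl`: `override` asks the compiler to verify that a
  // matching virtual function exists in a base class.
  int Overridable() const override { return 8; }
  int MustOverride() const override { return 9; }
  // Error if uncommented: missing `const` means nothing is overridden.
  //   int Overridable() override { return 8; }
};
```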

Subtyping

A pointer to a base class, like MyBaseClass*, is actually considered to be a pointer to that type or any derived class, like MiddleDerived or FinalDerived. This means that a FinalDerived* value may be implicitly converted to type MiddleDerived* or MyBaseClass*.

This is accomplished by making the data layout of a type extending MyBaseClass have MyBaseClass as a prefix. In addition, the first class in the inheritance chain with a virtual method will include a virtual pointer, or vptr, pointing to a virtual method table, or vtable. Any calls to virtual methods will perform dynamic dispatch by calling the method using the function pointer in the vtable, to get the overridden implementation from the most derived class that implements the method.

This data layout is reflected in the order of declarations in a class definition. An extend base declaration, when present in a class definition, must appear before any other declarations adding data to the class instances, such as instance variables.
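C++ single inheritance behaves the same way for these conversions and calls. In this C++ analogue (class names reused from above for illustration only), a FinalDerived* converts implicitly to MiddleDerived* and MyBaseClass*, and a virtual call through either pointer reaches the most-derived override:

```cpp
#include <cassert>

class MyBaseClass {
 public:
  virtual ~MyBaseClass() = default;
  virtual int Which() const { return 0; }
};

class MiddleDerived : public MyBaseClass {
 public:
  int Which() const override { return 1; }
};

class FinalDerived final : public MiddleDerived {
 public:
  int Which() const override { return 2; }
};

// Accepts a pointer to the base class, which may point at any derived object.
int CallThroughBase(const MyBaseClass* p) {
  // Dynamic dispatch through the vtable picks the most-derived override.
  return p->Which();
}
```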

Since a final class may not be extended, the compiler can bypass the vtable and use static dispatch. In general, you can use a combination of an abstract base class and a final class instead of an extensible class if you need to distinguish between "exactly a type" and "possibly a subtype."

base class Extensible { ... }

// Can be replaced by:

abstract class ExtensibleBase { ... }
class ExactlyExtensible {
  extend base: ExtensibleBase;
  ...
}

Self refers to the current type

Note that Self in a class definition means "the current type being defined" not "the type implementing this method." To implement a method in a derived class that uses Self in the declaration in the base class, only the type of self should change:

base class B1 {
  virtual fn F[self: Self](x: Self) -> Self;
  // Means exactly the same thing as:
  //   virtual fn F[self: B1](x: B1) -> B1;
}

class D1 {
  extend base: B1;
  // ❌ Illegal:
  //   impl fn F[self: Self](x: Self) -> Self;
  // since that would mean the same thing as:
  //   impl fn F[self: Self](x: D1) -> D1;
  // and `D1` is a different type than `B1`.

  // ✅ Allowed: Parameter and return types
  //  of `F` match declaration in `B1`.
  impl fn F[self: Self](x: B1) -> B1;
  // Or: impl fn F[self: D1](x: B1) -> B1;
}

The exception is when there is a subtyping relationship such that it would be legal for a caller using the base class's signature to actually be calling the derived implementation, as in:

base class B2 {
  virtual fn Clone[self: Self]() -> Self*;
  // Means exactly the same thing as:
  //   virtual fn Clone[self: B2]() -> B2*;
}

class D2 {
  extend base: B2;
  // ✅ Allowed
  impl fn Clone[self: Self]() -> Self*;
  // Means the same thing as:
  //   impl fn Clone[self: D2]() -> D2*;
  // which is allowed since `D2*` is a
  // subtype of `B2*`.
}
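This is the same rule as C++ covariant return types. In the following C++ analogue of B2 and D2, D2::Clone may return D2* because D2* converts to B2*, so a caller using the base signature still receives a valid B2*:

```cpp
#include <cassert>

class B2 {
 public:
  virtual ~B2() = default;
  virtual B2* Clone() const { return new B2(*this); }
};

class D2 final : public B2 {
 public:
  // Covariant return type: `D2*` is allowed because it converts to `B2*`.
  D2* Clone() const override { return new D2(*this); }
  int extra = 42;
};
```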

Constructors

As with classes without inheritance, constructors for a derived class are ordinary functions that return an instance of the derived class. Generally constructor functions should return the constructed value without copying, as in proposal #257: Initialization of memory and variables. This means either creating the object in the return statement itself, or in a returned var declaration. As before, instances can be created by casting a struct value into the class type, this time with a .base member to initialize the members of the immediate base type.

class MyDerivedType {
  extend base: MyBaseType;
  fn Make() -> MyDerivedType {
    return {.base = MyBaseType.Make(), .derived_field = ...};
  }
}
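Since C++17, a C++ aggregate may have public base classes, and aggregate initialization uses the first initializer for the base subobject, much like .base in the Carbon struct literal. A sketch under that assumption (C++17 or later; the field names and values are illustrative):

```cpp
#include <cassert>

// Requires C++17, where aggregates may have public base classes.
struct MyBaseType {
  int base_field;
  static MyBaseType Make() { return MyBaseType{1}; }
};

struct MyDerivedType : MyBaseType {
  int derived_field;
  static MyDerivedType Make() {
    // The first initializer initializes the base subobject, like `.base`.
    // Returning the prvalue directly constructs the MyDerivedType object in
    // the caller without a copy (guaranteed copy elision in C++17).
    return MyDerivedType{MyBaseType::Make(), 2};
  }
};
```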

There are two cases that aren't well supported with this pattern:

  • Users cannot create a value of an abstract class, which is necessary when it has private fields or otherwise requires initialization.
  • Users may want to reduce the chance of mistakes from calling a method on a partially constructed object. Of particular concern is calling a virtual method before the derived class has been formed, which would then run the base class implementation.

Although these cases are expected to be relatively rare, we will address both of these concerns with a specialized type used only during construction of base classes, called the partial class type for the class.
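The second concern describes today's C++ behavior: while a base class constructor runs, the object's dynamic type is the base class, so a virtual call made there does not reach the derived override. A small C++ demonstration with illustrative names:

```cpp
#include <cassert>
#include <string>

class Base {
 public:
  // During Base's constructor, virtual calls resolve to Base's overrides.
  Base() { constructed_as = Name(); }
  virtual ~Base() = default;
  virtual std::string Name() const { return "Base"; }
  std::string constructed_as;
};

class Derived : public Base {
 public:
  std::string Name() const override { return "Derived"; }
};
```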

Partial class type

The partial class type for a base class type like MyBaseType is written partial MyBaseType.

  • Only methods that take the partial class type may be called on the partial class type, so methods have to opt in to being called on an object that isn't fully constructed.
  • No virtual methods may take the partial class type, so there is no way to transitively call a virtual method on an object that isn't fully constructed.
  • partial MyBaseClass and MyBaseClass have the same fields in the same order with the same data layout. The only difference is that partial MyBaseClass doesn't use (look into) its hidden vptr slot. To reliably catch any bugs where virtual function calls occur in this state, both fast and hardened release builds will initialize the hidden vptr slot to a null pointer. Debug builds will initialize it to an alternate vtable whose functions will abort the program with a clear diagnostic.
  • Since partial MyBaseClass has the same data layout but only uses a subset, there is a subtyping relationship between these types. A MyBaseClass value is a partial MyBaseClass value, but not the other way around. So you can cast MyBaseClass* to partial MyBaseClass*, but the other direction is not safe.
  • When MyBaseClass may be instantiated, there is a conversion from partial MyBaseClass to MyBaseClass. It changes the value by filling in the hidden vptr slot. If MyBaseClass is abstract, then attempting that conversion is an error.
  • partial MyBaseClass is considered final, even if MyBaseClass is not. This is despite the fact that from a data layout perspective, partial MyDerivedClass will have partial MyBaseClass as a prefix if MyDerivedClass extends MyBaseClass. The type partial MyBaseClass specifically means "exactly this and no more." This means we don't need to look at the hidden vptr slot, and we can instantiate it even if it doesn't have a virtual destructor.
  • The keyword partial may only be applied to a base class. For final classes, there is no need for a second type.
Usage

The general pattern is that base classes can define constructors returning the partial class type.

base class MyBaseClass {
  fn Make() -> partial Self {
    return {.base_field_1 = ..., .base_field_2 = ...};
  }
  // ...
}

Extensible classes can be instantiated even from a partial class type value:

var mbc: MyBaseClass = MyBaseClass.Make();

The conversion from partial MyBaseClass to MyBaseClass only fills in the vptr value and can be done in place. After the conversion, all public methods may be called, including virtual methods.

The partial class type is required for abstract classes, since otherwise they may not be instantiated. Constructor functions for abstract classes should be marked protected so they may only be accessed in derived classes.

abstract class MyAbstractClass {
  protected fn Make() -> partial Self {
    return {.base_field_1 = ..., .base_field_2 = ...};
  }
  // ...
}
// ❌ Error: can't instantiate abstract class
var abc: MyAbstractClass = ...;

If a base class wants to store a pointer to itself somewhere in the constructor function, there are two choices:

  • An extensible class could use the plain type instead of the partial class type.

    base class MyBaseClass {
      fn Make() -> Self {
        returned var result: Self = {...};
        StoreMyPointerSomewhere(&result);
        return var;
      }
    }
    
  • The other choice is to explicitly cast the type of its address. This pointer should not be used to call any virtual method until the object is finished being constructed, since the vptr will not yet point to the final vtable (it is null in release builds).

    abstract class MyAbstractClass {
      protected fn Make() -> partial Self {
        returned var result: partial Self = {...};
        // Careful! Pointer to object that isn't fully constructed!
        StoreMyPointerSomewhere(&result as Self*);
        return var;
      }
    }
    

The constructor for a derived class may construct values from a partial class type of the class' immediate base type or the full type:

abstract class MyAbstractClass {
  protected fn Make() -> partial Self { ... }
}

// Base class returns a partial type
base class Derived {
  extend base: MyAbstractClass;
  protected fn Make() -> partial Self {
    return {.base = MyAbstractClass.Make(), .derived_field = ...};
  }
  ...
}

base class MyBaseClass {
  fn Make() -> Self { ... }
}

// Base class returns a full type
base class ExtensibleDerived {
  extend base: MyBaseClass;
  fn Make() -> Self {
    return {.base = MyBaseClass.Make(), .derived_field = ...};
  }
  ...
}

And final classes will return a type that does not use the partial class type:

class FinalDerived {
  extend base: MiddleDerived;
  fn Make() -> Self {
    return {.base = MiddleDerived.Make(), .derived_field = ...};
  }
  ...
}

Observe that the vptr is only assigned twice in release builds if you use partial class types:

  • The first class value created, by the factory function creating the base of the class hierarchy, initializes the vptr field to nullptr. Every derived type transitively created from that value leaves it alone.
  • Only when the value has its most-derived class and is converted from the partial class type to its final type is the vptr field set to its final value.
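To make this bookkeeping concrete, here is a hypothetical C++ model of the layout with a hand-written vtable. The names VTable, BaseLayout, DerivedLayout, MakePartialBase, MakePartialDerived, and ConvertToFull are assumptions made for this sketch, not Carbon APIs, and vptr_writes only counts the explicit stores to the vptr slot.

```cpp
#include <cassert>

// Hypothetical model: a vtable with a single virtual method slot.
struct VTable {
  int (*overridable)(const void* self);
};

// Hypothetical layouts: the base is a prefix of the derived layout.
struct BaseLayout {
  const VTable* vptr;  // hidden vptr slot
  int base_field;
};
struct DerivedLayout {
  BaseLayout base;
  int derived_field;
};

int vptr_writes = 0;  // counts explicit stores to the vptr slot

// Like a factory returning `partial MyBaseClass`: the vptr starts out null.
BaseLayout MakePartialBase(int base_field) {
  ++vptr_writes;
  return BaseLayout{nullptr, base_field};
}

// Like a factory returning `partial MyDerivedClass`: wraps the base value
// and leaves its vptr alone.
DerivedLayout MakePartialDerived(int base_field, int derived_field) {
  return DerivedLayout{MakePartialBase(base_field), derived_field};
}

int DerivedOverridable(const void* self) {
  return static_cast<const DerivedLayout*>(self)->derived_field;
}
const VTable kDerivedVTable = {&DerivedOverridable};

// Like converting `partial MyDerivedClass` to `MyDerivedClass`: fills in the
// vptr in place, without copying any fields.
void ConvertToFull(DerivedLayout& obj) {
  obj.base.vptr = &kDerivedVTable;
  ++vptr_writes;
}
```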

In the case that the base class can be instantiated, tooling could optionally recommend that functions returning Self that are used to initialize a derived class be changed to return partial Self instead. However, the consequences of returning Self instead of partial Self when the value will be used to initialize a derived class are fairly minor:

  • The vptr field will be assigned more than necessary.
  • The types won't protect against calling methods on a value while it is being constructed, much like the situation in C++ currently.

Assignment with inheritance

Since the assignment operator method should not be virtual, it is only safe to implement it for final types. However, following the maxim that Carbon should "focus on encouraging appropriate usage of features rather than restricting misuse", we allow users to also implement assignment on extensible classes, even though it can lead to slicing.
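A C++ sketch of the slicing hazard, with illustrative names: assigning through a base reference uses the base class's non-virtual copy assignment, so only the base part of the target is overwritten.

```cpp
#include <cassert>

class Base {
 public:
  virtual ~Base() = default;
  int base_field = 0;
};

class Derived : public Base {
 public:
  int derived_field = 0;
};

// Uses Base's (non-virtual) copy assignment, so only the Base part of
// `dest` is overwritten, even when both objects are actually Derived.
void AssignThroughBase(Base& dest, const Base& src) { dest = src; }
```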

Destructors

Every non-abstract type is destructible, meaning it has a defined destructor function that is called when the lifetime of a value of that type ends, such as when a variable goes out of scope. The destructor for a class may be customized using the destructor keyword:

class MyClass {
  destructor [self: Self] { ... }
}

or:

class MyClass {
  // Can modify `self` in the body.
  destructor [addr self: Self*] { ... }
}

If a class has no destructor declaration, it gets the default destructor, which is equivalent to destructor [self: Self] { }.

The destructor for a class is run before the destructors of its data members. The data members are destroyed in reverse order of declaration. Derived classes are destroyed before their base classes, so the order of operations is:

  • derived class' destructor runs,
  • the data members of the derived class are destroyed, in reverse order of declaration,
  • the immediate base class' destructor runs,
  • the data members of the immediate base class are destroyed, in reverse order of declaration,
  • and so on.
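C++ destroys objects in the same order. A C++ sketch with illustrative names that records each step in a string:

```cpp
#include <cassert>
#include <string>

std::string destruction_log;  // records the order in which destructors run

struct Member {
  explicit Member(const char* n) : name(n) {}
  ~Member() {
    destruction_log += name;
    destruction_log += ';';
  }
  const char* name;
};

struct Base {
  Member base_a{"base_a"};
  Member base_b{"base_b"};
  ~Base() { destruction_log += "~Base;"; }
};

struct Derived : Base {
  Member derived_a{"derived_a"};
  Member derived_b{"derived_b"};
  ~Derived() { destruction_log += "~Derived;"; }
};

void DestroyOne() {
  Derived d;
}  // `d` is destroyed here
```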

Destructors may be declared in class scope and then defined out-of-line:

class MyClass {
  destructor [addr self: Self*];
}
destructor MyClass [addr self: Self*] { ... }

It is illegal to delete an instance of a derived class through a pointer to one of its base classes unless it has a virtual destructor. An abstract or base class' destructor may be declared virtual using the virtual introducer, in which case any derived class destructor declaration must be impl:

base class MyBaseClass {
  virtual destructor [addr self: Self*] { ... }
}

class MyDerivedClass {
  extend base: MyBaseClass;
  impl destructor [addr self: Self*] { ... }
}
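This mirrors C++, where deleting a derived object through a base pointer is undefined behavior unless the base destructor is virtual. In this C++ analogue of the example above, the derived class's destructor runs when the object is deleted through a MyBaseClass*:

```cpp
#include <cassert>

bool derived_destroyed = false;

class MyBaseClass {
 public:
  virtual ~MyBaseClass() = default;  // like Carbon's `virtual destructor`
};

class MyDerivedClass : public MyBaseClass {
 public:
  // Like Carbon's `impl destructor`.
  ~MyDerivedClass() override { derived_destroyed = true; }
};
```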

The properties of a type (whether it is abstract, base, or final, and whether its destructor is virtual or non-virtual) determine which facet types it satisfies.

  • Non-abstract classes are Concrete. This means you can create local and member variables of this type. Concrete types have destructors that are called when the local variable goes out of scope or the containing object of the member variable is destroyed.
  • Final classes and classes with a virtual destructor are Deletable. These may be safely deleted through a pointer.
  • Classes that are Concrete, Deletable, or both are Destructible. These are types that may be deleted through a pointer, but it might not be safe. The concerning situation is when you have a pointer to a base class without a virtual destructor. It is unsafe to delete that pointer when it is actually pointing to a derived class.

Note: The names Deletable and Destructible are placeholders since they do not conform to the decision on question-for-leads issue #1058: "How should interfaces for core functionality be named?".

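| Class | Destructor | Concrete | Deletable | Destructible |
| --- | --- | --- | --- | --- |
| abstract | non-virtual | no | no | no |
| abstract | virtual | no | yes | yes |
| base | non-virtual | yes | no | yes |
| base | virtual | yes | yes | yes |
| final | any | yes | yes | yes |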

The compiler automatically determines which of these facet types a given type satisfies. It is illegal to directly implement Concrete, Deletable, or Destructible. For more about these constraints, see "destructor constraints" in the detailed generics design.
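The table of facet types reduces to three small predicates. As a sketch, they can be written with the standard C++ type traits `std::is_abstract`, `std::is_final`, and `std::has_virtual_destructor`; the helper names IsConcrete, IsDeletable, and IsDestructible are hypothetical, and C++'s notion of abstractness (having a pure virtual function) only approximates Carbon's abstract class.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical predicates mirroring the Concrete/Deletable/Destructible rules.
template <typename T>
constexpr bool IsConcrete() { return !std::is_abstract<T>::value; }

template <typename T>
constexpr bool IsDeletable() {
  return std::is_final<T>::value || std::has_virtual_destructor<T>::value;
}

template <typename T>
constexpr bool IsDestructible() { return IsConcrete<T>() || IsDeletable<T>(); }

// One C++ class per row of the table.
struct AbstractNonVirtual {
  virtual void F() = 0;
  ~AbstractNonVirtual() = default;  // not virtual
};
struct AbstractVirtual {
  virtual void F() = 0;
  virtual ~AbstractVirtual() = default;
};
struct BaseNonVirtual {
  ~BaseNonVirtual() = default;
};
struct BaseVirtual {
  virtual ~BaseVirtual() = default;
};
struct Final final {
  ~Final() = default;
};
```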

A pointer to a Deletable type may be passed to the Delete method of the Allocator interface. To deallocate a pointer to a base class without a virtual destructor, which may only be done when it is not actually pointing to a value with a derived type, call the UnsafeDelete method instead. Note that you may not call UnsafeDelete on abstract types without virtual destructors, since it requires Destructible.

interface Allocator {
  // ...
  fn Delete[T:! Deletable, addr self: Self*](p: T*);
  fn UnsafeDelete[T:! Destructible, addr self: Self*](p: T*);
}

To pass a pointer to a base class without a virtual destructor to a checked-generic function expecting a Deletable type, use the UnsafeAllowDelete type adapter.

class UnsafeAllowDelete(T:! Concrete) {
  extend adapt T;
  impl as Deletable {}
}

// Example usage:
fn RequiresDeletable[T:! Deletable](p: T*);
var x: MyExtensible;
RequiresDeletable(&x as UnsafeAllowDelete(MyExtensible)*);

If a virtual method is transitively called from inside a destructor, the implementation from the current class is used, not any overrides from derived classes. It will abort the execution of the program if that method is abstract and not implemented in the current class.
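C++ behaves the same way for overridden methods: once the derived destructor has finished, virtual calls resolve to the class whose destructor is running. Unlike the Carbon rule above, calling a pure virtual function this way is undefined behavior in C++ rather than a guaranteed abort. A C++ demonstration with illustrative names:

```cpp
#include <cassert>
#include <string>

std::string called_during_destruction;  // records which override ran

class Base {
 public:
  virtual ~Base() { called_during_destruction = Name(); }
  virtual std::string Name() const { return "Base"; }
};

class Derived : public Base {
 public:
  std::string Name() const override { return "Derived"; }
};

void DestroyDerived() {
  Derived d;
}  // Derived's destructor runs, then Base's, which calls Name()
```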

Future work: Allow or require destructors to be declared as taking partial Self in order to prove no use of virtual methods.

Types satisfy the TrivialDestructor facet type if:

  • the class declaration does not define a destructor or the class defines the destructor with an empty body { },
  • all data members implement TrivialDestructor, and
  • all base classes implement TrivialDestructor.

For example, a struct type implements TrivialDestructor if all its members do.

TrivialDestructor implies that the type's destructor does nothing, which may be used to generate optimized specializations.
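C++'s `std::is_trivially_destructible` applies a similar recursive rule over members and bases, with one notable difference: in C++ a user-provided destructor with an empty body {} is not trivial; only an implicit or = default destructor is. A short C++ comparison:

```cpp
#include <cassert>
#include <string>
#include <type_traits>

struct NoDestructor { int x; };
struct DefaultedDestructor {
  int x;
  ~DefaultedDestructor() = default;  // still trivial in C++
};
struct EmptyBodyDestructor {
  int x;
  ~EmptyBodyDestructor() {}  // user-provided, so not trivial in C++
};
struct HasStringMember {
  std::string s;  // a member with a non-trivial destructor
};
```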

There is no provision for handling failure in a destructor. All operations that could potentially fail must be performed before the destructor is called. Unhandled failure during a destructor call will abort the program.

Future work: Allow or require destructors to be declared as taking [var self: Self].

Alternatives considered:

Access control

By default, all members of a class are fully publicly accessible. Access can be restricted by adding a keyword, called an access modifier, prior to the declaration. Access modifiers are how Carbon supports encapsulation.

The access modifier is written before any virtual modifier keyword.

Rationale: Carbon makes members public by default for a few reasons:

  • The readability of public members is the most important, since we expect most readers to be concerned with the public API of a type.
  • The members that are most commonly private are the data fields, which have relatively less complicated definitions that suffer less from the extra annotation.

Additionally, there is precedent for this approach in modern object-oriented languages such as Kotlin and Python, both of which are well regarded for their usability.

To reduce context sensitivity, keywords controlling visibility are attached to individual declarations, instead of following C++'s approach of labels that control the visibility of all following declarations. This matches Rust, Swift, Java, C#, Kotlin, and D.

References: Proposal #561: Basic classes included the decision that members default to publicly accessible, a question originally asked in issue #665.

Private access

As in C++, private means only accessible to members of the class and any friends.

class Point {
  fn Distance[self: Self]() -> f32;
  // These are only accessible to members of `Point`.
  private var x: f32;
  private var y: f32;
}

A private virtual or private abstract method may be implemented in derived classes, even though it may not be called. This allows derived classes to customize the behavior of a function called by a method of the base class, while still preventing the derived class from calling it. This matches the behavior of C++ and is more orthogonal.
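In C++ this pattern is often called the non-virtual interface idiom: a public non-virtual function calls a private virtual hook that derived classes override but cannot call. A C++ sketch with illustrative names:

```cpp
#include <cassert>

class Shape {
 public:
  virtual ~Shape() = default;
  // Public, non-virtual entry point that calls the private hook.
  int DescribedArea() const { return 10 * Area(); }

 private:
  // Derived classes may override this, but cannot call Shape's version.
  virtual int Area() const = 0;
};

class Square final : public Shape {
 public:
  explicit Square(int side) : side_(side) {}

 private:
  // Overrides a private virtual function of the base class.
  int Area() const override { return side_ * side_; }
  int side_;
};
```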

Future work: private will give the member internal linkage unless it needs to be external because it is used in an inline method or template. We may in the future add a way to specify internal linkage explicitly.

Open questions: Using private to mean "restricted to this class" matches C++. Other languages support restricting to different scopes:

  • Swift supports "restrict to this module" and "restrict to this file".
  • Rust supports "restrict to this module and any children of this module", as well as "restrict to this crate", "restrict to parent module", and "restrict to a specific ancestor module".

Comparison to other languages: C++, Rust, and Swift all make class members private by default. C++ offers the struct keyword that makes members public by default.

Protected access

Protected members may only be accessed by members of this class, members of derived classes, and any friends.

base class MyBaseClass {
  protected fn HelperClassFunction(x: i32) -> i32;
  protected fn HelperMethod[self: Self](x: i32) -> i32;
  protected var data: i32;
}

class MyDerivedClass {
  extend base: MyBaseClass;
  fn UsesProtected[addr self: Self*]() {
    // Can access protected members in derived class
    var x: i32 = HelperClassFunction(3);
    self->data = self->HelperMethod(x);
  }
}

Friends

Classes may have a friend declaration:

class Buddy { ... }

class Pal {
  private var x: i32;
  friend Buddy;
}

This declares Buddy to be a friend of Pal, which means that Buddy can access all members of this class, even the ones that are declared private or protected.

The friend keyword is followed by the name of an existing function, type, or parameterized family of types. Unlike C++, it won't act as a forward declaration of that name. The name must be resolvable by the compiler, and so may not be a member of a template.

Test friendship

Future work: There should be a convenient way of allowing tests in the same library as the class definition to access private members of the class. Ideally this could be done without changing the class definition itself, since it doesn't affect the class' public API.

Access control for construction

A function may construct a class, by casting a struct value to the class type, if it has access to (write) all of its fields.

Future work: There should be a way to limit which code can construct a class even when it only has public fields. This will be resolved in question-for-leads issue #803.

Operator overloading

Developers may define how standard Carbon operators, such as + and /, apply to custom types by implementing the interface that corresponds to that operator for the types of the operands. See the "operator overloading" section of the generics design. The specific interface used for a given operator may be found in the expressions design.

Future work

This includes features that need to be designed, questions to answer, and a description of the provisional syntax in use until these decisions have been made.

Struct literal shortcut

We could allow you to write {x, y} as shorthand for {.x = x, .y = y}.

Optional named parameters

Structs are being considered as a possible mechanism for implementing optional named parameters. We have three main candidate approaches: allowing struct types to have field defaults, having dedicated support for destructuring struct values in pattern contexts, or having a dedicated optional named parameter syntax.

Field defaults for struct types

If struct types could have field defaults, you could write a function declaration with all of the optional parameters in an option struct:

fn SortIntVector(
    v: Vector(i32)*,
    options: {.stable: bool = false,
              .descending: bool = false} = {}) {
  // Code using `options.stable` and `options.descending`.
}

// Uses defaults of `.stable` and `.descending` equal to `false`.
SortIntVector(&v);
SortIntVector(&v, {});
// Sets `.stable` option to `true`.
SortIntVector(&v, {.stable = true});
// Sets `.descending` option to `true`.
SortIntVector(&v, {.descending = true});
// Sets both `.stable` and `.descending` options to `true`.
SortIntVector(&v, {.stable = true, .descending = true});
// Order can be different for arguments as well.
SortIntVector(&v, {.descending = true, .stable = true});
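For comparison, C++20 designated initializers combined with default member initializers give a similar call-site syntax. This sketch assumes C++20 (some compilers accept it as an extension in earlier modes), and SortOptions is an illustrative name. One difference from the Carbon example: C++ requires designators to appear in declaration order, so {.descending = true, .stable = true} would be rejected.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

// Option struct whose default member initializers play the role of
// struct field defaults.
struct SortOptions {
  bool stable = false;
  bool descending = false;
};

void SortIntVector(std::vector<int>* v, SortOptions options = {}) {
  if (options.descending) {
    if (options.stable) {
      std::stable_sort(v->begin(), v->end(), std::greater<int>());
    } else {
      std::sort(v->begin(), v->end(), std::greater<int>());
    }
  } else {
    if (options.stable) {
      std::stable_sort(v->begin(), v->end());
    } else {
      std::sort(v->begin(), v->end());
    }
  }
}
```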

Destructuring in pattern matching

We might instead support destructuring struct patterns with defaults:

fn SortIntVector(
    v: Vector(i32)*,
    {stable: bool = false, descending: bool = false}) {
  // Code using `stable` and `descending`.
}

This would allow the same syntax at the call site, but avoids some concerns with field defaults and allows some other use cases such as destructuring return values.

Discussion

We might support destructuring directly:

var {key: String, value: i32} = ReturnKeyValue();

or by way of a mechanism that converts a struct into a tuple:

var (key: String, value: i32) =
    ReturnKeyValue().extract(.key, .value);
// or maybe:
var (key: String, value: i32) =
    ReturnKeyValue()[(.key, .value)];

Similarly we might support optional named parameters directly instead of by way of struct types.

Some discussion on this topic has occurred in:

Inheritance

C++ abstract base classes interoperating with object-safe interfaces

We want four things so that Carbon's object-safe interfaces may interoperate with C++ abstract base classes without data members, matching the interface as base class use case:

  • Ability to convert an object-safe interface (a facet type) into a C++-compatible base class (a base type), maybe using AsBaseClass(MyInterface).
  • Ability to convert a C++ base class without data members (a base type) into an object-safe interface (a facet type), maybe using AsInterface(MyIBC).
  • Ability to convert a (thin) pointer to an abstract base class to a DynPtr of the corresponding interface.
  • Ability to convert DynPtr(MyInterface) values to a proxy type that extends the corresponding base class AsBaseType(MyInterface).

Note that the proxy type extending AsBaseType(MyInterface) would be a different type than DynPtr(MyInterface) since the receiver input to the function members of the vtable for the former does not match those in the witness table for the latter.

Overloaded methods

We allow a derived class to define a class function with the same name as a class function in the base class. For example, we expect it to be pretty common to have a constructor function named Create at all levels of the type hierarchy.

Beyond that, we may want some rules or restrictions about defining methods in a derived class with the same name as a base class method without overriding it. There are some opportunities to improve on and simplify the C++ story:

  • We don't want to silently hide methods in the base class because of a method with the same name in a derived class. There are uses for this in C++, but it also causes problems and without multiple inheritance there isn't the same need in Carbon.
  • Overload resolution should happen before virtual dispatch.
  • For evolution purposes, you should be able to add private members to a base class that have the same name as a member of a derived class without affecting overload resolution on instances of the derived class, in functions that aren't friends of the base class.
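The first point contrasts with C++, where declaring any member named F in a derived class hides every base class overload of F unless they are re-exposed with a using-declaration. A C++ illustration with illustrative names:

```cpp
#include <cassert>

struct Base {
  int F(int) { return 1; }
  int F(double) { return 2; }
};

struct HidingDerived : Base {
  // Declaring any `F` here hides *all* `Base::F` overloads from member
  // lookup on `HidingDerived`.
  int F(const char*) { return 3; }
};

struct UnhidingDerived : Base {
  using Base::F;  // brings the base overloads back into overload resolution
  int F(const char*) { return 3; }
};
```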

References: This was discussed in the open discussion on 2021-07-12.

Interop with C++ inheritance

This design directly supports Carbon classes inheriting from a single C++ class.

class CarbonClass {
  extend base: Cpp.CPlusPlusClass;
  fn Make() -> Self {
    return {.base = Cpp.CPlusPlusClass(...), .other_fields = ...};
  }
  ...
}

To allow C++ classes to extend Carbon classes, there needs to be some way for C++ constructors to initialize their base class:

  • There could be some way to export a Carbon class that identifies which factory functions may be used as constructors.

  • We could explicitly call the Carbon factory function, as in:

    // `Base` is a Carbon class which gets converted to a
    // C++ class for interop purposes:
    class Base {
    public:
        virtual ~Base() {}
        static auto Make() -> Base;
    };
    
    // In C++
    class Derived : public Base {
    public:
        virtual ~Derived() override {}
        // This isn't currently a case where C++ guarantees no copy,
        // and so it currently still requires a notional copy and
        // there appear to be implementation challenges with
        // removing them. This may require an extension to make work
        // reliably without an extraneous copy of the base subobject.
        Derived() : Base(Base::Make()) {}
    };
    

    However, this doesn't work when Base can't be instantiated, or when Base has neither a copy nor a move constructor, even though that constructor shouldn't actually be called because of return value optimization (RVO).

Virtual base classes

TODO: Ask zygoloid to fill this in.

Carbon won't support declaring virtual base classes, and the C++ interop use cases Carbon needs to support are limited. This will allow us to simplify the C++ interop by allowing Carbon to delegate initialization of virtual base classes to the C++ side.

This requires that we enforce two rules:

  • No multiple inheritance of C++ classes with virtual bases
  • No C++ class extending a Carbon class that extends a C++ class with a virtual base

Mixins

We will need some way to declare mixins. This syntax will need a way to distinguish defining versus requiring member variables. Methods may additionally be given a default definition but may be overridden. Interface implementations may only be partially provided by a mixin. Mixin methods will need to be able to convert between pointers to the mixin type and the main type.

Open questions include whether a mixin is its own type that is a member of the containing type, and whether mixins are templated on the containing type. Mixins also complicate how constructors work.
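C++ has no dedicated mixin construct. The curiously recurring template pattern (CRTP) is a common way to approximate one, and it touches several of the open questions above. In this hypothetical sketch (`GreeterMixin`, `Person`, `Greet`, and `Name` are illustrative names), the mixin is templated on the containing type. It provides a default definition of `Greet()` and requires the containing type to supply `Name()`. It also converts from a pointer to the mixin to a pointer to the main type with `static_cast`.

```cpp
#include <string>
#include <utility>

// Hypothetical mixin, templated on the containing type `Self` (CRTP).
template <typename Self>
class GreeterMixin {
 public:
  // Default definition provided by the mixin. It depends on the containing
  // type providing `Name()`, which acts as a "required" member.
  std::string Greet() const {
    // Convert from a pointer to the mixin to a pointer to the main type.
    const Self* self = static_cast<const Self*>(this);
    return "Hello, " + self->Name();
  }
};

class Person : public GreeterMixin<Person> {
 public:
  explicit Person(std::string name) : name_(std::move(name)) {}

  // Satisfies the mixin's requirement.
  std::string Name() const { return name_; }

 private:
  std::string name_;
};
```

A `Person` could still define its own `Greet()`, which would hide the mixin's version. That is one C++ answer to "methods may be overridden", although it relies on name hiding rather than virtual overriding.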

Memory layout

Carbon will need some way for users to specify the memory layout of class types beyond simple ordering of fields, such as controlling the packing and alignment for the whole type or individual members.
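For reference, C++ already lets users control the alignment of a whole type or of an individual member with the standard `alignas` specifier. Packing, by contrast, is only available through compiler extensions such as `#pragma pack`, so it is left out here. A minimal sketch with hypothetical type names:

```cpp
#include <cstddef>
#include <cstdint>

// Whole-type alignment: every Counter starts on a 64-byte boundary, and its
// size is rounded up to a multiple of 64.
struct alignas(64) Counter {
  std::uint64_t value = 0;
};

// Per-member alignment: `payload` is pushed to a 16-byte boundary, leaving
// 15 bytes of padding after `tag`.
struct Message {
  std::uint8_t tag = 0;
  alignas(16) unsigned char payload[16] = {};
};
```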

We may allow a derived class to put data members in the final padding of its base class prefix. Tail-padding reuse has both advantages and disadvantages, so we may provide some way for a class to explicitly mark that its tail padding is available for use by a derived class.
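The following C++ sketch makes tail-padding reuse concrete. Class and function names are hypothetical. Under the Itanium C++ ABI, which GCC and Clang use on most non-Windows targets, a derived class may place its own members in the tail padding of a base class that is not "POD for the purpose of layout". MSVC does not do this. The code therefore measures where `Derived::d` ends up instead of assuming either layout.

```cpp
#include <cstddef>
#include <cstdint>

// Not "POD for the purpose of layout" because it has private data members
// and a user-provided constructor, so the Itanium ABI permits tail-padding
// reuse by derived classes.
class Base {
 public:
  Base() : i_(0), c_(0) {}
  std::int32_t i() const { return i_; }
  std::int8_t c() const { return c_; }

 private:
  std::int32_t i_;  // 4 bytes at offset 0
  std::int8_t c_;   // 1 byte at offset 4, then 3 bytes of tail padding
};

struct Derived : Base {
  std::int8_t d = 0;
};

// Byte offset of Derived::d from the start of a Derived object.
std::ptrdiff_t OffsetOfD() {
  Derived x;
  return reinterpret_cast<const char*>(&x.d) -
         reinterpret_cast<const char*>(&x);
}

// True when `d` lies within the first sizeof(Base) bytes, that is, inside
// Base's tail padding.
bool ReusesTailPadding() {
  return OffsetOfD() < static_cast<std::ptrdiff_t>(sizeof(Base));
}
```

On an Itanium-ABI target we would expect `d` at offset 5 and `sizeof(Derived) == 8`. Without reuse, `d` lands at offset 8 and `sizeof(Derived) == 12`. When `d` sits inside the first `sizeof(Base)` bytes, a `memcpy(p, q, sizeof(Base))` aimed at the base subobject would overwrite it.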

Advantages:

  • Tail-padding reuse is sometimes a nice layout optimization (for example, in Clang we save 8 bytes per Expr by reusing tail padding).
  • No class size regressions when migrating from C++.
  • The special case of reusing the tail padding of a class that is empty other than its tail padding is very important, to the extent that we will likely need to support either zero-sized types or tail-padding reuse in order to have acceptable class layouts.

Disadvantages:

  • Cannot use memcpy(p, q, sizeof(Base)) to copy around base class subobjects if the destination is an in-lifetime object, because its tail padding might overlap other objects' representations.
  • Somewhat more complex model.
  • We need some mechanism for disabling tail-padding reuse in "standard layout" types.
  • We may also have to use narrowed loads for the last member of a base class to avoid accidentally creating a race condition.

However, we can still use memcpy and memset to initialize a base class subobject, even if its tail padding might be reused, so long as we guarantee that no other object lives in the tail padding and is initialized before the base class. In C++, that happens only due to virtual base classes getting initialized early and laid out at the end of the object; if we disallow virtual base classes then we can guarantee that initialization order is address order, removing most of the downside of tail-padding reuse.

No static variables

At the moment, there is no proposal to support static member variables, in line with avoiding global variables more generally. Carbon may need some support in this area, though, for parity with and migration from C++.

Computed properties

Carbon might want to support members of a type that are accessed like a data member but return a computed value like a function. This has a number of implications:

  • It would be a way of publicly exposing data members for encapsulated types, allowing for rules that otherwise forbid mixing public and private data members.
  • It would provide a more graceful evolution path from a data class to an encapsulated type.
  • It would give an option to start with a data class instead of writing all the boilerplate to create an encapsulated type preemptively to allow future evolution.
  • It would let you take a variable away and put a property in its place with no other code changes. The number one use for this is so you can put a breakpoint in the property code, then later go back to a public variable once you understand who was misbehaving.
  • We should have some guidance for when to use a computed property instead of a function with no arguments. One possible criterion is that it is a pure function of the state of the object and executes in an amount of time similar to ordinary member access.

However, there are likely to be differences between computed properties and other data members, such as the ability to take the address of them. We might want to support "read only" data members, that can be read through the public API but only modified with private access, for data members which may need to evolve into a computed property. There are also questions regarding how to support assigning or modifying computed properties, such as using +=.

Interfaces implemented for data classes

We should define a way to implement interfaces for struct types. To satisfy coherence, these implementations would have to be defined in the library with the interface definition. The syntax might look like:

interface ConstructWidgetFrom {
  fn Construct(Self) -> Widget;
}

impl {.kind: WidgetKind, .size: i32}
    as ConstructWidgetFrom { ... }

In addition, we should define a way for interfaces to define templated blanket implementations for data classes more generally. These implementations will typically be subject to the criterion that all the data fields of the type must implement the interface. An example use case would be to say that a data class is serializable if all of its fields are. For this we will need a facet type for capturing that criterion, maybe something like DataFieldsImplement(MyInterface). The templated implementation will need some way of iterating through the fields so it can perform operations fieldwise. This feature should also implement the interfaces for any tuples whose fields satisfy the criterion.
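C++ has no built-in way to iterate over the fields of a struct, but `std::tuple` supports a rough analogue of this kind of blanket implementation. In the C++17 sketch below, `Serialize` stands in for `MyInterface`, and `IsSerializable` plays the role of a `DataFieldsImplement`-style constraint. All of these names are hypothetical. The tuple overload is enabled only when every element type is serializable, and it processes the fields one at a time with `std::apply` and a fold expression.

```cpp
#include <string>
#include <tuple>
#include <type_traits>
#include <utility>

// The "interface": a Serialize overload for each supported field type.
std::string Serialize(int v) { return std::to_string(v); }
std::string Serialize(const std::string& s) { return "\"" + s + "\""; }

// Detects whether Serialize(const T&) is callable, as a rough stand-in for
// "T implements the interface".
template <typename T, typename = void>
struct IsSerializable : std::false_type {};

template <typename T>
struct IsSerializable<
    T, std::void_t<decltype(Serialize(std::declval<const T&>()))>>
    : std::true_type {};

// Blanket implementation for tuples: enabled only when every field type is
// serializable, then applied fieldwise in order.
template <typename... Ts,
          std::enable_if_t<(IsSerializable<Ts>::value && ...), int> = 0>
std::string Serialize(const std::tuple<Ts...>& t) {
  return std::apply(
      [](const Ts&... fields) {
        std::string out = "(";
        bool first = true;
        // Comma fold: serialize each field left to right, separated by ", ".
        ((out += (first ? "" : ", ") + Serialize(fields), first = false), ...);
        return out + ")";
      },
      t);
}
```

For example, `Serialize(std::make_tuple(3, std::string("hi")))` produces `(3, "hi")`. Unlike the Carbon feature described above, this sketch only covers positional tuples, not named data class fields.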

It is an open question how to define implementations for binary operators. For example, if i32 is comparable to f64, then {.x = 3, .y = 2.72} should be comparable to {.x = 3.14, .y = 2}. The trick is how to declare the criterion that "T is comparable to U if they have the same field names in the same order, and for every field x, the type of T.x implements ComparableTo for the type of U.x."
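The C++ standard library already implements a positional version of this rule. The `std::tuple` comparison operators accept tuples with different element types, provided the tuples have the same number of elements and each pair of corresponding elements is comparable. Below is a minimal sketch of the example above, modeled positionally. `Lhs`, `Rhs`, and `LhsLessThanRhs` are hypothetical helper names.

```cpp
#include <tuple>

// {.x = 3, .y = 2.72} as a positional (int, double) tuple.
std::tuple<int, double> Lhs() { return {3, 2.72}; }

// {.x = 3.14, .y = 2} as a positional (double, int) tuple.
std::tuple<double, int> Rhs() { return {3.14, 2}; }

// Element-wise, lexicographic comparison across different element types.
// The first elements already differ (3 < 3.14), so the result is true.
bool LhsLessThanRhs() { return Lhs() < Rhs(); }
```

Tuples cannot express the name-matching half of the criterion. That is the part Carbon would need to add for struct types.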

Alternatives considered

References