Pointers in C++ are vulnerable to bugs involving dangling or null pointers. In Rust, pointers are non-nullable and the compiler enforces the invariant that pointers are never dangling.
C++ | Rust |
---|---|
T& foo |
let foo: &mut T (also non aliased) |
const T& foo |
let foo: &T (immutable - not just read-only through this reference) |
std::unique_ptr<T> |
owned pointer: Box<T> |
auto p = make_unique(...) |
let p = Box::new(...); |
std::shared_ptr<T> |
|
T *foo; |
let foo: *mut T |
potentially null pointer/ref | Option<Box<T>> , Option<&T> |
Borrowed pointers (&
) have the same representation as C pointers or C++
references at runtime, but to enforce safety at compile-time they are more
restricted and have a concept of lifetimes. See the borrowed pointer tutorial for details.
Note that unlike C++ references but like raw pointers, they can be treated as
values and stored in containers.
For example, a container can return a reference to an object inside it, and the lifetime will ensure that the container outlives the reference and is not modified in a way that would make it invalid. This concept also extends to any iterator that wraps borrowed pointers, and makes them impossible to invalidate.
Owned pointers are almost identical to std::unique_ptr
in C++, and point to
memory on a shared heap so they can be moved between tasks.
Rc<T>
pointers are reference counted , similar to std::shared_ptr<T>
. The reference count is embedded in a single allocation with the underlying type.
If managed Gc<T>
pointers are avoided, the garbage collection runtime is not used (the compiler
can enforce this with the managed-heap-memory
lint check). Garbage collection
will be avoided in the standard library except for concepts like persistent
(copy-on-write, with shared substructures) containers that require them.
Use Option<Box<T>>
or Option<&T>
to represent a potentially null pointer or reference accessible safely through match
or its helper methods.
The use of Raw pointers is controlled through unsafe blocks.
C++ | Rust |
---|---|
const char * | &str (closest approx, similar use) |
"foo" (const char * ) |
"foo" (&'static str ) |
std::string | String |
std::string("foo") |
String::from_str("foo") |
A String
is a general purpose growable string;
A string slice &str
is a slice of a string, should be used as an immutable input where possible; you can obtain a &str
from a String
with .as_slice()
Rust strings are sequences of variable length unicode chars, so are not cheaply randomly accessible. use .chars()
to iterate the characters sequentially.
Rust strings are not null-terminated like C strings - length information is held in the collection or reference (e.g. a slice is a pointer + size)
C++ | Rust |
---|---|
std::array<T,N> |
[T; N] |
std::array<int, 3> a={{1, 2, 3}} |
let a = [1,2,3] |
std::vector<T> |
Vec<T> |
std::vector<int> myvec={1, 2, 3} |
let myvec = vec![1, 2, 3] |
The &[]
and &mut []
types represent a fixed-size slice (immutable and mutable respectively) of any kind of vector.
Rust's enum
is a sum type like boost::variant
(tagged union). The match
expression is used to pattern match out of it, replacing the usage of a
switch
with a C++ enum
, or a visitor with boost::variant
. A match
expression is required to cover all possible cases, although a default fallback
can be used.
C++ | Rust |
---|---|
enum T {Foo,Bar,Baz} |
enum T {} , C like enums |
union {} |
enum T {} - tagged-union, variants have fields |
boost::variant |
enum |
boost::optional |
Option (an enum ) |
switch foo.kind {
` case BAR: ...break;` |
match foo {
` Bar => ...,` |
Option
and Either
are very common patterns, so they're predefined along
with utility methods.
Option<Box<T>>
and Option<&T>
are optimised by the compiler to be stored as a single pointer, with a null value representing 'None'.
A Rust struct
is similar to struct
/class
in C++, and uses a memory layout
compatible with C structs.
Rust has no conversion operators - it is necessary to call explicit methods such as .to_str()
to perform conversions. The philosophy is to make the cost (no hidden allocations) & code-path clearer.
Rust has no special syntax or semantics for constructors, and just uses static methods that return the type. Struct initializer syntax SomeStruct{field:value..}
avoids the boilerplate of manually creating simple constructors that just copy values into fields.
Custom destructors for non-memory resources can be provided by implementing the
Drop
trait.
rust collections and smart pointers manage memory similarly to C++ RAII techniques.
Struct fields and functions are private by default, but can be marked with pub
.
Privacy works at the module level: private fields are still visible to functions and methods throughout a module, and any sub-modules.
This eliminates the need for C++ 'friend' or 'protected'; C++ can in some situations use a class as a module-like entity. See Modules. Where a C++ programmer might use a class with friends, a rust programmer places the group of classes in a module.
Tuples are built into the language syntax and along with destructuring assignment (available in arguments, match
, let
) they provide a convenient way of grouping multiple values. There are also 'tuple structs', where the type is named but the fields are not.
Rust has no concept of a copy constructor and only shallow types are implicitly
copyable. Assignment or passing by-value will only do an implicit copy or a
move. Other types can implement the Clone
trait, which provides a clone
method.
Rust will implicitly provide the ability to move any type, along with a swap implementation using moves. The compiler enforces the invariant that no value can be read after it was moved from.
Rust does not allow sharing mutable data between threads, so it isn't vulnerable to races for in-memory data (data races). Instead of sharing data, message passing is used - either by copying data, or by moving owned pointers between tasks.
Rust does not include C++-style exceptions that can be caught, only uncatchable
unwinding (panic
) that can be dealt with at task boundaries. The lack of
catch
means that exception safety is much less of an issue, since calling a
method that panics will prevent that object from ever being used again.
Generally, errors are handled by using a enum
of the result type and the
error type, and the result
module implements related functionality.
In addition to the prevention of null or dangling pointers and data races, the Rust compiler also enforces that data be initialized before being read. Variables can still be declared and then initialized later, but there can never be a code path where they could be read first. All possible code paths in a function or expression must also return a correctly typed value, or fail.
This also means indexing a string/vector will do bounds checking like
vector<T>::at()
(though there are unsafe functions in vec::raw
that allow bypassing the bounds checks)
unsafe {}
blocks are used to enable the use of raw pointers, *T, on which you can do pointer-arithmetic and dereferencing to recover all the power of C and C++. Functions may be marked as unsafe, hence only callable from unsafe blocks.
Unsafe code is used to implement standard library types, providing safe abstractions for most tasks.
by contrast, an entire C++ program is effectively an 'unsafe block' - if a rust application does 'segfault', the fault lies with the design of an unsafe library, rather than any safe code using it.
Thanks to memory/pointer safety and more rigorous immutability, rust can guarantee when multiple writable pointers don't access the same object. This gives more opportunities for caching values in registers, reducing loads and stores, and low level scheduling optimizations by the compiler. This avoids the need for 'restrict' that is used as an extension in some C/C++ compilers for performance sensitive code.
C++ | Rust |
---|---|
class Foo{ void bar()} |
struct Foo;
impl Foo { fn bar(); } |
size_t foo() const |
fn foo(&self) -> uint |
size_t foo() |
fn foo(&mut self) -> uint |
static size_t foo() |
fn foo() -> uint |
Methods in Rust are similar to those in C++, but Rust uses explicit self,
(which must be declared in the parameter list) so referring to a member/method
from within a method is done with self.member
.
Methods for T are added in impl T {}
blocks, rather than being declared inside a struct/class. impl <Trait> for <Type> {}
is required to declare methods useable in polymorphic code (generics or trait-objects)
It's also possible to take self
by-value, which allows for moving out of
self
since the object will be consumed.
For an example, the PriorityQueue
struct in the standard library has a
to_sorted_vec
method that takes self by-value, making
PriorityQueue::from_vec(xs).to_sorted_vec()
an in-place heap sort.
Rust generics are more like the proposed concepts extension for C++, not like the templates that exist today. This allows templates to be type-checked at the time you write them, not when you instantiate them.
In Rust, functions and structs can be given generic type annotations (like templated functions and structs), but to actually call a type's methods and use operators, the type annotations must be explicitly bounded by the traits required. Annotating the type bounds for functions is only necessary at the module-level, and not for nested closures where it is inferred.
Despite the different model in the type system, traits are compiled to code similar to what templates would produce.
Rust can only select function overloads based on the receiver (and sometimes through return values due to Hindley-Milner type inference).
To 'overload' functions on multiple parameters, one must create an extra layer of traits for something that looks like double-dispatch, with each parameter being moved to the receiver by an intermediate function. Although this is significantly more boilerplate, the benefit is that it's also ready for runtime polymorphism through trait-objects if needed, and traits improve error messages for generics(templates).
Without overloading multiple parameters, the information you pay in writing more specific function calls is sometimes recovered with HM type inference feeding back through its' arguments.
You can also consider Enums for conveniently passing varied options to a function - or use rusts powerful macros for wrapping extra convenience around function calls (eg, it is possible to annotate arguments with keywords & special syntax, handle variadics etc)
Instead of adding virtual functions to classes, Rust uses Trait Objects. A trait object is a fat pointer to data and an associated vtable, created by "&<expr> as &<traitname>". Any number of traits can be implemented for an existing type without needing to establish an inheritance heirachy, and there is no 'diamond inheritance problem'.
It is possible to attach a new vtable to an existing struct instance (with a borrowed pointer) - so one library can take data from another library, without the other having to have even known the interfaces required.
Rust includes macros (not textual ones) and syntax extensions which can be used for many of the other use cases of templates and constexpr
.
These macros handle repeated items in patterns, can match expressions,blocks parsed properly, and separate arguments with custom syntax elements. As such they can achieve what the 'x-macros' pattern in C does in a single macro definition/invocation.
Syntax extensions can also be used in place of user-defined literals.
Being handled within the AST, they do not have many of the hazards of C macros and are considered to be a useful tool, rather than an obsolete anti-feature as in C++.
Rust's modules don't work like most other languages. Modules are namespaces, not 'imported' into individual sources.
mod <module_name>;
is used from the crate root or directory 'mod.rs' file to reference other source files by mounting their contents in the module tree.
'mod.rs' files are a special workaround to allow rust's modules tree to mimic a directory structure.
Any public items can be referenced from other modules using the full root-relative path e.g. ::foo::bar, or relative using self:: , super::. "use" directives are a little like 'using namespace' or 'using', and filling some of the role of '#include' - providing shorter aliases for paths, or bringing individual symbols into scope.
A common point of confusion is that what appears in use
directives is always crate relative, whilst throughout the rest of the source file, paths are relative to the current module.
'mod <name> { ..definitions..}' statements can create nested namespaces containing items, replacing the use of nested classes in C++.