Skip to content

Latest commit

 

History

History
1155 lines (903 loc) · 28.2 KB

presentation.org

File metadata and controls

1155 lines (903 loc) · 28.2 KB

Using Types Effectively

Using Types Effectively?

What does that mean?

The recent evolution of C++ is (from one point of view) largely about strengthening and expanding the capabilities for dealing with types.

  • expansion of type_traits
  • decltype to utter types
  • auto to preserve types, prevent conversions, infer return types
  • nullptr to prevent int / pointer confusion
  • scoped enum
  • GSL: owner<T>, not_null<T>
  • Concepts TS

FP isn’t (only) about

  • first class functions
  • higher order functions
  • lexical scoping, closures
  • pattern matching
  • value semantics
  • immutability
  • concurrency through immutability
  • laziness
  • garbage collection
  • boxed data types / “inefficient” runtime models
  • the M-word

FP is (also, importantly) about

  • using types effectively and expressively
  • making illegal states unrepresentable
  • making illegal behaviour result in a type error
  • using total functions for easier to use, harder to misuse interfaces

Why does C++ have a type system?

  • to help programmers?
  • to hinder programmers?
  • because objects?
  • for the compiler’s benefit?

What has a type system ever done for us?

Why does C have a type system?

“The machines on which we first used BCPL and then B were word-addressed, and these languages’ single data type, the ‘cell,’ comfortably equated with the hardware machine word. The advent of the PDP-11 exposed several inadequacies of B’s semantic model.

First, its character-handling mechanisms were clumsy. Second, although the original PDP-11 did not provide for floating-point arithmetic, the manufacturer promised that it would soon be available. Finally, the B and BCPL model implied overhead in dealing with pointers.

For all these reasons, it seemed that a typing scheme was necessary. Other issues, particularly type safety and interface checking, did not seem as important then as they became later.”

– dmr, /The Development of the C Language/

The PDP-10 was old, with its 36-bit words.

1971/72: The PDP-11 was the new hotness.

It could operate on (8-bit) bytes and (16-bit) words. If your language only operates on words (‘cells’), string/char handling is awkward.

In B, pointers were indices into arrays rather than naked addresses. So scale conversion would be needed at runtime.

DEC promised floating point capability in hardware! So the C compiler would need to know about types in order to output the correct instructions.

What is a type?

  • A way for the compiler to know what opcodes to output (dmr’s motivation)?
  • The way data is stored (representational)?
  • Characterised by what operations are possible (behavioural)?
  • Determines the values that can be assigned?
  • Determines the meaning of the data?

What is a type?

./int_bool_1.png

“Only Lua would have ’1 == true’ evaluate to false. #wantmydayback”

./int_bool_2.png

“But, how can 1 be equal to true? 1 is an integer, and true is a boolean. Lua seems to be correct here. It’s your view of the world that has been warped.”

(Smiley faces make criticism OK!)

What is a type?

  • The set of values that can inhabit an expression
    • may be finite or “infinite”
    • characterized by cardinality
  • Expressions have types
    • A program has a type

Let’s play a game

To help us get thinking about types.

I’ll tell you a type.

You tell me how many values it has.

There are no tricks: if it seems obvious, it is!

Level 1

Types as sets of values

Level 1

How many values?

bool;

2 (true and false)

Level 1

How many values?

char;

256

Level 1

How many values?

void;

0

struct Foo { Foo() = delete; };
struct Bar { template <typename T> Bar(); };

Level 1

How many values?

struct Foo {};

1

Level 1

How many values?

enum FireSwampDangers : int8_t {
  FLAME_SPURTS,
  LIGHTNING_SAND,
  ROUSES
};

3

Level 1

How many values?

template <typename T>
struct Foo {
  T m_t;
};

Foo has as many values as T

End of Level 1

Algebraically, a type is the number of values that inhabit it.

These types are equivalent:

bool;

enum class InatorButtons {
  ON_OFF,
  SELF_DESTRUCT
};

Let’s move on to level 2.

Level 2

Aggregating Types

Level 2

How many values?

std::pair<char, bool>;

256 * 2 = 512

Level 2

How many values?

struct Foo {
  char a;
  bool b;
};

256 * 2 = 512

Level 2

How many values?

std::tuple<bool, bool, bool>;

2 * 2 * 2 = 8

Level 2

How many values?

template <typename T, typename U>
struct Foo {
  T m_t;
  U m_u;
};

(# of values in T) * (# of values in U)

End of Level 2

When two types are “concatenated” into one compound type, we multiply the # of inhabitants of the components.

This kind of compounding gives us a product type.

On to Level 3.

Level 3

Alternating Types

Level 3

How many values?

std::optional<char>;

256 + 1 = 257

Level 3

How many values?

std::variant<char, bool>;

256 + 2 = 258

Level 3

How many values?

template <typename T, typename U>
struct Foo {
  std::variant<T, U>;
}

(# of values in T) + (# of values in U)

End of Level 3

When two types are “alternated” into one compound type, we add the # of inhabitants of the components.

This kind of compounding gives us a sum type.

Level 4

Function Types

Level 4

How many values?

bool f(bool);

4

Level 4

Four possible values ./function_bool.svg

Level 4

bool f1(bool b) { return b; }
bool f2(bool) { return true; }
bool f3(bool) { return false; }
bool f4(bool b) { return !b; }

Level 4

How many values?

char f(bool);

256 * 256 = 65,536

Level 4

How many values (for f)?

enum class Foo
{
  BAR,
  BAZ,
  QUUX
};
char f(Foo);

256 * 256 * 256 = 16,777,216

Level 4

The number of values of a function is the number of different ways we can draw arrows between the inputs and the outputs. ./function.svg

Level 4

How many values?

template <class T, class U>
U f(T);

$|U||T|$

End of Level 4

When we have a function from $A$ to $B$, we raise the # of inhabitants of $B$ to the power of the # of inhabitants of $A$.

End of Level 4 (corollary)

Hence a curried function is equivalent to its uncurried alternative.

$$\begin{align*} Funcurried::(A,B) → C & ⇔ CA*B & = CB*A \\ & = (C^B)^A \\ & ⇔ (B → C)^A \\ & ⇔ Fcurried::A → (B → C) \end{align*}$$

Victory!

Equivalences

template <typename T>
struct Foo {
  std::variant<T, T> m_v;
};

template <typename T>
struct Bar {
  T m_t;
  bool m_b;
};

We have a choice over how to represent values. std::variant will quickly become a very important tool for proper expression of states.

This is one reason why std::variant’s “never-empty” guarantee is important.

Algebraic Datatypes

This is what it means to have an algebra of datatypes.

  • the ability to reason about equality of types
  • to find equivalent formulations
    • more natural
    • more easily understood
    • more efficient
  • to identify mismatches between state spaces and the types used to implement them
  • to eliminate illegal states by making them inexpressible

Making Illegal States Unrepresentable

std::variant is a game changer because it allows us to (more) properly express types, so that (more) illegal states are unrepresentable.

./variant-tweet.png

Making Illegal States Unrepresentable

Let’s look at some possible alternative data formulations, using sum types (variant, optional) as well as product types (structs).

Example: Connection State

enum class ConnectionState {
  DISCONNECTED,
  CONNECTING,
  CONNECTED,
  CONNECTION_INTERRUPTED
};

struct Connection {
  ConnectionState m_connectionState;

  std::string m_serverAddress;
  ConnectionId m_id;
  std::chrono::system_clock::time_point m_connectedTime;
  std::chrono::milliseconds m_lastPingTime;
  Timer m_reconnectTimer;
};

Example: Connection State

struct Connection {
  std::string m_serverAddress;

  struct Disconnected {};
  struct Connecting {};
  struct Connected {
    ConnectionId m_id;
    std::chrono::system_clock::time_point m_connectedTime;
    std::optional<std::chrono::milliseconds> m_lastPingTime;
  };
  struct ConnectionInterrupted {
    std::chrono::system_clock::time_point m_disconnectedTime;
    Timer m_reconnectTimer;
  };

  std::variant<Disconnected,
               Connecting,
               Connected,
               ConnectionInterrupted> m_connection;
};

Example: Nullable field

class Friend {
  std::string m_alias;
  bool m_aliasPopulated;
  ...
};

These two fields need to be kept in sync everywhere.

Example: Nullable field

class Friend {
  std::optional<std::string> m_alias;
  ...
};

std::optional provides a sentinel value that is outside the type.

Example: Monster AI

enum class AggroState {
  IDLE,
  CHASING,
  FIGHTING
};

class MonsterAI {
  AggroState m_aggroState;

  float m_aggroRadius;
  PlayerId m_target;
  Timer m_chaseTimer;
};

Example: Monster AI

class MonsterAI {
  struct Idle {
    float m_aggroRadius;
  };
  struct Chasing {
    PlayerId m_target;
    Timer m_chaseTimer;
  };
  struct Fighting {
    PlayerId m_target;
  };

  std::variant<Idle, Chasing, Fighting> m_aggroState;
};

Example: Design Patterns

The addition of sum types to C++ offers an alternative formulation for some design patterns.

State machines and expressions are naturally modelled with sum types.

Example: Design Patterns

  • Command
  • Composite
  • State
  • Interpreter

Sum types vs Runtime Polymorphism

Runtime polymorphism (i.e. regular OO interface/implementation) allows manipulation of heterogeneous state with a uniform interface.

Sum types allow manipulation of heterogenous state and interface in a homogeneous way.

Designing with Types

std::variant and std::optional are valuable tools that allow us to model the state of our business logic more accurately.

When you match the types to the domain accurately, certain categories of tests just disappear.

Designing with Types

Fitting types to their function more accurately makes code easier to understand and removes pitfalls.

The bigger the codebase and the more vital the functionality, the more value there is in correct representation with types.

Using Types to Constrain Behaviour

We’ve seen how an expressive type system (with product and sum types) allows us to model state more accurately.

“Phantom types” is one technique that helps us to model the behaviour of our business logic in the type system. Illegal behaviour becomes a type error.

Phantom Types: Before

std::string GetFormData();

std::string SanitizeFormData(const std::string&);

void ExecuteQuery(const std::string&);

An injection bug waiting to happen.

Phantom Types: The setup

template <typename T>
struct FormData {
  explicit FormData(const string& input) : m_input(input) {}
  std::string m_input;
};

struct sanitized {};
struct unsanitized {};

T is the “Phantom Type” here.

Phantom Types: After

FormData<unsanitized> GetFormData();

std::optional<FormData<sanitized>>
SanitizeFormData(const FormData<unsanitized>&);

void ExecuteQuery(const FormData<sanitized>&);

Total Functions

A total function is a function that is defined for all inputs in its domain.

template <typename T> const T& min(const T& a, const T& b);

float sqrt(float f);

Let’s play another game

To help us see how total functions with the right types can result in unsurprising code.

I’ll give you a function signature with no names attached.

You tell me what it’s called… (and you’ll even know how to implement it).

The only rule… it must be a total function.

Name That Function

template <typename T>
T f(T);

identity

int f(int);

Name That Function

template <typename T, typename U>
T f(pair<T, U>);

first

Name That Function

template <typename T>
T f(bool, T, T);

select

Name That Function

template <typename T, typename U>
U f(function<U(T)>, T);

apply or call

Name That Function

template <typename T>
vector<T> f(vector<T>);

reverse, shuffle, …

Name That Function

template <typename T>
T f(vector<T>);

Not possible! It’s a partial function - the vector might be empty.

T& vector<T>::front();

Name That Function

template <typename T>
optional<T> f(vector<T>);

Name That Function

template <typename T, typename U>
vector<U> f(function<U(T)>, vector<T>);

transform

Name That Function

template <typename T>
vector<T> f(function<bool(T)>, vector<T>);

remove_if, partition, …

Name That Function

template <typename T>
T f(optional<T>);

Not possible!

Name That Function

template <typename K, typename V>
V f(map<K, V>, K);

Not possible! (The key might not be in the map.)

V& map<K, V>::operator[](const K&);

Name That Function

template <typename K, typename V>
optional<V> f(map<K, V>, K);

lookup

What Just Happened?

I gave you almost nothing.

No variable names. No function names. No type names.

Just bare type signatures.

You were able to tell me exactly what the functions should be called, and likely knew instantly how to implement them.

You will note that partial functions gave us some issues…

Well-typed Functions

Writing total functions with well-typed signatures can tell us a lot about functionality.

Using types appropriately makes interfaces unsurprising, safer to use and harder to misuse.

Total functions make more test categories vanish.

About Testing…

In a previous talk, I talked about unit testing and in particular property-based testing.

Effectively using types can reduce test code.

Property-based tests say “for all values, this property is true”.

That is exactly what types are: universal quantifications about what can be done with data.

Types scale better than tests. Instead of TDD, maybe try TDD!

Further Down the Rabbit Hole

Thanks For Listening

“On the whole, I’m inclined to say that when in doubt, make a new type.”

– Martin Fowler, /When to Make a Type/

“Don’t set a flag; set the data.”

– Leo Brodie, Thinking Forth

Goals for Well-typed Code

  • Make illegal states unrepresentable
  • Use std::variant and std::optional for formulations that
    • are more natural
    • fit the business logic state better
  • Use phantom types for safety
    • Make illegal behaviour a compile error
  • Write total functions
    • Unsurprising behaviour
    • Easy to use, hard to misuse

Epilogue

A taste of algebra with datatypes

A Taste of Algebra with Datatypes

How many values?

template <typename T>
class vector<T>;

We can define a vector<T> recursively:

${v(t)} = {1 + t v(t)}$

(empty vector or (+) head element and (*) tail vector)

A Taste of Algebra with Datatypes

And rearrange…

${v(t)} = {1 + t v(t)}$

${v(t) - t v(t)} = {1}$

${v(t) (1-t)} = {1}$

${v(t)} = {{1} \over {1-t}}$

What does that mean? Subtracting and dividing types?

A Taste of Algebra with Datatypes

When we don’t know how to interpret something mathematical?

${v(t)} = {{1} \over {1-t}}$

A Taste of Algebra with Datatypes

Series expansion at ${t = 0}$:

${1 + t + t^2 + t^3 + t^4 +{ }…}$

A vector<T> can have:

  • 0 elements (${1}$)
  • or (+) 1 element (${t}$)
  • or (+) 2 elements (${t^2}$)
  • etc.

Goals for Well-typed Code

  • Make illegal states unrepresentable
  • Use std::variant and std::optional for formulations that
    • are more natural
    • fit the business logic state better
  • Use phantom types for safety
    • Make illegal behaviour a compile error
  • Write total functions
    • Unsurprising behaviour
    • Easy to use, hard to misuse