Skip to content

Hobby WASM compiler project that has gotten out of control, send help

License

Notifications You must be signed in to change notification settings

theidiotmachine/WA1

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Hello WA1

WA1 is the working name for a language that compiles down to Web Assembly. It is a research project. From it has come two things: the prototype language that is implemented in this project and described in this document; and the design document WHITEPAPER.MD. This white paper contains the design for a much larger scope project. It couldn't have been written without the hard slog of putting this code in place.

I would like to take the opportunity to thank the authors of all the open source libraries that I use. For a full attribution list, please see the LICENSE.md.

Yes, it needs a fancy name. At some point it will get that.

A very quick introduction

WA1 is a strongly-typed expression-based language, which borrows from TypeScript, Rust and Scala. Here we use 'expression-based' to mean that, generally, the last value of a block is its return value, and you can compose expressions freely.

Because we are not monsters, classic control flow also works, so you can still use return if you want.

So, a simple function is written like this. In simple build mode, use the export keyword to make it visible outside web assembly.

export fn mul(x: Number, y: Number) -> Number {
    x * y
}

But this is also fine:

fn mul3(x: Number, y: Number, z: Number) -> Number {
    return x * y * z
}

if is an expression too!

export fn negMax(x: Number, y: Number) -> Number {
    if(x > y)
        -x
    else {
        -y
    }
}

But it doesn't need to be.

export fn bang(var x: Number) -> Number {
    if(x <= 0)
        return 0

    let var out: Number = 1.0

    while(x > 0) {
        out *= x
        x -= 1
    }

    out
}

There are some examples in the tests/simple/one.wa1 file which contains our original test!

Using it

To build the compiler, first you need Rust. Install cargo, and make sure you are up to date with rustup update. Then, run

cargo build

in the root. To run the compiler, you need to pass the compiler some arguments.

cargo run -- build-simple tests/simple/one.wa1 -o=tests/simple/out.wasm

will parse tests/simple/one.wa1 and generate tests/simple/out.wasm.

Running

To run, you will need some way of running wasm.

Runner

We ship a runner built on wasmtime for easy testing. To run, call

cargo run --bin test-runner -- tests/simple/out.wasm -f 'fourTimes(8);'

This will run the init function (always called __wasm_call_ctors for compatibility with existing WASM conventions) and then the entry point you give it.

Webpage

One way is to use a webpage that looks something like this:

<html>
<body>
  <script>
    fetch('out.wasm').then(response =>
      response.arrayBuffer()
    ).then(bytes =>
      WebAssembly.instantiate(bytes, {imports: {}})
    ).then(results => {
      results.instance.exports.__wasm_call_ctors();
      window.addd = results.instance.exports.addd; //or whatever is exported from the file
    });
  </script>
</body>
</html>

Serve it with http-server; install that with

npm install -g http-server

and manually call in your browser in the debugger by typing the function name (called addd out of the box). Note that you have to call __wasm_call_ctors manually. We don't use start functions, because the wasm linker tools warn against it.

Another quite quick introduction but less quick than the last one

Variables

Local and global consts are declared like this.

let s = expr

The type of s is inferred from the expression you assign it.

Variables use the var keyword, meaning that you can change what the variable contains.

let var t = 5
t += 1

A word on semi-colons

You don't need 'em; a return character is equivalent. The things in javascript, like getting weird problems with return, are not an issue here.

Built in types

  • Number - a 64 bit float.
  • Int - An integer. You specify bounds as generic arguments, so Int<-2147483648, 2147483647> is a 32 bit signed int. You should rarely need to do that, though; by default you get a 32 bit signed, and most things will take that. The compiler tries to do most safe casting for you. This is... a tiny bit crazy and I may not keep it. Ints that stay within the safe 64 bit float range can be auto-casted to Number.
  • Bool - A boolean.

Casting

As mentioned above, numeric types will auto-cast. If you need to manually invoke casting, the 'as' keyword is your friend.

//totally spurious example
let a = 9 as Number

Member functions on ints

Currently there are four legal int member functions. These are somewhat bodged today.

let i = 256
let clz = i.countLeadingZeros()
let ctz = i.countTrailingZeros()
let shl = i.shiftLeft(3)
let shr = i.shiftRight(3)

Functions

Functions are declared as follows. Use thin arrows for explicit return type declaration, fat arrows will let the type engine figure it out.

//explicit return type
fn name(arg: Type) -> Type {
    body
}

//implicit return type
fn name(arg: Type) => {
    body
}

The deduction algorithm for return types is very simple. Don't be surprised if you need to help it.

Generic Functions

Astonishingly, generic functions are supported. Here is the identity function. Not all types can be applied, but that's a work in progress. Stonishingly, expect bugs (but I am fixing them.)

fn id<T>(x: T) -> T { x }

To use them you can provide the types, or have it deduce them.

let n1: Number = 4
let n2 = id<Number>(n1)
let n3 = id(n1)

The deduction algorithm is very simple, and just matches function argument types against what it's been given. This means that it won't be able to deduce a type variable that is only used as a return type.

They are not actually true generics. They specialize for some types -- at the moment, the numeric types -- which means I can avoid a box. It also means they are sort of a mix between templates and generics. Some days I like saying 'genemplates', other days 'templerics' is more fun.

Control

Blocks

The return value of a block is the last expression. So a function that added two numbers would be this:

fn add(x: Number, y: Number) => x + y

This works because a function expects a block; a single expression or a squiggly bracket enclosed list of expressions is a block; the last expression of a block is what it returns.

if

if is an expression, if it has an else branch.

let a = if(b) 3; else 4

Type inference will do some work for you if the branches are a bit different, but will complain if they are radically different.

You don't have to have an else branch, mind; you just can't use it as an expression.

if(k != 4)
    j = true

while

while is not an expression.

while(a > 3) {
    a -= 1

    if(a == 100)
        break
}

break and continue work as you would expect.

return

return causes early return of the function you are in.

return 4;

You don't need it at the end of a function (and is considered bad practice), but it is useful for flow control in complicated functions. If you combine fat arrow with return, we try and guess based on all the returns in the function.

User types

You can, obviously, define your own types, built from the system types. This is still very much a work in progress.

alias

To create a type alias, use the alias keyword. A type alias is what it sounds like, a different name for the same type, not a different type.

alias __size_t = Int<0, 4294967295>

type

The type keyword is the simplest (and currently only) true user type. It wraps an existing type, and then supplements it with extra member functions.

Member functions have access to a this local variable, which is of the type defined. To copy the inner data, use this.getInner(). To copy the inner data such that you can mutate this copy, use this.getInnerMut().

type One = Int {
    constructor(x: Int) => x
    
    fn add(r: Int) -> Int this.getInner() + r
}

It's important to understand that getInner and friends copy. WA1 is pass-by-value by default. This means in the above example, you will get an int in the add function, but it's a local variable which is a copy of the this value. There is no setInner function.

The constructor keyword is the sole way of constructing. This essentially casts an object of the inner type into the outer type. This is subject to change. You use the new keyword to construct.

fn testOne() => {
    let oneOne = new One(3)
    assert.assert(oneOne.add(4) == 7)
}

Member functions may be private (put the private keyword before the fn).

  • Private means that you can only call the function in a member function of the same type. This means you can call other instance's private functions.
  • Protected is not yet implemented.

trait

Traits in WA1 are purely compile-time constructs. They are a set of member functions that a type variable can assert that it supports.

They are declared like this.

trait TraitOne{
    fn a() -> Void
    fn b(x: Int) -> Int
    fn c() -> TraitOne
}

Note the use of the trait name as a return value. This is completely fine, but traits are not actually types. Instead, the TraitOne return type is a generic type variable that will be resolved later.

You use them like this.

//generic function that asserts a type variable conforms to a trait, and calls a function on it
fn fb<T: TraitOne>(x: T) -> Int x.b(1)

In theory you can specify a set of traits that the type variable supports. We borrow the lovely union and intersection syntax from TypeScript for this. So, in a magical world this might work:

fn fb<T: TraitOne & TraitTwo>(x: T) -> Int x.b(1)

This means that the type variable T supports the intersection of TraitOne and TraitTwo, which means the union of their member functions. Likewise the | operator is the union of traits, which means the intersection of their member functions. Read the operator as you would a boolean - in the above example T has TraitOne and TraitTwo so it gets both sets.

I say 'magical world' because I haven't written any tests for it so it probably has bugs.

In order to implement a trait, you have to use an implement keyword. You don't have to provide all the functions: if the type already implements the functions, just don't include them in the implement block.

type TypeOne = Int{
    constructor(x: Int) => x

    fn a() -> Void {}
    fn inner() -> Int this.getInner()
}

implement TraitOne for TypeOne {
    fn b(x: Int) -> Int {
        this.getInner() + x
    }
    fn c() -> TypeOne {
        new TypeOne(this.getInner() * 2)  
    }
}

I dislike the word implement. I think I may switch to express.

Then, using them feels like a regular generic function invocation.

fn testOne() -> Void {
    let x = new TypeOne(8)
    assert.assert(fb(x) == 9)
}

These are, of course, inspired by Haskell's type classes and are a little closer to them than Rust or Scala's traits given they are pure compile-time constructs. But they are also obviously much less powerful beasts, for good or for ill.

They almost certainly don't work in a million weird ways, because I literally just finished typing them, so don't expect happiness.

Type guards

User defined

Inspired by TypeScript, WA1 contains type guards. In TypeScript these are used to tell one JavaScript blob from another. Here, we use them to do limited scope down-casting. What this essentially means is this. Given a bit of code of the form:

let x: Bar = whatever()
if(isFoo(x)) {
    x.methodOnFoo()
}

any local access of x in the block will automatically cast x to a Foo. This lasts until the block is existed, or x is assigned to. Creating a type guard function is an unsafe operation, so needs to be done in unsafe mode.

On integers

Type guards also work on integers. Comparison operators will narrow integer ranges, casting if necessary.

Linker

Instead of using the simple build mode described above, you may use a full build mode. For this we generate WASM object files, which, because they are a standard, can then be linked with an external linker. A very simple example exists in the tests folder.

The linker exe

The linker we use is wasm-ld, given that is the only working WASM linker. You will need to install it somehow. On Ubuntu you can get that by calling

sudo apt install lld

On Windows probably the easiest is to install a pre-built binary from the releases page, here. MacOS you are on your own, I am afraid. I found building it from source was not impossible, but I imagine you may be able to install from the above link. Once you have an exe name (and in in Ubuntu it is wasm-ld-9, surprisingly) you need to edit the build-wsb.json file you are building to point to the right location. Yes, I know. This should get better as either the LLVM WASM toolchain matures, or I finally build a configuration tool.

Using it

When you are set up, call this to run the sample.

cargo run -- build tests/linker/build-wsb.json --clean

Here, the 'clean' arg will always force a build. You can omit if you wish. The build-wsb.json format is subject to change. However, at the moment, here it is.

{
    "entry_point": {
        "file_name": "two.wa1",
        "is_unsafe": false
    },
    "source_files": [
        {
            "file_name": "one.wa1",
            "is_unsafe": false
        }
    ],
    "src_path": "./src",
    "out_path": "./out",
    "module_name": "linker",
    "wasm_exe": "wasm-ld-9'
}

The entry_point is the place where we do the module exports. So anything exported in there will be exported to the final wasm. Anything exported from the source_files will be visible to other files, but not outside the module.

The only working import syntax is

import {*} from "./one"

where the * means to import everything. Once you import a function f from ./m it will appear as m.f. That means to test the example you would run

cargo run --bin test-runner -- tests/linker/out/linker.wasm -f 'linker.hello(1, 2);'

Unsafe mode

Unsafe mode is designed to be a mode that library writers can write low-level code. It feels like C in that it is not much more than structured WASM. In order to use it you need to somehow pass the unsafe arg to the compiler. Syntax is deliberately scary.

A leading __ is pronounced 'unsafe', by the way.

  • __Ptr type. Pointer to raw memory. mut __Ptr is a pointer where you can change what it points to. __Ptr is a readonly thing.
  • intrinsics - wrap low level wasm calls
    • __memorySize() - memory.size instruction
    • __memoryGrow(0, numPages) - memory.grow instruction
    • __trap() - unreachable instruction
  • __Option - is a essentially a nullable pointer. You can have it be a __Null or __Some<T>. It will end up being the core of the real option. It's kind of horrible because it's pretending to be a union but it really is a nullable pointer.
  • __struct type - to get one you have two horrible choices. You can cast from a __Ptr (this is unsafe. The clue is in the name...), or use __static (which is a C-style static allocation). To define one, type __struct Hello { a: Int; }. The static creation is, __static mut Hello {a: 3}. Omit the mut for it and the things it holds to be immutable. A __struct is and will always be a raw pointer to memory, used for writing the allocator and other low level things.
  • __typeguard This keyword lets you define type guards. Here is an example.
export fn __Option_isNull<T: __struct T>(x: __Option<T>) -> Bool __typeguard {
    true => __Null
    false => __Some<T>
} {
    x == __Null
}

TODO

  1. Typing
    1. every expr needs a return type!
    2. functions decls are exprs, not statements
    3. Infer return types
    4. Handle never properly - i.e. complain about dead code
    5. do proper auto-widening. means you can widen anything to unknown. Needed for templates, I think.
      1. means subsuming the number code, which is probably a good thing
    6. typescript-style const types, where a = 0 means typeof a == 0
    7. move the bin op types from the ast
    8. clean up the cast code
    9. bool literals
    10. check __Option equality
    11. consts properly rolled into the type system (aka the Konst Waaagh)
    12. privacy
    13. be honest about int and ptr member functions (currently they are bodged)
  2. Expression based language
    1. blocks return a value
    2. no need for a return statement (but still supported - what is this, scala?)
    3. if as an expression
  3. Error handling
    1. all exprs have a loc
    2. all errors have a loc
  4. Control flow
    1. if
    2. while
    3. simple for loops
    4. assignments and consts
    5. global variables
    6. match statements
      1. implement this for strings
    7. assign is void
  5. Strings
    1. a 'char' is a unicode grapheme cluster - see this - this guy has some cool rust libs
  6. Optimise steps. These should be on even on O0
    1. use the 'consume' code to not write drops
    2. remove returns right before end
    3. change tee + drop into set
    4. remove get + drop
    5. remove casts straight after consts, and just have the appropriately typed const inline
    6. == 0 is eqz
    7. globals that are init to a const should use the wasm global init mechanism
    8. when calling __static, copy true data into the data section and drop the initializer expression
    9. tail recursion
  7. Function calling
    1. simple static function call
    2. functions as first class objects (but not closures)
    3. allow tuples to be used as function args (so e.g. instead of f(1, 2), we also allow let t = (1, 2); f t)
  8. Known bugs
    1. fairly sure prefix unary operators are wrong - may need to start at a precedence
    2. the lexer doesn't parse negative numbers!
    3. import order is not correct - imports need to be first - need a secondary mapping. Claim fixed, needs to be tested
    4. type guards - in unguarded loops
  9. Inlining
    1. inline numeric constants?
    2. or full blown inlining, of which this is just a special case?
      1. initially write a function that inlines code that contains no returns as an ast operation
      2. copy any non-variable exprs into local variables
  10. Purity
    1. Record (some? one?) of states pre expr
      • globally pure (no non-const global access)
      • locally pure (true purity)
      • locally impure, globally pure (essentially reassignment to a local variable - referential integrity but not proper FP)
      • changing passed in object
      • side effecting (e.g. writing a mutable global, writing a file)
      • non-deterministic (e.g. currenttime, memorysize, reading a global or file)
    2. evaluate globally and locally pure functions if they have literals passed in. Is this... insane?
  11. A runtime
    1. a simple allocator
    2. a small object allocator
    3. a gc based on the rajan & bacon paper
      1. but with assignment counter
  12. Structs, ptrs
    1. structs - initial code
    2. union types are discriminated unions like Rust enums - yeah!
      1. typescript lets us call members that are in all the types in the unions. This is very convenient, but hard here. Proposal: generate accessor functions for all union members, route to the appropriate type, insert that. Not free, but not terrible either.
    3. option is union T | null... maybe?
      1. means map over a union is well understood, I guess
    4. make small ints fit into small spaces
    5. opt: booleans are 1 bit in structs, 32 bits on the heap?
  13. Types
    1. tuples - are just collections of locals, so pass by value (needs wasm multi value for some things)
    2. tuples - need a syntax for tuple 1. Probably Tuple<int>(3)
    3. type keyword is a first class type, not an alias (might as well, eh? I always disliked that Scala and TS do that)
  14. Closure generation and usage
    1. closure capture
    2. call function with closure
    3. optimization - don't capture consts, you have all the information to roll them inline. This is awesome because I don't think JS can do this
  15. templerics
    1. for funcs generate meta code to turn into templates
    2. for types generate meta types
    3. implicit type args on instantiation
    4. support fat arrow
    5. partial instantiation - specifically fn f1<T>(x: T) => {} fn f2<U>(x: U) => f1<U>(x) currently generates an UnresolvedTypeArg error
  16. containers
    1. arrays
      1. pass by value. increment rc on creation, if function contains a modification, not otherwise
      2. 'ref' keyword for pass by ref
      3. if a function takes an object and modifies it and returns that object, mark it. When you do let a = f(a) you should be able to remove ref count incs
      4. a function that takes an object by ref and modifies it is marked. it will not inc the ref count; instead, the calling function does.
    2. hashmaps (probably called 'objects' to be JS friendly)
    3. map/foreach are macros
    4. more complex for loops (in, of)
  17. Numbers
    1. there is an i32 type, an i64 type and an 164 type; the compiler auto promotes i32 to the other two types
    2. colors are a special type
  18. Imports and exports
    1. syntax
      1. import {*} as x from y => means will get everything from y as x.a, x.b
      2. import {a} as x from y => means will get x.a
      3. import {*} from y => means will get y.a, y.b, etc
      4. import {a} from y => means will get y.a
    2. A stage 1 parser that will generate a list of file exports - globals, functions, types
    3. A file format for that
    4. When import commands are run, load that, pull the imports in
    5. Functions to be inlined
    6. types - I think import doesn't work properly
  19. Linker
    1. object file format that contains
    2. use wasm-ld
    3. fatal-warnings
  20. tool chain
    1. npm
    2. binaryen optimizer

About

Hobby WASM compiler project that has gotten out of control, send help

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published