Short for 'Basic Lisp', pronounced 'bee-lisp' so as to rhyme with 'ELisp', this is a custom Lisp dialect with a very limited instruction set. I am not very familiar with Lisp so it may or may not end up looking anything like other Lisp variants.
BLisp used to be hosted in my interprete-rs repository. Any history before this repo's original commit can be found there.
For notes on the Collatz conjecture demo see the Example Programs section below.
I believe it is complete with regards to core implementation. I have gotten it working on
simple but non-trivial programs such as fib2.bl
and fib.bl
. I will fix issues that
come up but likely won't make large changes except those listed in the 'Remaining
implementation' section below. I now want to focus on performance improvements. This will
involve creating some benchmarking programs and testing them with the current code, to get
baseline statistics. The general improvements I should make are:
- Get rid of cloning as much as possible
- Instead of cloning values I should make a concerted effort to move them. This
involves things like
Argument::try_to_value: fn(self) -> Result<Value>
andmatch
ing on actual values instead of references. - May be useful to define destructuring methods on some types, e.g.
Value::destructure: fn(self) -> Result<(Type, ValueData)>
. I end up avoiding destructuring because full match statements or if let are ugly and take up 3 full lines minimum. Alternatively I can make a standardized macro format. Perhaps even using proc macros?
- Instead of cloning values I should make a concerted effort to move them. This
involves things like
- Handle collections more efficiently
- A pattern I use commonly is
vec.iter().map(_).collect()
or evenvec.iter().map(_).cloned().collect()
. For this I should try usingvec.into_iter().for_each(_)
or a for loop and editing the vec directly. I should be able to removecollect
almost entirely unless I'm missing something
- A pattern I use commonly is
- I would like to expand certain functions to accept lists. Specifically match functions.
E.g.
(add [a b c] [d e f] = ([(+ a d) (+ b e) (+ c f)])
. - I would like to include basic collection methods like
prod
andsum
- I need to redo the
read
method.- Alternatively add a
readline
method that only works when usingStdin
- Alternatively add a
- Maybe alternative syntax for writes?
- Could also do
writeline
to mimic the current behavior and havewrite
do a simple write
- Could also do
Each BLisp program is a single lisp statement, which simplifies the control flow. If you want multiple statements you must construct a list from them.
As an example, below is a simple programs that prints fibonacci numbers every half-second
until it overflows and crashes the interpreter (also exists as a separate file called
programs/fib2.bl
).
([
(def n1 0u)
(def n2 1u)
(def tmp 0u)
(write n1)
(write n2)
(while true [
(set tmp n1)
(set n1 n2)
(set n2 (+ n2 tmp))
(write n2)
(sleep 500)
])
])
The Collatz conjecture states that if you apply the following rules to any positive integer repeatedly, you'll eventually reach 1:
- If n is even, set
n = n / 2
- If n is odd, set
n = (3 * n) + 1
The number of steps it takes an integer to reach 1
has been researched in-depth, and
OEIS has the first 72 values on its site. Below is a program
that calculates and prints the values for each of the first 72 integers (I've verified
that they match.). This file is also at programs/collatz.bl
([
(init cnt uint)
(init tmp uint)
(def curr 1u)
(while (le curr 72) [
(set cnt 0u)
(set tmp curr)
(while (ne tmp 1) [
(set cnt (+ cnt 1))
(if (eq tmp (* 2 (/ tmp 2)))
(set tmp (/ tmp 2))
(set tmp (+ 1 (* 3 tmp)))
)
])
(write (concat "Steps for " (concat (tostring curr) ": ")))
(write cnt)
(write ())
(set curr (+ curr 1))
(sleep 100)
])
])
I want this language to have a strict type system, analogous to Haskell or Rust. The types I will support:
int
(internally stored asi64
) is a signed int typeuint
(internally stored asu64
) is an unsigned int typefloat
(internally stored asf64
) is a signed floating-point typebool
(internally stored asbool
) is a boolen typechar
(internally stored asu8
) is an ASCII characterlist<ty>
(internal representation tbd) is a list of typety
(which can be any valid type, including another list)string
(internal representation tbd) is an alias forlist<char>
unit
/()
represents an empty value. Mostly used to make sequences of functions easier
Any numeric literal with no period is treated as an integer literal by default. They
default to int
type unless uint
is required in context. To force a literal to be
uint
append a u
to it:
(1)
isint
(1u)
isuint
(-1)
isint
(-1u)
is a compilation error
Any numeric literal with a period is treated as a float literal. To force a numeric
literal to be a float, append f
to it:
(1.)
isfloat
(1.0)
isfloat
(1f)
isfloat
(.1)
is a compilation error since floats must have at least one digit preceding the decimal point(1.0u)
is a compilation error since floats can not be coerced to ints (unsigned or otherwise)
As usual, true
and false
are used
String literals are always sequences of ASCII data enclosed by double quotes ""
.
Examples:
("ABC")
("AOTIENSR_RS_D_SD_*SRT)
('ABC')
is a compilation error since double quotes were not used
By default, any ASCII char surrounded by single quotes '' is a char literal. You can also
force a numeric literal to be a char by appending c
:
('a')
ischar
("a")
isstring
(57c)
ischar
and is equivalent to('9')
(256c)
is a compilation error since the max value of an ASCII char is 255 (including extended ASCII)
Lists are enclosed by []
and their type is determined by the types of the members. If
the members are not all of the same type and they can't be coerced to the same type, the
list declaration is invalid:
[1 2 3]
islist<int>
[1u, 2, 3]
islist<uint>
since the second and third elements are coerced touint
to match the first[1f, 2, 3]
islist<float>
since the second and third elements are coerced tofloat
[true, false]
islist<bool>
[[1u, 2, 3], [4, 5, 6]]
islist<list<uint>>
, the second list is coerced tolist<uint>
[]
is a compilation error since the list's type can't be determined without at least one value (may change later)['a', "bcd"]
is a compilation error sincechar
andstring
are incompatible types[1, 1c]
is a compilation error because there is no implicit coercion from int literal to char (may change)
To allow for coercion we must define a couple abstract types and a hierarchy. To start we should look at which type coercions are possible, and in what contexts:
- Any numeric literal with a decimal point is a
float
and non-coercible - Any numeric literal with a negative sign and no decimal point is
negnum
type, which defaults toint
and can be coerced tofloat
orint
- Any numeric with neither a negative sign nor a decimal point is a
num
type which defaults toint
and can be coerced toint
,uint
, orfloat
- While there is a suffix that turns integer literals into chars, at least for now there
is no implicit coercion between the two. E.g.
[1c, 'd']
is fine but[1c, 2]
or['d', 2]
are errors
Variables are values bound to a name, have an associated type, and can be used in any context that a literal of the same type would be valid (and a few others).
+
oradd
(+ 1 2) = (add 1 2) = 3
(+ T T) -> T
is defined forT: int | uint | float
-
orsub
(- 2 1) = (sub 2 1) = 1
(- T T) -> T
is defined forT: int | uint | float
*
ormul
(* 2 3) = (mul 2 3) = 6
(* T T) -> T
is defined forT: int | uint | float
/
ordiv
(/ 10 2) = (div 11 2) = 5
since this would be integer division(/ 9.0 2.0) = (div 9.0 2.0) = 5.0
(* T T) -> T
is defined forT: int | uint | float
write
(print "ABC")
prints "ABC" to stdout and returns it for convenience(print T) -> T
is defined forT: string
read
(read 10)
reads up to 10 characters from the configured reader, returning result asstring
(read T) -> U
is defined forT: uint | int > 0 | number, U: string
?
orif
(if true "option1" "option2") = (? false "option2" "option1") = "option1
(? T U U) -> U
is defined forT: bool, U: any
while
(while true (print "ABC"))
prints "ABC" forever(while T U) -> ()
is defined forT: bool, U: any
Note: Boolean logic is not lazy, e.g. (and false (write x))
still evaluates
the second expression. May change later
eq
(eq 'a' 'a') = true
(eq T T) -> bool
defined for all default types
neq
(neq 'a' 'a') = false
(neq T T) -> bool
defined for all default types
leq
(leq 'a' 'b') = true
(leq T T) -> bool
defined forT: char, int, uint, float
geq
(geq 'a' 'b') = true
(geq T T) -> bool
defined forT: char, int, uint, float
lt
(lt 'a' 'b') = true
(lt T T) -> bool
defined forT: char, int, uint, float
gt
(gt 'a' 'b') = true
(gt T T) -> bool
defined forT: char, int, uint, float
and
(and true false) = false
(and T T) -> bool
defined forT: bool
or
(or true false) = true
(or T T) -> bool
defined forT: bool
!
ornot
(1 arg)(not true) = (not (not false)) = false
(or T) -> bool
defined forT: bool
concat
(concat [1] [2, 3, 4, 5, 6]) = [1, 2, 3, 4, 5, 6]
(concat list<T> list<T>) -> list<T>
defined forT: any
(which includes strings)
:
orprepend
(: 1 [2, 3, 4]) = (prepend 1 [2, 3, 4]) = [1, 2, 3, 4]
(: T list<T>) -> list<T>
defined forT: any
(which includes strings, whereT: char
)
take
(take 2 [1, 2, 3, 4, 5]) = [1, 2]
(take T list<U>) -> list<U>
is defined forT: uint, U: any
def
(def x 12)
defines a new variablex
with value12
and typeint
(def <ident> T) -> ()
is defined forT: any
- If you use a numeric literal the type will be set on the variable's definition. E.g.
if you define
x
via(def x 12)
and then create a list like[1u, x]
it will throw a compiler error. This may change in the future. You can get around this using the numeric suffix with literals.
init
(init x int)
defines a new uninitialized variable with typeint
(init <ident> <type>) -> ()
is defined for any type and available identifier
set
(set x 12)
sets the value of existing variablex
to 12 (coercing if necessary)(set x T) -> ()
is defined forT = type(x)
(more or less)
tostring
converts any type to a default string representationeval
evaluates an expression and throws the value away (specifically returns()
)- Useful for putting multiple expressions in sequence, e.g. a program could look like
Without([ (def x 12u) (set x (+ x 1)) (eval (write (tostring x))) ])
eval
this would throw a type error trying to match()
withstring
return
allows us to exit the program, propagating the specified value. This is useful mainly because it allows us to return a result from the main function even if we construct it as a list of()
statementsreturn 12
stops the program and returns12: int
sleep
halts the execution of the program for a specified time. This is mainly useful to allow us to space out print statements in long-running programs. It will be useful for demoing.
My previous grammar was too low-level to be useful in the new implementation. Now that the
Token
type is fairly well-defined I can do another quick pass. <>
represent rules,
[]
represents tokens
<Prog> => <Expr>
<Expr> => [LParen] <ExprBody> [RParen]
<ExprBody> => <Val> | <FuncCall>
<Val> => <List> | <Expr> | [Ident] | [Type] | [CharLiteral] | [String] | [NumLiteral] | [UnitLiteral]
<List> => [LBrack] <ListBody> [RBrack]
<ListBody> => <Val> | <Val> <ListBody>
<FuncCall> => [ReservedIdent] <Args>
<Args> => <Val> | <Val> <Args>
Below I list which tokens can begin which rules:
<Prog>: [LParen]
<Expr>: [LParen]
<ExprBody>: [LBrack], [LParen], [Ident], [Type], [CharLiteral], [String], [NumLiteral], [UnitLiteral], [ReservedIdent]
<Val>: [LBrack], [LParen], [Ident], [Type], [CharLiteral], [String], [NumLiteral], [UnitLiteral]
<ListVal>: [LBrack]
<ListBody>: [LBrack], [LParen], [Ident], [Type], [CharLiteral], [String], [NumLiteral], [UnitLiteral], [ReservedIdent]
<FuncCall>: [ReservedIdent]
<Args>: [LBrack], [LParen], [Ident], [Type], [CharLiteral], [String], [NumLiteral], [UnitLiteral]
And now the tokens that can end each rule:
<Prog>: [RParen]
<Expr>: [RParen]
<ExprBody>: [RBrack], [RParen], [Ident], [Type], [CharLiteral], [String], [NumLiteral], [UnitLiteral]
<Val>: [RBrack], [RParen], [Ident], [Type], [CharLiteral], [String], [NumLiteral], [UnitLiteral]
<ListVal>: [RBrack]
<ListBody>: [RBrack], [RParen], [Ident], [Type], [CharLiteral], [String], [NumLiteral], [UnitLiteral]
<FuncCall>: [RBrack], [RParen], [Ident], [Type], [CharLiteral], [String], [NumLiteral], [UnitLiteral]
<Args>: [RBrack], [RParen], [Ident], [Type], [CharLiteral], [String], [NumLiteral], [UnitLiteral]
I need to know which rules each token can be a part of, and which other tokens can follow them.
LParen
can only exist at the beginning of an expression, and it always exists at the
beginning of any expression. When we encounter an LParen
it starts the Expr
rule. This
means we have the following options for next token:
[LBrack] => Means we are in Expr -> ExprBody -> Val -> ListVal rule now
[LParen] => Means we are in a new Expr
[Ident] => *TERMINAL* means we have resolved ExprBody = [Ident] and can move on
[Type] => *TERMINAL* means ExprBody = [Type]
[CharLiteral] => *TERMINAL* means ExprBody = [CharLiteral]
[String] => *TERMINAL* means ExprBody = [String]
[NumLiteral] => *TERMINAL* means ExprBody = [NumLiteral]
[UnitLiteral] => *TERMINAL* means ExprBody = [UnitLiteral]
[ReservedIdent] => Means we are in Expr -> FuncCall rule now
- With my current handling of number literals something like
12u13
would be considered two separate numeric literals12u
and13