Syntax
Julia's syntax has many desirable features, some of which we feel programmers have always wanted and yet are seldom available.
- Chained comparisons: 0 <= i < n
- Implied multiplication by juxtaposition: 1 + 2x
- An optional terse function definition syntax, encouraging small definitions where appropriate
- Every binary operator has an updating form, +=, -=, etc.
- Matrix/vector literals, matrix/vector concatenation syntax
- Ternary conditional operator (as in C)
- Chains of sum and product operators are combined into single calls, allowing operations to be reordered or parallelized dynamically
- Range syntax allowing indefinite start and/or end points
- Operators are ordinary functions whose names are the same as their symbols, and they can be called with general function call syntax: (+)(a,b,c)
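A few of these features in use (these particular lines are also valid in Julia as it exists today):

```
i = 3; n = 10
0 <= i < n         # chained comparison: true
y = 1 + 2i         # implied multiplication by juxtaposition: y == 7
double(x) = 2x     # terse one-line function definition
(+)(1, 2, 3)       # the + operator called with general function call syntax: 6
```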
The basic token types are integer, floating-point, identifier, and string.
By precedence, with the lowest precedence at the top:
- newline (inside a block), semicolon
- = += -= *= /= ^= %= |= &= $= <<= >>=
- comma inside a tuple
- ||
- &&
- -> <- -->
- > < >= <= == != .> .< .>= .<= .== .!=
- : ..
- << >>
- + - | $
- * / ./ % & .* \ .\
- ^ .^
- ::
- .
Unary operators: + - ~ !
Quoting: `x is equivalent to quote(x)
Postfix operators: ' .' ...
Chained comparisons: a < b <= c etc.
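A small illustration of how precedence and chaining work out in practice (a, b, and c are arbitrary values):

```
a = 1; b = 2; c = 3
a + b * c      # 7: * binds tighter than +
a < b <= c     # true: a chain, equivalent to (a < b) && (b <= c)
```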
() empty tuple (a,) (a,b) (a...) vararg tuple (a...,) also permitted
Note that parens with no comma, e.g. (a+b), perform grouping as usual in math.
Note that function definitions, function calls, and tuples all have the same syntax.
f(a, b, c) f() f(a, key=value) f(a, b; key=value)
a[i] a[i] = x a[i,j] a[i,:,k]
Inside indexing expressions, : by itself is automatically quoted.
a:c a:b:c a: :c a:b: :b:c :b:
{a, b, c}
[a, b, c] [ [a,b],[c,d] ] [a,b;c,d]
begin ... end
while condition ... end
for i = x ... end
if condition ... elseif condition ... else ... end
Semicolons are allowed at the end of lines but are optional and meaningless there. Semicolons can also be used to separate multiple statements on one line. A line break acts as a statement separator unless the line leaves an expression open, e.g. there is an unclosed open paren; no continuation character such as "..." is necessary.
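For example (a small sketch):

```
x = 1; y = 2       # a semicolon separates two statements on one line
z = (x +
     y)            # the open paren leaves the expression open,
                   # so the line break does not end the statement
```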
function foo(x::Int, y=0, rest...; color="red")
    l = 0
    z = 1
    function bar()
        local l
        z = z+1
        l = 100
    end
end
In this function x must be an integer, y can be any type and is optional with a default value of 0, and additional arguments will be passed in a list called rest. foo also accepts a color keyword argument with a default value of "red".
Inside foo, l and z are locals. Inside bar, z is shared with foo (it is the same variable), but it has its own separate local l.
In general, variables that exist in outer scopes are inherited. A variable assigned within a scope that does not exist in any outer scope is created as a local variable of its immediately enclosing scope. The keyword local can be used to create a new local shadowing locals from outer scopes.
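A minimal sketch of the shadowing rule (outer, inner, and t are names made up for this example):

```
function outer()
    t = 0
    function inner()
        local t    # a new local that shadows the t from outer
        t = 1
    end
    inner()
    t              # still 0; inner assigned only its own t
end
```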
The following forms of local are allowed:
local x
local x::Int
local x, y, z
local x::Int, y::Int, z::Int
local x=2
local x::Int=3
x -> 2*x       # simple function of 1 argument
() -> 0        # function of 0 arguments
(x,y) -> 2*x+y
To avoid ambiguity, there is different syntax for arrow types:
x -> 2*x      # anonymous function
Int --> Int   # arrow type
x -> do(a,b,c) # expression syntax for performing a sequence of operations
See Types and their representations for possible type definition syntax.
Stefan's idea is to separate optional from keyword arguments with a semicolon:
function f(x, y=0, z=0; powerLevel=2) ... end
This allows us to require that keyword arguments be passed by keyword (not by position), which I believe aids the expressiveness and clarity of APIs. You can have both ordinary optional arguments, and super-special optional arguments that need to be clearly marked. One can imagine a function accepting 20 keyword arguments, and this prevents people from writing obfuscated code passing all 20 arguments unlabeled. It also leaves the function author free to change the names of ordinary optional arguments without fear of breaking code that might try to pass them by name.
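A sketch of how definitions and calls would look under this scheme (powerLevel is just the example name from above); this is also how keyword arguments behave in Julia today:

```
function f(x, y=0, z=0; powerLevel=2)
    x + y + z + powerLevel
end

f(1)                      # y, z, and powerLevel take their defaults
f(1, 5)                   # y passed positionally
f(1, 5, 6; powerLevel=9)  # powerLevel must be named at the call site
f(1, 5, 6, 9)             # error: powerLevel cannot be passed by position
```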
Stefan adds: This notation actually has a basis in mathematical notation: mathematical functions take variables as well as zero or more parameters, and they are traditionally separated by a semicolon. Although the analogy "non-keyword parameters = function variables; keyword parameters = function parameters" is not exact, I think it's actually a pretty decent near analogy. This mathematical notation makes me inclined to allow people to separate the "variables" from the "parameters" even in their function calls if they are so inclined.
The features of Julia's syntax map to a limited number of abstract syntax node types. Some node types are implemented directly by the compiler (see Core forms) and some must be implemented by macros. Implementers and macro writers need to understand this representation.
Function calls and most operators are represented by (call f args...). Argument lists with semicolons, such as f(x; k=2), are represented by (call f x (parameters (= k 2))). A rest argument is represented by (... x).
The following operators are not function names but syntax node types: = := += -= *= /= ^= %= |= &= $= => <<= >>= -> || && : :: .
The splat operator in an argument list, e.g. f(x, l..., y), is syntactic and parsed as (... l).
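Putting the above together, a few surface forms and their call representations:

```
f(a, b)        =>  (call f a b)
a + b          =>  (call + a b)
f(x; k=2)      =>  (call f x (parameters (= k 2)))
f(x, l..., y)  =>  (call f x (... l) y)
```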
Block structure forms:
(block ...)   # a block of statements
(while cond body)
(for iteration-spec body)
(if cond then else)
(function sig body)
(type sig body)
(try body catch-body)
(break)
(continue)
Matrix, list, and tuple constructors:
(cat ...) (tuple ...) (list ...)
Indexing: (ref a idxs...) This is quickly converted to a call to a function named ref in most cases, but it lets us distinguish a[i]=x from the invalid ref(a,i)=x.
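For example (the exact representation of the assignment form is an assumption based on the description above):

```
a[i, j]    =>  (ref a i j)
a[i] = x   =>  (= (ref a i) x)   # stays an assignment; never becomes a ref(...) call
```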
Other keywords:
(typealias name sig)
(return expr)
(local expr)   # expr is a symbol, or x:t or x=v or x:t=v, or a tuple of these
A few keywords are special in that their uses look just like function calls, but they are syntax and not function calls:
block(stmt,...)
quote(expr)

In each case the head symbol is not a function name but a node type.
Arguments to the colon operator may be omitted, resulting in several different forms:
a:c    => (: a c)
a:b:c  => (: a b c)
a:     => (: a (quote :))
:c     => (: c)
a:b:   => (: a b (quote :))
:b:c   => (: (: b c))
:b:    => (: (: b (quote :)))

A missing first argument results in a unary colon expression. A missing last argument is parsed as if the last argument were the symbol ":".
Comparison operators can be chained; a<b<c means a<b && b<c. Such comparison chains are parsed as (comparison expr OP expr OP ...) where the OPs are equality or inequality operators. A lowering pass is responsible for converting this to a conjunction of calls.
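Concretely, assuming the lowering produces && of comparison calls:

```
a < b <= c   =>  (comparison a < b <= c)
# lowered (roughly) to
(&& (call < a b) (call <= b c))
```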
After abstract syntax, there are two more distinct tree languages.
The first is de-sugared, or macro intermediate form. Low-level macros transform raw parse trees into this form to clarify the meaning of a program. Some node types are added, some must be removed, and some stay the same:
New node types:
(_while test body) - primitive while loop, no support for break, continue, or scope
(break-block name body) - inside body, (break name) will exit the block
(break name) - see above
(addmethod name lambda-expr)
(scope-block body) - mark a variable scope boundary around body
(build-args ...) - move a series of tuples onto the argument stack
(lambda args body) - a primitive (non-generic) function
Removed node types:
-> . : ref list function while for break continue comparison += -= *= /= ^= %= |= &= $= <<= >>= (... x) (local ...) with multiple variables
and the following nodes survive intact:
quote if return && || block type typealias tuple = call ::
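As a rough sketch of how these pieces fit together (loop-exit and work are names made up for this example), a while loop containing break might desugar as:

```
# surface syntax
while cond
    work()
    break
end

# macro intermediate form (sketch)
(scope-block
  (break-block loop-exit
    (_while cond
      (block (call work)
             (break loop-exit)))))
```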
The next tree language is "linear flow form", which is used for type inference and ultimately code generation. The idea of this form is to represent control flow explicitly, so all control-flow-related block constructs are flattened out and removed: _while, break-block, break, scope-block, block, if, &&, and ||. goto and goto-ifnot are added. local is removed and replaced by a (locals ...) list within lambda expressions. These locals lists hold the results of variable analysis. Code at this point will also be in closure-converted form.
The point before linear flow form is where in a traditional compiler you would construct a control flow graph. However, this is not necessary if the code itself is in a convenient form. There are efficient algorithms for type-inference and other data-flow passes against linear representations, plus this form is closer to the level of the target back end. In a recent release the Mono team decided to move to a linear intermediate representation, saying that it facilitates more optimizations, among other advantages. I am now also convinced that this is a good idea.
However, what I'm calling LFF does not linearize everything; it linearizes only control flow. Within a statement there might still be complex nested expressions. Hopefully this will give us the best of both worlds, allowing both symbolic simplification of expressions (such as comprehension optimization) and more traditional optimizations through the linear IR.
The following AST node types exist in the final lowered form:
goto goto-ifnot label return lambda call = quote null top value-or-null closure-ref
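For instance, an if might flatten into something like the following (a sketch only; the label numbering and the exact spelling of the assignment node are assumptions):

```
# surface syntax
if cond
    x = f()
else
    x = g()
end

# linear flow form (sketch)
(goto-ifnot cond 1)
(= x (call f))
(goto 2)
(label 1)
(= x (call g))
(label 2)
```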