jq Language Description

itchyny edited this page Jan 26, 2021 · 87 revisions
  1. Purpose of this Page
  2. Notation
  3. The jq Language
  4. jq Program Structure and Basic Syntax
  5. Data Types
  6. Array and Object Accessors and Iterators
  7. Lexical Symbol Bindings: Function Definitions and Data Symbol Bindings
  8. Data Flow
  9. Generators and Backtracking
    1. Lazy Evaluation
    2. Streaming vs Arrays
  10. Reductions
  11. Path Expressions
    1. Sub-expressions that are not Path Expressions
  12. Assignments
  13. Built-in Functions
  14. Special Forms
    1. Operators Priority
  15. List of Built-in Functions
  16. Side-Effects
    1. Side-Effects Wish-list
  17. Keywords

Purpose of this Page

The jq documentation is written in a style that hides a lot of important detail because the hope is that the language feels intuitive. Some users need documentation that includes such details and more — this page is for them. Such users should also read the jq Advanced Topics wiki page.

This page, too, can hopefully form the basis for a formal specification of the jq language.

Notation

In addition to jq syntax, this page sometimes denotes a function together with its number of arguments as symbol/N: foo/0 (function named “foo” with no arguments), bar/2 (function named “bar” with two arguments), and so on. E.g., the call foo(a; b) refers to foo/2, but only the former is syntactically a jq expression, while the latter is used only in documentation.

When we refer to “jq programs” we mean programs written in the jq language.

When we refer to the jq(1) command-line executable, we refer to it as the “jq command-line processor” or “jq(1)” — the “(1)” in “jq(1)” refers to the operating system manual section for commands. The jq command-line processor compiles and executes jq programs, but the way the jq command-line processor and the jq program interact with the world depends on what command-line options are used — those are not covered here. See the jq documentation for details.

The jq Language

jq is a dynamically-typed functional programming language with second-class higher-order functions of dynamic extent. All values are immutable, but the language can make it seem as though they are mutable.

Every expression is a closure, or “thunk”, if you wish, that gets applied to a singular input value.

Functions can be defined, and they too get applied to a singular input value, but they may also get additional argument expressions (closures, or thunks if you wish). Functions, incidentally, like expressions, are all closures in jq, as they close over their lexically-visible environments.

The output(s) of an expression can be passed to another using the | operator. expressionA | expressionB applies expressionA to some input, then expressionB to all the outputs of expressionA.

As far as the jq language is concerned, a complete jq program is applied only to one singular input value, but the jq command-line processor generally evaluates the jq program on all the inputs to the command-line processor by restarting the jq program after each input. The processor’s behavior can be controlled by command-line options like -n and -s; these are not covered here, but in the jq manual at https://stedolan.github.io/jq/manual/.

The input to an expression can be referred to explicitly as . (a single dot). This is useful because jq lacks automatic currying, thus an expression that adds 1 to its input reads like so: . + 1 (or 1 + .). The + operator is syntactic sugar, and . + 1 desugars to _plus(.; 1) where _plus/2 is a special function; similarly for other infix operators and the prefix unary operator -.
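A minimal sketch of the pipe and of the explicit input, assuming a jq(1) binary is on PATH (the shell variable name is only for illustration). The left side produces two outputs, and . + 1 is applied to each of them in turn:

```shell
# (1, 2) outputs 1, then 2; `. + 1` runs once per output.
out=$(jq -cn '[(1, 2) | . + 1]')
printf '%s\n' "$out"   # [2,3]
```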

The only types available to jq programs are JSON’s types: scalars (null, boolean true and false, numbers, and strings) and non-scalars (objects and arrays).

Expressions are not a value type, even though expressions can be passed to functions as arguments. A function’s expression arguments themselves can never be saved as values in arrays, or objects, or as scalars, and thus they cannot be output. The outputs of expressions, on the other hand, can only be values. Thus in def muladd(m; a): (. * m) + a; the m and a symbols are as-if functions that are applied, so muladd(5; 1) does the obvious thing, but muladd(.+1; ./2) might be less obvious: it is akin to writing (. * (. + 1)) + (./2). The fact that function arguments are thunks, plus jq’s generator/backtracking semantics, recursion, tail-call optimization, path expressions, and reduction operators, allows jq functions to implement powerful abstractions and control structures almost as if jq functions were macros.
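The muladd example can be tried directly. This is a small sketch assuming jq(1) is on PATH; the input value 4 is an arbitrary choice. With input 4, muladd(5; 1) computes (4 * 5) + 1, while muladd(. + 1; . / 2) evaluates both argument thunks against the same input and computes (4 * (4 + 1)) + (4 / 2):

```shell
out=$(jq -cn 'def muladd(m; a): (. * m) + a;
              4 | [muladd(5; 1), muladd(. + 1; . / 2)]')
printf '%s\n' "$out"   # [21,22]
```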

The jq language is a “Lisp-2”, in that it has separate symbol namespaces for function and data. Its closures/thunks are of dynamic extent, thus allocated on a stack and deallocated automatically when their defining scopes are exited — this is why jq cannot have closure/thunk/function values, as it would then be difficult or impossible to prevent their use after being deallocated.

(The Icon programming language, for example, also has semantics that can be and are implemented using closures of dynamic extent. Dynamic-extent closures are sufficient for implementing depth-first backtracking, at the cost of needing co-routines for breadth-first searches. jq does not yet have co-routines, unlike Icon, which has had them for decades.)

Another reason that jq cannot have first-class function values is that jq deals in JSON texts as inputs and outputs, or else raw text, and there is no JSON representation of jq functions, and there really is no standard for representing code in raw text. One can imagine a variant of jq that has first-class function values, and a first-class function type or types, with closures having indefinite extent, but still, something would have to be done about output.

Still, because jq allows local functions in most expressions, and because of its lexical scoping rules, the fact that its functions are not first-class values of first-class types is not so restrictive.

jq Program Structure and Basic Syntax

Every jq program consists of exactly one expression. This expression can include any number of module imports/includes and function definitions. Comments are introduced by a # character and run through the end of the line.

# Module imports, includes, and function definitions:
import "a" as foo;
include "b";
def some_function: body_here;
# ...
#
# Finally, the main program, really, a singular expression:
some_expression | some_other_expression # and so on

# But note that you can have `def ...` in any expression.

Expressions can be pipelined, where the output(s) of each pipeline stage are the inputs to the next:

some_expression | some_other_expression

There are a number of special forms, such as constant literals, array/object accessors and iterators, “variable” bindings and destructuring, conditionals, and so on.

Every expression has a singular input value and zero, one, or more output values.

A function foo is called by just writing its name: now | foo applies foo to the result of now (which is a function that returns the current time). An expression like (1, 2) | foo means calling foo twice, first applying it to 1, then again to 2: when foo as applied to 1 completes, jq backtracks to produce a new value (here, 2) to apply foo to.

Functions can have arguments. Again, function arguments are not value arguments but thunk arguments. Calling a function with arguments looks like this: bar(some_expression; another_expression). Pay close attention to the use of ; for separating argument expressions, and do not confuse it with the comma: , is an operator that joins the outputs of the expressions on its left and right, while ; is only a syntactic separator that separates function arguments and terminates function bodies.
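To make the difference between ; and , concrete, here is a sketch with a hypothetical two-argument function named pair (not a builtin). The first argument expression is 1, 2, which is a single thunk that outputs two values; the second is 3:

```shell
# pair/2 is a made-up helper: it collects each argument's outputs separately.
out=$(jq -cn 'def pair(a; b): {left: [a], right: [b]};
              pair(1, 2; 3)')
printf '%s\n' "$out"   # {"left":[1,2],"right":[3]}
```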

Semi-colons are required for terminating import, include, and def function bodies, as well as for separating expression arguments to functions of more than one expression argument.

Whitespace is not significant.

Data Types

jq supports only JSON’s data types: null, boolean (true and false), strings, numbers, arrays, and objects. Arrays are zero-based.

There is no way to declare any new data types, but objects and arrays can be used to represent complex data types.

jq is a dynamically-typed language.

Array and Object Accessors and Iterators

The expression expr[] outputs all the values in the array or object output by expr. E.g., [0, 3][] outputs 0, then 3. .[] outputs the values in ., so [range(3)] | .[] outputs 0, 1, and 2.

The expression expr[N] outputs the Nth element of the array output by expr. Thus [range(10)][2] outputs 2, and [range(10)][-1] outputs 9. Arrays are zero based, with negative indices referring to values from the right end.

The expression expr.ident outputs the value of the key named "ident" in the object output by expr. So {a:0}.a outputs 0.

The expression expr["some key string"] outputs the value of the key named "some key string" in the object output by expr.

The expression expr."some key string" outputs the value of the key named "some key string" in the object output by expr.

These things can chain. Thus .a["b"].c[].d outputs the value of the key named "d" in the objects output by .a.b.c[], which are all the values in the array at .a.b.c, which in turn is the value of the key named "c" in .a.b, and so on.
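A short sketch of chained accessors and of negative indexing, assuming jq(1) is on PATH. The sample document in the shell variable doc is invented for illustration:

```shell
doc='{"a":{"b":{"c":[{"d":1},{"d":2}]}}}'

# Chain: object key, quoted key, key, iterator, key.
chained=$(printf '%s\n' "$doc" | jq -c '[.a["b"].c[].d]')
printf '%s\n' "$chained"   # [1,2]

# Negative indices count from the right end of the array.
last=$(jq -cn '[range(10)][-1]')
printf '%s\n' "$last"      # 9
```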

Lexical Symbol Bindings: Function Definitions and Data Symbol Bindings

In jq there are two types of symbols: function symbols, and value symbols.

Function symbols are any ident-like symbols, while value symbols are any ident-like symbols prefixed with a $.

Ident-like means: starts with a letter or underscore and consists only of letters, digits, and underscores. foo+1 parses as foo + 1.

Thus $foo is a symbol that evaluates to a value, while foo is a symbol that evaluates to a function (or closure/thunk). Though even a data symbol is an expression, and thus a thunk — one that ignores its input and always outputs the value bound to that data symbol. There is no relation between data and function symbols of the same name.

A symbol (in any context other than where it gets defined) always effectively applies the function named. $foo is a function that ignores its input and produces the value that $foo is bound to. foo is some function that gets applied to its input. foo(expr) is a function that gets applied to its input — what it does with expr is up to the foo/1 function’s body.

Functions are defined with def IDENT: BODY; or def IDENT(arg0; arg1; ..; argN): BODY;. Any reference to the function’s name in the body is bound to the function itself, which allows recursion. The arguments are themselves also functions, bound to the expressions passed in at the point where the defined function is applied. Function definitions can be included just about everywhere (e.g., ... | def foo: ...; ...).

E.g., in the body of a function defined as def cond(c; t; f): if c then t else f end; the function symbols c, t, and f are bound to the first, second, and third argument expressions, respectively, and the name of the function, cond with arity 3 (cond/3), is made visible to all jq code that follows its definition.

Note well that foo, foo(expr), foo(expr0; expr1), and so on, are all different functions. The number of arguments passed determines which foo is applied. We can and do refer to the first as foo/0, the next as foo/1, and so on.
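A quick way to see that arity selects the function, assuming jq(1) is on PATH. The three definitions below are illustrative only; each returns a string naming itself:

```shell
out=$(jq -cn 'def foo: "foo/0";
              def foo(a): "foo/1";
              def foo(a; b): "foo/2";
              [foo, foo(1), foo(1; 2)]')
printf '%s\n' "$out"   # ["foo/0","foo/1","foo/2"]
```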

Data symbol bindings are introduced with expr as $NAME | .... The | is required. The binding is visible to all expressions to the right of the |.

Lexical bindings shadow earlier bindings of the same names. For example:

def foo:
  def foo:
    def foo: .+1; # Just     .+1
    foo*3;        # Same as (.+1)*3
  foo+5;          # Same as ((.+1)*3)+5

In this example, the foo in the outermost function body normally would have been bound to the function itself (being named foo), thus causing infinite recursion in this case, but because foo is immediately shadowed by a local function foo, the foo in the body is bound to that local function.
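Running the nested definition confirms that no infinite recursion occurs. This sketch assumes jq(1) is on PATH and applies the outermost foo to the input 1, so the expected result is ((1+1)*3)+5:

```shell
out=$(jq -n 'def foo:
               def foo:
                 def foo: .+1;
                 foo*3;
               foo+5;
             1 | foo')
printf '%s\n' "$out"   # 11
```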

Function symbol bindings are introduced only by defs, defs in modules imported or included, or by jq itself in the case of built-in function symbols. The names of the argument thunks are lexical bindings available to the body of the function — and to the functions defined inside that function.

Recursion is possible because a function’s name is visible to its body:

def fact:
    if . == 0 then 1
    elif . > 0 then .*(.-1|fact)
    else "fact not defined for negative numbers"|error
    end;

A tail-recursive version of fact:

def fact:
  # Helper that keeps state as an array of [$n, $result]:
  def fact:
    if .[0] == 0 then .
    else .[0] as $n |
         (.[1] *= $n) |
         (.[0] -= 1) | fact
    end;
  select(. >= 0) | [., 1] | fact | .[1];

or

def fact:
  if . == 0 then 1
  elif . < 0 then empty
  else reduce (range(.) + 1) as $n (1; . * $n)
  end;
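The three variants can be checked against each other. In this sketch the names fact_rec, fact_tail, fact_reduce, and the helper go are hypothetical renamings so that all three fit in one program; the bodies follow the definitions in the text:

```shell
out=$(jq -cn '
  def fact_rec:
    if . == 0 then 1
    elif . > 0 then . * (. - 1 | fact_rec)
    else "fact not defined for negative numbers" | error
    end;
  def fact_tail:
    # Helper keeps state as [$n, $result].
    def go:
      if .[0] == 0 then .
      else .[0] as $n | (.[1] *= $n) | (.[0] -= 1) | go
      end;
    select(. >= 0) | [., 1] | go | .[1];
  def fact_reduce:
    if . == 0 then 1
    elif . < 0 then empty
    else reduce (range(.) + 1) as $n (1; . * $n)
    end;
  5 | [fact_rec, fact_tail, fact_reduce]')
printf '%s\n' "$out"   # [120,120,120]
```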

Note the two kinds of scopes:

  • function scopes are introduced by def, and function symbols are visible (assuming no shadowing) to all expressions in the defs that introduce them
    • function symbols are also visible to all subsequent defs at the same level
  • value scopes are introduced by ... as $name | ... and are visible (assuming no shadowing) to all expressions to the right of the |

Value scopes are also introduced by destructuring forms, which are a generalization of ... as $name | ....
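As a small illustration of a destructuring form introducing several value bindings at once (assuming jq(1) is on PATH; the object and the names $x, $y, $z are made up for the example):

```shell
out=$(jq -cn '{"a": 1, "b": [2, 3]} as {a: $x, b: [$y, $z]} | [$x, $y, $z]')
printf '%s\n' "$out"   # [1,2,3]
```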

Data Flow

Recall that every expression gets a singular input value, and that expressions can be chained with |. The outputs of each expression are then passed as input to the expression to the right (if any). The jq command-line processor prints the outputs of the right-most expression.

This means that values flow from left to right. Each expression in a pipeline can “transform”/replace its input value with zero, one, or more values. When an expression produces no more values (possibly none at all), the expression on the left is resumed to see if it can produce another value, in which case the expression on the right is applied de novo to the new value.

Generators and Backtracking

Every expression can output zero, one, or more values.

The primitive expression that outputs zero values is empty, and it causes backtracking / pruning.

When an expression produces a value, the evaluation state of that expression is “suspended” while the output is processed by applying the expression to the right to that value.

When an expression in a pipeline produces no further values, then control returns to the expression to the left of it in the pipeline.

E.g., in range(5) | if .%2==1 then ., .*2 else empty end, the conditional expression is applied to each output of range(5), but for some such values (even numbers) it will “output” empty, which is to say, nothing, and it backtracks, while for other input values (odd numbers) it will output two numbers then backtrack. Each time the if expression in that example backtracks, the range(5) to its left will resume and output the next value, until it runs out, in which case it will backtrack, and being the first expression in the jq program, its backtracking will cause the program to terminate. Note that if this program is invoked via the jq command-line processor (as opposed to the C API for invoking jq programs), then the command-line processor may read another value from stdin and apply the jq program to it all over again.
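The example can be run as is. In this sketch (assuming jq(1) is on PATH) the outputs are collected into an array only to make them easy to compare: even inputs contribute nothing, odd inputs contribute the value and its double.

```shell
out=$(jq -cn '[range(5) | if . % 2 == 1 then ., . * 2 else empty end]')
printf '%s\n' "$out"   # [1,2,3,6]
```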

The array/object value iterator expression, .[], outputs all the values in the array/object.

The comma operator, , outputs the value(s) of the expression on the left, then the values of the expression on the right, but both expressions will be applied to the same input. For example, range(3;6) | (., . * 2) outputs 3, 6, 4, 8, 5, 10.

The inputs builtin outputs all the inputs read from the jq command-line processor’s stdin.

The range() builtin outputs a sequence of numbers. E.g., range(5) outputs the numbers 0 through 4, inclusive.

Lazy Evaluation

jq does not have lazy evaluation as such. But because all function arguments are thunks that may or may not get evaluated (depending on what the called function chooses to do), and because function argument thunks can output multiple values, jq effectively has lazy evaluation after all.

Consider the limit/2 builtin function: it outputs the first $n values of its second argument thunk:

$ time jq -cn '[limit(5; range(1000000)|range(1000000))]'
[0,1,2,3,4]

real    0m0.02s
user    0m0.02s
sys     0m0.00s
$ 

In fact, the limit/2 builtin function really does limit how many values its second argument produces. No matter how many values its second argument wants to produce, once the $nth value is reached, evaluation stops.

Here’s a definition of limit/2:

def limit($n; exp):
  if $n < 0 then exp
  else label $out | foreach exp as $item ($n; .-1; $item, if . <= 0 then break $out else empty end)
  end;

Incidentally, function arguments named $name are just a small amount of syntactic sugar. The following definition of limit/2 is equivalent to the above:

def limit(n; exp):
  n as $n |
  if $n < 0 then exp
  else label $out | foreach exp as $item ($n; .-1; $item, if . <= 0 then break $out else empty end)
  end;
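The equivalence of the two spellings can be checked with a sketch like the following. The names limit_sugar and limit_plain are hypothetical, chosen so both definitions can coexist without touching the builtin limit/2, and the thunk parameter is called f here:

```shell
both=$(jq -cn '
  def limit_sugar($n; f):
    if $n < 0 then f
    else label $out | foreach f as $item ($n; . - 1; $item, if . <= 0 then break $out else empty end)
    end;
  def limit_plain(n; f):
    n as $n |
    if $n < 0 then f
    else label $out | foreach f as $item ($n; . - 1; $item, if . <= 0 then break $out else empty end)
    end;
  [[limit_sugar(3; range(1000000))], [limit_plain(3; range(1000000))]]')
printf '%s\n' "$both"   # [[0,1,2],[0,1,2]]
```

Only the first three values of range(1000000) are ever produced in either call, since break $out unwinds the generator.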

Streaming vs Arrays

Arrays are not lazy in jq, therefore they always have a definite size, and they take up O(N) space.

jq expressions and functions can output zero, one, or more values. A jq expression that outputs one billion values takes up O(1) memory, not O(N). Therefore “streaming”, i.e., generating many values, is cheaper than collecting those values into an array.

Consider the map function, and a variant that streams:

def map(f): [.[] | f];
def map_values(f): .[] | f;

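The first, map/1, is the standard “map” function one finds in most functional programming languages. The second is a streaming version of map/1; it is named map_values/1 here only for illustration, and differs from jq’s builtin map_values/1, which is defined as .[] |= f.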

Whenever possible, jq programmers should prefer to stream values.
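A sketch contrasting the two styles, assuming jq(1) is on PATH. The function name stream_map is hypothetical and is used here to avoid clashing with any builtin:

```shell
# Collecting: one output, an array holding every result.
collected=$(jq -cn '[1, 2, 3] | map(. + 1)')
printf '%s\n' "$collected"   # [2,3,4]

# Streaming: three separate outputs, never held in one array.
streamed=$(jq -cn 'def stream_map(f): .[] | f;
                   [1, 2, 3] | stream_map(. + 1)')
printf '%s\n' "$streamed"    # 2, 3 and 4 on separate lines
```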

Reductions

jq has a couple of reduction primitives:

reduce stream_expression as $name (initial_value; update_expression)

and

foreach stream_expression as $name (initial_value; update_expression; extract_expression)

These allow the programmer to apply an update expression successively to its own outputs, but with a lexical binding for each of the stream_expression’s outputs.

E.g., reduce range(5) as $n (0; .+$n) adds the numbers from 0 through 4, inclusive. In this example $n in the update expression is bound to each successive input from the stream expression (which here is range(5)), and the expression .+$n is applied to the reduction’s state value, and the output of .+$n becomes the next reduction state value. When the stream expression runs out of inputs, the final reduction state value is output.

The foreach reduction operator can output intermediate state values, and will do so whenever the third expression, the extraction expression (optional and by default equal to .), outputs a value (if it outputs no values, then foreach will update the reduction state with the next input). (Note: it probably would have been best to not introduce a new syntactic construct for this, just add an expression to the existing reduce construct.)

Note that though a reduction like reduce range(5) as $n (0; .+$n) is equivalent to 0 + 0 | . + 1 | . + 2 | . + 3 | . + 4, jq uses much less state to implement the reduction.

Note that while the state update expression is running, jq does not retain any additional references to that expression’s input value. This means that from the second update forward, the reduction state value never has more than one reference. This is critical because when values have just one reference, then “mutation” operations that normally copy-then-write, just mutate in-place. See more about this below.
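The two reduction forms side by side, as a sketch assuming jq(1) is on PATH. The same stream and update expression are used throughout; only the form and the extraction expression change:

```shell
# reduce: only the final state is output.
sum=$(jq -cn 'reduce range(5) as $n (0; . + $n)')
printf '%s\n' "$sum"       # 10

# foreach with the default extraction (.): every intermediate state is output.
running=$(jq -cn '[foreach range(5) as $n (0; . + $n)]')
printf '%s\n' "$running"   # [0,1,3,6,10]

# foreach with an extraction expression that sometimes outputs nothing.
evens=$(jq -cn '[foreach range(5) as $n (0; . + $n; select(. % 2 == 0))]')
printf '%s\n' "$evens"     # [0,6,10]
```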

Path Expressions

A path expression is any expression which when given to path(EXPR), does not yield an error. This is a terrible description. Let us try again.

A path expression is any expression that is ultimately passed to path/1 and which is composed of array/object traversal operators, or function calls where their bodies (but not conditional expressions) all consist of path expressions.

The purpose of path expressions is to enable the magical-seeming path/1 built-in, and assignment forms (which are built on path/1).

The path(path_expression) builtin outputs arrays of strings and numbers representing the paths in the input value matched by the given path_expression. path/1 is, essentially, a pattern-matching primitive.

As we’ll see in Assignments, the path/1 built-in is essential to the construction of assignment operators.

Not every expression is a path expression. For example, .a.b is a path expression, but .a + .b is not! foo is a path expression if and only if the body of function named foo is a path expression.

Because in jq arguments to functions are thunks, it is not possible from local syntactic analysis to tell whether an expression must be a path expression — a function’s body might or might not pass a thunk to path/1. One must either inspect the function’s documentation or its body. Passing a non-path expression to path/1 will yield a run-time error, so it is important to know which expressions must be path expressions. As we’ll see in Assignments, the left-hand side expressions of assignment forms must be path expressions.

(It should be possible for the jq compiler to determine if some expression is a path expression, and also to determine if a function argument must be a path expression, thus being able to report path expression errors at compile-time. However, the jq compiler is not that smart at this point.)

Path expressions, then, include:

  • .[KEY_EXPR] in all its variants, but note that KEY_EXPR itself need not be a path expression:
    • .ident
    • ."string key"
    • .["string key"]
    • .[INTEGER]
  • the object/array iterator .[]
  • select(CONDITIONAL) is a path expression regardless of whether CONDITIONAL is
  • if ... then ... elif ... then ... else ... end is a path expression if all the branches are path expressions, regardless of whether the conditional expressions are path expressions
  • since select(COND) is def select(cond): if cond then . else empty end;, it follows that…
    • the expression argument to select/1 is not required to be a path expression
    • . and empty are path expressions!
  • .. desugars into recurse, which is def recurse(f): def r: ., (f | r); r; def recurse: recurse(.[]?);, which are path expressions, so .. is a path expression

Constant literals are not path expressions. Neither are any of the arithmetic operators, nor the unary minus operator.

Some sub-expressions in path expressions are exempted from having to themselves be path expressions. For example, conditionals are exempted, and so are value expressions in data symbol binding forms (including destructuring forms). The next section lists these exemptions.

Given a datum like {"a":{"b":[{"c":0},{"d":1}]}} we can have path expressions like:

  • .. => matches all paths in the input
  • .a.b[0].c => matches the path to the value 0
  • .a.b[1].d => matches the path to the value 1
  • .a[][][] => matches the paths to the two leaf values, 0 and 1
  • .a.b|.. => matches .a.b itself and all paths below it

and so on.

Examples:

$ printf '%s\n' '{"a":{"b":[{"c":0},{"d":1}]}}' | jq -c 'path(..)'
[]
["a"]
["a","b"]
["a","b",0]
["a","b",0,"c"]
["a","b",1]
["a","b",1,"d"]
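A few more probes against the same datum, as a sketch assuming jq(1) is on PATH. The last command wraps a non-path expression in try so that the run-time error is turned into a string; the catch message is made up for the example:

```shell
doc='{"a":{"b":[{"c":0},{"d":1}]}}'

one=$(printf '%s\n' "$doc" | jq -c 'path(.a.b[1].d)')
printf '%s\n' "$one"     # ["a","b",1,"d"]

below=$(printf '%s\n' "$doc" | jq -c '[path(.a.b | ..)]')
printf '%s\n' "$below"   # [["a","b"],["a","b",0],["a","b",0,"c"],["a","b",1],["a","b",1,"d"]]

# .a + .b is not a path expression, so path/1 raises an error here.
bad=$(jq -cn '{a: 1, b: 2} | try path(.a + .b) catch "not a path expression"')
printf '%s\n' "$bad"     # "not a path expression"
```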

Sub-expressions that are not Path Expressions

Here we’ll expose some of jq’s internals for the purpose of listing all of the sorts of sub-expressions of path expressions that are exempted from having to contribute to path-building. The reader can gloss over the internal details if they wish and focus only on the list of exemptions below. (XXX Perhaps we should remove all internal details?)

The jq VM interpreter has four special opcodes for dealing with path expressions:

  • PATH_BEGIN and PATH_END, which bracket calls to the path expression argument to path/1, and
  • SUBEXP_BEGIN and SUBEXP_END, which bracket calls to sub-expressions which are not intended to contribute to path building.

For example, conditional expressions in if forms are bracketed with SUBEXP_BEGIN and SUBEXP_END opcodes.

Thus we can look at all the forms where bytecode is generated via gen_subexp() to see what sorts of expressions are exempted from having to contribute to path-building:

  • evaluation of index expressions such as index_expr in .[index_expr] (see gen_index())
  • evaluation of array slice start/end expressions such as start_exp and end_exp in .[start_exp:end_exp] (see gen_slice_index())
  • evaluation of empty object construction, {} (see '{' MkDict '}' case of Term in src/parser.y)
  • evaluation of object key and value expressions in object construction syntax (see gen_dictpair())
  • evaluation of conditional expressions (see gen_cond())
  • evaluation of value expressions in data symbol binding forms (see gen_var_binding()) (i.e., in path(5 as $five | ...), the 5 does not contribute to path building, whereas path(5 | ...) would yield a run-time error)
  • evaluation of path expressions in destructuring, which is a generalized form of data symbol binding (see gen_array_matcher() and gen_object_matcher())
  • evaluation of argument expressions in calls to C-coded built-in jq functions (see expand_call_arglist())
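Three of these exemptions can be observed from the command line. In this sketch (assuming jq(1) is on PATH, with an invented input object) the index expression, the bound value expression, and the condition all read .a without contributing to the path being built:

```shell
doc='{"a":"b","b":1}'
out=$(printf '%s\n' "$doc" | jq -c '[
  path(.[.a]),
  path(.a as $k | .[$k]),
  path(if .a == "b" then .b else .a end)
]')
printf '%s\n' "$out"   # [["b"],["b"],["b"]]
```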

We have had bugs in the past relating to incorrect or missing uses of gen_subexp(), and bugs related to insufficient or excessive run-time sanity checking of path-building. See path_intact() and path_append() in src/execute.c.

Note too that path-building context can nest. That is, one can have path expressions with path expressions inside them. This is done by making path building context part of expression evaluation stack frames (jq has a stack, naturally). For example, foo = 1 where foo/0 has a body that itself uses path/1.

Assignments

jq has assignment operators. But jq values are immutable. So how can jq possibly have assignments?!

Well, assignments in jq desugar into reductions over the paths matched by the path expressions on the left-hand side (LHS) modifying the values at those paths (in the input value) according to the right-hand side (RHS) expression. Modifications are copy-on-write modifications (and, when there is just one reference to a value, the modifications are in-place as an optimization).

The use of path expressions can make jq assignments resemble Lisp generalized variables (setf macros), or Icon place references. For example, here we see a function foo functioning a lot like a Lisp generalized variable (Lisp setf macros):

$ jq -cn 'def foo: .a.b; {a:{b:{c:0}}}|(foo.c += 1)'
{"a":{"b":{"c":1}}}

The += assignment operator desugars to lhs |= . + rhs, and |= desugars into _modify(lhs; rhs), and _modify is defined as (simplified):

def _modify(paths; update):
    reduce path(paths) as $p (.; setpath($p; getpath($p) | update));

Note that the lhs in assignments ultimately gets passed to path/1, thus making the LHS of assignments… path expressions!

What does _modify/2 do? It:

  • produces all the paths in the input value matched by paths, as arrays of path component numbers and/or strings (path(paths))
  • reduces these with the original input value as the initial reduction state
  • for each path it gets the value in the input at that path (getpath($p))
  • evaluates update on that value (getpath($p) | update)
  • and finally “mutates” the reduction state value (.) by setting the new value at the same path (setpath($p; ...))

It’s important to note that values are immutable, which means that all mutation operations return a new copy of their input modified according to the desired mutation. Thus setpath(...; ...) doesn’t modify its input, but it produces a new value as its output that is a copy of the input modified according to setpath()’s arguments.

It’s also important to note that whenever there is a single reference to a value, internally jq will in fact mutate it rather than copy it, and this is obviously correct and performant.
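The simplified definition can be exercised under a different name. In this sketch my_modify is a hypothetical copy of the simplified _modify/2 from the text, compared against the real |= operator; the last command illustrates that setpath/2 leaves its input untouched:

```shell
out=$(jq -cn '
  def my_modify(paths; update):
    reduce path(paths) as $p (.; setpath($p; getpath($p) | update));
  [ ({a: {b: {c: 0}}} | my_modify(.a.b.c; . + 1)),
    ([1, 2, 3]        | my_modify(.[]; . * 10)),
    ([1, 2, 3]        | (.[] |= . * 10)) ]')
printf '%s\n' "$out"    # [{"a":{"b":{"c":1}}},[10,20,30],[10,20,30]]

# The original array is still [1,2,3] after setpath produced a modified copy.
copy=$(jq -cn '[1, 2, 3] | . as $orig | setpath([0]; 9) | [$orig, .]')
printf '%s\n' "$copy"   # [[1,2,3],[9,2,3]]
```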

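All the assignment operators except = work this way. Those that combine operators like +, -, and so on, with assignment, desugar into roughly _modify(lhs; . OPERATOR rhs), with the caveat that rhs is evaluated against the original input of the assignment and bound to a variable first (it is not evaluated against the value at each path), so that .a += .b adds the top-level .b; |= desugars into _modify(lhs; rhs).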

The = operator passes the same value as input to the RHS as the input to the lhs, and desugars into _assign(paths; value). _assign() is defined as:

def _assign(paths; value):
    value as $v | reduce path(paths) as $p (.; setpath($p; $v));

Note that _assign() applies value (the RHS) to its input once at the beginning, creates a lexical binding for that value ($v), and then sets all the paths to that value $v. Thus .[] = range(5) will produce five outputs, each with all the value slots in the . array or object set to 0, then all set to 1, and so on. This can be surprising.
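A sketch of both points, assuming jq(1) is on PATH: a generator on the RHS of = yields one whole result per value, and the RHS of = sees the same input as the LHS, whereas the RHS of |= sees the value at the path:

```shell
many=$(jq -cn '[[1, 2] | .[] = range(3)]')
printf '%s\n' "$many"     # [[0,0],[1,1],[2,2]]

plain=$(jq -cn '{a: {b: 10}, b: 20} | .a = .b')
printf '%s\n' "$plain"    # {"a":20,"b":20}

update=$(jq -cn '{a: {b: 10}, b: 20} | .a |= .b')
printf '%s\n' "$update"   # {"a":10,"b":20}
```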

In modify-assignments (|=, +=, etc.), it makes no sense to have more than one output in the value update expression. The actual _modify looks like this:

def _modify(paths; update):
    reduce path(paths) as $p (
        .;
        label $out | (setpath($p; getpath($p) | update) | ., break $out),
                      delpaths([$p]));

which means that when the value update expression outputs more than one value, only the first is used, and when it outputs no values, then the path is deleted. I.e., .a |= select(.%2 == 1) + 1 deletes .a from . if the value at .a is an even number, else it adds one to it:

$ jq -cn '{a:0,b:true}|.a |= select(.%2==1) + 1'
{"b":true}
$ jq -cn '{a:1,b:true}|.a |= select(.%2==1) + 1'
{"a":2,"b":true}

while .a |= range(5) sets .a to 0:

$ jq -cn '{a:1,b:true}|.a |= range(5)'
{"a":0,"b":true}

Built-in Functions

There are three types of built-in functions:

  • jq-coded functions

    These are functions defined in src/builtin.jq, and they are compiled as any user-defined functions.

  • bytecoded functions

    These are functions defined in src/builtin.c, and they consist of hand-crafted block representations of jq programs. (A block is an AST-ish output of the jq program parser, which straightforwardly gets compiled to bytecode.)

    For example, the empty built-in function has a one-opcode body, and that opcode is BACKTRACK.

    The full list of bytecoded built-in functions is very short, at this time being just:

    • empty/0
    • not/0
    • path/1
    • range/2

    range/1 and range/3 are jq-coded, not bytecoded; range/3 is made practical by tail recursion optimization.

  • C-coded jq functions

    These functions are defined in src/builtin.c. These functions do not actually accept thunks as arguments, only values, therefore the jq compiler wraps invocations of C-coded functions with a bytecoded wrapper that applies any argument thunks to ., roughly like so: def _jq_call_c_coded_foo(a; b): a as $a | b as $b | _call_c_coded_foo($a; $b);.

    C-coded functions have C prototypes of this form: jv name(jv input) for zero-expression-argument functions, jv name(jv input, jv a) for one-expression-argument functions, jv name(jv input, jv a, jv b) for two-expression-argument functions, and so on up to six arguments. jq-coded functions have no such limit on the number of expression arguments they accept, but they are limited to however many arguments they can address given that compiled jq-coded function bodies are limited to 2^16 opcodes per function body.
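One visible consequence of the wrapper is that a generator passed as an argument makes the C-coded function run once per value. A small sketch, assuming jq(1) is on PATH and assuming has/1 is among the C-coded builtins (as it is in the jq sources this page describes):

```shell
# "a", "c" is one argument thunk with two outputs; has/1 is called for each.
out=$(jq -cn '[{"a": 1, "b": 2} | has("a", "c")]')
printf '%s\n' "$out"   # [true,false]
```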

With the exception of if-then-else constructs, .[], and a few other such constructs, everything in jq involves applying functions.

Special Forms

[ expr ] is a special form that collects the outputs of expr into an array. It desugars into something like reduce expr as $value ([]; setpath([length]; $value)). The object constructor, { ... } is similar.
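The array constructor and a reduce-based equivalent can be compared directly. This is only a sketch of the idea, not how jq compiles [ expr ]; it assumes the path is written as an array, [length], since setpath/2 expects an array path:

```shell
out=$(jq -cn '[ [range(3)],
                (reduce range(3) as $value ([]; setpath([length]; $value))) ]')
printf '%s\n' "$out"   # [[0,1,2],[0,1,2]]
```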

If-then-else constructs are a special form.

There are a number of others, and these are all defined in src/parser.y, and are described in the manual.

A partial list of special forms follows:

  • import "name" as prefix; – imports the module named "name" and makes its symbols available as prefix::name
  • include "name"; – imports the module named "name" and makes its symbols available as if the module had been included verbatim
  • . – the current input value
  • literal values, i.e., numbers, "strings", true, false, and null
  • "this \(expr) interpolates the outputs of expr into this string" – string interpolation
  • binary infix operators
    • comparison operators: ==, !=, <, >, <=, >=
    • arithmetic operators: +, -, *, /, %
  • unary prefix negation operator -
  • [ expr ] – collect expr’s outputs into an array
  • object construction syntax (see manual)
  • term[index_expr] – output the value at index_expr in term
  • term . ident – same as term["ident"]
  • term . "name" – same as term["name"]
  • .. – recursively produce all the values in ., in pre-order
  • term[start_expr : end_expr] – array slice operator
  • expr ? – suppress errors from expr
  • label $name | ... | break $name – fancy empty that unwinds all of ...
  • assignment operator: =
  • modify-assignment operators: |=, +=, -=, *=, /=, %=
  • logical operators: and, or (not is a built-in function, not/0, rather than an operator)
  • __loc__ – evaluates to the {file: FILENAME, line: LINENO} where __loc__ occurs
  • $ident – value binding’s value
  • ident – applies function ident to .
  • ident(expr) – applies ident called with expr to .
  • ident(expr0; expr1) – applies…
  • comma operator , – outputs the values of the expression to the left, then those of the expression to the right, both expressions applied to the same input value
  • if cond_expr0 then true_expr0 elif cond_expr1 then true_expr1 ... else false_expr end
  • try expr catch handler_expr – invokes handler_expr on the error raised by expr, if any
  • reduction syntax (see above)
  • function definition (see elsewhere here)
  • data symbol binding (expr as $name | ...) and destructuring syntax (see manual)
    • expr as $name | ...
    • expr as [$name, $other_name] | ...
    • expr as {$name, $other_name} | ...
    • expr as {$name:[$thing1, $thing2], $other_name} | ...
  • @sh, @json, @csv, @tsv, @html, @uri, @base64, @base64d – format / escape string forms
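A few of the special forms listed above, tried out from the command line. This is a sketch that assumes a jq executable on the PATH; the symbol names ($out, $x, $y, $z) and the sample values are arbitrary.

```sh
# label/break: stop the generator as soon as 2 is seen.
jq -nc '[label $out | (1, 2, 3) | if . == 2 then break $out else . end]'
# → [1]

# try/catch: the handler receives the error message as its input.
jq -nc 'try error("boom") catch .'
# → "boom"

# Postfix ?: the error raised by tonumber on "a" is suppressed.
jq -nc '[1, "a", 2] | [.[] | tonumber?]'
# → [1,2]

# Destructuring binds several data symbols at once.
jq -nc '{"a": 1, "b": [2, 3]} | . as {a: $x, b: [$y, $z]} | [$x, $y, $z]'
# → [1,2,3]

# String interpolation, then a format string applied to its input.
jq -nr '"sum: \(1 + 2)"'
# → sum: 3
jq -nr '"hi" | @base64'
# → aGk=
```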

Note that path(expr), though very special, is not a special form. path(expr) is a bytecoded function whose body invokes its argument expression thunk bracketed with opcodes that cause the paths in . traversed by that expression to be recorded and output one by one.
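As an illustration of paths being output one by one, here is a small sketch, assuming a jq executable on the PATH; the input objects are made up for the example.

```sh
# Each value produced by .a[] yields the path that was traversed to reach it.
jq -nc '{"a": [1, 2]} | [path(.a[])]'
# → [["a",0],["a",1]]

# .. visits . itself first (the empty path), then descends in pre-order.
jq -nc '{"a": {"b": 1}} | [path(..)]'
# → [[],["a"],["a","b"]]
```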

Operators Priority

Operator                 Associativity  Description
(...)                                   scope delimiter and grouping operator
|                        right          compose/sequence two filters
,                        left           concatenate/alternate two filters
//                       right          coerces null, false and empty to an alternative value
= |= += -= *= /= %= //=  nonassoc       assign; update
or                       left           boolean “or”
and                      left           boolean “and”
== != < > <= >=          nonassoc       equivalence and precedence tests
+ -                      left           polymorphic plus and minus
* / %                    left           polymorphic multiply and divide; modulo
-                        none           prefix negation
?                        none           postfix operator, coerces errors to empty
?//                      nonassoc       destructuring alternative operator
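The effect of these priorities can be checked with a few one-liners. This is a sketch assuming a jq executable on the PATH; it exercises only the pipe, comma, alternative, and arithmetic operators.

```sh
# | binds more loosely than , so both 1 and 2 flow into . + 1.
jq -nc '[1, 2 | . + 1]'
# → [2,3]

# * binds more tightly than +.
jq -nc '1 + 2 * 3'
# → 7

# // binds more tightly than , and supplies the alternative when the
# left side is null, false, or produces no output at all.
jq -nc '[null // "x", empty // "y"]'
# → ["x","y"]
```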

List of Built-in Functions

Use jq -nr 'builtins[]' to list all the built-in functions.

At this time that list includes:

  • IN/1
  • IN/2
  • INDEX/1
  • INDEX/2
  • IN_INDEX/2
  • JOIN/2
  • JOIN/3
  • JOIN/4
  • LOOKUP/2
  • UNIQUE_INDEX/2
  • acos/0
  • acosh/0
  • add/0
  • all/0
  • all/1
  • all/2
  • any/0
  • any/1
  • any/2
  • arrays/0
  • ascii_downcase/0
  • ascii_upcase/0
  • asin/0
  • asinh/0
  • atan/0
  • atan2/2
  • atanh/0
  • booleans/0
  • bsearch/1
  • builtins/0
  • capture/1
  • capture/2
  • cbrt/0
  • ceil/0
  • combinations/0
  • combinations/1
  • contains/1
  • copysign/2
  • cos/0
  • cosh/0
  • debug/0
  • del/1
  • delpaths/1
  • drem/2
  • empty/0
  • endswith/1
  • env/0
  • erf/0
  • erfc/0
  • error/0
  • error/1
  • exp/0
  • exp10/0
  • exp2/0
  • explode/0
  • expm1/0
  • fabs/0
  • fdim/2
  • finites/0
  • first/0
  • first/1
  • flatten/0
  • flatten/1
  • floor/0
  • fma/3
  • fmax/2
  • fmin/2
  • fmod/2
  • format/1
  • frexp/0
  • from_entries/0
  • fromdate/0
  • fromdateiso8601/0
  • fromjson/0
  • fromstream/1
  • gamma/0
  • get_jq_origin/0
  • get_prog_origin/0
  • get_search_list/0
  • getpath/1
  • gmtime/0
  • group_by/1
  • gsub/2
  • gsub/3
  • halt/0
  • halt_error/0
  • halt_error/1
  • has/1
  • hypot/2
  • implode/0
  • in/1
  • index/1
  • indices/1
  • infinite/0
  • input/0
  • input_filename/0
  • input_line_number/0
  • inputs/0
  • inside/1
  • isempty/1
  • isfinite/0
  • isinfinite/0
  • isnan/0
  • isnormal/0
  • iterables/0
  • j0/0
  • j1/0
  • jn/2
  • join/1
  • keys/0
  • keys_unsorted/0
  • last/0
  • last/1
  • ldexp/2
  • leaf_paths/0
  • length/0
  • lgamma/0
  • lgamma_r/0
  • limit/2
  • localtime/0
  • log/0
  • log10/0
  • log1p/0
  • log2/0
  • logb/0
  • ltrimstr/1
  • map/1
  • map_values/1
  • match/1
  • match/2
  • max/0
  • max_by/1
  • min/0
  • min_by/1
  • mktime/0
  • modf/0
  • modulemeta/0
  • nan/0
  • nearbyint/0
  • nextafter/2
  • nexttoward/2
  • normals/0
  • not/0
  • now/0
  • nth/1
  • nth/2
  • nulls/0
  • numbers/0
  • objects/0
  • path/1
  • paths/0
  • paths/1
  • pow/2
  • pow10/0
  • range/1
  • range/2
  • range/3
  • recurse/0
  • recurse/1
  • recurse/2
  • recurse_down/0
  • remainder/2
  • repeat/1
  • reverse/0
  • rindex/1
  • rint/0
  • round/0
  • rtrimstr/1
  • scalars/0
  • scalars_or_empty/0
  • scalb/2
  • scalbln/2
  • scan/1
  • select/1
  • setpath/2
  • significand/0
  • sin/0
  • sinh/0
  • sort/0
  • sort_by/1
  • split/1
  • split/2
  • splits/1
  • splits/2
  • sqrt/0
  • startswith/1
  • stderr/0
  • strflocaltime/1
  • strftime/1
  • strings/0
  • strptime/1
  • sub/2
  • sub/3
  • tan/0
  • tanh/0
  • test/1
  • test/2
  • tgamma/0
  • to_entries/0
  • todate/0
  • todateiso8601/0
  • tojson/0
  • tonumber/0
  • tostream/0
  • tostring/0
  • transpose/0
  • trunc/0
  • truncate_stream/1
  • type/0
  • unique/0
  • unique_by/1
  • until/2
  • utf8bytelength/0
  • values/0
  • walk/1
  • while/2
  • with_entries/1
  • y0/0
  • y1/0
  • yn/2
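Because builtins produces an ordinary array of name/arity strings, the listing can be filtered with jq itself. The following sketch assumes a jq executable on the PATH; the exact contents and count differ between jq versions, so no output is shown for the first two commands.

```sh
# Count the built-ins known to this jq build.
jq -n 'builtins | length'

# List every arity of range, sorted.
jq -nr 'builtins | map(select(startswith("range/"))) | sort | .[]'

# Check whether a particular name/arity pair is present.
jq -n 'builtins | any(. == "empty/0")'
# → true
```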

Side-Effects

Most jq built-in functions are pure, but over time we have added a few impure functions:

  • input – read one input from the standard input (or whatever the jq command-line processor wants to read from)
  • inputs – read as many inputs from the standard input as possible (or whatever the jq command-line processor wants to read from)
  • input_filename – name of the file whose input is currently being filtered
  • debug – output its input to the standard error output
  • halt/0, halt_error/0, and halt_error/1
  • now – current time
  • $ENV and env – access environment variables
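Some of these impure functions can be demonstrated briefly. This is a sketch assuming a jq executable on the PATH and a POSIX shell; FOO is a made-up environment variable used only for illustration.

```sh
# input consumes the next value from the input stream; inputs consumes the rest.
printf '1 2 3' | jq -nc '[input, [inputs]]'
# → [1,[2,3]]

# $ENV and env both expose the process environment.
FOO=bar jq -nr '$ENV.FOO, env.FOO'
# → bar
# → bar

# debug writes a message to standard error and passes its input through.
jq -nc '1 | debug | . + 1' 2>/dev/null
# → 2
```

The -n option matters in the first command: without it, the jq command-line processor would itself read the first input before the program gets to call input.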

Side-Effects Wish-list

We’d like to add:

  • random numbers
  • file I/O
  • external command I/O
  • SQLite3 access
  • ...

Keywords

The jq language has relatively few keywords. These cannot be used as function or data symbols, but they can be used as keys in object construction syntax. (We could allow keywords as data symbols, but not as function symbols.)

If we update jq to allow keywords as data symbols, we will also allow keywords in destructuring syntax.

Keywords:

  • __loc__
  • and
  • as
  • break
  • catch
  • def
  • elif
  • else
  • end
  • foreach
  • if
  • import
  • include
  • label
  • module
  • or
  • reduce
  • then
  • try
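The rules for keywords stated in this section can be tried out as follows. This is a sketch assuming a jq executable on the PATH; the keys and the function definition are arbitrary examples.

```sh
# Keywords are accepted as keys in object construction syntax.
jq -nc '{if: 1, reduce: 2}'
# → {"if":1,"reduce":2}

# A keyword is not accepted as a function symbol, so compilation fails.
jq -n 'def if: 1; if' 2>/dev/null || echo "rejected"
# → rejected
```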