JEP 11: The let() Function #6

Closed


jamesls
Member

@jamesls jamesls commented Feb 25, 2015

The JEP goes into detail about why we need this and how it works. I'll also link compliance tests and a sample implementation to this PR shortly.

cc @mtdowling

jamesls added a commit to jmespath/jmespath.test that referenced this pull request Feb 25, 2015
jamesls added a commit to jamesls/jmespath that referenced this pull request Feb 25, 2015
Proposed in jmespath/jmespath.site#6.

I tried to pick an implementation that was as minimally
invasive as possible.

It works by making three changes.

First we need to track scope, and share this information between
the interpreter and the function module.  They both take a
reference to a scope object that allows you to push/pop scopes.
The ``let()`` function will push the user provided lexical scope
onto the scope chain before evaluating the expref, and pop the
scope after evaluating the expref.

The second change needed is to change how identifiers are resolved.
This corresponds to visiting the ``field`` AST node.  As detailed
in JEP 11, after failing to resolve the field in the current object,
we call back to the scope chain.

The third change is to bind the current value (the context) in which
an expref is first created.  This wasn't needed before because for
functions that take an expref, such as ``sort_by``, ``max_by``, and
``min_by``, they evaluate the expref in the context of each list
element.  However, with ``let()``, we want to evaluate the expref
in the context of the current object as specified when the expref
was created.  This also tracks the current object properly in the
case of nested ``let()`` calls.
jamesls added a commit to jmespath/jmespath.test that referenced this pull request Feb 25, 2015
@jamesls
Member Author

jamesls commented Feb 25, 2015

Ok I've linked:

So far everything seems reasonable. Semantics make sense to me. We're borrowing from existing languages where this feature has been around for a really long time so we're using well tested concepts. The python implementation was really straightforward and concise (IMO). I'd imagine other implementations would have the same order of magnitude of code changes to implement this.

I'm curious to hear what others think, but this JEP is growing on me.

@jamesls
Member Author

jamesls commented Feb 25, 2015

cc @kyleknap @danielgtaylor, if you want to comment, though don't feel obligated.


Prior to this JEP, identifiers are resolved by consulting the current context
in which the expression is evaluated. For example, using the same
``search`` function as defined in the JMESPath specification, the
Contributor

This part threw me off, so a minor point of feedback might be to link to what you're talking about. I thought search was a JMESPath function.

@mtdowling
Contributor

This is awesome. While it does complicate JMESPath a bit and requires non-trivial changes to the php, javascript, lua, clojure, ruby, etc... implementations... I think this feature is worth it. 👍

jamesls added a commit to jmespath/jmespath.js that referenced this pull request Feb 25, 2015
Proposed in jmespath/jmespath.site#6.

I think the code could use a bit of cleanup, but all the
compliance tests are passing.
@jamesls
Member Author

jamesls commented Feb 25, 2015

FWIW, I also have an initial implementation of the JavaScript version (jmespath/jmespath.js@25c22b2), and I would say the implementation is borderline trivial, which is what I like (and what surprised me) about this JEP. The same goes for the Python implementation. Really all I needed was:

  1. Allow for the runtime/function modules to modify scope. For me, I just needed a way to push/pop scope objects.
  2. Change the lookup process for identifiers. After failing a lookup in the current object, fall back to scope chain lookups.
  3. Have some way to bind the current object when the expref arg is initially evaluated (not when the expression referenced by the expref is evaluated). Because all we see in a function body are the evaluated arguments, we need to know what the current object/context was when the function was called so that the expref has an input object. I didn't previously need this because functions like sort_by, max_by, and min_by evaluated the expref in the context of each individual list element, which was provided as an input argument. Here we need to know what the current object was at the time the let() args are resolved.

There might be a better way to do this, but as far as I can tell, this handles all the cases I can think of, is pretty easy to follow, and has very minimal runtime overhead. It should also be easy to generalize if other functions or things in the runtime want the ability to create lexical scopes.
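
The three pieces listed above can be sketched roughly as follows. This is only an illustration, not the code from the linked commits: `ScopeChain`, `BoundExpref`, `lookup_field`, and `func_let` are hypothetical names, and treating a missing key as the only kind of "failed lookup" is an assumption.

```python
class ScopeChain:
    """Stack of dicts; the most recently pushed scope is searched first."""

    def __init__(self):
        self._scopes = []

    def push(self, scope):
        self._scopes.append(scope)

    def pop(self):
        self._scopes.pop()

    def resolve(self, name):
        for scope in reversed(self._scopes):
            if name in scope:
                return scope[name]
        return None


def lookup_field(name, current, scopes):
    # Piece 2: try the current object first, then fall back to the scope chain.
    if not isinstance(current, dict):
        return None  # non-objects evaluate to null, with no fallback
    if name in current:
        return current[name]
    return scopes.resolve(name)


class BoundExpref:
    # Piece 3: remember the current object at the time the expref is created.
    def __init__(self, func, context):
        self.func = func
        self.context = context


def func_let(scope, expref, scopes):
    # Piece 1: push the user-provided scope, evaluate, and always pop.
    scopes.push(scope)
    try:
        return expref.func(expref.context, scopes)
    finally:
        scopes.pop()


scopes = ScopeChain()
data = {"foo": {"bar": "baz"}}
# Stands in for the expression: let({qux: `qux`}, &foo.qux)
expref = BoundExpref(
    lambda current, s: lookup_field("qux", lookup_field("foo", current, s), s),
    data,
)
result = func_let({"qux": "qux"}, expref, scopes)
print(result)  # qux
```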

@jamesls
Member Author

jamesls commented Feb 25, 2015

Although I did notice a few interesting scenarios.

First one's not so bad, but consider this:

>>> data = {"foo": {"bar": "baz"}}
>>> jmespath.search('let({qux: `qux`}, &foo.qux)', data)
'qux'

In this scenario we're evaluating foo.qux. foo evaluates to {"bar": "baz"}. So far so good. Next we evaluate the RHS of the sub expression, qux. We see that qux is not defined in the current object {"bar": "baz"}, so we look in the scope object and see it's defined as "qux", so we use that value. I can see how some people might find that confusing. Maybe not.

The second one is more interesting. Right now, trying to evaluate an identifier on something that's not a JSON object, results in null:

>>> data = [0, 1, 2, 3, 4]
>>> print jmespath.search("foo", data)
None

However, what about something like this:

>>> data = [0, 1, 2, 3, 4]
>>> print jmespath.search('let({foo: `foo`}, &[0].foo)', data)

What would you expect that to print? The JEP isn't actually clear about this. You could say this is evaluated as:

  1. It's a sub expr, so evaluate the LHS then the RHS.
  2. The LHS is [0], which evaluates to 0.
  3. The right hand side is foo. We lookup foo in 0, which results in null.
  4. If we treat this null as a "failed lookup", we then defer to the scope chain, which has foo set to "foo", so the return value from this function is "foo".

On the other hand, you could say trying to evaluate an identifier with an input/current object that's not a JSON object will result in a null and will not fallback to scope chain lookups. This is what I currently have implemented in javascript/python, but I can see how someone might expect this to evaluate to "foo".
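
To make the two readings concrete, here is a small sketch with the choice exposed as a flag. The function name and the `fallback_on_non_object` parameter are made up for illustration; neither implementation has such a switch.

```python
def lookup_field(name, current, scope, fallback_on_non_object):
    if isinstance(current, dict):
        if name in current:
            return current[name]
        return scope.get(name)
    # current is a number, string, list, etc.
    return scope.get(name) if fallback_on_non_object else None


data = [0, 1, 2, 3, 4]
scope = {"foo": "foo"}   # let({foo: `foo`}, ...)
lhs = data[0]            # [0] evaluates to 0

strict = lookup_field("foo", lhs, scope, fallback_on_non_object=False)
lenient = lookup_field("foo", lhs, scope, fallback_on_non_object=True)
print(strict, lenient)   # None foo
```

The `strict` result matches what is currently implemented in JavaScript/Python; the `lenient` result is the reading where a null from a non-object counts as a failed lookup.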

@mtdowling
Contributor

The first case you brought up is interesting. I feel that the behavior you've outlined is confusing and not what I'd expect.

>>> data = {"foo": {"bar": "baz"}}
>>> jmespath.search('let({qux: `qux`}, &foo.qux)', data)
'qux'

In this case, I think that once you've descended into foo, you've entered into a new scope. In this new scope, qux is not defined and should evaluate to null.

@mtdowling
Contributor

We've spoken about this in person, but I wanted to leave feedback here to hopefully help drive this proposal forward. I stated this earlier, but I think that the current behavior of this JEP would be confusing.

I think a better approach would be to allow for variables to be bound to specific scopes, which could then be referenced using a specific sigil. The first sigil that comes to mind for me is $. Perhaps a let function could have the same signature it does now, but instead of adding arbitrary identifiers to the bound expref, it could add named variables. These variables would then be referenced using $name notation, with a var = "$" identifier ABNF grammar rule.

Each scope would have access to the variables bound to that scope and any parent scope, and any variables bound in a specific let function that have the same name as the parent scope would override the parent scope. I don't think there needs to be a specific way to get the value of the same name from a parent scope (you could copy a variable binding to a new name in the child scope to work around this if necessary). Accessing an unbound variable would result in a parse or runtime error (depending on the sophistication of the parser).

Here's an example.

Given {"a": 0}

let({foo: `"bar"`, baz: a}, &
  [$foo, $baz])

Result: ["bar", 0]

What do you think?
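
A rough model of the scoping rules described above (child scopes see parent variables, same-name bindings override the parent, unbound variables are an error), using Python's `collections.ChainMap`. The `lookup` helper and `UnboundVariable` exception are hypothetical names used only for this sketch.

```python
from collections import ChainMap


class UnboundVariable(Exception):
    pass


def lookup(bindings, name):
    try:
        return bindings[name]
    except KeyError:
        raise UnboundVariable(f"${name} is not bound") from None


current = {"a": 0}
# let({foo: `"bar"`, baz: a}, &[$foo, $baz])
outer = ChainMap({"foo": "bar", "baz": current["a"]})
result = [lookup(outer, "foo"), lookup(outer, "baz")]
print(result)  # ['bar', 0]

# A nested let that rebinds foo shadows the parent without modifying it.
inner = outer.new_child({"foo": "shadowed"})
print(lookup(inner, "foo"), lookup(inner, "baz"), lookup(outer, "foo"))
# shadowed 0 bar
```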

@jamesls
Member Author

jamesls commented Apr 12, 2015

I like that better, though we still have to define where exactly these variables are valid. For instance, in my example above I had let({foo: `foo`}, &[0].foo). The equivalent expression using the $ suggestion would be let({foo: `foo`}, &[0].$foo), which I wouldn't want to allow. However, I think it's reasonable to support something like $foo.bar. So we'd need updated grammar rules for when you can use these variables. It might be easier to describe this once #2 is done.

As for the specific char, I think my only hesitation was what character we'd use for dereferencing an expref (likely a later JEP). Given the expref char is &, it would have been awesome to use * as the complement, but I'm not entirely sure off hand how feasible that is given the current use of * in the existing grammar. Other than that, I think $ works.

@mtdowling
Contributor

Awesome. Maybe we can use @ to deref (like clojure)? I think this could be done in a way that does not conflict with the current-node grammar.

I think a variable should be allowed in the same places you would see the current-node token.

@jamesls
Member Author

jamesls commented Apr 14, 2015

Not opposed to it, but my preference is to use a separate token that's not being used yet if possible.

@mtdowling
Contributor

I was thinking about let expressions recently, and I thought of a potential new syntax that might make it easier to read and work with. It would require built-in support and not utilize functions. We would add a new operator = that is an assignment. This would assign a variable to a named value returned by an expression, and you would then pipe this to expressions that would have the bound scope.

This expression would assign $foo to the current node and pipe that bound variable to an expression.

foo = @ | $foo.bar

We could add destructuring as well:

{foo: bar, baz: bam} = @ | not_null($foo, $baz)

The above expression would assign $foo to @.bar and $baz to @.bam.

We could potentially add list destructuring as well, but I'm not sure if it's necessary:

[foo, bar] = @ | not_null($foo, $bar)

The above expression would take @[0] and assign it to $foo and @[1] to $bar.
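
As a sketch of what the two destructuring forms would compute, assuming the right-hand sides are plain field names and that a missing key or index binds null (both are assumptions, as are the helper names):

```python
def bind_hash(pattern, current):
    """pattern maps a variable name to a field looked up on the current node."""
    return {var: current.get(field) for var, field in pattern.items()}


def bind_list(names, current):
    """Bind names positionally; positions past the end become None (null)."""
    return {
        var: (current[i] if i < len(current) else None)
        for i, var in enumerate(names)
    }


def not_null(*values):
    """Return the first argument that is not None, like JMESPath's not_null()."""
    return next((v for v in values if v is not None), None)


# {foo: bar, baz: bam} = @ | not_null($foo, $baz)
hash_bindings = bind_hash({"foo": "bar", "baz": "bam"}, {"bar": None, "bam": 2})
hash_result = not_null(hash_bindings["foo"], hash_bindings["baz"])

# [foo, bar] = @ | not_null($foo, $bar)
list_bindings = bind_list(["foo", "bar"], [10, 20])
list_result = not_null(list_bindings["foo"], list_bindings["bar"])

print(hash_result, list_result)  # 2 10
```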

Each variable assignment would only be scoped to child subexpressions. So given the following expression, trying to get the value of $foo would be null (or maybe fail?).

[foo = bar | $foo.qux, $foo]


Another option, instead of using = with destructuring, we could use {} => RHS where {} are key value pair bindings exactly how multi-select-hash works, and RHS is the expression to evaluate with those bindings:

{foo: bar, baz: bam} => $foo

The new ast node would be something like Assignment with a LHS and RHS. LHS are the bindings and RHS is the expression to interpret. One advantage of using this over pipe is that we can have a much more optimized tree interpreter that doesn't need to worry about pushing and popping binding frames. If we just have something like an Assignment node, then we know exactly where to push and pop frames.

Finally, I would also recommend that expression references would now close over the variable bindings available when they are created. When executing an expref, you would merge in the current bindings over the closed over bindings.
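
A minimal sketch of that closure rule, assuming bindings are plain dicts. The `Expref` class here is hypothetical and only models the merge order (current bindings over closed-over bindings):

```python
class Expref:
    def __init__(self, func, closed_over):
        self.func = func
        self.closed_over = dict(closed_over)  # snapshot taken at creation time

    def __call__(self, current, bindings):
        # Bindings active at call time win over the ones captured at creation.
        merged = {**self.closed_over, **bindings}
        return self.func(current, merged)


creation_bindings = {"foo": 1, "baz": 2}
expref = Expref(lambda current, b: [b["foo"], b["baz"], current], creation_bindings)

# Later mutation of the creation-time dict does not leak into the expref.
creation_bindings["foo"] = 99

result = expref("node", {"baz": 20})
print(result)  # [1, 20, 'node']
```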

The => syntax is similar to Scala's anonymous functions: http://docs.scala-lang.org/tutorials/tour/anonymous-function-syntax.html. And C#: https://msdn.microsoft.com/en-us/library/bb397687.aspx. And Hack: https://docs.hhvm.com/hack/lambdas/introduction. And D: https://dlang.org/spec/expression.html#Lambda.

Various other languages use ->. Lots of inspiration and comparisons can be made using this collection of languages: http://rosettacode.org/wiki/Higher-order_functions

@ghost

ghost commented Sep 27, 2019

I want this!

@innovate-invent

innovate-invent commented Mar 5, 2021

I took a different approach to how this was implemented. Rather than try to implement a stack I just used the call stack for the scope. This required making all visit_* functions accept **kwargs and forward the kwargs to any self.visit() call within those functions. Any time let() is called it creates a new scope dict and merges in any previous scope dict and passes it to the visit() call.
I found with your implementation that it was possible for the scope to be popped if an expression ever returns a deferred value (such as a generator). When the generator goes to access the scope stack, that value will have already been popped.

My implementation can be viewed here: https://github.com/brinkmanlab/BioPython-Convert/blob/master/biopython_convert/JMESPathGen.py

Forwarding kwargs also paves the way for future functionality to pass arbitrary values down the expression stack.

A third alternative if the kwargs solution is not satisfactory is to modify the visit_field() function to leverage the inspect library to trace up the call stack to the call to _func_let() and pull out a local scope variable. This is demonstrated here: https://stackoverflow.com/a/14694234
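
The deferred-value problem can be reproduced outside JMESPath with a few lines. This is a standalone illustration, not code from either implementation; all function names are invented for the example.

```python
scope_stack = []


def let_with_stack(scope, body):
    # Stack-based approach: the scope is popped as soon as body() returns.
    scope_stack.append(scope)
    try:
        return body()
    finally:
        scope_stack.pop()


def lazy_body_stack():
    # Generator: nothing below runs until the caller iterates.
    for name in ("a", "b"):
        yield scope_stack[-1][name] if scope_stack else None


def let_with_argument(scope, body, parent=None):
    # Argument-passing approach: the merged scope travels with the call.
    merged = {**(parent or {}), **scope}
    return body(merged)


def lazy_body_argument(scope):
    for name in ("a", "b"):
        yield scope[name]


broken = list(let_with_stack({"a": 1, "b": 2}, lazy_body_stack))
working = list(let_with_argument({"a": 1, "b": 2}, lazy_body_argument))
print(broken)   # [None, None]
print(working)  # [1, 2]
```

In the stack version the generator is consumed after the `finally` block has already popped the scope, so it sees an empty stack; in the argument version the generator keeps its own reference to the merged dict.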

@jamesls
Member Author

jamesls commented Mar 10, 2023

It looks like this feature is picking up interest again, so to help
move this discussion forward, here's my thoughts coming back to
this after a while:

I'm mostly convinced that a let() function is the wrong thing to do. We are
fundamentally changing how we reference values from the semantics of an
implicit "current node" context to this idea of binding and referencing values
with scope. Putting this behind a special function that has the ability to
change this process would not only be confusing to users (you just have to know
that let() is magic) but also a source of errors that would be hard to track
down. We've already seen this to some extent in various comments (which
identifiers should be looked up in scope &foo.[a, b]), which leads to some
incredibly unintuitive behavior with what I originally proposed:

>>> data = {'foo': {'bar': {}}}
>>> # Alternative results depending on if the number of `.bar`'s is even or odd!
>>> jmespath.search('let({bar: foo}, &foo.bar)', data)
{}
>>> jmespath.search('let({bar: foo}, &foo.bar.bar)', data)
{'bar': {}}
>>> jmespath.search('let({bar: foo}, &foo.bar.bar.bar)', data)
{}
>>> jmespath.search('let({bar: foo}, &foo.bar.bar.bar.bar)', data)
{'bar': {}}
>>> jmespath.search('let({bar: foo}, &foo.bar.bar.bar.bar.bar)', data)
{}
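
The alternation falls straight out of the "fall back to scope on a failed lookup" rule. A standalone sketch that mimics that rule for dotted paths only (it is not the jmespath library, and `search_with_fallback` is a made-up helper):

```python
def search_with_fallback(path, data, scope):
    current = data
    for name in path.split("."):
        if not isinstance(current, dict):
            current = None
        elif name in current:
            current = current[name]      # found in the current object
        else:
            current = scope.get(name)    # failed lookup: consult the scope
    return current


data = {"foo": {"bar": {}}}
scope = {"bar": data["foo"]}  # let({bar: foo}, ...)

results = [
    search_with_fallback("foo" + ".bar" * n, data, scope) for n in range(1, 6)
]
print(results)  # [{}, {'bar': {}}, {}, {'bar': {}}, {}]
```

Every second `.bar` lands on the empty object, misses, and jumps back to the scoped value, which is why the parity of the path length decides the result.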

So where does that leave us?

Before working through alternative proposals, here's the properties I think the
design should have:

  1. There's a syntactic distinction between the existing "current node" lookups
    and scoped variable lookups.

Rationale: They are fundamentally different types of lookups with different
evaluation rules (undefined variables should be an error), so having explicit
syntax lets the user clearly state their intent. It also lets implementations
more easily optimize variable lookups as we're not touching the existing
current node lookup process. I liked the proposal with the $myvar syntax.

  2. We create new tokens/syntax to indicate we want to evaluate an expression with a
    given scope. That can be a combination of assignment tokens to denote
    bindings and/or a token to delimit the start of the evaluated expression, e.g. ->,
    =>, etc. I'm even open to the possibility of introducing keywords into the
    language. We'll have to consider the backwards compatibility constraints, but
    if it makes the expressions easier to read I'm open to it.

Rationale: In addition to simplifying the parser (a starting keyword/token would
let a parser immediately know it should parse this as a scope lookup), it also
simplifies the evaluation process because it defines exactly where the scope is
valid. In looking at approaches other query/expression languages take,
reusing an existing token such as | to denote the start of the expression to
evaluate as well as an expr | expr pipe expression makes it harder to see
which version you're using (is it a let expression or a pipe expression). You
also have to know the precedence rules of both to know the expression boundaries.

Updated proposal

So here's an updated proposal to get the ball rolling. The reason we
originally proposed the let function was because just about every functional
programming language has some let <bindings> in <expr> syntax:

So let's just use that syntax. In pseudo-grammar:

let-expr: let <bindings> in <expr>
bindings: varref = <expr>
        | placeholder for alternate destructuring syntax
varref: $<identifier>

Examples:

Basic usage, binding top level keys:

Expression:

let $newvar = top in foo.{foo: bar, other: $newvar}

Input:

{'foo': {'bar': 'baz'}, 'top': 'top-value'}

Result:

{'foo': 'baz', 'other': 'top-value'}

Chained scope showing root key binding, an inner key binding,
and an inline let expression within a multiselect. The
let $each = @ in $each.z is the same thing as z, but I
wanted to show you can use a let inline wherever you could
normally use an expression:

Expression:

    let $topkey = top
    in
        bar |
        let $barscope = barscope
        in
            listval[*].[a, let $each = @ in $each.z, $barscope, $topkey.other]


Input:

{
    'foo': {'bar': 'baz'},
    'top': {'other': 'other-value'},
    'bar': {
        'listval': [{'a': 1, 'z': 5}, {'a': 2, 'z': 6}, {'a': 3, 'z': 7}],
        'barscope': 'innervar',
     }
}


Result:

[
  [1, 5, 'innervar', 'other-value'],
  [2, 6, 'innervar', 'other-value'],
  [3, 7, 'innervar', 'other-value']
]

Let me know what you all think. If this seems like the right direction,
I can update the JEP and sketch out the python implementation to see
if any issues come up.

cc @mtdowling

@eddycharly

eddycharly commented Mar 10, 2023

@jamesls thanks for updating the proposal !

Am I right thinking that in the new proposal the notion of scopes is gone ?

I think we need to make sure things like let $newvar = top in foo.{foo: bar, other: $newvar.$newvar} is not allowed, basically it should not be a fallback of trying to access a field.

And in your example let each = @ in $each.z shouldn't it be let $each = @ in $each.z ($ is missing when declaring the binding) ?

@jamesls
Member Author

jamesls commented Mar 10, 2023

Am I right thinking that in the new proposal the notion of scopes is gone ?

Scopes are still there. A $var lookup will try to lookup the variable in its current scope, then in its parent scope in succession until there are no scopes left. So in the last example:

    let $topkey = top
    in
        bar |
        let $barscope = barscope
        in
            listval[*].[a, let $each = @ in $each.z, $barscope, $topkey.other]

The $each is pulled from the innermost scope (within the multiselect), the $barscope is pulled from the parent scope let $barscope = barscope, and the $topkey is pulled from the first let statement in the outermost scope. Similarly, the bindings are only valid within the expression of the let, so if the last line instead was:

    let $topkey = top
    in
        bar |
        let $barscope = barscope
        in
            listval[*].[a, let $each = @ in $each.z, $barscope, $topkey.other] | $each
                                                                                 ^^^^^
                                                                                 |
                                                                                 ---- Invalid, $each doesn't exist anymore                   

you'd get an error because $each doesn't exist anymore.

I think we need to make sure things like let $newvar = top in foo.{foo: bar, other: $newvar.$newvar} is not allowed, basically it should not be a fallback of trying to access a field.

Yep that's why I like the idea of an explicit sigil for variable references, e.g. $foo. The grammar rule would have a new varref = '$' identifier, so $newvar.$newvar wouldn't be allowed by the parser because the sub-expression rule would still be:

sub-expression    = expression "." ( identifier /
                                     multi-select-list /
                                     multi-select-hash /
                                     function-expression /
                                     "*" )

And in your example let each = @ in $each.z shouldn't it be let $each = @ in $each.z ($ is missing when declaring the binding) ?

Good catch, updated.

@eddycharly

eddycharly commented Mar 10, 2023

Scopes are still there. A $var lookup will try to lookup the variable in its current scope, then in its parent scope in succession until there are no scopes left. So in the last example:

    let $topkey = top
    in
        bar |
        let $barscope = barscope
        in
            listval[*].[a, let $each = @ in $each.z, $barscope, $topkey.other]

It looks to me that we don't need to chain scopes anymore, at any point a flat binding structure would be enough.

The $each is pulled from the innermost scope (within the multiselect), the $barscope is pulled from the parent scope let $barscope = barscope, and the $topkey is pulled from the first let statement in the outermost scope. Similarly, the bindings are only valid within the expression of the let, so if the last line instead was:

    let $topkey = top
    in
        bar |
        let $barscope = barscope
        in
            listval[*].[a, let $each = @ in $each.z, $barscope, $topkey.other] | $each
                                                                                 ^^^^^
                                                                                 |
                                                                                 ---- Invalid, $each doesn't exist anymore                   

you'd get an error because $each doesn't exist anymore.

You mean at compile time ? Because we can now validate at compile time a reference to a binding is valid or not ?

Yep that's why I like the idea of an explicit sigil for variable references, e.g. $foo. The grammar rule would have a new varref = '$' identifier, so $newvar.$newvar wouldn't be allowed by the parser because the sub-expression rule would still be:

sub-expression    = expression "." ( identifier /
                                     multi-select-list /
                                     multi-select-hash /
                                     function-expression /
                                     "*" )

I guess $newvar.$newvar wouldn't translate to:

{
  "type": "Subexpression",
  "children": [
    {
      "type": "Field",
      "name": "$newvar"
    },
    {
      "type": "Field",
      "name": "$newvar"
    }
  ],
  "jmespathType": "Expref"
}

Right ?

@jamesls
Member Author

jamesls commented Mar 10, 2023

It looks to me that we don't need to chain scopes anymore, at any point a flat binding structure would be enough.

Keep in mind you can shadow variables from an outer scope. Here's a somewhat convoluted example that demonstrates the idea. In this example, suppose I'm calling this from some host language where the input data is called root, and the evaluation result is called results (and pretend jmespath has comments with //).

Input:

root = {
    'key': 'rootvalue',
    'subscope1': {
        'key': 'subscope1-value',
        'subscope2': {
            'key': 'subscope2-value'
        }
    }
}

Expression:

let $scope = @                        // $scope is `root`
in [
  $scope.key,                         // <---- results[0]
  $scope.subscope1.[                  // "Current node" changes from sub-expr
    let $scope = @                    // $scope is now `root['subscope1']`
    in [
      $scope.key,                     // <---- results[1]
      $scope.subscope2.[              // "Current node" changes from sub-expr
        let $scope = @                // $scope is now `root['subscope1']['subscope2']`
        in $scope.key                 // <---- results[2]
      ],                              // $scope is now back to `root['subscope1']`
      $scope.key                      // <---- results[3]
    ][]                               // $scope is now back to `root`
  ],
  $scope.key                          // <---- results[4]
][][]



Result:

results = ['rootvalue', 'subscope1-value', 'subscope2-value', 'subscope1-value', 'rootvalue']

Notice how every expression in the results list is $scope.key, but depending on where we are in the scope chain, they can evaluate to different values. So we still need the concept of pushing/popping scopes to support this lexical scoping.

You mean at compile time ? Because we can now validate at compile time a reference to a binding is valid or not ?

In theory yes. Several tools let you provide an initial scope as part of the evaluation (e.g. to seed in environment variables when evaluating jmespath from a CLI or in general to pull in data from the outside world), so you wouldn't be able to verify it at compile time, but you could validate it before evaluating the top level expression by collecting the set of free variables in the top level closure and verifying that the initial seed scope binds all the free variables. At any rate, I wouldn't want the spec to require that you fail at compile time for free variables to allow implementations to support this use case. I would like to have a minimum requirement that a runtime failure occurs for any references to variables that don't exist.
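
The pre-evaluation check described above could look something like the following. The tuple-based AST and the names `free_variables` and `check_seed_scope` are assumptions made for this sketch, not part of any JMESPath implementation.

```python
def free_variables(node, bound=frozenset()):
    """Collect variable names referenced in `node` but not bound by a let."""
    kind = node[0]
    if kind == "var":
        return set() if node[1] in bound else {node[1]}
    if kind == "let":
        _, name, value, body = node
        # The bound value is evaluated in the outer scope; only the body
        # sees the new name.
        return free_variables(value, bound) | free_variables(body, bound | {name})
    if kind in ("sub", "list"):
        free = set()
        for child in node[1:]:
            free |= free_variables(child, bound)
        return free
    return set()  # field, current node, literal


def check_seed_scope(ast, seed):
    missing = free_variables(ast) - set(seed)
    if missing:
        raise NameError("undefined variables: " + ", ".join(sorted(missing)))


# let $a = top in [$a, $env.home]
ast = (
    "let", "a", ("field", "top"),
    ("list", ("var", "a"), ("sub", ("var", "env"), ("field", "home"))),
)
print(free_variables(ast))                     # {'env'}
check_seed_scope(ast, {"env": {"home": "/"}})  # passes: seed binds $env
```

Calling `check_seed_scope(ast, {})` would raise `NameError` before any evaluation starts, which is stricter than the minimum runtime failure but still compatible with seeded scopes.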

I guess $newvar.$newvar wouldn't translate to:

No. I haven't worked out the exact grammar rules, but to translate the pseudo-grammar I initially used into ABNF, roughly:

let-expression: "let" bindings "in" expression
bindings: variable-ref "=" expression
variable-ref: "$" identifier

So there's no expref (that comes from the & char), and a $ isn't a valid starting character for a Field, so you'd never be able to have that node either. Something like $foo.bar would be:

type: sub-expression
children:
  - type: variable-ref
    name: foo
  - type: field
    name: bar
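
To illustrate why the parser would reject `$newvar.$newvar`, here is a toy recursive-descent parser covering only a tiny subset of the grammar: an identifier or `$identifier`, followed by any number of `.identifier` steps. It is a sketch under that assumption, not the real JMESPath grammar, and the node shapes simply mirror the tree above.

```python
import re

TOKEN = re.compile(r"\s*(?:(\$)|([A-Za-z_][A-Za-z0-9_]*)|(\.))")
KINDS = {1: "dollar", 2: "ident", 3: "dot"}


def tokenize(text):
    text = text.rstrip()
    pos, tokens = 0, []
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if not match:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append((KINDS[match.lastindex], match.group(match.lastindex)))
        pos = match.end()
    return tokens


def parse(text):
    tokens = tokenize(text)
    pos = 0

    def expect(kind):
        nonlocal pos
        if pos >= len(tokens) or tokens[pos][0] != kind:
            raise SyntaxError(f"expected {kind} at token {pos}")
        pos += 1
        return tokens[pos - 1][1]

    # A variable-ref is only accepted at the start of the expression.
    if tokens and tokens[0][0] == "dollar":
        expect("dollar")
        node = {"type": "variable-ref", "name": expect("ident")}
    else:
        node = {"type": "field", "name": expect("ident")}

    # The right-hand side of "." must be a plain identifier.
    while pos < len(tokens):
        expect("dot")
        rhs = {"type": "field", "name": expect("ident")}
        node = {"type": "sub-expression", "children": [node, rhs]}
    return node


tree = parse("$foo.bar")
print(tree)

try:
    parse("$newvar.$newvar")
    rejected = False
except SyntaxError:
    rejected = True
print(rejected)  # True
```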

@eddycharly

eddycharly commented Mar 10, 2023

Keep in mind you can shadow variables from an outer scope. Here's a somewhat convoluted example that demonstrates the idea. In this example, suppose I'm calling this from some host language where the input data is called root, and the evaluation result is called results (and pretend jmespath has comments with //).

Input:

root = {
    'key': 'rootvalue',
    'subscope1': {
        'key': 'subscope1-value',
        'subscope2': {
            'key': 'subscope2-value'
        }
    }
}

Expression:

let $scope = @                        // $scope is `root`
in [
  $scope.key,                         // <---- results[0]
  $scope.subscope1.[                  // "Current node" changes from sub-expr
    let $scope = @                    // $scope is now `root['subscope1']`
    in [
      $scope.key,                     // <---- results[1]
      $scope.subscope2.[              // "Current node" changes from sub-expr
        let $scope = @                // $scope is now `root['subscope1']['subscope2']`
        in $scope.key                 // <---- results[2]
      ],                              // $scope is now back to `root['subscope1']`
      $scope.key                      // <---- results[3]
    ][]                               // $scope is now back to `root`
  ],
  $scope.key                          // <---- results[4]
][][]



Result:

results = ['rootvalue', 'subscope1-value', 'subscope2-value', 'subscope1-value', 'rootvalue']

Notice how every expression in the results list is $scope.key, but depending on where we are in the scope chain, they can evaluate to different values. So we still need the concept of pushing/popping scopes to support this lexical scoping.

Still, I don't quite get why a flat structure can't make it:

Input:

root = {
    'key': 'rootvalue',
    'subscope1': {
        'key': 'subscope1-value',
        'subscope2': {
            'key': 'subscope2-value'
        }
    }
}

Expression:

let $scope = @                        // bindings = { $scope = `root` }
in [
  $scope.key,                         // <---- results[0]
  $scope.subscope1.[                  // "Current node" changes from sub-expr
    let $scope = @                    // bindings = { $scope = `root['subscope1']` }
    in [
      $scope.key,                     // <---- results[1]
      $scope.subscope2.[              // "Current node" changes from sub-expr
        let $scope = @                // bindings = { $scope = `root['subscope1']['subscope2']` }
        in $scope.key                 // <---- results[2]
      ],                              // bindings are now back to `root['subscope1']`
      $scope.key                      // <---- results[3]
    ][]                               // bindings are now back to `root`
  ],
  $scope.key                          // <---- results[4]
][][]

Result:

results = ['rootvalue', 'subscope1-value', 'subscope2-value', 'subscope1-value', 'rootvalue']

Bindings should be treated as immutable, writing a new key in a binding should not modify it but return a new binding that will be used in all sub expressions.

Current implementations usually look like interpreter.Execute(node, data) (node is the current ast node, data is the input object). We just need to change it to interpreter.Execute(node, data, bindings).
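
A small sketch of that signature change, with bindings passed down as an immutable-by-convention dict. The tuple AST and the `evaluate` function are hypothetical, and the example is a trimmed version of the shadowing expression above (no flattening).

```python
def evaluate(node, data, bindings):
    kind = node[0]
    if kind == "current":
        return data
    if kind == "field":
        return data.get(node[1]) if isinstance(data, dict) else None
    if kind == "var":
        if node[1] not in bindings:
            raise NameError(f"undefined variable ${node[1]}")
        return bindings[node[1]]
    if kind == "sub":
        return evaluate(node[2], evaluate(node[1], data, bindings), bindings)
    if kind == "list":
        return [evaluate(item, data, bindings) for item in node[1:]]
    if kind == "let":
        _, name, value, body = node
        # Build a new map instead of mutating; the caller's map is untouched,
        # so there is nothing to pop when the let ends.
        new_bindings = {**bindings, name: evaluate(value, data, bindings)}
        return evaluate(body, data, new_bindings)
    raise ValueError(f"unknown node type: {kind}")


root = {"key": "rootvalue", "subscope1": {"key": "subscope1-value"}}
scope_key = ("sub", ("var", "scope"), ("field", "key"))

# let $scope = @ in [$scope.key, subscope1.[let $scope = @ in $scope.key], $scope.key]
ast = (
    "let", "scope", ("current",),
    ("list",
     scope_key,
     ("sub", ("field", "subscope1"),
      ("list", ("let", "scope", ("current",), scope_key))),
     scope_key),
)
results = evaluate(ast, root, {})
print(results)  # ['rootvalue', ['subscope1-value'], 'rootvalue']
```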

In theory yes. Several tools let you provide an initial scope as part of the evaluation (e.g. to seed in environment variables when evaluating jmespath from a CLI or in general to pull in data from the outside world), so you wouldn't be able to verify it at compile time, but you could validate it before evaluating the top level expression by collecting the set of free variables in the top level closure and verifying that the initial seed scope binds all the free variables. At any rate, I wouldn't want the spec to require that you fail at compile time for free variables to allow implementations to support this use case. I would like to have a minimum requirement that a runtime failure occurs for any references to variables that don't exist.

Ok so you want to allow referencing a binding that hasn't been previously declared ?
This is a detail but most languages will fail to compile when referencing an undeclared variable.
I guess we can easily support both with different compilation functions though (Compile vs CompileStrict).

No. I haven't worked out the exact grammar rules, but to translate the pseudo-grammar I initially used into ABNF, roughly:

let-expression: "let" bindings "in" expression
bindings: variable-ref "=" expression
variable-ref: "$" identifier

So there's no expref (that comes from the & char), and a $ isn't a valid starting character for a Field, so you'd never be able to have that node either. Something like $foo.bar would be:

type: sub-expression
children:
  - type: variable-ref
    name: foo
  - type: field
    name: bar

Got it, type: variable-ref was my question, resolving a binding becomes a specific ast node, not just a fallback in the implementation of the field node 👍

@eddycharly

eddycharly commented Mar 10, 2023

To illustrate the idea of bindings in a flat structure (map):

// flat:    {}
// chained: `null`
    let $topkey = top
    // flat:    { $topkey = `top` }
    // chained: { $topkey = `top` } -> `null`
    in
        bar |
        let $barscope = barscope
        // flat:    { $topkey = `top`, $barscope = `barscope` }
        // chained: { $barscope = `barscope` } -> { $topkey = `top` } -> `null`
        in
            ....

Every time a let is started we clone the current bindings map, add the new binding (or overwrite an existing one), and this becomes the new bindings map used to evaluate nodes in the in expression. (Actually I don't really care about flat vs chained, more about immutability, and I would like to avoid a stack-based approach: I don't want to push/pop scopes.)
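A rough sketch of the flat, clone-on-let approach, assuming a read-only map per scope. The helper name `bind` is hypothetical; `MappingProxyType` is only used here to make accidental mutation fail loudly.

```python
from types import MappingProxyType

EMPTY = MappingProxyType({})


def bind(bindings, name, value):
    """Return a new read-only bindings map; the input map is left untouched."""
    return MappingProxyType({**bindings, name: value})


# let $topkey = top in ...
outer = bind(EMPTY, "topkey", "top")
# ... bar | let $barscope = barscope in ...
inner = bind(outer, "barscope", "barscope")
# Rebinding a name produces a new map and does not affect the enclosing one.
shadowed = bind(inner, "topkey", "other")
```

After this runs, `outer` still holds only `topkey`, `inner` holds both names, and `shadowed["topkey"]` is `"other"` while `inner["topkey"]` remains `"top"`. Nothing is pushed or popped: each sub-expression simply receives the map that was current when it was entered.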

@jamesls
Member Author

jamesls commented Mar 10, 2023

Still, I don't quite get why a flat structure can make it:

Ahh, got it, you're asking about implementation details, that's what I was missing. Thought we were still discussing whether or not there will be lexical scoping, which there will be with this proposal. The spec should be careful to avoid requiring a specific implementation, so libraries are free to implement it however they'd like provided all the compliance tests pass. The spec will probably just say "lexical scope" with an explanation similar to this that talks about variables in terms of their visibility/lifetime and avoid any talk of pushing/popping scope. The chained thing was just how I've implemented this type of thing in the past and a common way to implement it. Personally, I'd avoid taking an O(n) copy each time you enter a new scope. You also don't have to mutate anything, you could do scope = makeScope(newbindings, scope) on entering a new scope, and have scope be a struct of type scope struct { bindings map[string]whatever; parent *Scope }, but again implementations are free to handle this however they'd like.
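The chained alternative mentioned above can be sketched as follows: entering a scope allocates one small node that points at its parent, so no copy of the existing bindings is taken and nothing is mutated. The class and function names are illustrative, loosely following the `makeScope(newbindings, scope)` idea; implementations remain free to do this differently.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class Scope:
    bindings: Mapping[str, Any]
    parent: Optional["Scope"] = None

    def lookup(self, name):
        """Walk from the innermost scope outwards until the name is found."""
        scope = self
        while scope is not None:
            if name in scope.bindings:
                return scope.bindings[name]
            scope = scope.parent
        raise NameError("undefined variable $" + name)


def make_scope(new_bindings, parent=None):
    """Create a child scope without copying or modifying the parent."""
    return Scope(dict(new_bindings), parent)


top_scope = make_scope({"topkey": "top"})
bar_scope = make_scope({"barscope": "barscope"}, top_scope)
```

Here `bar_scope.lookup("topkey")` walks one link up the chain and finds `"top"`, while `top_scope.lookup("barscope")` raises `NameError` because the outer scope never sees the inner binding. The trade-off against the flat map is O(1) scope creation versus a lookup cost proportional to nesting depth.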

At any rate, I think that's getting ahead of ourselves. Right now, I'd like to focus on whether or not this feature is useful from an end user's perspective and if there's demand for this. I'd rather start with the ideal syntax/semantics for users, and then modify if needed if it'll create implementation difficulties.

@eddycharly

I agree it's more on the implementation side and is out of scope in this discussion.

To me this is an extremely useful feature: being able to reference parent elements opens the door to more complex and interesting queries. I was a huge fan of the proposal until I started playing with it and discovered the confusing part of it.
Now, if we can remove the confusing bits it would be my most wanted feature.

@springcomp
Contributor

springcomp commented Mar 11, 2023

@jamesls thank you very much for weighing in!

I really appreciate your new insight into this feature.

I strongly think lexical scoping is needed, though, and I think we are all mostly convinced that what brings confusion when using the let() function, is not the let() function itself, but as you rightly pointed out, the scoped-variable lookup.

I must admit I was a bit confused by my initial look at your examples of a let $var in <expression> syntax, but it makes sense overall.

Following your discussion with @eddycharly, I’m coming to the conclusion that JEP-11 as it stands is mostly complete. It would only require a slight update that mandates using a new sigil like $var, as proposed here, to close the loop on the most confusing usages and prevent using scopes as a fallback when evaluating an identifier in the right-hand side of a sub-expression. @eddycharly that’s indeed what you suggested when coming up with a potential ref() function to explicitly access the scope.

That said, I do not dislike this updated proposal, although maybe we should explore alternate tokens as well, as reserved keywords may feel a bit foreign if limited to this single usage.

Scoped variable lookups

For the record, your first example would be possible with an equivalent jep-11 expression, although it would still exhibit the undesirable fallback behaviour. So I will use a hypothetically updated jep-11 syntax that uses $var as well.

Given:

{"foo": {"bar": "baz"}, "top": "top-value"}

The following two expressions are equivalent:

  • proposal: let $newvar = top in foo.{foo: bar, other: $newvar}
  • jep-11 equivalent: let({newvar: top}, &foo.{foo: bar, other: $newvar})

Given:

{
    "foo": {"bar": "baz"},
    "top": {"other": "other-value"},
    "bar": {
        "listval": [
            {"a": 1, "z": 5},
            {"a": 2, "z": 6},
            {"a": 3, "z": 7}
        ],
        "barscope": "innervar"
    }
}

Your proposed expression:

let $topkey = top
    in
        bar |
        let $barscope = barscope
        in
            listval[*].[a, let $each = @ in $each.z, $barscope, $topkey.other]

The jep-11 equivalent expression would be:

let(
  {topkey: top},
  &bar|let(
    {barscope: barscope},
    &listval[*].[a, let({each: @}, &$each.z), $barscope, $topkey.other]))

While the notion of lexical scope will not go away, it would be entirely controlled by the nesting of expressions.
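To show what the nested expression is expected to compute, here is a plain-Python emulation that evaluates it by hand against the sample document. It is not a JMESPath implementation; each local variable simply stands in for one `$` binding, and the function name `emulate` is made up for this illustration.

```python
document = {
    "foo": {"bar": "baz"},
    "top": {"other": "other-value"},
    "bar": {
        "listval": [
            {"a": 1, "z": 5},
            {"a": 2, "z": 6},
            {"a": 3, "z": 7},
        ],
        "barscope": "innervar",
    },
}


def emulate(doc):
    topkey = doc["top"]               # let $topkey = top in
    bar = doc["bar"]                  #   bar |
    barscope = bar["barscope"]        #   let $barscope = barscope in
    rows = []
    for element in bar["listval"]:    #     listval[*].[ ... ]
        each = element                #     let $each = @ in $each.z
        rows.append([element["a"], each["z"], barscope, topkey["other"]])
    return rows


result = emulate(document)
# [[1, 5, 'innervar', 'other-value'],
#  [2, 6, 'innervar', 'other-value'],
#  [3, 7, 'innervar', 'other-value']]
```

Note how `barscope` and `topkey` stay visible inside the projection purely because of where they were introduced, which is the lexical-scoping behaviour under discussion.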

My only qualm with this syntax using reserved keywords is that the in keyword might be confusing. It reads as if $var is taken from the next part of the expression, whereas the full scope is determined unambiguously right before in.

So maybe alternate keywords would be more intuitive? What about then, as in:

  • let $newvar = top then foo.{foo: bar, other: $newvar}

Or a new set of keywords like:

  • with $newvar = top eval foo.{foo: bar, other: $newvar}

New tokens / syntax

Introducing new keywords into the language seems a bit too drastic a change at first. But I welcome such a change.

It could pave the way for more simplifications in the future. For instance, this would allow us to abandon the backtick JSON-literals which would be rendered useless, provided we:

Together with multi-select-hash and multi-select-list, emitting JSON would be entirely possible without using backticks at all.

Exploring this proposal with new tokens, I would like to suggest the following expressions for the two examples that you have shown:

  • $newvar := top => foo.{foo: bar, other: $newvar}

Assignment of scope would be done using = or := tokens.

I toyed with the idea of introducing lambda-expression constructs to the language while discussing a potential reduce feature. So the => token could be the remainder of the expression evaluation.

The second example would look like so (and we really need parsers to keep track of line and column numbers to accurately report errors 😁):

$topkey := top => 
  bar |
  $barscope := barscope =>
    listval[*].[
      a,
      $each := @ => $each.z,
      $barscope,
      $topkey.other
    ]

@springcomp
Contributor

I think the alternatives thus far all have in common that referring to a scoped variable should be explicit rather than being the result of a fallback when evaluating an identifier to null.

So, as always, alternatives fall into two categories: new syntax/tokens vs. new functions.

The updated proposal here introduces new keywords as well which would introduce yet another layer of open potential for JMESPath in the future 🙂.

Scope variable lookup

  • New syntax $var (this proposal)
  • New syntax *var (dereference syntax)
  • New function lookup('var')

A new function would be cumbersome as the identifier needs to be specified as a string. However, it would allow dynamically constructing the identifier to look up from the scopes and open up new possibilities.

Creating scope

  • JEP-11 let({scope: expression}, &expression)
  • Assignment $scope := expression
  • This proposal let $scope = expression

As it stands, I liked introducing a scope in the let() function and I would favor using the same approach in the future when specifying the initial seed value in a potential future reduce() function.

@eddycharly

eddycharly commented Mar 11, 2023

Another approach that could work without modifying the current grammar:

  • let defines lexical scopes and nothing more
  • create another function to make the scope chain the current context (for example in)

let(
  {topkey: top},
  &bar | let(
    {barscope: barscope},
    &listval[*].[a, let({each: @}, &in(&each.z)), in(&barscope), in(&topkey.other)]))

Basically when inside the in function the current context is the scope chain.

This would allow the creation of isolated scopes that do not inherit from the parent too.

@eddycharly

eddycharly commented Mar 11, 2023

// scope chain: {}
let(
  // scope chain: {root: @} -> {}
  {root: @},
  &in(
    // we enter `in`, scope chain becomes the current context and scope chain is reset to {}
    let(
      // scope chain: {newroot: @} -> {}
      {newroot: @},
      &in(
        // we enter `in`, scope chain becomes the current context and scope chain is reset to {}
        // here the parent scope was not inherited and root.field does not exist, it should be newroot.root.field
        &root.field
      )
    )
  )
)

EDIT:

With the design above I wonder if the scope chain makes sense: the first argument of let produces the new lexical scope, and this lexical scope can become the current context by invoking in.

@eddycharly

Same thing without the notion of scope chains, just a single lexical scope:

// lexical scope: { foo: "baz" }
// current context: { foo: "bar" }
let(
  // lexical scope becomes: { root: { foo: "bar" }, parent: { foo: "baz" } }
  { root: @, parent: in(&@) },
  &in(
    // we enter `in`, lexical scope is reset to null
    // current context is now: { root: { foo: "bar" }, parent: { foo: "baz" } }
    let(
      // lexical scope becomes: { newroot: { root: { foo: "bar" }, parent: { foo: "baz" } } }
      { newroot: @ },
      &in(
        // we enter `in`, lexical scope is reset to null
        // current context is now: { newroot: { root: { foo: "bar" }, parent: { foo: "baz" } } }
        &[ newroot.root.foo, newroot.parent.foo ]
      )
    )
  )
)

The in function could also be replaced by a lexical_scope function returning the current lexical scope:

// lexical scope: { foo: "baz" }
// current context: { foo: "bar" }
let(
  // lexical scope becomes: { root: { foo: "bar" }, parent: { foo: "baz" } }
  // current context is not modified
  { root: @, parent: lexical_scope() },
  &let(
    // lexical scope becomes: { newroot: { root: { foo: "bar" }, parent: { foo: "baz" } } }
    // current context is not modified
    { newroot: lexical_scope() },
    &lexical_scope() | [ newroot.root.foo, newroot.parent.foo ]
  )
)

@eddycharly

eddycharly commented Mar 11, 2023

@jamesls @springcomp WDYT?
Do we need something more complicated than what I described above?

@mtdowling
Contributor

I like the idea of dedicated syntax for this since I think it can make it more readable than frequent usage of “&”, and it doesn’t need us to make a let function special. We’d probably make it the only function that could introduce new scoped variables. I don’t think other functions have arguments that cause side effects for other function arguments either (each argument is isolated and based only on current node).

I like the “$” syntax to access variables too as that removes ambiguity around whether a value is from current node or scoping.

The discussion around new keywords is super interesting. Adding true/false/null keywords and non-backtick numbers would be nice. I’d be concerned about it breaking existing expressions though. Maybe this should be a different discussion than this JEP though.

@springcomp
Contributor

springcomp commented Mar 11, 2023

I don’t think other functions have arguments that cause side effects for other function arguments either

@mtdowling while brainstorming a potential design for 'reduce', we struggled to find a satisfying design. IMHO, a function that takes an object to specify the accumulated identifier and its seed value is the most elegant solution. This would also neatly complement the existing map() function.

This would leverage the pattern introduced by let() as initially proposed.

A reduce does have a side effect on its second expression argument. In that case reduce() and let() would share a consistent syntax.

The discussion around new keywords is super interesting. Adding true/false/null keywords and non-backtick numbers would be nice. I’d be concerned about it breaking existing expressions though. Maybe this should be a different discussion than this JEP though.

Of course, this topic is a separate discussion. However, we found that cross-pollination of ideas makes the overall design more consistent.

@jamesls
Member Author

jamesls commented Mar 15, 2023

I think we are all mostly convinced that what brings confusion when using the let() function, is not the let() function itself, but as you rightly pointed out, the scoped-variable lookup.

It's both for me. We are introducing new semantics into the language with scoped variable lookups. We're making this explicit now with the $ref syntax but it was always part of this proposal. If $ref is now a core feature of the language, it follows that assigning these variables should symmetrically be a core language feature as well, hence the let expressions vs. the function. Setting these variables via let() requires let to be special and have access to the scope/symbol table, which no other functions have. This "special casing" is something I'd like to avoid in the design.

My only qualm with this syntax using reserved keywords is that the in keyword might be confusing. It reads as if $var is taken from the next part of the expression, whereas the full scope is determined unambiguously right before in.

The original motivation for this proposal was to borrow terminology from functional programming languages, which often use let (I linked a few in my updated proposal). I'd like to keep in line with that motivation as long as we're still using the keyword let, which is why I proposed the let / in terms.

To me, I read these expressions as "let these variables equal these values in the following expression". While I can understand there may be confusion with this naming, I don't think it's a strong enough motivator to warrant breaking from existing conventions. I think there's value in reusing terminology that will be familiar to (at least some subset of) users.

$newvar := top => foo.{foo: bar, other: $newvar}
...
while brainstorming a potential design (jmespath-community/jmespath.spec#48) for 'reduce',

These features require some notion of creating anonymous functions with arguments. The & is conceptually an anonymous function with no arguments. We would need to expand on that to support something like reduce. This is also why I think let in its proposed function form with exprefs conceptually doesn't make sense, as that's not how you would introduce new scope with functions, instead you'd do that with arguments.

A syntax of $newvar := top => ... is defining a function, but you would then need to invoke the function with the corresponding params.

While it would be possible to define let in terms of anonymous functions, I think it is a common enough occurrence that it deserves its own syntax (as most languages do). For example, making up a syntax of | ... | for args, similar to ruby blocks or rust closures (which to be clear I'm not proposing here), it's conceptually a shorthand for defining a function and immediately invoking it:

&( |{$x, $y}| => [$x, $y])({x: 'foo', y: 'bar'})    // returns ['foo', 'bar']

Or translated to typescript as an IIFE:

(({x, y}) => ([x, y]))(
    {x: "foo", y: "bar"}
);

// returns ['foo', 'bar']

At which point, you don't need a let() function, it's redundant to the examples above, while still adhering to the constraint of having a built in language feature to introduce variable bindings. Here it's the |{$x, $y}| syntax.

Following your discussion with @eddycharly, I’m coming to the conclusion that JEP-11 as it stands is mostly complete.

I see the value in leaving this proposal unchanged for historical reasons. The next step for me is to create a new JEP with the updated proposal that obsoletes this one. We've seen enough interest in this feature that it makes sense to move forward with an updated formalized proposal.

@springcomp
Contributor

@jamesls @springcomp WDYT? Do we need something more complicated than what I described above?

I kinda (lazily) liked the let() function but I’m happy to let it go. I think @jamesls makes compelling arguments.
I’m happy to go ahead with a new proposal.

Should we prototype this first?
I’m happy to assist in drafting the proposal to help move this forward. Of course, keeping credit to @jamesls.

@springcomp
Contributor

springcomp commented Mar 18, 2023

@jamesls I’m currently prototyping this feature and have a couple of reservations.
Most implementations use a top-down parser, and introducing keywords might be more suitable for a lex/yacc style of parsing.
With the limited number of constructs introduced if this proposal goes through, that is not too bad.
I’m concerned, however, with the precedent of introducing keywords in the language.

For instance, it looks like parsing the following expression should be valid and intuitive.

reservations[*].in[?bar==`1`]

Instead, the author must remember all the keywords of the language and make sure to use a quoted-identifier instead.
Or do we want to contextually treat in as an identifier, based on the previous parsing context?
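As one way to picture the contextual option, the sketch below tags `let` and `in` as keywords except when they directly follow a `.`, where they are treated as ordinary field names. This single rule is an assumption chosen for illustration, not something the proposal specifies, and it does not cover every case (a field named `in` at the very start of an expression would still need quoting or a smarter parser). The tokenizer is deliberately crude and is not a full JMESPath lexer.

```python
import re

KEYWORDS = {"let", "in"}

TOKEN = re.compile(
    r"(?P<variable>\$[A-Za-z_][A-Za-z0-9_]*)"
    r"|(?P<word>[A-Za-z_][A-Za-z0-9_]*)"
    r"|(?P<punct>\S)"
)


def classify(expression):
    """Return (text, kind) pairs; kind is keyword, identifier, variable or punct."""
    tagged = []
    previous = None
    for match in TOKEN.finditer(expression):
        text, kind = match.group(), match.lastgroup
        if kind == "word":
            # Contextual rule (assumed): a word right after '.' is always a field.
            if previous == "." or text not in KEYWORDS:
                kind = "identifier"
            else:
                kind = "keyword"
        tagged.append((text, kind))
        previous = text
    return tagged


field_case = classify("reservations[*].in[?bar==`1`]")
let_case = classify("let $x = top in foo")
```

With this rule, the `in` in `reservations[*].in[...]` comes out as an identifier, while the `in` in `let $x = top in foo` comes out as a keyword.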

@springcomp
Contributor

springcomp commented Mar 19, 2023

A syntax of $newvar := top => ... is defining a function, but you would then need to invoke the function with the corresponding params.

I'm not quite on the same page. This would rather be a means to specify the scope on the LHS of the lambda arrow and then immediately evaluate the RHS in the context - no pun intended - of the scope.

Maybe better formalized like:

expression =/ let-expression
let-expression = multi-select-hash "=>" expression

While this design is a matter of subjective taste, I fail to see how that could not map directly to the first and second arguments of the JEP-11 let() function. The second argument to let() was an expression-type, but only because it was an argument to a function.

To be complete, we would have a new token type for explicit lookup of scoped identifiers:

expression =/ reference
reference = "$" identifier

@springcomp
Contributor

For the sake of advancing the discussion, I have tried to sum up all the main themes in a single post.

@jamesls
Member Author

jamesls commented Mar 23, 2023

Hey everyone, quick update. I've created a new proposal based on the discussions in various threads: jmespath/jmespath.jep#18

I know this proposal has sat untouched for quite some time and there's been a lot of feedback and time put into exploring numerous ideas and alternatives (myself and everyone in this thread included), so thanks to everyone that's contributed.

In an effort to try and consolidate discussions, I'm going to close this issue and ask that we move discussions over to the jmespath.jep repo and the linked PR. I've also referenced this thread in the updated JEP so people still have a link to all the previous discussion. Going forward, I'll triage through the various repos in this org and move suggestions/ideas/proposals for new JMESPath features over to the issue tracker in the jmespath.jep repo. That way there's a single location to track changes to the language instead of being spread across various language implementation repos and the jmespath doc site.

Thanks again for all the interest in this feature, really excited for this to be added to JMESPath!

@jamesls jamesls closed this Mar 23, 2023

5 participants