-
Notifications
You must be signed in to change notification settings - Fork 52
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JEP 11: The let() Function #6
Conversation
JEP 11, introduced via jmespath/jmespath.site#6.
Proposed in jmespath/jmespath.site#6. I tried to pick an implementation that was as minimally invasive as possible. It works by making three changes. First we need to track scope, and share this information between the interpreter and the function module. They both take a reference to a scope object that allows you to push/pop scopes. The ``let()`` function will push the user provided lexical scope onto the scope chain before evaluating the expref, and pop the scope after evaluating the expref. The second change needed is to change how identifiers are resolved. This corresponds to visiting the ``field`` AST node. As detailed in JEP 11, after failing to resolve the field in the current object, we call back to the scope chain. The third change is to bind the current value (the context) in which an expref is first created. This wasn't needed before because for functions that take an expref, such as ``sort_by``, ``max_by``, and ``min_by``, they evaluate the expref in the context of each list element. However, with ``let()``, we want to evaluate the expref in the context of the current object as specified when the expref was created. This also tracks the current object properly in the case of nested ``let()`` calls.
JEP 11, introduced via jmespath/jmespath.site#6.
Ok I've linked: So far everything seems reasonable. Semantics make sense to me. We're borrowing from existing languages where this feature has been around for a really long time so we're using well tested concepts. The python implementation was really straightforward and concise (IMO). I'd imagine other implementations would have the same order of magnitude of code changes to implement this. I'm curious to hear what others think, but this JEP is growing on me. |
cc @kyleknap @danielgtaylor, if you want to comment, though don't feel obligated. |
|
||
Prior to this JEP, identifiers are resolved by consulting the current context | ||
in which the expression is evaluted. For example, using the same | ||
``search`` function as defined in the JMESPath specification, the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This part threw me off, some a minor point of feedback might be to link to what you're talking about. I thought search
was a JMESPath function.
This is awesome. While it does complicate JMESPath a bit and requires non-trivial changes to the php, javascript, lua, clojure, ruby, etc... implementations... I think this feature is worth it. 👍 |
Proposed in jmespath/jmespath.site#6. I think the code could use a bit of cleanup, but all the compliance tests are passing.
FWIW, I also have an initial implementation of the javascript version as well (jmespath/jmespath.js@25c22b2), and I would say the implementation is almost borderline trivial, which is what I like (and surprised me) about this JEP. Same thing for the python implementation. Really all I needed was:
There might be a better way to do this, but as far as I can tell, this can handles all the cases I can think of, is pretty easy to follow, and has very minimal runtime overhead. Should also be easy to generalize if other functions or things in the runtime want the ability to create lexical scopes. |
Although I did notice a few interesting scenarios. First one's not so bad, but consider this:
In this scenario we're evaluating The second one is more interesting. Right now, trying to evaluate an identifier on something that's not a JSON object, results in null:
However, what about something like this:
What would you expect that to print? The JEP isn't actually clear about this. You could say this is evaluated as:
On the other hand, you could say trying to evaluate an identifier with an input/current object that's not a JSON object will result in a |
The first case you brought up is interesting. I feel that the behavior you've outlined is confusing and not what I'd expect.
In this case, I think that once you've descended into |
We've spoken about this in person, but I wanted to leave feedback here to hopefully help drive this proposal forward. I stated this earlier, but I think that the current behavior of this JEP would be confusing. I think a better approach would be to allow for variables to be bound to specific scopes, which could then be referenced using a specific sigil. The first sigil that comes to mind for me is Each scope would have access to the variables bound to that scope and any parent scope, and any variables bound in a specific let function that have the same name as the parent scope would override the parent scope. I don't think there needs to be a specific way to get the value of the same name from a parent scope (you could copy a variable binding to a new name in the child scope to work around this if necessary). Accessing an unbound variable would result in a parse or runtime error (depending on the sophistication of the parser). Here's an example. Given
Result: What do you think? |
I like that better, though we still have to define where exactly these variables are valid. For instance, in my example above I had As for the specific char, I think my only hesitation was what character we'd use for dereferencing an expref (likely a later JEP). Given the expref char is |
Awesome. Maybe we can use @ to deref (like clojure)? I think this could be done in a way that does not conflict with the current-node grammar. I think a variable should be allowed in the same places you would see the current-node token. |
Not opposed to it, but my preference is to use a separate token that's not being used yet if possible. |
I was thinking about let expressions recently, and I thought of a potential new syntax that might make it easier to read and work with. It would require built-in support and not utilize functions. We would add a new operator This expression would assign
We could add destructuring as well:
The above expression would assign We could potentially add list destructuring as well, but I'm not sure if it's necessary:
The above expression would take Each variable assignment would only be scoped to child subexpressions. So given the following expression, trying to get the value of
Another option, instead of using
The new ast node would be something like Finally, I would also recommend that expression references would now close over the variable bindings available when they are created. When executing an expref, you would merge in the current bindings over the closed over bindings. The Various other languages use |
I want this! |
I took a different approach to how this was implemented. Rather than try to implement a stack I just used the call stack for the scope. This required making all visit_* functions accept **kwargs and forward the kwargs to any self.visit() call within those functions. Any time let() is called it creates a new scope dict and merges in any previous scope dict and passes it to the visit() call. My implementation can be viewed here: https://github.com/brinkmanlab/BioPython-Convert/blob/master/biopython_convert/JMESPathGen.py Forwarding kwargs also paves the way for future functionality to pass arbitrary values down the expression stack. A third alternative if the kwargs solution is not satisfactory is to modify the visit_field() function to leverage the inspect library to race up the call stack to the call to _func_let() and pull out a local scope variable. This is demonstrated here: https://stackoverflow.com/a/14694234 |
# Conflicts: # docs/proposals.rst
It looks like this feature is picking up interest again, so to help I'm mostly convinced that a
So where does that leave us?Before working through alternative proposals, here's the properties I think the
Rationale: They are fundamentally different types of lookups with different
Rationale: In addition to simplifying the parser (a starting keyword/token would Updated proposalSo here's an updated proposal to get the ball rolling. The reason we
So let's just use that syntax. In pseudo-grammar:
Examples: Basic usage, binding top level keys:
Chained scope showing root key binding, an inner key binding,
Let me know what you all think. If this seems like the right direction, cc @mtdowling |
@jamesls thanks for updating the proposal ! Am I right thinking that in the new proposal the notion of I think we need to make sure things like And in your example |
Scopes are still there. A
The
you'd get an error because
Yep that's why I like the idea of an explicit sigil for variable references, e.g.
Good catch, updated. |
It looks to me that we don't need to chain scopes anymore, at any point a flat binding structure would be enough.
You mean at compile time ? Because we can now validate at compile time a reference to a binding is valid or not ?
I guess
Right ? |
Keep in mind you can shadow variables from an outer scope. Here's a somewhat convoluted example that demonstrates the idea. In this example, suppose I'm calling this from some host language where the input data is called
Notice how every expression in the results list is
In theory yes. Several tools let you provide an initial scope as part of the evaluation (e.g. to seed in environment variables when evaluating jmespath from a CLI or in general to pull in data from the outside world), so you wouldn't be able to verify it at compile time, but you could validate it before evaluating the top level expression by collecting the set of free variables in the top level closure and verifying that the initial seed scope binds all the free variables. At any rate, I wouldn't want the spec to require that you fail at compile time for free variables to allow implementations to support this use case. I would like to have a minimum requirement that a runtime failure occurs for any references to variables that don't exist.
No. I haven't worked out the exact grammar rules, but to translate the pseudo-grammar I initially used into ABNF, roughly:
So there's no expref (that comes from the
|
Still, I don't quite get why a flat structure can make it:
Bindings should be treated as immutable, writing a new key in a binding should not modify it but return a new binding that will be used in all sub expressions. Current implementations usually look like
Ok so you want to allow referencing a binding that hasn't been previously declared ?
Got it, |
To illustrate the idea of bindings in a flat structure (map):
Every time a |
Ahh, got it, you're asking about implementation details, that's what I was missing. Thought we were still discussing whether or not there will be lexical scoping, which there will be with this proposal. The spec should be careful to avoid requiring a specific implementation, so libraries are free to implement it however they'd like provided all the compliance tests pass. The spec will probably just say "lexical scope" with an explanation similar to this that talks about variables in terms of their visibility/lifetime and avoid any talk of pushing/popping scope. The chained thing was just how I've implemented this type of thing in the past and a common way to implement it. Personally, I'd avoid taking an O(n) copy each time you enter a new scope. You also don't have to mutate anything, you could do At any rate, I think that's getting ahead of ourselves. Right now, I'd like to focus on whether or not this feature is useful from an end user's perspective and if there's demand for this. I'd rather start with the ideal syntax/semantics for users, and then modify if needed if it'll create implementation difficulties. |
I agree it's more on the implementation side and is out of scope in this discussion. To me this is an extremely useful feature, allowing to reference parent elements opens doors to more complex/interesting queries. I was a huge fan of the proposal until I started playing with it and discovered the confusing part of it. |
@jamesls thank you very much for weighing in! I really appreciate your new insight into this feature. I strongly think lexical scoping is needed although and I think we are all mostly convinced that what brings confusion when using the I must admit I was a bit confused by the inital look into your examples of a Following your discussion with @eddycharly, I’m coming to the conclusion that JEP-11 as it stands is mostly complete. It would only require a slight update that mandates using a new sigil like That said, I do not dislike this updated proposal, although maybe we should explore alternate tokens as well, as reserved keywords may feel a bit foreign if limited to this only usage. Scoped variable lookupsFor the record, your first example would be possible with an equivalent jep-11 expression, It would work as still exhibiting the undesirable fallback behaviour. So I will use an hypotheticallly updated syntax using the Given: {"foo": {"bar": "baz"}, "top": "top-value"} The following two expressions are equivalent:
Given: {
"foo": {"bar": "baz"},
"top": {"other": "other-value"},
"bar": {
"listval": [
{"a": 1, "z": 5},
{"a": 2, "z": 6},
{"a": 3, "z": 7}
],
"barscope": "innervar"
}
} Your proposed expression:
The jep-11 equivalent expression would be:
While the notion of lexical scope will not go away, it would be entirely controlled by the nesting of expressions. My only qualm with this syntax using reserved keywords is that the So maybe an alternate keywords would be more intuitive? What about
Or a new set of keywords like:
New tokens / syntaxIntroducing new keywords into the language seems a bit too drastic a change at first. But I am welcoming such a change. It could pave the way for more simplifications in the future. For instance, this would allow us to abandon the backtick JSON-literals which would be rendered useless, provided we:
Together with Exploring this proposal with new tokens, I would like to suggest the following epressions for the two examples that you have shown:
Assignment of scope would be done using I toyed with the idea of introducing The second example would look like so (and we really need parsers to keep track of line and column numbers to accurately report errors 😁):
|
I think the alternatives thus far far all have in common that referring to a scoped variable should be explicit rather than being the result of a fallback when evaluating an identifier to So as always alternatives fall into two categories, new syntax/tokens, vs new functions. The updated proposal here introduces new keywords as well which would introduce yet another layer of open potential for JMESPath in the future 🙂. Scope variable lookup
A new function would be cumbersome as the identifier needs to be specified as a string. However it would allow dynamically constructing the identifier to lookup from the scopes and open up new possibilities. Creating scope
As it stand, I liked introducing a scope in the |
Another approach that could work without modifying the current grammar:
Basically when inside the This would allow the creation of isolated scopes that do not inherit from the parent too. |
EDIT: With the design above I wonder if the scope chain makes sense, the first argument of |
Same thing without the notion of scope chains, just a single lexical scope:
The
|
@jamesls @springcomp WDYT ? |
I like the idea of dedicated syntax for this since I think it can make it more readable than frequent usage of “&”, and it doesn’t need us to make a let function special. We’d probably make it the only function that could introduce new scoped variables. I don’t think other functions have arguments that cause side effects for other function arguments either (each argument is isolated and based only on current node). I like the “$” syntax to access variables too as that removes ambiguity around whether a value is from current node or scoping. The discussion around new keywords is super interesting. Adding true/false/null keywords and non-backtick numbers would be nice. I’d be concerned about it breaking existing expressions though. Maybe this should be a different discussion than this JEP though. |
@mtdowling while brainstorming a potential design for 'reduce', we struggled to find a satisfying design. IMHO, a function that takes an object to specify the accumulated identifier and it's seed value is the most elegant solution. Also this would neatly complement the existing This would leverage the pattern introduced by A reduce does have a side effect on its second expression argument. In that case
Of course, this topic is a separate discussion. However we found that cross-pollination of ideas make the overall design more consistent. |
It's both for me. We are introducing new semantics into the language with scoped variable lookups. We're making this explicit now with the
The original motivation for this proposal was to borrow from functional programming language terminology that often use To me, I read these expressions as "let these variables equal these values in the following expression". While I can understand there may be confusion with this naming, I don't think it's a strong enough motivator to warrant breaking from existing conventions. I think there's value in reusing terminology that will be familiar to (at least some subset of) users.
These features require some notion of creating anonymous functions with arguments. The A syntax of While it would be possible to define let in terms of anonymous functions, I think it is a common enough occurrence that it deserves its own syntax (as most languages do). For example, making up a syntax of
Or translated to typescript as an IIFE: (({x, y}) => ([x, y]))(
{x: "foo", y: "bar"}
);
// returns ['foo', 'bar'] At which point, you don't need a
I see the value in leaving this proposal unchanged for historical reasons. The next step for me is to create a new JEP with the updated proposal that obsoletes this one. We've seen enough interest in this feature that it makes sense to move forward with an updated formalized proposal. |
I kinda (lazily) liked the Should we prototype this first ? |
@jamesls I’m currently prototyping this feature and have a couple of reservations. For instance it looks like, parsing the following expression should be valid and intuitive.
Instead, the author must remember all the keywords from the language, and make sure to use |
I'm not quite on the same page. This would rather be a means to specify the scope on the LHS of the lambda arrow and then immediately evaluate the RHS in the context - no pun intended - of the scope. Maybe better formalized like: expression /= let-expression
let-expression = multi-select-hash "=>" expression While this design is a matter of subjective taste, I fail to see how that could not map directly to the first and second arguments of the JEP-11 To be complete, we would have a new token type for explicit lookup of scoped identifiers: expression /= reference
reference = "$" identifier |
For the sake of advancing the discussion, I have tried to sum up all the main themes in a single post. |
Hey everyone, quick update. I've created a new proposal based on the discussions in various threads: jmespath/jmespath.jep#18 I know this proposal has sat untouched for quite some time and there's been a lot of feedback and time put into exploring numerous ideas and alternatives (myself and everyone in this thread included), so thanks to everyone that's contributed. In an effort to try and consolidate discussions, I'm going to close this issue and ask that we move discussions over to the jmespath.jep repo and the linked PR. I've also referenced this thread in the updated JEP so people still have a link to all the previous discussion. Going forward, I'll triage through the various repos in this org and move suggestions/ideas/proposals for new JMESPath features over to the issue tracker in the jmespath.jep repo. That way there's a single location to track changes to the language instead of being spread across various language implementation repos and the jmespath doc site. Thanks again for all the interest in this feature, really excited for this to be added to JMESPath! |
The JEP goes into detail about why we need this and how it works. I'll also link compliance tests and a sample implementation to this PR shortly.
cc @mtdowling