Notation for referencing previous pipeline steps/relations #2668
Comments
My mental model when writing a pipeline is the current relation. Reading your pathological code example, it would go like this:
With this model, it does not make sense to want columns from "two steps back": you always use only the last step. The fact that I was using the same name |
Agreed, that was a terrible example. Here's something from my working life which perhaps makes more sense:
from fund_holdings
filter value_date >= @2023-01-01
group [value_date, fund_code] (
aggregate [
fund_market_value = sum market_value
]
)
join ^ (==value_date && ==fund_code)
derive [weight = market_value / fund_market_value]
Here I changed the meaning of
Of course that query could be written in a more literate and self-documenting style, as below, but it would be cool to be able to write the query above when working interactively.
let fund_holdings_2023 = (
from fund_holdings
filter value_date >= @2023-01-01
)
let fund_market_values = (
from fund_holdings_2023
group [value_date, fund_code] (
aggregate [
fund_market_value = sum market_value
]
)
)
from fund_holdings_2023
join fund_market_values (==value_date && ==fund_code)
derive [weight = market_value / fund_market_value] |
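For anyone who wants to check what the two queries above compute, here is a rough Python sketch of the same steps: filter, sum market_value per (value_date, fund_code), join the totals back, then derive weight. The sample rows and the security field are invented purely for illustration; only value_date, fund_code, market_value, fund_market_value and weight come from the queries.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sample rows; the values and the "security" field are made up.
fund_holdings = [
    {"value_date": date(2022, 12, 30), "fund_code": "A", "security": "X", "market_value": 50.0},
    {"value_date": date(2023, 1, 2), "fund_code": "A", "security": "X", "market_value": 30.0},
    {"value_date": date(2023, 1, 2), "fund_code": "A", "security": "Y", "market_value": 70.0},
    {"value_date": date(2023, 1, 2), "fund_code": "B", "security": "X", "market_value": 40.0},
]

# filter value_date >= @2023-01-01
fund_holdings_2023 = [r for r in fund_holdings if r["value_date"] >= date(2023, 1, 1)]

# group [value_date, fund_code] (aggregate [fund_market_value = sum market_value])
fund_market_values = defaultdict(float)
for r in fund_holdings_2023:
    fund_market_values[(r["value_date"], r["fund_code"])] += r["market_value"]

# join on (value_date, fund_code), then derive weight
weighted = []
for r in fund_holdings_2023:
    total = fund_market_values[(r["value_date"], r["fund_code"])]
    weighted.append({**r, "fund_market_value": total, "weight": r["market_value"] / total})

for row in weighted:
    print(row["fund_code"], row["security"], row["weight"])
```

The point of the sketch is that the filtered relation is needed twice, once as input to the aggregation and once as the left side of the join, which is exactly the step the shorter query wants to reference without naming it.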
What about this:
Quite ergonomic. |
Ah yes, very good! I forgot about the "new" One thing I disliked about my Only remaining question I have is whether
from fund_holdings
filter value_date >= @2023-01-01
into fund_holdings_2023
group [value_date, fund_code] (
aggregate [
fund_market_value = sum market_value
]
)
join fund_holdings_2023 (==value_date && ==fund_code)
derive [weight = market_value / fund_market_value]
I can also see reasons why you might want to disallow that and want to force One compromise to avoid that could be a two-pronged approach:
I guess another way to phrase that would be that WDYT? |
I like the
What's a good name? IIRC there's a pipeline lang that uses (was not a fan of the (I also don't think it's high priority, given it's syntactic sugar for the |
When we added Also, it would look like |
Agreed, and I tried to address that with the local vs global name separation. I will try to write out some more examples to get more of a feel for what it would be like (to be added later, no time right now). I wasn't thinking of introducing a new keyword, rather just overriding the semantics of However I think this could be quite a nice QOL feature in line with the Language Pragmatics Engineering article @max-sixty posted on Discord earlier. We're adding a lot of features at the moment that will make PRQL a solid language; I'm thinking of the module system, types, etc. The current Another angle to consider, which just occurred to me when comparing @aljazerzen and my |
I would vote to put the
|
The discussion in #2655 gave me an idea for something that had been bothering me for a while but I didn't have a palatable solution/suggestion for.
To me the bigger problem in #2655 is that each step in the pipeline is a new relation and should really be explicitly referenceable. Power Query M Language takes the approach of explicitly requiring a name for each step. That is obviously quite verbose and not really desirable. Perhaps we could have a convenient syntax for this. My suggestion would be ^ like in git (HEAD^). This seems quite intuitive to me because it points upward.
So to expand on that, unqualified column references refer to columns in the current frame at that step in the pipeline. The ^ would allow you to reference parent frames, and multiple ^ would reference n-many frames up the stack.
Say you wanted to write pathological code like
this would allow you to do it. There are obviously better ways of writing this code but I seem to recall there being cases where this was desirable.
One case that would need special consideration is joins. In git, HEAD^2 doesn't actually mean HEAD^^ as I was suggesting above (git uses HEAD~2 for that); rather, HEAD^2 means the second merge head in a merge. I think that would be equivalent to the right relation in a join for us; see What's the difference between HEAD^ and HEAD~ in Git? for more details. I actually wasn't aware of that until I researched it right now. In any case, git isn't really known for simplicity or great UX, so if we were to adopt this we should probably come up with our own consistent design.
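To make the git distinction concrete, here is a small Python model of a commit graph. It is only a sketch: the commit names and the parents table are invented, and the two helpers imitate git's revision syntax (~n follows first parents n times, ^n picks the n-th parent) rather than calling git itself.

```python
# Hypothetical commit graph: each commit maps to its ordered list of parents.
# "M" is a merge commit whose first parent is "C" and second parent is "F".
parents = {
    "A": [],
    "B": ["A"],
    "C": ["B"],
    "F": ["A"],
    "M": ["C", "F"],
}

def caret(commit, n=1):
    """Model of <commit>^n: the n-th parent of a commit (1-based)."""
    return parents[commit][n - 1]

def tilde(commit, n=1):
    """Model of <commit>~n: follow the first parent n times."""
    for _ in range(n):
        commit = caret(commit, 1)
    return commit

head = "M"
print(caret(head))         # HEAD^   -> C
print(caret(caret(head)))  # HEAD^^  -> B
print(tilde(head, 2))      # HEAD~2  -> B
print(caret(head, 2))      # HEAD^2  -> F
```

In join terms, the first parent would play the role of the left relation and the second parent the right one, which is the analogy drawn above.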