Idea for more PRQLish table creation grammar #2427

Closed
cottrell opened this issue Apr 14, 2023 · 18 comments · Fixed by #2577
Labels
language-design Changes to PRQL-the-language

Comments

@cottrell

cottrell commented Apr 14, 2023

What's up?

Currently, I think the only way to create tables is:

let TableA = (
   from data
   ...
)

But it might be useful for a number of reasons to allow some pipeline operator for creating tables like this:

from data
...
let TableA = _

The point would be to let components (such as a table definition) avoid nesting structures, which basically block the ability to use simple concatenation/appending as the base operation for constructing pipelines.

So someone might define some vanilla PRQL pipeline, and you could then take it and simply extend it with more pipeline logic, OR turn it into a table simply by appending "let TableA = _". That sounds like no big deal, but it removes a lot of complexity that leaks out of the components when you need to keep track of the type of a query object in order to combine it with other query objects.
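To make the concatenation argument concrete, here is a minimal Python sketch that models a PRQL component as a plain list of source lines. The helper names (`extend`, `name_prefix`, `name_postfix`) are hypothetical, and `let NAME = _` is the syntax proposed in this issue, not something any compiler is assumed to accept.

```python
# Illustrative model only: a PRQL "component" is a list of source lines.
Component = list[str]


def extend(component: Component, *lines: str) -> Component:
    """Add more pipeline steps: plain list concatenation."""
    return component + list(lines)


def name_prefix(component: Component, name: str) -> Component:
    """Current syntax: the pipeline must be wrapped in `let NAME = ( ... )`."""
    return [f"let {name} = ("] + ["   " + line for line in component] + [")"]


def name_postfix(component: Component, name: str) -> Component:
    """Proposed syntax (hypothetical here): naming is one more appended line."""
    return extend(component, f"let {name} = _")


base = ["from data", "filter a == 'a'"]
print("\n".join(name_postfix(base, "TableA")))
```

With the postfix form, the original component is always an unchanged prefix of the named one, so code that assembles queries never needs to know in advance whether a fragment will be extended or named. With the prefix form, the fragment itself has to be rewritten.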

UPDATE:

And to be clear, the intention is to enable the following:

from data
...
let TableA = _

from data
...
join TableA [==something]
...
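One way to read the intended semantics is as a purely syntactic rewrite: each chunk that ends in `let NAME = _` becomes today's `let NAME = ( ... )` declaration, and the final chunk stays the main pipeline. The Python sketch below performs that rewrite on a line-based script. It assumes one pipeline step per line, ignores blank lines, and takes the `_` terminator spelling from the proposal rather than from any released PRQL version; `desugar` is a hypothetical name.

```python
import re

# The proposed chunk terminator, e.g. "let TableA = _" (syntax from this issue, not released PRQL).
TERMINATOR = re.compile(r"^let\s+(\w+)\s*=\s*_$")


def desugar(script: str) -> str:
    """Rewrite each chunk ending in `let NAME = _` as a prefix `let NAME = ( ... )` declaration."""
    output: list[str] = []
    chunk: list[str] = []
    for raw in script.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines carry no meaning in this sketch
        match = TERMINATOR.match(line)
        if match:
            # Close the current chunk and wrap it in a declaration.
            output.append(f"let {match.group(1)} = (")
            output.extend("   " + step for step in chunk)
            output.extend([")", ""])
            chunk = []
        else:
            chunk.append(line)
    output.extend(chunk)  # whatever remains is the main pipeline
    return "\n".join(output)


script = """
from data
filter a == 'a'
let TableA = _

from data
join TableA [==something]
"""
print(desugar(script))
```

Running it prints the TableA definition in today's nested form, followed by the unchanged main pipeline that joins against it.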
@aljazerzen
Member

Woah, I really like this.

Having let above the pipeline breaks the whole "flow from top to bottom" mantra. I hadn't noticed this chasm in the PRQL philosophy before, but now it really bothers me. And I've seen people struggle with a similar syntax in R + tidyverse, so what you propose should not only be added to the language, but also considered idiomatic PRQL.

But as always, we need to find a suitable syntax. There are a few considerations:

  • let must be at the beginning of the line (as in your proposed syntax),
  • table name should probably follow the let (as in your syntax),
  • the whole thing should preferably be readable. Something like "from data select my columns and let that be named TableA".

@aljazerzen
Member

The syntax you propose is pretty close to perfect given my considerations; I'd only replace _ with a keyword. Maybe this?

from data
...
let TableA = this

@aljazerzen
Copy link
Member

aljazerzen commented Apr 14, 2023

Do we want to allow things like this:

from data
filter a == 'a'
let TableA = (_ | select [a, b])

If not, maybe remove the assigning part and stick a keyword in front of the table name?

from data
...
let into TableA

@aljazerzen aljazerzen added the language-design Changes to PRQL-the-language label Apr 14, 2023
@aljazerzen aljazerzen added this to the 0.8 milestone Apr 14, 2023
@eitsupi
Member

eitsupi commented Apr 14, 2023

from foo | let into bar seems like a clearer syntax to me.

@cottrell
Author

cottrell commented Apr 14, 2023

Haven't thought too much about that one ... I think the main feature would be to enable "ending" a pipe chunk with a let to be able to start another one after.

And basically you run into this whenever you do anything complicated with table creation and joins, where you currently really start to feel like it's not clean PRQL anymore.

@cottrell
Copy link
Author

> Do we want to allow things like this:
>
> from data
> filter a == 'a'
> let TableA = (_ | select [a, b])
>
> If not, maybe remove the assigning part and stick a keyword in front of the table name?
>
> from data
> ...
> let into TableA

In that example I'm not clear on why one would want the pipe filter like that ... I think it can be done with a vanilla select line above?

@max-sixty
Member

I'm keen on this / some variant of it!

Re the "what term do we choose for let foo = X", this should probably be the same as what we use for #2129 (comment)

@max-sixty
Copy link
Member

Are there other languages that allow this postfix lvalue?

The one I can think of is the Python REPL, which allows something similar: @cottrell's original example of foo = _

@aljazerzen
Copy link
Member

@cottrell

> In that example I'm not clear on why one would want the pipe filter like that ... I think it can be done with a vanilla select line above?

I'm not saying that someone would want that - just that someone might try it someday. In let foo = _, the part after = looks like a normal expression and _ is just a special variable. So someone would expect my query to be equivalent to this:

from data
filter a == 'a'
select [a, b]
let TableA = _

That's why it may be wise to make it clear that this syntax is special and that part after = is not an actual expression. Just to prevent people from writing abominations like my query above.


@max-sixty

> Re the "what term do we choose for let foo = X", this should probably be the same as what we use for #2129 (comment)

Ugh, I don't think this should be the same keyword. See this example:

from data
let main = this

Here, main means the main pipeline in the module and this means the result of the pipeline. If we'd use the same keyword, it may get confusing.


All in all, I prefer from foo | let into bar the most.

@aljazerzen
Member

One more thing: we do agree that this should be allowed only in the top-level pipeline, right?

This should all be forbidden, right?

let a = (
   from data | let into b
)

func take_n n rel -> (rel | take n | let into b)

from data
group x (take 1 | let into b)
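The top-level-only restriction can be checked with nesting depth alone. Below is a minimal Python sketch, assuming the `let into NAME` spelling under discussion and treating parentheses as the only nesting construct; it does not skip string literals or comments, and `nested_let_into` is a hypothetical helper, not part of any PRQL tooling.

```python
import re

# `let into NAME` is the spelling under discussion; treated here as a hypothetical keyword pair.
LET_INTO = re.compile(r"\blet\s+into\b")


def nested_let_into(source: str) -> list[int]:
    """Return the 1-based line numbers of every `let into` that sits inside parentheses.

    Simplification: parentheses are the only nesting construct considered,
    and string literals and comments are not skipped.
    """
    lines: list[int] = []
    for match in LET_INTO.finditer(source):
        before = source[: match.start()]
        depth = before.count("(") - before.count(")")
        if depth > 0:
            lines.append(before.count("\n") + 1)
    return lines


print(nested_let_into("let a = (\n   from data | let into b\n)\n"))  # → [2]
print(nested_let_into("from data\nselect [a, b]\nlet into b\n"))      # → []
```

All three forbidden examples above place `let into` inside a parenthesized expression, so a depth check is enough to reject them while still allowing it at the end of the top-level pipeline.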

@max-sixty
Copy link
Member

> Ugh, I don't think this should be the same keyword. See this example:
>
> from data
> let main = this
>
> Here, main means the main pipeline in the module and this means the result of the pipeline. If we'd use the same keyword, it may get confusing.

Hmmm — but IIUC we're not writing let main = — that's the implicit part (though I asked for confirmation here).

I was thinking that main (or whatever we choose) is the implicit name of the final statement... If that's right, it makes sense to align them, if not then it doesn't.

@max-sixty
Copy link
Member

> All in all, I prefer from foo | let into bar the most.

One awkward-but-minor thing about let into is that some of its behavior is completely different from let — it continues rather than starts a new pipeline.

In order to parse let, we need to confirm that it's not followed by into. And we might get an odd error message from a partially formed query:

from foo
derive bar = x + 2

let # EOI — should we raise an error with the whole query, or just this line?

@aljazerzen
Copy link
Member

> its behavior is completely different from let — it continues rather than starts a new pipeline

Well that's the point, is it not?

In your example, I think it should be possible to produce an error "expected a name or into, got end of file".
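The suggested error message falls out naturally if the parser looks one token past `let`, which is also why a bare `let` cannot be classified until the next token is seen. Here is a minimal Python sketch over a pre-tokenized statement. It assumes `into` is reserved so it can never be a table name; the classes and `parse_let` are hypothetical and unrelated to the actual prql-compiler parser.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class LetDecl:
    """`let NAME = <expr>`: starts a new declaration."""
    name: str


@dataclass
class LetInto:
    """`let into NAME`: ends the preceding pipeline and names it."""
    name: str


class ParseError(Exception):
    pass


def parse_let(tokens: list[str]) -> LetDecl | LetInto:
    """Classify a statement that starts with `let`, using one token of lookahead.

    Assumes `into` is reserved, so it can never be a table name.
    """
    if not tokens or tokens[0] != "let":
        raise ValueError("parse_let expects tokens starting with `let`")
    if len(tokens) == 1:
        raise ParseError("expected a name or `into`, got end of file")
    if tokens[1] == "into":
        if len(tokens) == 2:
            raise ParseError("expected a name after `let into`, got end of file")
        return LetInto(tokens[2])
    if len(tokens) == 2 or tokens[2] != "=":
        raise ParseError(f"expected `=` after `let {tokens[1]}`")
    return LetDecl(tokens[1])
```

For the truncated query above, the trailing `let` reaches `len(tokens) == 1`, so the error can point at that line alone rather than at the whole query.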

I see your point that "let can sometimes be seen as the last function of the preceding pipeline or as a new declaration", but this will be a problem unless we completely change the keyword to something like:

from data
...
into TableA 

I dislike that a bit, because it makes it harder to spot where TableA is declared.

@max-sixty
Copy link
Member

> Well that's the point, is it not?

🤣

What I was trying to say is form & function (or "aesthetics & semantics") should be correlated; these have very different semantics insofar as they delineate expressions, and so arguably should have more different aesthetics.

We don't have any other case of a two-word keyword (it's not as if into is an argument here...). In some ways reusing let is orthogonal, but we're also introducing a new language concept of a two-word-keyword to express this, which isn't.

When I'm reading the text, my mind doesn't know "is this the culmination of this expression or am I starting a new one?" That said, this point is very much based on my own experience; if others don't feel this, then we shouldn't weigh my experience alone.


> into TableA

Yes I was thinking something like this. Or even let_into would be clearer to me than let into.


Overall I'm open to either, with a slight weight on something other than let into. We can also easily change in the future.

@cottrell
Copy link
Author

I probably don't follow all the subtleties here but might it be easier to start, at least conceptually and as a way to try things out in some experimental mode, with something totally unique and new like

...
store_into ...  # create tables using existing stack
end_pipe ...  # clear stack

and then separately create aliases to these things that are convenient/nice/stylish? Convenience and style are quite difficult IMO.
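One way to read this proposal is that the postfix table naming discussed above amounts to `store_into` followed by `end_pipe`. A minimal Python sketch of the two primitives as a builder over a stack of pipeline steps follows; `PipelineBuilder` and `step` are hypothetical, and the method names simply mirror the placeholders above.

```python
class PipelineBuilder:
    """Hypothetical model of the experimental `store_into` / `end_pipe` operations."""

    def __init__(self) -> None:
        self.stack: list[str] = []              # steps of the pipeline being built
        self.tables: dict[str, list[str]] = {}  # named tables created so far

    def step(self, line: str) -> "PipelineBuilder":
        self.stack.append(line)
        return self

    def store_into(self, name: str) -> "PipelineBuilder":
        # Create a table from the existing stack; the stack itself is left as-is.
        self.tables[name] = list(self.stack)
        return self

    def end_pipe(self) -> "PipelineBuilder":
        # Clear the stack so that the next step starts a new pipeline.
        self.stack = []
        return self


builder = PipelineBuilder()
builder.step("from data").step("filter a == 'a'").store_into("TableA").end_pipe()
builder.step("from data").step("join TableA [==something]")
print(builder.tables)  # {'TableA': ['from data', "filter a == 'a'"]}
```

Keeping the two operations separate makes it possible to name an intermediate result and keep extending the same pipeline, which a combined keyword would not allow.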

@aljazerzen
Copy link
Member

> we're also introducing a new language concept of a two-word-keyword to express this

You are right. We don't (yet) have things like pub async fn or public static void main() as some kind of qualifiers on declarations. And this is not a big enough reason to justify adding them.

I'm convinced, with into being my first choice now.


@cottrell I don't know what you mean, can you elaborate?

@cottrell
Copy link
Author

I just meant that, if it is hard to decide how to incorporate the operations into the existing framework, it might be easier to simply create two new operations, "store_into" and "end_pipe", for now, just to see if there are any blockers.

I also like into as it seems explicit and AFAIK is a new keyword.

@snth
Copy link
Member

snth commented May 25, 2023

Great discussion and the final syntax you arrived at is the best of all the options you considered.


5 participants