Ambiguous wildcard references #1193
Comments
Originally, I was for option 2, because it's convenient and similar to how SQL operates. But I now vote for option 1. This is basically the compiler saying "you have to provide more information" so that it can create SQL queries that are not ambiguous. Recently I have found that many of the problems I need to fix in my professional life are caused by allowing certain expressions, commands or protocols to be ambiguous. In my experience, it would actually save time if we reported that at compile time. On a scale from -2 to +2, I feel +1 for changing this behavior from the current option 2 to option 1.
Looking through the code changes in #1205, it appears to be mostly examples and tests. I like how the new examples look because it makes it much easier for the reader to see what source table a column comes from. The litmus test will be to see whether it will be annoying when writing queries. However, similarly to the experience with Rust, I suspect that after an initial adjustment period one will actually come to appreciate the concreteness this brings. We have quite a few changes with this 0.3.x line of releases and it looks like we'll plan to bring quite a few new user-facing syntax features in, so I think this is a good time to try this. It's a +1 from me (to go with option 1).
+1 for option 1 from me as well!
(FYI I have seen this and am reflecting! I was quite ardent on allowing the ambiguity, given how nice PRQL is for exploratory use as well as robust use. I do hear the points above and want to think through some cases; I will be "on" on Saturday, so will respond then at the latest)
Broadly: I'm open to trying this and iterating on feedback. I'm still a -1, assuming we're taking averages rather than minimums.
Context
PRQL has many benefits over SQL. Two of them are:
On this issue, those two priorities are in opposition. It is genuinely quite useful to be able to do
from employees
join salaries [==id]
group [city] (
  aggregate avg_salary = average salary
)
...and not have to add in the table names, which are kinda obvious (to the user, not the compiler...) given the tables:
 from employees
 join salaries [==id]
-group [city] (
-  aggregate avg_salary = average salary
+group [employees.city] (
+  aggregate avg_salary = average salaries.salary
 )
While it's ironic to say "trust SQL", I do place some weight in there being no SQL engine that forces these —
Alternatives
I think that this could be a lint, similar to clippy, which people can then opt into. That enables the query to run, but also enables folks to spot potential future issues (e.g. if a
When we build the DB backend, we could also use that to provide hints (I think this was @mklopets's point), though I'm not sure whether that means it's worth forcing it now ("the help is coming") or leaving it until later, when the help arrives. I think Rust does this quite well! Or the Python typing ecosystem is getting good at this.
Compiler simplicity
I would place a decent amount of weight on keeping the compiler simple. I have not been involved enough here (a regret for me that I promise to fix), so I defer to @aljazerzen. Keeping the compiler simple will mean less work to build, less work to change, and an easier path for new contributors to join (the last of which is something we're not doing so well on...). So to the extent this makes the compiler simpler, great. To the extent it's just "convert to an s-string", then it's less of an issue. (And to the extent this means we need to deal with more cases because we can't fall back to ambiguity, then that would detract.)
Meta
On a meta level, I worry a little bit that, as the language builders, we're likely overweighting Robustness and underweighting Exploration. At the extreme, if folks are using PRQL as just a target for other tools, then there's no real Exploration priority.
Thanks and forgive the delay. I'm fully back online now.
As I wrote, it seems to me that stricter name checking may even improve exploration, because it eliminates debugging banal bugs that stem from the compiler not understanding exactly what you mean.
This is true, but PRQL is able to operate with much less knowledge of the schema than any RDBMS.
The change is minimal: about 10 lines of code in total. So this is not the concern. Given all the opinions expressed, I have decided to change the behavior to option 1. For anyone in the future: feel free to add your opinion. At least at the time of writing, reverting this behavior, or adding an option to revert it, is not hard to do compiler-wise.
I'm opening this issue as a followup on a recent call and @snth's comment here.
First, I suggest reading the current name resolution algorithm.
This issue is about this paragraph:
In practice this means that such queries:
... are translated to:
Note that this happens only when both table names still contain wildcards.
In this case, first_name is known to be in employees, and the possibility that it may also be in salaries is not considered:
In this case, first_name is known to be in both, so an ambiguity error is raised:
As @snth put it, there are two options:
With the semantic branch, option 2 became a little worse, because ambiguous idents are converted to s-strings earlier. This means that anchoring does not know about them, which may cause problems.
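To make the cases above concrete, here is a rough Python model of the resolution rules as described in this issue. It is only a sketch, not the PRQL compiler's code: `Table`, `resolve`, the `strict` flag and both error classes are hypothetical names invented for illustration. It also assumes that an ident unknown to every table is attributed to the only wildcard table when there is exactly one, and that option 2 can be modelled as returning the ident unqualified.

```python
from dataclasses import dataclass, field


class AmbiguousNameError(Exception):
    """The ident could belong to more than one table."""


class UnknownNameError(Exception):
    """No table in scope can contain the ident."""


@dataclass
class Table:
    name: str
    # Columns already known to belong to this table.
    columns: set = field(default_factory=set)
    # True while the table may still have columns we do not know about.
    wildcard: bool = True


def resolve(ident, tables, strict):
    """Qualify `ident` with a table name, following the cases in this issue."""
    known = [t.name for t in tables if ident in t.columns]
    if len(known) == 1:
        # Known in exactly one table: wildcards of other tables are ignored.
        return f"{known[0]}.{ident}"
    if len(known) > 1:
        # Known in several tables: always an error.
        raise AmbiguousNameError(f"{ident} is in {', '.join(known)}")

    # Not known anywhere, so it can only hide behind a wildcard.
    candidates = [t.name for t in tables if t.wildcard]
    if not candidates:
        raise UnknownNameError(ident)
    if len(candidates) == 1:
        return f"{candidates[0]}.{ident}"  # assumption, see lead-in
    if strict:
        # Option 1: ask the user to name the table.
        raise AmbiguousNameError(f"{ident} may be in {', '.join(candidates)}")
    # Option 2: leave the ident unqualified and let the database decide.
    return ident


employees = Table("employees", {"first_name"})
salaries = Table("salaries")
print(resolve("first_name", [employees, salaries], strict=True))  # employees.first_name
print(resolve("salary", [employees, salaries], strict=False))     # salary
```

With `strict=True`, the second call raises `AmbiguousNameError` instead, which is the behavior option 1 asks for.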