-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
How should equality between frames behave? #2141
Comments
+1 for option 2, my guess is that you're right & that in production ehrQL this usage would be a thinko. Heading off on a tangent about option 1: Another tangent - this ticket is about what should we do for the syntax (discussion of tangents could move to Slack rather than this ticket!) |
My point was that, unless I misunderstood the example, they aren't equivalent. The expression:
Is equivalent to:
Which it has to be if it's going to act as the inverse of
More generally though you're right that will be semantically equivalent queries (e.g. On comparing the results of frames in the sandbox, I wonder if we could override the
|
ahhhhhhhh I hadn't spotted that. In which case the idea of comparing the results of frames in the sandbox could be more misleading than it is helpful. Scratch all that! |
This is fairly niche, but came up when some experienced Python programmers were first learning ehrQL. They wondered whether two frame operations produced the same thing and so tried writing
frame_a == frame_b
.At present, this just uses the default Python object identity behaviour. There are two reasonable alternatives to this:
Compare the related query model objects (i.e. the
_qm_node
attributes). This is the semantically correct thing to do: query model nodes are value objects and will compare equal iff they are structurally identical.Blow up with an error: ehrQL series objects have special equality behaviour where the result is a new series. Comparing a series with a frame already triggers a (hopefully) explanatory error. Comparing two frames is more likely to be a typo/thinko by an novice ehrQList than it is to be the investigation of a curious Pythonista, so raising an error is the more helpful thing to do.
I think I'm leaning in favour of option 2. Adding guardrails for novices is particularly important in our context. And experienced Pythonistas will be better placed to cope with the unexpected error.
I'd also add that, in the specific context in which this came up, the users were confused by the unexpected
False
result. But that result was actually correct in the sense that the two expression they assumed were semantically equivalent were not actually equivalent. So it's not clear that doing the semantically correct thing would actually have been helpful in any case.Slack thread:
https://bennettoxford.slack.com/archives/C069YDR4NCA/p1727966431279409?thread_ts=1727963482.698439&cid=C069YDR4NCA
The text was updated successfully, but these errors were encountered: