Remove select_column option in TaskInstance.get_task_instance #38571
I think there were still some tests failing @jedcunningham
The one I looked at didn't seem obviously related, in the 10 seconds I looked at it, so I figured a fresh run wouldn't hurt. But maybe it is related :)
Those test failures looked like an abrupt failure of the Docker engine in the middle of testing. I re-ran just the tests; if it appears again, then we have something interesting here.
Yeah, there was some weird error related to a sensor test. I don’t think it’s really an issue, probably something with the test, but I was still working through it.
oh -- I thought you merged it @jedcunningham -- sorry, I was confused 🙃
Figured out what was going on here. It was a very confusing one, and not obvious. Basically, switching back to querying the TI had the effect of incrementing try_number every time ti.refresh_from_db was called (since try_number is bananas). Nonobviously, this broke the test of a poking sensor in reschedule mode: it could not find the right reschedule DB object, so the start date kept advancing when it shouldn't have. The cause was just try_number shenanigans. But then I found that the real reason we were querying attrs directly (instead of the ORM object) was to avoid locking dag_run! We should be able to do that by simply lazy loading the attr. So I made two changes: (1) lazy load the dag_run attr in this function, and (2) go back to setting try_number from the private attr _try_number (which is what it did before the deadlock fix was originally added). Well, that was a mouthful...
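The try_number feedback loop can be reproduced with a small stdlib-only toy model. This sketch assumes the public accessor returns the stored counter while the task is running and the counter plus one in any other state (a simplified model of the state-dependent behavior described in this thread). `ToyTaskInstance`, `refresh_public`, `refresh_private` and `refresh_repeatedly` are hypothetical names, not Airflow APIs, and the write-back loop merely stands in for the refreshed TI being saved again.

```python
from dataclasses import dataclass
from typing import Callable

RUNNING = "running"


@dataclass
class ToyTaskInstance:
    """Hypothetical stand-in for a task instance; not Airflow's real class."""

    state: str = "queued"
    _try_number: int = 0

    @property
    def try_number(self) -> int:
        # Assumed behavior: while running, report the stored try;
        # in any other state, report the *next* try.
        if self.state == RUNNING:
            return self._try_number
        return self._try_number + 1

    @try_number.setter
    def try_number(self, value: int) -> None:
        self._try_number = value


def refresh_public(ti: ToyTaskInstance, row: ToyTaskInstance) -> None:
    """Buggy copy: reads through the state-dependent public accessor."""
    ti.state = row.state
    ti.try_number = row.try_number  # adds 1 unless row is running


def refresh_private(ti: ToyTaskInstance, row: ToyTaskInstance) -> None:
    """Fixed copy: reads the stored counter directly."""
    ti.state = row.state
    ti._try_number = row._try_number


def refresh_repeatedly(
    refresh: Callable[[ToyTaskInstance, ToyTaskInstance], None], times: int
) -> int:
    """Refresh and write back `times` times; return the stored counter."""
    stored = ToyTaskInstance(state="up_for_reschedule", _try_number=1)
    for _ in range(times):
        ti = ToyTaskInstance()
        refresh(ti, stored)
        stored = ti  # stands in for the refreshed TI being persisted again
    return stored._try_number


print(refresh_repeatedly(refresh_public, 3))   # 4
print(refresh_repeatedly(refresh_private, 3))  # 1
```

Copying `_try_number` directly keeps the value stable no matter how often the object is refreshed, which is the reasoning behind change (2).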
I hope we’ll be able to clean up the try_number bs when we implement AIP-64.
Fundamentally, what's going on here is that we need a TaskInstance object instead of a Row object when sending it over the wire in an RPC call. But the full story on this one is actually somewhat complicated.
It was back in 2.2.0, in #25312, that we switched to querying the column attrs instead of the TI object (#28900 only refactored this logic into a function). The reason was to avoid locking the dag_run table, since TI had newly gained a dag_run relationship attr. This now causes a problem with AIP-44, because the RPC API does not know how to serialize a Row object.
This PR switches back to querying a TaskInstance object, but avoids locking dag_run by using the lazyload loader option. Meanwhile, since try_number is a horrible attribute (it gives a different answer depending on the state), we have to switch back to reading the underlying private attr instead of the public accessor.
Older description:
This was originally added in #28900, presumably for compatibility with serialization. Maybe things have changed since then, because it's actually the Row object (which is what's returned when this option is True) that does not serialize properly, whereas the TaskInstance object is serialized properly. This is the only usage of this param, and it's not needed here, so I'm just removing it. You could argue that it's public and can't be removed, but I think it's pretty safe.
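To illustrate why a column-based Row can trip up an RPC layer while a full object goes through, here is a minimal stdlib-only sketch. It assumes a serializer that only encodes types it has an explicit handler for; `TaskInstanceLike`, `Row`, `SERIALIZERS` and `serialize` are hypothetical names, not Airflow or SQLAlchemy APIs, and the namedtuple only stands in for a result row.

```python
import json
from collections import namedtuple
from dataclasses import asdict, dataclass


@dataclass
class TaskInstanceLike:
    """Hypothetical ORM-object stand-in that the serializer knows about."""

    dag_id: str
    task_id: str
    run_id: str


# Hypothetical stand-in for a column-based result row: a tuple subclass
# that the toy serializer has no handler for.
Row = namedtuple("Row", ["dag_id", "task_id", "run_id"])

# Registry of exact types the toy RPC layer can encode.
SERIALIZERS = {
    TaskInstanceLike: lambda obj: {"type": "TaskInstance", "data": asdict(obj)},
}


def serialize(obj) -> str:
    """Encode obj to JSON, refusing any type without a registered handler."""
    encoder = SERIALIZERS.get(type(obj))
    if encoder is None:
        raise TypeError(f"cannot serialize {type(obj).__name__}")
    return json.dumps(encoder(obj))


print(serialize(TaskInstanceLike("example_dag", "extract", "manual__1")))
try:
    serialize(Row("example_dag", "extract", "manual__1"))
except TypeError as err:
    print(err)  # cannot serialize Row
```

Dispatching on the exact type is the assumption doing the work here: the row carries the same values as the object, but without a registered handler it cannot be sent.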