Improve compatibility with mssql #9973
As discussed in #9926, I would like us to improve the compatibility, but I wouldn't like to add more tests to our already large CI runs.
@potiuk is going to carry out a survey and then start a discussion on the mailing list about it. Just a couple of days ago, the large number of tests running on v1-10-test caused a long queue.
Whoops, I see you did not add tests, just breeze configs that allow you to run tests on your machine. I am completely fine with that 👍
LGTM. But I think we should add somewhere (in README.md) next to supported versions, that mssql support is experimental and that we are evaluating it.
Don't new pushes to the branch cancel previous workflow runs? It looks like the CI workflows are running to completion for every push I made to this branch, and the previous ones are not getting cancelled. Is it something to do with the |
They should cancel it. Let me see. |
Ah... It seems that PRs run from a fork do not have access to cancel the already running PRs :( (see the last error message)... Hmm, that means the whole cancel approach we have only works for master merges and direct pushes to the v1-10-test branch :(. Maybe we will have to figure out something else. I guess you cannot cancel a running PR of yours, @aneesh-joseph? Only committers can do it, I believe?
|
oops, and yes don't seem to have access to cancel runs from my PR |
That's the root of the problem - the running PRs have only read-access token to |
That issue has caused all sorts of issues with Github Action in general :( |
Yeah - but that's for a very good reason ... We do not want random fork's PR to make any changes to our repo before it gets merged. But I think eventually some of those permissions (in this case we need "cancel any PR from the same fork and only from the same fork" permission) will be added to GA. |
The permission should have been configurable I feel similar to how we did with the bot. The number of people asking for this feature has increased significantly - hopefully Github now updates that :) |
09dff0c to 4ddaadb
LGTM, but I am wondering how much of a performance difference there is after changing the models code for the SQLAlchemy queries you had to modify.
If the modification is not necessary, can/should we revert it? If not, should this code live in SQLAlchemy's mssql support?
The change on airflow/models/dag.py is necessary, as it's moving the rendering decision from a hard-coded Postgres syntax. The changes on airflow/models/dagcode.py and airflow/models/serialized_dag.py look like they could incur a performance penalty, as the query optimizer is not aware that only the first result will be read. Does SQL Server not support the original EXISTS query? |
It does support EXISTS, but it doesn't support an EXISTS expression in the columns clause of a SELECT. Failing Airflow tests with mssql and EXISTS checks: https://github.com/aneesh-joseph/airflow-tests/pull/15/checks?check_run_id=918457634 so I had to make the change in this PR, or I will have to change it to something like:
|
This looks like a better solution than the one with .first(), but it still reads through all of the matching values.
Maybe scalar already does that, though? Update: first() does add a limit, so the version in the PR is already close to optimal. It does, however, deserialize the entire object, so it could be improved with something like
|
b4bfc73 to 23b6831
@jarkkorantala yes, but it breaks on MySQL - sqlalchemy/sqlalchemy#5481
I have updated the PR so that it queries for literal(True) instead of querying the complete object. These checks are now getting translated into the below queries:
MySQL
MS SQL
SQLite
Postgres
Before this PR, they were getting translated into:
MySQL
MS SQL (wrong query)
SQLite
Postgres
|
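For reference, the two existence-check styles discussed in this thread can be compared at the SQL level. This is a minimal sketch using Python's built-in sqlite3, not the code from this PR: the table dag_code and the column fileloc_hash are stand-ins chosen for illustration, and both helper functions are hypothetical. SQLite accepts both forms; per the comment earlier in the thread, SQL Server rejects EXISTS in the columns clause, and LIMIT is SQLite syntax that SQLAlchemy would render differently per backend.

```python
import sqlite3

# In-memory database with an illustrative table (names are assumptions).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dag_code (fileloc_hash INTEGER PRIMARY KEY, source_code TEXT)")
conn.execute("INSERT INTO dag_code VALUES (?, ?)", (42, "print('hello')"))


def has_code_exists_style(conn, fileloc_hash):
    """EXISTS in the column list: fine on SQLite, reported as failing on SQL Server."""
    row = conn.execute(
        "SELECT EXISTS (SELECT 1 FROM dag_code WHERE fileloc_hash = ?)",
        (fileloc_hash,),
    ).fetchone()
    return bool(row[0])


def has_code_literal_style(conn, fileloc_hash):
    """Select a constant and stop at the first match; no row payload is loaded."""
    row = conn.execute(
        "SELECT 1 FROM dag_code WHERE fileloc_hash = ? LIMIT 1",
        (fileloc_hash,),
    ).fetchone()
    return row is not None


print(has_code_exists_style(conn, 42), has_code_literal_style(conn, 42))
print(has_code_exists_style(conn, 7), has_code_literal_style(conn, 7))
```

The second helper mirrors the idea of querying for a literal and taking the first row: the database only has to find one match, and nothing beyond a constant is sent back.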
25b5ace to 4eb8032
3ba9344 to cfb162e
@potiuk thank you, that sorted it.. btw does the master build run on self hosted runners which have more memory.. should we enable mssql integration tests for master builds? |
It does, and it is already sorted out :). The change I added is in the branch that is only executed when memory on the instance is less than 32 GB or so. This means that integration tests will run on master with MSSQL as well :).
|
All points addressed and seems we are finally ready to merge that one !
🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 |
awesome, thank you Jarek 👍 |
Thanks @aneesh-joseph, this is great. It took a lot of time and pings, but thank you for your patience. |
@@ -897,7 +898,9 @@ def get_num_active_runs(self, external_trigger=None, session=None):
         )

         if external_trigger is not None:
-            query = query.filter(DagRun.external_trigger == external_trigger)
+            query = query.filter(
+                DagRun.external_trigger == (expression.true() if external_trigger else expression.false())
Why is this necessary? What is the type of external_trigger here to require this workaround?
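To see what the changed filter produces, the comparison can be compiled against different dialects. This is a sketch that assumes SQLAlchemy is installed; the bare column() construct stands in for DagRun.external_trigger, and the helper render() is hypothetical. It only shows the rendered SQL and does not settle why the workaround is required.

```python
from sqlalchemy import column
from sqlalchemy.dialects import mssql, postgresql
from sqlalchemy.sql import expression


def render(external_trigger, dialect):
    """Compile the boolean comparison from the diff for one dialect (hypothetical helper)."""
    # Same conditional as in the diff: pick the SQL constant from the Python flag.
    flag = expression.true() if external_trigger else expression.false()
    # column() is a stand-in for the mapped attribute DagRun.external_trigger.
    clause = column("external_trigger") == flag
    return str(clause.compile(dialect=dialect))


mssql_sql = render(True, mssql.dialect())
postgres_sql = render(True, postgresql.dialect())
print("mssql:   ", mssql_sql)
print("postgres:", postgres_sql)
```

On dialects without a native boolean type, SQLAlchemy renders these constants as integer literals rather than the keywords true and false, so the same Python expression yields backend-appropriate SQL.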
[(dm.dag_id, dm.next_dagrun) for dm in dag_models]
)

active_dagruns = (
Why not use a join here or CTE rather than making multiple queries?
Good idea. Feel free to give it a shot if you like :)
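One way to read the join suggestion is a single grouped query that returns the active-run count for every DAG at once. This is a sketch with sqlite3 under assumed table and column names (dag, dag_run, state, external_trigger) and made-up rows; the actual filters in the scheduler query are not shown in this thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Illustrative schema and data only; not Airflow's real tables.
conn.executescript(
    """
    CREATE TABLE dag (dag_id TEXT PRIMARY KEY, max_active_runs INTEGER);
    CREATE TABLE dag_run (dag_id TEXT, state TEXT, external_trigger INTEGER);
    INSERT INTO dag VALUES ('a', 2), ('b', 1), ('c', 1);
    INSERT INTO dag_run VALUES
        ('a', 'running', 0),
        ('a', 'running', 0),
        ('a', 'success', 0),
        ('b', 'running', 1);
    """
)

# LEFT JOIN keeps DAGs with no matching runs; COUNT(r.dag_id) ignores the
# NULLs those rows produce, so such DAGs report zero active runs.
rows = conn.execute(
    """
    SELECT d.dag_id, COUNT(r.dag_id) AS active_runs
    FROM dag AS d
    LEFT JOIN dag_run AS r
      ON r.dag_id = d.dag_id
     AND r.state = 'running'
     AND r.external_trigger = 0
    GROUP BY d.dag_id
    ORDER BY d.dag_id
    """
).fetchall()

active_runs = dict(rows)
print(active_runs)
```

Putting the run filters in the ON clause rather than in WHERE is what keeps DAGs without active runs in the result, so the caller gets one row per DAG from one round trip.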
./breeze --backend=mssql