@wjddn279
Contributor

Description

Hello,

While testing Airflow 3.0.6 on Kubernetes, I observed that the dag-processor keeps restarting.
Upon checking the pod logs, I found the following error:

sqlalchemy.exc.OperationalError: (MySQLdb.OperationalError) (1038, 'Out of sort memory, consider increasing server sort buffer size')
[SQL: SELECT serialized_dag.data, serialized_dag.data_compressed, serialized_dag.id, serialized_dag.dag_id, serialized_dag.created_at, serialized_dag.last_updated, serialized_dag.dag_hash, serialized_dag.dag_version_id
FROM serialized_dag
WHERE serialized_dag.dag_id = %s ORDER BY serialized_dag.created_at DESC
 LIMIT %s]
[parameters: ('my_dag', 1)]
(Background on this error at: https://sqlalche.me/e/14/e3q8)

After investigating the root cause, I found that some rows in serialized_dag contained data values exceeding 1 MB.

Since MySQL’s default sort_buffer_size is 256 KB, a row whose serialized_dag.data exceeds this size cannot fit into the sort buffer. As a result, the query fails with the “Out of sort memory” error, which causes the pod to restart.

There are three ways to solve this problem:

  • The user increases sort_buffer_size
  • Add an index on created_at
  • Modify the query

The first option requires every affected user to change their database configuration, and the side effects of the second on the rest of the system are hard to predict. Therefore, this PR proposes changing the existing query as follows.

SELECT serialized_dag.data, serialized_dag.data_compressed, serialized_dag.id, serialized_dag.dag_id, serialized_dag.created_at, serialized_dag.last_updated, serialized_dag.dag_hash, serialized_dag.dag_version_id
FROM serialized_dag
WHERE serialized_dag.id = (
    SELECT serialized_dag.id
    FROM serialized_dag
    WHERE serialized_dag.dag_id = 'my_dag' 
    ORDER BY serialized_dag.created_at DESC
    LIMIT 1
) 
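
The original and rewritten queries can be compared with a minimal sketch like the one below. It uses Python's built-in sqlite3 module and a cut-down, hypothetical version of the serialized_dag table (four columns only; the sample rows are made up). SQLite does not reproduce MySQL's sort buffer limit, so this only illustrates that both query shapes return the same row when created_at values are distinct. It says nothing about memory usage.

```python
import sqlite3

# In-memory database with a simplified stand-in for serialized_dag (assumption:
# the real table has more columns; only the ones relevant to the query are kept).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE serialized_dag ("
    "id INTEGER PRIMARY KEY, dag_id TEXT NOT NULL, created_at TEXT NOT NULL, data TEXT)"
)

# Made-up sample rows; row 2 is the most recent one for 'my_dag'.
rows = [
    (1, "my_dag", "2025-09-10 00:00:00", "old"),
    (2, "my_dag", "2025-09-12 00:00:00", "newest"),
    (3, "my_dag", "2025-09-11 00:00:00", "middle"),
    (4, "other_dag", "2025-09-13 00:00:00", "unrelated"),
]
conn.executemany("INSERT INTO serialized_dag VALUES (?, ?, ?, ?)", rows)

# Original shape: sort full rows (including the large data column), keep one.
original_sql = """
    SELECT id, dag_id, created_at, data
    FROM serialized_dag
    WHERE dag_id = ?
    ORDER BY created_at DESC
    LIMIT 1
"""

# Rewritten shape: sort only ids in the subquery, then fetch that single row.
rewritten_sql = """
    SELECT id, dag_id, created_at, data
    FROM serialized_dag
    WHERE id = (
        SELECT id
        FROM serialized_dag
        WHERE dag_id = ?
        ORDER BY created_at DESC
        LIMIT 1
    )
"""

original_row = conn.execute(original_sql, ("my_dag",)).fetchone()
rewritten_row = conn.execute(rewritten_sql, ("my_dag",)).fetchone()
```

If two rows for the same dag_id shared an identical created_at, either query could pick either row, so the comparison assumes unique timestamps per DAG.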

This PR contains that change.
Since the modified query produces the same result as the original one, no additional test cases or changes to existing tests are required.



@boring-cyborg

boring-cyborg bot commented Sep 12, 2025

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contributors' Guide (https://github.com/apache/airflow/blob/main/contributing-docs/README.rst)
Here are some useful points:

  • Pay attention to the quality of your code (ruff, mypy and type annotations). Our prek-hooks will help you with that.
  • In case of a new feature add useful documentation (in docstrings or in docs/ directory). Adding a new operator? Check this short guide. Consider adding an example DAG that shows how users should use it.
  • Consider using Breeze environment for testing locally, it's a heavy docker but it ships with a working Airflow and a lot of integrations.
  • Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
  • Please follow ASF Code of Conduct for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
  • Be sure to read the Airflow Coding style.
  • Always keep your Pull Requests rebased, otherwise your build might fail due to changes not related to your commits.
    Apache Airflow is a community-driven project and together we are making it better 🚀.
    In case of doubts contact the developers at:
    Mailing List: dev@airflow.apache.org
    Slack: https://s.apache.org/airflow-slack

Member

@potiuk potiuk left a comment

Nice!

@potiuk potiuk added this to the Airflow 3.1.0 milestone Sep 12, 2025
@wjddn279 wjddn279 force-pushed the fix-latest_item_select_object-query branch from ca8869b to 7aea8e0 Compare September 13, 2025 02:52
@wjddn279
Contributor Author

@potiuk
Thanks for the review!
Do I need approval from another reviewer before this PR can be merged?

@eladkal eladkal added the type:bug-fix Changelog: Bug Fixes label Sep 14, 2025
@eladkal eladkal requested a review from kaxil September 14, 2025 02:58
@kaxil kaxil modified the milestones: Airflow 3.1.0, Airflow 3.1.1 Sep 18, 2025
@wjddn279
Contributor Author

@ashb @kaxil

Hello! Is any further review needed before this PR can be merged?

@kaxil
Member

kaxil commented Oct 21, 2025

Could you please rebase and fix the static checks too? Also, it would be trivial to make this query MySQL-specific:

@classmethod
def latest_item_select_object(cls, dag_id):
    from airflow.settings import engine
    
    if engine.dialect.name == 'mysql':
        # Prevent "Out of sort memory" caused by large values in cls.data column for MySQL. Details in https://github.com/apache/airflow/pull/55589
        latest_item_id = select(cls.id).where(cls.dag_id == dag_id).order_by(cls.created_at.desc()).limit(1).scalar_subquery()
        return select(cls).where(cls.id == latest_item_id)
    else:
        return select(cls).where(cls.dag_id == dag_id).order_by(cls.created_at.desc()).limit(1)

Alternatively, pass session to latest_item_select_object (def latest_item_select_object(cls, dag_id, session):) and check the dialect through session.bind.dialect.name, similar to:

if session.bind.dialect.name in ["sqlite", "mysql"]:
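
The session-based variant could look roughly like the sketch below, assuming SQLAlchemy 1.4 or later. SerializedDagSketch is a hypothetical stand-in model, not Airflow's actual serialized DAG model, and "sqlite" is kept in the dialect list only so that the subquery branch can be exercised against an in-memory database.

```python
from datetime import datetime, timedelta

from sqlalchemy import Column, DateTime, Integer, String, Text, create_engine, select
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class SerializedDagSketch(Base):
    """Hypothetical, cut-down stand-in for the serialized_dag model."""

    __tablename__ = "serialized_dag_sketch"

    id = Column(Integer, primary_key=True)
    dag_id = Column(String(250), nullable=False)
    created_at = Column(DateTime, nullable=False)
    data = Column(Text)

    @classmethod
    def latest_item_select_object(cls, dag_id, session):
        # The dialect is read from the session's bind instead of a global engine.
        if session.bind.dialect.name in ["sqlite", "mysql"]:
            # Sort only ids in the subquery so large data values are not sorted.
            latest_item_id = (
                select(cls.id)
                .where(cls.dag_id == dag_id)
                .order_by(cls.created_at.desc())
                .limit(1)
                .scalar_subquery()
            )
            return select(cls).where(cls.id == latest_item_id)
        # Other dialects keep the original single-query form.
        return select(cls).where(cls.dag_id == dag_id).order_by(cls.created_at.desc()).limit(1)


engine = create_engine("sqlite://")  # in-memory database, for illustration only
Base.metadata.create_all(engine)

with Session(engine) as session:
    base = datetime(2025, 9, 12)
    # Three made-up versions of the same DAG with increasing timestamps.
    session.add_all(
        SerializedDagSketch(dag_id="my_dag", created_at=base + timedelta(days=i), data=f"v{i}")
        for i in range(3)
    )
    session.commit()

    stmt = SerializedDagSketch.latest_item_select_object("my_dag", session)
    latest_data = session.execute(stmt).scalars().one().data
```

Passing the session keeps the method free of a module-level engine import, at the cost of changing its signature for every caller.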

Apologies for the delay in review @wjddn279

@kaxil kaxil modified the milestones: Airflow 3.1.1, Airflow 3.1.2 Oct 21, 2025
@wjddn279 wjddn279 force-pushed the fix-latest_item_select_object-query branch from 7aea8e0 to 0454907 Compare October 22, 2025 00:35
@wjddn279
Contributor Author

@kaxil
Thanks for the review! I added a check for whether the engine uses the MySQL dialect, as you suggested.
Please take another look :)

@potiuk potiuk modified the milestones: Airflow 3.1.2, Airflow 3.1.1 Oct 22, 2025
@potiuk potiuk added the backport-to-v3-1-test Mark PR with this label to backport to v3-1-test branch label Oct 22, 2025
@potiuk potiuk merged commit 757db27 into apache:main Oct 22, 2025
59 checks passed
@boring-cyborg

boring-cyborg bot commented Oct 22, 2025

Awesome work, congrats on your first merged pull request! You are invited to check our Issue Tracker for additional contributions.

github-actions bot pushed a commit that referenced this pull request Oct 22, 2025
… of sort memory" error (#55589)

* fix get latest serialized_dag model query

* fix get latest serialized_dag model query

* add db type check logic
(cherry picked from commit 757db27)

Co-authored-by: Jeongwoo Do <48639483+wjddn279@users.noreply.github.com>
@github-actions

Backport successfully created: v3-1-test

Status Branch Result
v3-1-test PR Link

@kaxil kaxil modified the milestones: Airflow 3.1.1, Airflow 3.1.2 Oct 22, 2025
kaxil pushed a commit that referenced this pull request Oct 23, 2025
… of sort memory" error (#55589) (#57042)

* [v3-1-test] Fix Outlet Event Extra Data is Empty in Task Instance Success Listener (#54568) (#57031)

Co-authored-by: Kevin Yang <85313829+sjyangkevin@users.noreply.github.com>

* [v3-1-test] fix get latest serialized_dag model query to prevent "Out of sort memory" error (#55589)

* fix get latest serialized_dag model query

* fix get latest serialized_dag model query

* add db type check logic
(cherry picked from commit 757db27)

Co-authored-by: Jeongwoo Do <48639483+wjddn279@users.noreply.github.com>

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Kevin Yang <85313829+sjyangkevin@users.noreply.github.com>
Co-authored-by: Jeongwoo Do <48639483+wjddn279@users.noreply.github.com>
Labels

area:serialization backport-to-v3-1-test Mark PR with this label to backport to v3-1-test branch type:bug-fix Changelog: Bug Fixes
