fix: resolve 404 log error for non-latest task tries in multi-host worker environments #50175
jason810496
left a comment
Nice catch! Thanks for the PR, but I think we can fix this on the API side instead of the file_task_handler side.
c0dc269 to 2d209f9
Thanks a lot for the PR! Looks good. It would be great to include a unit test case where ti is not None but try_number differs from ti.try_number, to cover the additional condition.
jason810496
left a comment
Nice! Looks good to me, but it seems mocking the session in the test has a side effect that leads to a deadlock on SQLite.
pierrejeambrun
left a comment
Nice, looks good overall; just a few tests to fix.
airflow-core/tests/unit/api_fastapi/core_api/routes/public/test_log.py
458c374 to c325ee1
bugraoz93
left a comment
Some of the tests are still failing. Could you please check them?
airflow-core/src/airflow/api_fastapi/core_api/routes/public/log.py
A more general thing about routes and services: There are similar cases where we have methods inside route files. We already have some level of distinction between them. How about moving them to services? What do you think?
Looks good to me to move the query helper function to the Just to double-check, do you mean to move the Since this helper simply wraps a query as a function, and the existing service classes (e.g., for connections, pools, variables) are designed for bulk operations, creating a new class might not be necessary here.
pierrejeambrun
left a comment
Nits regarding discussion, but LGTM.
airflow-core/src/airflow/api_fastapi/core_api/routes/public/log.py
Yes, on a second look I agree: it can be at the root level. In general, it depends on the use case. We aren't doing full object-oriented programming; it is mixed with a functional style, mostly due to the nature of Python. Unless we use service classes as singletons across all routes, a class only adds the overhead of creating an object rather than simply calling a function. I don't have a strong opinion on that, since the entire stack uses a mixed approach, which makes sense in most cases :)
I will take a look at moving other methods from routes to proper places soon.
4dbeea1 to dbdeda7
508d0ea to 75f2296
…rker environments
…om `get_log` endpoint function
Co-authored-by: LIU ZHE YOU <68415893+jason810496@users.noreply.github.com>
75f2296 to 9d338aa
The test seems flaky; please update the branch with the latest main.
Finally, all tests have passed! Thanks @jason810496. I really appreciate your support.
Backport failed to create: v3-0-test.
You can attempt to backport this manually by running: `cherry_picker 0b0ff5d v3-0-test`
This should apply the commit to the v3-0-test branch and leave the commit in conflict state marking
After you have resolved the conflicts, you can continue the backport process by running: `cherry_picker --continue`
…rker environments (apache#50175)
* fix: resolve 404 log error for non-latest task tries in multi-host worker environments
* refactor: extract TaskInstance and TaskInstanceHistory query logic from `get_log` endpoint function
* test: add unit test for `get_task_instance_or_history_for_try_number` function
* fix: resolve sqlite lock error apache#50763
Co-authored-by: LIU ZHE YOU <68415893+jason810496@users.noreply.github.com>
---------
Co-authored-by: LIU ZHE YOU <68415893+jason810496@users.noreply.github.com>
(cherry picked from commit 0b0ff5d)
Manual backport to
…rker environments (#50175) (#50833)
* fix: resolve 404 log error for non-latest task tries in multi-host worker environments
* refactor: extract TaskInstance and TaskInstanceHistory query logic from `get_log` endpoint function
* test: add unit test for `get_task_instance_or_history_for_try_number` function
* fix: resolve sqlite lock error #50763
---------
(cherry picked from commit 0b0ff5d)
Co-authored-by: oboki <oboki@kakao.com>
…rker environments (apache#50175)
* fix: resolve 404 log error for non-latest task tries in multi-host worker environments
* refactor: extract TaskInstance and TaskInstanceHistory query logic from `get_log` endpoint function
* test: add unit test for `get_task_instance_or_history_for_try_number` function
* fix: resolve sqlite lock error apache#50763
Co-authored-by: LIU ZHE YOU <68415893+jason810496@users.noreply.github.com>
---------
Co-authored-by: LIU ZHE YOU <68415893+jason810496@users.noreply.github.com>
…-host worker environments (apache#50175)" This reverts commit 0b0ff5d.
Reverting here #51135
…-host worker environments (apache#50175)" (apache#51135) This reverts commit 0b0ff5d.

In environments with multiple workers (e.g., `CeleryExecutor`), task logs for previous tries (i.e., not the latest `try_number`) may fail to load with a 404 error.

For example, `attempt=5` was actually executed on `worker-1`, but the Web UI incorrectly tries to fetch the logs from `worker-2`.

This happens because `_get_log_retrieval_url` generates the log URL based on `TaskInstance.hostname`, which only stores the hostname of the latest execution attempt. It does not keep track of the history of previous tries.

To fix this, I updated the logic to use the `TaskInstanceHistory` model, just as the "Details" tab does, so the correct hostname is used for each specific `try_number`.

With this change, logs for previous attempts load correctly as expected.
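
The idea behind the fix can be sketched roughly as follows. This is a simplified illustration, not the code merged in this PR: `TryRecord` and `hostname_for_try` are hypothetical names, and plain dataclasses stand in for the `TaskInstance` and `TaskInstanceHistory` rows, so no database or Airflow import is involved.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class TryRecord:
    """Hypothetical stand-in for a TaskInstance or TaskInstanceHistory row."""

    try_number: int
    hostname: str


def hostname_for_try(
    latest: TryRecord,
    history: Sequence[TryRecord],
    try_number: int,
) -> Optional[str]:
    """Return the hostname that ran the requested try, or None if unknown.

    `latest` plays the role of the TaskInstance row (latest try only);
    `history` plays the role of the TaskInstanceHistory rows (earlier tries).
    """
    # The latest try is described by the task instance itself.
    if latest.try_number == try_number:
        return latest.hostname
    # Earlier tries must be looked up in the history records, because the
    # task instance only remembers the host of its most recent attempt.
    for record in history:
        if record.try_number == try_number:
            return record.hostname
    return None


# Example mirroring the scenario above: attempt 5 ran on worker-1,
# while the latest attempt (assumed here to be 6) ran on worker-2.
latest = TryRecord(try_number=6, hostname="worker-2")
history = [TryRecord(try_number=5, hostname="worker-1")]
print(hostname_for_try(latest, history, 5))
```

Before the fix, the equivalent of this lookup always used the hostname of `latest`, which is why requests for an earlier try were sent to the wrong worker.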