Fetch served logs when no remote/executor logs available for non-running task try #39177

Merged
merged 1 commit into from
Apr 25, 2024

kahlstrm
Contributor

@kahlstrm kahlstrm commented Apr 22, 2024

This PR implements #32561 in a different way. This caused a regression for our use case, where non-running task try logs weren't shown in UI for running tasks. This is due to us storing the logs on the worker with a Persistent Volume.

Instead of never fetching the logs from the server for non-running task tries, try to fetch them if and only if there are no remote or executor logs available already.
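A minimal sketch of the fallback rule described above, for a non-running task try. This is not the actual Airflow implementation: `logs_for_finished_try` and the three reader callables are hypothetical names standing in for the real log sources.

```python
from typing import Callable, List

# Hypothetical type: a zero-argument callable returning log lines.
LogReader = Callable[[], List[str]]


def logs_for_finished_try(
    read_remote: LogReader,
    read_executor: LogReader,
    read_served: LogReader,
) -> List[str]:
    """Return log lines for a non-running task try (illustrative only)."""
    remote_logs = read_remote()
    executor_logs = read_executor()
    # Prefer remote/executor logs; do not contact the worker if either exists.
    if remote_logs or executor_logs:
        return remote_logs + executor_logs
    # Nothing found elsewhere: fall back to logs served by the worker.
    return read_served()
```

With this shape, the request to the worker is only made when it is the last remaining source, which is the "if and only if" condition stated above.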



@kahlstrm kahlstrm requested a review from RNHTTR April 24, 2024 12:04
@kahlstrm kahlstrm force-pushed the main branch 3 times, most recently from 368bb21 to 3c32218 Compare April 24, 2024 17:04
@RNHTTR
Contributor

RNHTTR commented Apr 24, 2024

I can't resolve my conversations for some reason... maybe because of some git stuff, as it's only showing one commit? Either way, my comments were addressed.

Member

@potiuk potiuk left a comment

@dstandish ? Can you also take a look maybe?

@potiuk potiuk merged commit eca077b into apache:main Apr 25, 2024
42 checks passed
@eladkal eladkal added the type:bug-fix Changelog: Bug Fixes label Apr 25, 2024
jedcunningham pushed a commit that referenced this pull request Apr 26, 2024
@dstandish
Contributor

@kahlstrm can you clarify what you mean here?

This PR implements #32561 in a different way. This caused a regression for our use case, where non-running task try logs weren't shown in UI for running tasks. This is due to us storing the logs on the worker with a Persistent Volume.

Specifically this part:

This is due to us storing the logs on the worker with a Persistent Volume

What does storing logs on the worker with a PV have to do with anything? If you're storing logs on a PV, shouldn't the webserver have access to it, so it can read the logs directly from the PV?

This PR definitely has introduced a bug, because now users cannot see served logs from triggerer while deferred. But I'm just not sure exactly what functionality here we need to preserve and implement in a different way.

@kahlstrm
Contributor Author

kahlstrm commented Aug 5, 2024

@kahlstrm can you clarify what you mean here?

This PR implements #32561 in a different way. This caused a regression for our use case, where non-running task try logs weren't shown in UI for running tasks. This is due to us storing the logs on the worker with a Persistent Volume.

Specifically this part:

This is due to us storing the logs on the worker with a Persistent Volume

What does storing logs on the worker with a PV have to do with anything? If you're storing logs on a PV, shouldn't the webserver have access to it, so it can read the logs directly from the PV?

This PR definitely has introduced a bug, because now users cannot see served logs from triggerer while deferred. But I'm just not sure exactly what functionality here we need to preserve and implement in a different way.

I'm no longer working with this particular project, but the setup was a PV on the worker that was not mounted on the webserver. When it comes to the bug, I would guess this line change is the culprit for the behavior. The reasoning for that line was to enable fetching served logs for previous task instance attempts when there are no remote logs available, but this then introduced the incorrect behavior for the deferred case.

Does TaskInstanceState.DEFERRED always apply to the latest task instance attempt? If yes, then changing the aforementioned line to the following would perhaps fix this:

if is_in_running_or_deferred and not executor_messages and (not remote_logs or ti.try_number == try_number):

This would effectively make it act the same as prior to this commit but retain the logic of fetching served logs for previous attempts when no remote logs are available.

This increases the cognitive complexity and hurts readability a bit, so refactoring the boolean logic as well would be ideal.
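One way the suggested condition could be split into named steps, as a sketch only. The function name and its parameters are hypothetical; in the real handler these values would come from `ti`, `executor_messages`, and `remote_logs`.

```python
def should_fetch_served_logs(
    is_in_running_or_deferred: bool,
    has_executor_messages: bool,
    has_remote_logs: bool,
    current_try_number: int,
    requested_try_number: int,
) -> bool:
    """Hypothetical helper mirroring the one-line condition above."""
    # Served logs are only considered while the task instance is
    # running or deferred and the executor returned nothing.
    if not is_in_running_or_deferred or has_executor_messages:
        return False
    # The latest attempt is always fetched from the worker or triggerer.
    is_latest_try = current_try_number == requested_try_number
    # Earlier attempts are fetched only when no remote logs exist.
    return is_latest_try or not has_remote_logs
```

The result is logically equivalent to `is_in_running_or_deferred and not executor_messages and (not remote_logs or ti.try_number == try_number)`, with each clause given a name.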

dstandish added a commit to astronomer/airflow that referenced this pull request Aug 5, 2024
… logs

apache#39177 introduced a bug where, if the task was in deferred state, served logs would not be checked.
@dstandish
Contributor

Thanks @kahlstrm, kind of you to follow up. What I'm going with now in #41272 is basically reverting everything -- the fix before yours (that introduced the bug you found), and your two fixes for the bugs introduced by that fix. I am just not sure it's worth the complexity just to avoid an edge case 403 error message that isn't of much consequence.
If someone has time to reintroduce a better approach to suppressing the 403 in that case (e.g. perhaps just suppress the 403) then they can. But for now, I just want to fix the inability to access logs while in deferred state.
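A rough sketch of what "just suppress the 403" could look like, under stated assumptions: `read_served_logs` and the `fetch` callable are hypothetical, and `urllib.error.HTTPError` from the standard library stands in for whatever exception the real HTTP client raises.

```python
from typing import Callable, List
from urllib.error import HTTPError


def read_served_logs(fetch: Callable[[str], str], url: str) -> List[str]:
    """Fetch served logs, treating a 403 as 'no logs' (illustrative only)."""
    try:
        return fetch(url).splitlines()
    except HTTPError as err:
        # Assumption: a 403 here is the inconsequential edge case and
        # can be dropped instead of being shown to the user.
        if err.code == 403:
            return []
        # Any other HTTP error is still surfaced to the caller.
        raise
```

This keeps the decision of when to fetch unchanged and only alters how one specific failure is reported.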

@kahlstrm
Contributor Author

kahlstrm commented Aug 6, 2024

Thanks @kahlstrm, kind of you to follow up. What I'm going with now in #41272 is basically reverting everything -- the fix before yours (that introduced the bug you found), and your two fixes for the bugs introduced by that fix. I am just not sure it's worth the complexity just to avoid an edge case 403 error message that isn't of much consequence. If someone has time to reintroduce a better approach to suppressing the 403 in that case (e.g. perhaps just suppress the 403) then they can. But for now, I just want to fix the inability to access logs while in deferred state.

Sounds good to me 👍 I agree with you that adding this amount of logical complexity just to avoid a single request error is not worth it, but I didn't want to revert the wanted behavior of #32561 immediately myself. As it has now turned out, having this many bugs and this much unwanted behavior come out of such a change is not worth it IMO.

Labels
area:logging type:bug-fix Changelog: Bug Fixes
6 participants