Difference between logs which stores in files and logs in UI #39686
Comments
This happens because of log deduplication, which can occur when logs are streamed from remote logging. See airflow/airflow/utils/log/file_task_handler.py, lines 130 to 140 in 3938f71.
As I understand it, the main problem is log.splitlines(), which splits the log string into plain lines rather than into log messages. The function then analyzes and deduplicates line by line, whereas it should analyze and deduplicate whole log messages.
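To illustrate the point, here is a minimal sketch, not the actual Airflow code. It assumes deduplication simply drops a line that equals the previous one, and that a log message starts with a bracketed timestamp. The `RECORD_START` pattern, the function names, and the sample log are all hypothetical.

```python
import re

# Hypothetical pattern for the first line of a log message.
# Real Airflow log formats may differ.
RECORD_START = re.compile(r"^\[\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}")


def dedupe_lines(log: str) -> list[str]:
    """Drop any line equal to the previous line (line-level dedup)."""
    out = []
    last = None
    for line in log.splitlines():
        if line != last:
            out.append(line)
        last = line
    return out


def split_messages(log: str) -> list[str]:
    """Group lines into messages; a message starts at a timestamped line."""
    messages = []
    for line in log.splitlines():
        if RECORD_START.match(line) or not messages:
            messages.append(line)
        else:
            # Continuation line: keep it inside the current message.
            messages[-1] += "\n" + line
    return messages


def dedupe_messages(log: str) -> list[str]:
    """Drop any message equal to the previous message (message-level dedup)."""
    out = []
    last = None
    for msg in split_messages(log):
        if msg != last:
            out.append(msg)
        last = msg
    return out


sample = (
    "[2024-05-17T10:00:00] INFO - report:\n"
    "row\n"
    "row\n"
    "\n"
    "\n"
    "end\n"
    "[2024-05-17T10:00:01] INFO - done"
)

# Line-level dedup collapses the repeated "row" and the repeated empty line.
print(dedupe_lines(sample))
# Message-level dedup keeps the multi-line message intact.
print("\n".join(dedupe_messages(sample)) == sample)
```

With line-level deduplication the repeated lines and the second empty line inside one multi-line message disappear, while message-level deduplication leaves the body of each message untouched.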
If you have a suggestion for how to improve logging, feel free to raise a PR that works with every existing type of logger without introducing breaking changes.
I have run into the same problem several times, for example when using DockerOperator, which logs all of the container's stdout upon failure. The logs are also mangled not only because of the split lines but also because the file log handler tries to sort messages by default. This not only causes a lot of overhead on the server, it also changes the order and creates confusion. I'm looking forward to somebody raising a PR that allows log sorting and merging to be turned off :-)
Apache Airflow version
2.9.1
If "Other Airflow 2 version" selected, which one?
No response
What happened?
We print some information to the log and noticed a difference between the logs stored in files and the logs shown in the UI (including logs downloaded from the UI). The difference is in empty lines, which we need in some cases.
stored in file -- real log

downloaded from UI
Web UI:
What you think should happen instead?
I think they should contain the same information that was logged and saved.
How to reproduce
Operating System
debian / docker
Versions of Apache Airflow Providers
No response
Deployment
Docker-Compose
Deployment details
standard apache/airflow:2.9.1
Anything else?
No response
Are you willing to submit PR?
Code of Conduct