[DSIP-18][Remote Logging] Add support for writing task logs to remote storage #13017
Comments
Good idea! One humble suggestion: different log levels contribute differently to failure diagnosis, so higher-level logs may be worth keeping longer.
Hi @Radeity, thanks a lot for your comment and suggestion.
Hi @rickchengx, I agree with you that we should keep local and remote logs consistent. My central idea is to retain the more useful logs after cleanup, which could differentiate DS from Airflow. However, logs are widely used in practice for various maintenance activities, such as testing, failure diagnosis, and program comprehension, so my suggestion above to keep higher-level logs may be immature. Anyway, thanks for your reply; I hope my suggestion offers some inspiration!
Good idea!
Here are some discussions from the weekly meeting:
Q1: In the k8s environment, users can choose to mount persistent volumes (e.g., OSS) to synchronize task logs to remote storage.
Q2: Users can configure whether to use remote storage for task logs.
Q3: The master-server also has task logs, which need to be uploaded to remote storage in a unified manner.
Q4: Is it possible to set the task log retention policy through the configuration supported by the remote storage itself?
Thanks again for all the suggestions at the weekly meeting, please correct me if I'm wrong.
This seems to be finished already.
Search before asking
Description
Why remote logging?
Feature Design
Connect to different remote targets
DS should support a variety of common remote storage services and remain extensible enough to plug in other storage types.
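One way to get that extensibility is a small handler interface implemented once per storage backend. Below is a minimal sketch; the names (RemoteLogHandler, sendRemoteLog, getRemoteLog) are hypothetical illustrations, not an existing DS API.

```java
// Hypothetical sketch of a pluggable remote log target.
// Each backend (e.g., the OSS target mentioned above) would ship its own
// implementation, so adding a new storage type means adding one class.
public interface RemoteLogHandler {

    // Upload a finished task's local log file to remote storage.
    void sendRemoteLog(String logPath);

    // Download the remote copy of a task log to the given local path.
    void getRemoteLog(String logPath);
}
```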
When to write logs to remote storage
Like Airflow, DS writes the task logs to remote storage after the task completes (whether it succeeds or fails).
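As a sketch of that timing, assuming the hypothetical RemoteLogHandler above, the worker could push the log once the task reaches a terminal state. The class and method names here are invented for illustration.

```java
// Illustrative sketch (not the actual DS implementation): after a task
// finishes, the worker pushes its log file to remote storage.
public final class TaskCompletionLogUploader {

    private final RemoteLogHandler remoteLogHandler; // hypothetical interface above

    public TaskCompletionLogUploader(RemoteLogHandler remoteLogHandler) {
        this.remoteLogHandler = remoteLogHandler;
    }

    // Called once when the task finishes, regardless of success or failure,
    // so the remote copy always reflects the complete local log.
    public void onTaskFinished(String taskLogPath) {
        try {
            remoteLogHandler.sendRemoteLog(taskLogPath);
        } catch (Exception e) {
            // An upload failure must not fail the task itself; record and move on.
            System.err.println("Failed to upload task log " + taskLogPath + ": " + e.getMessage());
        }
    }
}
```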
How to read logs
Since the task log is stored both on the worker's local disk and in remote storage, the api-server needs a reading strategy when it reads the log of a given task instance. Airflow first tries to read the logs stored remotely and falls back to the local logs if that fails. I prefer the opposite: try the local log first, and read the remote log only if the local log file does not exist, because this reduces network bandwidth consumption.
We could discuss this further.
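A minimal sketch of that local-first strategy, again using the hypothetical RemoteLogHandler (download-to-local semantics assumed):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Sketch of the proposed read strategy: local first, remote as fallback.
public final class TaskLogReader {

    private final RemoteLogHandler remoteLogHandler;

    public TaskLogReader(RemoteLogHandler remoteLogHandler) {
        this.remoteLogHandler = remoteLogHandler;
    }

    public String readTaskLog(String logPath) throws IOException {
        Path local = Paths.get(logPath);
        // Prefer the local file: no network round trip if the worker still has it.
        if (!Files.exists(local)) {
            // Fall back to remote storage, e.g., after local logs were cleaned up.
            remoteLogHandler.getRemoteLog(logPath);
        }
        return new String(Files.readAllBytes(local));
    }
}
```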
Log retention strategy
For example, a maximum capacity for remote log storage could be configured, with old logs deleted on a rolling basis.
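A sketch of such a rolling policy under an assumed capacity cap; RemoteLogEntry and its listing/deletion hooks are invented for illustration (per Q4 above, this could also be delegated to the storage backend's own retention configuration).

```java
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of a rolling retention policy: when the total size of
// remote logs exceeds a configured cap, delete the oldest entries first.
public final class RollingLogRetention {

    private final long maxTotalBytes;

    public RollingLogRetention(long maxTotalBytes) {
        this.maxTotalBytes = maxTotalBytes;
    }

    public void enforce(List<RemoteLogEntry> entries) {
        long total = entries.stream().mapToLong(RemoteLogEntry::sizeBytes).sum();
        // Oldest first, so the most recent logs survive the longest.
        entries.sort(Comparator.comparing(RemoteLogEntry::lastModified));
        for (RemoteLogEntry entry : entries) {
            if (total <= maxTotalBytes) {
                break;
            }
            entry.delete(); // e.g., delegate to the storage backend's delete API
            total -= entry.sizeBytes();
        }
    }

    // Minimal shape of a remote log object for this sketch.
    public interface RemoteLogEntry {
        long sizeBytes();
        java.time.Instant lastModified();
        void delete();
    }
}
```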
Sub-tasks
Ref
Any comments or suggestions are welcome.
Use case
Discussed above.
Related issues
Are you willing to submit a PR?
Code of Conduct