
POC / WIP testing out using logfmt for more structured logging #27787

Closed · wants to merge 2 commits

Conversation

@dstandish (Contributor) commented Nov 18, 2022

Just starting the exploration process. I think structured logging could unlock a lot of value. CC @ashb @potiuk @jedcunningham @uranusjr

Here I'm solely changing the formatter class.

It's possible we could use a more elaborate solution like https://structlog.org/

There's also the question of whether to use logfmt key-value style or JSON.
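For reference, a minimal logfmt formatter can be built on the stdlib `logging` module alone. This is a sketch to illustrate the output shape, not the formatter used in this PR; the class and logger names are made up:

```python
import logging
import re


class LogfmtFormatter(logging.Formatter):
    """Render records as logfmt `at=... when=... msg=...` lines (sketch)."""

    def format(self, record: logging.LogRecord) -> str:
        # Flatten multi-line messages so each record stays on one line.
        msg = record.getMessage().replace("\n", "\\n")
        # Quote values containing whitespace, '=', or '"' so logfmt
        # parsers can round-trip them.
        if re.search(r'[\s="]', msg):
            msg = '"' + msg.replace('"', '\\"') + '"'
        when = self.formatTime(record, "%Y-%m-%dT%H:%M:%S")
        return f"at={record.levelname} when={when} msg={msg}"


handler = logging.StreamHandler()
handler.setFormatter(LogfmtFormatter())
log = logging.getLogger("logfmt-demo")
log.addHandler(handler)
log.setLevel(logging.INFO)
log.info("Starting attempt 1 of 1")
```

A JSON variant would only differ in the `format` method (e.g. `json.dumps` over the same fields), which is part of why the choice between the two is mostly a question of downstream tooling.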

Sample after checking out this PR.

I explored a few different things...

First, this is what happens to task logs when we just enable the logfmt formatter and don't otherwise change the code:

at=INFO when=2022-11-18T15:09:52.696-0800 msg="Dependencies all met for <TaskInstance: example_bash_operator.also_run_this abc123-3 [queued]>"
at=INFO when=2022-11-18T15:09:52.714-0800 msg="Dependencies all met for <TaskInstance: example_bash_operator.also_run_this abc123-3 [queued]>"
at=INFO when=2022-11-18T15:09:52.715-0800 msg=\n--------------------------------------------------------------------------------
at=INFO when=2022-11-18T15:09:52.715-0800 msg="Starting attempt 1 of 1"
at=INFO when=2022-11-18T15:09:52.716-0800 msg=\n--------------------------------------------------------------------------------
at=INFO when=2022-11-18T15:09:52.741-0800 msg="Executing <Task(BashOperator): also_run_this> on 2022-11-18 23:06:50+00:00"
at=INFO when=2022-11-18T15:09:52.745-0800 msg="Started process 90021 to run task"
at=INFO when=2022-11-18T15:09:52.752-0800 msg="Running: ['airflow', 'tasks', 'run', 'example_bash_operator', 'also_run_this', 'abc123-3', '--job-id', '38', '--raw', '--subdir', '/Users/dstandish/code/airflow/airflow/example_dags/example_bash_operator.py', '--cfg-path', '/var/folders/7_/1xx0hqcs3txd7kqt0ngfdjth0000gn/T/tmp24s66bc9']"
at=INFO when=2022-11-18T15:09:52.755-0800 msg="Job 38: Subtask also_run_this"
at=INFO when=2022-11-18T15:09:52.857-0800 msg="Running <TaskInstance: example_bash_operator.also_run_this abc123-3 [running]> on host Daniels-MacBook-Pro-2.local"
at=INFO when=2022-11-18T15:09:52.955-0800 msg="Exporting the following env vars:\nAIRFLOW_CTX_DAG_OWNER=airflow\nAIRFLOW_CTX_DAG_ID=example_bash_operator\nAIRFLOW_CTX_TASK_ID=also_run_this\nAIRFLOW_CTX_EXECUTION_DATE=2022-11-18T23:06:50+00:00\nAIRFLOW_CTX_TRY_NUMBER=1\nAIRFLOW_CTX_DAG_RUN_ID=abc123-3"
at=INFO when=2022-11-18T15:09:52.957-0800 msg="Tmp dir root location: \n /var/folders/7_/1xx0hqcs3txd7kqt0ngfdjth0000gn/T"
at=INFO when=2022-11-18T15:09:52.958-0800 msg="Running command: ['/usr/local/bin/bash', '-c', 'echo \"ti_key=example_bash_operator__also_run_this__20221118\"']"
at=INFO when=2022-11-18T15:09:52.966-0800 msg=Output:
at=INFO when=2022-11-18T15:09:52.974-0800 msg="ti_key=example_bash_operator__also_run_this__20221118"
at=INFO when=2022-11-18T15:09:52.975-0800 msg="Command exited with return code 0"
at=INFO when=2022-11-18T15:09:53.014-0800 msg="Marking task as SUCCESS. dag_id=example_bash_operator, task_id=also_run_this, execution_date=20221118T230650, start_date=20221118T230952, end_date=20221118T230953"
at=INFO when=2022-11-18T15:09:53.044-0800 msg="Task exited with return code 0"
at=INFO when=2022-11-18T15:09:53.073-0800 msg="0 downstream tasks scheduled from follow-on schedule check"

But we can improve this a little by making the log calls themselves more structured. See commit #2 for what changed. Here's the resulting output.

at=INFO when=2022-11-18T16:17:27.726-0800 event="dependencies all met" task="<TaskInstance: example_bash_operator.also_run_this abc123-5 [queued]>"
at=INFO when=2022-11-18T16:17:27.748-0800 event="dependencies all met" task="<TaskInstance: example_bash_operator.also_run_this abc123-5 [queued]>"
at=INFO when=2022-11-18T16:17:27.749-0800 event="starting attempt" try_number=1 max_tries=1
at=INFO when=2022-11-18T16:17:27.775-0800 event="proceeding with task execution" task="<Task(BashOperator): also_run_this>" execution_date="2022-11-19 00:16:56+00:00"
at=INFO when=2022-11-18T16:17:27.779-0800 msg="Started process 97845 to run task"
at=INFO when=2022-11-18T16:17:27.786-0800 event="running task" command="['airflow', 'tasks', 'run', 'example_bash_operator', 'also_run_this', 'abc123-5', '--job-id', '56', '--raw', '--subdir', '/Users/dstandish/code/airflow/airflow/example_dags/example_bash_operator.py', '--cfg-path', '/var/folders/7_/1xx0hqcs3txd7kqt0ngfdjth0000gn/T/tmplustcnhv']" job=56 subtask=also_run_this
at=INFO when=2022-11-18T16:17:27.900-0800 event="running task" task_instance="<TaskInstance: example_bash_operator.also_run_this abc123-5 [running]>" hostname=Daniels-MacBook-Pro-2.local
at=INFO when=2022-11-18T16:17:27.999-0800 exporting_env_vars="{'AIRFLOW_CTX_DAG_OWNER': 'airflow', 'AIRFLOW_CTX_DAG_ID': 'example_bash_operator', 'AIRFLOW_CTX_TASK_ID': 'also_run_this', 'AIRFLOW_CTX_EXECUTION_DATE': '2022-11-19T00:16:56+00:00', 'AIRFLOW_CTX_TRY_NUMBER': '1', 'AIRFLOW_CTX_DAG_RUN_ID': 'abc123-5'}"
at=INFO when=2022-11-18T16:17:28.001-0800 temp_dir_root=/var/folders/7_/1xx0hqcs3txd7kqt0ngfdjth0000gn/T
at=INFO when=2022-11-18T16:17:28.003-0800 running_command="['/usr/local/bin/bash', '-c', 'echo \"ti_key=example_bash_operator__also_run_this__20221119\"']"
at=INFO when=2022-11-18T16:17:28.019-0800 process_output="ti_key=example_bash_operator__also_run_this__20221119"
at=INFO when=2022-11-18T16:17:28.021-0800 event="command exited" return_code=0
at=INFO when=2022-11-18T16:17:28.067-0800 state=SUCCESS execution_date=20221119T001656 start_date=20221119T001727 end_date=20221119T001728 dag_id=example_bash_operator task_id=also_run_this
at=INFO when=2022-11-18T16:17:28.120-0800 event="task exited" return_code=0
at=INFO when=2022-11-18T16:17:28.149-0800 event="follow-on schedule check" downstream_tasks_scheduled=0

Notice that things like try_number and return_code are represented as key-value pairs instead of English prose. I think this is at least as readable.

Note also that we can split out a field like "event", a short description of what happened, with the additional details presented as key-value pairs.
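With the stdlib alone, one way to get this event-plus-details shape is to pass the details via `extra=` and have the formatter merge any non-standard record attributes into the tail. A sketch under those assumptions (not the code in commit #2; names are hypothetical):

```python
import logging

# Attributes every LogRecord has; anything beyond these arrived via extra=.
_STANDARD_ATTRS = set(vars(logging.LogRecord("x", 0, "x", 0, "", None, None)))


class StructuredFormatter(logging.Formatter):
    """Emit event="..." followed by any extra key=value pairs (sketch)."""

    def format(self, record: logging.LogRecord) -> str:
        pairs = [f'event="{record.getMessage()}"']
        for key, value in record.__dict__.items():
            if key not in _STANDARD_ATTRS:
                pairs.append(f"{key}={value}")
        return f"at={record.levelname} " + " ".join(pairs)


log = logging.getLogger("structured-demo")
handler = logging.StreamHandler()
handler.setFormatter(StructuredFormatter())
log.addHandler(handler)
log.setLevel(logging.INFO)

# emits something like: at=INFO event="command exited" return_code=0
log.info("command exited", extra={"return_code": 0})
```

The nice property is that existing call sites keep working unchanged (their message just becomes the `event`), while call sites can opt in to structure incrementally by moving details into `extra=`.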

@dstandish (Contributor, Author)

I suppose another thing we could consider is keeping the existing format for the first part of the line and always making the rest structured.

i.e. keep the prefix [2022-10-02T07:41:59.425-0700] {taskinstance.py:1357} INFO - but then, over time, make everything that comes after it structured.
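That hybrid idea could look roughly like this with the stdlib: keep the familiar `[timestamp] {file:line} LEVEL - ` prefix and append any structured extras as key=value pairs. Again a sketch, not code from this PR:

```python
import logging

# Attributes every LogRecord has; anything beyond these arrived via extra=.
_STANDARD_ATTRS = set(vars(logging.LogRecord("x", 0, "x", 0, "", None, None)))


class HybridFormatter(logging.Formatter):
    """Classic Airflow-style prefix, structured key=value tail (sketch)."""

    def format(self, record: logging.LogRecord) -> str:
        prefix = (
            f"[{self.formatTime(record, '%Y-%m-%dT%H:%M:%S')}] "
            f"{{{record.filename}:{record.lineno}}} {record.levelname} - "
        )
        tail = " ".join(
            f"{key}={value}"
            for key, value in record.__dict__.items()
            if key not in _STANDARD_ATTRS
        )
        return prefix + record.getMessage() + (f" {tail}" if tail else "")
```

This keeps existing log-parsing tooling (and the UI's expectations about the prefix) intact while letting the structured tail grow over time.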

@potiuk (Member) commented Nov 25, 2022

I think we do not have anything else to do but simply implement OpenTelemetry.

It's already been discussed and approved; we just need to implement it: https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-49+OpenTelemetry+Support+for+Apache+Airflow

In the form defined and researched by @howardyoo, this is the best way we can implement "smarts" in the telemetry of Airflow.

@dstandish (Contributor, Author)

> I think we do not have anything else to do but simply implement OpenTelemetry.
>
> It's already been discussed and approved; we just need to implement it: https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-49+OpenTelemetry+Support+for+Apache+Airflow
>
> In the form defined and researched by @howardyoo, this is the best way we can implement "smarts" in the telemetry of Airflow.

I looked at the AIP, but it looks like logging was deliberately excluded from it:

> While Metrics seems to be ready to integrate within the scope of this AIP and Open-Telemetry integrations (no surprises foreseen), we believe Log integration requires more work and POC that should be accompanied with an approach to rewrite and simplify logging integration as Airflow logging is particularly vulnerable to disruption and it is integrated with Airflow UI in a subtle way. We propose to exclude Logging from the scope of the AIP (and implement it as a follow-up AIP).

@potiuk (Member) commented Nov 28, 2022

> I looked at the AIP, but it looks like logging was deliberately excluded from it:

Rather, it was deferred until it's ready. The AIP was approved a few months ago, and it will likely take a few months before someone starts working on it. In the meantime, OpenTelemetry's logging has evolved. I suggest you check the status of logging there. The direction is clear.

@howardyoo (Contributor) commented Nov 28, 2022 via email

@github-actions (bot)

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.

@github-actions bot added the "stale" label (Stale PRs per the .github/workflows/stale.yml policy file) on Jan 13, 2023
@github-actions bot closed this on Jan 19, 2023
Labels: area:logging, stale
3 participants