Fix the write-to-es feature #53821
base: main
Conversation
jason810496
left a comment
Nice! Big thanks for helping out!
Only some backward-compatibility issues need to be fixed, and a unit test for the new ElasticsearchRemoteLogIO is required.
LGTM overall.
    base_log_folder: Path = attrs.field(converter=Path)
    delete_local_copy: bool

    processors = ()
This should probably be set up as a processor so that log messages get sent to ElasticSearch as soon as they are written, rather than just once when the log is "closed". See the CloudWatch logging handler for the only example we have right now.
That can be a future PR though.
@ashb I took a quick look at how we'd make this a processor in the future, and I got stuck on the fact that while the upload() method takes the ti as an argument, the process function receives only the event_dict, yet we need the ti to construct the log_id that has to be on every log event indexed to ES/OS.
Can we expect each log event from the task logger to contain the ti or otherwise give us the log_id components? (Or is that a necessary future change?)
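As a rough illustration of the mismatch described above (a sketch only, not the provider's actual code; the class stub and field names are assumptions):

```python
from typing import Any


class ElasticsearchRemoteLogIO:  # illustrative stub, not the class added in this PR
    def upload(self, path: str, ti: Any) -> None:
        # The TaskInstance is passed in, so dag_id / task_id / run_id /
        # map_index / try_number are all available to build the log_id
        # before the documents are indexed into ES/OS.
        ...

    def __call__(self, logger: Any, method_name: str, event_dict: dict) -> dict:
        # A structlog-style processor only sees the event_dict. Unless the
        # task-instance identity is injected into every event, the log_id
        # cannot be reconstructed at this point.
        return event_dict
```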
Sorry for the late reply. Here are some updates that might answer your question:
The LogTemplate DB model will be removed for ES, which also means "we need the ti to construct the log_id which needs to be on every log event that gets indexed to ES/OS" will no longer apply. We will retrieve log_id_template directly from airflow.conf instead of fetching it from the LogTemplate DB model.
Details in https://lists.apache.org/thread/nlmhs1plo77qnlp7rqk27mkb2hs41f1p
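A minimal sketch of what that could look like, assuming the existing [elasticsearch] log_id_template option (the placeholder fields match the chart default discussed further down):

```python
from airflow.configuration import conf

# Read the template straight from the Airflow config instead of the
# LogTemplate DB model.
log_id_template = conf.get("elasticsearch", "log_id_template")

# Render the log_id for a given task instance (example values only).
log_id = log_id_template.format(
    dag_id="example_dag",
    task_id="example_task",
    run_id="manual__2025-01-01T00:00:00+00:00",
    map_index=-1,
    try_number=1,
)
```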
ashb
left a comment
cc @potiuk, re a recent Slack discussion we had: this is a prime example of why I'm sad we have separate providers for OpenSearch and ElasticSearch. What we have to do in one we should really do in both.
…rite on RemoteIO class
@jason810496 @ashb I added a unit test using
jason810496
left a comment
Nice! Thanks for the update!
It would be great to verify whether the TaskInstanceLog UI page displays correctly with this change. If so, this PR should be good to go. Thank you!
No worries, I think I found it. Let me expand the testing logic.
Sorry for the late reply. The following command will be helpful to set up Airflow with Elasticsearch in Breeze:

    breeze start-airflow --python 3.10 --backend postgres --integration elasticsearch --mount-sources providers-and-tests --use-airflow-version <version>

It would be great to test with the following version matrix for compatibility.
Thanks a lot!
jason810496
left a comment
Nice! The integration tests with test container LGTM.
It would be nice to test the change with the start-airflow command against the different Airflow core versions mentioned above, to verify the user-facing behavior. Thanks!
@jason810496 Thanks. I can confirm that the write-to-es feature works for version

@jhgoebbert Sorry for my late reply! I've adopted your suggestion to capture the
@Owen-CH-Leung can you rebase and resolve conflicts?
      elasticsearch:
        json_format: 'True'
    -   log_id_template: "{dag_id}_{task_id}_{execution_date}_{try_number}"
    +   log_id_template: "{dag_id}-{task_id}-{run_id}-{map_index}-{try_number}"
Is this a breaking change for the chart?
cc @jedcunningham
Yes, but we do not have execution_date anymore.
So maybe we need to make the default depend on Airflow 2 vs 3, as a string?
Can someone refresh my memory - does this config item control writing, reading, or both?
IOW, is there a risk here that if the chart default changes, or changes based on 2.x vs 3.x, valid logs written from Airflow 2.x tasks won't be readable when a stack goes to Airflow 3.x?
(I think I recall that the task run table has a column for its log_id template, so the correct log_id can always be reconstructed for any task run... however I also remember somebody mentioning this column might be removed? I just don't remember the exact details here.)
Anyway, earlier logs vanishing in the UI because of changing chart defaults would be an uncomfortable surprise 😅
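For reference, a sketch of the per-run lookup being recalled here, assuming the Airflow 2.x schema where dag_run.log_template_id references the log_template table (which is exactly the piece this thread says may be going away):

```python
from airflow.models.dagrun import DagRun
from airflow.models.tasklog import LogTemplate
from airflow.utils.session import create_session


def log_id_template_for_run(dag_id: str, run_id: str) -> str:
    """Reconstruct the ES log_id template that was in effect when a run was created."""
    with create_session() as session:
        dag_run = (
            session.query(DagRun)
            .filter(DagRun.dag_id == dag_id, DagRun.run_id == run_id)
            .one()
        )
        # Each DagRun points at the LogTemplate row that was current at creation
        # time, so older logs can still be located after the default changes.
        template = session.get(LogTemplate, dag_run.log_template_id)
        return template.elasticsearch_id
```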
Rebased. There are errors at main though for
@Owen-CH-Leung et al, any reason we can't get this rebased and ready for merge soon? My organization (and many others, I expect) has been blocked from going to Airflow 3 since last summer because of ES logging; this PR should unblock us, so we are very excited to see it released 🤗
Fixes: #50349
Fixes: #51456
ElasticsearchTaskHandler has a feature that allows task logs to be written directly to Elasticsearch. This feature is broken in Airflow 3. The root cause is that in Airflow 3, the write path for remote logging has changed. There's a detailed description here by @jason810496 which illustrates how it works in Airflow 3. (Thank you, Jason!)
In summary, the solution is to add a RemoteLogIO for Elasticsearch as well. When write_to_es is set to true, Airflow will initialize an ElasticsearchRemoteLogIO which will handle writing the task log to Elasticsearch.
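For context, a minimal sketch of how enabling the feature looks from the configuration side; the option names come from the provider's [elasticsearch] config section, while the selection logic itself is an assumption rather than the PR's actual code:

```python
from airflow.configuration import conf

# write_to_es enables direct task-log writes to Elasticsearch;
# json_format is typically enabled alongside it.
write_to_es = conf.getboolean("elasticsearch", "write_to_es", fallback=False)

if write_to_es:
    # The handler would construct the new ElasticsearchRemoteLogIO and route
    # task-log writes through it, instead of only reading logs back from ES.
    pass
```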