Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Databricks Provider _get_databricks_task_id only cleanses task id #44250

Open
2 tasks done
mwoods-familiaris opened this issue Nov 21, 2024 · 2 comments · May be fixed by #43106
Open
2 tasks done

Databricks Provider _get_databricks_task_id only cleanses task id #44250

mwoods-familiaris opened this issue Nov 21, 2024 · 2 comments · May be fixed by #43106
Assignees

Comments

@mwoods-familiaris
Copy link

Apache Airflow Provider(s)

databricks

Versions of Apache Airflow Providers

apache-airflow-providers-databricks==6.13.*

Apache Airflow version

2.10.2

Operating System

Debian GNU/Linux 12 (bookworm)

Deployment

Astronomer

Deployment details

No response

What happened

_get_databricks_task_id only cleanses the task id, ref:

return f"{task.dag_id}__{task.task_id.replace('.', '__')}"

task_id = f"{self.dag_id}__{task_id.replace('.', '__')}"

However, the dag_id may also contain . - so the replacement of . with __ should be applied to the whole string, not just the task id portion, else periods placed in the dag name results in errors such as:

[2024-11-21, 13:12:42 GMT] {taskinstance.py:3310} ERROR - Task failed with exception
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/airflow/models/taskinstance.py", line 767, in _execute_task
    result = _execute_callable(context=context, **execute_callable_kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/airflow/models/taskinstance.py", line 733, in _execute_callable
    return ExecutionCallableRunner(
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/airflow/utils/operator_helpers.py", line 252, in run
    return self.func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/airflow/models/baseoperator.py", line 406, in wrapper
    return func(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/airflow/providers/databricks/operators/databricks.py", line 1252, in execute
    self.monitor_databricks_job()
  File "/usr/local/lib/python3.11/site-packages/airflow/providers/databricks/operators/databricks.py", line 1203, in monitor_databricks_job
    current_task_run_id = self._get_current_databricks_task()["run_id"]
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/airflow/providers/databricks/operators/databricks.py", line 1165, in _get_current_databricks_task
    return {task["task_key"]: task for task in sorted_task_runs}[
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
KeyError: 'my.airflow.dag.with.periods__my_airflow_task'

(as the invalid chars are getting silently stripped by databricks, so the task key on the databricks side is myairflowdagwithperiods__my_airflow_task rather than my.airflow.dag.with.periods__my_airflow_task)

What you think should happen instead

The replacement of . with __ should be applied to the whole task key / run name string, not just the task id portion

How to reproduce

Use the affected operator(s) e.g. DatabricksNotebookOperator on a DAG which contains . in the dag_id

Anything else

Every time

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@mwoods-familiaris mwoods-familiaris added area:providers kind:bug This is a clearly a bug needs-triage label for new issues that we didn't triage yet labels Nov 21, 2024
Copy link

boring-cyborg bot commented Nov 21, 2024

Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise PR to address this issue please do so, no need to wait for approval.

@rawwar
Copy link
Collaborator

rawwar commented Nov 22, 2024

PR: #43106 will also fix this issue

@rawwar rawwar self-assigned this Nov 22, 2024
@rawwar rawwar removed the needs-triage label for new issues that we didn't triage yet label Nov 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants