Fix support for macros with dots in DataProcJobBuilder #28970
Do not sanitize job name in DataProcJobBuilder because it can use Jinja macros. Move sanitization to DataprocJobBaseOperator default value for job_name parameter to keep supporting task groups. Fixes apache#28810
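To make the problem concrete, here is a minimal sketch of why dot substitution and Jinja macros conflict. The `sanitize` helper is a hypothetical stand-in for the replacement discussed in this PR, not the actual Airflow code.

```python
def sanitize(name: str) -> str:
    """Hypothetical stand-in for the dot substitution (assumed to be '.' -> '_')."""
    return name.replace(".", "_")


# A plain task-group style name is fixed up as intended.
plain = sanitize("group.name")

# A template string is damaged: the attribute lookup `task.task_id`
# turns into the unknown variable `task_task_id` before rendering happens.
templated = sanitize("{{ task.task_id }}")

print(plain)      # group_name
print(templated)  # {{ task_task_id }}
```

Because the substitution runs on the unrendered string, it cannot tell a dot in a task-group name apart from a dot in a template expression.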
f0ccecd to b6bb0e9
@pytest.mark.parametrize(
    "job_name",
    [
        pytest.param("name", id="simple"),
        pytest.param("name_with_dash", id="name with underscores"),
        pytest.param("group.name", id="name with dot"),
        pytest.param("group.name_with_dash", id="name with dot and underscores"),
    ],
)
Why do we need to remove these test cases?
They were added together with the sanitization (dot substitution) in DataProcJobBuilder. Now that I have removed sanitization from that class, all test cases with dots would fail. I just reverted this test to its state before the changes in DataProcJobBuilder and added a test that checks that the proper job_id is passed from the operator instead.
These were added to avoid a regression of #23439.
How can you be sure job_name works with these values if there is no coverage for these cases?
@eladkal I moved the replacement to the DataprocJobBaseOperator default value for job_name and added another test that checks that the job_name generated in DataprocJobBaseOperator has dots replaced by _, so that issue should still be fixed. That issue explicitly says that it is the default job name that is broken:
DataprocJobBaseOperator have default of using task_id for job name
And that's exactly what this PR fixes, while letting people who use DataProcJobBuilder directly pass any job name they want, including templates containing dots.
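A rough sketch of the split described above, using simplified stand-in classes (`BuilderSketch` and `OperatorSketch` are hypothetical names, and the default shown here is an assumption rather than the exact value used in the PR): the builder passes names through untouched, and only the operator's default job name has dots replaced.

```python
from __future__ import annotations


class BuilderSketch:
    """Stand-in for the builder: stores the job name exactly as given."""

    def __init__(self) -> None:
        self.job_id = ""

    def set_job_name(self, name: str) -> None:
        # No sanitization here, so templates containing dots survive.
        self.job_id = name


class OperatorSketch:
    """Stand-in for the base operator: sanitizes only its own default."""

    def __init__(self, task_id: str, job_name: str | None = None) -> None:
        self.task_id = task_id
        # Assumed default: derive the name from task_id, replacing dots
        # so that tasks inside task groups ("group.name") still work.
        self.job_name = job_name if job_name is not None else task_id.replace(".", "_")

    def build_job_id(self) -> str:
        builder = BuilderSketch()
        builder.set_job_name(self.job_name)
        return builder.job_id


print(OperatorSketch("group.name").build_job_id())  # group_name
print(OperatorSketch("t", job_name="{{ task.task_id }}").build_job_id())  # {{ task.task_id }}
```

With this arrangement a test for the task-group regression belongs on the operator, since the builder no longer changes names.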
@Taragolis @eladkal Please review my change. Please note that this change makes migration from old job operators like
Of course I can duplicate task ID, but I shouldn't have to considering that templates work in
I tried to understand the initial issue #28810.
Is this PR meant to allow something like this?
DataProcJobBuilder(
    task_id="{{ (dag.dag_id + '-' + task.task_id.replace('_', '-'))[:90] }}"
    ...
)
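For readers unfamiliar with the expression inside that template, the following sketch evaluates the same logic in plain Python. The function name `rendered_job_name` and the sample IDs are made up for illustration; no Jinja rendering is involved here.

```python
def rendered_job_name(dag_id: str, task_id: str) -> str:
    """Mirror of the template expression: join with '-', swap '_' for '-'
    in the task ID only, then truncate to 90 characters."""
    return (dag_id + "-" + task_id.replace("_", "-"))[:90]


print(rendered_job_name("etl", "run_spark_job"))  # etl-run-spark-job
```

Note that the expression contains dots (`dag.dag_id`, `task.task_id`, `.replace`), which is why replacing dots in the unrendered string breaks it.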
@Taragolis yes, that's exactly what I am trying to do. It worked for a long time and regressed after #23791.
Sorry for the late response. These changes are still unclear to me. I had a look at
@Taragolis I am aware of those things. In our case we use non-deprecated
I guess you found an undocumented feature, and like many undocumented features it might stop working at any moment.
airflow/airflow/providers/google/cloud/operators/dataproc.py Lines 1909 to 1911 in 1e7c064
I think this is a really nice feature to have: using the DataProcJobBuilder instead of a "raw" dictionary. I believe that was the original intention of the builder, and while it is used like that internally by the "specific" operators, there should generally be no problem with it. So I would see that as a nice new "feature" to add. Yes, it worked to some extent before sanitization, but that was more of an accident than intention, and sanitization indeed broke it. Why don't we turn it into a "real" feature:
I believe (correct me if I am wrong) we can sanitize the job name here:
Is there any reason we cannot move the sanitization to inside the
This approach would also have the advantage that even job names passed by hand would be sanitized and "." removed.
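A minimal sketch of the ordering this suggestion seems to rely on, assuming the idea is to sanitize only after templates have been rendered (the exact location is cut off in the comment above). Both `render` and `sanitize` are toy stand-ins, not Airflow or Jinja APIs.

```python
from __future__ import annotations


def render(template: str, context: dict[str, str]) -> str:
    """Toy renderer: substitutes literal '{{ key }}' markers only."""
    for key, value in context.items():
        template = template.replace("{{ " + key + " }}", value)
    return template


def sanitize(name: str) -> str:
    """Assumed sanitization: replace dots with underscores."""
    return name.replace(".", "_")


def job_name_at_execution(raw_name: str, context: dict[str, str]) -> str:
    # Render first so template dots are consumed, then sanitize the result.
    return sanitize(render(raw_name, context))


print(job_name_at_execution("{{ task.task_id }}", {"task.task_id": "group.name"}))  # group_name
print(job_name_at_execution("by.hand", {}))  # by_hand
```

Whether this ordering is achievable in the existing operators is exactly what the next reply disputes.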
Nah, it wouldn't work in the current implementation, moreover if
So I would suggest removing everything related to deprecated stuff in the next major version, rather than trying to fix/adapt the current operators. WDYT?
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.