GCSToBigQueryOperator - not generating a unique BQ job name #11660
Comments
I just started using this operator via the backports package a few days ago, and I hit this at least one in ten times I invoke the operator, making it unusable without manual supervision. I do not use a dynamic DAG, but I do have a few
Yeah, maybe a bug; try the previous version (`GoogleCloudStorageToBigQueryOperator`). It works.
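For anyone who wants to try that workaround, here is a minimal sketch of a load task using the older contrib operator. The bucket, object paths, and table names are hypothetical placeholders:

```python
# Workaround sketch: the older contrib operator, which generates a unique
# job_ ID per load job. Import path as it existed in Airflow 1.10.
from airflow.contrib.operators.gcs_to_bq import GoogleCloudStorageToBigQueryOperator

load_table1 = GoogleCloudStorageToBigQueryOperator(
    task_id="load_table1",
    bucket="my-bucket",                    # hypothetical bucket
    source_objects=["table1/*.csv"],       # hypothetical object paths
    destination_project_dataset_table="my_project.my_dataset.table1",
    autodetect=True,                       # infer the schema from the files
    write_disposition="WRITE_TRUNCATE",
)
```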
This seems to be the method that incorrectly generates the job IDs, but the format does not seem to match exactly what I get in the logs.
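The provider code isn't quoted above, but the `airflow_1603109319` names in the logs look like second-resolution timestamp IDs. A minimal sketch of that bug class (illustrative only, not the provider's actual source) shows why parallel tasks collide:

```python
# Illustrative only: why second-resolution timestamps collide across
# parallel tasks, while UUID-based IDs (used after the fix) do not.
import time
import uuid

def timestamp_job_id():
    # All tasks that reach this line within the same second get the
    # same ID, e.g. "airflow_1603109319" as seen in the logs.
    return "airflow_{}".format(int(time.time()))

def uuid_job_id():
    # A random UUID per invocation makes collisions practically impossible.
    return "airflow_{}".format(uuid.uuid4())
```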
Are you sure you're running the current backport versions? I was getting this issue in 2020.6.24, but it looks like it is solved in 2020.10.5. Unfortunately, 2020.10.5 isn't compatible with the new bigquery/pubsub libraries (2.0.0), so I can't test.
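If you want to confirm which backport provider version is actually installed, a quick check (a minimal sketch using `pkg_resources`, which ships with setuptools):

```python
# Print the installed version of the Google backport provider package.
import pkg_resources

print(pkg_resources.get_distribution(
    "apache-airflow-backport-providers-google").version)
```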
I'm using
Even the BigQuery hook has the same issue. I'm not a coder :) so I'm not able to find the exact cause. When we use
Apparently they upgraded and fixed it in the Airflow backport 2020.10.29 version.
Awesome!!! So I'm closing this now.
For Cloud Composer users like me, trying to install the newer backport provider in the Cloud Build step does not work. The warning from earlier in the log may provide some hints as to why this happens. So for now at least, Cloud Composer users are stuck with either using the old non-buggy operator or waiting for Google to patch this.
Can you try the latest composer image?
@muscovitebob I believe it is indeed a dependency problem that is very likely to be addressed in the latest version of the Composer image. From what we know, the Composer team keeps the images updated with the releases of Apache Airflow and the providers, and the next image will even include the latest Google providers baked in, but you should try to install the latest provider there. Note that there is a new version of the Google backport provider as a release candidate (voting on it finishes on Thursday), so you might even try to install that version instead: https://pypi.org/project/apache-airflow-backport-providers-google/2020.11.13rc1/ It has even more fixes.
The upgrade with these went well, thanks very much! I did not realise there was a new release.
I was using `GoogleCloudStorageToBigQueryOperator`, then I wanted to use `GCSToBigQueryOperator`. When I run a parallel data export from GCS to BQ (via a for loop I am generating dynamic tasks), it generates the BQ job name `test-composer:us-west2.airflow_1603109319` (I think it is taking the node name + the current timestamp) as the job ID for all the tasks.

Error: this does not allow the 2nd table to be imported; it has to wait for a minute (retry in the DAG) and then it is imported.

But the older operator gives a proper unique job ID like `job_someUUID`. With a 3-parallel-table import:

New operator: all tasks get the job `test-composer:us-west2.airflow_1603109319`.
Old operator: table1 (`job_NYEBXXXXXvoflDiEj2j`), table2 (`job_9xGl7WlVXXXXXWBriaqbhLQY`), table3 (`job_aqmVLXXXXXL2YqVCGAqb_5EtW`).
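A minimal sketch of the dynamic-task pattern described above, which reproduces the collision when the tasks run in parallel. The DAG ID, bucket, object paths, and table names are hypothetical placeholders, and the import path is the one used by the 2020.10.x backport providers (older releases used a different module path):

```python
# Repro sketch: one GCSToBigQueryOperator task per table, generated in a
# for loop. With the buggy versions, tasks starting in the same second all
# get the same timestamp-based BQ job ID, so all but one of them fail.
from airflow import DAG
from airflow.utils.dates import days_ago
from airflow.providers.google.cloud.transfers.gcs_to_bigquery import (
    GCSToBigQueryOperator,
)

with DAG(
    "gcs_to_bq_parallel",          # hypothetical DAG id
    schedule_interval=None,
    start_date=days_ago(1),
) as dag:
    for table in ["table1", "table2", "table3"]:   # hypothetical table list
        GCSToBigQueryOperator(
            task_id="load_{}".format(table),
            bucket="my-bucket",                     # hypothetical bucket
            source_objects=["{}/*.csv".format(table)],
            destination_project_dataset_table="my_project.my_dataset.{}".format(table),
            autodetect=True,
            write_disposition="WRITE_TRUNCATE",
        )
```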