[Asset Inventory] Dataflow pipeline is using a deprecated SDK version (2.41.0) #1374

The GCP Dataflow console is alerting us that the Dataflow SDK version used by the Asset Inventory tool (2.41.0) is deprecated. Can you please modify the template to use an up-to-date SDK?
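For a Python Dataflow pipeline, the SDK version is simply the apache-beam package the job is built with, so the fix amounts to re-pinning that dependency. A minimal sketch of the idea, assuming a setup.py-based pipeline package (the package name and the 2.60.0 version are illustrative; pick whatever release the Dataflow SDK support page currently lists as supported):

```python
# setup.py (sketch): pin a supported Apache Beam release so the Dataflow
# console stops flagging the pipeline's SDK as deprecated.
import setuptools

setuptools.setup(
    name="asset-inventory-pipeline",  # hypothetical package name
    version="0.0.1",
    install_requires=[
        "apache-beam[gcp]==2.60.0",  # illustrative supported release; was 2.41.0
    ],
    packages=setuptools.find_packages(),
)
```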
Comments
bmenasha added a commit to bmenasha/professional-services that referenced this issue on Nov 15, 2024:
…1374 and other performance/bug fixes

Issue #1374: Use the latest Dataflow SDK version.

Issue #1373: Unable to deal with new cloudbuild.googleapis.com/Build assets. The core issue was that the discovery_name of this new asset type is incorrectly reported as cloudbuild.googleapis.com/Build rather than 'Build'. Try to deal with that by correcting any discovery_name with a '/' in it (sketched below). Other fixes were also necessary to speed up processing.

Other performance/bug fixes:

- Use the discovery-document-generated schema, when we have one, over any resource-generated one. This is a big performance improvement: determining the schema from the resource is time consuming, and it is also unproductive, because if we have an API resource schema it should always match the resource JSON anyway.
- Add ancestors, update_time, location, and json_data to the discovery-generated schema. This prevents those properties from being dropped when we rely on it exclusively.
- Sanitize discovery-document-generated schemas. If we are to rely on them exclusively, they could be invalid, so enforce the BigQuery rules on them as well.
- Use copy.deepcopy less: only when we copy a source into a destination field.
- Prevent BigQuery columns with BQ_FORBIDDEN_PREFIXES from being created. Some Bigtable resources can include these prefixes.
- Some BigQuery model resources had NaN and Infinity values for numeric fields; try to handle those in sanitization (see the sketch after this message).
- When merging schemas, stop after we have BQ_MAX_COLUMNS fields. This helps stop the merge process earlier; it can take forever if there are many unique fields and many elements (also sketched below).
- When enforcing a schema on a resource, recognize when we are handling additional properties, and add the additional-property fields to the value of the additional-property key/value list in push_down_additional_properties. This produces more regular schemas.
- Add ignore_unknown_values to the load job so that we don't fail if a resource contains fields not present in the schema (illustrated below).
- Accept and pass --add-load-date-suffix via main.py.
- Better naming of some local variables for readability.
- Some formatting changes suggested by IntelliJ.
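The discovery_name workaround amounts to keeping only the segment after the last '/'. A minimal sketch of that correction (the function name is an assumption, not the actual code in the commit):

```python
def correct_discovery_name(discovery_name):
    """Work around asset types whose discovery_name is reported as a full
    asset-type path (e.g. 'cloudbuild.googleapis.com/Build') instead of
    the bare discovery-document name ('Build')."""
    if discovery_name and '/' in discovery_name:
        # Keep only the part after the last '/'.
        return discovery_name.rsplit('/', 1)[-1]
    return discovery_name


assert correct_discovery_name('cloudbuild.googleapis.com/Build') == 'Build'
assert correct_discovery_name('Build') == 'Build'
```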
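BigQuery load jobs reject NaN and Infinity in numeric columns, so sanitization has to drop or replace such values. A sketch of the idea (the helper name is hypothetical):

```python
import math


def sanitize_numeric(value):
    """Return None for NaN/Infinity floats, which BigQuery load jobs
    reject; pass every other value through unchanged."""
    if isinstance(value, float) and not math.isfinite(value):
        return None
    return value
```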
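The early exit during schema merging just caps the accumulated field set at BigQuery's per-table column limit. A sketch under the assumption that each schema is a list of field dicts keyed by 'name' (the real merge also has to union nested record fields):

```python
BQ_MAX_COLUMNS = 10000  # BigQuery's per-table column limit


def merge_schemas(schemas):
    """Union top-level fields across schemas, bailing out once the
    BigQuery column limit is reached so huge merges terminate early."""
    merged = {}
    for schema in schemas:
        for field in schema:
            merged.setdefault(field['name'], field)
            if len(merged) >= BQ_MAX_COLUMNS:
                return list(merged.values())
    return list(merged.values())
```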
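ignore_unknown_values is a standard BigQuery load-job option. With the google-cloud-bigquery client it looks roughly like this (the bucket and table names are placeholders; the actual pipeline may configure its load differently):

```python
from google.cloud import bigquery

client = bigquery.Client()
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
    ignore_unknown_values=True,  # skip JSON fields missing from the schema
)
load_job = client.load_table_from_uri(
    "gs://example-bucket/assets.json",   # placeholder source URI
    "example-project.assets.resource",   # placeholder destination table
    job_config=job_config,
)
load_job.result()  # wait for completion; raises on failure
```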
This pull request should resolve it: #1394