Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Destination Bigquery Normalization: Name transformation should be consistent between normalization and connector #25163

Closed
edgao opened this issue Apr 13, 2023 · 3 comments
Labels
team/destinations Destinations team's backlog type/bug Something isn't working

Comments

@edgao
Copy link
Contributor

edgao commented Apr 13, 2023

The connector will ensure that the dataset ID (i.e. stream namespace) starts with an alphanumeric character - https://github.com/airbytehq/airbyte/blob/master/airbyte-integrations/connectors/destination-bigquery/src/main/java/io/airbyte/integrations/destination/bigquery/BigQuerySQLNameTransformer.java#L38

but normalization doesn't do this. So if you configure a custom stream namespace that starts with an underscore (e.g. _foo)... then destination-bigquery will create the raw tables in the n_foo dataset, but normalization will search for them in an _foo dataset.

We probably want to add the n prefix in normalization. (this stops being a problem once normalization runs in java, since we can just pass around a single BigQueryWriteConfig - but that's not an immediate solution)

(dataset names are allowed to start with an underscore, but they won't show up in the UI - https://cloud.google.com/bigquery/docs/datasets#dataset-naming)

@edgao edgao added type/bug Something isn't working team/destinations Destinations team's backlog labels Apr 13, 2023
@evantahler
Copy link
Contributor

evantahler commented Apr 18, 2023

Grooming

  • ${namespace}_airbyte -> "hubspot" + _airbyte => hubspot_airbyte
    *happy path
  • ${namespace}_airbyte -> "" + _airbyte => _airbyte
    • hidden in BQ
  • ${namespace}_airbyte -> "_public" + _airbyte => _public_airbyte
    • hidden in BQ

#3 won't be fixed by a platform change, so we still need to handle this.

  • appending "n" is silly to begin with, we should at lesat append "Airbyte" or something meaningful

The plan:

  1. in metabase confirm that case 3 is rate
  2. if checkin catalogs #3 is really rare, this fix belongs in the platform so that "" namespaces are never empty and we append something (name of the source?). We make this change as part of this story.
  3. Remove the "n_" hack the destination has now
  4. docs to educate BQ users that _ tables are hidden (add to destination docs).
  5. Migration plan - what do we tell existing users (use output of step 1)

@edgao
Copy link
Contributor Author

edgao commented Apr 21, 2023

I was reminded that https://github.com/airbytehq/airbyte-internal-issues/issues/943 also exists. I think that's a nonissue because the UI trims off whitespace (#17113), but it's another thing to fix when we're doing the typing/deduping in java.

@edgao
Copy link
Contributor Author

edgao commented Jun 23, 2023

closing since we want platform to do this https://github.com/airbytehq/airbyte-internal-issues/issues/1875

@edgao edgao closed this as not planned Won't fix, can't repro, duplicate, stale Jun 23, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
team/destinations Destinations team's backlog type/bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants