[Bug] docs generate appears to be returning no table metadata when run with the --no-compile option #1216

Closed · mikealfare opened this issue May 1, 2024 · 1 comment
Labels: bug (Something isn't working), High Severity (bug with significant impact that should be resolved in a reasonable timeframe)

mikealfare commented May 1, 2024

Current Behavior

Starting on the evening of 4/29/2024, TestDocsGenerateBigQuery has been failing across all versions of dbt-bigquery. On 1.7.latest and main, test_run_and_generate passes and test_run_and_generate_no_compile fails. On 1.6.latest and prior, both fail.

Failed test runs can be seen here; they were passing previously and then started failing across the board.

Expected Behavior

We should be able to run docs generate --no-compile.

Steps To Reproduce

For each release branch, I chose a bundle from dbt-core-bundles and took the bundle_requirements_mac_3.8.txt file and used it as a constraints file for dbt-bigquery. I needed to unpin pytz because it was pinned in both the constraints file and dev-requirements.txt. Since the constraints file reflects what would actually get shipped, I chose to unpin in dev-requirements.txt.

Taking 1.7.latest as an example:

  1. Use bundle 1.7.55
  2. Unpin pytz~=2023.3 in dev-requirements.txt to pytz
  3. Install locally against the constraints file:
     `pip install -e . -r dev-requirements.txt -c bundle_requirements_mac_3.8.txt`
  4. Run the offending test class:
     `pytest tests/functional/adapter/test_basic.py -k "TestDocsGenerateBigQuery"`
  5. test_run_and_generate_no_compile fails; test_run_and_generate passes on 1.7.latest and 1.8.0b1, and also fails on earlier branches

Here is a summary of each scenario for each release branch:

| branch/tag | bundle | failed |
| --- | --- | --- |
| main | main | --no-compile |
| 1.7.latest | 1.7.55 | --no-compile |
| 1.6.latest | 1.6.72 | both |
| 1.5.latest | 1.5.78 | both |
| 1.4.latest | 1.4.64 | both |
| 1.3.latest | 1.3.71 | both |

Relevant log output

```python
# this is happening because `catalog.json` is empty, which can be confirmed by manually viewing it

for key in "nodes", "sources":
    for unique_id, expected_node in expected_catalog[key].items():
        found_node = catalog[key][unique_id]
```

```
KeyError: 'model.test.model'
```
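
The emptiness can also be confirmed outside the test with a few lines of Python. This is just a sketch, assuming dbt's default `target/` output directory:

```python
# Minimal sketch (not from the test suite) to confirm the empty catalog by hand;
# assumes dbt's default target/ output directory.
import json

with open("target/catalog.json") as fh:
    catalog = json.load(fh)

# On the failing runs, both counts come back as 0.
print("nodes:", len(catalog.get("nodes", {})))
print("sources:", len(catalog.get("sources", {})))
```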

Environment

- OS: all
- Python: all
- dbt-core: all
- dbt-bigquery: all

Additional Context

This can be reproduced using a constraints file of hard pins that pre-dates the integration test failures, which suggests the change that caused the issue is not in dbt-bigquery or its dependencies. It's either an OS-level change (unlikely, since the failures occur across platforms) or a change in BigQuery itself.

We were relying on INFORMATION_SCHEMA.__TABLES__ in the catalog query, which is not recommended. However, fixing that still appeared to produce the same error.
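
For context, the recommended lookup goes through the dataset-level INFORMATION_SCHEMA.TABLES view rather than the legacy __TABLES__ meta-table. A rough sketch of what that looks like (the project and dataset names are placeholders, and this is not the catalog query dbt-bigquery actually ships):

```python
# Rough sketch of querying the dataset-level INFORMATION_SCHEMA.TABLES view instead of
# the legacy __TABLES__ meta-table. `my-project` / `my_dataset` are placeholders.
from google.cloud import bigquery

client = bigquery.Client()  # assumes application default credentials

sql = """
select table_catalog, table_schema, table_name, table_type
from `my-project.my_dataset.INFORMATION_SCHEMA.TABLES`
"""

for row in client.query(sql).result():
    print(row.table_schema, row.table_name, row.table_type)
```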

Since we see a change in behavior between 1.6 and 1.7, it's worth looking at the diff there for both dbt-bigquery and dbt-core, keeping in mind that it's a change that affects runs without --no-compile, but not with --no-compile.

While debugging, I found that the --no-compile route ran through BigQueryAdapter._get_catalog_schemas but not BigQueryAdapter._catalog_filter_table, perhaps because it failed before reaching the latter. When --no-compile was not used, BigQueryAdapter._get_catalog_schemas was never called and BigQueryAdapter._catalog_filter_table was. In the --no-compile scenario, BigQueryAdapter._get_catalog_schemas received two candidate schemas, which is odd since there should be a single test schema.

It's worth noting we have 10,249 schemas in the database. We should probably drop all of the test#######_test_% schemas to make sure we're not seeing something funny because of that.
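
If we do that cleanup, something along these lines should work; the `test<digits>_test_` regex is an assumption based on the naming pattern above, and this would run against the CI project's credentials:

```python
# Hypothetical cleanup sketch: drop leftover CI datasets whose names look like
# test<digits>_test_<suffix>. The regex is an assumption based on the pattern above.
import re
from google.cloud import bigquery

client = bigquery.Client()  # assumes credentials for the CI project
leftover = re.compile(r"^test\d+_test_")

for dataset in client.list_datasets():
    if leftover.match(dataset.dataset_id):
        client.delete_dataset(dataset.reference, delete_contents=True, not_found_ok=True)
        print("dropped", dataset.dataset_id)
```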

I think this is a pagination issue caused by crossing 10K schemas, not a functional bug. Even if so, we should still look at what happens in a database with more than 10K schemas, since we're not handling that scenario properly.
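
To illustrate the suspicion (this is not dbt-bigquery's actual listing code, just a sketch of how a capped listing could miss a freshly created schema):

```python
# Sketch of the suspected failure mode: if the dataset listing is capped (max_results=10000
# here is an illustrative stand-in for the SDK's pagination setting), any dataset beyond the
# cap, including a just-created test schema, never shows up, so the catalog ends up empty.
from google.cloud import bigquery

client = bigquery.Client()

capped = {d.dataset_id for d in client.list_datasets(max_results=10_000)}
full = {d.dataset_id for d in client.list_datasets()}

# In a project with more than 10,000 datasets, this prints the schemas a capped
# listing silently drops.
print(sorted(full - capped))
```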

mikealfare added the bug and High Severity labels on May 1, 2024
mikealfare commented:
We went beyond the pagination setting for the SDK and were not finding the test schema that was just created, hence the catalog was empty. The CI database was cleared out and tests are now passing. This turns out to be a bug related to pagination, which has been captured in this ticket.

mikealfare self-assigned this on May 1, 2024