Alias database to allow matching between rows and Manifest #85

Closed · wants to merge 3 commits

Conversation

@Fokko (Contributor) commented May 22, 2020

Currently the docs generation is broken because we need to supply
the database name when fetching the relations:

if dct['table_database'] is None:
    dct['table_database'] = dct['table_schema']

However, when we get the manifest we don't get the database:

{CatalogKey(database='', schema='fokko', name='logistical_configuration_data'):
	['model.dbtlake.logistical_configuration_data']}

Therefore the keys never line up, and we can't match the Catalogs:

https://github.com/fishtown-analytics/dbt/blob/9d0eab630511723cd0bc328f6f11d3ffe6c8f879/core/dbt/task/generate.py#L108

From describing the relations, we instead get:

CatalogKey(database='fokko', schema='fokko', name='logistical_configuration_data')

This mismatch follows from the logic above, so the lookup never succeeds. I think ALIASing the database to the schema is the easiest way out. Making the database non-optional in core would be another option, and cleaner in the long run. Please advise.
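The mismatch can be reduced to a minimal Python sketch (CatalogKey is shown here as a plain NamedTuple for illustration; the real type lives in dbt core):

```python
# Minimal illustration (not dbt code) of why the catalog lookup misses:
# the manifest key has database='' while the key built from the describe
# results has database='fokko', so the tuples never compare equal.
from typing import NamedTuple

class CatalogKey(NamedTuple):
    database: str
    schema: str
    name: str

# key as it appears in the manifest (empty database)
manifest = {
    CatalogKey('', 'fokko', 'logistical_configuration_data'):
        ['model.dbtlake.logistical_configuration_data'],
}

# key as built from the describe results (database filled from schema)
catalog_key = CatalogKey('fokko', 'fokko', 'logistical_configuration_data')
print(catalog_key in manifest)  # False: the keys never line up
```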

@Fokko (Contributor, Author) commented May 22, 2020

cc @jtcohen6

@jtcohen6 (Contributor)

Good find @Fokko. I confirmed that while the docs generation commands work, the resulting docs site is missing information from the catalog.

@beckjake Could you take a look at the ALIAS approach here? It feels related to the changes in #83 around schema/database.

@beckjake (Contributor)

This is a good find, and the approach looks valid.

That said, would it perhaps make sense to change the list_relations_without_caching method's self.Relation.create? I haven't tested that, but it seems like it'd solve the problem effectively the same way. Or perhaps the SparkRelation.__post_init__ I added in #83 should set self.database = self.schema or self.schema = self.database depending upon None-ness. I think that would be reasonable as well.
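The `__post_init__` idea above could look roughly like this (a hypothetical sketch on a plain dataclass; the real `SparkRelation` in #83 has more fields and different machinery):

```python
# Hedged sketch of the suggestion: when only one of database/schema is
# set, mirror it onto the other so catalog keys line up on both sides.
from dataclasses import dataclass
from typing import Optional

@dataclass
class SparkRelationSketch:
    database: Optional[str] = None
    schema: Optional[str] = None

    def __post_init__(self):
        # Fill whichever side is missing from the one that is present.
        if self.database is None and self.schema is not None:
            self.database = self.schema
        elif self.schema is None and self.database is not None:
            self.schema = self.database

rel = SparkRelationSketch(schema='fokko')
print(rel.database)  # → fokko
```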

I would feel very comfortable with that fix, whereas I feel a bit concerned about the knock-on effects downstream of setting an actual value as an alias.

I don't think either this PR or my suggestion will actually conflict with #83 (though I haven't tested). I'm pretty confident #83 totally misses this issue.

As an aside: I feel like a broken record here, but we really need a better test story for plugins. This kind of issue just shouldn't happen, and our test suite isn't even at a point where we can reasonably try to add a test for this. I guess we could modify the db-integration-tests branch we use for spark to support reading from a json file and validating some structural things, but that's a lot to ask on a PR.

@Fokko (Contributor, Author) commented May 22, 2020

Thanks for the insights. I'll try setting it in the __post_init__.

I don't think the database can ever be set, since it is excluded from the accepted connection keys: https://github.com/fishtown-analytics/dbt-spark/blob/master/dbt/adapters/spark/connections.py#L53

@Fokko (Contributor, Author) commented May 22, 2020

The only downstream issue that I've seen so far is that both the schema and the database are shown in the docs:

[screenshot of the docs site omitted]

I don't care so much about that. However, the statistics also seem to be broken again. I might dive into this next week; I'm kinda busy at the moment.

@beckjake (Contributor)

That _connection_keys method actually lists the keys in the credentials that will be logged in dbt debug output. It exists to avoid logging passwords/private keys/etc.
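The pattern being described can be sketched like this (field names and the `debug_info` helper are illustrative, not the real dbt-spark code):

```python
# Hedged sketch: _connection_keys names only the credential fields that
# are safe to echo in `dbt debug` output, so secrets stay out of logs.
class SparkCredentialsSketch:
    def __init__(self, host: str, schema: str, password: str):
        self.host = host
        self.schema = schema
        self.password = password  # deliberately never listed below

    def _connection_keys(self):
        # Only these keys get printed by debug-style output.
        return ('host', 'schema')

    def debug_info(self):
        return {k: getattr(self, k) for k in self._connection_keys()}

creds = SparkCredentialsSketch('localhost', 'fokko', 's3cret')
print(creds.debug_info())  # password omitted
```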

I think in 0.17.0 there will be more things that could have problems with it, because we use translate_aliases in more places. Even if it's fine there, in the long run we'd like to expand the use of aliases quite a lot to exist just about everywhere, and that's a lot harder if they can step on each other.

I'd prefer to add a special flag to core for disabling fields in adapters or even support this specific adapter behavior where database=schema as an option in Relations in core, if it comes down to it. That would be a lot of work, but at least it wouldn't constrain the design space so much.

@jtcohen6 (Contributor)

@beckjake I have a draft PR open (#91) that attempts to follow your recommendations above. I'm still running into issues with catalog generation.

@Fokko I opened a separate issue (#90) re: owner / table stats not showing up. I think this has been broken for a while, and we should absolutely fix it.

I also opened an issue re: the less-than-ideal Relation display in the docs site: dbt-labs/dbt-docs#94
