
Restore original schema-ddl behaviour for objects with no defined fields #1369

Closed
wants to merge 1 commit into from


istreeter

@istreeter istreeter commented Nov 29, 2024

This concerns schemas like:

{"type": "object", "additionalProperties": false}

Older versions of schema-ddl would convert this schema to a String (JSON) parquet column. In snowplow/schema-ddl#205 we changed the behaviour so that this schema is converted to None, i.e. no column is created for it. That was a good change for the newer loaders (all loaders aside from RDB Loader).

But the change caused problems for RDB Loader in an edge-case scenario: if the schema above evolves from 1-0-0 to 1-0-1 and the new version adds a field, then RDB Loader tries to create a column for the new field. That clashes with the old string column created by the older version of RDB Loader.

This PR returns to the original behaviour of schema-ddl for schemas with no explicit properties. It does so without any change to schema-ddl itself, so we still get all the benefits of snowplow/schema-ddl#205 for the other loaders.
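To make the edge case concrete, here is a minimal toy model in Python. It is only a sketch: `column_type`, the `legacy` flag, the `foo` field, and the column type strings are all hypothetical names chosen for illustration and are not schema-ddl's or RDB Loader's real API. It only mirrors the two behaviours described above for an object schema with no defined fields.

```python
import json
from typing import Optional


def column_type(schema: dict, legacy: bool) -> Optional[str]:
    """Toy stand-in for the schema-to-column mapping (hypothetical, not schema-ddl)."""
    if schema.get("type") != "object":
        raise ValueError("this sketch only models object schemas")
    properties = schema.get("properties") or {}
    if not properties:
        # Original behaviour: fall back to a String (JSON) column.
        # Behaviour after schema-ddl#205: create no column at all.
        return "string(json)" if legacy else None
    # Objects with defined fields get a structured column in this toy model.
    return "struct<" + ", ".join(sorted(properties)) + ">"


# 1-0-0: the schema from the description, with no defined fields.
v1_0_0 = json.loads('{"type": "object", "additionalProperties": false}')
# 1-0-1: a hypothetical evolution that adds one field named "foo".
v1_0_1 = {**v1_0_0, "properties": {"foo": {"type": "string"}}}

old_column = column_type(v1_0_0, legacy=True)       # what an older loader created
new_column_100 = column_type(v1_0_0, legacy=False)  # what a newer loader expects
new_column_101 = column_type(v1_0_1, legacy=False)  # after the field is added

# The table still holds the old string column, but the newer loader wants
# a different column for 1-0-1, so the two definitions clash.
clash = old_column is not None and old_column != new_column_101
```

In this model an installation that never ran the older loader sees no clash, because no column exists for 1-0-0 when 1-0-1 arrives; only tables that already hold the legacy string column are affected.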

@istreeter istreeter changed the base branch from master to develop November 29, 2024 20:50

@spenes spenes left a comment


My initial concern was whether we would need any additional change for Loader, but it seems we are using fieldsFromTypes in Loader as well to determine the list of columns. I don't have anything else to add, I think it looks good 👍

@istreeter

Closing. We decided the new behaviour is too good to revert. We will handle legacy installations on a case-by-case basis.

@istreeter istreeter closed this Jan 22, 2025