Basic Normalization for Nested Data #886

Closed
3 tasks
cgardens opened this issue Nov 10, 2020 · 3 comments · Fixed by #2044
Labels
type/enhancement New feature or request

Comments

@cgardens
Contributor

cgardens commented Nov 10, 2020

Tell us about the problem you're trying to solve

  • Basic Normalization works well with "flat" data, but any nested data causes problems.

Problems:

  • The frontend truncates the nesting in the catalog, so the correct catalog is not persisted or sent to the worker.
  • Standard testing will be more complicated and will need to be adjusted for each integration. The hard part is that instead of looking in one table, a test must now read every table involved in the nesting.
  • We still haven't been able to manually test this thoroughly.

Split out from the Basic Normalization issue: #782

@ghost

ghost commented Jan 25, 2021

Would the plan here be to extract nested data into separate tables? E.g., if "product" has a list of "product variants", is there then a product_variants table with product_id pointing to the product table? What if the nested data structure has no natural index?

@ChristopheDuong
Contributor

> Would the plan here be to extract nested data into separate tables? E.g., if "product" has a list of "product variants", is there then a product_variants table with product_id pointing to the product table? What if the nested data structure has no natural index?

Yes, it is best to have natural index columns so that the original data can be rebuilt by joining the different "normalized" tables. So there would indeed be some recommendations to make sure such a key exists in your tables.

  • Thus, the plan would be to build such separate tables, as you describe.
  • But we would probably still have to keep nested versions around for cases where such keys are not available.
  • Another option would be a configuration in the UI to make this choice: to unnest a column, you specify a natural key; otherwise we leave the data nested.
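
As a rough illustration of the separate-table approach (not the actual normalization code), the sketch below unnests a list field into child rows that carry the parent's natural key, and leaves the data nested when no key is given. The helper name `unnest`, the `id` key, and the sample rows are hypothetical; `product_variants` and `product_id` are borrowed from the example in the question.

```python
from typing import Any, Dict, List, Optional, Tuple

Row = Dict[str, Any]


def unnest(
    parents: List[Row],
    nested_field: str,
    natural_key: Optional[str],
    fk_column: str,
) -> Tuple[List[Row], List[Row]]:
    """Split a nested list field out of parent rows into child rows.

    Hypothetical helper: when natural_key is None, the parents are returned
    unchanged (the data stays nested) and no child rows are produced.
    """
    if natural_key is None:
        return [dict(p) for p in parents], []

    flat_parents: List[Row] = []
    children: List[Row] = []
    for parent in parents:
        # Parent table: every column except the nested one.
        flat_parents.append({k: v for k, v in parent.items() if k != nested_field})
        # Child table: one row per nested item, pointing back at the parent.
        for item in parent.get(nested_field) or []:
            child = dict(item)
            child[fk_column] = parent[natural_key]
            children.append(child)
    return flat_parents, children


products = [
    {"id": 1, "name": "shirt", "product_variants": [{"size": "S"}, {"size": "M"}]},
    {"id": 2, "name": "hat", "product_variants": []},
]
product_rows, variant_rows = unnest(products, "product_variants", "id", "product_id")
nested_rows, no_children = unnest(products, "product_variants", None, "product_id")
```

Joining `variant_rows` to `product_rows` on `product_id = id` would rebuild the original nesting, which is why the natural key matters.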

Additionally, we've been experimenting with generating a hash_id column from the parent row data to attach to each unnested record in the child table.
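
A minimal sketch of the hash idea, assuming the hash is taken over a canonical JSON serialization of the whole parent row. The thread does not say which columns or hash function are used, so `row_hash`, `unnest_with_hash`, and the choice of SHA-256 are illustrative assumptions only.

```python
import hashlib
import json
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def row_hash(row: Row) -> str:
    """Deterministic hash of a row; key order does not affect the result."""
    # sort_keys gives a canonical form; default=str covers non-JSON values such as dates.
    canonical = json.dumps(row, sort_keys=True, separators=(",", ":"), default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def unnest_with_hash(
    parent: Row, nested_field: str, hash_column: str = "hash_id"
) -> Tuple[Row, List[Row]]:
    """Hypothetical helper: attach the parent's hash to the parent and to each child row."""
    parent_hash = row_hash(parent)
    flat_parent = {k: v for k, v in parent.items() if k != nested_field}
    flat_parent[hash_column] = parent_hash
    children = [
        {**item, hash_column: parent_hash}
        for item in parent.get(nested_field) or []
    ]
    return flat_parent, children


product = {"name": "shirt", "product_variants": [{"size": "S"}, {"size": "M"}]}
parent_row, child_rows = unnest_with_hash(product, "product_variants")
```

One caveat of hashing row content: two parent rows with identical data get the same hash, so their children cannot be told apart.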

@ChristopheDuong
Contributor

Ready to be merged & closed
