
Multiple Parents #162

Closed
adamFinastra opened this issue Jul 1, 2020 · 4 comments · Fixed by #164
Labels: feature request (Request for a new feature)

Comments


adamFinastra commented Jul 1, 2020

  • SDV version: Latest
  • Python version: 3.7
  • Operating System: Mac OS

Description

I have three tables:

Product
ProductKey|P0|P1|P2

Account
AccountKey|HouseholdKey|ProductKey|A0|A1

Household
HouseholdKey|H1|H2|H3|H4|H5

The Account table depends on both the Product and Household tables. How can I define Account as having multiple parents?

from sdv import Metadata

# tables is assumed to be a dict mapping table names to pandas DataFrames
metadata = Metadata()
metadata.add_table('product', data=tables['product'], primary_key='ProductKey')
metadata.add_table('account', data=tables['account'], primary_key='AccountKey', foreign_key='ProductKey', parent='product')

The above works for two tables, but how can I include the Household table, since Account is also a child of it?
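To make the structure concrete, here is a minimal, library-independent sketch that describes the three tables and their foreign keys as plain Python data. The `TABLES` and `RELATIONSHIPS` layout and the `parents_of` and `check_schema` helpers are hypothetical illustrations, not SDV's metadata format or API; the point is only that a multi-parent schema is a list of parent/child pairs in which the same child table appears more than once.

```python
# Hypothetical, library-independent description of the schema in this issue.
# Table and column names come from the issue; the dictionary layout is an assumption.
TABLES = {
    "product": {"primary_key": "ProductKey", "columns": ["ProductKey", "P0", "P1", "P2"]},
    "household": {
        "primary_key": "HouseholdKey",
        "columns": ["HouseholdKey", "H1", "H2", "H3", "H4", "H5"],
    },
    "account": {
        "primary_key": "AccountKey",
        "columns": ["AccountKey", "HouseholdKey", "ProductKey", "A0", "A1"],
    },
}

# Each relationship is (parent table, child table, foreign key column in the child).
# "account" appears as the child twice, which is what makes it multi-parent.
RELATIONSHIPS = [
    ("product", "account", "ProductKey"),
    ("household", "account", "HouseholdKey"),
]


def parents_of(child):
    """Return the parent tables of `child`, in declaration order."""
    return [parent for parent, c, _ in RELATIONSHIPS if c == child]


def check_schema():
    """Raise if a relationship names an unknown table or a foreign key the child lacks."""
    for parent, child, fk in RELATIONSHIPS:
        if parent not in TABLES or child not in TABLES:
            raise ValueError(f"unknown table in relationship {parent} -> {child}")
        if fk not in TABLES[child]["columns"]:
            raise ValueError(f"{fk} is not a column of {child}")


check_schema()
print(parents_of("account"))  # ['product', 'household']
```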

Edit:
I have defined

account_fields = {
    'HouseholdKey': {'type': 'categorical', 'ref': {'field': 'HouseholdKey', 'table': 'household'}},
    'ProductKey': {'type': 'categorical', 'ref': {'field': 'ProductKey', 'table': 'product'}},
}

and then ran

metadata = Metadata()
metadata.add_table('household', data=tables['household'], primary_key='HouseholdKey')
metadata.add_table('product', data=tables['product'], primary_key='ProductKey')
metadata.add_table('account', data=tables['account'], primary_key='AccountKey', fields_metadata=account_fields)
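In the field metadata used here, the `ref` entries are what encode the two parents. As a quick sanity check, a small hypothetical helper (`relationships_from_fields` is not an SDV function) can read the parent/child pairs back out of such a dictionary and confirm that both references were declared. This sketch only inspects the dictionary; it says nothing about how SDV itself interprets it.

```python
# Field metadata as in the issue, with consistent quoting.
account_fields = {
    "HouseholdKey": {"type": "categorical", "ref": {"field": "HouseholdKey", "table": "household"}},
    "ProductKey": {"type": "categorical", "ref": {"field": "ProductKey", "table": "product"}},
}


def relationships_from_fields(child, fields):
    """Hypothetical helper: return (parent table, parent field, child table, child field)
    for every field that declares a "ref" to another table."""
    found = []
    for name, spec in fields.items():
        ref = spec.get("ref")
        if ref is not None:
            found.append((ref["table"], ref["field"], child, name))
    return found


for rel in relationships_from_fields("account", account_fields):
    print(rel)
```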

Does this seem to be the correct way to define this multi-parent relationship? When sample_all() is called, the results do not seem correct and the relationships between the tables are not captured.
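One way to check whether sampled output captures the relationships is to look for orphaned foreign keys: values in a child table that have no matching primary key in the parent. The sketch below assumes the sampled tables have been converted to lists of row dictionaries (for a pandas DataFrame, `df.to_dict('records')` does this); `orphan_keys` is a hypothetical helper, and the rows are made-up toy values, not SDV output.

```python
def orphan_keys(child_rows, foreign_key, parent_rows, primary_key):
    """Return foreign-key values in the child rows that have no matching parent row."""
    parent_ids = {row[primary_key] for row in parent_rows}
    return {row[foreign_key] for row in child_rows if row[foreign_key] not in parent_ids}


# Toy rows standing in for sampled tables (made-up values for illustration only).
sampled = {
    "product": [{"ProductKey": 1}, {"ProductKey": 2}],
    "household": [{"HouseholdKey": 10}, {"HouseholdKey": 11}],
    "account": [
        {"AccountKey": 100, "ProductKey": 1, "HouseholdKey": 10},
        {"AccountKey": 101, "ProductKey": 2, "HouseholdKey": 12},  # 12 has no household
    ],
}

# Check the account table against each of its two parents.
for parent, key in [("product", "ProductKey"), ("household", "HouseholdKey")]:
    missing = orphan_keys(sampled["account"], key, sampled[parent], key)
    print(parent, sorted(missing))
```

An empty result for every parent means each sampled account row points at an existing product and household.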

adamFinastra changed the title from "ValueError: There are nan values in your data" to "Multiple Parents" on Jul 1, 2020
csala (Contributor) commented Jul 2, 2020

Hello @adamFinastra, thanks for pointing this out.

As you noticed, the current stable version of SDV does not support multi-parent schemas yet, but development of this feature is underway and the next release (out this week) will introduce support for it.

I'm assigning this issue to the next milestone to properly keep track of this.

csala self-assigned this on Jul 2, 2020
csala added the feature request (Request for a new feature) label on Jul 2, 2020
csala modified the milestones: 0.3.3, 0.3.4 on Jul 2, 2020
adamFinastra (Author) commented:

Thank you! Looking forward to the next release!

Gayathri-Gunasekar commented:

While creating metadata for synthetic data generation from files in a local folder, the relationships come back as an empty list ("relationships": []).
How can I set relationships? I have used metadata.add_relationship(), but it says there is already a primary key,
and the relationship is not reflected in the final data.

Gayathri-Gunasekar commented:

InvalidMetadataError: Relationship between tables (CUSTOMERSs, ORDERSs) contains an unknown primary key {'CustomerID (PK)'}.
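One plausible reading of this message (an assumption; the thread does not confirm the cause) is that the primary key named in the relationship, 'CustomerID (PK)', does not match the primary key declared for the parent table, for example because the column is actually named differently. The hypothetical check below illustrates that kind of validation in plain Python; `unknown_primary_keys` and the declared key "CustomerID" are made up for illustration and are not SDV's implementation.

```python
def unknown_primary_keys(declared_primary_keys, parent, relationship_keys):
    """Return the keys used in a relationship that differ from the parent's declared primary key.

    declared_primary_keys: mapping of table name -> primary key column (hypothetical layout).
    """
    declared = declared_primary_keys.get(parent)
    return {key for key in relationship_keys if key != declared}


# Hypothetical: the parent's primary key was declared as "CustomerID",
# but the relationship refers to "CustomerID (PK)".
declared = {"CUSTOMERSs": "CustomerID"}
print(unknown_primary_keys(declared, "CUSTOMERSs", ["CustomerID (PK)"]))  # {'CustomerID (PK)'}
print(unknown_primary_keys(declared, "CUSTOMERSs", ["CustomerID"]))  # set()
```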

3 participants