Add a Metadata.detect_from_dataframe function #2211

Closed
npatki opened this issue Sep 9, 2024 · 0 comments · Fixed by #2222 or #2186
Labels
feature request Request for a new feature

npatki commented Sep 9, 2024

Problem Description

In an upcoming version of SDV, we will be consolidating the multi-table and single-table metadata objects into a streamlined Metadata object. Effectively, this object is the same as the multi-table one, so it includes functions such as detect_from_dataframes for detecting multiple tables at once.

However, if you have a single table, such a function is not ideal. It forces you to input your data as a dictionary even though you only have a single dataframe. We should not be adding more friction to the single table case.

Expected behavior

The streamlined metadata object should have a function called detect_from_dataframe (singular). To make it easier to use, we will make it a class method that returns an instance of the metadata object.

from sdv.metadata import Metadata

metadata = Metadata.detect_from_dataframe(data=my_dataframe, table_name='users')

Parameters:

  • (required) data: A single pandas DataFrame object that represents one table.
  • table_name: The name of the table that the data represents. This will be saved in the metadata.
    • (default) If no table name is provided, default to 'table'.

Functionality: Detect the columns from the data and create metadata with the given table name.

Output: An instance of the metadata object with the detected metadata
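
One way to picture the requested behavior is a class method that wraps the single table in a one-entry dictionary and delegates to the plural detector. The sketch below is only an illustration under assumptions, not SDV's implementation: ToyMetadata, its tables attribute, and the naive column-detection rule are all hypothetical, and a plain dict of column lists stands in for the pandas DataFrame so the example runs without any dependencies.

```python
class ToyMetadata:
    """Hypothetical stand-in for the streamlined Metadata object (not SDV code)."""

    def __init__(self):
        # Maps table name -> {'columns': {column name -> {'sdtype': ...}}}
        self.tables = {}

    @staticmethod
    def _detect_columns(data):
        # Placeholder rule (an assumption for this sketch): columns holding only
        # numbers are 'numerical', everything else is 'categorical'.
        columns = {}
        for name in data:  # iterating the dict yields its column names
            values = list(data[name])
            is_numeric = all(
                isinstance(v, (int, float)) and not isinstance(v, bool)
                for v in values
            )
            columns[name] = {'sdtype': 'numerical' if is_numeric else 'categorical'}
        return columns

    @classmethod
    def detect_from_dataframes(cls, data):
        # Plural form: expects a dictionary of table name -> table data.
        metadata = cls()
        for table_name, table in data.items():
            metadata.tables[table_name] = {'columns': cls._detect_columns(table)}
        return metadata

    @classmethod
    def detect_from_dataframe(cls, data, table_name='table'):
        # Singular form: wrap the one table so the caller does not have to
        # build the dictionary, then reuse the plural detector.
        return cls.detect_from_dataframes({table_name: data})


# Stand-in for a single-table DataFrame.
users = {'age': [31, 45, 27], 'country': ['US', 'FR', 'JP']}

metadata = ToyMetadata.detect_from_dataframe(data=users, table_name='users')
default_named = ToyMetadata.detect_from_dataframe(data=users)
```

With this shape, the caller never constructs the dictionary, and omitting table_name stores the table under the default name 'table'.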

Additional context

For multi-table usages, we still recommend using detect_from_dataframes (plural) in order to detect relationships.
