feat: percent empty #118

Merged
merged 9 commits into from
Feb 11, 2023
axiomofjoy
Contributor

No description provided.

@axiomofjoy axiomofjoy linked an issue Dec 19, 2022 that may be closed by this pull request
@axiomofjoy axiomofjoy self-assigned this Jan 22, 2023
Comment on lines 8 to 9
def percent_empty(df: pd.DataFrame) -> "pd.Series[float]":
    return df.isnull().sum() / df.shape[0]
Contributor

Need to pass in the columns to run the computation over.

Contributor Author

done

import pandas as pd


def percent_empty(df: pd.DataFrame) -> "pd.Series[float]":
Contributor

Return a dict mapping column name to percent empty.

import pandas as pd


def percent_empty(df: pd.DataFrame) -> "pd.Series[float]":
Contributor

Should return None if the dataframe has no rows.
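To illustrate why the empty case needs special handling, here is a minimal sketch (not code from this PR) that applies the original one-liner to a dataframe with no rows. The frame and column name `a` are made up for the example.

```python
import math

import pandas as pd

# An empty frame with one column, standing in for a dataset with no rows.
empty_df = pd.DataFrame({"a": pd.Series([], dtype="float64")})

# Same expression as the original one-liner: null count divided by row count.
result = empty_df.isnull().sum() / empty_df.shape[0]

# 0 nulls / 0 rows is undefined; pandas yields NaN here instead of raising,
# so callers would silently receive NaN unless the function guards for it.
print(math.isnan(result["a"]))
```

Returning None explicitly makes the "no data" case distinguishable from a computed value.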

    num_records = dataframe.shape[0]
    if num_records == 0:
        return {col: None for col in column_names}
    return dict(dataframe[column_names].isnull().sum() / dataframe.shape[0])
Contributor

nit: use num_records here?

Contributor Author

addressed in 65c0d0e
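
Taken together, the review comments in this thread suggest a function along the following lines. This is a sketch only: the keyword names `dataframe` and `column_names` are taken from the snippets in this PR, but the exact signature, return annotation, and docstring are assumptions, and the merged code may differ.

```python
from typing import Dict, List, Optional

import pandas as pd


def percent_empty(
    dataframe: pd.DataFrame, column_names: List[str]
) -> Dict[str, Optional[float]]:
    """Map each requested column to its fraction of null values."""
    num_records = dataframe.shape[0]
    if num_records == 0:
        # No rows: the fraction is undefined, so report None per column.
        return {col: None for col in column_names}
    # Reuse num_records rather than recomputing dataframe.shape[0].
    null_fractions = dataframe[column_names].isnull().sum() / num_records
    return {col: float(null_fractions[col]) for col in column_names}


# Illustrative data, not from the PR.
df = pd.DataFrame({"a": [1.0, None, 3.0, None], "b": ["x", "y", None, "z"]})
result = percent_empty(dataframe=df, column_names=["a", "b"])
empty_result = percent_empty(dataframe=df.iloc[0:0], column_names=["a"])
```

As in the snippets above, the value is a fraction between 0 and 1 rather than a percentage scaled to 100.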

Comment on lines +47 to +54
def _get_percent_empty_dataloader(model: Model) -> DataLoader[str, Optional[float]]:
    async def _percent_empty_load_function(column_names: List[str]) -> List[Optional[float]]:
        column_name_to_percent_empty = percent_empty(
            dataframe=model.primary_dataset.dataframe, column_names=column_names
        )
        return [column_name_to_percent_empty[col] for col in column_names]

    return DataLoader(load_fn=_percent_empty_load_function)
Contributor

might not be worth a dataloader tbh

Contributor Author

will address in a separate pr
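
For context, one way to drop the dataloader would be to compute the value for a single column on demand. The helper below is hypothetical (the name `percent_empty_for_column` does not appear in this PR) and only sketches the idea; whether the follow-up PR takes this shape is not stated here.

```python
from typing import Optional

import pandas as pd


def percent_empty_for_column(
    dataframe: pd.DataFrame, column_name: str
) -> Optional[float]:
    # Hypothetical helper: computes the null fraction for one column
    # directly, with no batching layer in between.
    num_records = dataframe.shape[0]
    if num_records == 0:
        return None
    return float(dataframe[column_name].isnull().sum() / num_records)


# Illustrative data, not from the PR.
example_df = pd.DataFrame({"a": [1.0, None]})
single_result = percent_empty_for_column(example_df, "a")
```

The trade-off is that each column triggers its own pass over the dataframe, whereas the dataloader version batches the requested columns into one call.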

@axiomofjoy axiomofjoy merged commit 108cd3e into main Feb 11, 2023
@axiomofjoy axiomofjoy deleted the feat/percent-empty branch February 11, 2023 00:15
Successfully merging this pull request may close these issues.

[metrics] Percent empty
2 participants