Feature Proposal: Implement Pandas Profiling column statistics extractor #1104

Closed
mgorsk1 opened this issue May 17, 2021 · 1 comment
Labels
status:completed Issue is completed and on master

Comments

@mgorsk1
Contributor

mgorsk1 commented May 17, 2021

Pandas profiling (https://github.com/pandas-profiling/pandas-profiling) is a widely used tool for generating profiling reports on tables. It produces parsable JSON reports with quantile and descriptive statistics at the column level. This issue proposes developing a PandasProfilingColumnStatsExtractor to populate TableColumnStats entities.

Expected Behavior or Use Case

  • a new extractor exists that parses pandas-profiling reports into TableColumnStats entities

Service or Ingestion ETL

  • databuilder

Possible Implementation

  • rely on the pandas-profiling JSON report to collect column-level metadata
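
As a rough illustration of the idea, the sketch below walks an already-parsed JSON report and emits one record per column statistic. It is not the databuilder implementation: `ColumnStat` is a hypothetical stand-in for TableColumnStats, `extract_column_stats` and `STAT_KEYS` are made-up names, and the report layout (a top-level `variables` object keyed by column name) is an assumption about the pandas-profiling JSON output that should be checked against a real report.

```python
import json
from typing import Any, Dict, Iterator, NamedTuple


class ColumnStat(NamedTuple):
    """Hypothetical stand-in for a TableColumnStats record."""
    table: str
    column: str
    stat_name: str
    stat_value: str


# Assumed subset of per-column statistics worth keeping (illustrative only).
STAT_KEYS = ("mean", "std", "min", "max", "n_missing")


def extract_column_stats(report: Dict[str, Any], table: str) -> Iterator[ColumnStat]:
    """Yield one ColumnStat per (column, statistic) pair found in the report.

    Assumes the report has a top-level "variables" mapping of
    column name -> dict of statistics.
    """
    for column, stats in report.get("variables", {}).items():
        for key in STAT_KEYS:
            if key in stats:
                yield ColumnStat(table, column, key, str(stats[key]))


# Tiny hand-written report fragment, not real pandas-profiling output.
sample_report = json.loads(
    '{"variables": {"age": {"mean": 41.5, "max": 90, "type": "Numeric"}}}'
)
records = list(extract_column_stats(sample_report, "users"))
```

Statistics outside `STAT_KEYS` (here, `type`) are skipped, and values are stringified so that numeric and non-numeric statistics can share one record shape.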
mgorsk1 added the labels "help wanted", "good first issue", and "keep fresh" on May 17, 2021
@mgorsk1
Contributor Author

mgorsk1 commented May 17, 2021

cc @sbrugman

mgorsk1 added the labels "status:in_progress" and "status:completed", and removed the labels "good first issue", "help wanted", "status:in_progress", and "keep fresh" on May 17, 2021
mgorsk1 closed this as completed on May 20, 2021