proposal: %sqlcmd profile #66

Closed
edublancas opened this issue Jan 3, 2023 · 1 comment · Fixed by #168
Labels: stash (used to categorize issues that will be worked on next)


edublancas commented Jan 3, 2023

When working with a new dataset, practitioners need to explore and summarize it quickly: column values, types, distributions, etc. We could create a %sqlcmd profile magic that produces an HTML table/report of a table. This would be similar to pandas-profiling, except that the analysis would run on the SQL engine, making it more scalable.

Examples:

Display an embedded summary:

%sqlcmd profile --table my_table

Store a report:

%sqlcmd profile --table my_table --output report.html
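
To make "run the analysis on the SQL engine" concrete, here is a minimal sketch of a per-column profile where every number comes from an aggregate query rather than from rows pulled into Python. It is an illustration only, not the proposed implementation: the `profile_table` helper is hypothetical, and SQLite (via the standard-library `sqlite3` module) stands in for whatever engine the notebook is connected to.

```python
import sqlite3


def profile_table(conn, table):
    """Hypothetical helper: per-column type, non-null count, null count, distinct count.

    All aggregation happens inside the database; only one small row per
    column is transferred back to Python.
    """
    # pragma_table_info is SQLite-specific; other engines expose column
    # metadata differently (for example through information_schema).
    columns = conn.execute(
        "SELECT name, type FROM pragma_table_info(?)", (table,)
    ).fetchall()

    profile = {}
    for name, declared_type in columns:
        # Identifiers cannot be bound as parameters, so they are quoted inline.
        # A real implementation would need to validate them first.
        count, nulls, distinct = conn.execute(
            f'SELECT COUNT("{name}"), SUM("{name}" IS NULL), '
            f'COUNT(DISTINCT "{name}") FROM "{table}"'
        ).fetchone()
        profile[name] = {
            "type": declared_type,
            "count": count,
            "nulls": nulls,
            "distinct": distinct,
        }
    return profile


# Small in-memory table used only for demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (x INTEGER, label TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?)",
    [(1, "a"), (2, "b"), (2, None), (None, "a")],
)
result = profile_table(conn, "my_table")
```

Because only the aggregated values leave the database, the cost on the Python side stays constant regardless of table size, which is the scalability argument made above.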
edublancas changed the title from "proposal: %sqlprofile" to "proposal: %sqlcmd profile" on Jan 30, 2023
edublancas added the stash label (used to categorize issues that will be worked on next) on Feb 7, 2023
edublancas (Author) commented

We can start with something simple here and show the same statistics that pandas' DataFrame.describe shows.

To display it nicely, we can use prettytable (see inspect.py for an example).
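
As a rough sketch of that simpler starting point, the snippet below computes a subset of the DataFrame.describe statistics (count, mean, std, min, max) for one numeric column from SQL aggregates. The names `describe_column` and `render` are hypothetical, SQLite is used only as a stand-in engine, and quartiles are left out because the way to compute them differs between engines. The standard deviation uses the sample formula (ddof=1), which is the pandas default.

```python
import math
import sqlite3


def describe_column(conn, table, column):
    """Hypothetical helper: describe-style stats for one numeric column."""
    # One aggregate query; the engine scans the table, Python gets five numbers.
    n, total, total_sq, lowest, highest = conn.execute(
        f'SELECT COUNT("{column}"), SUM("{column}"), '
        f'SUM("{column}" * "{column}"), MIN("{column}"), MAX("{column}") '
        f'FROM "{table}"'
    ).fetchone()
    mean = total / n
    # Sample variance from the sum and the sum of squares (ddof=1).
    # Fine for a sketch, although this formula can lose precision on
    # large values with small spread.
    std = math.sqrt((total_sq - n * mean * mean) / (n - 1)) if n > 1 else float("nan")
    return {"count": n, "mean": mean, "std": std, "min": lowest, "max": highest}


def render(stats):
    """Plain-text stand-in for the table that prettytable would draw."""
    width = max(len(key) for key in stats)
    return "\n".join(f"{key.ljust(width)}  {value:.4g}" for key, value in stats.items())


# Small in-memory table used only for demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (x INTEGER)")
conn.executemany("INSERT INTO my_table VALUES (?)", [(1,), (2,), (3,), (4,)])

stats = describe_column(conn, "my_table", "x")
print(render(stats))
```

The `render` function only avoids a third-party dependency in this sketch; in the actual magic, the same dictionary could be fed to prettytable as suggested above.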
