-
Notifications
You must be signed in to change notification settings - Fork 76
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Table profile added #168
Table profile added #168
Conversation
is this ready for review? |
Yes |
I also noticed that the numbers in the profiling table are too long. let's implement a custom format display that shortens it by displaying them in scientific notation. @neelasha23 implemented something like that for sklearn-evaluation's interactive confusion matrix, maybe we can re-use the code? |
I found this code. It consists of 2 parts, one ( We can use @edublancas |
the dependence on numpy is a problem here. it sounds like too much to add numpy just to use such function. looks like with can do it without numpy as well: https://stackoverflow.com/a/69569277/709975 let's keep it here for now, we'll move it to core if we need it elsewhere |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -106,13 +106,18 @@ def __init__(self, sqlaproxy, config): | |||
self.keys = {} | |||
if sqlaproxy.returns_rows: | |||
self.keys = sqlaproxy.keys() | |||
if config.autolimit: | |||
if isinstance(config.autolimit, int) and config.autolimit > 0: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we need this? this would break existing compatibility (setting None and setting autolimit to 0 should display all values) - it's a bit counterintuitive but we inherited this behavior from ipython-sql
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since I removed the hardcoded configuration from SqlCmdMagic
there are some missing default configurations (autolimit
and style
). In this case, config.autolimit
and config.style
return <LazyConfigValue>
.
I added the config.autolimit > 0
since according to one of the tests if autolimit is 0 we should return everything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, I see what you're saying! we have a design problem here.
the configuration is attached to the %sql
magic, but ideally we want the config to be accessible to all magics. I'm unsure if this is possible so I think for now the best thing to do is to create another version of run
that doesn't take the config argument (we won't be able to provide the autolimit feature when running the profiling but that's fine)
I remember suggesting creating a raw_run
function already but I can't remember if it was in the ggplot PR or in a different one that @tonykploomber is working on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should also probably open an issue to research more if magics allow setting global variables.
_style = None | ||
if isinstance(config.style, str): | ||
_style = prettytable.__dict__[config.style.upper()] | ||
|
||
self.pretty = PrettyTable( | ||
self.field_names, style=prettytable.__dict__[config.style.upper()] | ||
self.field_names, style=_style |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why was this changed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like this is fixing a bug (since the _style variable wasn't used?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -70,3 +72,117 @@ def test_columns_with_schema(ip, tmp_empty): | |||
).result._repr_html_() | |||
|
|||
assert "some_number" in out | |||
|
|||
|
|||
def test_table_profile(ip, tmp_empty): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add some tests to the integration testing file? we should check if this works with other databases, we know it doesn't work with sqlite so we can ignore it.
but we should check for the other ones, we can mark tests as xfail fo the ones that don't pass the tests and we'll fix them later
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added test for each database but with the relevant profile fields (DuckDB and PostgreSQL should work with all fields)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
resolving.
What else is missing here? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added some comments
@yafimvo please resolve conflicts and pending issues so we can merge. I think the only one left is this |
Describe your changes
profile
magic command added to%sqlcmd
.How to use it:
%sqlcmd profile -t table_name
Issue ticket number and link
Closes #66
Checklist before requesting a review
📚 Documentation preview 📚: https://jupysql--168.org.readthedocs.build/en/168/