Table profile added #168

yafimvo · 2023-02-27T13:28:04Z

Describe your changes

Adds a profile command to the %sqlcmd magic.

How to use it:
%sqlcmd profile -t table_name

Issue ticket number and link

Closes #66

Checklist before requesting a review

I have performed a self-review of my code
I have added thorough tests (when necessary).
I have added the right documentation in the docstring and changelog (when needed)

📚 Documentation preview 📚: https://jupysql--168.org.readthedocs.build/en/168/

edublancas · 2023-03-01T23:34:21Z

is this ready for review?

yafimvo · 2023-03-02T05:01:46Z

Yes

doc/_toc.yml

doc/user-guide/explore-tables.md

src/sql/inspect.py

src/sql/magic_cmd.py

edublancas · 2023-03-02T05:31:19Z

I also noticed that the numbers in the profiling table are too long. let's implement a custom format display that shortens it by displaying them in scientific notation. @neelasha23 implemented something like that for sklearn-evaluation's interactive confusion matrix, maybe we can re-use the code?

yafimvo · 2023-03-05T16:23:40Z

> I also noticed that the numbers in the profiling table are too long. let's implement a custom format display that shortens it by displaying them in scientific notation. @neelasha23 implemented something like that for sklearn-evaluation's interactive confusion matrix, maybe we can re-use the code?

I found this code. It consists of two parts: the first (convert_to_scientific) takes data in a specific (key-value) format and checks whether each value is a number; the second (_is_long_number) checks the number's length and formats it using NumPy.

We can use _is_long_number as is but convert_to_scientific is a bit different. In the meantime, I added both of them to util in jupysql (with some modifications to convert_to_scientific).

@edublancas
Do you think we should move _is_long_number to ploomber_core?

setup.py

src/sql/inspect.py

src/sql/magic_cmd.py

edublancas · 2023-03-07T03:43:40Z

> I found this code. It consists of 2 parts, one (convert_to_scientific) takes data in a specific format (key-value) and checks if it's a number. The second part (_is_long_number) checks its length and formats it using np.
>
> We can use _is_long_number as is but convert_to_scientific is a bit different. In the meantime, I added both of them to util in jupysql (with some modifications to convert_to_scientific).
>
> @edublancas Do you think we should move _is_long_number to ploomber_core?

the dependency on numpy is a problem here. it sounds like too much to add numpy just to use such a function. looks like we can do it without numpy as well: https://stackoverflow.com/a/69569277/709975

let's keep it here for now, we'll move it to core if we need it elsewhere
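A minimal sketch of the numpy-free approach suggested above, relying only on Python's built-in format specification. The names `is_long_number` and `shorten_number`, and the `max_chars` and `precision` defaults, are hypothetical; this is not the code that was added to jupysql's util module.

```python
def is_long_number(value, max_chars=10):
    """Return True when value is an int/float whose plain text is longer than max_chars."""
    # bool is a subclass of int, so exclude it explicitly
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return False
    return len(str(value)) > max_chars


def shorten_number(value, max_chars=10, precision=3):
    """Format long numbers in scientific notation; return anything else unchanged."""
    if is_long_number(value, max_chars):
        return f"{value:.{precision}e}"
    return value


if __name__ == "__main__":
    for cell in (42, 123456789012, 1234567.891234, "some text"):
        print(repr(cell), "->", repr(shorten_number(cell)))
```

Short numbers and non-numeric cells pass through untouched, so a helper like this could be applied to every cell of the profile table before rendering.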

edublancas

how should I interpret an empty table cell vs a cell with nan?

in the tutorial, I see this:

src/sql/inspect.py

edublancas · 2023-03-08T14:01:46Z

src/sql/run.py

@@ -106,13 +106,18 @@ def __init__(self, sqlaproxy, config):
        self.keys = {}
        if sqlaproxy.returns_rows:
            self.keys = sqlaproxy.keys()
-            if config.autolimit:
+            if isinstance(config.autolimit, int) and config.autolimit > 0:


why do we need this? this would break existing compatibility (setting None and setting autolimit to 0 should display all values) - it's a bit counterintuitive but we inherited this behavior from ipython-sql

yafimvo · Mar 8, 2023

Since I removed the hardcoded configuration from SqlCmdMagic, some default configuration values (autolimit and style) are missing. In this case, config.autolimit and config.style return <LazyConfigValue>.

I added the config.autolimit > 0 check because, according to one of the tests, if autolimit is 0 we should return everything.

edublancas · Mar 14, 2023

ah, I see what you're saying! we have a design problem here.

the configuration is attached to the %sql magic, but ideally we want the config to be accessible to all magics. I'm unsure if this is possible, so I think for now the best thing to do is to create another version of run that doesn't take the config argument (we won't be able to provide the autolimit feature when running the profiling, but that's fine)

I remember suggesting creating a raw_run function already, but I can't remember if it was in the ggplot PR or in a different one that @tonykploomber is working on.

We should also probably open an issue to research whether magics allow setting global variables.
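To make the autolimit discussion concrete, here is a small self-contained sketch. It assumes the unset config value is an ordinary (and therefore truthy) object; `LazyPlaceholder`, `limit_rows`, and `limit_rows_truthy` are hypothetical names for illustration and do not come from jupysql or traitlets.

```python
class LazyPlaceholder:
    """Hypothetical stand-in for an unset config value (plain objects are truthy)."""


def limit_rows(rows, autolimit):
    """Cap rows only when autolimit is a positive int.

    None, 0, or a placeholder object all mean "show everything".
    """
    rows = list(rows)
    if isinstance(autolimit, int) and autolimit > 0:
        return rows[:autolimit]
    return rows


def limit_rows_truthy(rows, autolimit):
    """Plain truthiness check: fine for None and 0, breaks on a placeholder."""
    rows = list(rows)
    if autolimit:
        # In this sketch, slicing raises TypeError when autolimit is not an int
        return rows[:autolimit]
    return rows
```

Both versions treat None and 0 as "no limit", which matches the behavior inherited from ipython-sql; they only differ when the value is neither an int nor falsy.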

edublancas · 2023-03-08T14:02:08Z

src/sql/run.py

+            _style = None
+            if isinstance(config.style, str):
+                _style = prettytable.__dict__[config.style.upper()]
+
            self.pretty = PrettyTable(
-                self.field_names, style=prettytable.__dict__[config.style.upper()]
+                self.field_names, style=_style


why was this changed?

looks like this is fixing a bug (since the _style variable wasn't used?)

edublancas · 2023-03-08T14:04:40Z

src/tests/test_magic_cmd.py

@@ -70,3 +72,117 @@ def test_columns_with_schema(ip, tmp_empty):
    ).result._repr_html_()

    assert "some_number" in out
+
+
+def test_table_profile(ip, tmp_empty):


can you add some tests to the integration testing file? we should check if this works with other databases, we know it doesn't work with sqlite so we can ignore it.

but we should check for the other ones, we can mark tests as xfail for the ones that don't pass the tests and we'll fix them later

yafimvo · Mar 13, 2023

Added a test for each database, each with the relevant profile fields (DuckDB and PostgreSQL should work with all fields)

yafimvo · 2023-03-08T18:55:20Z

> how should I interpret an empty table cell vs a cell with nan?
>
> in the tutorial, I see this:

This happened because the profiler tried to run stdev on non-numeric values (DateTime); the resulting exception was vague (sqlalchemy.exc.ProgrammingError), so those cells were skipped.

Changed it.
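What the change involved is not shown in this thread. As an illustration only, one way to avoid computing a standard deviation on non-numeric columns is to check the values first and leave the numeric-only statistics empty. The function `profile_column` and its output keys are hypothetical, and this pure-Python version stands in for the SQL queries the real profiler runs.

```python
import statistics
from numbers import Real


def profile_column(values):
    """Summarize one column; mean/std stay empty when any value is non-numeric."""
    present = [v for v in values if v is not None]
    summary = {
        "count": len(present),
        "unique": len(set(present)),
        "mean": "",
        "std": "",
    }
    # Only real numbers (excluding bool) qualify for mean and std
    numeric = bool(present) and all(
        isinstance(v, Real) and not isinstance(v, bool) for v in present
    )
    if numeric:
        summary["mean"] = statistics.mean(present)
        # stdev needs at least two data points
        summary["std"] = statistics.stdev(present) if len(present) > 1 else ""
    return summary
```

With this shape, an empty cell consistently means "statistic not applicable to this column type" rather than "query failed".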

idomic · 2023-03-13T12:34:19Z

What else is missing here?
Please resolve the discussions/questions you have already answered.

edublancas

added some comments

idomic · 2023-03-18T12:46:55Z

@yafimvo please resolve conflicts and pending issues so we can merge.

I think the only one left is this run function alias without the conf arg.

yafimvo added 8 commits February 27, 2023 15:24

table profile added

b9e4027

lint

0fa3532

test fixed

eca6957

lint

a400a03

autopolars property added to config

7041081

Merge branch 'master' of https://github.com/yafimvo/jupysql into 66_p…

8ab801f

…rofile

Merge branch 'master' of https://github.com/yafimvo/jupysql into 66_p…

b1ea6e4

…rofile

save report added

9ccd1cc

sync-by-unito bot closed this Feb 27, 2023

edublancas reopened this Feb 27, 2023

edublancas requested changes Mar 2, 2023

yafimvo added 3 commits March 5, 2023 10:43

Merge branch 'master' of https://github.com/yafimvo/jupysql into 66_p…

17f4d70

…rofile

percentile_disc added, schema added, docs updated

9a0dc82

numpy added to setup

56e3d2e

Merge branch 'master' of https://github.com/yafimvo/jupysql into 66_p…

896973a

…rofile

edublancas requested changes Mar 7, 2023

yafimvo added 4 commits March 7, 2023 19:29

np removed, run_raw added, queries updated, test fixed

431d2fb

Merge branch 'master' of https://github.com/yafimvo/jupysql into 66_p…

ab56ba6

…rofile

test fixed

fafa533

config.autolimit check fixed

122f106

yafimvo requested a review from edublancas March 8, 2023 10:53

edublancas requested changes Mar 8, 2023

integration tests added

83b9dd3

integration tests fixed

4d0f84d

yafimvo added 6 commits March 8, 2023 21:17

lint

105aa3d

index removed from integration tests

8e4aac3

postgres, mysql and maria excluded from profile test

829352d

lint

823cc61

postgresql fixed

29492a1

postgresql nan values fixed

abeb44a

Merge branch 'master' into 66_profile

a8517d2

yafimvo requested a review from edublancas March 13, 2023 20:21

edublancas requested changes Mar 14, 2023

yafimvo added 3 commits March 14, 2023 15:37

rebase

606b9bb

naming changed

ea81d9e

Merge branch '66_profile' of https://github.com/yafimvo/jupysql into …

a4c5618

…66_profile

idomic requested a review from tonykploomber March 16, 2023 12:08

yafimvo and others added 4 commits March 16, 2023 15:59

Merge branch 'master' into 66_profile

f88a053

rebase

46b6455

sqlalchemy downgraded to 1

1f5bea0

Merge branch '66_profile' of https://github.com/yafimvo/jupysql into …

6f1aaef

…66_profile

yafimvo added 3 commits March 19, 2023 12:47

config removed from raw_run

a0398f1

Merge branch 'master' of https://github.com/yafimvo/jupysql into 66_p…

2a4af61

…rofile

Merge branch 'master' of https://github.com/yafimvo/jupysql into 66_p…

b600218

…rofile

yafimvo requested a review from edublancas March 20, 2023 16:32

edublancas approved these changes Mar 20, 2023

edublancas merged commit 55ed866 into ploomber:master Mar 20, 2023
