adding bar and pie plot to %sqlplot #417

Closed
edublancas opened this issue Apr 19, 2023 · 13 comments · Fixed by #508

@edublancas

edublancas commented Apr 19, 2023

We should add %sqlplot bar to create bar plots.

The SQL should be pretty simple:

select col, count(*)
from table
group by col

then, we take the col, count pairs and use them to create the plot.
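
A minimal sketch of that flow, assuming an in-memory SQLite database from Python's standard library as a stand-in for whatever connection JupySQL holds. The table `penguins`, the column `species`, and the helper `count_by_column` are hypothetical names used for illustration, not part of JupySQL's API.

```python
import sqlite3


def count_by_column(conn, table, column):
    """Return (value, count) pairs for `column`, one per distinct value."""
    # Identifiers cannot be bound as query parameters, so they are quoted
    # inline. This sketch assumes trusted table and column names.
    query = (
        f'SELECT "{column}", COUNT(*) '
        f'FROM "{table}" '
        f'GROUP BY "{column}" '
        f'ORDER BY "{column}"'
    )
    return conn.execute(query).fetchall()


# Hypothetical sample data, only here to make the sketch runnable.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE penguins (species TEXT)")
conn.executemany(
    "INSERT INTO penguins VALUES (?)",
    [("Adelie",), ("Gentoo",), ("Adelie",), ("Chinstrap",), ("Adelie",)],
)

pairs = count_by_column(conn, "penguins", "species")
print(pairs)  # [('Adelie', 3), ('Chinstrap', 1), ('Gentoo', 1)]
```

The resulting pairs are all a bar plot needs: the first element of each pair labels a bar and the second gives its height.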

cc @jorisroovers - tagging you so you're in the loop!

@idomic

idomic commented Apr 19, 2023

What about pie charts? 👍

@edublancas
Author

I've never been a fan of pie charts 😂, humans aren't good at judging areas. I like normalized bar charts better - they can help to visualize the same data.

But I think @jorisroovers mentioned they use them. What do you think?

@jorisroovers

Yes we use pie plots extensively. We even use the fancier donut/sunburst plots!

I think different audiences like different types of charts.

In a BI setting (target audience: managers) the visual aspect is key in my experience, i.e. making it simple yet pretty matters (you're telling a story or showing a quick snapshot of current state).

In a Data Science context, I think density of information is much more important, and there might even be some aversion to "management charts".

Long story short, my 2ct would be that JupySQL should support a wide range of chart types and let the user decide.

Plotly and bokeh might provide good inspiration here:

https://docs.bokeh.org/en/0.8.2/docs/user_guide/charts.html

https://plotly.com/python/

Hope this helps!

@edublancas
Author

edublancas commented May 23, 2023

ok so let's add %sqlplot bar and %sqlplot pie

@mehtamohit013: please write acceptance criteria.

@mehtamohit013

Acceptance criteria:

  • Add Bar plot and pie chart to ggplot API + Test cases
  • Modify the documentation and user guide to include both
  • Changelog

@edublancas
Author

Add Bar plot and pie chart to ggplot API + Test cases

First ensure that the tests pass with duckdb, once that's done, we can evaluate which other DBs we'll support.

Modify the documentation and user guide to include both

add the examples to the existing section in the docs

btw, the commands should be in %sqlplot (not in %sqlcmd plot), I'll fix the references in my previous comments to avoid confusion.

@edublancas edublancas changed the title adding bar plot to %sqlcmd plot adding bar plot to %sqlplot May 23, 2023
@edublancas edublancas changed the title adding bar plot to %sqlplot adding bar and pie plot to %sqlplot May 23, 2023
@mehtamohit013

mehtamohit013 commented May 23, 2023

@edublancas
So, if I understood correctly:
There are two APIs for plotting, ggplot and %sqlplot, and both internally rely on src/sql/plot.py to plot histograms and boxplots. So the scope of the issue would be to add bar and pie classes to src/sql/plot.py and integrate them with %sqlplot, using matplotlib as the backend. Also, I have to check if there are any special functions required to calculate these when executing SQL queries and check which DBs support it.

@edublancas
Author

right. integrate it with %sqlplot and add your code to plot.py, later we can define if we also offer this as part of the ggplot API

Also, I have to check if there are any special functions required to calculate these when executing SQL queries and check which DBs support it.

Yes, I'd say first get DuckDB working and once that's ready we can add a test to our integration tests to find out which ones are passing and which ones are failing, then we define what we do

@mehtamohit013

mehtamohit013 commented May 23, 2023

The SQL should be pretty simple:

select col, count(*)
from table
group by col

then, we take the col, count pairs and use them to create the plot.

I was thinking more generic, letting the user pass the two columns x and height as done in matplotlib API and we simply plot it.

Also, is there a need for a stacked bar graph (I don't know if that's a thing or not)?
And what about multiple-column bar graphs, similar to the histogram?
@edublancas

@edublancas
Author

I was thinking more generic, letting the user pass the two columns x and height as done in matplotlib API and we simply plot it.

we want JupySQL to be a higher-level API than matplotlib. If you think from the perspective of a data analyst using JupySQL, they want to explore the data quickly to understand it, and computing the height is an extra step that we want to avoid. Users should only point us to a column and we should group and plot the bar/pie chart
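
To illustrate the difference, here is a rough sketch of a column-only helper layered on top of matplotlib's `Axes.bar(x, height)`. It assumes an SQLite connection from the standard library as a stand-in for the real backend. The names `bar_inputs`, `plot_bar`, and the `orders` table are hypothetical and do not reflect how JupySQL implements `%sqlplot bar`.

```python
import sqlite3


def bar_inputs(conn, table, column):
    """Group by `column` and return the (x, height) lists that Axes.bar expects."""
    # Assumes trusted identifiers; they are quoted inline, not parameterized.
    rows = conn.execute(
        f'SELECT "{column}", COUNT(*) FROM "{table}" '
        f'GROUP BY "{column}" ORDER BY "{column}"'
    ).fetchall()
    x = [str(value) for value, _ in rows]
    height = [count for _, count in rows]
    return x, height


def plot_bar(conn, table, column, ax=None):
    """Draw a bar chart of value counts; the caller only names a column."""
    # Imported lazily so the counting logic works without matplotlib installed.
    import matplotlib.pyplot as plt

    if ax is None:
        _, ax = plt.subplots()
    x, height = bar_inputs(conn, table, column)
    ax.bar(x, height)
    ax.set_xlabel(column)
    ax.set_ylabel("Count")
    return ax


# Hypothetical sample data, only here to make the sketch runnable.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (status TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?)",
    [("shipped",), ("pending",), ("shipped",), ("returned",)],
)

x, height = bar_inputs(conn, "orders", "status")
print(x, height)  # ['pending', 'returned', 'shipped'] [1, 1, 2]
```

With matplotlib installed, `plot_bar(conn, "orders", "status")` would draw the chart. The user never computes a height; the grouping happens inside the helper.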

@edublancas
Author

let's leave out stacked or multiple-column charts for now

@mehtamohit013

we want JupySQL to be a higher-level API than matplotlib. If you think from the perspective of a data analyst using JupySQL, they want to explore the data quickly to understand it, and computing the height is an extra step that we want to avoid. Users should only point us to a column and we should group and plot the bar/pie chart

@edublancas
For the pie chart, should we do two columns one for label and another for size? or the same as the bar chart?

@edublancas
Author

For the pie chart, should we do two columns one for label and another for size? or the same as the bar chart?

same as bar chart. the command should only take a column as an argument, and we should compute the percentages, then pass this to the plotting function
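
The percentage step could look roughly like the sketch below. It starts from (label, count) pairs such as a `GROUP BY` query would return; the helper name `to_percentages` and the sample data are hypothetical.

```python
def to_percentages(pairs):
    """Convert (label, count) pairs into (label, percentage) pairs."""
    total = sum(count for _, count in pairs)
    if total == 0:
        # Nothing to plot: avoid dividing by zero on an empty result.
        return []
    return [(label, 100.0 * count / total) for label, count in pairs]


# Hypothetical counts, as produced by grouping on a single column.
pairs = [("Adelie", 3), ("Chinstrap", 1), ("Gentoo", 1)]
shares = to_percentages(pairs)
print(shares)  # [('Adelie', 60.0), ('Chinstrap', 20.0), ('Gentoo', 20.0)]
```

The labels and percentages can then be handed to a plotting call such as matplotlib's `Axes.pie(sizes, labels=labels)`. Note that `Axes.pie` also normalizes its input by default, so precomputing the percentages mainly keeps the values available for labels or tests.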
