Graphs slow with big data #462

Closed
set92 opened this issue Apr 14, 2021 · 7 comments
Labels
good first issue Good for newcomers Performance Performance-related changes UI Updates to the front-end

Comments

@set92

set92 commented Apr 14, 2021

I love plotly's interactivity for plotting and visualizing data, but problems show up with medium to big datasets. In my case I'm trying a dataset of shape (235395, 21). I was surprised, because in general it is fast, but I wanted to mention a couple of points.

  • When I check the correlation matrix it appears a little bit squeezed. Maybe add a button to see it in a new tab? Or resize it directly in that view?
  • The most annoying thing I saw was in the describe section:
    • I expected to be able to use the arrow keys to move up and down the columns instead of using the mouse all the time. This could be a separate issue called "Keyboard integration": use ↑↓ to move across the columns and ←→ to move across the "describe, histogram, categories..." sections.
    • Also, I noticed some slowness in the Q-Q plot. I can't really view the values of the chart because it freezes, so I thought of showing you https://github.com/serant/lenspy, which worked for me when plotting a big dataset. I'm not sure if it is really active, though, and I think it doesn't support all kinds of plots...
  • This one is mainly for the developers: I've seen other GitHub projects tag each issue, and that way we are able to quickly see the improvements or bugs.
@aschonfeld
Collaborator

First off, thank you so much for the feedback.

  • Hmm, I thought that the Correlations tab already opens in a new tab. I was planning on making it open in the sliding side panel, similar to "Missing Analysis", where it takes up 75% of the width of the screen. If you think it helps, I can make it open at 100%, and then you just either click the button to open it in a new tab or close it entirely.
  • I really like the idea of using the arrow keys to move back and forth between columns! I can definitely work on adding this.
  • As for the QQ plot, it creates so many points that chart.js has a hard time handling them quickly. I will definitely look into lenspy though, and if it renders images on the back-end, similar to missingno, I can integrate that.
  • Not sure I understand what you mean by this. Do you mean I should tag a release for every issue? That would probably be a little overkill from a release standpoint, but I can try tagging issues with a certain version number if you'd like, and then you'd know they were included in that release (versus the current format where everything is listed in this CHANGES file).

Thanks again for the feedback, these are great ideas!
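
To make the point count in the QQ plot concrete, here is a minimal sketch of how a normal Q-Q plot is typically built: one point per row, pairing each sorted value with a theoretical normal quantile. It uses only the Python standard library. The function names, the plotting positions `(i + 0.5) / n`, the mean/std reference line, and the sample data are assumptions for illustration, not D-Tale's actual implementation.

```python
from statistics import NormalDist, fmean, pstdev


def qq_points(values):
    """Return (theoretical, observed) quantiles for a normal Q-Q plot."""
    observed = sorted(values)
    n = len(observed)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # Plotting positions (i + 0.5) / n stay strictly inside (0, 1),
    # which inv_cdf requires.
    theoretical = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return theoretical, observed


def reference_line(theoretical, observed):
    """Hypothetical trendline: y = mean + std * x, evaluated at every x."""
    mu, sigma = fmean(observed), pstdev(observed)
    return [mu + sigma * x for x in theoretical]


rows = [float((i * 37) % 101) for i in range(1000)]  # made-up sample data
xs, ys = qq_points(rows)
line = reference_line(xs, ys)
total_points = len(ys) + len(line)  # scatter + trendline = 2 * number of rows
```

With a dataframe of 235395 rows, the same construction would produce 235395 scatter points for a single column, which is the volume the chart has to render.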

@set92
Author

set92 commented Apr 14, 2021

About the tags (it seems they are called labels): I meant being able to know what an issue is about before opening it. For example, at https://github.com/pandas-profiling/pandas-profiling/issues I can see at a glance and filter the performance issues, the bugs, the feature requests... I don't know, maybe it doesn't really help much.

@aschonfeld
Collaborator

aschonfeld commented Apr 14, 2021

That makes sense, I can work on (at the very least) adding tags to the open tickets.

@aschonfeld aschonfeld added good first issue Good for newcomers Performance Performance-related changes UI Updates to the front-end labels Apr 15, 2021
@aschonfeld
Collaborator

@set92 So I've implemented the ←→ and ↑↓ key handlers on the describe (side panel & popup), and I ended up converting the QQ chart over to a plotly scattergl. It's definitely faster, but the time is now spent sending all the data across from the server to the browser, because it's trying to render a scatter with the number of rows in the dataframe times 2 (1 for the scatter and 1 for the trendline). Here's a demo:

Screen.Recording.2021-04-16.at.12.43.42.AM.mov

Let me know what you think. I was also going to put some little captions by the column name header & the chart toggle to let users know that they can use keys to navigate.
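
Since the remaining cost is the payload sent to the browser, one possible mitigation (an assumption on my part, not something this release does) is to thin the Q-Q points on the server before sending them. The sketch below keeps evenly spaced order statistics, always including the minimum and maximum, and computes each theoretical quantile from the full row count so the kept points stay where they would be in the full plot. `thinned_qq_points` and `max_points` are hypothetical names.

```python
from statistics import NormalDist


def thinned_qq_points(values, max_points=1000):
    """Return at most max_points (theoretical, observed) Q-Q pairs."""
    if max_points < 2:
        raise ValueError("max_points must be at least 2")
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        return [], []
    if n <= max_points:
        idx = list(range(n))
    else:
        # Evenly spaced ranks from 0 to n - 1, so both extremes are kept.
        step = (n - 1) / (max_points - 1)
        idx = sorted({round(k * step) for k in range(max_points)})
    std_normal = NormalDist()
    # Quantiles use the full n, not the thinned count, so each kept point
    # has the same coordinates it would have in the unthinned plot.
    theoretical = [std_normal.inv_cdf((i + 0.5) / n) for i in idx]
    observed = [ordered[i] for i in idx]
    return theoretical, observed


sample = list(range(235394, -1, -1))  # made-up data, 235395 rows
xs, ys = thinned_qq_points(sample, max_points=1000)
```

The resulting lists could then be handed to a `plotly.graph_objects.Scattergl` trace as `x` and `y`. A straight trendline only needs its two endpoints, so it would not have to be sent as one point per row either. The trade-off is that evenly spaced thinning drops detail in the tails, which is often the most interesting part of a Q-Q plot.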

@set92
Author

set92 commented Apr 16, 2021

It looks pretty cool. If you like it, or want to, it could be extended to the whole interface: for example, alt to select the top menu and the arrows to move around, etc., to the point where you can navigate with the keyboard alone. But maybe that is too much, because I'm not sure how many people would use this type of feature. Maybe it is better to wait until more people request it and focus on other features.

But really nice job, this library has improved a lot since I saw it for the first time, and now I know why 😄.

@set92 set92 closed this as completed Apr 16, 2021
@aschonfeld
Collaborator

@set92 I'll keep this open until I do the actual release. Just so people don't wonder why they don't see it yet 😉

I do really like the idea of having more keyboard navigation. I had added hotkeys a while back, but they were mainly shortcuts to open/close popups. I'll keep digging on this stuff and will let you know when a new version is out on PyPI.

P.S. feel free to toss your ⭐ on the repo 🙏

@aschonfeld aschonfeld reopened this Apr 16, 2021
aschonfeld added a commit that referenced this issue Apr 17, 2021
@aschonfeld
Collaborator

Updated in v1.43.0


2 participants