On-par autorange for scattergl #2404

Merged
merged 9 commits into from
Feb 28, 2018

etpinard
Contributor

fixes #2354 - to be merged into #2388

This gets scattergl autorange to parity with scatter by reusing its calc-step logic straight-up for data arrays of length less than 1e5. I made a few more improvements to Scattergl.calc that will be highlighted in 0aa0f5e.

I also made a linting commit, 5316c47, mostly renaming container -> gd, which makes scattergl/index.js look a little more like the rest of our codebase. I hope @dfcreative won't mind.

... replot is sufficient since regl push.
... to make things look a little more like
    the rest of plotly.js
- reuse scatter axis-expansion logic
- improve 'fast' axis expand routine (using average marker.size
  as pad value)
- use ax.makeCalcdata for all axis types (this creates a new array
  for linear axes, but makes things more robust)
- add a few TODOs
- most notable change is in gl2d_axes_label2 that
  now shows the correct to-zero autorange.
@etpinard etpinard added bug something broken feature something new status: reviewable labels Feb 26, 2018
// regl-scatter2d uses NaNs for bad/missing values
//
// TODO should this be a Float32Array ??
var positions = new Array(count2);
Contributor Author

@dfcreative could we set up positions with a Float32Array here?

Contributor

Float32 does not have enough precision for some datetime ranges or for high-precision linear (and other) ranges.
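As a standalone illustration of the precision concern (plain JavaScript, not code from this PR): a Float32 carries a 24-bit significand, so millisecond timestamps around 2018 are spaced roughly 131 seconds apart once stored in a Float32Array. The variable names below are made up for the example.

```javascript
// Two timestamps one second apart, in milliseconds since the Unix epoch.
var t0 = Date.UTC(2018, 1, 28, 0, 0, 0);
var t1 = t0 + 1000;

var asFloat64 = new Float64Array([t0, t1]);
var asFloat32 = new Float32Array([t0, t1]);

// Float64 keeps the two values distinct ...
var distinctIn64 = asFloat64[0] !== asFloat64[1];
// ... while Float32 rounds both to the same representable number.
var collapsedIn32 = asFloat32[0] === asFloat32[1];

console.log(distinctIn64, collapsedIn32);
```

This is why a plain Array (or a Float64Array) is the safer container for date positions, at the cost of more memory per value.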


var x = xaxis.type === 'linear' ? trace.x : xaxis.makeCalcdata(trace, 'x');
var y = yaxis.type === 'linear' ? trace.y : yaxis.makeCalcdata(trace, 'y');
Contributor Author

@etpinard etpinard Feb 26, 2018

IMPORTANT: with typed array support in mind (-> #2388), I made all traces pass through makeCalcdata, unlike previously, where x/y arrays on linear axes bypassed it. Please note that makeCalcdata creates a new array (i.e. x isn't the same as trace.x) unless trace.x is a typed array. So memory-conscious users should switch to using typed arrays.

Note that at 1e6 points on my setup, one makeCalcdata call clocks in at roughly 25ms. So using typed arrays can save about 50ms (one call each for x and y).
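The copy-versus-reuse behavior described above can be sketched as follows. This is a hypothetical stand-in, not the plotly.js implementation of ax.makeCalcdata; the function name toCalcdataSketch and its cleaning rules are assumptions made for illustration.

```javascript
// Hypothetical stand-in for the behavior described in the comment above;
// this is NOT the plotly.js implementation of ax.makeCalcdata.
function toCalcdataSketch(arr) {
    // Typed arrays are already numeric, so they can be reused as-is.
    if(ArrayBuffer.isView(arr) && !(arr instanceof DataView)) {
        return arr;
    }
    // Plain arrays get a cleaned copy: anything non-numeric becomes NaN.
    var out = new Array(arr.length);
    for(var i = 0; i < arr.length; i++) {
        var v = arr[i];
        if(typeof v === 'string' && v.trim() !== '') v = Number(v);
        out[i] = (typeof v === 'number' && isFinite(v)) ? v : NaN;
    }
    return out;
}

var plain = [1, '2', null, 4];
var typed = new Float64Array([1, 2, 3, 4]);

var fromPlain = toCalcdataSketch(plain); // new array: one extra allocation
var fromTyped = toCalcdataSketch(typed); // same object: no copy

console.log(fromPlain !== plain, fromTyped === typed);
```

Under these assumptions, passing typed arrays skips both the allocation and the per-element cleaning loop, which is where the time saving would come from.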

Contributor Author

Even with the extra arrays, memory consumption appears fine. I wouldn't mind having @dfcreative double-check, though.

Collaborator

I guess there are situations we accept now that would have broken previously (but only on linear axes) - the "junk" characters stripped by cleanNumber - worth a test case?
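To make the "junk characters" point concrete, here is a rough sketch of what such cleaning can look like. The exact set of characters that cleanNumber strips is not given in this thread, so the regular expression and the name cleanNumberSketch below are assumptions, not the library's code.

```javascript
// Assumed junk set, for illustration only: currency and percent signs,
// commas, quotes, hashes and surrounding whitespace.
var JUNK_SKETCH = /^[\s$%,'"#]+|,|[\s$%,'"#]+$/g;

function cleanNumberSketch(v) {
    if(typeof v === 'string') v = v.replace(JUNK_SKETCH, '');
    if(v === '' || v === null || v === undefined) return NaN;
    var n = Number(v);
    return isFinite(n) ? n : NaN;
}

var cleaned = ['$1,234.5', ' 42% ', 'abc', 7].map(cleanNumberSketch);
console.log(cleaned);
```

A mock with values like these on a linear axis would exercise the case raised here: input that previously bypassed cleaning and now goes through it.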

Contributor Author

Yes. Good call!

Contributor Author

Added mock in c15722f.

On master, the new mock would have looked like:

image

@etpinard
Contributor Author

For reference, here are a few benchmarks I noted down while working on this PR:

At 1e6 points:

  • xy makeCalcdata: ~ 50ms (this can be 🔪 by using typed arrays)
  • generating the kdtree: ~ 200-300ms (which @dfcreative is hoping to 🔪 soon)
  • autorange (the fast/approximative routine): < 5ms
  • total first render time: 1200-1500ms (with set ranges or not)

At 1e5-1 points:

  • xy makeCalcdata: < 5ms
  • generating the kdtree: ~20ms
  • autorange (the slow but on-par with svg routine): < 100-120ms (can be 🔪 when setting axis ranges)
  • total first render time: 300ms

A few more notable benchmarks:

  • At 1e4 points, first render time for scattergl is ~150ms (vs ~800ms for svg scatter).
  • The slow but on-par with svg autorange routine would clock in at ~800ms at 1e6 points. So yeah, using a fast approximative version for big data arrays is still the right call.

I used Chrome 64 on a Lenovo t450s with Ubuntu 16.04.
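The fast routine referred to in these benchmarks can be pictured as a single pass over the data with one shared pad value, chosen when the array length reaches 1e5. The sketch below is only an approximation of that idea; the names TOO_MANY_POINTS_SKETCH and fastExtentSketch and the return shape are assumptions, not plotly.js internals.

```javascript
// Assumed threshold name; the PR description says the scatter logic is
// reused for arrays of length less than 1e5.
var TOO_MANY_POINTS_SKETCH = 1e5;

// One loop, one shared pixel pad for every point.
function fastExtentSketch(values, ppad) {
    var min = Infinity;
    var max = -Infinity;
    for(var i = 0; i < values.length; i++) {
        var v = values[i];
        if(v !== v) continue; // skip NaN (bad/missing values)
        if(v < min) min = v;
        if(v > max) max = v;
    }
    return {min: min, max: max, ppad: ppad};
}

var data = [3, NaN, -1, 7];
var useFast = data.length >= TOO_MANY_POINTS_SKETCH;
var extent = fastExtentSketch(data, 6);
console.log(useFast, extent);
```

The slower routine differs in that each point can contribute its own padding, which is more accurate but costs more per point.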

@etpinard etpinard added this to the v1.35.0 milestone Feb 26, 2018
x[i] = i;
if(xa.type === 'log') {
for(i = 0; i < count2; i += 2) {
positions[i] = xa.d2l(positions[i]);
Collaborator

🐎 we've already been through makeCalcdata, so we should be able to use c2l here

Contributor Author

@etpinard etpinard Feb 28, 2018

Thanks for pointing this out! Done in -> 98d2407
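A rough sketch of the distinction, assuming a log axis where c2l takes an already-clean number and d2l must first coerce raw input. The object logAxisSketch and the helper linearizeX are hypothetical; only the method names mirror the thread.

```javascript
// Hypothetical log-axis converters; only the names mirror the thread.
var logAxisSketch = {
    // calcdata -> linearized: input is already a clean number
    c2l: function(v) { return v > 0 ? Math.log(v) / Math.LN10 : NaN; },
    // data -> linearized: coerce raw input first, then convert
    d2l: function(v) { return this.c2l(Number(v)); }
};

// positions = [x0, y0, x1, y1, ...], so x values sit at even indices.
function linearizeX(positions, xa) {
    for(var i = 0; i < positions.length; i += 2) {
        positions[i] = xa.c2l(positions[i]);
    }
    return positions;
}

var linearized = linearizeX([1, 5, 10, 6, 1000, 7], logAxisSketch);
console.log(linearized);
```

Because the values have already been cleaned once, skipping the coercion step in the hot loop avoids redundant work on every point.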

calcAxisExpansion(gd, trace, xa, ya, x, y, ppad);
} else {
if(markerOptions) {
ppad = 2 * (markerOptions.sizeAvg || Math.max(markerOptions.size, 3));
Collaborator

How did you find sizeAvg to work in practice? A little hard to say I guess, since we don't have a lot of real data > 1e5 points to play with... You're right that we don't want to just use sizeMax, it's worth accepting a bit of clipping in order to generally have less wasted space, and much of the time the largest point won't be on any edge... just wondering how that balance plays out in practice, whether we would be better off with something like halfway between the average and max.

Anyway, perhaps we don't have a good way to answer that question right now. I'll just mention that in case we do want to try and do better later, we could find some heuristics that only add a little bit of computation, like binning data points into top/middle/bottom thirds, and only using the top third for the top padding... maybe even with a smooth weighting of the size based on how far it is from the edge. That would still be far faster than the full calculation but could do a better job reducing wasted space without too much clipping.
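One way the "thirds" idea might look in code, without the smooth weighting. This is a hypothetical sketch of the suggestion only; nothing like it is part of this PR, and the name edgePadsSketch and its return shape are made up.

```javascript
// Hypothetical sketch of the "thirds" heuristic; not part of this PR.
// values: data coordinates along one axis; sizes: marker sizes in px.
function edgePadsSketch(values, sizes) {
    var min = Infinity;
    var max = -Infinity;
    var i;
    for(i = 0; i < values.length; i++) {
        if(values[i] < min) min = values[i];
        if(values[i] > max) max = values[i];
    }
    var lowCut = min + (max - min) / 3;
    var highCut = max - (max - min) / 3;
    var padLow = 0;
    var padHigh = 0;
    for(i = 0; i < values.length; i++) {
        // only points in an outer third can influence that edge's pad
        if(values[i] <= lowCut) padLow = Math.max(padLow, sizes[i]);
        if(values[i] >= highCut) padHigh = Math.max(padHigh, sizes[i]);
    }
    return {min: min, max: max, padLow: padLow, padHigh: padHigh};
}

var pads = edgePadsSketch([0, 1, 5, 9, 10], [4, 20, 50, 6, 8]);
console.log(pads);
```

In this example the largest marker sits in the middle third, so it affects neither edge. The cost stays at two linear passes, far below the full per-point calculation.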

Contributor Author

-> #2417

})
.then(function() {
expect(gd._fullLayout.xaxis.range).toBeCloseToArray(glRangeX, 'x range');
expect(gd._fullLayout.yaxis.range).toBeCloseToArray(glRangeY, 'y range');
Collaborator

Beautiful test!


// prepare colors
if(multiMarker || Array.isArray(markerOpts.color) || Array.isArray(markerOpts.line.color) || Array.isArray(markerOpts.line) || Array.isArray(markerOpts.opacity)) {
if(multiSymbol || multiColor || multiLineColor || multiOpacity) {
Contributor

<3

{name: 'annotations', fig: require('@mocks/gl2d_annotations.json')}
];

specs.forEach(function(s) {
Contributor

Lovely!

@etpinard etpinard mentioned this pull request Feb 28, 2018
21 tasks
@etpinard
Contributor Author

@dfcreative @alexcjohnson all comments have been addressed.

@dy
Contributor

dy commented Feb 28, 2018

LGTM! 💃

@alexcjohnson
Collaborator

💃 from me too! 🎉

@etpinard etpinard merged commit a2fb88b into typed-arrays-support Feb 28, 2018
@etpinard etpinard deleted the scattergl-autorange branch February 28, 2018 18:46