Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for custom metrics on each cluster #12

Closed
wants to merge 2 commits into from

Conversation

georgeke
Copy link

@georgeke georgeke commented Jun 8, 2016

Currently, the only metric we have for each cluster is the number of points in each cluster.

I ran into this "limitation" when wanting to display a number on each cluster which was an aggregation of a custom property on each of my points. For example, each point may have a population property, and I want each cluster to show the total population.

This PR allows points passed in to have an additional property specified by metricKey and a custom reducer function specified by metricReducer.

Now, one can compute metrics on clusters that are based on other dimensions (like population) and compute things like averages, min, max, etc.

@mourner
Copy link
Member

mourner commented Jun 9, 2016

Nice, thanks for the PR! I wonder whether we need to supply metric in addition to point_count, rather than just implementing point_count in terms of those options?

@georgeke
Copy link
Author

georgeke commented Jun 9, 2016

That could be possible if we define having metricKey=undefined mean we return metric: numPoints. I feel that it might be a stretch since metricKey doesn't have much to do with point_count.

There are a few advantages I see with keeping point_count

  1. In my mind, a point_count is a standard property that all clusters should have, whereas metric is more of a computed property that may not even be a number (could be a list, string, etc. depending on what you're doing). So keeping these separate could make more sense to the user.
  2. backwards compatibility. The same key is still there so people's code won't break.

I'm leaning towards keeping it separate, what are your thoughts?

georgeke added a commit to georgeke/caravel that referenced this pull request Jun 13, 2016
This needed a change in the supercluster npm module, a PR was placed here:
mapbox/supercluster#12
mistercrunch pushed a commit to apache/superset that referenced this pull request Jun 14, 2016
This needed a change in the supercluster npm module, a PR was placed here:
mapbox/supercluster#12
mistercrunch pushed a commit to apache/superset that referenced this pull request Jun 14, 2016
This needed a change in the supercluster npm module, a PR was placed here:
mapbox/supercluster#12
@georgeke
Copy link
Author

@mourner Any thoughts/updates on this? If it's not high priority maybe I can close the PR for now.

@mourner
Copy link
Member

mourner commented Jun 16, 2016

@georgeke it's not high priority but let's keep it open — I'll get to it.

@georgeke
Copy link
Author

@mourner thanks!

georgeke added a commit to georgeke/caravel that referenced this pull request Jun 17, 2016
This needed a change in the supercluster npm module, a PR was placed here:
mapbox/supercluster#12
georgeke added a commit to apache/superset that referenced this pull request Jun 17, 2016
This needed a change in the supercluster npm module, a PR was placed here:
mapbox/supercluster#12
georgeke added a commit to georgeke/caravel that referenced this pull request Jun 20, 2016
This needed a change in the supercluster npm module, a PR was placed here:
mapbox/supercluster#12
mistercrunch pushed a commit to apache/superset that referenced this pull request Jun 21, 2016
This needed a change in the supercluster npm module, a PR was placed here:
mapbox/supercluster#12
mistercrunch pushed a commit to apache/superset that referenced this pull request Jun 23, 2016
This needed a change in the supercluster npm module, a PR was placed here:
mapbox/supercluster#12
georgeke added a commit to georgeke/caravel that referenced this pull request Jun 24, 2016
use react-map-gl

superclustering of long/lat points

Added hook for map style, huge performance boost from bounding box fix, added count text on clusters

variable gradient size based on metric count

Ability to aggregate over any point property

This needed a change in the supercluster npm module, a PR was placed here:
mapbox/supercluster#12

Aggregator function option in explore, tweaked visual defaults

better radius size management

clustering radius, point metric/unit options

scale cluster labels that don't fit, non-numeric labels for points

Minor fixes, label field affects points, text changes

serve mapbox apikey for slice

global opacity, viewport saves (hacky), bug in point labels

fixing mapbox-gl dependency

mapbox_api_key in config

expose row_limit, fix minor bugs

Add renderWhileDragging flag, groupby. Only show numerical columns for point radius

Implicitly group by lng/lat columns and error when label doesn't match groupby

'Fix' radius in miles problem, still some jankiness

derived fields cannot be typed as of now -> reverting numerical number change

better grouping error checking, expose count(*) for labelling

Custom colour for clusters/points + smart text colouring

Fixed bad positioning and overflow in explore view + small bugs + added thumbnail
georgeke added a commit to apache/superset that referenced this pull request Jun 24, 2016
* simple mapbox viz

use react-map-gl

superclustering of long/lat points

Added hook for map style, huge performance boost from bounding box fix, added count text on clusters

variable gradient size based on metric count

Ability to aggregate over any point property

This needed a change in the supercluster npm module, a PR was placed here:
mapbox/supercluster#12

Aggregator function option in explore, tweaked visual defaults

better radius size management

clustering radius, point metric/unit options

scale cluster labels that don't fit, non-numeric labels for points

Minor fixes, label field affects points, text changes

serve mapbox apikey for slice

global opacity, viewport saves (hacky), bug in point labels

fixing mapbox-gl dependency

mapbox_api_key in config

expose row_limit, fix minor bugs

Add renderWhileDragging flag, groupby. Only show numerical columns for point radius

Implicitly group by lng/lat columns and error when label doesn't match groupby

'Fix' radius in miles problem, still some jankiness

derived fields cannot be typed as of now -> reverting numerical number change

better grouping error checking, expose count(*) for labelling

Custom colour for clusters/points + smart text colouring

Fixed bad positioning and overflow in explore view + small bugs + added thumbnail

* landscaping & eslint & use izip

* landscapin'

* address js code review
@ben657
Copy link

ben657 commented Aug 8, 2016

As a stepping stone to something like this, would it make sense to be able to get the points in a cluster, given its bounding box? I get this is doable just using the points and some other method to work out what's in the bounding box, but since they're already indexed here it might make sense to add.

This way we could get the points to go along with a cluster, then just iterate over them to calculate whatever needs calculating and add that to the clusters properties.

@redbmk
Copy link
Collaborator

redbmk commented Sep 14, 2016

@ben657 In theory if you've already aggregated a cluster, you shouldn't need to go through each of those points again. Say you have a cluster with 1000 points, and you already have the sum of some property for all those points, then you should only need to add the new point to the cluster instead of having to go through all of them again.

@aparshin
Copy link

aparshin commented Nov 9, 2016

Useful feature for general purpose clustering library!

@mourner It's a pity this PR is still not merged. Can I help somehow?

@redbmk
Copy link
Collaborator

redbmk commented Nov 9, 2016

@aparshin There's also some discussion about this in mapbox/DEPRECATED-mapbox-gl#26. I have some similar code that will help integrate into mapbox and be a little more flexible on the supercluster side of things, but could use some help writing tests and benchmarks, possibly making it faster, and I think the spec needs some work.

@johnlaur
Copy link

johnlaur commented Nov 9, 2016

I have been following the many forks implementing this feature or something similar for the past month or so, and I really like the implementation from @redbmk because it allows for at arbitrary algorithms in the clustering engine. I understand the restrictions preventing such things in the style definition, but I would prefer not to completely close the door to somehow exposing this to in the end user API or through some kind of plugin scheme in the future. In either case it's extremely useful to have for preprocessing clusters on the server-side.

@mourner mourner mentioned this pull request Jan 5, 2017
@lamuertepeluda
Copy link

Hi, I solved the conflicts of https://github.com/redbmk/supercluster with the current branch

See https://github.com/redbmk/supercluster/pull/2

I also enhanced the demo showing categories percentages on the tooltips on the map clusters.

If you could integrate I would be very happy (I'd need this feature so badly also in mapbox-gl)

@mourner
Copy link
Member

mourner commented Jan 27, 2017

@lamuertepeluda yes, I work on this currently — thanks for the help!

@lamuertepeluda
Copy link

@mourner I also updated the demo of the supercluster in order to show how pass a custom aggregator function defined into the main thread to the background worker via Blob.

See https://github.com/redbmk/supercluster/pull/3 and mapbox/mapbox-gl-js#2412 (comment)

@mourner
Copy link
Member

mourner commented Jan 30, 2017

Closing in favor of #36.

@mourner mourner closed this Jan 30, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants