Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New module @turf/clusters #787

Merged
merged 12 commits into from
Jun 18, 2017
Merged

New module @turf/clusters #787

merged 12 commits into from
Jun 18, 2017

Conversation

stebogit
Copy link
Collaborator

First draft of @turf/cluster, Ref: #33

var clusters = require('@turf/clusters')
var points = // featurecollection of points
var numberOfClusters = 4
var clustered = clusters(points, numberOfClusters)

// clustered is an object with the following attributes:
// - `points`: the input (featurecollection of) points with each point given a `cluster` number property
// - `centroids`: featurecollection of centroid points, each with its `cluster` number as property

screen shot 2017-06-13 at 1 22 42 am

Uses clusters for simplicity, but there might be a faster/better implementation.
Not sure if the output is ok, I tried to follow @tmcw's suggestion.

Open to suggestions and further testing.

@stebogit stebogit self-assigned this Jun 13, 2017
@rowanwins
Copy link
Member

ooo this looks nifty @stebogit

@DenisCarriere DenisCarriere added this to the 4.5.0 milestone Jun 15, 2017
@DenisCarriere
Copy link
Member

DenisCarriere commented Jun 16, 2017

  • Change the module name to cluster => so far all the TurfJS module names are singular.
  • 50% agree with the output type, at the moment I can't really think of a better way to do it 😕

update typescript definition
@DenisCarriere
Copy link
Member

DenisCarriere commented Jun 16, 2017

  • All properties are lost, when creating these clusters it would be useful to preserve the existing properties.

@stebogit stebogit changed the title New module @turf/clusters New module @turf/cluster Jun 16, 2017
@stebogit
Copy link
Collaborator Author

stebogit commented Jun 17, 2017

@DenisCarriere I replaced clusters dependency with skmeans which allows to compute incredibly faster (like thousands times!) and to conserve the input and all its properties.

@stebogit
Copy link
Collaborator Author

@DenisCarriere

so far all the TurfJS module names are singular.

The more I think about it the more I believe this should be called 'clusters', since it's returning (and really expected to return) multiple clusters. The majority of Turf modules are singular because they deal with or generate a singular "thing", but few of them do return multiple elements and they have plural names, like isobands, isolines, kinks, polygon-tangents, helpers(functions).

@DenisCarriere
Copy link
Member

👍 Ok that works, we can name it clusters.

@DenisCarriere
Copy link
Member

I replaced clusters dependency with skmeans which allows to compute incredibly faster (like thousands times!) and to conserve the input and all its properties.

That's great, yea it was really slow before... I'm glad it's a lot faster now

numberOfClusters = numberOfClusters || Math.round(Math.sqrt(count / 2));

// collect points coordinates
var data = featureReduce(points, function (prevValue, currentFeature) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using coordEach makes it about ~30-80% faster.

    // collect points coordinates
    var data = [];
    coordEach(points, function (coord) {
        data.push(coord);
    });

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 good to know! I still don't master these methods 😕

@DenisCarriere
Copy link
Member

DenisCarriere commented Jun 18, 2017

  • Number of points must be greater or equal to numberOfClusters (add as input validation)
/Users/mac/Github/turf/packages/turf-clusters/node_modules/skmeans/main.js:13
                        var d = (v1[i]||0) - (v2[i]||0);
                                                ^

TypeError: Cannot read property '0' of undefined
    at eudist (/Users/mac/Github/turf/packages/turf-clusters/node_modules/skmeans/main
.js:13:28)
    at skmeans (/Users/mac/Github/turf/packages/turf-clusters/node_modules/skmeans/mai
n.js:63:23)
    at module.exports (/Users/mac/Github/turf/packages/turf-clusters/index.js:49:26)
    at Test.t (/Users/mac/Github/turf/packages/turf-clusters/test.js:50:17)
    at Test.bound [as _cb] (/Users/mac/Github/turf/packages/turf-clusters/node_modules
/tape/lib/test.js:66:32)
    at Test.run (/Users/mac/Github/turf/packages/turf-clusters/node_modules/tape/lib/t
est.js:85:10)

@stebogit stebogit changed the title New module @turf/cluster New module @turf/clusters Jun 18, 2017
@stebogit
Copy link
Collaborator Author

@DenisCarriere I added the validation on numberOfClusters

@DenisCarriere
Copy link
Member

@stebogit 👍 Looks good!

@DenisCarriere DenisCarriere merged commit 722846c into master Jun 18, 2017
@DenisCarriere DenisCarriere deleted the clusters branch June 18, 2017 17:08
@stebogit stebogit mentioned this pull request Jun 19, 2017
@cyrilchapon
Copy link

cyrilchapon commented Jun 19, 2017

Awesome, I've been waiting this for sooooo long.

@DenisCarriere
Copy link
Member

We should be pushing a new minor release soon with all these new modules being merged in the past week.

@cyrilchapon
Copy link

cyrilchapon commented Jun 20, 2017

What about the complexity on adding a maxDistance factor, as described in #33 ?

EDIT: this involves the ability to "switch" clustering algorithm, keeping same API

@stebogit
Copy link
Collaborator Author

@cyrilchapon I'm not an expert in clustering, even less in clustering algorithms, however it seems to me an algo requiring a maxDistance would be substantially different that the k-means one.
Even reading through #33 I don't quite get what kind of feature you're suggesting; could you please give an example of use and application for that clustering method?

@cyrilchapon
Copy link

cyrilchapon commented Jun 21, 2017

@stebogit yep.. I'm not an expert too in custering algos, but it seems indeed that it would be a very different algorithm.

Speaking of API, I would be suggesting something like

var clusters = require('@turf/clusters')
var points = // featurecollection of points

//kmeans strategy
var options = {
  numberOfClusters: 4
}
var kmeansClustered = clusters(clusters.consts.algorithms.KMEANS, points, options)

//OR

//maxDistance strategy
var options = {
  maxDistance: 3,
  maxDistanceUnit: 'km'
}
var kmeansClustered = clusters(clusters.consts.algorithms.SOMENAME, points, options)

or maybe

var clusters = require('@turf/clusters')
var points = // featurecollection of points

//kmeans strategy
var numberOfClusters = 4
var kmeansClustered = clusters.kmeans(points, numberOfClusters)

//OR

//maxDistance strategy
var maxDistance = 3
var maxDistanceUnit = 'km'
var kmeansClustered = clusters.somename(points, maxDistance, maxDistanceUnit)

But I'm not able to implement such a thing

@DenisCarriere
Copy link
Member

Both strategies shouldn't be implemented in the same module, they should be separate modules with unique names.

A good to read blog post from @mourner https://www.mapbox.com/blog/supercluster/ about Clustering.

Supercluster isn't meant for TurfJS, however some of the source code is good to look at for inspiration on how to tackle the distance strategy.

Another good library built by @mourner is kdbush which is used by Supercluster and this would most likely be needed in this new cluster library.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants