Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add clustering to provide low tile detalization #86

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

chekhovana
Copy link

New optional paramater is added, use_clustering. By default it's false, and tiles with number of instances greater than max_features_per_tile aren't rendered - as before. When use_clustering is true, tiles with number of instances greater than max_features_per_tile are generated with exactly max_features_per_tile instances, obtained in the following way:

  • instanced are clustered with number of clusters equal to max_features_per_tile. MiniBatchKMeans algorithm is used to perform clustering;
  • single instance is randomly picked from each cluster.

Related to #85

@bertt
Copy link
Member

bertt commented Nov 11, 2024

Looks very nice!

1] Testing without clustering: https://bertt.github.io/clustering/demo/nocluster/

nocluster

2] Testing with clustering enabled: https://bertt.github.io/clustering/demo/clustered/

cluster

Some general remarks (I still have to look at the code):

  • Can you add documentation to the readme (adding the options parameter and description of the workings of the function (+some performance timings)

  • Can you add unit tests (one for the clustering function, one including data in create_testdata.sql)

  • About the Accord.MachineLearning dependency: I'm a bit worried about this dependency as it seems the project is abandoned. Are there alternatives?

  • In the clustered demo tileset.json I've changed the refine parameter:

From: "refine": "ADD"

To

"refine": "REPLACE"

So in theory the clustered tiles should disappear when zooming in (and not add the clustered points to the instances)... Some more testing needed here.

@chekhovana
Copy link
Author

Hello @bertt
thanks for reviewing!

I'll try to take into account all your notes step by step. Starting with Accord.MachineLearning dependency:

I tried the following:

  1. ML.Net. With this framework I encountered the following issues:

    • it doesn't support double data type, only float. That leads to precision loss. Sometimes the error is raised that number of objects is less than number of clusters although actually that's not the case. I suggest that the reason lies in rounding error;
    • it doesn't have MiniBatchKmeans, only ordinal KMeans, which is extremely slow when number of objects is large (my real dataset contains 10^7 objects).
  2. Postgis function ST_ClusterKMeans. It's also very slow on large datasets.

This is my first expierience in c# (mainly I am the python programmer), and I am not familiar with machine learning frameworks ecosystem in c#. The only alternative I could find is agglomerative clustering framework: https://github.com/pedrodbs/Aglomera. It's not in read-only state, but last commit was 5 years ago, which is also a bit frustrating.

Should I try Aglomera instead of Accord.MachineLearning? Or maybe you could recommend other suitable frameworks, implementing either MiniBatchKmeans or Agglomerative clustering?

@bertt
Copy link
Member

bertt commented Nov 12, 2024

Should I try Aglomera instead of Accord.MachineLearning?

No for now we can keep Accord.MachineLearning, if in the future a better solution pops up we can consider a switch

@chekhovana
Copy link
Author

Hello @bertt,

Could you please clarify how exactly my tests should be implemented? Link to the examples, if any, would be of great help.

@bertt
Copy link
Member

bertt commented Nov 13, 2024

For unit testing the clustering function something like: https://github.com/bertt/Accord.MachineLearning.Demo/blob/main/src/Accord.MachineLearning.Demo/Accord.MachineLearning.Demo.TestProject/UnitTest1.cs#L7

For the integration test maybe you can add sample data to create_testdata.sql? I think the data of the small dataset with the blocks in issue #85 is sufficient

@chekhovana
Copy link
Author

Thanks for unit test, I appended it to the project. I also appended sql script to create and populate test table and updated README.md.

I don't know, should I change refine strategy depending on clustering usage in the code? Or some extra tests are needed here before doing something?

@bertt
Copy link
Member

bertt commented Nov 13, 2024

looks quite complete like this :-)

About the Refine method, I think I'll add another parameter so users can set the method (add/replace - default add).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants