Add clustering to provide low tile detalization #86

chekhovana · 2024-11-11T09:10:40Z

A new optional parameter, use_clustering, is added. It defaults to false, in which case tiles with more than max_features_per_tile instances aren't rendered, as before. When use_clustering is true, such tiles are generated with exactly max_features_per_tile instances, obtained as follows:

- instances are clustered, with the number of clusters equal to max_features_per_tile. The MiniBatchKMeans algorithm is used to perform the clustering;
- a single instance is randomly picked from each cluster.
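
The two steps above can be sketched roughly as follows. This is a minimal pure-Python illustration and not the code from this PR, which uses Accord.MachineLearning in C#. The function names `minibatch_kmeans`, `nearest` and `thin_instances`, as well as the `batch_size`, `iterations` and `seed` parameters, are hypothetical. Unlike the behaviour described above, this simplified version can return fewer than max_features_per_tile instances when a cluster ends up empty.

```python
import random


def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda j: sum((a - b) ** 2 for a, b in zip(point, centroids[j])),
    )


def minibatch_kmeans(points, k, batch_size=64, iterations=50, seed=0):
    """Fit k centroids using mini-batch updates instead of full passes."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    counts = [0] * k  # per-centroid update counts, used as learning-rate decay
    for _ in range(iterations):
        batch = [rng.choice(points) for _ in range(batch_size)]
        # Assign the whole batch first, then move the centroids.
        assigned = [nearest(p, centroids) for p in batch]
        for p, j in zip(batch, assigned):
            counts[j] += 1
            eta = 1.0 / counts[j]
            centroids[j] = [(1 - eta) * c + eta * x for c, x in zip(centroids[j], p)]
    return centroids


def thin_instances(points, max_features_per_tile, seed=0):
    """Keep at most max_features_per_tile points, one random pick per cluster."""
    if len(points) <= max_features_per_tile:
        return list(points)
    centroids = minibatch_kmeans(points, max_features_per_tile, seed=seed)
    clusters = {}
    for p in points:
        clusters.setdefault(nearest(p, centroids), []).append(p)
    rng = random.Random(seed)
    return [rng.choice(members) for members in clusters.values()]


# Hypothetical input: a 20 x 20 grid of instance positions.
grid = [(float(i % 20), float(i // 20)) for i in range(400)]
thinned = thin_instances(grid, 10)
```

Each mini-batch iteration touches only `batch_size` points, which is why this family of algorithms scales better to large inputs than a k-means that scans every point per iteration.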

Related to #85

bertt · 2024-11-11T16:21:59Z

Looks very nice!

1] Testing without clustering: https://bertt.github.io/clustering/demo/nocluster/

2] Testing with clustering enabled: https://bertt.github.io/clustering/demo/clustered/

Some general remarks (I still have to look at the code):

- Can you add documentation to the readme (the new option, a description of how the function works, and some performance timings)?
- Can you add unit tests (one for the clustering function, one including data in create_testdata.sql)?
- About the Accord.MachineLearning dependency: I'm a bit worried about it, as the project seems to be abandoned. Are there alternatives?
- In the clustered demo tileset.json I've changed the refine parameter:

From: "refine": "ADD"

To

"refine": "REPLACE"

So in theory the clustered tiles should disappear when zooming in (and not add the clustered points to the instances)... Some more testing needed here.
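
As a rough sketch of the manual change described above, the snippet below switches every explicit `refine` value in a tileset from "ADD" to "REPLACE". The `set_refine` helper and the sample dictionary are hypothetical and deliberately incomplete (no bounding volumes, for instance). It assumes only the standard 3D Tiles layout, in which `refine` is a tile property and child tiles are listed under `children`.

```python
def set_refine(tile, mode):
    """Overwrite explicit refine values in tile and its descendants.

    Tiles without a refine key are left alone, since they inherit the
    value from their parent. Returns the number of tiles changed.
    """
    if mode not in ("ADD", "REPLACE"):
        raise ValueError("refine must be 'ADD' or 'REPLACE'")
    changed = 0
    if "refine" in tile:
        tile["refine"] = mode
        changed += 1
    for child in tile.get("children", []):
        changed += set_refine(child, mode)
    return changed


# Hypothetical, heavily trimmed tileset for illustration only.
tileset = {
    "asset": {"version": "1.1"},
    "root": {
        "refine": "ADD",
        "children": [
            {"refine": "ADD", "content": {"uri": "content/a.glb"}},
            {"content": {"uri": "content/b.glb"}},
        ],
    },
}

changed = set_refine(tileset["root"], "REPLACE")
```

With "REPLACE", a viewer swaps a parent tile's content for its children's content when refining, whereas "ADD" keeps the parent's content visible alongside the children.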

chekhovana · 2024-11-12T05:51:57Z

Hello @bertt
thanks for reviewing!

I'll try to address all your notes step by step, starting with the Accord.MachineLearning dependency.

I tried the following:

- ML.NET. With this framework I encountered the following issues:
  - it doesn't support the double data type, only float, which leads to precision loss. Sometimes an error is raised saying that the number of objects is less than the number of clusters, although that's not actually the case. I suspect the cause is rounding error;
  - it doesn't have MiniBatchKMeans, only ordinary KMeans, which is extremely slow when the number of objects is large (my real dataset contains 10^7 objects).
- PostGIS function ST_ClusterKMeans. It's also very slow on large datasets.
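
The precision concern can be illustrated with a small sketch. The coordinate values are hypothetical, and this only shows one way the suspected rounding effect could arise; it does not confirm what happens inside ML.NET. Two projected coordinates one centimetre apart are distinct as doubles but collapse to the same value once rounded to 32-bit floats, so a set of distinct objects could appear to contain fewer unique points.

```python
import struct


def to_float32(value):
    """Round a Python float (64-bit) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", value))[0]


# Hypothetical x coordinates in metres, 1 cm apart.
x_a = 543210.12
x_b = 543210.13

distinct_as_double = x_a != x_b
same_as_float32 = to_float32(x_a) == to_float32(x_b)

print(distinct_as_double, same_as_float32)  # prints "True True"
```

At this magnitude adjacent 32-bit floats are 0.0625 apart, so any two values closer than that can round to the same number.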

This is my first experience with C# (I'm mainly a Python programmer), and I'm not familiar with the machine learning framework ecosystem in C#. The only alternative I could find is an agglomerative clustering framework: https://github.com/pedrodbs/Aglomera. It isn't in a read-only state, but the last commit was 5 years ago, which is also a bit discouraging.

Should I try Aglomera instead of Accord.MachineLearning? Or maybe you could recommend other suitable frameworks, implementing either MiniBatchKmeans or Agglomerative clustering?

bertt · 2024-11-12T09:59:52Z

> Should I try Aglomera instead of Accord.MachineLearning?

No, for now we can keep Accord.MachineLearning. If a better solution pops up in the future, we can consider a switch.

chekhovana · 2024-11-13T09:16:22Z

Hello @bertt,

Could you please clarify how exactly my tests should be implemented? A link to examples, if any, would be of great help.

bertt · 2024-11-13T10:16:58Z

For unit testing the clustering function something like: https://github.com/bertt/Accord.MachineLearning.Demo/blob/main/src/Accord.MachineLearning.Demo/Accord.MachineLearning.Demo.TestProject/UnitTest1.cs#L7

For the integration test maybe you can add sample data to create_testdata.sql? I think the data of the small dataset with the blocks in issue #85 is sufficient

chekhovana · 2024-11-13T14:01:11Z

Thanks for the unit test, I've added it to the project. I also added an SQL script to create and populate a test table, and updated README.md.

I'm not sure whether I should change the refine strategy in the code depending on whether clustering is used. Or are some extra tests needed before doing anything here?

bertt · 2024-11-13T14:34:14Z

looks quite complete like this :-)

About the Refine method, I think I'll add another parameter so users can set the method (add/replace - default add).

add clustering

2fb36ec

chekhovana added 3 commits November 13, 2024 14:34

add unit test for clustering

f85af66

update unit tests

ad84a2d

update doc

cccb8f3
