Description
Idea
The adjacency matrix aggregation currently has a default limit of 100 filters (index.max_adjacency_matrix_filters). This is a sensible value considering the quadratic memory complexity of the operation, but for some use cases it is quite restrictive. It is of course possible to raise the setting, but the resulting memory pressure quickly becomes infeasible.
I talked with Mark Harwood and we came to the conclusion that this limit could probably be increased if sparse data structures were used for storing intermediate results in cases of large matrices (> 100 filters), because the memory usage wouldn't grow as fast as it would with dense data structures. This would probably result in a space/time trade-off, but allow for new use cases.
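To make the trade-off concrete, here is a minimal sketch (class and method names are illustrative, not the actual Elasticsearch implementation) contrasting a dense pair-count layout, whose memory grows quadratically with the filter count, with a sparse one that only pays for pairs that actually co-occur:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: illustrates dense vs. sparse storage of intermediate
// pair counts; not the real aggregation code.
class PairCounts {

    // Dense layout: one counter for every filter pair (upper triangle
    // including the diagonal), i.e. n * (n + 1) / 2 longs. Memory is
    // allocated up front even if most pairs never co-occur.
    static long[] dense(int numFilters) {
        return new long[numFilters * (numFilters + 1) / 2];
    }

    // Sparse layout: only observed pairs consume memory, at the cost
    // of a hash lookup per increment -- the space/time trade-off.
    static final Map<Long, Long> sparse = new HashMap<>();

    // Encode an ordered pair (i <= j) as a single map key.
    static long key(int i, int j) {
        return ((long) i << 32) | (j & 0xffffffffL);
    }

    static void increment(int i, int j) {
        sparse.merge(key(i, j), 1L, Long::sum);
    }
}
```

With 100 filters the dense layout already holds 5050 counters per bucket; with 1000 filters it would be 500500, whereas the sparse map stays proportional to the number of pairs actually seen.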
API proposal
One option is to not expose this in the API at all and instead apply a heuristic to choose the implementation (e.g. <= 100 filters uses dense data structures, > 100 filters uses sparse ones). The advantage of this approach is that the user doesn't have to make a decision, but it could result in unexpected performance cliffs.
An alternative to that is to add a separate parameter to the request object (e.g. useSparseMatrix) to leave the decision to the user. In that case there should be a separate setting, e.g. index.max_sparse_adjacency_matrix_filters.
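For illustration, the opt-in variant might look like this in the request body (the field name `use_sparse_matrix` is a hypothetical snake_case rendering of the proposed parameter, and the filters are placeholders):

```json
{
  "aggs": {
    "interactions": {
      "adjacency_matrix": {
        "filters": {
          "group_a": { "term": { "team": "a" } },
          "group_b": { "term": { "team": "b" } }
        },
        "use_sparse_matrix": true
      }
    }
  }
}
```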
A cleaner way would be to have a completely new aggregation, sparse_adjacency_matrix, to emphasize the distinction even more.
cc @markharwood