Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Asymptotic complexity of keyed functions. #25

Open
wrengr opened this issue Dec 9, 2021 · 0 comments
Open

Asymptotic complexity of keyed functions. #25

wrengr opened this issue Dec 9, 2021 · 0 comments
Labels

Comments

@wrengr
Copy link
Owner

wrengr commented Dec 9, 2021

All the functions which reconstruct keys (traverseWithKey, mapBy, contextualMapBy, foldrWithKey, toList, keys, etc) suffer from a quadratic slowdown due to how we reconstruct the keys. I can make some changes to reduce the non-asymptotic factors; but so far as I can tell, actually fixing the bug is irreconcilable under the current design.

We could easily eliminate the slowdown by instead constructing a reversed variant of lazy bytestrings (reversed, because we need snoc-lists of strict bytestrings, whereas the standard lazy bytestrings are cons-lists). However, if the caller then converts those to standard bytestrings, doing so will incur the quadratic cost again. (Going from reversed-lazy-bytestrings to lazy-bytestrings is only quadratic in the number of chunks, whereas going to strict-bytestrings is quadratic in the length.) Thus, while this approach would technically solve our problems, it only does so by pushing the problem off onto the user, which is unacceptable.

I do, at least, have some ideas for how we could solve this by storing extra metadata in the trie. But so far I have no idea how great the overhead of that would be. Ideally, if the metadata can be made cheap enough to compute on the fly, then we could just compute it when we need it. Each call to the key-reconstructing functions would require two passes over the trie, but that's far better than the current approach. However, if the user wants to call these functions often, then it makes more sense to keep the metadata around; but that introduces the burden of updating it whenever changes are made to the trie. And if the cost of those updates is too great, then that pushes us towards forking the datatype so that users can decide which cost model they want —which I'd really rather not do, if it can be avoided.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

1 participant