Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Try using De Bruijn indices and persistent array for the environment #844

Closed
yannham opened this issue Sep 22, 2022 · 2 comments
Closed

Comments

@yannham
Copy link
Member

yannham commented Sep 22, 2022

Currently, getting an identifier far away in the layers of the environment (which is a linked list of hashmaps) can be costly. Getting the n-th one implies n hashmap lookup failures and to advance n layers through the linked list to finally retrieve the value.

Nix, on the other hand, uses a variant of De Bruijn indices, which should be more efficient. The idea is to statically determine, for each identifier, which layer it will be in, and what will be its exact position in the layer. We can then replace identifiers by a tuple of index (layer, offset) and use a vector instead of a hashmap. This is the current Nix implementation.

This requires to do a pass to determine the index of each identifier and update the AST accordingly. This also makes generating new scopes dynamically harder, but doing so can be usually fixed quite mechanically (typically, if the interpreter wraps an expression exp at some point as fun x y => %op% x y exp, one can add exp_wrapper = fun exp x y => %op% x y exp in an internal module of the stdlib and evaluate exp_wrapper exp instead, which doesn't introduce a new scope anymore)

On step further, we may use a better persistent data structure than a linked list, to avoid the linear cost of fetching the n-th layer. One possibility is random access list, with logarithmic lookup and constant time cons. Other possibilities includes RRB vectors, advertised to be constant time for those operations in practice (in theory, a log of a never-that-big quantity).

@fuzzypixelz did a first experiment in #807, but it needs more work.

A different route is to replace the whole environment by a persistent hashmap, which is tracked in #837.

@googleson78
Copy link

googleson78 commented Sep 23, 2022

Here's an article on various nameless representations and what tradeoffs you make with each one: https://jesper.sikanda.be/posts/1001-syntax-representations.html

I think it might be helpful to have. Although performance is not discussed very much unfortunately.

@yannham
Copy link
Member Author

yannham commented Nov 12, 2024

Same as #837: RFC007 (#2045) just gets rid of environments at runtime. We can still discuss what environment structure should be used in the compiler, but this is a slightly different discussion, and I don't think De Bruijn indices are worth it anyway for a one pass compiler.

@yannham yannham closed this as completed Nov 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants