Try using De Bruijn indices and persistent array for the environment #844

yannham · 2022-09-22T09:10:56Z

Currently, getting an identifier far away in the layers of the environment (which is a linked list of hashmaps) can be costly. Getting the n-th one implies n hashmap lookup failures and to advance n layers through the linked list to finally retrieve the value.

Nix, on the other hand, uses a variant of De Bruijn indices, which should be more efficient. The idea is to statically determine, for each identifier, which layer it will be in, and what will be its exact position in the layer. We can then replace identifiers by a tuple of index (layer, offset) and use a vector instead of a hashmap. This is the current Nix implementation.

This requires to do a pass to determine the index of each identifier and update the AST accordingly. This also makes generating new scopes dynamically harder, but doing so can be usually fixed quite mechanically (typically, if the interpreter wraps an expression exp at some point as fun x y => %op% x y exp, one can add exp_wrapper = fun exp x y => %op% x y exp in an internal module of the stdlib and evaluate exp_wrapper exp instead, which doesn't introduce a new scope anymore)

On step further, we may use a better persistent data structure than a linked list, to avoid the linear cost of fetching the n-th layer. One possibility is random access list, with logarithmic lookup and constant time cons. Other possibilities includes RRB vectors, advertised to be constant time for those operations in practice (in theory, a log of a never-that-big quantity).

@fuzzypixelz did a first experiment in #807, but it needs more work.

A different route is to replace the whole environment by a persistent hashmap, which is tracked in #837.

The text was updated successfully, but these errors were encountered:

googleson78 · 2022-09-23T08:13:19Z

Here's an article on various nameless representations and what tradeoffs you make with each one: https://jesper.sikanda.be/posts/1001-syntax-representations.html

I think it might be helpful to have. Although performance is not discussed very much unfortunately.

yannham · 2024-11-12T17:58:46Z

Same as #837: RFC007 (#2045) just gets rid of environments at runtime. We can still discuss what environment structure should be used in the compiler, but this is a slightly different discussion, and I don't think De Bruijn indices are worth it anyway for a one pass compiler.

yannham mentioned this issue Sep 22, 2022

Improve environment lookup times #807

Closed

yannham mentioned this issue Jul 28, 2023

Performance improvement ideas: tracking issue #1484

Open

9 tasks

yannham closed this as completed Nov 12, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Try using De Bruijn indices and persistent array for the environment #844

Try using De Bruijn indices and persistent array for the environment #844

yannham commented Sep 22, 2022

googleson78 commented Sep 23, 2022 •

edited

Loading

yannham commented Nov 12, 2024

Try using De Bruijn indices and persistent array for the environment #844

Try using De Bruijn indices and persistent array for the environment #844

Comments

yannham commented Sep 22, 2022

googleson78 commented Sep 23, 2022 • edited Loading

yannham commented Nov 12, 2024

googleson78 commented Sep 23, 2022 •

edited

Loading