Currently, looking up an identifier bound far away in the environment (which is a linked list of hashmaps) can be costly. Fetching an identifier bound in the n-th layer incurs n failed hashmap lookups and a walk through n links of the list before the value is finally retrieved.
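To make the cost concrete, here is a minimal sketch of such a lookup. It is not the actual interpreter code: `Layer`, `extend` and `lookup` are hypothetical names, and values are simplified to `i64`. The function also returns the number of failed hashmap lookups, which grows with the distance to the binding.

```rust
use std::collections::HashMap;
use std::rc::Rc;

/// One scope of the environment (hypothetical type, values simplified to i64).
struct Layer {
    bindings: HashMap<String, i64>,
    parent: Option<Rc<Layer>>,
}

/// Pushes a new layer on top of `parent`.
fn extend(parent: Option<Rc<Layer>>, bindings: &[(&str, i64)]) -> Rc<Layer> {
    Rc::new(Layer {
        bindings: bindings.iter().map(|(k, v)| (k.to_string(), *v)).collect(),
        parent,
    })
}

/// Walks the chain from the innermost layer outwards. Returns the value and
/// the number of layers whose hashmap lookup failed along the way.
fn lookup(env: &Rc<Layer>, id: &str) -> Option<(i64, usize)> {
    let mut current = Some(env);
    let mut misses = 0;
    while let Some(layer) = current {
        if let Some(value) = layer.bindings.get(id) {
            return Some((*value, misses));
        }
        misses += 1;
        current = layer.parent.as_ref();
    }
    None
}
```

With three nested layers binding `x`, `y` and `z` (outermost to innermost), looking up `x` from the innermost layer fails twice before succeeding.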
Nix, on the other hand, uses a variant of De Bruijn indices, which should be more efficient. The idea is to statically determine, for each identifier, which layer it will be in and its exact position within that layer. We can then replace identifiers with a pair of indices `(layer, offset)` and use a vector instead of a hashmap for each layer. This is what the current Nix implementation does.
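A rough sketch of what an indexed environment could look like, assuming each layer becomes a plain vector of slots. The names `Index`, `Layer` and `get` are made up for illustration and values are again simplified to `i64`. The layers are still chained in a linked list here, so reaching layer n remains linear, but no hashing or failed lookup is involved.

```rust
use std::rc::Rc;

/// A statically resolved identifier: how many layers to go up, then which slot.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Index {
    layer: usize,
    offset: usize,
}

/// One scope, stored as a vector of slots instead of a hashmap.
struct Layer {
    slots: Vec<i64>,
    parent: Option<Rc<Layer>>,
}

/// Follows `idx.layer` parent links, then reads the slot at `idx.offset`.
/// Returns `None` if the index points outside of the environment.
fn get(env: &Rc<Layer>, idx: Index) -> Option<i64> {
    let mut current = env;
    for _ in 0..idx.layer {
        current = current.parent.as_ref()?;
    }
    current.slots.get(idx.offset).copied()
}
```

For an outer layer holding `[10, 20]` and an inner layer holding `[30]`, the index `(1, 1)` evaluated from the inner layer yields `20`.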
This requires a pass to determine the index of each identifier and update the AST accordingly. It also makes generating new scopes dynamically harder, but this can usually be fixed quite mechanically. Typically, if the interpreter wraps an expression `exp` at some point as `fun x y => %op% x y exp`, one can add `exp_wrapper = fun exp x y => %op% x y exp` to an internal module of the stdlib and evaluate `exp_wrapper exp` instead, which no longer introduces a new scope.
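The resolution pass could look roughly like the following sketch, written for a toy lambda calculus rather than the real AST. `Expr`, `Resolved` and `resolve` are hypothetical, and a multi-argument function is assumed to introduce a single layer holding all its parameters. The pass keeps a stack of scopes and rewrites each variable into a `(layer, offset)` pair, where layer 0 is the innermost scope.

```rust
/// Toy source AST with named variables (hypothetical).
#[derive(Debug, PartialEq)]
enum Expr {
    Var(String),
    Fun(Vec<String>, Box<Expr>),
    App(Box<Expr>, Box<Expr>),
}

/// Toy resolved AST: variables are replaced by (layer, offset) pairs.
#[derive(Debug, PartialEq)]
enum Resolved {
    Var { layer: usize, offset: usize },
    Fun(usize, Box<Resolved>), // only the number of parameters is kept
    App(Box<Resolved>, Box<Resolved>),
}

/// `scopes` is the stack of enclosing scopes, innermost last.
fn resolve(expr: &Expr, scopes: &mut Vec<Vec<String>>) -> Result<Resolved, String> {
    match expr {
        Expr::Var(name) => {
            // Search from the innermost scope outwards; `layer` counts the
            // number of scopes skipped.
            for (layer, scope) in scopes.iter().rev().enumerate() {
                if let Some(offset) = scope.iter().rposition(|n| n == name) {
                    return Ok(Resolved::Var { layer, offset });
                }
            }
            Err(format!("unbound identifier: {}", name))
        }
        Expr::Fun(params, body) => {
            scopes.push(params.clone());
            let body = resolve(body, scopes);
            scopes.pop(); // restore the stack even if resolution failed
            Ok(Resolved::Fun(params.len(), Box::new(body?)))
        }
        Expr::App(f, arg) => Ok(Resolved::App(
            Box::new(resolve(f, scopes)?),
            Box::new(resolve(arg, scopes)?),
        )),
    }
}
```

Under these assumptions, in `fun x y => fun z => x z` the occurrence of `x` resolves to `(1, 0)` and the occurrence of `z` to `(0, 0)`. Any scope the interpreter creates dynamically would shift these layer numbers, which is why the wrapper trick described above avoids introducing one.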
One step further, we may use a better persistent data structure than a linked list, to avoid the linear cost of fetching the n-th layer. One possibility is a random access list, with logarithmic lookup and constant-time `cons`. Other possibilities include RRB vectors, advertised as constant time for those operations in practice (in theory, logarithmic in a quantity that never gets very large).
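As an illustration of the first option, here is a minimal sketch of a persistent random access list in the style of Okasaki's skew binary lists. It is one possible design among several, not a proposal for the actual implementation, and the names `Tree`, `Ral`, `cons` and `lookup` are chosen for the example. The list is a chain of complete binary trees of sizes 2^k - 1. `cons` inspects at most the first two trees, and `lookup` skips whole trees before descending into one.

```rust
use std::rc::Rc;

/// A complete binary tree whose elements are read in preorder.
enum Tree<T> {
    Leaf(T),
    Node(T, Rc<Tree<T>>, Rc<Tree<T>>),
}

/// A persistent list of (tree size, tree) pairs.
enum Ral<T> {
    Nil,
    Cons(usize, Rc<Tree<T>>, Rc<Ral<T>>),
}

/// Constant time: either merges the first two trees (when they have the same
/// size) under a new root, or pushes a singleton tree. `list` is left intact.
fn cons<T>(x: T, list: &Rc<Ral<T>>) -> Rc<Ral<T>> {
    if let Ral::Cons(s1, t1, rest1) = &**list {
        if let Ral::Cons(s2, t2, rest2) = &**rest1 {
            if s1 == s2 {
                let tree = Rc::new(Tree::Node(x, t1.clone(), t2.clone()));
                return Rc::new(Ral::Cons(1 + s1 + s2, tree, rest2.clone()));
            }
        }
    }
    Rc::new(Ral::Cons(1, Rc::new(Tree::Leaf(x)), list.clone()))
}

/// Logarithmic time: skips the trees located before `index`, then descends.
fn lookup<T>(list: &Ral<T>, mut index: usize) -> Option<&T> {
    let mut current = list;
    loop {
        match current {
            Ral::Nil => return None,
            Ral::Cons(size, tree, rest) => {
                if index < *size {
                    return lookup_tree(tree, *size, index);
                }
                index -= *size;
                current = &**rest;
            }
        }
    }
}

fn lookup_tree<T>(tree: &Tree<T>, size: usize, index: usize) -> Option<&T> {
    match tree {
        Tree::Leaf(x) => if index == 0 { Some(x) } else { None },
        Tree::Node(x, left, right) => {
            if index == 0 {
                return Some(x);
            }
            let half = size / 2; // size of each subtree, since size is 2^k - 1
            if index <= half {
                lookup_tree(left, half, index - 1)
            } else {
                lookup_tree(right, half, index - 1 - half)
            }
        }
    }
}
```

In an environment built this way, each element would be one layer, and index 0 would be the most recently pushed one. Because `cons` shares the existing trees through `Rc`, older environments captured by closures remain valid.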
@fuzzypixelz did a first experiment in #807, but it needs more work.
A different route is to replace the whole environment by a persistent hashmap, which is tracked in #837.
Same as #837: RFC007 (#2045) just gets rid of environments at runtime. We can still discuss what environment structure should be used in the compiler, but this is a slightly different discussion, and I don't think De Bruijn indices are worth it anyway for a one pass compiler.