Performance is a tricky matter. We want to avoid spending time implementing changes in the hope of improving performance without:

- A concrete use case showing that a performance issue is either an actual problem in practice, or subpar when compared to other languages/tools
- Principled measurements pointing to the functions to be improved and, after implementing a performance-related change, confirming the gain
Still, with time, we've spotted many potential points of congestion and candidates for performance improvements. This issue serves to record those insights, and might be used as a starting point when investigating a performance issue or looking for performance improvements for specific parts of the Nickel pipeline.
This issue isn't concerned with big paradigm shifts such as incremental evaluation or moving towards a bytecode compiler + virtual machine approach. Such topics deserve their own issue/notes/RFC/paper. The ideas here are closely tied to the current implementation, and we want to avoid losing them or ignoring them while they could be relevant.
Feel free to add your own; I'll try to edit them into this description.
Interpreter

- [ ] Cache lazy contract application. Since RFC005, and previous work on lazy array contracts, Nickel's basic data structures (arrays and records) apply contracts lazily. That is, applying an array contract or a record contract just attaches data to those structures, but doesn't evaluate anything. When merging records, contracts are still accumulated lazily, without being applied. Contracts are only actually applied when a field or an array element is requested, either directly or via a transformation such as mapping a function over the structure. However, we currently have no place to cache this contract application. This means that extracting a field or an array element several times will apply pending contracts as many times, while we could cache the result in a thunk and guarantee that a contract application is only evaluated once. A simple solution would be, for arrays and records, to store a pair of closures (original, cached) for each element, where original corresponds to the initial value without contracts applied, and cached is a fresh thunk (not aliased by anyone else when allocating the array or record) that can serve to store the field with contracts applied.
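The (original, cached) pair could be sketched as follows. This is a hypothetical, heavily simplified model (plain integers stand in for closures, and `apply_contracts` stands in for evaluating the pending contracts), not the actual Nickel data structures:

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Each element keeps the original value plus a dedicated cache slot, so
// pending contracts are applied at most once per element.
#[derive(Clone)]
struct Element {
    original: Rc<i64>,                // stand-in for the unevaluated closure
    cached: Rc<RefCell<Option<i64>>>, // fresh thunk, filled on first access
}

impl Element {
    fn new(value: i64) -> Self {
        Element {
            original: Rc::new(value),
            cached: Rc::new(RefCell::new(None)),
        }
    }

    // `apply_contracts` stands in for evaluating the pending contracts.
    fn force(&self, mut apply_contracts: impl FnMut(i64) -> i64) -> i64 {
        let mut slot = self.cached.borrow_mut();
        match *slot {
            // subsequent accesses reuse the cached result
            Some(v) => v,
            // first access: apply the contracts and fill the cache
            None => {
                let v = apply_contracts(*self.original);
                *slot = Some(v);
                v
            }
        }
    }
}

fn main() {
    let elem = Element::new(21);
    let mut applications = 0;
    // First access applies the (mock) contract; the second hits the cache.
    for _ in 0..2 {
        let v = elem.force(|x| {
            applications += 1;
            x * 2
        });
        assert_eq!(v, 42);
    }
    assert_eq!(applications, 1);
}
```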
- [ ] Use contract equality to avoid double contract application. We implemented contract equality (limited, of course; it's undecidable in general) for typechecking purposes. It wouldn't be very hard to reuse this code to decide contract equality at runtime, and thus avoid double contract application, e.g. at record fields (which lazily record contracts in a vector) or arrays. It might even be adapted to handle stand-alone, inline contract applications, but for that we would need to stop evaluating annotated terms away to a simple contract application, as we do currently (Contract deduplication for records #1646).
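A minimal sketch of the deduplication idea, with a made-up `Contract` type: since runtime contract equality can only ever be partial, a duplicate is skipped only when the equality check can positively say "equal", and opaque contracts (arbitrary functions) are always kept:

```rust
// Hypothetical model: a field accumulates pending contracts in a vector;
// a partial syntactic equality check lets us skip definite duplicates.
#[derive(Clone, PartialEq, Debug)]
enum Contract {
    Number,
    String,
    // Real contracts can be arbitrary functions; equality is undecidable in
    // general, so an `Opaque` contract never compares equal to anything.
    Opaque(u64),
}

struct Field {
    pending: Vec<Contract>,
}

impl Field {
    fn push_contract(&mut self, c: Contract) {
        // Only deduplicate when the partial check can decide equality.
        let dup = !matches!(c, Contract::Opaque(_)) && self.pending.contains(&c);
        if !dup {
            self.pending.push(c);
        }
    }
}

fn main() {
    let mut f = Field { pending: Vec::new() };
    f.push_contract(Contract::Number);
    f.push_contract(Contract::Number); // deduplicated
    f.push_contract(Contract::Opaque(0));
    f.push_contract(Contract::Opaque(0)); // kept: equality is unknown
    assert_eq!(f.pending.len(), 3);
}
```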
- [ ] Use De Bruijn indices for the environment. The environment is a linked list of hashmaps, where the linked list mirrors, more or less, scopes and sharing points. We can make it a bit better (in practice, not asymptotically) by using a linked list of arrays and pre-computing the index of each allocated variable, as in De Bruijn indexing (Try using De Bruijn indices and persistent array for the environment #844).
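As a sketch (with made-up names, and plain integers standing in for thunks): each scope becomes an array-backed frame, and a variable occurrence is pre-resolved at transformation time to a (frame depth, slot index) pair, so lookup is pointer-chasing plus one array access instead of a hashmap probe per scope:

```rust
use std::rc::Rc;

// Hypothetical environment: a linked list of frames (arrays) instead of a
// linked list of hashmaps.
struct Frame {
    slots: Vec<i64>, // stand-in for thunks
    parent: Option<Rc<Frame>>,
}

// A variable pre-resolved to (depth, index) is looked up by walking `depth`
// parent links, then indexing directly into the frame's array.
fn lookup(env: &Rc<Frame>, depth: usize, index: usize) -> i64 {
    let mut frame = env;
    for _ in 0..depth {
        frame = frame.parent.as_ref().expect("ill-formed De Bruijn index");
    }
    frame.slots[index]
}

fn main() {
    let global = Rc::new(Frame { slots: vec![10, 20], parent: None });
    let local = Rc::new(Frame { slots: vec![30], parent: Some(global) });
    assert_eq!(lookup(&local, 0, 0), 30); // innermost frame, slot 0
    assert_eq!(lookup(&local, 1, 1), 20); // one frame up, slot 1
}
```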
Typechecker

- [ ] Make types shareable. Currently, the AST for typ::Type uses owned pointers (Box). This means that subtypes aren't shared, and unifying, say, a -> b with T -> U needs to perform a deep copy of both T and U. Possible solutions are to use Rc instead of Box (as we do for terms), or to use an arena; see below.
- [ ] Use an arena to allocate types. Related to the previous item: instead of using Rc, we could also allocate all types in an arena. Types (or rather, unification types) currently have a well-delimited lifetime: they are allocated as part of a statically typed block and can't escape it. Thus, an arena would be created when entering a top-level statically typed block (one not included in another) and dropped when exiting that block.
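A toy version of the idea, using made-up names and an index-based arena for simplicity (a real implementation might hand out references tied to the arena's lifetime instead): all type nodes for a statically typed block live in one allocation, subtypes are shared rather than deep-copied, and everything is freed at once when the block is left:

```rust
// Hypothetical arena for unification types, scoped to one statically typed
// block.
#[derive(Debug)]
enum Type {
    Num,
    Arrow(TypeId, TypeId),
}

type TypeId = usize;

struct TypeArena {
    nodes: Vec<Type>,
}

impl TypeArena {
    fn new() -> Self {
        TypeArena { nodes: Vec::new() }
    }

    fn alloc(&mut self, t: Type) -> TypeId {
        self.nodes.push(t);
        self.nodes.len() - 1
    }

    fn get(&self, id: TypeId) -> &Type {
        &self.nodes[id]
    }
}

fn main() {
    // Entering a top-level statically typed block: create the arena.
    let mut arena = TypeArena::new();
    let num = arena.alloc(Type::Num);
    // `Num -> Num` shares the `Num` node instead of deep-copying it.
    let arrow = arena.alloc(Type::Arrow(num, num));
    match arena.get(arrow) {
        Type::Arrow(a, b) => assert_eq!(a, b), // shared subtype
        _ => unreachable!(),
    }
    // Leaving the block: the arena, and every type in it, is dropped at once.
}
```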
- [ ] Implement a proper union-find data structure, and in particular use direct pointers for unification variables. Currently, unification variables are indices into a vector. They could be direct pointers instead, eliminating one level of indirection as well as the need to thread the unification table everywhere.
- [ ] Implement path compression. When resolving a unification variable, remember all the variables encountered while following links in the unification table/union-find structure and set them all to the final result. This way, the next resolution succeeds in only one step.
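For reference, here is a standard vector-backed union-find with path compression, as a sketch of what the unification table could look like (still index-based; the previous item would replace indices with direct pointers):

```rust
// Classic union-find over unification variables identified by index.
struct UnionFind {
    parent: Vec<usize>, // parent[i] == i means i is a root
}

impl UnionFind {
    fn new(n: usize) -> Self {
        UnionFind { parent: (0..n).collect() }
    }

    // Follow parent links to the root, then point every variable seen along
    // the way directly at that root (path compression), so the next
    // resolution of any of them takes a single step.
    fn find(&mut self, mut x: usize) -> usize {
        let mut root = x;
        while self.parent[root] != root {
            root = self.parent[root];
        }
        while self.parent[x] != root {
            let next = self.parent[x];
            self.parent[x] = root;
            x = next;
        }
        root
    }

    fn union(&mut self, a: usize, b: usize) {
        let (ra, rb) = (self.find(a), self.find(b));
        self.parent[ra] = rb;
    }
}

fn main() {
    let mut uf = UnionFind::new(4);
    uf.union(0, 1);
    uf.union(1, 2);
    assert_eq!(uf.find(0), uf.find(2));
    // After the first find, 0's parent points straight at the root.
    let root = uf.find(0);
    assert_eq!(uf.parent[0], root);
}
```

A production version would also do union by rank (or by size) to keep the trees shallow before compression kicks in.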
Program transformation
- [ ] Store direct pointers to nodes to transform. Some transformations, like import resolution, typically concern only very few nodes. Still, we currently traverse the whole term to look for, e.g., an import. We could instead record pointers to imports at parsing time and process them directly later. The premise isn't entirely true: we have to perform a traversal anyway (in fact, at least two, one bottom-up and one top-down) for other transformations (such as the free-var transformation or the share normal form, which are probably better done as a normal traversal). But it might be the case that at least one full traversal could be saved, if we find that all of the steps of this traversal can instead be performed by recording the nodes to process during parsing, which is attractive when, as with import resolution, the set of nodes to process is expected to be small.
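A toy illustration of the recording idea (made-up names, node ids instead of real pointers): the parser pushes every import node it builds into a side vector, so import resolution later visits that small list directly instead of re-traversing the whole term:

```rust
// Hypothetical AST node and parser that records import nodes as it builds
// them.
enum Node {
    Import(String),
    Num(i64),
}

struct Parser {
    nodes: Vec<Node>,
    imports: Vec<usize>, // ids of import nodes, recorded at parse time
}

impl Parser {
    fn mk_import(&mut self, path: &str) -> usize {
        self.nodes.push(Node::Import(path.to_string()));
        let id = self.nodes.len() - 1;
        self.imports.push(id); // record now: no traversal needed later
        id
    }

    fn mk_num(&mut self, n: i64) -> usize {
        self.nodes.push(Node::Num(n));
        self.nodes.len() - 1
    }
}

fn main() {
    let mut p = Parser { nodes: Vec::new(), imports: Vec::new() };
    p.mk_num(1);
    p.mk_import("lib.ncl");
    p.mk_num(2);
    // Import resolution only touches the recorded nodes.
    assert_eq!(p.imports.len(), 1);
    match &p.nodes[p.imports[0]] {
        Node::Import(path) => assert_eq!(path, "lib.ncl"),
        _ => unreachable!(),
    }
}
```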
LSP

- [ ] Faster reference finding. Definition finding in the LSP is fairly fast, but reference finding might be slow, as it currently works by identifying many possible references and then running definition finding on each one. This could potentially be improved by running an extra analysis pass that caches all the definition-finding information. Alternatively, we could perform this pass lazily, so that the first reference finding is a little slow but subsequent ones are faster.