-
Notifications
You must be signed in to change notification settings - Fork 5.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize TypeEngine
for garbage collection
#6603
Comments
The idea of shareable types is interesting and not something I really considered before outside the builtin core types. Also I get a bit unsure when you say that they should never be GC'd. First lets say we consider all
A potential idea me and @JoshuaBatty previously discussed before would be the possibility of having separate per-module engines, an id would have a reserved space to contain the engine id. This would allow a O(1) removal of the engine data when the module is collected. |
Thanks for the early feedback and excellent questions @tritao! This issue description indeed requires more elaboration including the proper definitions and more concrete examples of measurements that justify proposed design. This first issue description is primarily created so that I can link to it in several TODOs left in the code that call for future changes in the |
## Description This PR implements a major rewrite of the `TypeEngine`. The rewrite: - significantly improves overall compilation performance. The compilation time of the real-world [Spark Orderbook workspace](https://github.com/compolabs/orderbook-contract) got **reduced from 8.99 to 5.94 seconds**, and the memory consumption from **3.32 to 1.78 GB**. - provides robust API for inserting types, related to the usage of the `source_id` (see: #5991). The new API forbids not providing `source_id`s or providing semantically questionable or non-optimal ones. - introduces simpler, much less verbose, API for inserting types. The PR also removes the obsolete and unused `TypeInfo::Storage`. The PR **does not address** the following: - The `TypeEngine`'s "garbage-collection-(un)friendlines" (see: #6603). - The handling of the `TypeInfo::Custom` and `TypeInfo::TraitType` types within the `TypeEngine` (see: #6601). - The number of interactions with the `TypeEngine`. E.g., removing unnecessary insertions after `resolve()`ing or `monomorphyze()`ing types will be done as a part of `DeclEngine` optimization. Closes #5991. ## Shareable types The PR formalizes the notion of a _shareable type_ within the `TypeEngine` (strictly speaking, a shareable `TypeInfo`). A shareable type is a type that is both: - _unchangeable_: it, or any of its parts, cannot be replaced during the unification or monomorphization. - and _undistinguishable by annotations_: it doesn't carry any additional information (annotations) that could differ it from another instance of `TypeInfo` that would, purely from the type perspective, be a same type. E.g., `u64` or `MyStruct<u64, bool>` are unchangeable while `Numeric`, `Unknown` or `MyStruct<T1, T1>` are changeable. E.g., in this example, `a` and `b` have the same type, `[u64; 2]` but those two types differ in spans assigned to the "u64"s and the "2"s, and are treated as different within the `TypeEngine`, and thus as non-shareable. ``` let a: [u64; 2] = [1, 1]; let b: [u64; 2] = [2, 2]; ``` Shareability of a type is crucial for reusing the `TypeSourceInfo` instances within the type engine. Shareable types can be given different `TypeId`s without the need to newly allocate a `TypeSourceInfo`. ## Performance improvements The cummulative effect of the performance improvements on the compilataion of the real-world [Spark Orderbook workspace](https://github.com/compolabs/orderbook-contract) is given below. The compilation means the frontend compilation, up to the IR generation (`forc check`). **Compilation time** ``` Before: Time (mean ± σ): 8.995 s ± 0.297 s [User: 7.126 s, System: 1.289 s] Range (min … max): 8.675 s … 9.262 s 3 runs After: Time (mean ± σ): 5.945 s ± 0.517 s [User: 4.749 s, System: 0.768 s] Range (min … max): 5.349 s … 6.280 s 3 runs ``` **Memory consumption** ``` --------------------------------------------------------- total(B) useful-heap(B) extra-heap(B) --------------------------------------------------------- Before: 3,316,786,808 3,237,317,383 79,469,425 After: 1,784,467,376 1,743,772,406 40,694,970 ``` ### Applied optimizations: - **Replacement of expensive `insert` calls with compile time constants for built-in types.** Built-in types like `!`, `bool`, `()`, `u8`, etc. are inserted into the engine at its creation at predefined slots within the `slab`. The `id_of_<built-in-type>` methods just return those predefined `TypeId`s, effectively being compiled down to constants. The calls like `type_engine.insert(engines, TypeInfo::Boolean, None)` are replaced with maximally optimized and non-verbose `type_engine.id_of_bool()`. - **Elimination of extensive creation of `TypeSourceInfo`s for `TypeInfo::Unknown/Numeric`s.** `Unknown` and `Numeric` are inserted into the engine ~50.000 times. Each insert used to create a new instance of `TypeSourceInfo` with the `source_id` set to `None`. The optimization replaces those ~50.000 instances with two predefined singleton instances, one for `Unknown` + `None` and one for `Numeric` + `None`. (Note that when implementing #6603, we will want to bind also `Unknown`s and `Numeric`s to `source_id`s different then `None`, but we will still want to reuse the `TypeInfo` instances.) - **Elimination of extensive temporary heap-allocations during hash calculation.** The custom hasher obtained by `make_hasher` required a `TypeSourceInfo` to calculate the hash and it was called every time the `insert` was called, ~530.000 times. Getting the `TypeSourceInfo` originally required cloning the `TypeInfo`, which depending on the concrete `TypeInfo` instance could cause heap allocations, and also heap-allocating that copy within an `Arc`. Hash was calculated regardless of the possibility for the type to be stored in the hash map of reusable types. The optimization removed the hashing step if the type is not shareable, removed the cloning of the `TypeInfo` and introduced a custom `compute_hash_without_heap_allocation` method that produced the same hash as the `make_hasher` but without unnecessary temporary heap-allocations of `TypeSourceInfo`s. - **Replacement of `TypeSourceInfo`s within the hash map with `Arc<TypeSourceInfo>`s.** The hash map unnecessarily required making copies of reused `TypeSourceInfo`s. - **Introducing the concept of a _shareable type_ and rolling it out for all types.** Previously, the engine checked only for changeability of types by using the `TypeInfo::is_changeable` method. The implementation of that method was extremely simplified, e.g. treating all structs and enums with generic arguments as changeable even if they were fully monomorphized. This resulted in over-bloating the engine with type instances that were actually unchangeable, but considered to be changeable. Also, strictly seen, the unchangeability (during unification and monomorphization) is not the only necessary criteria for reuse (or sharing) a type within the engine. Another important aspect is, as explained above, the _differentiability by annotations_. The PR introduces the notion of a _shareable type_ which is both unchangeable and not differentiable by annotations. The optimization takes advantage of such types by storing them only once per `source_id`. - **Elimination of extensive unnecessary inserts of new types during `replace()` calls.** When `replace()`ing types during unifications, a new `TypeSourceInfo` instance was created for every replacement, ~46.000 times. This meant a heap-allocation of the `TypeSourceInfo` (16 bytes) and the `Arc`-contained `TypeInfo` (232 bytes), even if the replaced type was shareable and already available in the type engine. The optimization now reuses an already existing shareable type if available. ## Robustness related to `source_id`s The issues we had with properly providing the `source_id` in the `insert` method are explained in #5991. This PR removes the `source_id` from the new public API and calculates it internally within the engine, based on the type being inserted. This makes inserting of types both more robust and less verbose and eliminates the possibility of providing a semantically wrong `source_id`. Note that the calculation of an optimal `source_id` done within the engine fully corresponds to the "calculations" we currently have at call sites. E.g., when inserting enums, previously we always had to write `type_engine.insert(engines, TypeInfo::Enum(decl_id), enum_decl.span.source_id())`. Fetching the `source_id` from the enum declaration is now done within the `insert_enum` method: `type_engine.insert_enum(engines, decl_id)`. Note that for certain types we will want to change the current behavior and either provide a `source_id` or pick a more suitable one. E.g, even when inserting `Unknown`s, we will want to have `source_id`s, if possible. This will be done in #6603, but again, together with providing a robust API that will be difficult to misuse. ## Simplicity As already mentioned in some of the examples above, the new `id_of_<type>()` and `insert_<type>` methods provide much simpler and less verbose API to use. Introducing those methods remove a lot of boilerplate code from the callers. Also, the individual `insert_<type>` methods are additionally optimized for inserting the particular type. The common `insert` method is intended to be used only in cases where the inserted `TypeInfo` is not known at the call site. Here are some examples of the new API versus the existing one: |Before|After| |------|-----| | `type_engine.insert(engines, TypeInfo::Tuple(vec![]), None)` | `type_engine.id_of_unit()` | | `type_engine.insert(engines, TypeInfo::Unknown, None)` | `type_engine.new_unknown()` | | `type_engine.insert(engines, TypeInfo::UnsignedInteger(IntegerBits::SixtyFour), None)` | `type_engine.id_of_u64()` | For more complex types, like, e.g., `TypeInfo::Tuple`, the difference in inserting is even more prominent, as can be seen from the diffs of numerous code lines deleted in this PR. ## Checklist - [x] I have linked to any relevant issues. - [x] I have commented my code, particularly in hard-to-understand areas. - [ ] I have updated the documentation where relevant (API docs, the reference, and the Sway book). - [ ] If my change requires substantial documentation changes, I have [requested support from the DevRel team](https://github.com/FuelLabs/devrel-requests/issues/new/choose) - [x] I have added tests that prove my fix is effective or that my feature works. - [ ] I have added (or requested a maintainer to add) the necessary `Breaking*` or `New Feature` labels where relevant. - [x] I have done my best to ensure that my PR adheres to [the Fuel Labs Code Review Standards](https://github.com/FuelLabs/rfcs/blob/master/text/code-standards/external-contributors.md). - [x] I have requested a review from the relevant team or maintainers. --------- Co-authored-by: Joshua Batty <joshpbatty@gmail.com> Co-authored-by: João Matos <joao@tritao.eu>
NOTE: This is the draft version of the proposal. It lists the three major points that a "garbage-collection-friendly"
TypeEngine
should support, but still not in enough detail. The concrete architectural proposal is also missing, as well as the results of measurements from a real-world project that support the reasoning behind the proposal.The numbers given in the below description are coming from the compilation of the Spark Orderbook workspace, a realistic real-world project.
The solution will also solve #6687.
Shareable
TypeInfo
s and their lifetimeWe use the term shareable type as defined in #6613.
Examples of shareable types are all built-in types like, e.g.,
u64
, but also types like, e.g.,Option<u64>
orMyStruct
orMyGenericStruct<bool>
.The number of those types is orders of magnitude smaller then the overall number of types inside of the
TypeEngine
, and they can be heavily reused across all the projects and modules, which is currently not the case. (TODO: Add data from measurements.)Currently, the
TypeEngine
optimized in #6613 reuses theTypeSourceInfo
s of shareableTypeInfo
s and not theTypeInfo
s themselves. This is sub-optimal because shareable types that cannot be changed in between garbage collections are still assigned asource_id
and are being garbage collected. This table shows the content of theshareable_types
hash map. It is evident that types like!
,()
,bool
, etc are stored unnecessarily many times.Those types should "live forever" and never be garbage collected.
Also, the shareable types like
Option<u64>
should be shareable across modules, means for differentsource_id
s we should still point to a single sharedOption<u64>
instance. Note that this instance should be GCed if the definition ofOption
gets changed.Distribution of
TypeInfo
s acrosssource_id
sTypeInfo
s that should get garbage collected should be distributed towards the leafs of project and module dependency tree. This means, the types should have assigned, if possible of course,source_id
s of the files that are likely to be changed, and those are always the files in the projects that developers actually work on (the leafs), and not the files in the dependencies, especially the standard library dependencies.E.g., currently, if we have
fn my_function<A>() -> Option<A>
in the code we are editing (a leaf!), the non-shareableOption<A>
type will get assigned thesource_id
of the originalOption
declaration, the one from the standard library. Which means it will never be GCed, although it should be GCed whenever we GC the current module the programmer is changing.E.g., currently, all of the
TypeInfo::Unknown
types do not havesource_id
assigned, and not all of them getreplace()
ed in the engine. Out of ~41.000 insertedUnknown
s, some ~4.500 remain unreplaced, and, not havingsource_id
assigned, can never be GCed. This produces a constant small "memory leak" within theTypeEngine
.Time-performant garbage collection
TypeEngine
should, ideally, offer anO(1)
removal of files during the GC.E.g., before the optimization done in #6613, the
ConcurrentSlab
use to hold ~510.000 elements. All those elements needed to be traversed during GC, to remove just a handful of types. After the optimization, theslab
contains ~230.000 elements which speeds up the traversal, but sill has anO(n)
complexity.Architectural proposal
TODO: Write proposal.
The text was updated successfully, but these errors were encountered: