You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Sharing some thoughts / notes on a possible code smell in our compiler architecture.
Our compiler stores type information internally using the Type enum data structure. Throughout the compiler, we reference and pass around these Type structures (as well as some other data structures, like SymbolEnv and Namespace) fairly freely using raw pointers throughout the compiler. To facilitate this, we use a custom type named UnsafeRef:
TypeRef is defined as an alias to UnsafeRef<Type>.
The unsafe implementation of the Deref trait for UnsafeRef allows anyone to dereference a TypeRef value (e.g. *some_typeref) to access the underlying type. This pattern is generally "unsafe" in Rust since it's possible to write code like this:
In the code above, we create an UnsafeRef using a pointer to a local variable. t goes out of scope at the end of the function, so the pointer is no longer valid, and accessing it would be a dangling pointer dereference, causing undefined behavior.
The way we avoid this class of issue today is by convention always creating TypeRef objects through a helper method on the Types singleton struct. The helper method uses Box to store each Type object on the heap, and then stores all of these Box objects within a Vec field on the Types struct. An UnsafeRef object with a pointer to the value on the heap is returned. We also ensure that the Types singleton is always kept in scope through the entire life of the compilation or language server - we don't let it go out of scope. By pointing to objects on the heap that we never deallocate, we've guaranteed that all of these raw pointers will always continue to be valid.
The downsides of this approach, besides using unsafe, include:
No way to free memory for types that are no longer being used or referenced. This is most relevant for the LSP.
Potential for data races if in the future we wanted to parallelize work in the compiler with threads.
It may harder or impossible to implement optimizations like type interning. (Today, a new Type may be allocated on the heap for each Array<str> modeled in a user program. Interning would allow these to be de-duplicated.)
Pragmatically, this isn't a pressing issue for the compiler today, but it's worth keeping an eye on.
One alternative approach could be to use Rc<Type> instead of UnsafeRef<Type>. Rc reference counts usages of the pointer so that the underlying Type would be automatically deallocated when it reaches 0. But an obstacle implementing this today is that we have many places throughout the compiler today where it's expected to be able to mutate the internal structures referred to by Type (see as_class_mut, as_struct_mut, and update_known_inferences as examples). Rc only makes its contents readable, so we'd have to find ways to avoid these mutations. (For example, perhaps a unique StructId is stored inside Type, and the actual contents with a list of mutable fields is stored in a separate side table).
The text was updated successfully, but these errors were encountered:
An option would be to avoid pointers and instead use an ID to reference the data. Normally this isn't a super safe idea but luckily compilers have very static(-ish) lifetimes.
The simplest approach would be some arena built around storing items in a list. See id-arena for a nice example of that. It's both very simple and fast, although it still has the downside that individual elements can't be freed. That's where things get complicated.
use slotmap::{new_key_type,SlotMap};new_key_type!{pubstructTypeId;}#[derive(Debug,Eq,PartialEq)]pubenumType{String,Number,Option(TypeId),}fnmain(){// Create a new arena and keep it alive throughout compilationletmut types = SlotMap::<TypeId,Type>::with_key();// Allocate typeslet number = types.insert(Type::Number);let optional_number = types.insert(Type::Option(number));// Sometime later, retrieve an `Type` from the arenaifletType::Option(mut outer_type) = types[optional_number]{// The types are mutable herematch&mut types[outer_type]{Type::Number => println!("Number"),
_ => println!("Not a number"),}}}
Sharing some thoughts / notes on a possible code smell in our compiler architecture.
Our compiler stores type information internally using the
Type
enum data structure. Throughout the compiler, we reference and pass around theseType
structures (as well as some other data structures, likeSymbolEnv
andNamespace
) fairly freely using raw pointers throughout the compiler. To facilitate this, we use a custom type namedUnsafeRef
:wing/packages/@winglang/wingc/src/type_check.rs
Lines 57 to 83 in 9176564
TypeRef
is defined as an alias toUnsafeRef<Type>
.The unsafe implementation of the
Deref
trait forUnsafeRef
allows anyone to dereference aTypeRef
value (e.g.*some_typeref
) to access the underlying type. This pattern is generally "unsafe" in Rust since it's possible to write code like this:In the code above, we create an UnsafeRef using a pointer to a local variable.
t
goes out of scope at the end of the function, so the pointer is no longer valid, and accessing it would be a dangling pointer dereference, causing undefined behavior.The way we avoid this class of issue today is by convention always creating
TypeRef
objects through a helper method on theTypes
singleton struct. The helper method usesBox
to store each Type object on the heap, and then stores all of theseBox
objects within aVec
field on theTypes
struct. AnUnsafeRef
object with a pointer to the value on the heap is returned. We also ensure that theTypes
singleton is always kept in scope through the entire life of the compilation or language server - we don't let it go out of scope. By pointing to objects on the heap that we never deallocate, we've guaranteed that all of these raw pointers will always continue to be valid.The downsides of this approach, besides using
unsafe
, include:Array<str>
modeled in a user program. Interning would allow these to be de-duplicated.)Pragmatically, this isn't a pressing issue for the compiler today, but it's worth keeping an eye on.
One alternative approach could be to use
Rc<Type>
instead ofUnsafeRef<Type>
.Rc
reference counts usages of the pointer so that the underlying Type would be automatically deallocated when it reaches 0. But an obstacle implementing this today is that we have many places throughout the compiler today where it's expected to be able to mutate the internal structures referred to byType
(seeas_class_mut
,as_struct_mut
, andupdate_known_inferences
as examples).Rc
only makes its contents readable, so we'd have to find ways to avoid these mutations. (For example, perhaps a unique StructId is stored inside Type, and the actual contents with a list of mutable fields is stored in a separate side table).The text was updated successfully, but these errors were encountered: