Add new builtin: @typeId #19858
Comments
One small request: it would be really nice if this returned |
It could also take inspiration from the |
There is actually an at-comptime solution:
fn typeId(comptime T: type) u32 {
    return @intFromError(@field(anyerror, @typeName(T)));
}
Bad status quo solutions have helped back changes such as this one before, so I just wanted to share. :) |
@sno2 Using RLS here is a bit tricky. There are two options:
If this proposal is accepted, I definitely think the returned integer should have a fixed size (probably 32 bits). 32 bits is a sweet spot: 16 is too few to represent the number of types which might exist in a large application, but 64 (or usize) is very excessive (in fact, the canonical compiler implementation won't let you have more than 2^32 distinct types today, and I am beyond certain nobody will ever hit this limitation). |
I was more thinking of the compiler counting how many types we have defined and log2 that into an integer type as the Also, |
I can't think of a use case I would ever use this builtin for (it's a bit at odds with my fundamental design philosophy), but for everyone here who seems to have use cases:
That multiplies into a really big number. I would expect many use cases actually only use this builtin when serializing a rather small set of types (and maybe their fields' types, recursively) over particular interfaces. (Then again, maybe this is more of an ergonomics feature than a performance-oriented one? |
Could someone give a solid use case for this? I can't think of one, and I have never come across a situation where I needed this even remotely. |
The one reason I've wanted it in the past is for safety on
const AnyPtr = struct {
    type_id: usize, // alternatively, [*:0]u8
    ptr: *anyopaque,

    // Erase the pointee type, remembering its type ID.
    pub fn from(comptime T: type, ptr: *T) AnyPtr {
        return .{
            .type_id = @typeId(T), // alternatively @typeName(T).ptr
            .ptr = @ptrCast(@alignCast(ptr)),
        };
    }

    // Recover the typed pointer; a mismatched type is a programmer error.
    pub fn readAs(item: AnyPtr, comptime T: type) *T {
        if (item.type_id != @typeId(T)) unreachable; // alternatively `item.type_id != @typeName(T).ptr`
        return @ptrCast(@alignCast(item.ptr));
    }
}; |
Basically type checking (see linked any-pointer project) when doing type erasure, then see #19859 where you need to store a user-defined type in a non-generic datastructure (think |
As @MasterQ32 indicated in the original issue, his Of course, all of these issues can be solved with userspace hacks but:
To not use any hacks while obtaining unique type identifiers, you can do something like this, but:
In my opinion, any sort of RTTI-ish solution would greatly benefit from this builtin. I imagine Felix sees it the same way, thus why he opened this issue. About implementation details @rohlem, check out my PR to see how easy it is to implement from the InternPool. In short, the InternPool stores types (and other deduplication-dependent data like default values, memoized calls, etc. though this is not important for this explanation) by inserting them into a |
This made me wonder about how the technical implementation would solve something like this (I'm pretending like
fn SelfReferentialStruct(comptime T: type) type {
    return struct {
        const Self = @This();
        const array: [@typeId(Self)]u32 = undefined;
    };
}
Edit: Never mind. I just checked and there already is a check for similar transitive failures in the compiler. :) |
Am I correct that the idea is basically to split pointer to If that's correct, that's a very interesting feature. But I would like to extend it even further. If it's stored separately, we can save this information to disk and restore it later, but only if we have a stability guarantee that holds beyond a single build. So I'd like this feature to be taken into account with this proposal too. |
After sharing my first terrible
const std = @import("std");

pub fn typeId(comptime T: type) u32 {
    _ = T;
    const fn_name = @src().fn_name;
    return std.fmt.parseInt(u32, fn_name[std.mem.lastIndexOfScalar(u8, fn_name, '_').? + 1 ..], 10) catch unreachable;
}
This one even exposes the |
This is not practical or even really possible. You can already assign explicit IDs to types manually, through a variety of methods, which is a much better option for serialization usecases. |
I don't necessarily care about having an ID for every single type in the program, because there are also going to be lots of comptime utility types for which it's not necessarily helpful to have an ID for. And I definitely don't need IDs for almost all of std. For the specific examples you cited, yes I would want all power-of-two sized integer types signed and unsigned (and some non-power-of-two, but not the whole 64k spectrum, that'd be a little crazy). Var and const pointers and slices for any used type, yes, arrays of varying lengths yes, optionals definitely. A u8 is definitely not enough to cover my needs, I can easily see needing at least a few thousand IDs to cover all the type variations (working from an "unadorned" set of 200-300 declared types). These would mainly be keys in tables, so even if they were pointer-sized that'd be fine by me. If I want a compressed type ID for sending over the wire or for cutting down on memory usage, I can do that in userspace on top of a builtin
This is definitely about ergonomics and "work scaling" (as in, scaling how many people are working on a codebase and reducing LOC needed to implement functionality) for me. A chonky type ID is the cost of doing business, so if it's straightforward to just use an ID from the InternPool and it's always u32 or u64 or w.e., that totally serves my needs. Using something abnormally large like u128 seems excessive (is Rust just using UUIDs? mehhhh) but I'd learn to live with it. The biggest deal to me is ensuring it's consistent between comptime and runtime. With the current method of taking the address of storage you have to add a wrapper function to not mess that up, it can be tricky if you're not just directly using someone else's userland |
For ref, here's how I handle keeping runtime and comptime consistent (the type id stays a pointer, and I have an extra enum type to convert to/from for smuggling purposes):
pub const TypeID = *const Type;
pub const TypeIntID = enum(usize) {
    invalid = 0,
    _,

    pub fn from(tid: TypeID) @This() {
        return @enumFromInt(@intFromPtr(tid));
    }

    pub fn toTypeID(self: @This()) TypeID {
        return @ptrFromInt(@intFromEnum(self));
    }
};
pub const Type = opaque {
    pub const id: fn (comptime T: type) TypeID = struct {
        inline fn tid(comptime T: type) TypeID {
            const TypeIDSlot = struct {
                var slot: u8 = undefined;
                comptime {
                    _ = T;
                }
            };
            return @ptrCast(&TypeIDSlot.slot);
        }

        fn typeID(comptime T: type) TypeID {
            return comptime tid(T);
        }
    }.typeID;

    pub fn toIntId(self: *const Type) TypeIntID {
        return TypeIntID.from(self);
    }
};
I've been using it for a while, I think InK or Vexu helped me arrive at this based on ikskuh's typeId. |
Is the intention that the returned value is comptime or runtime? If the former, this essentially introduces nondeterminism into the type system. I'd suggest making the ID at least stable between compilations of the same source code, to ensure reproducibility of builds, and not forcibly requiring randomness here for debugging purposes. |
But what about microcontrollers, for example 8 bit? usize is universal. |
Ehhhh I'd be careful about the absolute language here. 64bit vs 32bit identifiers can totally show up in memory profiles depending on how they're used. Common memory optimization pattern: given two 32bit identifiers comprising some sort of combined category+id tag value, if you are reasonably certain you can mask & shift them onto a single 32bit field, and you have tens of thousands of instances of structs containing one or more of these, you're looking at multiple megabytes of memory saved. I care about saving multiple megabytes, and I'm not even writing for microcontrollers. Sometimes you need to save memory, and good targets aren't even the things that take up the most memory overall, but can give you a short enough haircut to meet your target without having to suffer other tradeoffs. If we know the type ID value space is dense and predictably fills up from the bottom, those kinds of optimizations are possible even if the width is usize. But I'd be inclined to treat the value space here as a black box unless somebody on the core team says otherwise. |
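The category+id packing described in the comment above can be sketched in Zig as follows. This is a minimal illustration, not something from the thread: the name `Tag`, the helper functions, and the 8-bit/24-bit split are all assumptions, and a packed struct stands in for hand-written mask and shift operations.

```zig
const std = @import("std");

// Hypothetical layout: an 8-bit category and a 24-bit ID share one u32.
// In a packed struct the first field occupies the least significant bits.
const Tag = packed struct(u32) {
    id: u24,
    category: u8,
};

// Pack both values into a single 32-bit word.
fn pack(category: u8, id: u24) u32 {
    return @bitCast(Tag{ .id = id, .category = category });
}

// Split a 32-bit word back into its two fields.
fn unpack(raw: u32) Tag {
    return @bitCast(raw);
}
```

Storing one such `u32` instead of two separate 32-bit fields halves the space those identifiers take in every struct instance, which is only possible when the ID is known to fit in the narrower field.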
|
After some extra thinking, I'm completely against possible nondeterminism at compile time. For a runtime value I think that it's fine provided that the result is stable across compiler runs (ideally even across different computers). It would be fine if the ID can be spec'd somehow, but that would probably be quite hard / can be done just as well in user code. I suggest making the return value runtime instead to avoid that issue. |
@Snektron Making the return value of this builtin runtime-only would hinder major use cases for this feature which are currently already possible via userland hacks. My need for this feature is to be able to store type IDs at comptime while building lookup structures and then use those values later to cross-reference types and as keys in hashtables. Ideally I'd also like to be able to create lookup tables at comptime. If this restriction could be: the actual typeID values can be stored and compared for equality but are otherwise opaque until runtime, that's a restriction I could live with (that's the status quo for my current typeID hack), provided the stored typeIDs are fixed up in the final stage of compilation so that typeID equality stays consistent between comptime and runtime. It's easy to break that property currently with the userland hacks if you aren't careful. I can understand not wanting non-deterministic compilation. Where would the non-determinism come from when allowing these values to exist at comptime? |
If the value is deterministic across runs, compiler versions, etc., I don't have a problem with it. In Auguste's proof of concept, the value is derived from a compiler-internal type database key, and depends on the order in which the compiler processes the code. This is an implementation detail. My main concern is that when the compiler is parallelized further, this processing becomes unstable. @ikskuh pointed out to me that intFromError has the same issue (and this is also what makes the current hack work), and I think that's a problem too. |
That makes sense. I guess incremental compilation would also introduce non-determinism. I wouldn't really care about IDs shifting around between subsequent builds of the same source, but like I said I can see how that's a desirable property. I wonder how many uses of What about the constraint I mentioned: typeIDs can be checked for equality at comptime, but otherwise can't be observed or compared--including logging out the std.fmt representation, which could just be an opaque thing like That would be a big loss since it would mean we can't hash them at comptime, but it would at least provide a "blessed" version of the current hacks which wouldn't be at risk of breaking since it would be an explicit language feature. |
I really love that. It has the best of both worlds:
|
|
I need to compare and store those identifiers into const decls at comptime and then compare them later at runtime, so comparing type equality doesn't work. I need a consistent correlation that goes across the comptime/runtime boundary. |
At first sight Rust's approach of using 1.
|
Generating type ids using hash functions is definitely the way to go, because we get a stable interface between platforms. So, for example, if you export your library as a dynamic one, you can make an interface that provides a type id, and these type ids will match between different libraries. Here is the code I used:
const std = @import("std");

fn typeId(comptime T: type) u128 {
    const Type = struct {
        // Two 64-bit hashes of the type name with different seeds, joined into one u128.
        const id: u128 = result: {
            const a = std.hash.Wyhash.hash(3832269059401244599, @typeName(T));
            const b = std.hash.Wyhash.hash(5919152850572607287, @typeName(T));
            break :result @bitCast([2]u64{ a, b });
        };
    };
    return Type.id;
} |
That code isn't portable between projects, as |
It doesn't depend on the structure of your project, but on how other developers have structured their libraries.
const Test1 = struct {
    value: u64,
    pub fn doSomething1() void {}
};
const Test2 = struct {
    value: u64,
    pub fn doSomething2() void {}
};
These types have the same structure, so by hashing the type representation we would get the same hash, but these are different types. |
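The point that two structs with identical fields are still different types can be checked directly at comptime. The following is a small sketch that redeclares the two structs from the comment above and adds only the assertions:

```zig
const std = @import("std");

const Test1 = struct {
    value: u64,
    pub fn doSomething1() void {}
};
const Test2 = struct {
    value: u64,
    pub fn doSomething2() void {}
};

comptime {
    // Same field layout, so size and alignment agree...
    std.debug.assert(@sizeOf(Test1) == @sizeOf(Test2));
    std.debug.assert(@alignOf(Test1) == @alignOf(Test2));
    // ...but the compiler treats them as distinct nominal types.
    std.debug.assert(Test1 != Test2);
}
```

Any ID scheme derived purely from field layout would therefore assign these two types the same value, which is the collision the comment warns about.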
Using a hash of the Imagine how many conflicts with Additionally, there may be library name conflicts, which may be resolved by giving dependencies or even the module imports different names in your build.zig.zon and your build.zig, so perfect cross-library type sharing can't be achieved solely through The use cases I have require reliably unique type IDs within a single compilation unit independent of file structure or build configuration. I also do use some I think the most reliable way forward for type compatibility/reconciliation across artifacts like you're looking for is probably a convention where you put some kind of UUID pub decl in the type or an exported function that returns a type's cross-lib ID or lets you look up cross-lib IDs (like COM but less irritating), and then that can be associated with the value produced from the proposed |
Another way of generating type id is by hashing the representation of a type But I would argue that this is not the way to create actual type ids, because for example creating structs with the same field layout but different names and functions would give the same type id. On the other hand, if we had combined the hash of a library with the hash of a struct and its relative path to the root file, we would get ids that are more unique in a sense. Another way to generate type ids is to keep track of the number of types currently being created by a compiler, I would call this a local id because it's local to your project and would have a different id in another project. This would not work for cross-library communication, and I would argue that you don't really need it. Instead, we can simply create an enum for the types we want to identify at runtime, and have it stored somewhere in struct. Or, if we want to automatically generate such ids during comptime, we would need global comptime variables to keep track of the ids (which we currently do not have). Why To create
They've never actually used field types to generate their type ids. We can test this by creating a struct, printing its type id, then changing the type of any field and printing the type id again, we should get the same id between compilations. |
After working on a codebase which necessitates local type IDs in the range of 500-1000+ distinct types with IDs, I have to strongly disagree that there's no value in the local type IDs. Local IDs are what the proposal here was explicitly discussing in the first place. If it's under 100 types I'd agree a manually maintained enum is good enough, but any more than that you are going to be more prone to maintainability issues, especially on larger teams. I agree checking the structural compatibility of types with hashing is not really the desire here, although it's useful on its own for different purposes like caching. Again, just to be clear: the scope initially presented here is explicitly not globally unique cross-build immutable type identifiers a la COM. The desire here was for a local type ID for uniquely identifying types within a single compilation unit, and for possibly building cross-compilation-unit runtime type identifiers if you combine it with an identifier for the compilation unit a type originates from. Given that Rust's attempts to globally uniquely identify every nominal type still results in some conflicts and associated hand wringing, it seems to me that it's probably not worth pursuing vs local type IDs. |
If we had global comptime variables, this wouldn't be an issue. We don't need to count all of the types to identify the specific types we use. If you're talking about ecs, this is probably also the case, in ecs we don't have to worry about types we don't use inside ecs, such as std types or other types outside the project. It's just that managing them via enum is not possible. |
you're not getting comptime variables |
Not quite truncating, but deeply-nested values within type names are replaced with Never rely on the output of
Yeah, no. These are rejected for a reason -- actually, for a lot of reasons. We're not sacrificing parallel compilation, incremental compilation, and a simple language specification, so that you can write messy and unintuitive comptime logic.
The draft PR implementation essentially does just that; and really, this or something like it is the only sane approach. Such a solution is necessarily implementation-defined, non-deterministic in a parallelized compiler, and unstable in an incremental compiler. If As @Snektron has said, we cannot have nondeterminism at So, if
However, I would like to pose a question to those who want this proposal. The status-quo solution in the original issue is almost sufficient; the issue is that it doesn't work at comptime. Well, since the type ID should always be an opaque value anyway, why not just use... an actual pointer? For instance:
const std = @import("std");

const TypeId = *const struct {
    _: u8,
};

pub inline fn typeId(comptime T: type) TypeId {
    return &struct {
        comptime {
            _ = T;
        }
        var id: @typeInfo(TypeId).pointer.child = undefined;
    }.id;
}

pub fn main() !void {
    @compileLog(typeId(u8) == typeId(u8)); // comptime-known
    @compileLog(typeId(u8) == typeId(u16)); // and works correctly
    @compileLog(typeId(u16) == typeId(u16));
    passAtRuntime(typeId(u8), typeId(u16));
}

fn passAtRuntime(a: TypeId, b: TypeId) void { // and you can pass them at runtime too
    _ = a;
    _ = b;
} |
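As an illustration of how a pointer-based type ID could back the type-erased pointer use case raised earlier in the thread, here is a minimal sketch. It is not taken from any comment: the names `Slot` and `AnyPtr.readAs` returning an optional are assumptions made for this example, and a named `Slot` struct replaces the `@typeInfo` lookup so the sketch does not depend on the field naming of a particular Zig release.

```zig
const std = @import("std");

// One byte of storage per type; its address serves as the opaque ID.
const Slot = struct { _: u8 };
const TypeId = *const Slot;

inline fn typeId(comptime T: type) TypeId {
    return &struct {
        comptime {
            _ = T; // capture T so each type gets its own struct and slot
        }
        var id: Slot = undefined;
    }.id;
}

// Hypothetical type-erased pointer that remembers which type it came from.
const AnyPtr = struct {
    type_id: TypeId,
    ptr: *anyopaque,

    fn from(comptime T: type, ptr: *T) AnyPtr {
        return .{ .type_id = typeId(T), .ptr = ptr };
    }

    // Returns null instead of invoking illegal behavior on a type mismatch.
    fn readAs(self: AnyPtr, comptime T: type) ?*T {
        if (self.type_id != typeId(T)) return null;
        const typed: *T = @ptrCast(@alignCast(self.ptr));
        return typed;
    }
};
```

Because the ID is a pointer, equality checks work both at comptime and at runtime, but the numeric value is left to the linker and should be treated as opaque.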
I've been using this for a few years now. It feels like an unstable hack, but for platforms I care about, this is ultimately fine for my use cases so long as it keeps working and isn't later regressed to make the compiler go faster or something. The appeal of a Reposting edited points from discord for posterity:
And to further elaborate on why this is a tool worth having: Wanting some IDs to associate with types is a commonly helpful thing in large applications where part of the work you do is adding many many new types during development (think 100+ programmers all adding and extending new types every few months). Having some sort of type identifier to associate different type-related functions/data with some reified reflection data is helpful: serializers, editor drawers, subtypes of objects in undo/redo stacks. All my examples are game engine & tooling focused since that's my domain, but surely other major desktop applications in which users author complex data have similar problems and solutions. There is a lot of overlap there. Prior art in large games often revolves around building large preprocessing tools with macro tricks or IDLs or libclang-based tooling (e.g. Unreal Header Tool or clReflect). Having used a variant of that userland typeId in a Zig project combined with the language-native reflection functionality for the past few years, I've yet to see the need to introduce a preprocessing step for this purpose. This is a huge benefit of the language. Yes, this is like having an enum where you write all the types you care about, except the point is to not have to maintain an enum with 500-1000+ fields that's constantly changing. I've had enough problems with hand-maintained type id enums with fewer than 100 fields. |
This won't work as soon as you want to have no generics and using foreign packages that need to handle
@mlugg the problem with this solution is that it takes up space in If we chose to put an implementation into stdlib (which is what I'd say), we should modify it such that we store the |
Add a new builtin called @typeId.
This builtin returns a unique integer for each type passed, and will return the same integer for the same type.
The return value need not be consistent between builds, so a second build might return completely different numbers for the same types.
An alternative variant might return u32 or u64 to have a stable interface between different platforms.
Use cases
Prior art:
User-land implementation
The following version is runtime only, as we can't perform intFromPtr at compiletime: