[API Proposal]: Uuid data type #86084
Tagging subscribers to this area: @dotnet/area-system-runtime
Issue Details
Background and motivation
There are 2 ways to organize the binary representation of Uuid:
Currently, the .NET Base Class Library only includes System.Guid. This is a data structure that implements the second method of binary representation. If you use From this, it follows that calling the constructor with a byte array or with a string also produces different results. This can lead to situations where, for example, the log records the string representation, but the database stores the binary representation. And if you decide to find an object whose identifier you saw in the logs, you need to perform the same conversion that is done inside The above examples demonstrate the difficulties in working with That's why I suggest adding a data structure called Uuid.
API Proposal
namespace System
{
[StructLayout(LayoutKind.Sequential)]
public readonly struct Uuid
: ISpanFormattable,
IComparable,
IComparable<Uuid>,
IEquatable<Uuid>,
ISpanParsable<Uuid>,
IUtf8SpanFormattable
{
public static readonly Uuid Empty;
public Uuid(byte[] b)
public Uuid(ReadOnlySpan<byte> b)
public Uuid(string u)
public static Uuid Parse(string input)
public static Uuid Parse(ReadOnlySpan<char> input)
public static bool TryParse([NotNullWhen(true)] string? input, out Uuid result)
public static bool TryParse(ReadOnlySpan<char> input, out Uuid result)
public byte[] ToByteArray()
public bool TryWriteBytes(Span<byte> destination)
}
}
And also all interface methods, comparison operators, and equality operators.
API Usage
var input = new byte[]
{
0x00, 0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77,
0x88, 0x99, 0xAA, 0xBB, 0xCC, 0xDD, 0xEE, 0xFF
};
var uuid = new Uuid(input);
var uuidBytes = uuid.ToByteArray(); // output: 0x00,0x11,0x22,0x33,0x44,0x55,0x66,0x77,0x88,0x99,0xAA,0xBB,0xCC,0xDD,0xEE,0xFF
var uuidString = uuid.ToString(); // output: 00112233445566778899aabbccddeeff
Alternative Designs
No response
Risks
No response
|
Why can't/shouldn't this just be an explicit set of APIs on Guid? This feels like adding a nearly identical type for slightly different semantics that really only matter at the serialization/deserialization boundaries. |
Great question! That's exactly where the problem lies! For instance, the deserializer can parse a
Depending on the way the value was deserialized, the contents of the
For instance, if I need to use such a value as a parameter in an SQL query, I must also know how the database driver converts a An example of this is the MySqlConnector, which has a setting in the ConnectionString that can take 7 (seven!) different values. Depending on how serialization and deserialization are actually implemented, I as a developer am obliged to use different values of this parameter. When using Guid as a primary key in a table in MySQL, the byte order in such a column is extremely important. The first 8 bytes must be reversed time-low and time-high parts of UUIDv1, which ensures the monotonic increase of the primary key. To achieve this, MySQL 8.0 even added a special function called UUID_TO_BIN. If you work with a table containing tens of billions of records and start sending Guid bytes in the wrong order, it would be faster to buy a new server and deploy a backup of the database on it, rather than waiting for the BTREE to be rebuilt whose balance was disrupted because some serializer started deserializing Guid differently. The situation becomes even worse if the service uses several different serializers that handle By using the proposed data structure with the same string and binary representation, it is impossible to make a mistake because regardless of what is input - hexadecimal string or Span/byte array - the output will be the same. In the scenario described above, using UUID eliminates the possibility of doing something wrong. |
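To make the byte-order concern above concrete, here is a minimal Python sketch of the rearrangement that MySQL's UUID_TO_BIN performs when its swap flag is set to 1. The helper name uuid_to_bin_swapped is hypothetical and the sample value is arbitrary; the sketch only illustrates the layout and is not a database driver.

```python
import uuid


def uuid_to_bin_swapped(u: uuid.UUID) -> bytes:
    """Mimic UUID_TO_BIN(u, 1): time-high first, then time-mid, then time-low."""
    b = u.bytes  # 16 bytes in the order the string spells them out
    return b[6:8] + b[4:6] + b[0:4] + b[8:]


sample = uuid.UUID("6ccd780c-baba-1026-9564-5b8c656024db")
swapped = uuid_to_bin_swapped(sample)
print(swapped.hex())  # 1026baba6ccd780c95645b8c656024db
```

The rearrangement is only meaningful if the 16 input bytes are already in the order the string shows, which is why a silently different byte order upstream changes the key that ends up in the index.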
How will On top of that we have like 100 different data readers and writers, 10-15 years old, and proprietary formats that read Guid either big endian, or little endian, or component wise and each component big- or little-endian ... in short it's a mess. So, I understand it can be problematic. I lost a lot of hair over it. Usually, I just treat it as a 128-bit number and don't even bother with its representation. It's a number, nothing else. Nowadays I usually use MemoryMarshal.Cast to directly pull the value as a 128-bit numeric value, and keep it there. But, how will Uuid solve all this? The way Guid is serialized is fixed and defined in dotnet. It doesn't affect what other languages, frameworks, systems do. It also adjusts for system endianness to get the same result on big- and little-endian machines within the dotnet ecosystem. But overall, it is not uniquely defined how such a 128-bit number is represented as a string. At most, it is recommended.
Yes, that is correct. And if you use another database driver, the expected string representation will be different. Then you would need UuidEx and UuidExEx and Uuid3 and so on. Maya uses the first dword as little-endian, the 2nd and 3rd word as big-endian, and the final 12-bytes as 3x dword little-endian. Houdini has the whole 128-bit value swapped (full big-endian). There are 100, if not 1000 different ways to present that 128-bit value as a string. And rest assured, there are at least a dozen software implementations out in the world for each combinatorial serialization. Also, in many scenarios, a generated Guid/Uuid must be sent through a seeded hasher. It cannot simply contain a timestamp or (as Microsoft once did many decades ago) contain the MAC address of a physical network card or any other identifiable characteristics. It's a security and privacy risk. |
See also #53354 |
The data structure described in this API Proposal is simply a container for data. Its API implies working in a 'what came in is what came out' format, regardless of whether binary data or strings were provided as input. In this case, you can be guaranteed that the original byte order will always be preserved, regardless of how this data structure was constructed.
This is a great abstraction when it doesn't matter how exactly the content of Guid is represented. Or when the software is completely controlled by one team. If you alone own all the data throughout its lifecycle - then you control everything. But that's not always the case.
Yes, but the problem is that as a developer, I need to know exactly how (de)serialization occurred. Because depending on the chosen way of (de)serialization of Guid, the data in it may differ.
No, I won't need another data type, since the proposed API, as I mentioned above, works on the principle of "what goes in, comes out." This provides the ultimate opportunity to construct such a data structure from both binary and string representations without worrying about what transformations are happening and where they are happening. Because they simply won't happen.
Uuid implies the interpretation of input and output data as a sequence of 16 bytes. This is exactly what is written in RFC4122. The way the contents of these 16 bytes can be interpreted by specific software, either as int/short/short/byte/byte..byte or in some other way, and whether to use big-endian or little-endian, is not important for Uuids. It is simply a container with data. A black box that will deliver data from the sender to the receiver intact and unaltered, without interfering with the contents. Because Guid is not such a black box. Guid is a box that needs to be packed and unpacked from the same side if there is a need to get the same data that was originally packed into it.
I'm not suggesting any specific way to generate Uuid content. This is not in the API Proposal. I'm suggesting adding Uuid as a container for data that has the property of "what goes in, comes out". The .NET community is able to write generators on their own or use existing ones (which many probably already have). |
@huoyaoyuan Thank you for the link! The current API Proposal may be a way to solve the problem described there. |
@vanbukin .... I'm sorry, I still really don't understand... for me System.Guid is also just a data container that stores 16-bytes of data in some way that has no real relevance outside. A new Uuid in effect would use Int128 (or sixteen Int8), instead of one Int32, two Int16 and eight Int8. But today it's not possible to access these internal fields in Guid anyway. It's a black box. If there weren't the problem of legacy binary serialization that addresses internal fields by name, Guid would use an Int128 as internal storage field today.
So does Guid. It is stored in internal fields that are of different size but that is internal logic not exposed to consumers. A round trip from a blob (binary large object) to Guid back to blob results in the same sequence of bytes. The issue is the Guid to String representation. But that is done by different systems and different frameworks in different ways, fully unrelated to dotnet. the byte sequence 11 22 33 44 55 66 77 88 99 00 AA BB CC DD EE FF
I just don't understand how this new Uuid struct would solve or address the different ways of presenting the byte sequence as string (make it human readable). Or how it would prevent human error. If any database is storing Guid/Uuid in its own mangled binary format, and the command issued is text based via SQL, how does the new Uuid class know how to format that string to prevent human error? I do understand tho that Uuid would represent the string in a different way, but that could also be achieved by adding a different ToString() format specifier (same way DateTime formats time, RFC3339). |
This is not just a container, it is a container where the same value can only be obtained if it is extracted in the same way it was placed there. For example, the result of the ToString call will be written to the logs. When deserializing JSON, the property containing the Guid will be parsed as a string value. In this scenario, we construct a Guid based on the string from JSON and serialize it back to a string representation when writing to logs. Therefore, the value we passed in JSON and in the logs will be the same. But if we access the database, read 16 bytes from there, put them into the Guid constructor, and write such a Guid to the log, or take the Guid obtained from JSON and write it to the database in binary format, the values will start to differ. As a result, a value appears in the logs, but the exact same value cannot be found in the database. This can be overcome by using a roundtrip, which will rearrange the bytes inside the Guid in such a way as to match either the string or binary representation of the source. Earlier I mentioned how this situation becomes worse when you have both string and binary representations as your data source.
That's where the problem lies. The string and binary representations differ from each other. And not only on output, but also on input. It would be helpful to have a built-in data type in .NET where these representations match. And that's exactly what I suggest adding to this API proposal.
Having such a data structure, you don't have to worry about which representation to take as the source of truth - the string or binary one.
The issue is not only with ToString, but also with TryParse. |
Let's take a look at Java and invoke both constructors (from string and from byte array) of a similar data structure:
import java.lang.reflect.Constructor;
import java.lang.reflect.InvocationTargetException;
import java.util.HexFormat;
import java.util.UUID;
public final class Main {
public static void main(String[] args) throws InvocationTargetException, InstantiationException, IllegalAccessException, NoSuchMethodException {
Constructor<UUID> constructor = UUID.class.getDeclaredConstructor(byte[].class);
constructor.setAccessible(true);
UUID a = UUID.fromString("00112233-4455-6677-8899-AABBCCDDEEFF");
UUID b = (UUID)constructor.newInstance(HexFormat.of().parseHex("00112233445566778899AABBCCDDEEFF"));
System.out.println(a);
System.out.println(b);
}
}
To build and run:
javac Main.java
java --add-opens=java.base/java.util=ALL-UNNAMED Main
And as a result, we will get the following output.
Regardless of whether the input is a byte array or a hexadecimal string representing those bytes, the output will always be the same. |
Similar situation also exists in GoLang. Regardless of what was entered - bytes or a string with their hexadecimal representation - the output is the same. |
Similarly, in Python 3:
import uuid
a = uuid.UUID('00112233445566778899AABBCCDDEEFF')
b = uuid.UUID(bytes=b'\x00\x11\x22\x33\x44\x55\x66\x77\x88\x99\xAA\xBB\xCC\xDD\xEE\xFF')
print(a)
print(b)
We will get in the console:
00112233-4455-6677-8899-aabbccddeeff
00112233-4455-6677-8899-aabbccddeeff
If we want to get behavior similar to how Guid behaves by default, then we must explicitly specify bytes_le.
Then the console output will change to the following:
00112233-4455-6677-8899-aabbccddeeff
33221100-5544-7766-8899-aabbccddeeff |
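The bytes_le variant mentioned above is not shown in the thread. A minimal sketch of what such a comparison might look like, using only the standard uuid module, is given below; the variable names are illustrative and not taken from the original comment.

```python
import uuid

raw = bytes.fromhex("00112233445566778899aabbccddeeff")

# bytes= keeps the 16 bytes in the order they were given.
a = uuid.UUID(bytes=raw)

# bytes_le= treats the first three fields as little-endian,
# which is how System.Guid reads a byte array.
b = uuid.UUID(bytes_le=raw)

print(a)  # 00112233-4455-6677-8899-aabbccddeeff
print(b)  # 33221100-5544-7766-8899-aabbccddeeff
```

The same 16 input bytes yield two different identifiers, and the mixed-endian reading has to be requested explicitly.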
It looks like there is a common use of a data structure that works on the principle of "what goes in is what comes out" everywhere, but in .NET there is only Guid, which was originally a structure necessary for interop with COM/OLE/WinAPI, but due to the lack of alternatives, has become widely used. |
There are multiple variants, multiple versions, and ultimately multiple layouts for the different The fundamental issue called out here is effectively in how For The APIs that care about the difference in how the raw bytes are represented are largely things that are interoperating outside of .NET and thus are being used as a form of serialization/deserialization. Such APIs already must consider that the bytes might be interpreted as a different format on the consumer side and so must already take into account things like endianness. The same is true for all primitives, for complex structures where padding may differ, endianness, etc. Given that, this would be solvable in the same way we already solve it for the primitive types or other types such as I do not see the need to introduce an entirely new type just to handle this minor difference in endianness that is only relevant when serializing/deserializing raw bytes. We do not do this for any other type and it is a non-issue for other types in general. I would be fine with simultaneously proposing the obsoletion of |
That is precisely why I am proposing a data structure that effectively serves as a container for data, functioning on the principle of "what goes in is what comes out." The interpretation of the values will remain on the side that accepts such values. It is for this very reason that this API Proposal does not include APIs specific to any of the Uuid variants described in any of the RFCs. And for the same reason, I am not proposing to add any algorithms for generating any of the variants or verifying that the data contained in the Uuid is Uuidv1, Uuidv4, or anything else.
However, modern development involves the integration of various components with each other. There are not many applications where absolutely everything is written only using .NET - without using databases, without using third-party native libraries (or wrappers around them), without integration with any third-party services, as well as without using RPC or something similar.
Indeed, the data may exist in binary form - in files, databases, or base64 strings. The presence of a constructor that accepts an array of bytes as a parameter or a method for converting the content to an array of bytes is convenient, appropriate, and necessary in such scenarios. The data source is not always a string. Also, the data source is not always an array of bytes. That is why I suggest that regardless of whether the source was a set of bytes or a hexadecimal string representing their value, the resulting binary or string representation should match the original data regardless of the data source.
This does not negate the need for me as a developer to know exactly how the structure was constructed - whether through string parsing or through a constructor that accepts an array of bytes. I don't even want to think about it. But I am forced to do so because of the way the Guid API works. However, I understand that there is an enormous amount of software out there, and therefore breaking the API is not an option. That is why I suggest adding a new data structure.
Any integration with an external component always involves (de)serialization. .NET does not exist in a vacuum - it integrates with countless other components in a huge number of different ways. Adding an API that explicitly specifies the byte order used implies that the authors of libraries in the .NET ecosystem will need to provide an API for controlling this behavior. However, whether they will provide it or choose a standard value without the ability to change it is entirely up to them. After all, this is a task that requires effort and time, and not everyone may be willing to do it. Having a data container that works on the principle of "what goes in is what comes out" does not require any alternative ways of working with it. If you need to shuffle the bytes, you can do so in a pre-allocated buffer. The method of creating such a container is not important - whether it is a hexadecimal string or an array of bytes - the result is always deterministic and the same. |
NB Similar situation in Rust:
use uuid::Uuid;
fn main() -> Result<(), Box<dyn std::error::Error>> {
let bytes = [
0x00, 0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88, 0x99, 0xaa, 0xbb, 0xcc, 0xdd, 0xee,
0xff,
];
let str = "00112233-4455-6677-8899-aabbccddeeff";
let bts = Uuid::from_bytes_ref(&bytes);
let stb = Uuid::parse_str(str)?;
println!("string from bytes:");
println!("{}", bts.hyphenated().to_string());
println!("bytes from string:");
println!("{:X?}", stb.as_bytes());
Ok(())
}
which produces
P.S. Note that all other platforms use UUID designation. If someday .NET provides UUID v5/v6/v7 generation, it will be less surprising to find such methods in |
I now understand what you want to achieve, you want a BigEndian version of Guid. I agree, that is useful.
But this statement doesn't make sense to me, I don't understand what you want to say.
var A = new Guid("11223344-5566-7788-9900-AABBCCDDEEFF").ToString().ToUpperInvariant();
var B = new Guid(new byte[] { 0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88, 0x99, 0x00, 0xAA, 0xBB, 0xCC, 0xDD, 0xEE, 0xFF }).ToByteArray();
var C = new Guid(0x11223344, 0x5566, 0x7788, 0x99, 0x00, 0xAA, 0xBB, 0xCC, 0xDD, 0xEE, 0xFF).ToByteArray();
though... quote "the resulting binary or string representation ... match[es] the original data"
But that's not specific to Guid. That applies to unsigned integers as well:
var A = (uint.TryParse("11223344", out var V) ? V : 0).ToString();
var B = BitConverter.GetBytes(BitConverter.ToUInt32(new byte[] { 0x11, 0x22, 0x33, 0x44 }));
Both round trips work. Still, A and B are different but visually look the same. For numbers, the string representation of a binary data source is arbitrary. It can be decimal, hexadecimal, octal, binary or hexavigesimal (26, a-z). Or Base64. And if someone fails to specify the representation used, it can cause problems anywhere. As with SQL statements when you generate a string this way:
I think the misunderstanding I have is that you believe
is the same; but it just isn't. One is an array of characters, and the other an array of bytes. There are 2 mappings applied; the first is a position-independent byte to double-char mapping, and the second is a position to position mapping. The string length is 32+4 characters, the byte array length is 16. You want a Guid implementation, named Uuid that has a 1-to-1 position mapping, and I agree that is very useful. But, "need for me as a developer to know exactly how the structure was constructed" would not change with Uuid in any way. You always need to know what the definition in any platform is between string and binary. Uuid would also not prevent human error. |
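The unsigned-integer analogy above can be sketched in a few lines of Python. This is only an illustration of the general point that a sequence of bytes has no numeric or textual meaning until a byte order is chosen; the variable names are arbitrary.

```python
raw = bytes([0x11, 0x22, 0x33, 0x44])

# The same four bytes read under two different contracts.
as_big = int.from_bytes(raw, "big")
as_little = int.from_bytes(raw, "little")

print(hex(as_big))     # 0x11223344
print(hex(as_little))  # 0x44332211

# Each reading round-trips cleanly as long as the same order is used both ways.
print(as_big.to_bytes(4, "big") == raw)        # True
print(as_little.to_bytes(4, "little") == raw)  # True
```

Both round trips succeed, yet the two values differ, which is the same situation as a Guid built from bytes versus a Guid built from the hexadecimal string of those bytes.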
This doesn't work because it fails
Yes, and that's all the more reason to not bifurcate the ecosystem with an identical type that only differs in behavior at the serialization boundary.
How the data is represented internally doesn't matter. Just as it does not matter for What matters is that the producer/consumer contract is followed. If the database requires Bytes are reversed to follow the data contract all throughout the computer. This is true for network packets, for interpreting file metadata (ZIP, PE, ELF, JPG, PNG, even UTF-16 text, etc). You must follow the contract at the boundaries, not doing so is a bug. Having the APIs to ensure the data is emitted in the desired format and read in the intended format, regardless of how the data is represented in the type system, is how everything else works. |
Great! I suggest looking at this from a slightly different perspective. I propose considering constructing from a byte array and converting back to a byte array as binary serialization. And constructing from a hexadecimal string and converting back to a string as string serialization.
Now let's take a look at the situation through the eyes of a developer who is not familiar with the nuances of Guid behavior. For example, he needs a UUID because the UUID is used as a primary key in the database he is working with. He goes to Google and finds a recommendation that for working with UUIDs in .NET, use System.Guid. At this point, the association System.Guid == UUID arises in his mind. Okay, let's say he needs binary serialization - he looks into the API documentation and finds the constructor that takes bytes, and the ToByteArray method that returns them. Perfect.
Now let's imagine another developer who needs string serialization. He finds the constructor that takes a hexadecimal string and the ToString method that returns it. Wonderful. And both of them don't have any problems until the first one needs to convert his data to a string, and the second one to a byte array.
But the reason is not that they are doing something wrong, but because Guid does not consider the data it works with as 16 separate octets described in RFC4122, section 4.1. It considers them with respect to its internal structure, which is not 16 separate octets, but rather an int, 2x short, and 8 bytes. As a result, Guid provides an API based on its internal structure. In fact, the constructor that takes bytes expects the bytes to be pre-shuffled - that's exactly what it's designed for. But this is not obvious from the documentation.
This can only be learned by looking at the source code of System.Guid, where it becomes clear that, for example, the constructor performs a reinterpret_cast-like operation on the input byte array using the MemoryMarshal.Read method (but with respect to endianness, with a fallback implementation for big-endian). The ToByteArray method essentially allocates a result array of 16 bytes, performs a similar reinterpret_cast-like operation on it, interpreting it as a Guid, and then copies the value of the current Guid into it. The resulting array is then returned from the method. Essentially, this is a dump of the Guid structure (but with respect to endianness, with a fallback implementation for big-endian).
That means the calling code must take these nuances of Guid's operation into account. You cannot simply take 16 bytes passed from outside and work with them. If you use the constructor from a byte array or conversion to a byte array, you need to prepare the input and output binary data before passing them outside your application.
From all of this it follows that both developers in the example above are mistaken because Guid is not equivalent to Uuid. It cannot be used as a drop-in replacement. It is a data structure designed to solve specific tasks. And that's okay. But what should they do to solve their own tasks? There are several options:
All of the listed options have drawbacks. Option 1 is precisely the place where the human factor can come into play, when someone fails to perform the preliminary conversion and hands over the data 'as is'. Option 2 will only work in small projects where everything is under your control. In large projects where dozens of teams work with one solution, we end up with a hybrid of options 1 and 2, with the drawbacks of the first option. In option 3, instead of 16-byte structures in memory, we store strings of 32 characters or byte arrays that require comparators. In the case of string comparisons, for example, case sensitivity is important. In option 4, our scalability is limited by the extensibility provided by the ecosystem, and the existence of corresponding APIs depends entirely on the willingness of the owners of these projects (for example, the API of the library may prevent the creation of an alternative implementation due to the presence of internal access modifiers on its classes). In this case, it is necessary to communicate with the author of the library, trying to prove the necessity of such an API, and if he agrees, prepare a PR. Alternatively, if time is pressing or the architecture of the library does not allow for implementing the required changes in a reasonable amount of time, a fork must be made, which needs to be built, released, updated, and maintained independently. |
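The "int, 2x short, and 8 bytes" layout described in this comment can be emulated with Python's struct module. The sketch below is an approximation for illustration only: the function names guid_style_string and guid_style_bytes are hypothetical, and the code assumes the little-endian field reading that System.Guid applies to byte arrays.

```python
import struct

LAYOUT = "<IHH8s"  # little-endian uint32, two uint16, then 8 raw bytes


def guid_style_string(data: bytes) -> str:
    """Format 16 bytes the way a Guid built from a byte array would print."""
    a, b, c, d = struct.unpack(LAYOUT, data)
    return f"{a:08x}-{b:04x}-{c:04x}-{d[:2].hex()}-{d[2:].hex()}"


def guid_style_bytes(text: str) -> bytes:
    """Return the bytes a Guid parsed from this string would dump."""
    h = text.replace("-", "")
    fields = (int(h[0:8], 16), int(h[8:12], 16), int(h[12:16], 16))
    return struct.pack(LAYOUT, *fields, bytes.fromhex(h[16:]))


raw = bytes.fromhex("00112233445566778899aabbccddeeff")
print(guid_style_string(raw))
# 33221100-5544-7766-8899-aabbccddeeff
print(guid_style_bytes("00112233-4455-6677-8899-aabbccddeeff").hex())
# 33221100554477668899aabbccddeeff
```

Bytes to bytes and string to string both round-trip, but crossing from one representation to the other reorders the first eight bytes, which is the surprise the two developers in the example run into.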
I'm not sure that it will be useful, but this is a haskell code that has the same behavior:
output: |
Yes. Perhaps I poorly formulated my sentence about the changes. These are popular languages that compete with .NET overall and C# in particular (although I doubt about Haskell). We have, without exaggeration, the best development tools in the world, a powerful ecosystem, and a large community. However, in all these languages, there is a special data type for working with UUID in the standard library (or in the package that is de facto standard for this language), but we do not have one. Think about a developer who decided to switch to the .NET ecosystem from any of these languages. Most likely, he will not immediately encounter the problems I am writing about, but they will become a very unpleasant surprise for him, leaving not the best impressions of the platform in which, for 21 years, such a data type could not be made part of the standard library. |
This is true of all bytes for all types.
Yes, again true for all of our types. You cannot take raw byte data and simply construct a type out of it without accounting for where that byte data came from and the format it was in.
Right.
This is akin to saying that
Another option is the one I've proposed and which we already know works well for other primitive types. You have 1 type, You then deal with |
Put in a slightly different perspective. Let's take this entire conversation and replace To me, the ask for |
This does not in any way hinder other programming languages. Why should this hinder us?
Any interaction with the external world always involves (de)serialization. .NET does not exist in a vacuum, it interacts with countless components in many different ways. Generally, this works the same way everywhere, except in .NET. We have widely adapted System.Guid to work with such data, which was inherited from COM, and several alternative solutions, none of which completely solves the problems that arise when using System.Guid as a replacement for Uuid.
I completely agree. However, the problem is that Uuid is defined in the specification as 16 octets, so the public API for working with it should interpret input and output data as 16 octets. System.Guid is used in the .NET ecosystem as a replacement for Uuid, but not all methods of its public API interpret input and output data as 16 octets. Therefore, it cannot be used as a replacement for Uuid. That's why I suggest adding Uuid. In other words, I suggest adding a data type whose public API behaves similarly to how it behaves in other languages.
Yes.
If you look at the bigger picture, beyond just working with the database, what will help prevent the author of a serializer from using the binary constructor or ToByteArray in Guid? What will prevent suppressing a warning on obsolete? |
Everything changes drastically when the user is not you, but the author of the library you are using. I have described what happens next above.
Then why not lay down a red carpet in the form of a separate data type, which exists in any other popular language? |
Because the path, to me at least, is not as gold or red carpet as it may appear to you. To me, adding a new type (especially to the system namespace) is very "expensive" and already a hard sell to API review. Add on top that the type is basically just a minor semantic difference in two methods as to how Then I have to consider how users (new and old) might interpret or misuse the type and that they may misuse it in the same way that people are experiencing bugs around To me, it looks shiny at a glance but quickly becomes fraught with much deeper issues/concerns that are hard to justify in favor of simply improving the relatively minor issues with our existing type. |
I understand that adding such an API and even fixing places where it would be much better than Guid is an extremely large, complex and time-consuming task.
They are similar, but designed for different purposes. Guid is an excellent data type for interop with WinAPI / COM / OLE. It is the best for solving such problems. But why was it created in the first place if, say, an array of bytes could be used instead? And when working with Guids, the user experience is worse than in other languages.
Is this perhaps a technical debt that we should start paying off before it's too late? If Uuid doesn't appear, nothing will change. |
Not only user experience, server performance matters too. Guid in database is plain useless beyond hello world scale databases, as a table with guid primary key does not scale beyond 100k records. The reason is simple - read queries are mostly time based, and the lookups for primary keys for random (Guid.NewGuid()) values scatter all over the index, so there will be constant cache misses, every record in a time-based query will end up in cache miss, which leads to a buffer flush. I would say - Guid should be deprecated by that reason alone. Arguably you can't build anything scalable with it. |
In this case, the proposed difference between For other types in the BCL such a difference is handled via explicitly named helper APIs. For primitives such as
That's not true. Simply exposing the new APIs I suggested above will introduce the necessary stuff for people to do the right thing and should be overall more discoverable as its on the existing surface. Such APIs would be polyfillable downlevel so the same solution could work on codebases targeting .NET Standard or .NET Framework. I have seen no evidence given so far that updating |
That's why I suggest not touching it at all. There are code bases where it works perfectly fine and there is no reason to introduce any breaking changes there.
Absolutely agree.
Okay, I suggest taking a look at a very similar situation in the world of .NET. Date and time. What prevented end-users from using DateTime without specifying the time and TimeSpan to emulate the functionality provided by recently added DateOnly and TimeOnly? After all, existing data structures and their APIs can be reused to solve the same tasks. Because user experience matters. Using existing DateTime and TimeSpan to solve a range of tasks is not convenient. Let's check the announcement where DateOnly and TimeOnly were introduced.
Type safety. That's what is mentioned as the first item in the list of advantages for DateOnly. Let's see what is written about TimeOnly:
And again, the main reason is type safety: But it could have gone a different way, for example, by adding a couple of new APIs to TimeSpan and DateTime, so that people could continue to use them to solve tasks related to situations where only the date or only the time within a day is needed. But instead, separate data types were created. Why? For the sake of convenience. .NET Standard and .NET Framework? |
The current GUID type is, indeed, pretty messy w.r.t. (de)serialization. By adding a new API based on same GUID with scary words like The idea of a new UUID type as a pure wrapper around 16 bytes with no legacy burden, no scary words in the API and sane (bear with me here: by "sane" I mean portable between languages, databases, runtimes and various schools of thought) byte order is a must for a modern successful programming ecosystem (and I believe we all in this thread share the same opinion that .NET is and should be this kind of ecosystem in the future). We are in a nice position because it is still possible to distance ourselves from the whole word "GUID", add a new nice UUID-based API and deprecate UUIDs are pretty ubiquitous in modern software (and I truly believe anything becomes slightly better after it gets its own UUID), so it may be a good idea to start radically improving the corresponding .NET API for better everyday use in mixed-tool environments (pretty much every environment these days). |
After further discussion with other API review members, the general consensus is that introducing a new type is undesirable and improving the existing type is the general way we approach these scenarios in .NET. As such, #86798 has been created and represents the general direction we'd like to move forward in this area. This comes about for many reasons including that all of the concerns raised around confusion and chance for user-error are not exclusive to the proposed improvements for We believe that versioning |
@tannergooding Thank you for not listening to the community and deciding to create an inconvenient API. |
As indicated, I discussed this in depth with several other API review members. The general consensus is that a new type is ultimately more inconvenient and introduces more problems than it solves. Having a new type to represent the same thing, only differing in how bytes are serialized/deserialized on boundary conditions is not something we do in .NET. |
It is challenging to imagine something that is "ultimately more inconvenient" than the current API in System.Guid. |
If you are working with binary serialization, endianness is not a concept you can just ignore. I think it would be a bad idea to attempt to hide this from the API consumer -- that ambiguity is partially what caused the serialization of |
There will be a publicly available live-streamed discussion where the community can join in via YouTube when the other proposal goes to API review. The offline discussion was simply one to determine the general premise and whether it was worth marking this proposal as "ready-for-review" or if we instead wanted to solve this the way I proposed above based on my many years of experience working on the libraries team and with API review around what API review will and will not approve; and what is and is not recommended by the
This comment is insulting and not being made in good faith. Not only have the general discussions on the pros/cons been made in public here, but various other API review members have given thumbs up on the comments I made above and the general summation of the two points was given in the new proposal, including linking to this thread to ensure context was not lost.
Endianness is not "scary formulations". It is a basic concept that must be considered across a myriad of stacks and technologies. This ultimately comes down to:
Edit: The spec does largely detail itself following variant 1 and describes it as "network order". With most of the callouts to variant 0/2 being noted as |
To maybe give just a tiny bit more clarification on why this proposal was not considered the right approach: what you've effectively asked is that the .NET BCL expose:
public readonly struct Guid { }
public readonly struct Uuid { }
You could give these any number of names:
public readonly struct UuidLittleEndian { }
public readonly struct UuidBigEndian { }
public readonly struct UuidVariant1 { }
public readonly struct UuidVariant2 { }
etc.
The This is not how .NET exposes types in the BCL today, and it's not something that we want to do moving forward either. We want to grow and expand existing types to support new scenarios instead. |
I apologize if my statements appeared to you as not being made in good faith.
However, it will be a different API Proposal, not the current one.
Not only I, but also my team, my colleagues, many of my acquaintances and even strangers, have many years of experience using these libraries in production, created by the libraries team. That's why this API Proposal exists.
Thank you for leaving a link to the current API Proposal in the new one, so that the context is not lost.
This may hold significance when working with native APIs. However, it is not something that an average developer would want to be concerned with in a managed environment while developing on .NET
As we discussed before, this structure must be implemented "somehow" internally, and the API should provide methods that give consistent string and binary representations. In the case of binary representation, it should be an array that corresponds to the string that was passed as input.
It is rather unfair to close an issue and instead open the one that you consider better (and immediately label it as "api-ready-for-review"), based on your own experience, without even allowing the current API Proposal to be reviewed in the public API Review. Afterward, appealing to the alternative API Proposal opened by you as an argument for why the current API Proposal should not even be considered.
Uuid is usually used as a primary key in databases, and in this particular case, it is really important because it is a data structure that serves as an identifier of an entity and can go through hundreds (!) of serialization and deserialization iterations during its lifecycle.
That is an indicator that such a data type should have been made 20 years ago, but it has not happened yet. However, if it were to appear, it would be a reason to move forward and start addressing the technical debt that has accumulated over these two decades.
Yes, that is the primary use case of this data type
This would only be relevant if we are reading or writing the binary representation of this data type 'as is.' However, the specific implementation of the public API for this data type does not have to be implemented in that way, especially in a managed environment. |
Whether you are working with native APIs or not doesn't change the fact that endianness is a fundamental part of binary serialization, so I'm not sure why native APIs are relevant here. Endianness is a well-documented concept that is exposed all over the .NET API surface for binary serialization. |
The managed environment tries to hide the nuances of dealing with endianness from us. |
This is not true. Endianness is not hidden from you in C# and .NET any more than it is in C, C++, Rust, etc. The proposed If you are serializing to binary, you NEED to agree on the endianness on both sides: serialization and deserialization. Being explicit about the endianness is how you ensure that consistency is maintained here. |
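To make the point about agreeing on byte order concrete, here is a minimal sketch using `System.Buffers.Binary.BinaryPrimitives`. The value `0x00112233` and the variable names are illustrative assumptions, not something taken from this thread.

```csharp
using System;
using System.Buffers.Binary;

// Writer side: serialize an int as big endian (most significant byte first).
byte[] buffer = new byte[4];
BinaryPrimitives.WriteInt32BigEndian(buffer, 0x00112233);

// A reader that agrees on the byte order gets the original value back.
int sameOrder = BinaryPrimitives.ReadInt32BigEndian(buffer);

// A reader that assumes little endian silently gets a different value.
int wrongOrder = BinaryPrimitives.ReadInt32LittleEndian(buffer);

Console.WriteLine($"{sameOrder:X8}");  // 00112233
Console.WriteLine($"{wrongOrder:X8}"); // 33221100
```

Nothing fails at runtime in the mismatched case, which is why the byte order has to be part of the contract between both sides.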
API review explicitly discusses alternatives and linked issues. We will not review the other proposal without bringing up the fact that users originally asked for this. Many API reviewers are also already familiar with the context, as per my callout of the internal checks before I closed this issue in favor of the new one.
That is not how any type across the BCL works. Binary layout and string layout are not consistent and explicitly do not match by default for the vast majority of hardware that exists. Most modern machines (x86, x64, Arm32, Arm64, RISC-V, etc) are exclusively or at least primarily
That is not how API review in .NET works. It is the responsibility of the area owners, me in this case, to make an initial determination on whether something is even worth bringing to API review in the first place. We get literally thousands of API proposals, from all kinds of users, covering all ranges of scenarios. It would be impossible to truly review them all in depth. This was a case where I had my own initial feeling of how API review would react and given the number of users asking for it, I did an initial offline check to confirm my suspicions. The new proposal was then opened based on the feedback from the API review members that a new type is indeed something we would not be willing to do; particularly given the scenario involved, how .NET has handled similar scenarios up until this point, etc.
Yes, which also means that it could pass through many APIs in .NET. Some of which would take The proposed APIs to be exposed on
If we were doing this today, with no concern of back-compat, we would likewise have 1 type. We would likely call it
Raw sequences of bytes fundamentally must be told what order they are in to be read correctly. If you don't want to work with raw bytes, use strings. If you do work with raw bytes, you must understand the endianness of them or you risk the wrong thing happening on mismatch. The same issue would be present for one application calling It introduces a magnitude of additional complexity, considerations, integration concerns, and general failure points above and beyond simply having overloads on |
Yes, and the problem lies in how the binary and string representations work.
The problem is that during binary (de)serialization, there is a "raw dump" of the internal representation of the Guid. In such a case, what happens when ToString is called does not correspond at all to what was passed in the constructor that takes an array of bytes. This is because the constructor that takes bytes expects the binary representation of the Guid, taking into account the details of its internal structure, and not the binary representation of the hex string, which is accepted by the constructor that takes a string.
All this would not be a problem if System.Guid did not have methods to construct it from an array of bytes or to return its contents as an array of bytes. |
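The mismatch described in this comment can be reproduced with a short sketch. It assumes .NET 5 or later (for `Convert.FromHexString` and `Convert.ToHexString`), and the sample value is made up for illustration.

```csharp
using System;

const string text = "00112233-4455-6677-8899-aabbccddeeff";

// The 16 bytes you would get by reading the hex string left to right.
byte[] textAsBytes = Convert.FromHexString(text.Replace("-", ""));

// String in, bytes out: the first three fields come back byte-swapped.
string bytesFromStringCtor = Convert.ToHexString(new Guid(text).ToByteArray());

// Bytes in, string out: the same swap shows up in the text form.
string stringFromByteCtor = new Guid(textAsBytes).ToString("N");

Console.WriteLine(bytesFromStringCtor); // 33221100554477668899AABBCCDDEEFF
Console.WriteLine(stringFromByteCtor);  // 33221100554477668899aabbccddeeff
```

Only the first eight bytes differ from the input; the last eight keep their order in both directions.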
The same fundamental problem would still exist, if you are serializing to binary then endianness cannot be avoided. I agree it would be less confusing though, because it would likely have dedicated serialization methods on |
This is not the problem nor is it a "raw dump". For example, when running on a machine such as an IBM System z9 (which is one of the few Big Endian machines), the raw byte sequences as read from memory will not match what is emitted by
It does not take the binary representation of the
It still would exist and likely in a worse setup. Not only would users who need binary serialization try to do it themselves, they would be manually trying to read/write the bytes and so they would have to take a dependence on the internal layout rather than it being abstracted as it is today. |
I completely agree, but System.Guid has methods for both binary and string representation. And these methods take endianness of the machine they are running on into account. And from the very beginning, I took into account that methods for working with binary representation cannot be removed. That's why I didn't suggest doing it, neither in the first post nor in further discussion.
Great constructive suggestion that deserves discussion. In this case, System.Uuid can be considered as a container, whose public API allows working only with string representation, and the work with binary representation can be moved to BinaryPrimitives, which may be the best place to implement an API that takes endianness into account. |
Thank you for the clarification. It makes the processes more transparent.
But as correctly noted above, System.Guid is special. It takes care of endianness itself when its methods are called.
I understand that. The problem is that Guid takes into account the endianness of its components. If we keep the current implementation, then if its internal representation were implemented as, say, int128, or as 16 separate bytes, then there would be no issues with endianness. You simply reverse all the bytes if necessary and there are no issues. But it behaves like a mixed-endian structure, which also exposes a public API for working with its binary representation.
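A small sketch of what "mixed-endian" means here, using the existing `Guid` constructor that takes an `int`, two `short`s and eight `byte`s. The field values are arbitrary, and `Convert.ToHexString` assumes .NET 5 or later.

```csharp
using System;

// Three integer fields followed by eight individual bytes.
int a = 0x00112233;
short b = 0x4455;
short c = 0x6677;
var guid = new Guid(a, b, c, 0x88, 0x99, 0xAA, 0xBB, 0xCC, 0xDD, 0xEE, 0xFF);

// The string form prints every field most-significant digit first.
string asText = guid.ToString();

// The binary form writes a, b and c least-significant byte first,
// while the trailing eight bytes keep their order.
string asBytes = Convert.ToHexString(guid.ToByteArray());

Console.WriteLine(asText);  // 00112233-4455-6677-8899-aabbccddeeff
Console.WriteLine(asBytes); // 33221100554477668899AABBCCDDEEFF
```

Reversing all 16 bytes would therefore not produce the string order; each of the three integer fields has to be reversed on its own.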
I completely agree.
As discussed earlier, it may be worth considering System.Uuid as an alternative to Guid that will not have a public API for working with its binary representation and instead will have methods in
Thank you very much to you personally, as well as to the API Review team, and everyone who is involved in improving the .NET ecosystem. This is a voluminous and complex job.
The problem with the proposed solution is that it can be done in two ways:
So, would it be worth designing this type as something that you would do today and systematically migrating the .NET ecosystem to its usage? I understand that this is not a quick process and it will definitely take years. The final result, in the best case, we will see in .NET 12 or .NET 14, but maybe it's worth starting to address this technical debt.
I agree. The problem is that the raw sequence, which Guid works with, needs to take into account its internal layout. If I have an array of bytes The Guid constructor that takes in a byte array, as well as the ToByteArray method, have existed since the .NET Framework 1.0, which was introduced in 2002. I'm afraid to imagine how much code would break if they were removed. And if they are not removed, adding new methods to the public API of System.Guid will only make things worse than they are now.
It's difficult to control across hundreds of repositories within a company with thousands of employees to ensure that nobody, anywhere, ever calls the byte constructor or the ToByteArray method. Analyzers (which can be not installed or disabled, regardless of whether it's intentional or due to human error), deprecation warnings (which can break many codebases with TreatWarningsAsErrors, which can also be disabled with #pragma) - these are all half-measures that provide an illusion of certainty (and we still haven't addressed the question of what it would cost to deprecate an API that has been around for 20 years).
Within .NET, this could be resolved by byte shuffling, but in such cases, it might be worth considering adding two overloads with different parameters.
That didn't stop the addition of DateOnly and TimeOnly, though.
Yes, but that's all because nobody did anything about this problem for 20 years. It might be a good idea to include a public API review to discuss whether this type should be added, what the cost of adding it would be, how to adapt existing codebases to this type of data if it were to be added, and whether it is even necessary to do so. It could be worth considering adding it simply as a 'data box' with no API that could allow end users to make mistakes. Alternatively, a decision may be made to convert all of .NET to use this data type, and to designate Guid as 'legacy'. Perhaps the best solution would be to add separate static methods and perform some minor reworking of Guid itself. |
@tannergooding |
It doesn't necessarily "take care of endianness". It has an internal format and exposes a set of APIs for dealing with serialization. Those APIs are currently incomplete and only let you work with little endian data, even if you may have big endian data.
The same issue still exists, regardless of the internal layout of the data. The fundamental problem is that users need to be able to serialize to and from byte sequences representing We would end up having exposed
This would be the same as simply marking the APIs on We would most likely take such a breaking change over introducing a new type if API review determines it's as big of a concern as some users in this thread are surfacing.
This is effectively what #86798 is proposing. It is exposing all the APIs for working with the two possible binary representations of a
A new type, 2 is only an issue in terms of making it visible that some users want to pass 1 is only really desirable if we believe that the existing serialization APIs on With 1 or 2, they only have to consider
That is what #86798 is doing. It is exposing the missing overloads and it matches what we would expose today, minus the name where it remains
The same is true for
Yes, or with #86798 you would pass in the array of bytes
The byte shuffling would also exist for the proposed What the internal implementation does to ensure correct behavior for a given method doesn't matter for the exposed public API surface. We are going to do whatever is most efficient, within the limits of the platforms we run on and general needs of the type.
It will not make it worse than adding an entirely new type and having to explain all the subtle differences between We do not have this problem for any of the other types we expose big and little endian serialization APIs for, several (but not all) of which are more broadly used than
It was a general consideration for the types and the impact it may have. The number of advantages of exposing them were directly weighed against the drawbacks and there was enough evidence to justify it. That has not been the case for
It is my job, as area owner, to be the interface between the community and API review here. The community is allowed to engage directly with API review via YouTube chat if and when a given proposal goes to API review. That is what I am doing, that is what I am explaining. If you go back and look at the people who have chimed in or thumbs up'd my responses above, you'll find many managers, principal level engineers, and even the runtime architect. I can tell you with absolute certainty that I am not misrepresenting the stance of API review here, many of which are included in those that have reacted to my posts above. I have attempted to, repeatedly, explain why API review is going this route. You'll just be trading off "Tanner is saying 'x'" for "Alternative API Review Member is also saying 'x'". When API review does happen, I expect it will effectively boil down to:
1 is effectively a guarantee. This has already been covered in my offline checks and I have iterated the reasons above. We'll just formalize the decision on live stream and reiterate the reasons again there. Given that the runtime architect asked 2, we'll likely spend a bit of time debating it. I'm currently pushing for us to do something given the sheer number of people that upvoted the ask for Those numbers would then indicate that we should do something and we would then opt for the approach I laid out in the new proposal given that it follows the If API review determines it is not worth exposing the overloads after more in depth discussion, then we'll end up exactly where we are today and we'll recommend that users roll their own helper API. Such a helper API is trivial to write and is about 7 lines of code. Someone from the community could provide that functionality via a NuGet package (binary or source based). I do expect that API review will say it is worth exposing the helpers, however. So we'll then discuss 3 to determine if we believe the concerns around users choosing the wrong overload are justified or not. I expect API review will say they aren't and that users that are impacted by this will quickly find out via testing and docs that things are wrong and that they should be passing in |
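The "roll your own helper" option mentioned above could look roughly like the following. This is only a sketch: `ToBigEndianBytes` and `FromBigEndianBytes` are hypothetical names chosen for illustration, not part of the BCL or of either proposal, and `Convert.ToHexString` assumes .NET 5 or later.

```csharp
using System;

var guid = new Guid("00112233-4455-6677-8899-aabbccddeeff");

byte[] wire = ToBigEndianBytes(guid);
Guid roundTripped = FromBigEndianBytes(wire);

Console.WriteLine(Convert.ToHexString(wire)); // 00112233445566778899AABBCCDDEEFF
Console.WriteLine(roundTripped == guid);      // True

// Hypothetical helper: returns the bytes in the same order as the hex string.
static byte[] ToBigEndianBytes(Guid value)
{
    byte[] bytes = value.ToByteArray(); // first three fields are little endian
    SwapFields(bytes);
    return bytes;
}

// Hypothetical helper: the inverse of ToBigEndianBytes.
static Guid FromBigEndianBytes(byte[] bigEndian)
{
    byte[] bytes = (byte[])bigEndian.Clone(); // leave the caller's array untouched
    SwapFields(bytes);
    return new Guid(bytes);
}

// Reverses the 4-byte field and the two 2-byte fields in place.
static void SwapFields(byte[] bytes)
{
    Array.Reverse(bytes, 0, 4);
    Array.Reverse(bytes, 4, 2);
    Array.Reverse(bytes, 6, 2);
}
```

Because the swap is its own inverse, the same three `Array.Reverse` calls serve both directions.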
Background and motivation
There are 2 ways to organize the binary representation of Uuid:
Currently, the .NET Base Class Library only includes System.Guid. This is a data structure that implements the second method of binary representation.
If you use System.Guid as a unique identifier for objects, it is necessary to be extremely careful. For example, if it is used as a parameter in a database query. Due to the implementation specifics of System.Guid, calling the method System.Guid::ToByteArray() and calling System.Guid::ToString() with subsequent conversion of the resulting hexadecimal string to a byte array will produce different results.
From this, it follows that calling the constructor with a byte array or with a string also produces different results.
If the constructor that accepts a string was called, the result of calling System.Guid::ToString() will match the value of the string passed to the constructor, but the result of calling System.Guid::ToByteArray() will not match.
If the constructor that accepts a byte array was called, the opposite situation arises - the result of calling System.Guid::ToByteArray() matches the constructor parameters, but System.Guid::ToString() does not match.
This can lead to situations where, for example, the log records the string representation, but the database stores the binary representation. And if you decide to find an object whose identifier you saw in the logs, you need to perform the same conversion that is done inside System.Guid.
The above examples demonstrate the difficulties in working with System.Guid that arise due to differences in string and binary representations.
That's why I suggest adding a data structure called System.Uuid with a simple API that will have the same string (hexadecimal) and binary representation. The algorithms for generating a sequence of 16 bytes to construct this structure can be left as a space for creativity for the .NET community. Adding such a data type to the base class library would provide a solid foundation for the .NET ecosystem to freely use the first option of the binary representation (as 16 separate octets), without worrying about how the data is serialized - whether by converting to a string or binary format.
API Proposal
And also all interface methods, comparison operators, equality operators
API Usage
Alternative Designs
No response
Risks
No response