Conversion to and from mixed-endian byte strings #48

ploeh · 2019-05-23T11:09:27Z

Microsoft tends to encode UUIDs in a mixed-endian format.

"Other systems, notably Microsoft's marshalling of UUIDs in their COM/OLE libraries, use a mixed-endian format, whereby the first three components of the UUID are little-endian, and the last two are big-endian."

Source: Wikipedia

There's plenty of evidence of this. Ask me how I know 😉

It'd be useful if the uuid library also provided conversions to and from this format. I created this conversion to ByteString:

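import Data.ByteString.Lazy (ByteString)
import qualified Data.ByteString.Lazy as BS
import Data.UUID (UUID, toByteString)

-- Reverse the bytes of the first three fields (4, 2, and 2 bytes);
-- the last 8 bytes keep their order.
toMixedEndianByteString :: UUID -> ByteString
toMixedEndianByteString uuid =
    case BS.unpack $ toByteString uuid of
      [w0,w1,w2,w3, w4,w5, w6,w7, w8,w9, wa,wb,wc,wd,we,wf] ->
        BS.pack [w3,w2,w1,w0, w5,w4, w7,w6, w8,w9, wa,wb,wc,wd,we,wf]
      _ -> BS.empty -- not expected: toByteString always yields 16 bytes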

I've yet to attempt the reverse conversion, but I think it'll look similar.
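For what it's worth, the reverse direction can probably reuse the same byte permutation, since reversing a field's bytes twice restores the original order. The following is only a minimal sketch under that assumption; the name swapMixedEndian is hypothetical and not part of the uuid package. It works on lazy ByteString values, which is what Data.UUID's toByteString and fromByteString use, and it returns Nothing for input that is not exactly 16 bytes long.

```haskell
import qualified Data.ByteString.Lazy as BL

-- Hypothetical helper: swap between the RFC 4122 (big-endian) layout and
-- the mixed-endian layout. The permutation is its own inverse, so the same
-- function serves both directions.
swapMixedEndian :: BL.ByteString -> Maybe BL.ByteString
swapMixedEndian bytes =
  case BL.unpack bytes of
    [w0,w1,w2,w3, w4,w5, w6,w7, w8,w9,wa,wb,wc,wd,we,wf] ->
      Just $ BL.pack [w3,w2,w1,w0, w5,w4, w7,w6, w8,w9,wa,wb,wc,wd,we,wf]
    _ -> Nothing -- anything other than exactly 16 bytes is rejected
```

With the uuid package in scope, a reverse conversion could then be written as swapMixedEndian bytes >>= fromByteString, which yields a Maybe UUID.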

Is there any interest in getting this into the library? If so, I'll be happy to attempt a pull request.

hvr · 2019-05-23T17:34:52Z

I've been pointed to https://docs.microsoft.com/en-us/previous-versions/aa379358(v%3Dvs.80) which claims

typedef struct _GUID {
    unsigned long Data1;
    unsigned short Data2;
    unsigned short Data3;
    unsigned char Data4[8];
} GUID,  UUID;

so your code above would only be correct when the host order is little-endian.
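To illustrate the host-order point, here is a rough sketch of how the in-memory bytes of such a struct would differ between little-endian and big-endian hosts. The Guid type and guidBytes function are hypothetical, and the sketch assumes unsigned long is 32 bits wide (as on Windows) and that the struct has no padding.

```haskell
import qualified Data.ByteString.Builder as B
import qualified Data.ByteString.Lazy as BL
import Data.Word (Word8, Word16, Word32)
import GHC.ByteOrder (ByteOrder (..))

-- Hypothetical Haskell mirror of the GUID struct quoted above.
-- The list is expected to hold the 8 bytes of Data4.
data Guid = Guid Word32 Word16 Word16 [Word8]

-- Bytes the struct would occupy in memory on a host of the given byte order.
-- Only the three integer fields depend on the order; Data4 is a byte array.
guidBytes :: ByteOrder -> Guid -> BL.ByteString
guidBytes order (Guid d1 d2 d3 d4) =
  B.toLazyByteString $ case order of
    LittleEndian -> B.word32LE d1 <> B.word16LE d2 <> B.word16LE d3 <> rest
    BigEndian    -> B.word32BE d1 <> B.word16BE d2 <> B.word16BE d3 <> rest
  where
    rest = foldMap B.word8 d4
```

On a little-endian host this produces the mixed-endian layout; on a big-endian host it coincides with the plain big-endian layout, so a fixed byte swap would be wrong there.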

What is your use-case for this serialization format? Is it for C FFI purposes or something else?

ploeh · 2019-05-23T19:59:59Z

My use case is reading column data from SQL Server. For that, I'm using the odbc package. This package has, however, no particular representation of a UUID, so instead, for UNIQUEIDENTIFIER columns, you just get a ByteString. The same applies when saving data to such a column: you must supply a ByteString value.

I've noticed that when I convert a UUID value with toByteString and save the result to the database, the bytes in the first three parts are reversed.

Other people have made corroborating observations.

The explanation could be that

"The first 4 parts are either 2 or 4 bytes long and are therefore probably stored as a native type (ie. WORD and DWORD) in little endian format. The last part is 6 bytes long and it therefore handled differently (probably an array)"

and

"since the last 8 bytes are stored as a byte array, I think this identifies the behaviour you are seeing."

Source: https://stackoverflow.com/q/10190817/126014

When I convert the bytes using the above toMixedEndianByteString function, the value gets stored correctly in the database.

hvr · 2019-05-24T11:14:59Z

@ploeh I see; however, in this case I'd advocate that it should be the database library's responsibility to know how to decode/encode the types supported by the respective database. In fact, that's what e.g. postgresql-simple does. However, I can't bring this up myself at https://github.com/fpco/odbc/issues as I've been banned by FPComplete.

ploeh · 2019-05-27T19:34:50Z

I don't mind taking the issue to odbc instead. Ultimately, I can just keep my working solution in my own code base. I did think that I'd ask here first, though, since this might be a problem with UUID values marshalled via any Microsoft-based system.

As the Wikipedia entry suggests, this could be an issue with any UUID you receive via COM/OLE, so it's likely to reach well beyond SQL Server. I haven't tried, but it's possible one might run into similar problems when interacting with, say, Microsoft Office, Exchange, or many other older systems of that type.

As I did spend a few hours figuring all this out, I thought I'd offer the solution at the place where it'd be most generally available to other users, thereby saving others from similarly wasted time.

hvr · 2019-05-28T09:38:50Z

If you get this encoding via OLE/COM, this means via FFI, no? In that case you'd typically not get it via a ByteString but rather as a Ptr, and then we should rather talk about the Storable API. I'd like to see more real-world use-cases beyond ODBC to better inform how to design and add this into the uuid package.
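As a rough illustration of what the Storable route might look like, here is a minimal sketch of an instance for a record mirroring the GUID struct. Everything in it is an assumption made for illustration: the Guid type, its field names, and roundTrip are hypothetical and not part of the uuid package, and the layout assumes a 16-byte struct with 4-byte alignment and no padding. Because peekByteOff and pokeByteOff use the host's native byte order for the integer fields, no manual byte swapping is needed.

```haskell
import Data.Word (Word8, Word16, Word32)
import Foreign.Marshal.Alloc (alloca)
import Foreign.Marshal.Array (peekArray, pokeArray)
import Foreign.Ptr (plusPtr)
import Foreign.Storable (Storable (..))

-- Hypothetical mirror of the C GUID struct.
data Guid = Guid
  { guidData1 :: Word32
  , guidData2 :: Word16
  , guidData3 :: Word16
  , guidData4 :: [Word8] -- expected to hold exactly 8 bytes
  } deriving (Eq, Show)

instance Storable Guid where
  sizeOf _ = 16
  alignment _ = 4
  -- Integer fields are read in host byte order; Data4 is read byte by byte.
  peek p = Guid
    <$> peekByteOff p 0
    <*> peekByteOff p 4
    <*> peekByteOff p 6
    <*> peekArray 8 (p `plusPtr` 8)
  -- Writes at most 8 bytes of Data4; a shorter list leaves the rest untouched.
  poke p (Guid d1 d2 d3 d4) = do
    pokeByteOff p 0 d1
    pokeByteOff p 4 d2
    pokeByteOff p 6 d3
    pokeArray (p `plusPtr` 8) (take 8 d4)

-- Write a value to temporary memory and read it back.
roundTrip :: Guid -> IO Guid
roundTrip g = alloca $ \p -> poke p g >> peek p
```

Converting such a record to and from the UUID type would still be a separate step, which this sketch leaves out.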

ploeh · 2019-05-29T07:58:14Z

That's a good point; I hadn't thought that through. It's true that when interacting with the odbc package, I take advantage of the feature that already turns SQL Server's native UNIQUEIDENTIFIER into a ByteString. The code that does that, however, does get the data via a Ptr.

ploeh closed this as completed May 27, 2019

ploeh mentioned this issue May 29, 2019

Instances for UUID fpco/odbc#27

Open

johnnovak mentioned this issue Feb 28, 2020

better endians nim-lang/Nim#13463

Closed
