
Conversion to and from mixed-endian byte strings #48

Closed
ploeh opened this issue May 23, 2019 · 6 comments


@ploeh

ploeh commented May 23, 2019

Microsoft tends to encode UUIDs in a mixed-endian format.

"Other systems, notably Microsoft's marshalling of UUIDs in their COM/OLE libraries, use a mixed-endian format, whereby the first three components of the UUID are little-endian, and the last two are big-endian."

There's plenty of evidence of this. Ask me how I know 😉

It'd be useful if the uuid library also provided conversions to and from this format. I created this conversion to ByteString:

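import Data.ByteString.Lazy (ByteString)
import qualified Data.ByteString.Lazy as BS
import Data.UUID (UUID, toByteString)

-- Reverse the byte order of the first three fields (4, 2 and 2 bytes);
-- the last 8 bytes keep their original order.
toMixedEndianByteString :: UUID -> ByteString
toMixedEndianByteString uuid =
    case BS.unpack $ toByteString uuid of
      [w0,w1,w2,w3, w4,w5, w6,w7, w8,w9, wa,wb,wc,wd,we,wf] ->
        BS.pack [w3,w2,w1,w0, w5,w4, w7,w6, w8,w9, wa,wb,wc,wd,we,wf]
      _ -> BS.empty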

I've yet to attempt the reverse conversion, but I think it'll look similar.
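Since the reordering only swaps bytes within each of the first three fields, applying it twice restores the original order, so the reverse conversion can probably reuse the same permutation. A minimal sketch, assuming lazy ByteStrings (the type the uuid package's `toByteString` returns); `swapMixedEndian` is a hypothetical helper name, not part of the uuid API:

```haskell
import qualified Data.ByteString.Lazy as BL

-- Hypothetical helper (not part of the uuid package): converts between the
-- RFC 4122 byte order and Microsoft's mixed-endian order. The permutation
-- is its own inverse, so the same function works in both directions.
swapMixedEndian :: BL.ByteString -> Maybe BL.ByteString
swapMixedEndian bs =
  case BL.unpack bs of
    [w0,w1,w2,w3, w4,w5, w6,w7, w8,w9, wa,wb,wc,wd,we,wf] ->
      Just (BL.pack [w3,w2,w1,w0, w5,w4, w7,w6, w8,w9, wa,wb,wc,wd,we,wf])
    _ -> Nothing  -- input is not exactly 16 bytes
```

Composing it with the package's `fromByteString` (`swapMixedEndian bytes >>= fromByteString`) should then give a `Maybe UUID` from mixed-endian bytes. Returning `Maybe` instead of an empty string makes a malformed column value visible to the caller.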

Is there any interest in getting this into the library? If so, I'll be happy to attempt a pull request.

@hvr
Collaborator

hvr commented May 23, 2019

I've been pointed to https://docs.microsoft.com/en-us/previous-versions/aa379358(v%3Dvs.80) which claims

typedef struct _GUID {
    unsigned long Data1;
    unsigned short Data2;
    unsigned short Data3;
    unsigned char Data4[8];
} GUID,  UUID;

so your code above would only be correct when the host order is little-endian.
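When the target is a fixed byte layout rather than an in-memory struct, one way to avoid any dependence on host byte order is to serialize each GUID field with an explicit endianness. A sketch using the bytestring package's `Data.ByteString.Builder`, assuming the input is the four big-endian words that `Data.UUID.toWords` returns; `mixedEndianFromWords` is a hypothetical name:

```haskell
import Data.Bits (shiftR)
import Data.ByteString.Builder (toLazyByteString, word16LE, word32BE, word32LE)
import qualified Data.ByteString.Lazy as BL
import Data.Word (Word32)

-- Hypothetical helper: encodes a GUID, given as four Word32 values in
-- RFC 4122 (big-endian) order, into the GUID struct's mixed-endian layout.
-- Data1..Data3 are always written little-endian, whatever the host order.
mixedEndianFromWords :: (Word32, Word32, Word32, Word32) -> BL.ByteString
mixedEndianFromWords (a, b, c, d) =
  toLazyByteString $
       word32LE a                              -- Data1 (4 bytes)
    <> word16LE (fromIntegral (b `shiftR` 16)) -- Data2 (high half of b)
    <> word16LE (fromIntegral b)               -- Data3 (low half of b)
    <> word32BE c                              -- Data4[0..3], unchanged
    <> word32BE d                              -- Data4[4..7], unchanged
```

With the uuid package, `mixedEndianFromWords (toWords uuid)` should produce the same bytes as `toMixedEndianByteString uuid` from the first comment.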

What is your use case for this serialization format? Is it for C FFI purposes or something else?

@ploeh
Author

ploeh commented May 23, 2019

My use case is reading column data from SQL Server. For that, I'm using the odbc package. This package has, however, no particular representation of a UUID, so instead, for UNIQUEIDENTIFIER columns, you just get a ByteString. The same applies when saving data to such a column: you must supply a ByteString value.

I've noticed that when I convert a UUID value with toByteString and save it to the database, the bytes in the first three parts are reversed.
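To make the symptom concrete: in RFC 4122 byte order, the UUID 00010203-0405-0607-0809-0a0b0c0d0e0f is the byte sequence 00 01 02 ... 0f. A reader that treats the first three fields as little-endian integers would display those bytes as 03020100-0504-0706-0809-0a0b0c0d0e0f, which is consistent with the reversal described here. A small sketch of such a reader's text rendering; `formatMixedEndian` is a hypothetical name, and the output is lowercase hex:

```haskell
import Data.Word (Word8)
import Text.Printf (printf)

-- Hypothetical helper: renders 16 bytes as GUID text the way a
-- mixed-endian reader would, i.e. with the first three groups
-- byte-reversed and the last two groups in their original order.
formatMixedEndian :: [Word8] -> Maybe String
formatMixedEndian [w0,w1,w2,w3, w4,w5, w6,w7, w8,w9, wa,wb,wc,wd,we,wf] =
  Just (group [w3,w2,w1,w0] ++ "-" ++ group [w5,w4] ++ "-" ++ group [w7,w6]
        ++ "-" ++ group [w8,w9] ++ "-" ++ group [wa,wb,wc,wd,we,wf])
  where
    group :: [Word8] -> String
    group = concatMap (printf "%02x")
formatMixedEndian _ = Nothing  -- not a 16-byte GUID
```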

Other people have made corroborating observations.

The explanation could be that

"The first 4 parts are either 2 or 4 bytes long and are therefore probably stored as a native type (ie. WORD and DWORD) in little endian format. The last part is 6 bytes long and it therefore handled differently (probably an array)"

and

"since the last 8 bytes are stored as a byte array, I think this identifies the behaviour you are seeing."

When I convert the bytes using the above toMixedEndianByteString function, the value gets stored correctly in the database.

@hvr
Collaborator

hvr commented May 24, 2019

@ploeh I see; however, in this case I'd argue that it should be the database library's responsibility to know how to decode/encode the types supported by the respective database, and in fact that's what e.g. postgresql-simple does. However, I can't bring this up myself at https://github.com/fpco/odbc/issues as I've been banned by FPComplete.

@ploeh
Author

ploeh commented May 27, 2019

I don't mind taking the issue to odbc instead. Ultimately, I can just keep the conversion in my own code base, where it already works. I thought I'd ask here first, though, since this might be a problem with UUID values marshalled through any Microsoft-based system.

As the Wikipedia entry suggests, this could be an issue with any UUID you receive via COM/OLE, so it's likely not exclusive to interacting with SQL Server. I haven't tried, but one might run into similar problems when interacting with, say, Microsoft Office, Exchange, or other older systems of that type.

Since I spent a few hours figuring all this out, I thought I'd offer the solution where it would be most widely available to other users, saving them from similarly wasted time.

ploeh closed this as completed May 27, 2019
@hvr
Collaborator

hvr commented May 28, 2019

If you get this encoding via OLE/COM, that means via FFI, right? In that case you'd typically not get it as a ByteString but rather as a Ptr, and then we should be talking about the Storable API instead. I'd like to see more real-world use cases beyond ODBC to better inform how to design this and add it to the uuid package.
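For a GUID struct produced by native code on the same host, reading it field by field through a Ptr mirrors the C definition quoted earlier, so no explicit byte swapping should be needed. A sketch using `Foreign.Storable.peekByteOff`, assuming the 16-byte struct layout (a 32-bit Data1, as `unsigned long` is on Windows, two 16-bit fields, then 8 raw bytes); `peekGuidWords` and `demo` are hypothetical names:

```haskell
import Data.Bits (shiftL, (.|.))
import Data.Word (Word16, Word32, Word8)
import Foreign.Marshal.Alloc (allocaBytes)
import Foreign.Ptr (Ptr)
import Foreign.Storable (peekByteOff, pokeByteOff)

-- Hypothetical helper: reads a GUID struct from memory. Data1..Data3 are
-- read as native integers (host byte order, like the C struct); Data4 is
-- read as raw bytes. Returns four words in RFC 4122 (big-endian) order.
peekGuidWords :: Ptr a -> IO (Word32, Word32, Word32, Word32)
peekGuidWords p = do
  data1 <- peekByteOff p 0 :: IO Word32
  data2 <- peekByteOff p 4 :: IO Word16
  data3 <- peekByteOff p 6 :: IO Word16
  data4 <- mapM (\i -> peekByteOff p i :: IO Word8) [8 .. 15]
  let be :: [Word8] -> Word32  -- combine bytes, most significant first
      be = foldl (\acc w -> (acc `shiftL` 8) .|. fromIntegral w) 0
      w1 :: Word32
      w1 = (fromIntegral data2 `shiftL` 16) .|. fromIntegral data3
  pure (data1, w1, be (take 4 data4), be (drop 4 data4))

-- Usage sketch: write a struct the way native code would, then read it back.
demo :: IO (Word32, Word32, Word32, Word32)
demo = allocaBytes 16 $ \p -> do
  pokeByteOff p 0 (0x00112233 :: Word32)
  pokeByteOff p 4 (0x4455 :: Word16)
  pokeByteOff p 6 (0x6677 :: Word16)
  mapM_ (\(i, b) -> pokeByteOff p i (b :: Word8))
        (zip [8 .. 15] [0x88, 0x99, 0xaa, 0xbb, 0xcc, 0xdd, 0xee, 0xff])
  peekGuidWords p
```

Passing the four words as arguments to the uuid package's `fromWords` should then yield the UUID. Because both the writer and the reader use host order for Data1..Data3, `demo` returns (0x00112233, 0x44556677, 0x8899aabb, 0xccddeeff) on little- and big-endian hosts alike.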

@ploeh
Author

ploeh commented May 29, 2019

That's a good point; I hadn't thought that through. It's true that when interacting with the odbc package, I take advantage of the feature that already turns SQL Server's native UNIQUEIDENTIFIER into a ByteString. The code that does that, however, does get the data via a Ptr.
