Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CreateWindowEx lpClassName union? #623

Closed
marler8997 opened this issue Aug 29, 2021 · 13 comments
Closed

CreateWindowEx lpClassName union? #623

marler8997 opened this issue Aug 29, 2021 · 13 comments

Comments

@marler8997
Copy link
Contributor

According to the documentation for CreateWindowEx, the lpClassName parameter can be:

A null-terminated string or a class atom created by a previous call to the RegisterClass or RegisterClassEx function.

However in the metadata, the lpClassName parameter is just a PSTR:

image

The C header also just declares this parameter as LPCSTR, however, I wonder if the metadata should consider enhancing this to some sort of union type like this?

typedef union  {
    LPCSTR String;
    ATOM Atom;
} CREATE_WINDOW_CLASS_NAME;

Does the current definition cause any issues for the other projections when someone wants to pass an ATOM for lpClassName?

cc @AArnott @kennykerr

@kennykerr
Copy link
Contributor

Using an atom for the class name is very uncommon in my experience. I would prefer not to complicate the definitions. A developer can always coerce the value if needed, but I would prefer if the definition remained faithful to the original declaration in the headers (as it is now).

@sotteson1
Copy link
Contributor

Thanks for the idea @marler8997, but I agree with @kennykerr. I think we should leave the definition the way it is to represent what the headers model.

@marler8997
Copy link
Contributor Author

marler8997 commented Aug 30, 2021

A developer can always coerce the value if needed

Actually not in the Zig case. In the zig case the CreateWindowExW function declares the lpClassName as a pointer to an array of u16. The problem here is that Zig enforces pointer alignment. A user reported that they were passing an ATOM to this argument (the return value from RegisterClassExW) and they got an alignment error when they casted it to a u16 pointer because it wasn't aligned like a u16 pointer would be.

Of course we can add special handling for this function in Zig's projection to declare it as an unaligned pointer [*]align(1) u16, but maybe there's something we can do in the metadata to indicate this behavior? In fact there are other functions that will be susceptible to this problem. For example, here is a list of constants that are passed to various pointer parameters but are not aligned like those pointer types should be:

https://github.com/marlersoft/zigwin32gen/blob/3169b00a1eaf5522c625e79092fbccc1f898af56/src/genzig.zig#L1135

    // skip these because these constants are integer values being cast to string pointers, however,
    // those string pointers need to be aligned.  The types that accept these constants should probably
    // be properly declared as union types.
    .{ "RT_CURSOR", .{} },
    .{ "RT_BITMAP", .{} },
    .{ "RT_ICON", .{} },
    .{ "RT_MENU", .{} },
    .{ "RT_DIALOG", .{} },
    .{ "RT_FONTDIR", .{} },
    .{ "RT_FONT", .{} },
    .{ "RT_ACCELERATOR", .{} },
    .{ "RT_MESSAGETABLE", .{} },
    .{ "RT_VERSION", .{} },
    .{ "RT_DLGINCLUDE", .{} },
    .{ "RT_PLUGPLAY", .{} },
    .{ "RT_VXD", .{} },
    .{ "RT_ANICURSOR", .{} },
    .{ "RT_ANIICON", .{} },
    .{ "RT_HTML", .{} },

One idea here is if we don't want to express the full union type that these parameters can accept, maybe the metadata could just flag them as some sort of opaque type. This would signal to projections like Zig that enforce pointer alignment not to do so for parameters like this. Maybe an attribute named something like OpaqueUnion or UnalignedPtr or something?

Also could we reopen this issue? @AArnott has not had a chance to say whether this causes an issue with the C# projection. Is the C# projection able to pass an ATOM here for the lpClassName parameter? I realize @kennykerr said this is uncommon, but this is coming from a user who has a real use case for it.

@marler8997
Copy link
Contributor Author

Ping, are we able to reopen this issue?

@marler8997
Copy link
Contributor Author

Just checking in again, could we reopen this issue so we can continue discussion about potential solutions to the problem? cc @sotteson1

@kennykerr
Copy link
Contributor

Sorry, Steve's on vacation. He should be back next week and will check in.

@weltkante
Copy link

weltkante commented Sep 9, 2021

I don't think the win32 ABI enforces pointer alignment, does it? As such a projection adding enforcement of things not in the ABI is necessarily faulty (even if it may work for the majority of cases).

The resource APIs of the win32 API come to mind, which use macros like MAKEINTRESOURCE, which aliases string pointers and 16 bit unsigned integers in a string-pointer-typed value. This is safe because the first page is not mapped so no (valid) pointer will point there.

I don't think its feasible to make an exhaustive list of places where ATOMs and resource "names" are allowable? This wouldn't just need to include function signatures, also struct members. And there may very well be other places where these "tricks" of aliasing values into unused ranges of pointers are used.

@marler8997
Copy link
Contributor Author

marler8997 commented Sep 10, 2021

I don't think the win32 ABI enforces pointer alignment, does it?

I'm not sure how or why the win32 ABI would do such a thing. Pointer alignment is typically a requirement of the processor rather than enforced by ABIs. But the problem here is discussing cases where parameters declared as pointers aren't actually pointers, so there would be no reason to enforce them to be valid pointers.

As such a projection adding enforcement of things not in the ABI is necessarily faulty (even if it may work for the majority of cases).

The problem here is not that the Zig projection is enforcing things that are not in the ABI, the problem is that the types being declared are incomplete/incorrect. Currently functions are saying their parameters are pointers to objects, which if actually interpreted that way in the implementation would result in misaligned access to data that would cause various kinds of runtime errors depending on platform/cpu/etc. But in these cases they are not interpreted as pointers, so no such errors occur. So we have parameters that are declared as pointers but aren't actually pointers.

This is a problem for Zig because in a debug build, values casted to pointers are checked at runtime for alignment and panic if they aren't. This helps catch misaligned access errors earlier at their source instead of later when those bad pointers are dereferenced. The solution for Zig is to correct the type definitions so the compiler understands what they actually are. I believe the most correct representation is a union type, but an unaligned pointer could also be used. At minimum the Zig projection will need to know all the cases where this occurs so it can disable the pointer alignment enforcement by correcting the type.

I don't think its feasible to make an exhaustive list of places where ATOMs and resource "names" are allowable? This wouldn't just need to include function signatures, also struct members. And there may very well be other places where these "tricks" of aliasing values into unused ranges of pointers are used.

Whether or not it's feasible is probably up for debate, but irrelevant since it's required for the Zig projection. Unless I'm missing something. Do you have any advice/guidance or ideas on how the Zig projection could avoid the need for this extra metadata? Short of removing these pointer alignment checks in the language I'm not sure how this extra metadata could be avoided.

@weltkante
Copy link

weltkante commented Sep 10, 2021

I'm not sure how or why the win32 ABI would do such a thing. Pointer alignment is typically a requirement of the processor rather than enforced by ABIs.

The ABI specifies how the OS uses the processor when passing data around. Obviously the OS has to respect the processor capabilities, but at least x86 is pretty flexible with unaligned pointers, so its up to the OS to specify in its ABI whether there are any additional alignment constraints.

As far as I'm aware the x86 ABI for Windows is pretty non-restrictive with its alignment requirements? But really, the headers and C/C++ language define what the API means, so a projection "assuming" alignment for pointers when C/C++ and the OS doesn't seems inherently wrong to me. What are you going to do if you happen to get an unaligned pointer (i.e. a real pointer, not one with a numeric meaning) returned from somewhere?

I believe the most correct representation is a union type, but an unaligned pointer could also be used.

Using a union type risks that the ABI treats them different than primtive pointers. I know for a fact that this happens with structs (returning a struct containing a pointer and returning a pointer typedef gets treated differently in the ABI) - not aware how unions behave but I'd not be surprised if they behave like structs. If its a primitive type in the ABI then you can't just turn it into a struct or union, it would mean something different.

Of course you can invent a "new" kind of union-like annotation, but generally the simplest and most correct representation for a pointer on x86 seems to be allowing any value in it, because thats simply what C/C++ does, so the projection should (probably) use unaligned pointers?

I'm not against annotating pointers that are "more than pointers", but its probably error prone because its not in the headers, and I'm not sure how the team is going to treat changes post "1.0 release" of the metadata, because it will probably be a breaking change to fix them later. Last I asked (which granted, was like a year ago) they said they wouldn't fix metadata errors if they were a breaking change and the previous version was usable. Maybe the policy changed, but leaving a mess of half-complete annotations "frozen" because it is what was released doesn't seem very appealing to me.

@marler8997
Copy link
Contributor Author

marler8997 commented Sep 10, 2021

As far as I'm aware the x86 ABI for Windows is pretty non-restrictive with its alignment requirements? But really, the headers and C/C++ language define what the API means, so a projection "assuming" alignment for pointers when C/C++ and the OS doesn't seems inherently wrong to me.

C/C++ classify unaligned access as "undefined behavior". But actual unaligned pointer access isn't the issue at hand here. The issue is with the win32 API is declaring parameters as pointers when they accept more than just pointers. There's no actual unaligned access going on as far as I'm aware.

What are you going to do if you happen to get an unaligned pointer returned from somewhere?

If win32 returned unaligned pointers to objects, this would be "undefined behavior" and could cause issues with toolchains/cpus/platforms/sanitizers. My guess is this doesn't happen, at least not often. If it does then C/C++/Zig can declare the pointer as unaligned, which would inform the toolchain to generate the correct machine code to access it without triggering UB.

Using a union type risks that the ABI treats them different than primtive pointers. I know for a fact that this happens with structs (returning a struct containing a pointer and returning a pointer typedef gets treated differently in the ABI) - not aware how unions behave but I'd not be surprised if they behave like structs. If its a primitive type in the ABI then you can't just turn it into a struct or union, it would mean something different.

Interesting I was not aware of any case where wrapping a type in a bare struct would change how it's passed in the ABI. This is quite surprising to me because I know I've done this exact thing before and have never had issues. I'd be curious to learn more about this, do you have an example or a pointer to where I could learn more about this?

...generally the simplest and most correct representation for a pointer on x86 seems to be allowing any value in it, because thats simply what C/C++ does, so the projection should (probably) use unaligned pointers?

I don't think this is correct. C/C++ define unaligned pointer access as UB. If you enabled ubsan in a project and started accessing an ATOM value that happened to be unaligned through a pointer alias then you're going to have a bad time. But since it's UB, it could just do what you expect in other cases, that's the beauty of UB :) Saying one implementation of UB is the "correct" one is by definition false. It's "undefined" so there's no "correct" or "incorrect".

@weltkante
Copy link

weltkante commented Sep 10, 2021

I've nothing to add to the discussion as far as the other points are concerned; I'm probably technically inexact with some of my statements, so I acknowledge you correcting me, but I've stated my concerns and at this point am happy to let the team decide what they consider best.

Interesting I was not aware of any case where wrapping a type in a bare struct would change how it's passed in the ABI. This is quite surprising to me because I know I've done this exact thing before and have never had issues. I'd be curious to learn more about this, do you have an example or a pointer to where I could learn more about this?

in COM (which uses a combination of thiscall and stdcall calling conventions), at least on x86, returning a primitive is done via register, returning a struct of the same size is done via stack. There are a few COM interfaces that return structs by value, for example in Direct2D and Direct3D12.

.NET CLR had a longstanding bug where it treated primitive and structs the same and returned via register. WPF (among others) used this to define HRESULT as a struct instead of an integer (to get richer APIs). When the bug in the calling conventions was fixed (to properly distinguish between returning a struct and a primitive as the calling convention mandates) this broke WPF and they had to restore it as a quirk. This means in .NET interop you cannot return structs by value, you have to work around it and return them as an out-parameter (looks similar to what #636 describes)

dotnet/coreclr#23974 (original discussion)
microsoft/CsWin32#167 (me noting that the C# projection is doing it wrong)
#636 (probably the same issue, because plain C doesn't have a concept of thiscall, so using the same signature is treated differently by a stdcall-only calling convention)

@marler8997
Copy link
Contributor Author

Wow

@marler8997
Copy link
Contributor Author

Thanks for that example @weltkante. I think I have a way to move forward here.

I'm abandoning my idea to use unions in place of pointer types. I don't like the idea of making a change that has no guarantee on ABI compatibility. Your example proves there is no such guarantee.

So I think the solution for Zig is to identify the parameters/fields and constants that are union types rather than pointers like they are declared, and just mark them in some way. The Zig projection will translate these as unaligned pointers.

I can keep this extra metadata in the Zig projection for now, and if another projection wants the same data we can discuss upstreaming this extra data into the metadata project as an attribute of some kind. All the work done in the Zig projection can be leveraged in an upstream change if we decide to do that so this seems like a good way to move forward.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants