Custom preprocessor #5

squeek502 · 2023-09-20T05:51:32Z

This would be a solution for the "Unavoidable divergences from the Windows RC compiler" which are all preprocessor related.

Most notably, it would be necessary for UTF-16 support.

This would likely be a large undertaking. A starting point might be to fork arocc.

EDIT: Experimenting with this here: https://github.com/squeek502/preresinator

squeek502 · 2023-10-21T03:21:07Z

Some notes on potential UTF-16 handling I had locally that are relevant:

Convert UTF-16 files to UTF-8

Surround converted files with

#pragma code_page(65001)
// file contents
#pragma code_page(DEFAULT)

Need to ensure that the generated #pragma code_page lines aren't ignored due to them being part of an included file

Could do something like using #line 1 "<built-in>" and then documenting that such lines should not be ignored even if they are 'within' an included file
Can't really do something like putting the generated #pragma code_page's in the 'root' file since that wouldn't handle the case of an included file including a UTF-16 file

Could do something like (where root-file.rc is the root file and file.rc is any arbitrary included file)

#line 1 "root-file.rc" // note that the 1 here is a totally fake line number since the following line is not part of the original file at all
#pragma code_page(65001)
#line 1 "file.rc"
// file.rc contents
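To make the `#line` idea above concrete, here is a minimal sketch in Python (not resinator's actual implementation). The function name `wrap_utf16_include` is hypothetical, and detecting UTF-16 via a byte order mark is an assumption made for the sketch. It also does not restore the previously active code page after the included contents, and it does not escape backslashes in file names.

```python
import codecs


def wrap_utf16_include(raw: bytes, root_name: str, file_name: str) -> str:
    """Decode a UTF-16 include and prefix it with generated directives.

    The returned text is meant to be written out as UTF-8.
    """
    # Assumption: UTF-16 input is identified by its byte order mark.
    if raw.startswith(codecs.BOM_UTF16_LE):
        text = raw[2:].decode("utf-16-le")
    elif raw.startswith(codecs.BOM_UTF16_BE):
        text = raw[2:].decode("utf-16-be")
    else:
        raise ValueError("no UTF-16 byte order mark found")

    header = (
        # Fake line number: attributes the generated pragma to the root file
        # so that it is not ignored as part of an included file.
        f'#line 1 "{root_name}"\n'
        "#pragma code_page(65001)\n"
        # Switch back so the original contents start at line 1 of the include.
        f'#line 1 "{file_name}"\n'
    )
    return header + text
```

Something like `wrap_utf16_include(data, "root-file.rc", "file.rc").encode("utf-8")` would then give the bytes handed to the rest of the preprocessor.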

Also need to keep https://squeek502.github.io/resinator/windows/input-and-output-code-pages.html in mind. What rc.exe does is encode the output of the preprocessor as UTF-16, having already handled all the #pragma code_page directives and done things like replacing invalid codepoints with <U+FFFD>, etc. This type of approach may end up being necessary to reach full compatibility without resinator-specific behavior in the preprocessed output.

mehrdadn · 2025-02-11T08:45:22Z

Hi, just wanted to make a suggestion: would it be possible to simply handle the cases that are easy, and leave the more difficult ones for later? Particularly, being able to parse UTF-16 resource files that don't have any problematic #pragma code_page directives would be a great step forward and quite helpful, because it would allow the usage of build steps that output UTF-16 lacking such directives.

squeek502 · 2025-02-11T09:07:48Z

Unfortunately, the baseline would be a C preprocessor that can handle UTF-16 encoded files as input, which is not necessarily easy on its own (for context, the Microsoft compilers are the only existing ones I'm aware of that support UTF-16 [or, at least, I know clang and gcc don't]).

Beyond that, decisions need to be made about the encoding of the output of the C preprocessor, and how that interacts with the resource compiler (this is where #pragma code_page starts being relevant). The 'easiest' strategy here would be for the C preprocessor to output UTF-16 and for the resource compiler to also be able to ingest UTF-16 (this is how rc.exe works; the preprocessor always outputs UTF-16 and so the resource compiler only has to care about ingesting UTF-16), but I don't necessarily like that solution too much.

mehrdadn · 2025-02-12T03:21:37Z

> Unfortunately, the baseline would be a C preprocessor that can handle UTF-16 encoded files as input, which is not necessarily easy on its own (for context, the Microsoft compilers are the only existing ones I'm aware of that support UTF-16 [or, at least, I know clang and gcc don't]).

I feel I'm missing something, but what about just converting UTF-16 files to UTF-8 and then feeding them to your C preprocessor?

squeek502 · 2025-02-12T07:16:37Z

That would be one way to go, but it would be the C preprocessor that would need to do the conversion, since that's what's handling #include, etc.

It would also need some way to let the resource compiler know how the output should be interpreted, either:

- Consume UTF-16 and output UTF-8, and then mark that section as being UTF-8 in some way
- Consume UTF-16 and output UTF-16 (internally converting to UTF-8 and then back to UTF-16), and either also mark this or let the resource compiler infer the encoding of UTF-16 sections
- Handle all #pragma code_page directives and output as a single standardized encoding

Consider this example:

#pragma code_page(1252)
#include "windows1252.rc"

// no need to set code page, the UTF-16 encoding is inferred
#include "utf16.rc"

// code page 1252 is still active
#include "windows1252_again.rc"

Running rc.exe /p test.rc (to only run the preprocessor) results in:

#pragma code_page 1252

<contents of windows1252.rc interpreted as Windows-1252 and outputted as UTF-16>

<contents of utf16.rc interpreted as UTF-16 and outputted as UTF-16>

<contents of windows1252_again.rc interpreted as Windows-1252 and outputted as UTF-16>

This approach of rc.exe simplifies things from the perspective of the resource compiler (it can ignore the #pragma code_page since everything has already been converted to UTF-16), but means that the C preprocessor is the one dealing with the #pragma code_page directives, which complicates the C preprocessor side of things.
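A rough Python model of that behavior, for illustration only: each included file is decoded using either an inferred UTF-16 encoding or the currently active code page, and everything is emitted as UTF-16. The names `CODECS`, `decode_include`, and `preprocess_to_utf16` are hypothetical. Inferring UTF-16 from a byte order mark is an assumption (the thread does not say how rc.exe does the inference), and Python's codec tables are only stand-ins for the Windows code pages.

```python
import codecs
from typing import Iterable

# Assumption: Python codec names standing in for Windows code page IDs.
CODECS = {1252: "cp1252", 65001: "utf-8"}


def decode_include(raw: bytes, active_code_page: int) -> str:
    """Decode one included file to text."""
    # Assumption: a byte order mark is what marks a file as UTF-16.
    if raw.startswith(codecs.BOM_UTF16_LE):
        return raw[2:].decode("utf-16-le", errors="replace")
    # Otherwise the active code page applies; invalid input becomes U+FFFD.
    return raw.decode(CODECS[active_code_page], errors="replace")


def preprocess_to_utf16(includes: Iterable[bytes], code_page: int) -> bytes:
    """Concatenate the decoded includes and encode the result as UTF-16 (LE)."""
    parts = [decode_include(raw, code_page) for raw in includes]
    return "".join(parts).encode("utf-16-le")
```

Note that the UTF-16 file in the middle does not change the active code page, which matches the example: the file after it is still read as Windows-1252.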

(as an aside, note also that there are rc.exe quirks that are caused by the preprocessor 'speaking' UTF-16, e.g. this and this)

mehrdadn · 2025-02-12T07:52:51Z

Ahh... I see. Thanks for the explanation, that's definitely annoying!

squeek502 · 2025-02-12T09:12:45Z

No worries, writing this out has helped clarify my thoughts about the problem. I'm thinking this might be the most viable path to getting initial UTF-16 support:

> [Make the C preprocessor] consume UTF-16 and output UTF-16 (internally converting to UTF-8 and then back to UTF-16), and either also mark this or let the resource compiler infer the encoding of UTF-16 sections

That would allow the preprocessor to not have any rc.exe-specific stuff, and resinator is decently equipped to handle the input being partially UTF-16 encoded.
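As a sketch of what that round trip might look like (in Python, purely illustrative): the UTF-16 input is converted to UTF-8, run through an existing UTF-8-only preprocessing step, and converted back. `preprocess_utf16` and the `run_utf8_preprocessor` callback are hypothetical names. Using a byte order mark both to detect the input and to mark the output section is an assumption, not something decided in this thread.

```python
import codecs
from typing import Callable


def preprocess_utf16(
    raw: bytes, run_utf8_preprocessor: Callable[[bytes], bytes]
) -> bytes:
    """Round-trip a UTF-16 (LE) file through a UTF-8-only preprocessor."""
    # Assumption: the input is recognized by its byte order mark.
    if not raw.startswith(codecs.BOM_UTF16_LE):
        raise ValueError("expected UTF-16 (LE) input with a byte order mark")

    # Convert to UTF-8 so the preprocessor itself never sees UTF-16.
    utf8_in = raw[2:].decode("utf-16-le").encode("utf-8")
    utf8_out = run_utf8_preprocessor(utf8_in)

    # Convert back, re-adding the byte order mark as the section marker.
    return codecs.BOM_UTF16_LE + utf8_out.decode("utf-8").encode("utf-16-le")
```

The upside of this shape is that the preprocessing step stays encoding-agnostic beyond UTF-8; the cost is that the resource compiler has to cope with output that is only partially UTF-16.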

squeek502 added the enhancement label on Sep 20, 2023

squeek502 mentioned this issue on Sep 22, 2023: Add a .rc -> .res compiler to the Zig compiler ziglang/zig#17069 (merged)

This was referenced on Oct 16, 2023: Support for UTF-16 files when /:no-preprocess is specified #6 (open); zig rc: UTF-16 (LE) encoding is not supported ziglang/zig#17557 (open)

squeek502 mentioned this issue on Jul 20, 2024: Allow byte order mark at the start of a file #12 (open)

squeek502 mentioned this issue on Jan 9, 2025: Stringify operator discrepancy with rc.exe's preprocessor #16 (open)
