Avoid all compiler optimization on embedded apphost hash #110554

Open
wants to merge 1 commit into
base: main
31 changes: 26 additions & 5 deletions src/native/corehost/corehost.cpp
@@ -40,6 +40,24 @@
 #define EMBED_HASH_LO_PART_UTF8 "74e592c2fa383d4a3960714caef0c4f2"
 #define EMBED_HASH_FULL_UTF8    (EMBED_HASH_HI_PART_UTF8 EMBED_HASH_LO_PART_UTF8) // NUL terminated
 
+void to_non_volatile(volatile const char* cstr, char* output, size_t length)
+{
+    for (size_t i = 0; i < length; i++)
+    {
+        output[i] = cstr[i];
+    }
+}
+
+bool compare_memory_nooptimization(volatile const char* a, volatile const char* b, size_t length)
+{
+    for (size_t i = 0; i < length; i++)
+    {
+        if (*a++ != *b++)
+            return false;
+    }
+    return true;
+}
+
 bool is_exe_enabled_for_execution(pal::string_t* app_dll)
 {
     constexpr int EMBED_SZ = sizeof(EMBED_HASH_FULL_UTF8) / sizeof(EMBED_HASH_FULL_UTF8[0]);
@@ -48,18 +66,21 @@ bool is_exe_enabled_for_execution(pal::string_t* app_dll)
     // Contains the EMBED_HASH_FULL_UTF8 value at compile time or the managed DLL name replaced by "dotnet build".
     // Must not be 'const' because std::string(&embed[0]) below would bind to a const string ctor plus length
     // where length is determined at compile time (=64) instead of the actual length of the string at runtime.
-    static char embed[EMBED_MAX] = EMBED_HASH_FULL_UTF8;     // series of NULs followed by embed hash string
+    volatile static char embed[EMBED_MAX] = EMBED_HASH_FULL_UTF8;     // series of NULs followed by embed hash string
Member

I'm curious what happens if you change:

    static char embed[EMBED_MAX] = EMBED_HASH_FULL_UTF8; // series of NULs followed by embed hash string

to:

    static const char const_embed[EMBED_MAX] = EMBED_HASH_FULL_UTF8;
    char embed[EMBED_MAX];
    to_non_volatile(const_embed, embed, EMBED_MAX);

And keep the remainder of the file as it was (except for adding to_non_volatile).
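
For reference, here is a minimal standalone sketch of the pattern suggested above. The names `SKETCH_EMBED_MAX` and `read_binding_sketch`, and the stand-in value "MyApp.dll", are assumptions made for illustration; the real code uses `EMBED_MAX` and `EMBED_HASH_FULL_UTF8`. The sketch only shows the copy-then-construct shape and makes no claim about what a particular compiler will emit for it.

```cpp
#include <cstddef>
#include <string>

// Hypothetical buffer size for this sketch; the real code uses EMBED_MAX.
constexpr std::size_t SKETCH_EMBED_MAX = 64;

// Copies `length` bytes, reading each one through a volatile-qualified pointer.
void to_non_volatile(volatile const char* cstr, char* output, std::size_t length)
{
    for (std::size_t i = 0; i < length; i++)
    {
        output[i] = cstr[i];
    }
}

// Hypothetical wrapper: copy the static buffer into a local one and build a
// std::string from the copy, so the length comes from the first NUL found at runtime.
std::string read_binding_sketch()
{
    // Stand-in value (assumption); the remaining bytes are zero-initialized.
    static const char const_embed[SKETCH_EMBED_MAX] = "MyApp.dll";

    char embed[SKETCH_EMBED_MAX];
    to_non_volatile(const_embed, embed, SKETCH_EMBED_MAX);
    return std::string(&embed[0]);
}
```

Calling `read_binding_sketch()` yields a string whose size is the number of characters before the first NUL in the buffer, not the size of the array.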

Member Author

The comment above says we can't use const.

Member Author

I made this change as the full diff:

diff --git a/src/native/corehost/corehost.cpp b/src/native/corehost/corehost.cpp
index 6de7acfbd08..d0c90f4bfcf 100644
--- a/src/native/corehost/corehost.cpp
+++ b/src/native/corehost/corehost.cpp
@@ -40,6 +40,24 @@
 #define EMBED_HASH_LO_PART_UTF8 "74e592c2fa383d4a3960714caef0c4f2"
 #define EMBED_HASH_FULL_UTF8    (EMBED_HASH_HI_PART_UTF8 EMBED_HASH_LO_PART_UTF8) // NUL terminated
 
+void to_non_volatile(volatile const char* cstr, char* output, size_t length)
+{
+    for (size_t i = 0; i < length; i++)
+    {
+        output[i] = cstr[i];
+    }
+}
+
+bool compare_memory_nooptimization(volatile const char* a, volatile const char* b, size_t length)
+{
+    for (size_t i = 0; i < length; i++)
+    {
+        if (*a++ != *b++)
+            return false;
+    }
+    return true;
+}
+
 bool is_exe_enabled_for_execution(pal::string_t* app_dll)
 {
     constexpr int EMBED_SZ = sizeof(EMBED_HASH_FULL_UTF8) / sizeof(EMBED_HASH_FULL_UTF8[0]);
@@ -48,11 +66,14 @@ bool is_exe_enabled_for_execution(pal::string_t* app_dll)
     // Contains the EMBED_HASH_FULL_UTF8 value at compile time or the managed DLL name replaced by "dotnet build".
     // Must not be 'const' because std::string(&embed[0]) below would bind to a const string ctor plus length
     // where length is determined at compile time (=64) instead of the actual length of the string at runtime.
-    static char embed[EMBED_MAX] = EMBED_HASH_FULL_UTF8;     // series of NULs followed by embed hash string
+    volatile static char const_embed[EMBED_MAX] = EMBED_HASH_FULL_UTF8;     // series of NULs followed by embed hash string
 
     static const char hi_part[] = EMBED_HASH_HI_PART_UTF8;
     static const char lo_part[] = EMBED_HASH_LO_PART_UTF8;
 
+    char embed[EMBED_MAX];
+    to_non_volatile(const_embed, embed, EMBED_MAX);
+
     if (!pal::clr_palstring(embed, app_dll))
     {
         trace::error(_X("The managed DLL bound to this executable could not be retrieved from the executable image."));

The compiler embeds the string twice:

++ grep --byte-offset --only-matching --text c3ab8ff13720e8ad9047dd39466b3c8974e592c2fa383d4a3960714ca ./artifacts/obj/coreclr/linux.x64.Release/Corehost.Static/CMakeFiles/singlefilehost.dir/home/omajid/runtime/src/native/corehost/corehost.cpp.o
320:c3ab8ff13720e8ad9047dd39466b3c8974e592c2fa383d4a3960714ca
4816:c3ab8ff13720e8ad9047dd39466b3c8974e592c2fa383d4a3960714ca
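
The same duplicate check can be sketched in C++ for cases where grep is not at hand. This is only an illustration: `find_offsets` and `read_binary_file` are hypothetical helper names, not part of the runtime repository, and the sketch reports offsets in the same spirit as `grep --byte-offset --only-matching`.

```cpp
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Returns the byte offset of every occurrence of `needle` in `haystack`.
// Overlapping occurrences are reported too, because the search resumes one
// byte after the previous match.
std::vector<std::size_t> find_offsets(const std::string& haystack, const std::string& needle)
{
    std::vector<std::size_t> offsets;
    if (needle.empty())
        return offsets;

    for (std::size_t pos = haystack.find(needle);
         pos != std::string::npos;
         pos = haystack.find(needle, pos + 1))
    {
        offsets.push_back(pos);
    }
    return offsets;
}

// Reads a whole file as raw bytes (hypothetical helper).
// Returns an empty string if the file cannot be opened.
std::string read_binary_file(const std::string& path)
{
    std::ifstream in(path, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in), std::istreambuf_iterator<char>());
}
```

Passing the contents of the object file and the hash prefix to `find_offsets` should return one offset per embedded copy, so a result with more than one entry indicates the duplication discussed here.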

Member Author

Using const static char const_embed[EMBED_MAX] = EMBED_HASH_FULL_UTF8; produces the exact same result.

Member

> The comment above says we can't use const.

Like this PR, that comment is also working around a compiler optimization. I thought we might tackle both together.

> Using const static char const_embed[EMBED_MAX] = EMBED_HASH_FULL_UTF8; produces the exact same result.

That is: you also tried without the volatile?

I'm surprised the compiler found the need to duplicate const data, which shouldn't be optimized for a specific usage.

Thanks for giving it a try!

Member Author

You mean like this?

diff --git a/src/native/corehost/corehost.cpp b/src/native/corehost/corehost.cpp
index 6de7acfbd08..f77fbe63347 100644
--- a/src/native/corehost/corehost.cpp
+++ b/src/native/corehost/corehost.cpp
@@ -40,6 +40,14 @@
 #define EMBED_HASH_LO_PART_UTF8 "74e592c2fa383d4a3960714caef0c4f2"
 #define EMBED_HASH_FULL_UTF8    (EMBED_HASH_HI_PART_UTF8 EMBED_HASH_LO_PART_UTF8) // NUL terminated
 
+void to_non_volatile(volatile const char* cstr, char* output, size_t length)
+{
+    for (size_t i = 0; i < length; i++)
+    {
+        output[i] = cstr[i];
+    }
+}
+
 bool is_exe_enabled_for_execution(pal::string_t* app_dll)
 {
     constexpr int EMBED_SZ = sizeof(EMBED_HASH_FULL_UTF8) / sizeof(EMBED_HASH_FULL_UTF8[0]);
@@ -48,11 +56,14 @@ bool is_exe_enabled_for_execution(pal::string_t* app_dll)
     // Contains the EMBED_HASH_FULL_UTF8 value at compile time or the managed DLL name replaced by "dotnet build".
     // Must not be 'const' because std::string(&embed[0]) below would bind to a const string ctor plus length
     // where length is determined at compile time (=64) instead of the actual length of the string at runtime.
-    static char embed[EMBED_MAX] = EMBED_HASH_FULL_UTF8;     // series of NULs followed by embed hash string
+    volatile static const char const_embed[EMBED_MAX] = EMBED_HASH_FULL_UTF8;     // series of NULs followed by embed hash string
 
     static const char hi_part[] = EMBED_HASH_HI_PART_UTF8;
     static const char lo_part[] = EMBED_HASH_LO_PART_UTF8;
 
+    char embed[EMBED_MAX];
+    to_non_volatile(const_embed, embed, EMBED_MAX);
+
     if (!pal::clr_palstring(embed, app_dll))
     {
         trace::error(_X("The managed DLL bound to this executable could not be retrieved from the executable image."));

This leads to:

$ grep --byte-offset --only-matching --text c3ab8ff13720e8ad9047dd39466b3c8974e592c2fa383d4a3960714ca ./artifacts/obj/coreclr/linux.x64.Release/Corehost.Static/CMakeFiles/singlefilehost.dir/home/omajid/runtime/src/native/corehost/corehost.cpp.o
256:c3ab8ff13720e8ad9047dd39466b3c8974e592c2fa383d4a3960714ca
4752:c3ab8ff13720e8ad9047dd39466b3c8974e592c2fa383d4a3960714ca

Member Author

const vs non-const doesn't make a difference, as far as I can see:

diff --git a/src/native/corehost/corehost.cpp b/src/native/corehost/corehost.cpp
index 6de7acfbd08..7c6c1244815 100644
--- a/src/native/corehost/corehost.cpp
+++ b/src/native/corehost/corehost.cpp
@@ -40,6 +40,14 @@
 #define EMBED_HASH_LO_PART_UTF8 "74e592c2fa383d4a3960714caef0c4f2"
 #define EMBED_HASH_FULL_UTF8    (EMBED_HASH_HI_PART_UTF8 EMBED_HASH_LO_PART_UTF8) // NUL terminated
 
+void to_non_volatile(volatile const char* cstr, char* output, size_t length)
+{
+    for (size_t i = 0; i < length; i++)
+    {
+        output[i] = cstr[i];
+    }
+}
+
 bool is_exe_enabled_for_execution(pal::string_t* app_dll)
 {
     constexpr int EMBED_SZ = sizeof(EMBED_HASH_FULL_UTF8) / sizeof(EMBED_HASH_FULL_UTF8[0]);
@@ -48,11 +56,14 @@ bool is_exe_enabled_for_execution(pal::string_t* app_dll)
     // Contains the EMBED_HASH_FULL_UTF8 value at compile time or the managed DLL name replaced by "dotnet build".
     // Must not be 'const' because std::string(&embed[0]) below would bind to a const string ctor plus length
     // where length is determined at compile time (=64) instead of the actual length of the string at runtime.
-    static char embed[EMBED_MAX] = EMBED_HASH_FULL_UTF8;     // series of NULs followed by embed hash string
+    volatile static char const_embed[EMBED_MAX] = EMBED_HASH_FULL_UTF8;     // series of NULs followed by embed hash string
 
     static const char hi_part[] = EMBED_HASH_HI_PART_UTF8;
     static const char lo_part[] = EMBED_HASH_LO_PART_UTF8;
 
+    char embed[EMBED_MAX];
+    to_non_volatile(const_embed, embed, EMBED_MAX);
+
     if (!pal::clr_palstring(embed, app_dll))
     {
         trace::error(_X("The managed DLL bound to this executable could not be retrieved from the executable image."));
$ grep --byte-offset --only-matching --text c3ab8ff13720e8ad9047dd39466b3c8974e592c2fa383d4a3960714ca ./artifacts/obj/coreclr/linux.x64.Release/Corehost.Static/CMakeFiles/singlefilehost.dir/home/omajid/runtime/src/native/corehost/corehost.cpp.o
256:c3ab8ff13720e8ad9047dd39466b3c8974e592c2fa383d4a3960714ca
4752:c3ab8ff13720e8ad9047dd39466b3c8974e592c2fa383d4a3960714ca

Member

@tmds tmds Dec 12, 2024

My conclusion is that I understand neither why all of these don't work nor why what is in the PR does work. 😅

Member

I didn't realize the duplication comes from the compares against the parts and not from the complete embed...

I wonder if introducing overlap in the parts wouldn't eliminate the issue too.

diff --git a/src/native/corehost/corehost.cpp b/src/native/corehost/corehost.cpp
index 6de7acfbd08..e3d0ca72891 100644
--- a/src/native/corehost/corehost.cpp
+++ b/src/native/corehost/corehost.cpp
@@ -36,9 +36,10 @@
  *        o https://en.wikipedia.org/wiki/Comparison_of_file_systems
  *          has more details on maximum file name sizes.
  */
-#define EMBED_HASH_HI_PART_UTF8 "c3ab8ff13720e8ad9047dd39466b3c89" // SHA-256 of "foobar" in UTF-8
-#define EMBED_HASH_LO_PART_UTF8 "74e592c2fa383d4a3960714caef0c4f2"
-#define EMBED_HASH_FULL_UTF8    (EMBED_HASH_HI_PART_UTF8 EMBED_HASH_LO_PART_UTF8) // NUL terminated
+#define EMBED_HASH_PART_1_UTF8 "c3ab8ff13720e8ad9047dd39" // SHA-256 of "foobar" in UTF-8
+#define EMBED_HASH_PART_2_UTF8 "466b3c8974e592c2fa383d4a"
+#define EMBED_HASH_PART_3_UTF8 "3960714caef0c4f2"
+#define EMBED_HASH_FULL_UTF8    (EMBED_HASH_PART_1_UTF8 EMBED_HASH_PART_2_UTF8 EMBED_HASH_PART_3_UTF8) // NUL terminated
 
 bool is_exe_enabled_for_execution(pal::string_t* app_dll)
 {
@@ -50,8 +51,8 @@ bool is_exe_enabled_for_execution(pal::string_t* app_dll)
     // where length is determined at compile time (=64) instead of the actual length of the string at runtime.
     static char embed[EMBED_MAX] = EMBED_HASH_FULL_UTF8;     // series of NULs followed by embed hash string
 
-    static const char hi_part[] = EMBED_HASH_HI_PART_UTF8;
-    static const char lo_part[] = EMBED_HASH_LO_PART_UTF8;
+    static const char hi_part[] = (EMBED_HASH_PART_1_UTF8 EMBED_HASH_PART_2_UTF8);
+    static const char lo_part[] = (EMBED_HASH_PART_2_UTF8 EMBED_HASH_PART_3_UTF8);
 
     if (!pal::clr_palstring(embed, app_dll))
     {
@@ -73,9 +74,9 @@ bool is_exe_enabled_for_execution(pal::string_t* app_dll)
     // So use two parts of the string that will be unaffected by the edit.
     size_t hi_len = (sizeof(hi_part) / sizeof(hi_part[0])) - 1;
     size_t lo_len = (sizeof(lo_part) / sizeof(lo_part[0])) - 1;
-    if (binding.size() >= (hi_len + lo_len)
+    if (binding.size() >= (EMBED_SZ - 1)
         && binding.compare(0, hi_len, &hi_part[0]) == 0
-        && binding.compare(hi_len, lo_len, &lo_part[0]) == 0)
+        && binding.compare(EMBED_SZ - 1 - lo_len, lo_len, &lo_part[0]) == 0)
     {
         trace::error(_X("This executable is not bound to a managed DLL to execute. The binding value is: '%s'"), app_dll->c_str());
         return false;

Member

@tmds tmds Jan 16, 2025

I assume the above would work because, with the overlap, the two parts that the compiler optimizes against are less likely to be combined into the whole embed than they currently are. The embed is already split into two parts to keep them from forming the whole, and this builds on that approach.

If this works, it also means that the to_non_volatile logic that is in the PR is not needed because the above is only about the comparison against the parts.

The difference between compare_memory_nooptimization and the above is that the former leaves out any possibility of the compiler optimizing the comparison (and therefore potentially reintroducing the issue as compilers get "smarter").
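
To make the overlapping-parts idea concrete, here is a minimal standalone sketch of the check from the diff above, using plain `std::string`. The macro names `PART_1`..`PART_3` and the function `is_placeholder` are hypothetical names chosen for this sketch; the three string values are the ones from the diff. Whether this layout actually stops the compiler from emitting a second copy of the full hash has not been verified in this thread.

```cpp
#include <cstddef>
#include <string>

// The 64-character placeholder split into three pieces (values from the diff above).
#define PART_1 "c3ab8ff13720e8ad9047dd39"
#define PART_2 "466b3c8974e592c2fa383d4a"
#define PART_3 "3960714caef0c4f2"

// Returns true if `binding` still holds the unpatched placeholder.
// hi_part and lo_part overlap on PART_2, so concatenating them does not
// reproduce the full placeholder.
bool is_placeholder(const std::string& binding)
{
    static const char hi_part[] = PART_1 PART_2; // first 48 characters
    static const char lo_part[] = PART_2 PART_3; // last 40 characters

    const std::size_t full_len = sizeof(PART_1 PART_2 PART_3) - 1; // 64, excluding the NUL
    const std::size_t hi_len = sizeof(hi_part) - 1;
    const std::size_t lo_len = sizeof(lo_part) - 1;

    return binding.size() >= full_len
        && binding.compare(0, hi_len, hi_part) == 0
        && binding.compare(full_len - lo_len, lo_len, lo_part) == 0;
}
```

A binding that was patched to a real DLL name fails the size check or one of the two comparisons, so only the untouched placeholder is rejected.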


     static const char hi_part[] = EMBED_HASH_HI_PART_UTF8;
     static const char lo_part[] = EMBED_HASH_LO_PART_UTF8;
 
-    if (!pal::clr_palstring(embed, app_dll))
Member Author

@omajid omajid Dec 12, 2024

Implementing pal::clr_palstring per-platform to handle volatile correctly looked much more involved than creating a non-volatile copy instead.

Strangely enough, even the non-volatile copy still needed the no-opt memory compare to stop creating duplicate instances of the embedded hash. I wonder if the compiler ignored volatile data when it started optimizing the non-volatile copy?

Member

@jkotas jkotas Dec 12, 2024

My guess is that the compiler can see all operations on the value and that the value does not escape, and thus volatile can be safely ignored.

It may help to move the value to the global scope and mark it as extern or extern "C". It should prevent the compiler from making assumptions about the value.

Member

@omajid Have you tried applying extern?

Member

@jkotas jkotas Jan 15, 2025

I suspect it won't help after thinking about it some more. The main problem that this PR is trying to solve is suppressing undesirable optimizations in whole program optimization mode. The compiler can see all uses of the symbol in this mode. extern won't prevent it from making this observation. We would have to artificially export the symbol from the binary to prevent the optimizations, but that is undesirable for other reasons.

+    char working_copy_embed[EMBED_MAX];
+    to_non_volatile(embed, working_copy_embed, EMBED_MAX);
+
+    if (!pal::clr_palstring(&working_copy_embed[0], app_dll))
     {
         trace::error(_X("The managed DLL bound to this executable could not be retrieved from the executable image."));
         return false;
     }
 
-    std::string binding(&embed[0]);
+    std::string binding(&working_copy_embed[0]);
 
     // Check if the path exceeds the max allowed size
     if (binding.size() > EMBED_MAX - 1) // -1 for null terminator
@@ -74,8 +95,8 @@ bool is_exe_enabled_for_execution(pal::string_t* app_dll)
     size_t hi_len = (sizeof(hi_part) / sizeof(hi_part[0])) - 1;
     size_t lo_len = (sizeof(lo_part) / sizeof(lo_part[0])) - 1;
     if (binding.size() >= (hi_len + lo_len)
-        && binding.compare(0, hi_len, &hi_part[0]) == 0
-        && binding.compare(hi_len, lo_len, &lo_part[0]) == 0)
+        && compare_memory_nooptimization(binding.c_str(), hi_part, hi_len)
+        && compare_memory_nooptimization(binding.substr(hi_len).c_str(), lo_part, lo_len))
     {
         trace::error(_X("This executable is not bound to a managed DLL to execute. The binding value is: '%s'"), app_dll->c_str());
         return false;