Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make write_debug_image MSAN-safe #5691

Draft
wants to merge 2 commits into
base: main
Choose a base branch
from
Draft

Make write_debug_image MSAN-safe #5691

wants to merge 2 commits into from

Conversation

steven-johnson
Copy link
Contributor

If you build an msan-enabled test, you can end up with stack space allocated by the caller marked as "invalid" (e.g. if they allocate a decent buffer on the stack and only use a few bytes of it). If this test calls halide_debug_to_file(), the stack space used internally by that function can overlay the stack space from the caller (which is fine), but since our runtime code is never compiled with msan enabled, the writes to this stack space in our code never set the shadow bits... which is fine until a system call (like fwrite) is called, and then msan blows up.

The "solution" for this is to just mark the things handed to fwrite (etc) as explicitly initialized; note that everything calling this is going thru a stack-based buffer of some sort (we never use heap memory directly).

If you build an msan-enabled test, you can end up with stack space allocated by the caller marked as "invalid" (e.g. if they allocate a decent buffer on the stack and only use a few bytes of it). If this test calls halide_debug_to_file(), the stack space used internally by that function can overlay the stack space from the caller (which is fine), but since our runtime code is never compiled with msan enabled, the writes to this stack space in our code never set the shadow bits... which is fine until a system call (like fwrite) is called, and then msan blows up.

The "solution" for this is to just mark the things handed to fwrite (etc) as explicitly initialized; note that *everything* calling this is going thru a stack-based buffer of some sort (we never use heap memory directly).
@steven-johnson steven-johnson marked this pull request as ready for review February 2, 2021 00:02
void *const f;
ALWAYS_INLINE ScopedFile(void *user_context, const char *filename, const char *mode)
: user_context(user_context), f(fopen(filename, mode)) {
halide_msan_annotate_memory_is_initialized(user_context, &f, sizeof(f));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a comment explaining that args to a syscall that live on the stack must be annotated when msan is on. I guess the filename and mode always come from .rodata? Otherwise this code is baffling.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that's the problem. The filename and mode come from code with msan enabled, so they are annotated correctly. The problem here is that fopen wasn't built with msan, so the returned value looks uninitialized to msan.

void *const f;
ALWAYS_INLINE ScopedFile(void *user_context, const char *filename, const char *mode)
: user_context(user_context), f(fopen(filename, mode)) {
halide_msan_annotate_memory_is_initialized(user_context, &f, sizeof(f));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that's the problem. The filename and mode come from code with msan enabled, so they are annotated correctly. The problem here is that fopen wasn't built with msan, so the returned value looks uninitialized to msan.

}
ALWAYS_INLINE ~ScopedFile() {
if (f) {
fclose(f);
}
}
ALWAYS_INLINE bool write(const void *ptr, size_t bytes) {
halide_msan_annotate_memory_is_initialized(user_context, ptr, bytes);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this one necessary? We should annotate at the call site of this function when needed, not here. Writing uninitialized memory to a file does seem like it should in fact be an msan failure.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that the memory isn't uninitialized, but msan's interceptor for fwrite() thinks it is:

  • Calling app allocates a large stack buffer, only uses a small amount of it, so msan adds a 'shadow' record noting that most of this stack area is uninitialized
  • Calling app later calls into Halide and calls this function
  • This function always copies from heap memory into a 4k stack buffer for I/O purposes (so that it can ignore variations in strides)
  • If we get (un)lucky, that stack buffer overlaps with the area of the buffer from step Bounds checking on input images #1; since our runtime isn't compiled with MSAN enabled, the 'shadow' records aren't updated
  • Call into fwrite(), which is intercepted by msan, sees that there is a shadow record for this memory range, sees it has uninitialized bytes, and goes boom

That said... this is a gigantic hack and I probably should rethink this; it assumes/relies on the fact that everything we pass to fwrite() goes through a stack allocation of some sort (which is true) but doesn't actually enforce it (so code changes could easily break this).

Alternate approaches that might make sense:

  • Move all buffering for fwrite() into the ScopedFile class, so we can have ironclad assurance that the memory is something safe to mark properly. (Less hacky code, but would still let us write potentially-random bytes to file)
  • Do msan-specific compiles for all runtime modules for msan, so that our runtime code isn't a special flower. (Downside is that our embedded bitcode ~doubles in size for a rarely-used case, and potentially dramatically slows down runtime code in this case too, though expecting decent performance in an MSAN run is not a thing in the first place)
  • Just declare that this function can't be used in conjunction with MSAN (and add some checks to ensure that the failure is reliable)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe a simple solution would be to use __msan_check_mem_is_initialized to check the memory before we buffer it, and then annotate it as initialized after we buffer it?

@steven-johnson steven-johnson marked this pull request as draft February 11, 2021 17:36
@steven-johnson
Copy link
Contributor Author

Converting this to draft until I have time to circlebackto it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants