Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use a better strategy on Windows for main thread detection #4006

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

notgull
Copy link
Member

@notgull notgull commented Nov 23, 2024

We can get a list of the threads in the process, and determine which
thread came first. This thread will be the one who called the main
function. So use this instead of the current strategy if it's available.

This aims to resolve #3999. @PJB3005 would you be able to test this in the dylib configuration?

  • Tested on all platforms changed
  • Added an entry to the changelog module if knowledge of this change could be valuable to users
  • Updated documentation to reflect any user-facing changes, including notes of platform-specific behavior
  • Created or updated an example program if it would help users understand this functionality
  • Updated feature matrix, if new features were added or implemented

Copy link
Member

@madsmtm madsmtm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO all this is all a bit much for just a debug check to make developing on other platforms easier...

Why, again, isn't this just a warning that can be turned off?

@notgull
Copy link
Member Author

notgull commented Nov 23, 2024

Why, again, isn't this just a warning that can be turned off?

It's to make platforms consist. Ideally every platform would act the exact same way to prevent differences in platforms from becoming too large. Not to mention, it can be turned off via the any_thread method.

Comment on lines 651 to 665
let mut slot = mem::MaybeUninit::uninit();
let result = if self.first_entry {
self.first_entry = false;
unsafe { toolhelp::Thread32First(self.handle.as_raw_handle() as _, slot.as_mut_ptr()) }
} else {
unsafe { toolhelp::Thread32Next(self.handle.as_raw_handle() as _, slot.as_mut_ptr()) }
};
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to initialize the dwSize field of THREADENTRY32, right now this call just errors and no threads get iterated.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed!

@PJB3005
Copy link

PJB3005 commented Nov 23, 2024

Yeah this seems to work and is probably watertight enough. Though I have to point out it's possible for it to false negative if you terminate the real main thread by doing something like calling ExitThread() manually on it, then it might think another thread is the main thread. (https://devblogs.microsoft.com/oldnewthing/20100827-00/?p=13023)

I do still want to point out though that this really should be put behind cfg(debug_assertions). This is a lot of code to be running for a developer debug assert and there is no good reason to check any of this on release builds. Best case scenario you save some modding community or ~user with weird technical issue~ 5 years in the future.

fn new(process_id: u32) -> Result<Self, EventLoopError> {
// Take a snapshot.
let handle =
unsafe { toolhelp::CreateToolhelp32Snapshot(toolhelp::TH32CS_SNAPTHREAD, process_id) };
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can pass 0 to CreateToolhelp32Snapshot, no need to pass the current process ID.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We use the process ID in other places, so I figure why not.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the docs:

This parameter can be zero to indicate the current process.

So it's not really necessary.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I figure it's slightly clearer to whoever is reading who may not be familiar with this API.

@notgull
Copy link
Member Author

notgull commented Nov 23, 2024

Though I have to point out it's possible for it to false negative if you terminate the real main thread by doing something like calling ExitThread() manually on it, then it might think another thread is the main thread.

This is weird enough and requires enough unsafe code that I'm fine if someone gets around it like this.

I do still want to point out though that this really should be put behind cfg(debug_assertions). This is a lot of code to be running for a developer debug assert and there is no good reason to check any of this on release builds. Best case scenario you save some modding community or user with weird technical issue 5 years in the future.

Like I said, all platforms should have uniform behavior. Differences between platforms should be considered a bug, save for features not being supported.

We can get a list of the threads in the process, and determine which
thread came first. This thread will be the one who called the main
function. So use this instead of the current strategy if it's available.

Signed-off-by: John Nunley <dev@notgull.net>
@PJB3005
Copy link

PJB3005 commented Nov 23, 2024

Like I said, all platforms should have uniform behavior. Differences between platforms should be considered a bug, save for features not being supported.

I understand that, however the only reason "consistency between platforms" is even being enforced here is for development reasons.

There is no situation where library consumers ever want this code to be running on a release build. Best case scenario the bomb never goes off. Worst case scenario there there is another scenario that can break our assumption, and it blows up in somebody's face on a shipped game 5 years in the future. Maybe that's triggered by some compat layer like Wine. Maybe some modder is trying to do something funny. Regardless, this check serves no positive purpose once the application leaves the developers' machine, and that is best ensured by removing it.

As a developer I myself would not like this kind of code to be present when I am shipping a game. There is no good reason to have this risk factor in a production-grade library. "You can turn it off with any_thread" sure, but how many developers are aware of this? The fact is that devs trust libraries like this to not put landmines in their application in the first place.

Think of this with a thought experiment:

  • Would you have agreed to put it behind cfg(debug_assertion) if we were not able to improve the detection over the original method, which we know is flawed in practice.
  • Can we prove that this current check is watertight and will never malfunction in the future? In 5 years? 10? 15?

By removing this you're reducing moving parts, you're removing unnecessary work (seriously, why would my game, on startup on any person's PC, seriously need to iterate all threads in the OS?), you're just reducing the chance of things going wrong.

Copy link

@PJB3005 PJB3005 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also think keeping the main_thread_id_via_crt at this point is pointless. I assume we can envision no scenario in which main_thread_id_via_snapshot will ever fail, but we do know scenarios in which main_thread_id_via_crt fails. Because of this we should at least remove it, so that if main_thread_id_via_snapshot fails we do not end back up with the original issue unaddressed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
3 participants