You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
A procedurally generated open world RPG was the original goal -- who knows what the final form will take. Originally, I was simply opening a port using UPNP class; since it sleeps the thread until it finishes.. got a warning that said I "had to call wait_to_finish", and I dug deeper...
Describe the problem or limitation you are having in your project
Encountered a bug in a warning in Thread; seen C++ standard library threads being used and wept for Windows user's performance and "black box voodoo" nature of C++ library since it's different on each system.
Describe the feature / enhancement and how it helps to overcome the problem or limitation
Win32 threads have multiple ways to create them; and only one way is technically valid thanks to a bug. Who knows what the C++ library is going to use? _beginthreadex is the only safe version. CreateThread and others can overwrite static memory internally from a second call to CreateThread, thus returning invalid handles. We can hope and pray that the standard library is thoroughly hardened and tested for this -- or we can write it ourselves -- which we should do anyway because of the mutex potential speedup.
Win32 mutex is implemented at the kernel level. Every time a lock/unlock happens, a jump to ring level 0 happens, and this time is quite significant. This is most likely what std::mutex will use. Win32 has CriticalSection which acts very much like a mutex, however it is a spin lock. There's now a great chance of not needing to enter the kernel ring, thus saving huge amounts of time in naively written code, and worst case is as slow as a Win32 mutex.
I did not have to write an example (thank you, StackOverflow)
... for 1,000,000 uncontended acquires and releases, a mutex takes over one second. A critical section takes ~50 ms for 1,000,000 acquires. ...
this is a >20x speedup -- and with Juan's recent post about AAA-readiness, it makes a lot of sense to me to take advantage of speed where able, even if it's not likely to be used much by hobbyists.
Further, I've been informed that it previously was written in an OS-specific manner before, however it was decided that it would be better to trust the black-box that is the standard C++ library for this, despite it also not offering a way to cancel/kill a thread, and despite the likely speed concerns. It seems quite backwards to throw out working code just to rewrite it using something that ends up being a more poor fit for the job, and if this proposal is accepted, it will be a third time.
Describe how your proposal will work, with code, pseudo-code, mock-ups, and/or diagrams
Most platforms support pthreads. Use them everywhere except Windows. Windows: use _beginthreadex for threading, Win32 CriticalSection for mutex, Win32 Semaphore for semaphore.
#defineWIN32_LEAN_AND_MEAN1
#include<Windows.h>
#include<process.h>
#include<synchapi.h>
#include<cstdint>//declaration (headers)//os.hpptypedefvoid* NativeHandle;
structThread {
typedefvoid* (*Func)(void*);
NativeHandle handle;
Func func;
void *data;
Thread();
~Thread();
boolcan_start() const;
boolstart(Func, void*);
boolhas_started() const;
boolcan_join() const;
booltry_join();
void* join();
boolcan_detach() const;
voiddetach();
boolhas_detached() const;
boolcan_cancel() const;
voidcancel();
};
structMutex {
NativeHandle handle;
Mutex();
~Mutex();
voidlock();
booltry_lock();
voidunlock();
};
structSemaphore {
NativeHandle handle;
Semaphore();
~Semaphore();
voidpost();
voidwait();
booltry_wait();
};
//implementation (source) follows//win32/thread.cppstaticuint32_twin32_thread_begin(void *obj) {
Thread *t = (Thread*)obj;
t->data = t->func(t->data);
return0;
}
Thread::Thread(): handle(nullptr), func(nullptr), data(nullptr) { }
Thread::~Thread() {
if(has_started()) {
//detach is what we should do; if the app is closing, the OS will//terminate threads still running -- this is unsafe for us to do//ourselves heredetach();
}
}
boolThread::can_start() const {
return handle == nullptr;
}
boolThread::start(Thread::Func fn, void *data) {
if(!can_start()) {
returnfalse;
}
//set data in thread objectthis->func = fn;
this->data = data;
//spawn thread
handle = (NativeHandle)_beginthreadex(
nullptr, //default security4096, //stack size TODO: customizable?
win32_thread_begin,
(void*)this,
0, //initflagnullptr//thrdaddr
);
//if handle is 0, it failedif((size_t)handle == 0) {
//TODO: Error; sets errno and _doserrno//EAGAIN too many threads, EINVAL if stack size incorrect -- arg invalid//EACCES insufficient resources (memory)returnfalse;
}
returntrue;
}
boolThread::has_started() const {
return handle != nullptr;
}
boolThread::can_join() const {
returnhas_started()
&& (WaitForSingleObject((HANDLE)handle, 0) == WAIT_OBJECT_0);
}
boolThread::try_join() {
if(can_join()) {
join();
returntrue;
}
returnfalse;
}
void* Thread::join() {
DWORD r = WaitForSingleObject((HANDLE)handle, INFINITE);
switch(r) {
case WAIT_OBJECT_0:
return data;
}
returnnullptr;
}
boolThread::can_detach() const {
returnhas_started();
}
voidThread::detach() {
if(!can_detach()) return;
//win32 detach means closing the handle; does NOT cancel a thread!CloseHandle((HANDLE)handle);
//VITAL: Set handle to nullptr so can_detach, can_join fail on this thread//object; does handle need to be atomic?
handle = nullptr;
}
boolThread::has_detached() const {
return !has_started();
}
boolThread::can_cancel() const {
returnhas_started();
}
voidThread::cancel() {
if(!can_cancel()) {
return;
}
BOOL b = TerminateThread((HANDLE)handle, 0);
detach();
}
//win32/mutex.cppMutex::Mutex(): handle(nullptr) {
//allocate memory for CRITICAL_SECTION//TODO: should this be in a pool tagged os?void *mem = std::malloc(sizeof(CRITICAL_SECTION));
if(mem == nullptr) {
//TODO: use our own logging systemstd::perror("Mutex::Mutex()");
return;
}
//initialize CRITICAL_SECTIONInitializeCriticalSection((LPCRITICAL_SECTION)mem);
//set handle to memory ptr
handle = (NativeHandle)mem;
}
Mutex::~Mutex() {
if(handle != nullptr) {
DeleteCriticalSection((LPCRITICAL_SECTION)handle);
std::free(handle);
handle = nullptr;
}
}
voidMutex::lock() {
EnterCriticalSection((LPCRITICAL_SECTION)handle);
}
boolMutex::try_lock() {
//TryEnterCriticalSection returns zero when another thread owns itreturnTryEnterCriticalSection((LPCRITICAL_SECTION)handle) != 0;
}
voidMutex::unlock() {
LeaveCriticalSection((LPCRITICAL_SECTION)handle);
}
//win32/semaphore.cppSemaphore::Semaphore(): handle(nullptr) {
handle = (NativeHandle)CreateSemaphoreExW(
nullptr, //default security attributes0, //initial count2147483647, //32 bit long maxnullptr, //LPCWSTR name -- allows IPC0,
DELETE | SYNCHRONIZE | SEMAPHORE_MODIFY_STATE
);
}
Semaphore::~Semaphore() {
CloseHandle((HANDLE)handle);
}
voidSemaphore::post() {
ReleaseSemaphore((HANDLE)handle, 1, nullptr);
}
voidSemaphore::wait() {
WaitForSingleObject((HANDLE)handle, INFINITE);
}
boolSemaphore::try_wait() {
returnWaitForSingleObject((HANDLE)handle, 0) == 0;
}
This is pulled from my own hobby engine's core library as it currently stands and smashed into one "file" here that should compile and be useful for any win32 implementation. All rights to my code here provided to any Godot devs that are interested in tackling this.
If this enhancement will not be used often, can it be worked around with a few lines of script?
As this is a core performance related proposal, no it cannot be worked around.
Is there a reason why this should be core and not an add-on in the asset library?
This is a proposal for modification to a core component.
The text was updated successfully, but these errors were encountered:
Regarding performance, can you provide an example of a project that ended up with performance issues because of the Windows implementation of std::mutex, or performance measurement of the Godot editor that show how much time is wasted locking/unlocking during, say, importing, or some other real-world case?
Regarding C++ threads not being killable, that's OK. You don't want to kill them anyway, but to let them finish and perform proper cleanup.
Also, as I already told, Godot has its own spin lock implementation, used in performance-critical areas.
I can share some of your point about being a bit uneasy about how the C++ library is implementing thread and sync classes, but so far it has worked perfectly.
Considering during a release candidate, a threading lib was added? Turns out this still needed done. I don't understand why I bother trying to contribute, I really don't.
Threads need to be killable. Ever hear of a cancel button on a long operation?
Describe the project you are working on
A procedurally generated open world RPG was the original goal -- who knows what the final form will take. Originally, I was simply opening a port using UPNP class; since it sleeps the thread until it finishes.. got a warning that said I "had to call wait_to_finish", and I dug deeper...
Describe the problem or limitation you are having in your project
Encountered a bug in a warning in Thread; seen C++ standard library threads being used and wept for Windows user's performance and "black box voodoo" nature of C++ library since it's different on each system.
Describe the feature / enhancement and how it helps to overcome the problem or limitation
Win32 threads have multiple ways to create them; and only one way is technically valid thanks to a bug. Who knows what the C++ library is going to use? _beginthreadex is the only safe version. CreateThread and others can overwrite static memory internally from a second call to CreateThread, thus returning invalid handles. We can hope and pray that the standard library is thoroughly hardened and tested for this -- or we can write it ourselves -- which we should do anyway because of the mutex potential speedup.
Win32 mutex is implemented at the kernel level. Every time a lock/unlock happens, a jump to ring level 0 happens, and this time is quite significant. This is most likely what std::mutex will use. Win32 has CriticalSection which acts very much like a mutex, however it is a spin lock. There's now a great chance of not needing to enter the kernel ring, thus saving huge amounts of time in naively written code, and worst case is as slow as a Win32 mutex.
I did not have to write an example (thank you, StackOverflow)
this is a >20x speedup -- and with Juan's recent post about AAA-readiness, it makes a lot of sense to me to take advantage of speed where able, even if it's not likely to be used much by hobbyists.
Further, I've been informed that it previously was written in an OS-specific manner before, however it was decided that it would be better to trust the black-box that is the standard C++ library for this, despite it also not offering a way to cancel/kill a thread, and despite the likely speed concerns. It seems quite backwards to throw out working code just to rewrite it using something that ends up being a more poor fit for the job, and if this proposal is accepted, it will be a third time.
Describe how your proposal will work, with code, pseudo-code, mock-ups, and/or diagrams
Most platforms support pthreads. Use them everywhere except Windows. Windows: use _beginthreadex for threading, Win32 CriticalSection for mutex, Win32 Semaphore for semaphore.
This is pulled from my own hobby engine's core library as it currently stands and smashed into one "file" here that should compile and be useful for any win32 implementation. All rights to my code here provided to any Godot devs that are interested in tackling this.
If this enhancement will not be used often, can it be worked around with a few lines of script?
As this is a core performance related proposal, no it cannot be worked around.
Is there a reason why this should be core and not an add-on in the asset library?
This is a proposal for modification to a core component.
The text was updated successfully, but these errors were encountered: