Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

enhancement: Increase the speed of pthread_mutex_lock / pthread_mutex_unlock #1342

Open
winspool opened this issue Oct 13, 2024 · 2 comments
Labels
CRTL C run-time library enhancement

Comments

@winspool
Copy link
Contributor

winspool commented Oct 13, 2024

I saw an article about the speed of pthread_mutex implementations ( https://justine.lol/mutex/ )
and tried the mentioned test program (high contended scenario) with OpenWatcom.
The test program uses one counter, 30 threads and 100000 iterations per thread.

Yes, that is just a test program, that makes an implementation detail visible,
but with the default thread count, the OpenWatcom compiled programm uses 100% cpu,
until i canceled it after ~5 minutes.

The test was created to compare the mutex implementation of the cosmopolitan libc (64 bit)
and a part of the mutex implementation is based on nsync: https://github.com/google/nsync
(nsync supports many more cpu`s, including x86, alpha, mips, ppc, ...)

When using all Hardware threads (12 on my system), the mutex implementation of cosmopolitan (64bit)
is more than twice as fast as glibc (64bit),
and the mutex in cosmopolitan is more than 100 times faster as the mutex in OpenWatcom (32bit).
OpenWatcom is still about 35 times slower as glibc (32bit)

12 threads (supported by my hardware): 
cosmopolitan libc(64bit): 16,2ms
glibc(64bit): 36,3ms / glibc(32bit): 47,0ms
OpenWatcom: 1682ms (1,68s)

When the program uses more threads as available in hardware,
the slowdown factor for the OpenWatcom compiled program increases dramatically to about
~ 7400 compared to cosmopolitan(64bit)
~ 2500 compared to glibc(32bit)

13 threads:
cosmopolitan libc(64bit): 17,3ms
glibc(64bit): 39,9ms / glibc(32bit): 50,4ms
OpenWatcom: 128819ms (128,8s)

Even when using only 4 threads, OpenWatcom is ~59 times slower than glibc(32bit)
With only one or two threads, OpenWatcom is >100 times slower as glibc(32bit)

I also retested with 25 threads:

cosmopolitan libc(64bit): 31,4ms
glibc(64bit): 77,6ms / glibc(32bit): 89,7ms
OpenWatcom: 694362ms (694s) (A second run took 705 seconds)

########

The worker code for the thread is simple, but really slow, when compiled with OpenWatcom.

pthread_mutex_t g_locker = PTHREAD_MUTEX_INITIALIZER;
int g_chores;

void *worker(void *arg) {
    for (int i = 0; i < ITERATIONS; ++i)
    {
        pthread_mutex_lock(&g_locker);
        ++g_chores;
        pthread_mutex_unlock(&g_locker);
    }
    return 0;
}

The Test program was adapted to work with OpenWatcom (rusage is not available):
pthread_mutex.c.txt

The implementation of pthread_mutex_lock is in ptmutex.c
The Author mentioned at the head of the file is J. Armstrong.
Is he still around?

pthread_mutex_lock uses sem_wait and sem_wait uses the futex syscall.

Is there something that can be changed, to handle some sync cases faster?

Is it possible to use some code or logic from other pthread_mutex_* or other semaphore implementations?

@winspool
Copy link
Contributor Author

Some links to Futex and CPU cache related documents:

Jens Gustedt "Futex based locks for C11’s generic atomics"

Ulrich Drepper's paper "Futexes Are Tricky"
Ulrich Drepper's paper "What Every Programmer Should Know About Memory"

@jmalak jmalak added enhancement CRTL C run-time library labels Oct 13, 2024
@ArmstrongJ
Copy link
Member

The pthread mutex code I implemented is a bit of a mess mostly because I did run into some issues with futexes. The issue isn't using futexes, it has something to do with the implementation of semaphores/futexes in OW. I remember it being a problem when I committed the code, but I don't remember the specifics. It should "work," but there is a time-consuming workaround if I'm not mistaken.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CRTL C run-time library enhancement
Projects
None yet
Development

No branches or pull requests

3 participants