Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

threadsafe: Load preferences threadsafe #453

Draft
wants to merge 3 commits into
base: main
Choose a base branch
from
Draft

Conversation

jdemel
Copy link
Contributor

@jdemel jdemel commented Feb 21, 2021

Instead of static variables in a function, we store preferences in a struct and use an atomic_int to prevent any more than one thread from loading preferences. This is my first try at implementing anything thread-safe in C. Of course I'd like feedback on that.

Every VOLK kernel calls some init logic on first call. After that logic is executed the appropriate function pointer is overwritten and the correct kernel implementation is called on all subsequent calls.

@bastibl : I tried to make this threadsafe. What do you think?

@dernasherbrezon : Does that fix the issue you opened?

Fixes #440

@jdemel
Copy link
Contributor Author

jdemel commented Feb 21, 2021

The windows build failed. I was too optimistic about C11 support in MSVC in 2021... stdatomic.h is not available.

@dernasherbrezon
Copy link
Contributor

Yes, looks good. I'm mostly concerned about thread-safeness of initialisation. Current code (which init lazily) has issues with that, because any of volk_xxx method can be executed on different threads, thus causing concurrency issues on the global variable. But having separate init/deinit methods should solve this. I would expect some app init volk on startup/the main thread and call volk_xxx on separate threads. So preferences while initialised won't change. And immutable structures don't need any thread-safe checks.

@bastibl
Copy link
Member

bastibl commented Feb 22, 2021

I only had a brief look, but it seems that you don't check that initialization is complete but only check if someone started to initialize the global data structure.
In free, you also just read the atomic variable. Shouldn't this be an atomic set to make sure that only one thread frees?

@jdemel
Copy link
Contributor Author

jdemel commented Feb 22, 2021

In free, you also just read the atomic variable. Shouldn't this be an atomic set to make sure that only one thread frees?

Actually, I tried to use threads.h and I wanted to use a mutex. Though, I don't know where to initialize the mutex. Also, I would want to use a scoped_lock but I assume that's a C++ only thing. I'd appreciate hints where and how to handle the mutex init.

I assume neither C atomic nor C threads build with MSVC. In that case I'd just replace this with an int flag (aka the current behavior).

I'm mostly concerned about thread-safeness of initialisation. Current code (which init lazily) has issues with that, because any of volk_xxx method can be executed on different threads, thus causing concurrency issues on the global variable. But having separate init/deinit methods should solve this.

We have separate init/deinit code for every kernel. Only the arch prefs part is global. We load VOLK preferences only once from file. I assume we should keep it that way.

I would expect some app init volk on startup/the main thread and call volk_xxx on separate threads. So preferences while initialised won't change.

Now, and also previously, we'd be able to call the appropriate function to load preferences manually or automatically in the background. It is a convenience thing to let VOLK do this as soon as it is required.

And immutable structures don't need any thread-safe checks.

Well, basically, I just changed that. Preferences are mutable now. We can free them. Besides the values should never change no matter how often we load the preferences. At least that would violate our assumptions on how to use these.

lib/volk_prefs.c Outdated

void volk_initialize_preferences()
{
if (!atomic_fetch_and(&volk_preferences.initialized, 1)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be honest, I am not very well versed the usage of atomic variables, but I do believe that this code is flawed:

  • Being a static variable with no initializer provided, volk_preferences will be zero initialized. This means that volk_preferences.initialized will stay 0 for ever, i.e. the preferences will be loaded upon every call.
  • Other threads must be prevented from loading prefences while some thread loads them and they must be notified when the load is completed. Two states are not sufficient to implement this, three are required (as far as I can tell):
    • state 0: Not initialized so far. Any thread observing this state changes state to 1. Note that the state switch – load of state, comparison to 0 and a possible modification of state – must be performed atomically => atomic compare exchange.
    • state 1: Some thread is loading the configuration. Any other threads must wait until this operation is finished. The loading thread signals this by setting state to 2.
    • state 2: Configration is loaded.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

volk_preferences.initialized should be set to 1 with atomic_fetch_and(&volk_preferences.initialized, 1). It's not a constant after all.

I totally agree with your stages. I'd like to add a mutex here but unlike volk_preferences.initialized, default initialization is insufficient here. And I don't know how to initialize a mutex without user interaction.

lib/volk_prefs.c Outdated
Comment on lines 85 to 131
void volk_free_preferences()
{
if (volk_preferences.initialized) {
free(volk_preferences.volk_arch_prefs);
volk_preferences.n_arch_prefs = 0;
volk_preferences.initialized = 0;
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not thread safe if two threads call the function concurrently: Thread 1 can observe initialized being true and consequently enter cleanup, while Thread 2 is already in the process of cleaning up (i.e. Thread 2 is somewhere between “comparison just finished” and “just before resetting initialized” at the time point Thread 1 performs the comparison). This issue could be “circumvented” by stating that this function is not threadsafe (i.e. its up to the used to do the right thing).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Totally agree. If we figure out how to add a mutex to volk_preferences I'd just use this mutex here as well.

@rear1019
Copy link
Contributor

Fixes #440

We should distinguish between the two different issues (albeit their resolution is somewhat linked):

  1. Memory leak in volk_load_preferences #440 is about a “memory leak” which we shouldn’t bother fixing (see my comments in Memory leak in volk_load_preferences #440).
  2. This pull request is about making the loading of preferences thread safe. As noted by @bastibl in Memory leak in volk_load_preferences #440 that’s not the case right now. This could potentially lead to “real” memory leaks as far as I can tell. Then again, leaking some KiB in the context of GNU Radio is a drop in the bucket…

As far as point (2) is concerned, we could do one the following:

  • Use atomic variables. Likely hard to get right (see my comments of code).
  • Use a mutex. Likely easier than using atomic variables. Problem: Mutex initialization requires user intervention (i.e. an initialization function for Volk).
  • Use thread local storage (each thread will have an own copy of loaded preferences).
  • Use call_once() [1] to guard the call of volk_load_preferences(). Requires support by platform (you surely know which one I am looking at).
  • Remove static specifiers in volk_load_prefences() (and release the allocated memory before the function returns). Will work on any platform and very easy to implement 😉.
    static volk_arch_pref_t* volk_arch_prefs;
    static size_t n_arch_prefs = 0;

[1] https://en.cppreference.com/w/c/thread/call_once

@bastibl
Copy link
Member

bastibl commented Feb 25, 2021

Since you do not want to introduce library init calls, the only option is probably thread-local storage. You do not want to acquire mutexes or read atomic variables on the fast-path, i.e., during every call to a kernel, which might void the performance gain of Volk.

@jdemel
Copy link
Contributor Author

jdemel commented Feb 25, 2021

Since you do not want to introduce library init calls, the only option is probably thread-local storage. You do not want to acquire mutexes or read atomic variables on the fast-path, i.e., during every call to a kernel, which might void the performance gain of Volk.

The code I changed here will only be executed on the very first call to a VOLK kernel.

${kern.name} = &__${kern.name}_d;

This line makes all the difference. Before you call a kernel the first time, the function pointer points to the init function. At the end of the init function, this pointer is overwritten according to the values gathered in the init function. Afterwards, this kernel will never touch any init code again.
My PR here changes how the preferences file is loaded. This should only happen once for all kernels.
The current behavior is an integer flag that is supposed to prevent reload. Though, this is not thread-safe. I started to work on a thread-safe implementation that also enables to deinit this structure.

@jdemel
Copy link
Contributor Author

jdemel commented Feb 25, 2021

Fixes #440

We should distinguish between the two different issues (albeit their resolution is somewhat linked):

1. #440 is about a “memory leak” which we shouldn’t bother fixing (see my comments in #440).

I'd consider this PR a solution for this leak. You may deinit your memory if you need to. Although, this is not a common use case.

2. This pull request is about making the loading of preferences thread safe. As noted by @bastibl in #440 that’s not the case right now. This could potentially lead to “real” memory leaks as far as I can tell. Then again, leaking some KiB in the context of GNU Radio is a drop in the bucket…

The thread-safe part is what I want to figure out. Maybe there isn't a good solution. Thread-local storage seems to be quite heavy for every kernel.
At least in my experience GNU Radio is rather light on memory considering the amount of data you may pass through. Also, I don't want to consume memory if that's not strictly necessary.

As far as point (2) is concerned, we could do one the following:

* Use atomic variables. Likely hard to get right (see my comments of code).

* Use a mutex. Likely easier than using atomic variables. Problem: Mutex initialization requires user intervention (i.e. an initialization function for Volk).

* Use thread local storage (each thread will have an own copy of loaded preferences).

* Use `call_once()` [1] to guard the call of `volk_load_preferences()`. Requires support by platform (you surely know which one I am looking at).

I had a look at it. This would be a good solution if we skip the "deinit" part of my solution. Otherwise, we'd need a way to re-enable to load preferences.
I'm aware that a specific platform lacks support for the C standard. In this case, we might just go with the current int solution and add a note that issues regarding this behavior need to be redirected to the specific platform developer.

* Remove `static` specifiers in `volk_load_prefences()` (and release the allocated memory before the function returns). Will work on any platform and very easy to implement wink.
  https://github.com/gnuradio/volk/blob/797b0ac846858d081fbb53ed50e98765ec9cb6b2/lib/volk_rank_archs.c#L56-L57

This would get us close to a thread-local storage solution. Individual first kernel calls might take longer but this should not be a performance issue.

We might be able to useatomic_exchange to or some variation thereof. We test if the structure points to NULL. That's just a random thought for now.

[1] https://en.cppreference.com/w/c/thread/call_once

@jdemel
Copy link
Contributor Author

jdemel commented Mar 4, 2021

I think I was able to resolve all those thread safety concern and add the option to release the struct.
Though, I use threads.h to get a mutex. This seems to be an optional part of C that is not widely available. I didn't have any issues with GCC 9 on my machine but I assume basically all tests in our CI work without it because I saw issues that the GCC10 test could not find threads.h.
Also, there's still an issue with MSVC

#ifndef __STDC_NO_THREADS__

should fix everything but it doesn't. MSVC seems to leave this flag undefined and doesn't provide threads.h.

@rear1019
Copy link
Contributor

rear1019 commented Mar 5, 2021

Thread-local storage seems to be quite heavy for every kernel.

The storage is per thread, not per kernel. On the other hand using thread-local storage is not a good idea if you want to fix the “bug” from #440 – with TLS the deallocation must be performed per thread.

Though, I use threads.h to get a mutex. This seems to be an optional part of C that is not widely available.

Indeed. As per [1] it’s only optional and requires C library support (as opposed to compiler support). Support in glibc is only available since version 2.28, released in 2018-08-01 [1][2]. No support in MSVC yet [3] (search for “What’s not”).

Also, there's still an issue with MSVC

#ifndef __STDC_NO_THREADS__

should fix everything but it doesn't. MSVC seems to leave this flag undefined and doesn't provide threads.h.

Mh, maybe a check for the C version is required, i.e. #if __STDC_VERSION__ >= 201112L && ! __STDC_NO_THREADS__? I mean, one can not expect a compiler to be aware of any future feature.

Given the trouble involved with using C11’s <threads.h>, you should consider removing the static keyword from

static volk_arch_pref_t* volk_arch_prefs;
static size_t n_arch_prefs = 0;

This will work on any platfrom, is thread safe and will fix #440 (just don’t forget to add a free()). The only downside is that the configuration will be parsed upon every very first call of a kernel.

[1] https://gcc.gnu.org/wiki/C11Status
[2] https://sourceware.org/legacy-ml/libc-alpha/2018-08/msg00003.html
[3] https://devblogs.microsoft.com/cppblog/c11-and-c17-standard-support-arriving-in-msvc/

Copy link
Contributor

@rear1019 rear1019 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no need to mess around with a mutex at all. Just guard the call of volk_load_preferences() in volk_rank_arch.c with call_once(), i.e.

call_once(&flag, volk_load_preferences);

Make volk_load_preferences() save its results into static variables defined at file scope (i.e. outside of any function) instead of returning a value respectively writing into a passed argument.

EDIT: See [1]

[1] https://man7.org/linux/man-pages/man3/pthread_once.3p.html#RATIONALE

@jdemel
Copy link
Contributor Author

jdemel commented Mar 6, 2021

We start our CMakeLists.txt with

set(CMAKE_C_STANDARD 11)

I'd expect that this makes compilers without C11 standard available fail. I stumbled over a code section in VOLK that checks for C99 availability. I'd like to not have any such preprocessor checks but rely on the initial CMake setting and expect it to fail in case a compiler does not support C11. Am I writing up a wishlist here?

@rear1019 I hope I did remove static. It's part of this PR.

static volk_arch_pref_t* volk_arch_prefs;
static size_t n_arch_prefs = 0;

Do you suggest to reduce this PR to just removing static and be done with it?

My idea here is to make sure the mutex is initialized with:

 call_once(&mutex_init_once_flag, init_struct_mutex);

Afterwards I we can rely on that mutex to ensure thread safety AND load/free the configuration as often as we want. If we only had a call_once guard to load our configuration once, we'd better not allow deinit.

From you suggestions I see 2 routes to simplify this PR

  1. Reload config whenever a new kernel is first used per thread. Fixes thread safety. Removes permanently alloc'd memory.
  2. Load config once behind a call_once and return references to the initialized structure. In this case VOLK requires some memory permanently. This should amount to ~50k byte.

Still, in case threads.h is not available we'd probably just fall back to the current implementation.

@bastibl
Copy link
Member

bastibl commented Mar 8, 2021

This line makes all the difference. Before you call a kernel the first time, the function pointer points to the init function. At the end of the init function, this pointer is overwritten according to the values gathered in the init function. Afterwards, this kernel will never touch any init code again.

I only have a rough understanding, but is this assignment to the global variable (i.e., the global function pointer) synchronized? My understanding is that it's not, which is why I thought init/deinit would make sense (since you don't want to do something expensive on the fast-path). Maybe it can also work with TLS, but I think that the current approach doesn't work, since it doesn't protect the global function pointer. Or did I overlook something?

@jdemel
Copy link
Contributor Author

jdemel commented Mar 9, 2021

I only have a rough understanding, but is this assignment to the global variable (i.e., the global function pointer) synchronized?
No, this assignment is unsynchronized. It follows an "ancient" GR tradition to assume that anything int'ish works in a multi-threaded environment without sync.
I assume the worst that could happen is another call to the init function. Either the function pointer is updated or not. I assume that CPUs don't write 32/64bit values in more than one instruction. Thus, one does never read gibberish from that variable.

My understanding is that it's not, which is why I thought init/deinit would make sense (since you don't want to do something expensive on the fast-path). Maybe it can also work with TLS, but I think that the current approach doesn't work, since it doesn't protect the global function pointer. Or did I overlook something?
So, you suggest to:

  1. use thread local storage for init and function pointers, or
  2. sync that write to the global function pointer.
  3. Synchronize all the read/write init stuff.
    All options have their own pros and cons. I'd suggest to discuss the function pointer part in a different Issue/PR.

@bastibl
Copy link
Member

bastibl commented Mar 9, 2021

The point is that "sync that write to the global function pointer", which is what your current approach would require is a bad idea, since it would introduce synchronization primitives on the fast path. Anyway, I give up on this issue :-)

@rear1019
Copy link
Contributor

My idea here is to make sure the mutex is initialized with:

call_once(&mutex_init_once_flag, init_struct_mutex);

Afterwards I we can rely on that mutex to ensure thread safety AND load/free the configuration as often as we want. If we only had a call_once guard to load our configuration once, we'd better not allow deinit.

My bad, I overlooked that deinit is thread safe as well. However, see point (2) below for rationale.

Do you suggest to reduce this PR to just removing static and be done with it?

Yes, see point (1) below.

From you suggestions I see 2 routes to simplify this PR

  1. Reload config whenever a new kernel is first used per thread. Fixes thread safety. Removes permanently alloc'd memory.
  2. Load config once behind a call_once and return references to the initialized structure. In this case VOLK requires some memory permanently. This should amount to ~50k byte.
  1. Other benefits of just removing static: Works an any platform and fixes the “memory leak” from Memory leak in volk_load_preferences #440 without any user intervention.
  2. I still believe that we should not bother to fix Memory leak in volk_load_preferences #440. If you don’t want to take the perfomance hit of (1) (probably unrelevant anyway), take this route (guard configuration loading without introducing any user visible init/deinit functions). Then again, (1) is a much simpler solution.

@jdemel
Copy link
Contributor Author

jdemel commented Jan 14, 2023

MSVC may receive atomic support soon-ish.
In case MSVC supports threads as well, this PR would be interesting again.
As a side note: the most exciting feature that we'd wait for is complex.h support in MSVC.

@jdemel jdemel marked this pull request as draft January 14, 2023 14:56
Instead of static variables in a function, we store preferences in a
struct and use an `atomic_int` to prevent any more than one thread from
loading preferences.

Fixes gnuradio#440

Signed-off-by: Johannes Demel <demel@uni-bremen.de>
We initialize a mutex with `call_once` and then use this mutex to
protect the init and deinit portion of our struct handlers.

Signed-off-by: Johannes Demel <demel@uni-bremen.de>
Signed-off-by: Johannes Demel <demel@uni-bremen.de>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Memory leak in volk_load_preferences
4 participants