Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Crashes occurring on multiple objects when changing fftsettings #386

Open
MattS6464 opened this issue Aug 10, 2023 · 54 comments
Open

Crashes occurring on multiple objects when changing fftsettings #386

MattS6464 opened this issue Aug 10, 2023 · 54 comments

Comments

@MattS6464
Copy link
Contributor

MattS6464 commented Aug 10, 2023

Crashes occur in the help files of fluid.melbands~, fluid.sinefeature~ and fluid.mfcc~. To recreate the crash, enter the help file for any of these objects, turn on audio, then rapidly increase the third parameter of fftsettings. This is occurring on a debut/test build.

@MattS6464 MattS6464 changed the title Crashes occurring on ,u Crashes occurring on multiple objects when changing fftsettings Aug 10, 2023
@tremblap
Copy link
Member

tremblap commented Aug 10, 2023

I got it to crash with 1) audio on and 2) wiggle the FFT size. it is not all the time:


----------begin_max5_patcher----------
446.3ocuTEsaBCBE841uBBOtzYJXsV2S6+XYwfsnCSk1.TmNi9sO3Rq5VLpw
47k1bObgy8btWXSX.dR0JtFidA8FJHXSXP..4.BZiCvKXqxKYZHMrj+Y0j43
H+RF9JC.+zNTbGXMyj+gPNarhma7mMIMsWbDhFm490OFBn8hQu2tGYyhpFSI
2.rPZQ8Pl00b+wf0hYRVId+1DE.61J54g3CGkP1cRTG11vP2mn+lDyWmWx2g
HT54D5.pSaDRh6WqpuuBM4+VnSKaDE8zlolZal6PuNcpQaqNqR0nADJ5YBJi
Lhd49MIcDX.jK4CzK3CQH7I8BxI8BxM3ELiQ0HNijnIi7cVBnkrQ2Zq8jB4H
yTwVvMb0XtjMoD1Tb6ZEBccIa8hpB.teKrqz8MtC8oGuuj.9BYP7c0W5eM9x
dCvV4NOP7E+h5GHCWJj+94OPhN7eZJ5pFUdWo1VYnCZrfqs9NyHpjGkCwmyI
s8qkGxUvyv6.OIOH8PuId7sKVc8RtR2lLPgsmOuB59YQPnP5CG.gJ9RQW9dD
lxNAariuMJnrvqR8unhc2qTR6vteTwJNKkvMAoczSWy75.tvDtM7a7IBztK
-----------end_max5_patcher-----------

this is annoying. but it was the same as the OP which are 100% crashing - for instance, sines~ every time...

@tremblap
Copy link
Member

this does it:


----------begin_max5_patcher----------
503.3ocuU1raiCBDG+r8SAhiqbiL3OpSO02ipUUDaRJUNXK.2MYqZd1WXHdS
xtNMNQt8hwL.ye9MCC7dX.dQyFtFid.8DJH38vf.vjyPv99A30rMk0LMLMro
Y0pZNNxOjrasPVyMvXj8FaYlxWDxUOq3kFuuuOeVbDhlk.Mo8eQ+7feZ5L+q
i7lLaa4dufERq+5WinB1PMKd8tB7ekVwVyMb0ybIaQMrtX2PeDF59DMRJk7e
Y8auWM7M.G3JV4tAQmddzIIdnmCMIYPG5YYO9+oKGOcD7icn3YjakAe9KM9B
Lb17mVrRxpGLElMgPVtsrluCQnzqFTxbGaj6Itl77u.PSmPPWV2IploERtdG
5Q6JVtznE+linwoEWaMZeRlBTSxKt.7IW.9ni+avPA4VBELiQ0I9T3pD51Z1
10MU7i2nCvLkbBrEyGYB2sI7Y.a.2Bu0qZ74hGCxNcpuxZDwkOI.LGB.jr3q
M.b3P20wexX4GVLt1cF+z2o.Zb1OMnna5Tk8Ru+wAzAbp3Za1hYDMxilSteN
CF1GqNYeS5PFgNYSfNzQnCYBzI8aRmjaRG+wOVa6abkd+jAIrm7esApAJhft
BouaFzUweSzOeuElxV7YrUdcJXag2jmh8K0dQkRZKdcEXgN3rRBU1Raogtk4
4.t.H7iv+3CLD0L
-----------end_max5_patcher-----------

@lewardo
Copy link
Member

lewardo commented Aug 10, 2023

On MSVC Debug build both patches fail in same glamorous way, crash when spamming the third FFT parameter

FFT.hpp:86 fails assert on newSize <= mMaxSize

@tremblap
Copy link
Member

I am sad to say that Windows has a better report - on Mac I cannot get it to print that assert string...

@tremblap
Copy link
Member

@weefuzzy might cry, @AlexHarker might too. It seems all maxAttributes have a similar behaviour...

@lewardo
Copy link
Member

lewardo commented Aug 10, 2023

managed to get it to crash, took some dedicated hwiggling
image

@weefuzzy
Copy link
Member

weefuzzy commented Aug 10, 2023

This didn't used to happen, right? All that assiduous testing of help files on Windows would've caught it. So, is there a release version where this doesn't happen? If so, isolating the breaking change would be The Way (not least, for working out if this is a flucoma-max bug or a flucoma-core bug)

@AlexHarker
Copy link
Contributor

@tremblap - can you repro on Mac?

@AlexHarker
Copy link
Contributor

I'm going to bet this is threading issue. Look at this point in the code:

https://github.com/flucoma/flucoma-core/blob/2681d58fd4642c67a4135a28b7e19b0a5f9d5972/include/clients/common/ParameterTypes.hpp#L404

As far as I can see the member variable of the struct are set incorrectly at this point (and yet to be clipped) - if the audio thread reads at this point it will read incorrectly. @weefuzzy - am I wrong?

I would note also that because these settings should be read atomically, but are not written atomically there is the possibility of reading them in a non-related state.

Welcome to the horror of multi-threading - I will be happy to collect my prize now (BTW if I am correct then running in SIAI and changing the values in the high-priority thread only should never crash).

@AlexHarker
Copy link
Contributor

AlexHarker commented Aug 10, 2023

Actually I think that would only be correct if v were the object state, but I think v should be a thread-local variable - the last bit (about the three values not writing atomically) is probably still true, but each element should still be atomic so it's hard to see how slicing could occur on the FFT size in this context...

The copy into the client parameters should be happening here if I understand the code correctly:

x->params().template set<N>(std::move(a),

This might be a good place to check for spurious values (or just above it). If a can ever have spurious values then it would be a case of working back from there.

@weefuzzy
Copy link
Member

Yeah, if it were that foundational, I feel like it would have bitten already. (Still, not ruling out threading badness, just on principle).

The last time there were problems of this sort (cascades of assertion deaths due to FFT size stuff) it was because object-initialization-time maxima weren't getting set properly, so algos would end up allocating too little space (which then gets picked up later by the assertions).

So, another thing to check is what mMaxSize is at the assertion site, and whether it is what one expects it to be.

@tremblap
Copy link
Member

2 bulk answers:
@AlexHarker yes I can on Mac too but the message was less clear than Windows. In both cases it always needs serious wiggling and is unpredictable for stftpass. on the helpfile of melband or sines it is quite more immediate and easier to reproduce.

@weefuzzy no it didn't use to happen but I wonder if we tested that extensively in debug on windows. I mean, this needs serious wiggling in debug to crash, hence the team finding it (we roll that way now to find more of these and understand the framework using debugger to spy on states)

1 question: should we recompile in debug on the last public release(s) until it stops crashing as recommended or are you further down the rabbit hole?

@weefuzzy
Copy link
Member

I'm nowhere near the rabbit hole, and will only be able to issue vague advice, which is:

  • yes, debug builds back through time until the badness vanishes. git-bisect is helpful for this, and you can just build a single errant object whilst you're going, to speed things up
  • the badness still needs to be diagnosed as well, so looking at what the values are when the assertion fires will be helpful for reasoning about this. (e.g. maybe when I thought I'd fixed this last time, I hadn't, which would point to init-issues). So, run with a debugger attached when reproducing

@tremblap
Copy link
Member

I was implying @AlexHarker might have been near the whole, not you dear @weefuzzy - sorry if that wasn't clear. vague advice is most welcome in all cases.

@AlexHarker
Copy link
Contributor

I am only near the parts @tremblap.

I would suggest some kind of console output at the point in the max wrapper above that can identify if there are spurious values there. If we discover yes we look at which values of the proposed value/client are spurious at that point and work back.

I can't repro on my current build/machine/Max version and I'd advise that we try to find a patch that auto-crashes rather than through lengthy wiggling.

@AlexHarker
Copy link
Contributor

E.g. metro to "wiggle"

@AlexHarker
Copy link
Contributor

I'd also suggest Max bug info here to make sure we are testing the same versions of Max and flucoma.

@AlexHarker
Copy link
Contributor

I can get consistent (but different) assert crashes if I build in debug mode and I'm pretty sure this is due to a non-correct object state due to threading issues, although I haven't fully traced it through.

Here's my theory - in the line identified above the FFTParams object is moved/copied (and I suspect in practice it is the latter). Each 64bit is likely copied atomically, so the first few values might (for instance) make it through the copy before the audio thread asks to read the values. At this point perhaps the trackers have not been copied and thus it looks like there's no resizing to do - but the values eventually update and by the time we are processing the larger frame size is in action. There's no easy (cost-free) fix if this is the case, and this is unlikely to be a new bug if I'm right. It could also affect any other parameters where what needs to be written is larger than a single 64 bit piece of POD.

@AlexHarker
Copy link
Contributor

AlexHarker commented Aug 10, 2023

OK - two things. I am still getting 100% consistent eigen_assert() issues with fluid.sines that are not the crashes above. However, if I comment out that assert I can get the two crashes @lewardo got above after a bit of wiggling (manual or automated)

I must now retract my previous retraction, because:

auto& a = x->params().template get<N>();

So in the setter for the FFTParam attribute the work is done directly on the client object, which is very much not threadsafe (in addition to the threadsafety issue from the copy, which in effect might not actually copy...). It looks like there are at least two bugs I'm seeing, one of which (the main one reported here) points to some fairly nasty parameter threadsafety problems. And whilst @weefuzzy says:

Yeah, if it were that foundational, I feel like it would have bitten already.

I don't actually think this is necessarily true. Threading bugs are by their nature non-deterministic and can commonly be subject to machine architecture etc., so they could exhibit more readily on some machines than others (we are not testing on the same OS, nor processors here) and they can be very occasional. When I first starting tracing a framelib threading issue last year it was taking 30 mins - 2 hours before I'd see a crash (until I devised some faster crashing code).

So - let's try to get to the bottom of the threading question.

@MattS6464 / @lewardo / @tremblap - please run the patch using a debug build (not an optimised one which won't have the asserts) below in two modes:

1 - overdrive on but schedular in audio interrupt off
2 - overdrive on and scheduler in audio interrupt on

Then post here to say whether 1/2 crash - if it's a yes for 1 and a no for 2 then it's almost certain we have a threadsafety issue.


----------begin_max5_patcher----------
630.3ocyW10aaBCEF9Z3WgkUupiFgMeDxtp+Ollpb.WlqHFD1jkrple6ye.M
IKnEfQh1MP7I190Omiyqcd20AttbGU.AeE7Mfiy6tNNlP5.NsscfaH6RKHBS
2fanBAImB8remjtSZh+5qRAUJY7bA.4iCAOg.Of55FuYCiWPkl4.2FrhHS+g
Z.uTSSk10.1OZguG.EfMuv95WXUCv2ONSkMxtoB0F0FRtuhZmGH7yAvxLqux
0u8DJDpi8gqq9g2.IlS+oZvW.LcWUM3AFB7HHveIF7EC18BL5u.bzRCo9HK2
yLv3o.rrLOu3yJbEolrgJo0uP4j0EF87GKkgqzbEEXnKr64XYjwk8iIZFqqJ
VqKUEDvyjTIaKEL9MwVZQ1JKJXpkz0Dddu7t7+ip5xXCY2ppZxLVTyHoGFaY
DEXAak4UPzUJi9WRP7LRviG.9KPSkAaMJbxtKBVNmTzaYJZFgLceZA8fx2GO
ZPQs+ly3iFGeC.cNO730hFV1BAiSEG.OqFg93S1unpC.CSF6YHcEYrgZTbxU
fO3Jv6c5m52vcJoBhTV2vFhATFSTUP1uoLid55cvm1fNKQjr5e+LU8Z+OulC
7x7B91lWFaBXkIAfh7mwDvw8p8vevo7aFKrPuG+7KXZVw53mmTDkM0ocJ2dc
MOvw0bFUnx6DIqjeZmr8o279fEBODgBmCkPCQI7LnzxgHzbj7BtSEo60lA78
ZS2.zIZFzIZ.5DOC5jLIcrlDjppszZQamMRnLZdqzX4j3YZx31lQll0zsrt9
aiPpU1fRkGXSsYYA2Ea+2XP84H0bkEq1pyUCmRRi+KWY0JpHVNL1zte39avv
a0Wi
-----------end_max5_patcher-----------

@AlexHarker
Copy link
Contributor

So far I can get 1 to crash but not 2...

I will be happy to collect my prize now

@weefuzzy
Copy link
Member

I must now retract my previous retraction, because:

auto& a = x->params().template get<N>();

Looks like I durped that last year, from inspecting the blame. Come what may, that auto& should become a copy, I think.

So, there's always been the potential for threading badness because the wrapper doesn't do anything more robust than the same spinlock that the SDK examples use, and you're quite right that there's no pain free fix. My non-thought-through idea was that using a FIFO design similar to scsynth would be nicer (as in easier to reason about), but never had time to see that through.

@AlexHarker
Copy link
Contributor

AlexHarker commented Aug 10, 2023

auto& should be a reference I think, but I'm not sure - I'd have to check that in the debugger.

I have just seen the spin 'locks' and I'm afraid they are woefully insufficient - if that's in the SDK then it is incredibly disappointing as an example because it will only protect you some of the time.

The setter object can pass the spin lock and then the audio thread increment the counter and both threads are then executing at the same time - the "lock: in this case does nothing (because it only checks that you aren't processing when it starts - it doesn't prevent the audio thread entering shared memory space after that). In theory there might also be low/high priority thread conflicts possible, but that may need addressing separately.

The solutions are:

1 - to devise a proper locking scheme, which will by necessity require waiting in both threads (there is no reasonable chance to rearchitect the way parameters work for non-locking code - there's just too much baked into the design at this point.

2 - to queue parameters between threads (something like the above).

1 is the easier route. Doubtless some people will have opinions about the efficiency of such an approach, but personally I do not see how a not blocking route could work here (in the convolution objects I just don't process audio when certain things are changing but not processing audio just because any parameter is changing is not an option). Therefore IMO proper locks with fallback to sleeping or thread yielding should be employed. I would expect the performance loss to be minimal (or zero) - unless you are bombarding the objects with parameter changes. Often in audio discussions about threading I see people paranoid about locking, but then using solutions that don't guarantee coherent state, which means potential undefined behaviour.

I can take the hand-rolled thread locks that I use elsewhere that are lightweight and tend to spin with microsleeps where needed and mock up something to test if that is helpful. This should be max only, because SC parameters are modified in the audio thread, pd is single-threaded and the CLI is also.

@AlexHarker
Copy link
Contributor

(pretty sure looking in the debugger that a& is the parameter in the Max object).

@AlexHarker
Copy link
Contributor

Given that mClient.process() is called in only 3 places I'm also pretty sure that thread locking could be used to solve all potential thread conflicts fairly simply. The steps would be:

  • make sure that operations like constraints on parameters are done locally
  • lock around all calls to process
  • lock when writing (and ideally reading) parameters from the client
  • locks should be only for the class to process/set/get and no larger

@AlexHarker
Copy link
Contributor

FWIW - the spinlocks in the SDK actually use two small POD variables each of which is only written on one thread, but read on the other to create at least a slightly more robust lock than in flucoma - it's a valid field that really protects the data and that can only be set when not in perform (to make sure that it doesn't get set invalid or otherwise in the middle of an audio routine).

@weefuzzy
Copy link
Member

auto& should be a reference I think, but I'm not sure - I'd have to check that in the debugger.

I wasn't clear. I meant the ampersand should be removed (not that it makes things more thread safe, just less hilarious).

Meanwhile

These are all shortcomings I was aware of but never had time to address, partly because once the dataset related objects come into the mix, with their arbitrary messages and shared state, everything gains a new gloss of horror. But thanks for laying them out clearly.

My strong preference is for a queue. Strong enough that it may be read as insistence. I know that this isn't a quick fix to this issue.

So, another thing to check is what mMaxSize is at the assertion site, and whether it is what one expects it to be.

I don't want to lose sight of someone actually reporting on the state of the objects as they hit the assertions. Please.

@tremblap
Copy link
Member

ok thanks for the morning read :)

I ran Alex's code but with stftpass instead (to remove potential issues unique to sines~) and a wider range of stft param changes. The code is below. It doesn't crash in overdrive+audio interrupt but it does after a certain time in overdrive without audio interrupt.


----------begin_max5_patcher----------
687.3ocuV1sbiBBEG+ZySACydQ6N1LB9Qz8p9drcmNDkjkNFzAvzzsSyy9hG
RZ1zllpV2biJGOve9w4vAddhGdd0FtFi9A5mHOumm34AlZM3sqsGdEaSdISC
tgWw0Z1RN128OCeiArSChR2aT1rRHK4FnGzcFqYl7eKjKuWwyMNEiBClF3iH
gz1WwPCp8azuNLPUMl8iDYmUmIyS0b2vfwu1AQALYpl+vMjDbqsWlLo8geGw
adiwTIOIHjy.xLXtGFAD75y9xwblb4oYID+p3J1JtgqtmKYyKgdFL.Nk7GsC
76hh7M0JTc0iWQu6NeESVTs5pL6mjnqudfKJowPLNZ1nFcyFRv8ixcWrvnsR
Zm9ZDIfFgtgf9FouYyzf3+IalPG4z4H73ElsYPpJKpnaY4FwZNp+vFkATBQU
BIbnr9go7yFBtlpkKK48MQcVBL6iC+Z6dERyIII8Bs2sfkusugQRnC5L3UX7
mDFCdOcIiXZ422hBlRFJCt3Wzf21oEKkrxSFBiGQHyeJuju0Vef1aPI61yQZ
ekj7e.zwrHyhxFQwTsYgo154Vzs1N0VoU7GNJkjQ661z8wYJ.NIk9I7S+D98
+nJsCYMfYLpFwdjJD55R1SqpJ.MCOGos87smBgOSgWhC+jT.+zrQ8TF5XWq5
30kyu.bH+nC7mA7SF4KM146YAcFWJju81yvTt09wKJ5pFU9docmu4iNLmK3Z
afmYD16ddvm1q805zIW26pPYcQnnibpRUvgPB4KoLoCJGOBDR6BgifNQWHcB
uP5PtT.Q5BQYWpb8DqSj2mqG7kTNtCJmLBDlNHcbUqX00q4J8NmAIr0benBv
O0GZJjtlwPSEesXu+NKLksfrwVMtQASK7ljHrqq1C6TRasdWcCKbVIgCBj1J
o5ZliC37hIuL4uPNY+bR
-----------end_max5_patcher-----------

@MattS6464 and @lewardo are you getting the same result?

@lewardo
Copy link
Member

lewardo commented Aug 11, 2023

I don't want to lose sight of someone actually reporting on the state of the objects as they hit the assertions. Please.

I'm on the case, I'm setting up the debugger to spy on their state as we speak

@lewardo
Copy link
Member

lewardo commented Aug 11, 2023

@tremblap's patch made my build crash automatically after a few seconds, before I could set up the debugger so I'll go ahead and run it again with the debugger attached to the process and report back
image

@AlexHarker
Copy link
Contributor

I don't want to lose sight of someone actually reporting on the state of the objects as they hit the assertions. Please.

This seems to me a meaningless thing to do - the state of the object is not coherent due to the threading issues. If you fix that (as I have done locally for testing) you never hit the asserts - what are you hoping to learn @weefuzzy that we don't already know?

My strong preference is for a queue. Strong enough that it may be read as insistence. I know that this isn't a quick fix to this issue.

I'm not sure how exactly how you would build a queue here because the parameters are different types, although I guess you could maybe queue the atoms and service that in the correct place, although what the correct place is would be unclear, and you'd need to update something immediately to make sure the attributes read immediately. Could I suggest that we test the locking route (which I already have coded) and look at performance, rather than leave crashes in for an unspecified amount of time. Oh course, **if the performance is insufficient then the queue could be looked at ** (and windows will likely perform differently to Mac so both need heavy testing), but I have done a lot of threading and often use locks simply because if you make them lock for short amounts of time they are easy to reason about and easier to gt right that fancier non-locking or non-blocking methods.

@weefuzzy
Copy link
Member

weefuzzy commented Aug 11, 2023

what are you hoping to learn @weefuzzy that we don't already know?

Whether, in addition to these concurrency issues, the management of max sizes set at initialisation time is also screwed up.

Re queue: I would enqueue a whole parameter set or its underlying tuple, I would think. But the type problems do indeed remain for the threading issues that pertain to all the dataset objects (though, thankfully, with less involvement of the audio thread thus far). I'm obviously not going to stop anyone trying something out, but am strongly icky about anything that implies the audio thread would sleep, and dubious about the idea that this is easy to reason about.

@lewardo
Copy link
Member

lewardo commented Aug 11, 2023

image
following stacktrace and local state, the failed assert is when startRow <= xpr.rows() - blockRows is queried, equivalent to 0 <= 512 - 1024 at runtime
am following up the stacktrace, this was inside eigen so I'm following towards the fabled fluidland

@weefuzzy
Copy link
Member

Ah, so this is a different assert to the one on FFT.hpp:86 (which fails assert on newSize <= mMaxSize). If you step up the stack into the client's processFrame(), is the debugger able to show you the contents of the fft settings?

@AlexHarker
Copy link
Contributor

AlexHarker commented Aug 11, 2023

what are you hoping to learn @weefuzzy that we don't already know?

Whether, in addition to these concurrency issues, the management of max sizes set at initialisation time is also screwed up.

They don't appear to be.

Re queue: I would enqueue a whole parameter set or its underlying tuple, I would think. But the type problems do indeed remain for the threading issues that pertain to all the dataset objects (though, thankfully, with less involvement of the audio thread thus far). I'm obviously not going to stop anyone trying something out, but am strongly icky about anything that implies the audio thread would sleep, and dubious about the idea that this is easy to reason about.

It is far far easier to reason about that than non-blocking arguments or queues. Mutexes in audio thread exist in loads of audio software you have used, it's just that there is a modern trend towards the terms "non-blocking" or "lock-free" being considered essential, without the added knowledge of how to keep complex overall state coherent, which is a much harder problem than sending single streams of values in a non-blocking fashion. That would be better discussed on a call - given that it is something I have strong feelings about. The worst results of that are non-working code with the notion that it is "better". FrameLib locks in all threads at some points (whenever it needs to reach out to the core memory allocator) - the trick is to lock for short amounts of time (say around writing a few pieces of POD) in non-audio threads, which means those threads are blocked for a minimal amount of time.

There is no point in queuing the whole state, in the sense of a queue with more than one slot (as you should always have the last state), but that is one option (a queue of length 1). It doesn't, however necessarily solve data races between low and high-priority threads. The harder part is actually that you need to transfer that state between threads, and to be honest I'd likely write a lock to do that - It's possible I could also mock that up sometime soonish in some lock-free way, but it involves a lot more thought.

@AlexHarker
Copy link
Contributor

Ah, so this is a different assert to the one on FFT.hpp:86 (which fails assert on newSize <= mMaxSize). If you step up the stack into the client's processFrame(), is the debugger able to show you the contents of the fft settings?

You will see different asserts/crashes until the threading is fixed - the state of the object is non-deterministic.

@lewardo
Copy link
Member

lewardo commented Aug 11, 2023

@AlexHarker so you're saying tracking the individual failed asserts in each case would not lead to any global fix, until the state has been made more robust and detreministic when being accessed between threads ?

@AlexHarker
Copy link
Contributor

OK - please can we get some testing on this:

https://github.com/AlexHarker/flucoma-max/tree/threadsafe-parameters

@lewardo / @MattS6464 - checking that it doesn't crash would be good (it doesn't crash here)

@tremblap - what you would be well placed to do is to compile in release (not debug) and thens stress test performance of both audio objects and dataset objects (for now). At the moment I'm not locking against non-audio thread processing as this is a minimal set of changes to fix the crashing, but there are some other places we should add locks I think.

@lewardo
Copy link
Member

lewardo commented Aug 11, 2023

I'm on it, I'll let you know what happens on that branch build, is there one specific object that I can compile so I don't have to recompile everything ?

@AlexHarker
Copy link
Contributor

@AlexHarker so you're saying tracking the individual failed asserts in each case would not lead to any global fix, until the state has been made more robust and detreministic when being accessed between threads ?

That is my position, yes. @weefuzzy may still have concerns, but I cannot personally see how we can reason about state if it is non-deterministic.

@AlexHarker
Copy link
Contributor

BTW - the reason I use my own thread locks (rather than generic mutexs) is that it's then controllable how long they spin, periodically check (busy wait) and then fall to sleep checking on wake - a generic lock might make use of thread signalling or something more heavy duty. Never sleeping/yielding is actually bad because it can prevent threads from progressing do to lock contention.

@AlexHarker
Copy link
Contributor

I'm on it, I'll let you know what happens on that branch build, is there one specific object that I can compile so I don't have to recompile everything ?

Yes - fluid.sines~ or any other FFT based audio object (which is then the one you use for testing)

@lewardo
Copy link
Member

lewardo commented Aug 11, 2023

With stftpass~ recompiled on MSVC 2022 and max-sdk 8.2.0 against @AlexHarker's commit and @tremblap's whiggle patch it isn't crashing anymore, or at least it takes eons more time than it did before recompiling, so it seems to have fixed the bug

@lewardo
Copy link
Member

lewardo commented Aug 11, 2023

the fluid.sines~ crash still crashes as @MattS6464 observed before even after recompiling with the patch, both that object and the fluid.sinfeature~ objects were recompiled

@AlexHarker
Copy link
Contributor

@lewardo - can I get the assert please or crashing line? - I suspect what you are now seeing is:

flucoma/flucoma-core#236

but I'd need to know what the assert/crash is to be sure.

@lewardo
Copy link
Member

lewardo commented Aug 11, 2023

it was FluidTensor.hpp:548 asserting sameExtents(mDesc, x.descriptor()), which iirc is an issue @MattS6464 ran into with various other objects using sine-based objects

I can get a screenshot of the crash if I rerun it if you need it, let me know if you need the extra verbosity

@AlexHarker
Copy link
Contributor

Can you rerun and screenshot. That's not the one I was hoping for sadly...

@lewardo
Copy link
Member

lewardo commented Aug 11, 2023

first image is the crash and bottom right of the second is my call stack upon fail

image
image

@lewardo
Copy link
Member

lewardo commented Aug 11, 2023

update: now the same error as flucoma/flucoma-core#236, on @AlexHarker's threadsafe-parameters branch rather than the FluidTensor.hpp:548 bug, investigating the fix suggested there

@AlexHarker
Copy link
Contributor

BTW - #385 and #387 should be the same bug.

@weefuzzy
Copy link
Member

Yes. Given that this is getting long, and the title only obliquely references the more general problem that this, #385 and #387 close in deference to a new issue that sets out what's up (which will make referencing any resulting PR clearer).

@AlexHarker
Copy link
Contributor

I can just retitle this issue if that's better and close the other two once we've confirmed they are also threading-related?

@lewardo
Copy link
Member

lewardo commented Aug 11, 2023

maybe refrain before closing #387, both @MattS6464 and myself have found that this (@AlexHarker's branch) fix in conjunction with the flucoma/flucoma-core#238 pr doesnt fix #387 and it still instacrashes on both windows and mac

@weefuzzy
Copy link
Member

I can just retitle this issue if that's better and close the other two once we've confirmed they are also threading-related?

My thinking is that, title notwithstanding, the opening posts of this issue just describe symptoms and there's quite a bit of scattered reading to do before one gets to causes.

Because this is a pretty consequential change (IMO), I think future maintainers will be grateful for having a more succinct account of the issue to refer to (not least because we know similarly ameliorative work on dataset objects remains as a treat for these future maintainers, so understanding the history of whatever PR addresses this problem will be A Good Thing)

@AlexHarker
Copy link
Contributor

I will make. new issue once we are well into the way to a fix and that one will be more succinct and less ranty (from me at least).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants