
Instacrash when training fluid.mlpregressor~ #314

Open
rconstanzo opened this issue Jul 6, 2022 · 5 comments

Comments

@rconstanzo

I don't have exact steps to reproduce as I was doing a few things at once (basically fit-ing a fluid.mlpregressor~ in a defer loop while also fluid.robustscale~-ing stuff elsewhere), but the crash report has a bunch of fluid stuff in it, so I figured I'd post it here in case it adds some insight.

flucoma crash.txt

@tremblap
Member

tremblap commented Oct 9, 2024

hmm, that is hard to pinpoint... let me know if you find a way to reproduce, especially if it's not in Max. I know @AlexHarker has a pending fix for a lot of memory race condition problems in the threading code, but I'll need you as a power user to test it with me :)

@rconstanzo
Author

This hasn't really popped up as a recurring thing, so I'll keep my eyes peeled.

Feel free to close this (if you want), as there are no concrete steps to reproduce.

@tremblap
Member

tremblap commented Oct 9, 2024

nah, keep it open, but keep an eye on it :)

@AlexHarker
Contributor

The crash is in what I assume is the scheduler (timer) thread.
I think it's a C++ exception on a weak pointer, and it happens, I think, in NRTSharedInstanceAdaptor::lookup().

At a quick and not very detailed look, I am not really sure that this method is threadsafe: it accesses the count of an std::unordered_map without any protection of that object, and then assumes that it can access the value(s) in that slot/bin without protection immediately afterwards (which, in concurrent computing land, doesn't actually mean immediately, nor does it mean the data has not mutated in the meantime).

Hard to be sure, but the most obvious explanation is that one thread looks up the dataset while another deletes it. The count returns 1 or more in the first part of lookup() in the lookup thread, then the deletion occurs in the other thread, and then the access to the weak_ptr back in the lookup thread is bogus. This might be fixed by what I am due to do at some point externally, but I'd argue (given that shared pointers are in use here, implying thread safety) that lookup() itself should be thread-safe. That would then cover usage in other environments.
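
For illustration only, here is a rough sketch of the kind of mutex-guarded registry that would close that window; the SharedRegistry name and its members are invented for this example and are not the actual NRTSharedInstanceAdaptor code.

```cpp
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>

// Hypothetical sketch of a shared-instance registry keyed by name.
// Both the map access and the weak_ptr promotion happen under the same
// lock, so a concurrent remove() cannot invalidate the bucket between
// the "is it there?" check and the "give me the pointer" step.
template <typename T>
class SharedRegistry {
public:
  void add(const std::string& name, std::shared_ptr<T> instance) {
    std::lock_guard<std::mutex> guard(mMutex);
    mInstances[name] = instance;  // stored as weak_ptr
  }

  void remove(const std::string& name) {
    std::lock_guard<std::mutex> guard(mMutex);
    mInstances.erase(name);  // shared_ptrs already handed out stay valid
  }

  // Thread-safe lookup: returns nullptr if the name is gone or the
  // underlying object has expired, instead of throwing on a dead weak_ptr.
  std::shared_ptr<T> lookup(const std::string& name) {
    std::lock_guard<std::mutex> guard(mMutex);
    auto it = mInstances.find(name);
    if (it == mInstances.end()) return nullptr;
    return it->second.lock();
  }

private:
  std::mutex mMutex;
  std::unordered_map<std::string, std::weak_ptr<T>> mInstances;
};
```

If the extra locking on the audio/scheduler side turned out to be a real cost, a reader/writer lock (std::shared_mutex) for lookups versus add/remove would be one way to keep reads cheap, but that's a separate trade-off from the correctness point above.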

@tremblap
Member

this reminds me of last summer when you had a fix for all of this, but it might cost performance. Maybe when you have a bit of time you and I can sit down around this and you can revive the PR (which you had on your local branch), which I could action fast, I promise.
