Instacrash when training fluid.mlpregressor~
#314
hmm that is hard to pinpoint... let me know if you find a way to reproduce, especially not in Max. I know @AlexHarker has a pending fix for a lot of memory race condition problems in threading, but I'll need you as a power user to test it with me :)
Not really had this pop up as a recurring thing, so will keep my eyes peeled. Can close this (if you want) as there's no concrete issue/steps.
nah, keep it open but keep an eye on it :)
The crash is in what I assume is the scheduler (timer) thread. At a quick and not very detailed look, I am not really sure that this method is thread safe: it accesses the count of an std::unordered_map without any protection of that object, and then assumes it can access the value(s) in that slot/bin without protection immediately afterwards (which in concurrent-computing land doesn't actually mean immediately, nor does it mean the data has not mutated in the meantime). Hard to be sure, but the most obvious explanation is that one thread looks up the dataset as another deletes it: the count returns 1 or more in the first bit of lookup() in the lookup thread, then the deletion occurs in the other thread, and then the access to the weak_ptr back in the lookup thread is bogus. This might be fixed by what I am due to do at some point externally, but I'd argue (given that shared pointers are in use here, implying thread safety) that lookup() itself should be thread safe. That would then cover usage in other environments.
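To illustrate the race being described, here is a minimal sketch, assuming the registry is roughly an std::unordered_map of names to std::weak_ptr entries; the names (DataSetRegistry, DataSet, lookup, remove) are illustrative stand-ins, not the actual FluCoMa API:

```cpp
// Sketch of the suspected race and one way to guard lookup().
// All names here are hypothetical, not the real FluCoMa types.
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>

struct DataSet { /* payload */ };

class DataSetRegistry {
public:
  // Unsafe pattern: count() and the subsequent access are two separate,
  // unsynchronised steps, so another thread can erase the entry in between.
  std::shared_ptr<DataSet> lookupUnsafe(const std::string& name) {
    if (mMap.count(name))        // thread A sees the entry...
      return mMap[name].lock();  // ...thread B may have erased it by now
    return nullptr;
  }

  // Guarded version: find() and lock() happen under one mutex, and remove()
  // takes the same mutex, so the entry cannot vanish mid-lookup.
  std::shared_ptr<DataSet> lookup(const std::string& name) {
    std::lock_guard<std::mutex> guard(mMutex);
    auto it = mMap.find(name);
    return it == mMap.end() ? nullptr : it->second.lock();
  }

  void add(const std::string& name, std::shared_ptr<DataSet> ds) {
    std::lock_guard<std::mutex> guard(mMutex);
    mMap[name] = ds;
  }

  void remove(const std::string& name) {
    std::lock_guard<std::mutex> guard(mMutex);
    mMap.erase(name);
  }

private:
  std::mutex mMutex;
  std::unordered_map<std::string, std::weak_ptr<DataSet>> mMap;
};
```

The weak_ptr::lock() already handles the case where the DataSet itself has been destroyed; the mutex is only there to protect the map structure, which is the part the unguarded count-then-access pattern leaves exposed.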
This reminds me of last summer, when you had a fix for all of this but it might have cost performance. Maybe when you have a bit of time you and I can sit around this and you can revive the PR (which you had on your local branch), which I could action fast, I promise.
I don't have exact steps to reproduce as I was doing a few things at once (basically fit-ing a fluid.mlpregressor~ in a defer loop while also fluid.robustscale~-ing stuff elsewhere), but the crash report has a bunch of fluid stuff in it so figured I'd post it here in case it adds some insight.
flucoma crash.txt