Multithreading frequent calls is crashing #46

Open
Envenger opened this issue Apr 15, 2019 · 6 comments
Comments

@Envenger

If I call the JSON input function about once every second, the game crashes and leaves a large Python log.
This happens on the CPU version of the plugin, but I don't call any TensorFlow commands, so I doubt TensorFlow is causing it.

My JSON input function doesn't do anything for now; it just returns an empty string.

    def onJsonInput(self, jsonInput):
        result = ""
        return result

The crash doesn't occur with multithreading turned off.
Crash log attached:
ShooterAIProject-backup-2019.04.15-14.17.37.log

@Envenger
Author

Do you want any help reproducing the bug? I can share a small project in which it reproduces.

@getnamo
Owner

getnamo commented Apr 17, 2019

That would be helpful! I suspect a race condition, but that would be weird because Python uses a global interpreter lock. Perhaps I'm not doing something quite right.

Note to self: the C++ function used for this is at https://github.com/getnamo/UnrealEnginePython/blob/master/Source/UnrealEnginePython/Private/UEPyEngine.cpp#L814

Note 2: this is new: https://github.com/20tab/UnrealEnginePython/blob/0393f40181988789eeec95d1cd9d6eec811ec2a2/android/python27/include/ceval.h#L79. Do we need to wrap our function with it?

@magomedb

magomedb commented Mar 1, 2020

I have a similar issue: with multithreading on, similarly frequent calls to the JSON input function cause the following error to appear repeatedly in the output log.

    File "C:\Users\User\Documents\Unreal Projects\IAF\IAF\Plugins\tensorflow-ue4\Content\Scripts\TensorFlowComponent.py", line 116, in json_input_blocking
      if(self.uobject.ShouldUseMultithreading):
    Exception: PyUObject is in invalid state

The engine crashes after a while, and the crash log lists python36, kernel32 and ntdll. I can also see memory usage grow gradually while it runs, until the eventual crash. Turning multithreading off removes the errors, prevents the crash, and stops the memory growth.

Assuming that this is a similar problem to the original, wouldn't this indicate that the problem is a memory leak?

@getnamo
Owner

getnamo commented Mar 2, 2020

With multithreading on, each call to JSON input gets handled by a different thread. What's likely happening is that a second thread touches the same data before the first one is done with it, which corrupts memory through an out-of-bounds access (despite the GIL, or inside the TF layer). The continued calls then keep leaking; it should have crashed earlier. Probably the best way forward is to add a lock so that JSON inputs get queued one at a time, with no new input starting until the last one is fully done. Alternatively, a single JSON input thread with an internal event queue would be the more efficient solution, but I'm unsure if I know how to do that properly at the moment.
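
The locking idea could be sketched roughly as follows. This is only an illustration in plain Python, not the plugin's code: `SerializedJsonInput` is a hypothetical wrapper name, and `handler` stands in for a user's `onJsonInput`. It assumes the race lives in the Python-side handler; if the corruption happens in the C++ layer, a Python lock alone may not be enough.

```python
import threading


class SerializedJsonInput:
    """Hypothetical wrapper: lets only one JSON input call run at a time."""

    def __init__(self, handler):
        self._handler = handler        # stand-in for the user's onJsonInput
        self._lock = threading.Lock()  # guards every call to the handler

    def __call__(self, json_input):
        # A caller on another thread blocks here until the current call
        # has fully finished, so two inputs never overlap.
        with self._lock:
            return self._handler(json_input)
```

Each background thread would then call the wrapper instead of the handler directly. Note that `threading.Lock` does not guarantee that waiting threads are served in arrival order.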

Thanks for providing more examples of the bug; it does narrow down the potential source.
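
For the single-input-thread alternative mentioned above, a minimal sketch might look like this. It is plain Python with hypothetical names (`JsonInputWorker`, `handler`, `on_result`) and is not taken from the plugin. In the real plugin the result would presumably have to be handed back to the game thread, which this sketch does not attempt.

```python
import queue
import threading


class JsonInputWorker:
    """Hypothetical dispatcher: one thread drains a queue of JSON inputs."""

    _STOP = object()  # sentinel that tells the worker thread to exit

    def __init__(self, handler, on_result):
        self._handler = handler      # stand-in for the user's onJsonInput
        self._on_result = on_result  # callback that receives each result
        self._inputs = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def submit(self, json_input):
        """Queue an input and return immediately."""
        self._inputs.put(json_input)

    def close(self):
        """Process everything already queued, then stop the worker."""
        self._inputs.put(self._STOP)
        self._thread.join()

    def _run(self):
        # Inputs are handled strictly one at a time, in submission order.
        while True:
            item = self._inputs.get()
            if item is self._STOP:
                break
            self._on_result(self._handler(item))
```

Compared with a lock, this avoids creating a new thread per call and keeps inputs in order, at the cost of an unbounded backlog if inputs arrive faster than they are handled.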

@Uperstream

I have run into the same problem. I also request a JSON input event every tick.
LogPython: Error: Exception in thread Thread-2291:
Traceback (most recent call last):
File "threading.py", line 916, in _bootstrap_inner
File "threading.py", line 864, in run
File "F:\ProjectFile\UnrealProject\AItest\Plugins\UnrealEnginePython\Content\Scripts\upythread.py", line 19, in backgroundAction
result = action(actionArgs)
File "F:\ProjectFile\UnrealProject\AItest\Plugins\tensorflow-ue4\Content\Scripts\TensorFlowComponent.py", line 116, in json_input_blocking
if(self.uobject.ShouldUseMultithreading):
Exception: PyUObject is in invalid state

I got a different error when I turned multithreading off:

LogPython: Error: Variable pi/dense/kernel already exists, disallowed. Did you mean to set reuse=True or reuse=tf.AUTO_REUSE in VarScope? Originally defined at:
File "F:\ProjectFile\UnrealProject\AItest\Plugins\UnrealEnginePython\Binaries\Win64\Lib\site-packages\tensorflow\python\framework\ops.py", line 1770, in __init__
self._traceback = tf_stack.extract_stack()
File "F:\ProjectFile\UnrealProject\AItest\Plugins\UnrealEnginePython\Binaries\Win64\Lib\site-packages\tensorflow\python\framework\ops.py", line 3274, in create_op
op_def=op_def)
File "F:\ProjectFile\UnrealProject\AItest\Plugins\UnrealEnginePython\Binaries\Win64\Lib\site-packages\tensorflow\python\util\deprecation.py", line 488, in new_func
return func(*args, **kwargs)
LogPython: Error: Traceback (most recent call last):
LogPython: Error: File "F:\ProjectFile\UnrealProject\AItest\Plugins\tensorflow-ue4\Content\Scripts\TensorFlowComponent.py", line 143, in setup_complete
self.train()
LogPython: Error: File "F:\ProjectFile\UnrealProject\AItest\Plugins\tensorflow-ue4\Content\Scripts\TensorFlowComponent.py", line 73, in train
self.train_blocking()
LogPython: Error: File "F:\ProjectFile\UnrealProject\AItest\Plugins\tensorflow-ue4\Content\Scripts\TensorFlowComponent.py", line 152, in train_blocking
self.trained = self.tfapi.onBeginTraining()
LogPython: Error: File "F:\ProjectFile\UnrealProject\AItest\Content\Scripts\DPPOnothread.py", line 194, in onBeginTraining
self.model = PPO(epMax, 8, 5, 1e-4, 2e-4, self.que)
LogPython: Error: File "F:\ProjectFile\UnrealProject\AItest\Content\Scripts\DPPOnothread.py", line 36, in __init__
pi, pi_params = self._build_anet('pi', trainable=True)
LogPython: Error: File "F:\ProjectFile\UnrealProject\AItest\Content\Scripts\DPPOnothread.py", line 68, in _build_anet
l1 = tf.layers.dense(self.tfs, 200, tf.nn.relu, trainable=trainable)
LogPython: Error: File "F:\ProjectFile\UnrealProject\AItest\Plugins\UnrealEnginePython\Binaries\Win64\Lib\site-packages\tensorflow\python\layers\core.py", line 184, in dense
return layer.apply(inputs)
LogPython: Error: File "F:\ProjectFile\UnrealProject\AItest\Plugins\UnrealEnginePython\Binaries\Win64\Lib\site-packages\tensorflow\python\keras\engine\base_layer.py", line 817, in apply
return self.__call__(inputs, *args, **kwargs)
LogPython: Error: File "F:\ProjectFile\UnrealProject\AItest\Plugins\UnrealEnginePython\Binaries\Win64\Lib\site-packages\tensorflow\python\layers\base.py", line 374, in __call__
outputs = super(Layer, self).__call__(inputs, *args, **kwargs)
LogPython: Error: File "F:\ProjectFile\UnrealProject\AItest\Plugins\UnrealEnginePython\Binaries\Win64\Lib\site-packages\tensorflow\python\keras\engine\base_layer.py", line 746, in __call__
self.build(input_shapes)
LogPython: Error: File "F:\ProjectFile\UnrealProject\AItest\Plugins\UnrealEnginePython\Binaries\Win64\Lib\site-packages\tensorflow\python\keras\layers\core.py", line 944, in build
trainable=True)
LogPython: Error: File "F:\ProjectFile\UnrealProject\AItest\Plugins\UnrealEnginePython\Binaries\Win64\Lib\site-packages\tensorflow\python\layers\base.py", line 288, in add_weight
getter=vs.get_variable)
LogPython: Error: File "F:\ProjectFile\UnrealProject\AItest\Plugins\UnrealEnginePython\Binaries\Win64\Lib\site-packages\tensorflow\python\keras\engine\base_layer.py", line 609, in add_weight
aggregation=aggregation)
LogPython: Error: File "F:\ProjectFile\UnrealProject\AItest\Plugins\UnrealEnginePython\Binaries\Win64\Lib\site-packages\tensorflow\python\training\checkpointable\base.py", line 639, in _add_variable_with_custom_getter
**kwargs_for_getter)
LogPython: Error: File "F:\ProjectFile\UnrealProject\AItest\Plugins\UnrealEnginePython\Binaries\Win64\Lib\site-packages\tensorflow\python\ops\variable_scope.py", line 1487, in get_variable
aggregation=aggregation)
LogPython: Error: File "F:\ProjectFile\UnrealProject\AItest\Plugins\UnrealEnginePython\Binaries\Win64\Lib\site-packages\tensorflow\python\ops\variable_scope.py", line 1237, in get_variable
aggregation=aggregation)
LogPython: Error: File "F:\ProjectFile\UnrealProject\AItest\Plugins\UnrealEnginePython\Binaries\Win64\Lib\site-packages\tensorflow\python\ops\variable_scope.py", line 540, in get_variable
aggregation=aggregation)
LogPython: Error: File "F:\ProjectFile\UnrealProject\AItest\Plugins\UnrealEnginePython\Binaries\Win64\Lib\site-packages\tensorflow\python\ops\variable_scope.py", line 492, in _true_getter
aggregation=aggregation)
LogPython: Error: File "F:\ProjectFile\UnrealProject\AItest\Plugins\UnrealEnginePython\Binaries\Win64\Lib\site-packages\tensorflow\python\ops\variable_scope.py", line 861, in _get_single_variable
name, "".join(traceback.format_list(tb))))
LogPython: Error: ValueError: Variable pi/dense/kernel already exists, disallowed. Did you mean to set reuse=True or reuse=tf.AUTO_REUSE in VarScope? Originally defined at:
File "F:\ProjectFile\UnrealProject\AItest\Plugins\UnrealEnginePython\Binaries\Win64\Lib\site-packages\tensorflow\python\framework\ops.py", line 1770, in __init__
self._traceback = tf_stack.extract_stack()
File "F:\ProjectFile\UnrealProject\AItest\Plugins\UnrealEnginePython\Binaries\Win64\Lib\site-packages\tensorflow\python\framework\ops.py", line 3274, in create_op
op_def=op_def)
File "F:\ProjectFile\UnrealProject\AItest\Plugins\UnrealEnginePython\Binaries\Win64\Lib\site-packages\tensorflow\python\util\deprecation.py", line 488, in new_func
return func(*args, **kwargs)

I have seen this "PyUObject is in invalid state" error mentioned in the original repository, but I don't really understand it. I'll post the link below.

https://github.com/20tab/UnrealEnginePython/blob/master/docs/MemoryManagement.md

@Uperstream

I have fixed the error that appeared with multithreading turned off. Everything works when multithreading is off.


4 participants