This repository has been archived by the owner on Nov 13, 2024. It is now read-only.

Error: OOM when allocating tensor of shape PLEASE HELP! #5743

Open
ismailyugirov opened this issue Nov 2, 2023 · 0 comments

Hi. I keep getting this error when training with Train SAEHD. Lowering the resolution or the batch size does not help; the error still occurs. How can I solve this? Please help.

My system:

11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz
RTX 3060
16GB RAM

Initializing models: 80%|##################################################4 | 4/5 [00:11<00:02, 2.94s/it]
Error: OOM when allocating tensor of shape [163840,300] and type float
[[node src_dst_opt/vs_inter_B/dense1/weight_0/Initializer/Const (defined at C:\Users\Ersin\Desktop\DeepFaceLab\DeepFaceLab_NVIDIA_RTX3000_series_build_11_20_2021\DeepFaceLab_NVIDIA_RTX3000_series_internal\DeepFaceLab\core\leras\optimizers\AdaBelief.py:38) ]]

Original stack trace for 'src_dst_opt/vs_inter_B/dense1/weight_0/Initializer/Const':
File "threading.py", line 884, in _bootstrap
File "threading.py", line 916, in _bootstrap_inner
File "threading.py", line 864, in run
File "C:\Users\Ersin\Desktop\DeepFaceLab\DeepFaceLab_NVIDIA_RTX3000_series_build_11_20_2021\DeepFaceLab_NVIDIA_RTX3000_series_internal\DeepFaceLab\mainscripts\Trainer.py", line 58, in trainerThread
debug=debug)
File "C:\Users\Ersin\Desktop\DeepFaceLab\DeepFaceLab_NVIDIA_RTX3000_series_build_11_20_2021\DeepFaceLab_NVIDIA_RTX3000_series_internal\DeepFaceLab\models\ModelBase.py", line 193, in __init__
self.on_initialize()
File "C:\Users\Ersin\Desktop\DeepFaceLab\DeepFaceLab_NVIDIA_RTX3000_series_build_11_20_2021\DeepFaceLab_NVIDIA_RTX3000_series_internal\DeepFaceLab\models\Model_SAEHD\Model.py", line 341, in on_initialize
self.src_dst_opt.initialize_variables (self.src_dst_saveable_weights, vars_on_cpu=optimizer_vars_on_cpu, lr_dropout_on_cpu=self.options['lr_dropout']=='cpu')
File "C:\Users\Ersin\Desktop\DeepFaceLab\DeepFaceLab_NVIDIA_RTX3000_series_build_11_20_2021\DeepFaceLab_NVIDIA_RTX3000_series_internal\DeepFaceLab\core\leras\optimizers\AdaBelief.py", line 38, in initialize_variables
vs = { v.name : tf.get_variable ( f'vs_{v.name}'.replace(':','_'), v.shape, dtype=v.dtype, initializer=tf.initializers.constant(0.0), trainable=False) for v in trainable_weights }
File "C:\Users\Ersin\Desktop\DeepFaceLab\DeepFaceLab_NVIDIA_RTX3000_series_build_11_20_2021\DeepFaceLab_NVIDIA_RTX3000_series_internal\DeepFaceLab\core\leras\optimizers\AdaBelief.py", line 38, in <dictcomp>
vs = { v.name : tf.get_variable ( f'vs_{v.name}'.replace(':','_'), v.shape, dtype=v.dtype, initializer=tf.initializers.constant(0.0), trainable=False) for v in trainable_weights }
File "C:\Users\Ersin\Desktop\DeepFaceLab\DeepFaceLab_NVIDIA_RTX3000_series_build_11_20_2021\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\ops\variable_scope.py", line 1595, in get_variable
aggregation=aggregation)
File "C:\Users\Ersin\Desktop\DeepFaceLab\DeepFaceLab_NVIDIA_RTX3000_series_build_11_20_2021\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\ops\variable_scope.py", line 1338, in get_variable
aggregation=aggregation)
File "C:\Users\Ersin\Desktop\DeepFaceLab\DeepFaceLab_NVIDIA_RTX3000_series_build_11_20_2021\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\ops\variable_scope.py", line 593, in get_variable
aggregation=aggregation)
File "C:\Users\Ersin\Desktop\DeepFaceLab\DeepFaceLab_NVIDIA_RTX3000_series_build_11_20_2021\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\ops\variable_scope.py", line 545, in _true_getter
aggregation=aggregation)
File "C:\Users\Ersin\Desktop\DeepFaceLab\DeepFaceLab_NVIDIA_RTX3000_series_build_11_20_2021\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\ops\variable_scope.py", line 963, in _get_single_variable
aggregation=aggregation)
File "C:\Users\Ersin\Desktop\DeepFaceLab\DeepFaceLab_NVIDIA_RTX3000_series_build_11_20_2021\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\ops\variables.py", line 266, in __call__
return cls._variable_v1_call(*args, **kwargs)
File "C:\Users\Ersin\Desktop\DeepFaceLab\DeepFaceLab_NVIDIA_RTX3000_series_build_11_20_2021\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\ops\variables.py", line 227, in _variable_v1_call
shape=shape)
File "C:\Users\Ersin\Desktop\DeepFaceLab\DeepFaceLab_NVIDIA_RTX3000_series_build_11_20_2021\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\ops\variables.py", line 205, in <lambda>
previous_getter = lambda **kwargs: default_variable_creator(None, **kwargs)
File "C:\Users\Ersin\Desktop\DeepFaceLab\DeepFaceLab_NVIDIA_RTX3000_series_build_11_20_2021\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\ops\variable_scope.py", line 2642, in default_variable_creator
shape=shape)
File "C:\Users\Ersin\Desktop\DeepFaceLab\DeepFaceLab_NVIDIA_RTX3000_series_build_11_20_2021\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\ops\variables.py", line 270, in __call__
return super(VariableMetaclass, cls).__call__(*args, **kwargs)
File "C:\Users\Ersin\Desktop\DeepFaceLab\DeepFaceLab_NVIDIA_RTX3000_series_build_11_20_2021\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\ops\variables.py", line 1670, in __init__
shape=shape)
File "C:\Users\Ersin\Desktop\DeepFaceLab\DeepFaceLab_NVIDIA_RTX3000_series_build_11_20_2021\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\ops\variables.py", line 1799, in _init_from_args
initial_value = initial_value()
File "C:\Users\Ersin\Desktop\DeepFaceLab\DeepFaceLab_NVIDIA_RTX3000_series_build_11_20_2021\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\ops\init_ops.py", line 230, in call
self.value, dtype=dtype, shape=shape, verify_shape=verify_shape)
File "C:\Users\Ersin\Desktop\DeepFaceLab\DeepFaceLab_NVIDIA_RTX3000_series_build_11_20_2021\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\framework\constant_op.py", line 171, in constant_v1
allow_broadcast=False)
File "C:\Users\Ersin\Desktop\DeepFaceLab\DeepFaceLab_NVIDIA_RTX3000_series_build_11_20_2021\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\framework\constant_op.py", line 294, in _constant_impl
"Const", [], [dtype_value.type], attrs=attrs, name=name).outputs[0]
File "C:\Users\Ersin\Desktop\DeepFaceLab\DeepFaceLab_NVIDIA_RTX3000_series_build_11_20_2021\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\framework\ops.py", line 3569, in _create_op_internal
op_def=op_def)
File "C:\Users\Ersin\Desktop\DeepFaceLab\DeepFaceLab_NVIDIA_RTX3000_series_build_11_20_2021\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\framework\ops.py", line 2045, in __init__
self._traceback = tf_stack.extract_stack_for_node(self._c_op)

Traceback (most recent call last):
File "C:\Users\Ersin\Desktop\DeepFaceLab\DeepFaceLab_NVIDIA_RTX3000_series_build_11_20_2021\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\client\session.py", line 1375, in _do_call
return fn(*args)
File "C:\Users\Ersin\Desktop\DeepFaceLab\DeepFaceLab_NVIDIA_RTX3000_series_build_11_20_2021\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\client\session.py", line 1360, in _run_fn
target_list, run_metadata)
File "C:\Users\Ersin\Desktop\DeepFaceLab\DeepFaceLab_NVIDIA_RTX3000_series_build_11_20_2021\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\client\session.py", line 1453, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor of shape [163840,300] and type float
[[{{node src_dst_opt/vs_inter_B/dense1/weight_0/Initializer/Const}}]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "C:\Users\Ersin\Desktop\DeepFaceLab\DeepFaceLab_NVIDIA_RTX3000_series_build_11_20_2021\DeepFaceLab_NVIDIA_RTX3000_series_internal\DeepFaceLab\mainscripts\Trainer.py", line 58, in trainerThread
debug=debug)
File "C:\Users\Ersin\Desktop\DeepFaceLab\DeepFaceLab_NVIDIA_RTX3000_series_build_11_20_2021\DeepFaceLab_NVIDIA_RTX3000_series_internal\DeepFaceLab\models\ModelBase.py", line 193, in __init__
self.on_initialize()
File "C:\Users\Ersin\Desktop\DeepFaceLab\DeepFaceLab_NVIDIA_RTX3000_series_build_11_20_2021\DeepFaceLab_NVIDIA_RTX3000_series_internal\DeepFaceLab\models\Model_SAEHD\Model.py", line 657, in on_initialize
model.init_weights()
File "C:\Users\Ersin\Desktop\DeepFaceLab\DeepFaceLab_NVIDIA_RTX3000_series_build_11_20_2021\DeepFaceLab_NVIDIA_RTX3000_series_internal\DeepFaceLab\core\leras\layers\Saveable.py", line 106, in init_weights
nn.init_weights(self.get_weights())
File "C:\Users\Ersin\Desktop\DeepFaceLab\DeepFaceLab_NVIDIA_RTX3000_series_build_11_20_2021\DeepFaceLab_NVIDIA_RTX3000_series_internal\DeepFaceLab\core\leras\ops\__init__.py", line 48, in init_weights
nn.tf_sess.run (ops)
File "C:\Users\Ersin\Desktop\DeepFaceLab\DeepFaceLab_NVIDIA_RTX3000_series_build_11_20_2021\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\client\session.py", line 968, in run
run_metadata_ptr)
File "C:\Users\Ersin\Desktop\DeepFaceLab\DeepFaceLab_NVIDIA_RTX3000_series_build_11_20_2021\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\client\session.py", line 1191, in _run
feed_dict_tensor, options, run_metadata)
File "C:\Users\Ersin\Desktop\DeepFaceLab\DeepFaceLab_NVIDIA_RTX3000_series_build_11_20_2021\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\client\session.py", line 1369, in _do_run
run_metadata)
File "C:\Users\Ersin\Desktop\DeepFaceLab\DeepFaceLab_NVIDIA_RTX3000_series_build_11_20_2021\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\client\session.py", line 1394, in _do_call
raise type(e)(node_def, op, message) # pylint: disable=no-value-for-parameter
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor of shape [163840,300] and type float
[[node src_dst_opt/vs_inter_B/dense1/weight_0/Initializer/Const (defined at C:\Users\Ersin\Desktop\DeepFaceLab\DeepFaceLab_NVIDIA_RTX3000_series_build_11_20_2021\DeepFaceLab_NVIDIA_RTX3000_series_internal\DeepFaceLab\core\leras\optimizers\AdaBelief.py:38) ]]

Original stack trace for 'src_dst_opt/vs_inter_B/dense1/weight_0/Initializer/Const':
File "threading.py", line 884, in _bootstrap
File "threading.py", line 916, in _bootstrap_inner
File "threading.py", line 864, in run
File "C:\Users\Ersin\Desktop\DeepFaceLab\DeepFaceLab_NVIDIA_RTX3000_series_build_11_20_2021\DeepFaceLab_NVIDIA_RTX3000_series_internal\DeepFaceLab\mainscripts\Trainer.py", line 58, in trainerThread
debug=debug)
File "C:\Users\Ersin\Desktop\DeepFaceLab\DeepFaceLab_NVIDIA_RTX3000_series_build_11_20_2021\DeepFaceLab_NVIDIA_RTX3000_series_internal\DeepFaceLab\models\ModelBase.py", line 193, in __init__
self.on_initialize()
File "C:\Users\Ersin\Desktop\DeepFaceLab\DeepFaceLab_NVIDIA_RTX3000_series_build_11_20_2021\DeepFaceLab_NVIDIA_RTX3000_series_internal\DeepFaceLab\models\Model_SAEHD\Model.py", line 341, in on_initialize
self.src_dst_opt.initialize_variables (self.src_dst_saveable_weights, vars_on_cpu=optimizer_vars_on_cpu, lr_dropout_on_cpu=self.options['lr_dropout']=='cpu')
File "C:\Users\Ersin\Desktop\DeepFaceLab\DeepFaceLab_NVIDIA_RTX3000_series_build_11_20_2021\DeepFaceLab_NVIDIA_RTX3000_series_internal\DeepFaceLab\core\leras\optimizers\AdaBelief.py", line 38, in initialize_variables
vs = { v.name : tf.get_variable ( f'vs_{v.name}'.replace(':','_'), v.shape, dtype=v.dtype, initializer=tf.initializers.constant(0.0), trainable=False) for v in trainable_weights }
File "C:\Users\Ersin\Desktop\DeepFaceLab\DeepFaceLab_NVIDIA_RTX3000_series_build_11_20_2021\DeepFaceLab_NVIDIA_RTX3000_series_internal\DeepFaceLab\core\leras\optimizers\AdaBelief.py", line 38, in <dictcomp>
vs = { v.name : tf.get_variable ( f'vs_{v.name}'.replace(':','_'), v.shape, dtype=v.dtype, initializer=tf.initializers.constant(0.0), trainable=False) for v in trainable_weights }
File "C:\Users\Ersin\Desktop\DeepFaceLab\DeepFaceLab_NVIDIA_RTX3000_series_build_11_20_2021\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\ops\variable_scope.py", line 1595, in get_variable
aggregation=aggregation)
File "C:\Users\Ersin\Desktop\DeepFaceLab\DeepFaceLab_NVIDIA_RTX3000_series_build_11_20_2021\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\ops\variable_scope.py", line 1338, in get_variable
aggregation=aggregation)
File "C:\Users\Ersin\Desktop\DeepFaceLab\DeepFaceLab_NVIDIA_RTX3000_series_build_11_20_2021\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\ops\variable_scope.py", line 593, in get_variable
aggregation=aggregation)
File "C:\Users\Ersin\Desktop\DeepFaceLab\DeepFaceLab_NVIDIA_RTX3000_series_build_11_20_2021\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\ops\variable_scope.py", line 545, in _true_getter
aggregation=aggregation)
File "C:\Users\Ersin\Desktop\DeepFaceLab\DeepFaceLab_NVIDIA_RTX3000_series_build_11_20_2021\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\ops\variable_scope.py", line 963, in _get_single_variable
aggregation=aggregation)
File "C:\Users\Ersin\Desktop\DeepFaceLab\DeepFaceLab_NVIDIA_RTX3000_series_build_11_20_2021\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\ops\variables.py", line 266, in __call__
return cls._variable_v1_call(*args, **kwargs)
File "C:\Users\Ersin\Desktop\DeepFaceLab\DeepFaceLab_NVIDIA_RTX3000_series_build_11_20_2021\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\ops\variables.py", line 227, in _variable_v1_call
shape=shape)
File "C:\Users\Ersin\Desktop\DeepFaceLab\DeepFaceLab_NVIDIA_RTX3000_series_build_11_20_2021\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\ops\variables.py", line 205, in <lambda>
previous_getter = lambda **kwargs: default_variable_creator(None, **kwargs)
File "C:\Users\Ersin\Desktop\DeepFaceLab\DeepFaceLab_NVIDIA_RTX3000_series_build_11_20_2021\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\ops\variable_scope.py", line 2642, in default_variable_creator
shape=shape)
File "C:\Users\Ersin\Desktop\DeepFaceLab\DeepFaceLab_NVIDIA_RTX3000_series_build_11_20_2021\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\ops\variables.py", line 270, in __call__
return super(VariableMetaclass, cls).__call__(*args, **kwargs)
File "C:\Users\Ersin\Desktop\DeepFaceLab\DeepFaceLab_NVIDIA_RTX3000_series_build_11_20_2021\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\ops\variables.py", line 1670, in __init__
shape=shape)
File "C:\Users\Ersin\Desktop\DeepFaceLab\DeepFaceLab_NVIDIA_RTX3000_series_build_11_20_2021\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\ops\variables.py", line 1799, in _init_from_args
initial_value = initial_value()
File "C:\Users\Ersin\Desktop\DeepFaceLab\DeepFaceLab_NVIDIA_RTX3000_series_build_11_20_2021\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\ops\init_ops.py", line 230, in call
self.value, dtype=dtype, shape=shape, verify_shape=verify_shape)
File "C:\Users\Ersin\Desktop\DeepFaceLab\DeepFaceLab_NVIDIA_RTX3000_series_build_11_20_2021\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\framework\constant_op.py", line 171, in constant_v1
allow_broadcast=False)
File "C:\Users\Ersin\Desktop\DeepFaceLab\DeepFaceLab_NVIDIA_RTX3000_series_build_11_20_2021\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\framework\constant_op.py", line 294, in _constant_impl
"Const", [], [dtype_value.type], attrs=attrs, name=name).outputs[0]
File "C:\Users\Ersin\Desktop\DeepFaceLab\DeepFaceLab_NVIDIA_RTX3000_series_build_11_20_2021\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\framework\ops.py", line 3569, in _create_op_internal
op_def=op_def)
File "C:\Users\Ersin\Desktop\DeepFaceLab\DeepFaceLab_NVIDIA_RTX3000_series_build_11_20_2021\DeepFaceLab_NVIDIA_RTX3000_series_internal\python-3.6.8\lib\site-packages\tensorflow\python\framework\ops.py", line 2045, in __init__
self._traceback = tf_stack.extract_stack_for_node(self._c_op)
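
For scale, here is a rough back-of-the-envelope check of how much memory the tensor named in the error needs. It is only a sketch: it assumes "type float" means 32-bit floats (4 bytes per element), and the helper names `tensor_bytes` and `to_mib` are made up for illustration; they are not part of DeepFaceLab or TensorFlow.

```python
from math import prod

# Assumption: the "float" in the OOM message is a 32-bit float (4 bytes).
BYTES_PER_FLOAT32 = 4


def tensor_bytes(shape, bytes_per_element=BYTES_PER_FLOAT32):
    """Memory needed for one dense tensor of the given shape, in bytes."""
    return prod(shape) * bytes_per_element


def to_mib(num_bytes):
    """Convert a byte count to mebibytes (1 MiB = 1024 * 1024 bytes)."""
    return num_bytes / (1024 ** 2)


# Shape copied from the error message above.
shape = (163840, 300)
size = tensor_bytes(shape)
print(f"{size} bytes = {to_mib(size):.1f} MiB")  # 196608000 bytes = 187.5 MiB
```

The failing allocation is fairly small on its own, so the error probably means that memory was already nearly full by the time this optimizer variable was created, rather than that this single tensor is too large.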
