Using rdrop raises a tf.split error. Could someone please take a look? #112

EdwardChan5000 · 2022-09-22T12:35:01Z

The code is available at https://github.com/EdwardChan5000/m3tl_run

The error is shown below:

2022-09-21 16:17:06.321 | INFO | m3tl.utils:set_phase:478 - Setting phase to infer
2022-09-21 16:17:06.345 | CRITICAL | m3tl.model_fn:compile:271 - Initial lr: 2e-05
2022-09-21 16:17:06.345 | CRITICAL | m3tl.model_fn:compile:272 - Train steps: 408675
2022-09-21 16:17:06.345 | CRITICAL | m3tl.model_fn:compile:273 - Warmup steps: 40867
2022-09-21 16:17:06.361554: I tensorflow/core/profiler/lib/profiler_session.cc:136] Profiler session initializing.
2022-09-21 16:17:06.361588: I tensorflow/core/profiler/lib/profiler_session.cc:155] Profiler session started.
2022-09-21 16:17:06.361613: I tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1365] Profiler found 1 GPUs
2022-09-21 16:17:06.369724: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcupti.so.11.0
2022-09-21 16:17:06.655982: I tensorflow/core/profiler/lib/profiler_session.cc:172] Profiler session tear down.
2022-09-21 16:17:06.656157: I tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1487] CUPTI activity buffer flushed
2022-09-21 16:17:07.017 | INFO | m3tl.utils:set_phase:478 - Setting phase to train
WARNING:tensorflow:The parameters output_attentions, output_hidden_states and use_cache cannot be updated when calling a model.They have to be set to True/False in the config object (i.e.: config=XConfig.from_pretrained('name', output_attentions=True)).
WARNING:tensorflow:The parameter return_dict cannot be set in graph mode and will always be set to True.
2022-09-21 16:17:19.175 | INFO | m3tl.utils:set_phase:478 - Setting phase to train
Traceback (most recent call last):
  File "m3tl_4room_rdrop.py", line 195, in <module>
    main(args)
  File "m3tl_4room_rdrop.py", line 149, in main
    create_tf_record_only=False, model_dir=model_dir, mirrored_strategy=mirrored_strategy)
  File "/usr/local/lib/python3.6/site-packages/m3tl/run_bert_multitask.py", line 319, in train_bert_multitask
    verbose=verbose
  File "/usr/local/lib/python3.6/site-packages/m3tl/run_bert_multitask.py", line 163, in _train_bert_multitask_keras_model
    validation_steps=validation_steps
  File "/usr/local/lib64/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 1100, in fit
    tmp_logs = self.train_function(iterator)
  File "/usr/local/lib64/python3.6/site-packages/tensorflow/python/eager/def_function.py", line 828, in __call__
    result = self._call(*args, **kwds)
  File "/usr/local/lib64/python3.6/site-packages/tensorflow/python/eager/def_function.py", line 855, in _call
    return self._stateless_fn(*args, **kwds) # pylint: disable=not-callable
  File "/usr/local/lib64/python3.6/site-packages/tensorflow/python/eager/function.py", line 2943, in __call__
    filtered_flat_args, captured_inputs=graph_function.captured_inputs) # pylint: disable=protected-access
  File "/usr/local/lib64/python3.6/site-packages/tensorflow/python/eager/function.py", line 1919, in _call_flat
    ctx, args, cancellation_manager=cancellation_manager))
  File "/usr/local/lib64/python3.6/site-packages/tensorflow/python/eager/function.py", line 560, in call
    ctx=ctx)
  File "/usr/local/lib64/python3.6/site-packages/tensorflow/python/eager/execute.py", line 60, in quick_execute
    inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
(0) Invalid argument: Number of ways to split should evenly divide the split dimension, but got split_dim 0 (size = 7) and num_split 2
[[node BertMultiTask/BertMultiTaskTop/rdrop_preprocess/rdrop_preprocess/split_1 (defined at data/yard/workspace/vega/m3tl_run/custom_top.py:250) ]]
(1) Invalid argument: Number of ways to split should evenly divide the split dimension, but got split_dim 0 (size = 7) and num_split 2
[[node BertMultiTask/BertMultiTaskTop/rdrop_preprocess/rdrop_preprocess/split_1 (defined at data/yard/workspace/vega/m3tl_run/custom_top.py:250) ]]
[[BertMultiTask/BertMultiTaskTop/rdrop_preprocess/rdrop_preprocess/split_1/_44]]
0 successful operations.
0 derived errors ignored. [Op:__inference_train_function_423171]

Errors may have originated from an input operation.
Input Source operations connected to node BertMultiTask/BertMultiTaskTop/rdrop_preprocess/rdrop_preprocess/split_1:
BertMultiTask/basic_mtl/GatherNd (defined at usr/local/lib/python3.6/site-packages/m3tl/utils.py:412)

Input Source operations connected to node BertMultiTask/BertMultiTaskTop/rdrop_preprocess/rdrop_preprocess/split_1:
BertMultiTask/basic_mtl/GatherNd (defined at usr/local/lib/python3.6/site-packages/m3tl/utils.py:412)

Function call stack:
train_function -> train_function
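
The root error means `tf.split` was asked to cut axis 0, which had 7 rows, into 2 equal pieces; with an integer `num_or_size_splits`, `tf.split` requires that axis to be divisible by that number. R-Drop-style training usually passes each example through the model twice and then splits the stacked outputs back into two halves, so every tensor reaching that split needs an even number of rows. The failing node's input comes from `BertMultiTask/basic_mtl/GatherNd`, so one plausible cause (an assumption, not confirmed in this thread) is a per-task gather applied to rows that were never duplicated. The pure-Python sketch below only mimics that shape check; `split_in_two`, `rdrop_duplicate` and `gather_task_rows` are hypothetical helpers, not m3tl or TensorFlow APIs.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_in_two(rows: Sequence[T]) -> Tuple[List[T], List[T]]:
    """Mimic the shape check of tf.split(x, 2, axis=0).

    Raises ValueError (TF raises InvalidArgumentError) when the number
    of rows is odd, using the same wording as the log above.
    """
    size = len(rows)
    if size % 2 != 0:
        raise ValueError(
            "Number of ways to split should evenly divide the split dimension, "
            f"but got split_dim 0 (size = {size}) and num_split 2"
        )
    half = size // 2
    return list(rows[:half]), list(rows[half:])


def rdrop_duplicate(rows: Sequence[T]) -> List[T]:
    """Hypothetical R-Drop preprocessing: stack the batch on itself so
    row i and row i + n are two dropout passes over the same example."""
    return list(rows) + list(rows)


def gather_task_rows(rows: Sequence[T], task_ids: Sequence[int], task: int) -> List[T]:
    """Hypothetical stand-in for a GatherNd that keeps one task's rows."""
    return [row for row, t in zip(rows, task_ids) if t == task]


if __name__ == "__main__":
    batch = [f"ex{i}" for i in range(9)]
    task_ids = [0, 1, 0, 0, 1, 0, 0, 0, 0]  # 7 rows belong to task 0

    # Gathering task 0 from a batch that was NOT duplicated leaves 7 rows,
    # so the split fails with the same message as the traceback.
    try:
        split_in_two(gather_task_rows(batch, task_ids, 0))
    except ValueError as err:
        print("split failed:", err)

    # Duplicating rows and task ids first keeps the gathered slice even (14 rows).
    first, second = split_in_two(
        gather_task_rows(rdrop_duplicate(batch), rdrop_duplicate(task_ids), 0)
    )
    print("halves match:", first == second)
```

In this sketch, duplicating before any per-task filtering keeps each task's slice even, because every row that survives the filter survives it twice.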

JayYip · 2022-10-11T07:13:57Z

OK, I'll take a look next week.
