Using rdrop shows a tf.split error. Could someone help take a look? #112

Open
EdwardChan5000 opened this issue Sep 22, 2022 · 1 comment

Comments

@EdwardChan5000

code from https://github.com/EdwardChan5000/m3tl_run

The error is shown below:

2022-09-21 16:17:06.321 | INFO | m3tl.utils:set_phase:478 - Setting phase to infer
2022-09-21 16:17:06.345 | CRITICAL | m3tl.model_fn:compile:271 - Initial lr: 2e-05
2022-09-21 16:17:06.345 | CRITICAL | m3tl.model_fn:compile:272 - Train steps: 408675
2022-09-21 16:17:06.345 | CRITICAL | m3tl.model_fn:compile:273 - Warmup steps: 40867
2022-09-21 16:17:06.361554: I tensorflow/core/profiler/lib/profiler_session.cc:136] Profiler session initializing.
2022-09-21 16:17:06.361588: I tensorflow/core/profiler/lib/profiler_session.cc:155] Profiler session started.
2022-09-21 16:17:06.361613: I tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1365] Profiler found 1 GPUs
2022-09-21 16:17:06.369724: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcupti.so.11.0
2022-09-21 16:17:06.655982: I tensorflow/core/profiler/lib/profiler_session.cc:172] Profiler session tear down.
2022-09-21 16:17:06.656157: I tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1487] CUPTI activity buffer flushed
2022-09-21 16:17:07.017 | INFO | m3tl.utils:set_phase:478 - Setting phase to train
WARNING:tensorflow:The parameters output_attentions, output_hidden_states and use_cache cannot be updated when calling a model.They have to be set to True/False in the config object (i.e.: config=XConfig.from_pretrained('name', output_attentions=True)).
WARNING:tensorflow:The parameter return_dict cannot be set in graph mode and will always be set to True.
2022-09-21 16:17:19.175 | INFO | m3tl.utils:set_phase:478 - Setting phase to train
Traceback (most recent call last):
File "m3tl_4room_rdrop.py", line 195, in <module>
main(args)
File "m3tl_4room_rdrop.py", line 149, in main
create_tf_record_only=False, model_dir=model_dir, mirrored_strategy=mirrored_strategy)
File "/usr/local/lib/python3.6/site-packages/m3tl/run_bert_multitask.py", line 319, in train_bert_multitask
verbose=verbose
File "/usr/local/lib/python3.6/site-packages/m3tl/run_bert_multitask.py", line 163, in _train_bert_multitask_keras_model
validation_steps=validation_steps
File "/usr/local/lib64/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 1100, in fit
tmp_logs = self.train_function(iterator)
File "/usr/local/lib64/python3.6/site-packages/tensorflow/python/eager/def_function.py", line 828, in __call__
result = self._call(*args, **kwds)
File "/usr/local/lib64/python3.6/site-packages/tensorflow/python/eager/def_function.py", line 855, in _call
return self._stateless_fn(*args, **kwds) # pylint: disable=not-callable
File "/usr/local/lib64/python3.6/site-packages/tensorflow/python/eager/function.py", line 2943, in __call__
filtered_flat_args, captured_inputs=graph_function.captured_inputs) # pylint: disable=protected-access
File "/usr/local/lib64/python3.6/site-packages/tensorflow/python/eager/function.py", line 1919, in _call_flat
ctx, args, cancellation_manager=cancellation_manager))
File "/usr/local/lib64/python3.6/site-packages/tensorflow/python/eager/function.py", line 560, in call
ctx=ctx)
File "/usr/local/lib64/python3.6/site-packages/tensorflow/python/eager/execute.py", line 60, in quick_execute
inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
(0) Invalid argument: Number of ways to split should evenly divide the split dimension, but got split_dim 0 (size = 7) and num_split 2
[[node BertMultiTask/BertMultiTaskTop/rdrop_preprocess/rdrop_preprocess/split_1 (defined at data/yard/workspace/vega/m3tl_run/custom_top.py:250) ]]
(1) Invalid argument: Number of ways to split should evenly divide the split dimension, but got split_dim 0 (size = 7) and num_split 2
[[node BertMultiTask/BertMultiTaskTop/rdrop_preprocess/rdrop_preprocess/split_1 (defined at data/yard/workspace/vega/m3tl_run/custom_top.py:250) ]]
[[BertMultiTask/BertMultiTaskTop/rdrop_preprocess/rdrop_preprocess/split_1/_44]]
0 successful operations.
0 derived errors ignored. [Op:__inference_train_function_423171]

Errors may have originated from an input operation.
Input Source operations connected to node BertMultiTask/BertMultiTaskTop/rdrop_preprocess/rdrop_preprocess/split_1:
BertMultiTask/basic_mtl/GatherNd (defined at usr/local/lib/python3.6/site-packages/m3tl/utils.py:412)

Input Source operations connected to node BertMultiTask/BertMultiTaskTop/rdrop_preprocess/rdrop_preprocess/split_1:
BertMultiTask/basic_mtl/GatherNd (defined at usr/local/lib/python3.6/site-packages/m3tl/utils.py:412)

Function call stack:
train_function -> train_function
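
For readers following the traceback: the failing op is an even split along dimension 0, and the tensor it received had 7 rows while num_split was 2. The log shows the input coming from a GatherNd op, so the row count presumably varies from batch to batch, which would explain why the failure only appears on some steps. Below is a minimal, framework-free sketch of the same divisibility rule. `split_evenly` is a hypothetical helper written for illustration; it is not part of m3tl or TensorFlow, and it only mimics the check, not the real op.

```python
def split_evenly(rows, num_split):
    """Split `rows` into `num_split` equal chunks along the first dimension.

    Hypothetical stand-in for an even split: it refuses any input whose
    length is not a multiple of `num_split`.
    """
    size = len(rows)
    if size % num_split != 0:
        raise ValueError(
            "Number of ways to split should evenly divide the split "
            f"dimension, but got size = {size} and num_split {num_split}"
        )
    chunk = size // num_split
    return [rows[i * chunk:(i + 1) * chunk] for i in range(num_split)]


# An even number of rows splits cleanly into two halves.
first_half, second_half = split_evenly(list(range(8)), 2)

# An odd number of rows (7, as in the log) is rejected.
try:
    split_evenly(list(range(7)), 2)
    odd_batch_failed = False
except ValueError:
    odd_batch_failed = True
```

With 8 rows the two halves line up; with 7 rows the call is rejected, which matches the shape reported in the log (size = 7, num_split 2).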

@JayYip
Owner

JayYip commented Oct 11, 2022 via email
