run C++ inference example infer_water, got ALREADY_EXISTS: Op with name Gelu error #2223

Closed
yiqichenshallwetalk opened this issue Jan 6, 2023 · 5 comments

@yiqichenshallwetalk

Bug summary

The DeePMD-kit C++ interface was built successfully by following the instructions at https://docs.deepmodeling.com/projects/deepmd/en/master/install/install-from-source.html.

However, running the inference example from https://docs.deepmodeling.com/projects/deepmd/en/master/inference/cxx.html raised an error:
2023-01-06 18:43:03.150927: F tensorflow/core/framework/op.cc:215] Non-OK-status: RegisterAlreadyLocked(deferred_[i]) status: ALREADY_EXISTS: Op with name Gelu
Aborted (core dumped)

It seems like there are conflicts between the custom Gelu and TF's default Gelu.

DeePMD-kit Version

v2.2.0b1.dev0+g89d0d23.d20230106

TensorFlow Version

2.8.0

How did you download the software?

Built from source

Input Files, Running Commands, Error Log, etc.

infer_water.cpp compiles successfully with:
gcc infer_water.cpp -L $deepmd_root/lib -L $tensorflow_root/lib -I $deepmd_root/include -Wl,--no-as-needed -ldeepmd_cc -lstdc++ -ltensorflow_cc -Wl,-rpath=$deepmd_root/lib -Wl,-rpath=$tensorflow_root/lib -o infer_water

However, running ./infer_water produced a TF error:
F tensorflow/core/framework/op.cc:215] Non-OK-status: RegisterAlreadyLocked(deferred_[i]) status: ALREADY_EXISTS: Op with name Gelu
Aborted (core dumped)

Steps to Reproduce

  1. Install the TensorFlow and DeePMD-kit C++ interfaces.
  2. Compile infer_water.cpp using gcc.
  3. Run the compiled program.

Further Information, Files, and Links

No response

@njzjz
Member

njzjz commented Jan 6, 2023

> It seems like there are conflicts between the custom Gelu and TF's default Gelu.

TF does not have a Gelu OP. Did you use modified TF code?

@yiqichenshallwetalk
Author

Yes. I was using a ROCm-enhanced version of TF, which has a Gelu OP.

@cugbls

cugbls commented May 23, 2023

I have the same problem with DeePMD-kit and the ROCm version of TensorFlow. How can I solve it?

@yiqichenshallwetalk
Author

Skipping the custom Gelu op works for me. Do not include gelu_multi_device.cc in source/op/CMakeLists and it should work.
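
A minimal sketch of that workaround as a shell helper. The function name `drop_gelu_source` is hypothetical, and the sketch assumes that gelu_multi_device.cc appears by name in a source list inside the CMake file; the actual layout of that file is not shown in this thread, so check the result before rebuilding.

```shell
#!/bin/sh
# Hypothetical helper (not part of DeePMD-kit): remove gelu_multi_device.cc
# from a CMake source list so the custom Gelu op is no longer compiled.
# Assumption: the file name appears literally in the list, for example
# inside a file(GLOB OP_SRC ...) call.
drop_gelu_source() {
    cmake_file="$1"
    if [ ! -f "$cmake_file" ]; then
        echo "no such file: $cmake_file" >&2
        return 1
    fi
    if ! grep -q 'gelu_multi_device\.cc' "$cmake_file"; then
        echo "gelu_multi_device.cc is not listed in $cmake_file" >&2
        return 1
    fi
    # -i.bak edits in place and keeps the original as <file>.bak
    sed -i.bak 's/gelu_multi_device\.cc//g' "$cmake_file"
    echo "removed gelu_multi_device.cc from $cmake_file"
}
```

It could be called as `drop_gelu_source source/op/CMakeLists.txt` from the DeePMD-kit source root (assuming the file carries the usual `.txt` suffix), followed by re-running CMake and rebuilding so the op library no longer registers its own Gelu.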

@cugbls

cugbls commented May 24, 2023

> Skipping the custom Gelu op works for me. Do not include gelu_multi_device.cc in source/op/CMakeLists and it should work.

Thanks a lot! It works!

But after solving this, I hit a new problem when running LAMMPS. There is an error:
Not found: No attr named 'dtype' in NodeDef:
[[{{node filter_type_0/Cast/x}}]]
[[filter_type_0/Cast/x]]
ERROR: DeePMD-kit Error: TensorFlow Error: Not found: No attr named 'dtype' in NodeDef:
[[{node filter_type_0/Cast/x}]]
[[filter_type_0/Cast/x]] (../pair_deepmd.cpp:933)
Last command: pair_style deepmd Sn.pb

Maybe something is still wrong.
BTW, thank you so much!
