Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

v1.07 no longer seems to work in multiple threads #39

Open
greglandrum opened this issue Aug 5, 2024 · 28 comments
Open

v1.07 no longer seems to work in multiple threads #39

greglandrum opened this issue Aug 5, 2024 · 28 comments
Assignees
Labels
bug Something isn't working

Comments

@greglandrum
Copy link

greglandrum commented Aug 5, 2024

I just tried to get v1.07 working within the RDKit and immediately ran into the problem that our tests which attempt to use the InChI code from multiple threads simultaneously now fail.

The failure mode is that I start getting bad values (empty InChIs) or a seg fault at some point during execution. This does not happen if I run the same test with the number of threads set to 1, and it still happens even if I call the RDKit's "MolBlockToInChI" function, which is a thin wrapper around MakeINCHIFromMolfileText(), so this doesn't seem to be something in the way we're using the InChI library. But, then again, it's multi-threaded stuff, so who knows?

Is there something extra that now needs to be done from client code in order to safely use the code from mulitple threads or is this a regression in the InChI code itself?

In case you're curious, the specific test that's failing is this one:
https://github.com/rdkit/rdkit/blob/master/External/INCHI-API/test.cpp#L65

@JanCBrammer
Copy link
Collaborator

JanCBrammer commented Aug 8, 2024

@greglandrum, thanks for the issue. Which InChI version was used to run those tests before / that didn't fail the multithreading? Looking at https://github.com/rdkit/rdkit/blob/3d972a9e2f9ef097a36c4cf09334e898c69f16c1/External/INCHI-API/download-inchi.sh#L55 it seems that v1.05 is the last known good version?

@greglandrum
Copy link
Author

@JanCBrammer Yes, we use v1.05 now.

@JanCBrammer
Copy link
Collaborator

JanCBrammer commented Aug 8, 2024

Any chance you could try with v1.06? I.e., we could start by bisecting versions before bisecting the commits from v1.06 to v1.07.

@greglandrum
Copy link
Author

greglandrum commented Aug 8, 2024

Any chance you could try with v1.06? I.e., we could start by bisecting versions before bisecting the commits from v1.06 to v1.07.

Sorry I mis-spoke: we are actually using v1.06, not v1.05

@JanCBrammer
Copy link
Collaborator

JanCBrammer commented Aug 8, 2024

we are actually using v1.06, not v1.05

Nice, then we'll bisect the commits between v1.06 and v1.07 using https://github.com/rdkit/rdkit/blob/3d972a9e2f9ef097a36c4cf09334e898c69f16c1/External/INCHI-API/test.cpp#L65. Keeping you in the loop.

@JanCBrammer
Copy link
Collaborator

@greglandrum, I could replicate the issue as follows.
When running RDBASE=$PWD/.. PYTHONPATH=$RDBASE LD_LIBRARY_PATH=$RDBASE/lib:$LD_LIBRARY_PATH ctest -R testInchi
with v1.07 (git checkout v1.07.0) I get (across multiple runs)

Test project /workspaces/rdkit/build
	Start 1: testInchi
1/1 Test #1: testInchi ........................Subprocess aborted***Exception:   4.94 sec

0% tests passed, 1 tests failed out of 1

Total Test time (real) =   4.95 sec

The following tests FAILED:
	  1 - testInchi (Subprocess aborted)
Errors while running CTest

or

Test project /workspaces/rdkit/build
	Start 1: testInchi
1/1 Test #1: testInchi ........................***Exception: SegFault  0.07 sec

0% tests passed, 1 tests failed out of 1

Total Test time (real) =   0.08 sec

The following tests FAILED:
	  1 - testInchi (SEGFAULT)
Errors while running CTest

When running the tests with v1.06 (git checkout v1.06) I get

Test project /workspaces/rdkit/build
	Start 1: testInchi
1/1 Test #1: testInchi ........................   Passed    5.06 sec

100% tests passed, 0 tests failed out of 1

Total Test time (real) =   5.07 sec

Therefore, we know that the last known "good" commit G is the one that's tagged with v1.06 and we know that commit B tagged with v1.07.0 is "bad". Now, I need to find the first commit B' between G and B that's bad. B' should contain the change that breaks the tests.

Following this rationale, using git bisect, I identified B' to be 66f42ac. I.e., running the tests after git checkout 66f42acbe9e29270a06e1c34307d12527c4ba399 results in

Test project /workspaces/rdkit/build
    Start 1: testInchi
1/1 Test #1: testInchi ........................Subprocess aborted***Exception:   0.10 sec

0% tests passed, 1 tests failed out of 1

Total Test time (real) =   0.11 sec

The following tests FAILED:
          1 - testInchi (Subprocess aborted)
Errors while running CTest

@djb-rwth, could you verify this?

@greglandrum
Copy link
Author

@JanCBrammer thanks for tracking the commit down.
It looks like that commit introduces some global variables. I'm going to guess those are what are causing the problem with thread safety.

@djb-rwth
Copy link
Collaborator

Hi @greglandrum,
Sorry for a slightly belated response.

@JanCBrammer's findings might be a good starting point in resolving this issue; however many parts of the code which belong to these past commits have been deleted and/or altered in the meantime.

When I first read this post, I thought that InChI multi-threading support in RDKit has been provided through libinchi API which had been compiled using Intel® oneAPI Threading Building Blocks, as a basic framework for OneTBB has been added to InChI since v.1.06; obviously, this is not the case according files in rdkit/External/INCHI-API/ folder as they use CMake to compile InChI from source.

Your assumption regarding the compromised thread safety related to use global variables makes sense, but I would still need to set an environment which can compile test.cpp. Please let me know if I should compile RDKit from source, and if the procedure is similar to compiling OpenCV from source.

This is essential for rewriting some parts of the InChI code, so they comply to reentrant and/or threadsafe rules.

Implementing parallel programming into InChI would be much better (and a personal aspiration), but it looks as if some other directions in development have bigger priority at the moment.

@JanCBrammer
Copy link
Collaborator

set an environment which can compile test.cpp. Please let me know if I should compile RDKit from source, and if the procedure is similar to compiling OpenCV from source.

@djb-rwth, see https://www.rdkit.org/docs/Install.html regarding RDKit compilation.

@greglandrum
Copy link
Author

Actually, this version is a bit more compact and, I think, up to date:
https://greglandrum.github.io/rdkit-blog/posts/2023-03-17-setting-up-a-cxx-dev-env2.html

@greglandrum
Copy link
Author

When I first read this post, I thought that InChI multi-threading support in RDKit has been provided through libinchi API which had been compiled using Intel® oneAPI Threading Building Blocks, as a basic framework for OneTBB has been added to InChI since v.1.06; obviously, this is not the case according files in rdkit/External/INCHI-API/ folder as they use CMake to compile InChI from source.

To clarify my perspective: The RDKit just needs to be able to call into the InChI code multiple times from different threads; this used to be possible. I don't need the InChI library to provide this and definitely have no interest in adding a dependency on Intel's TBB

Your assumption regarding the compromised thread safety related to use global variables makes sense, but I would still need to set an environment which can compile test.cpp. Please let me know if I should compile RDKit from source, and if the procedure is similar to compiling OpenCV from source.

This is essential for rewriting some parts of the InChI code, so they comply to reentrant and/or threadsafe rules.

Implementing parallel programming into InChI would be much better (and a personal aspiration), but it looks as if some other directions in development have bigger priority at the moment.

My two cents: I don't see much value in having a parallel version of the InChI code itself. I think the highest value the InChI team can provide is improvements to the efficiency and robustness of the existing library

@djb-rwth
Copy link
Collaborator

Actually, this version is a bit more compact and, I think, up to date: https://greglandrum.github.io/rdkit-blog/posts/2023-03-17-setting-up-a-cxx-dev-env2.html

Hi @greglandrum,
Thanks for the link.
My current priority is to get test.cpp compiled in Visual Studio 2022 on Windows so I could try to resolve the multi-threaded issue. Please let me know if the above mentioned installation is able to provide this.

Two notes:

  • I have given up implementing OneTBB and do not have any intentions of revisiting this unless required.
  • Introducing parallel programming into inChI is an idea which will not likely happen soon as there are too many issues to be resolved with the current version of the code.

@greglandrum
Copy link
Author

Hi @djb-rwth
here's an extremely minimal set of instructions for getting a build environment working for the InChI test on Windows. I'm assuming that you have some type of conda install (if not, install and configure miniforge here: https://github.com/conda-forge/miniforge)

First setup and activate a conda environment with the pre-requisites you need. I run these commands in a git bash shell:

conda create -n py310_rdkit_build -c conda-forge python=3.10 boost-cpp boost libboost libboost-devel cmake
conda activate py310_rdkit_build

Do the following in that same shell:

  • Go to the directory with the RDKit source.
  • create a build directory (I normally just call it build)
  • cd to the build directory
  • cmake -DRDK_BUILD_PYTHON_WRAPPERS=OFF -DRDK_BUILD_INCHI_SUPPORT=ON -DRDK_BUILD_FREETYPE_SUPPORT=OFF ../
  • That should run without errors (though it will spit out some warnings)
  • Now open the solution file it produced in Visual Studio and build the testInchi target (there's probably some way to do this from the command line, but I don't know what it is for Windows). you probably want to switch the build configuration to "Release" first
  • Back in your shell, set the RDBASE environment variable to point to the directory where you checked out the RDKit source.
  • Run the testInchI executable: ctest -V -R testInchi

I just tried this on my laptop and it worked... hopefully it will also work for you.

@djb-rwth
Copy link
Collaborator

Hi @djb-rwth here's an extremely minimal set of instructions for getting a build environment working for the InChI test on Windows.

Thanks so much for this @greglandrum. I will try to do this ASAP.

@giallu
Copy link
Contributor

giallu commented Aug 16, 2024

  • cd to the build directory
  • cmake -DRDK_BUILD_PYTHON_WRAPPERS=OFF -DRDK_BUILD_INCHI_SUPPORT=ON -DRDK_BUILD_FREETYPE_SUPPORT=OFF ../
  • That should run without errors (though it will spit out some warnings)
  • Now open the solution file it produced in Visual Studio and build the testInchi target (there's probably some way to do this from the command line, but I don't know what it is for Windows). you probably want to switch the build configuration to "Release" first

I believe that would be cmake --build --target testinchi

@greglandrum
Copy link
Author

I believe that would be cmake --build --target testinchi

I was not able to make that work.

@giallu
Copy link
Contributor

giallu commented Aug 16, 2024

I was not able to make that work.

You are right, I just realized the Visual Studio generator is a multi-config one so --target can't work there.

@janholstjensen
Copy link

I can reproduce the issue with test code that I have setup to make the Reaction InChI C API thread-safe. InChI 1.0.6.0 works, while 1.0.7.0 fails.

In some cases the InChI library starts returning inchi_Ret_ERROR and in other cases I see a segmentation fault. Feel free to reach out to me for alternative test code and additional testing of InChI concurrency.

@JanCBrammer JanCBrammer added the bug Something isn't working label Sep 20, 2024
@djb-rwth djb-rwth self-assigned this Sep 23, 2024
@djb-rwth
Copy link
Collaborator

Hi @greglandrum,
Back at work again.
Just one final question about setting the RDKit development environment:

  • Back in your shell, set the RDBASE environment variable to point to the directory where you checked out the RDKit source.

Please let me know if I should add this as an entry in CTestCustom.ctest file or do it some other way.
Thank you in advance.

@djb-rwth
Copy link
Collaborator

djb-rwth commented Sep 24, 2024

I can reproduce the issue with test code that I have setup to make the Reaction InChI C API thread-safe. InChI 1.0.6.0 works, while 1.0.7.0 fails.

In some cases the InChI library starts returning inchi_Ret_ERROR and in other cases I see a segmentation fault. Feel free to reach out to me for alternative test code and additional testing of InChI concurrency.

Hi @janholstjensen,
Thank you for reaching out.
Sending the test code files would be more than welcome as there is always a possibility that two different issues are triggering the multithreading difficulties.

@greglandrum
Copy link
Author

Please let me know if I should add this as an entry in CTestCustom.ctest file or do it some other way. Thank you in advance.

Hi @djb-rwth I don't use ctest that way, so I'm not really sure what the answer is to that part.
I set the environment variable in the shell from which I invoke ctest

@djb-rwth
Copy link
Collaborator

djb-rwth commented Sep 24, 2024

Hi @greglandrum,
Thanks for the quick response.

After replacing the source code in <path_to>\rdkit\External\INCHI-API\src with working v.1.07.x version, I can confirm that the multithreading block gives an error output:

1: [23:26:30]     Test multithreading
1: reading molecules
1: generating reference data
1: processing
1:  launch :0
1:  launch :1
1:  launch :2
1:  launch :3
1: [23:26:30]
1:
1: ****
1: Test Assert
1: Expression Failed:
1: Violation occurred on line 48 in file <path_to>\rdkit\External\INCHI-API\test.cpp
1: Failed Expression: mol2
1: ****
1:
1: [23:26:30] Invalid InChI prefix in generating InChI Key
1: [23:26:30]
1:
1: ****
1: Test Assert
1: Expression Failed:
1: Violation occurred on line 45 in file <path_to>\rdkit\External\INCHI-API\test.cpp
1: Failed Expression: key2 == keys[i]
1: ****
1:
1/1 Test #1: testInchi ........................Exit code 0xc0000409
***Exception:   0.37 sec

0% tests passed, 1 tests failed out of 1

Total Test time (real) =   0.41 sec

The following tests FAILED:
          1 - testInchi (Exit code 0xc0000409
)
Errors while running CTest
Output from these tests are in: <path_to>/rdkit/Testing/Temporary/LastTest.log
Use "--rerun-failed --output-on-failure" to re-run the failed cases verbosely.

Will get back to you shortly with more information.

Please disregard all my queries from previous versions of this comment -- I have already found everything I need to resolve this issue.

@JanCBrammer
Copy link
Collaborator

We now reproduce the multithreading failure on CI, i.e., CI fails due to the segfault(s) in the InChI library.

See 746a927.

@greglandrum
Copy link
Author

greglandrum commented Nov 5, 2024

FYI: I talked to Gerd yesterday who let me know that this one should be fixed. I just grabbed the current state of main and built the RDKit against that. All of my tests now pass
Apologies, that was me making a mistake in the build setup. I still get errors in my multithreading test using main

@djb-rwth
Copy link
Collaborator

djb-rwth commented Nov 7, 2024

FYI: I talked to Gerd yesterday who let me know that this one should be fixed. I just grabbed the current state of main and built the RDKit against that. All of my tests now pass Apologies, that was me making a mistake in the build setup. I still get errors in my multithreading test using main

Hi @greglandrum,
Yes, the information you got from @gblanke02 is correct but I still have not pushed the new version to GitHub.
Will let you know as soon as I do because there is one more feature we wish to include in v.1.07.2 beforehand.

@djb-rwth
Copy link
Collaborator

Hi @greglandrum,
The multithreading issues have been addressed in InChI v1.07.2, which has now been uploaded to rwth branch.
Please feel free to let me know if you have any further suggestions or comments regarding multithreading.
Hoping to hear from you soon.

@greglandrum
Copy link
Author

@djb-rwth I just did an RDKit build using the rwth branch and the multi-threading tests passed without problems.

Running the rest of my tests turns up a new problem with, I think, GetStructFromINCHI(), but I will report that in a separate issue.

@djb-rwth
Copy link
Collaborator

Hi @greglandrum,
Thanks for confirming that multithreading issue has been resolved, and creating issue #67, which will be addressed separately.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

5 participants