Skip to content

Adjust parallelization of TwoComponentReaction node to significantly reduce memory usage #162

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
May 17, 2025

Conversation

chaubold
Copy link
Contributor

@chaubold chaubold commented Mar 31, 2025

The TwoComponentReaction submitted tasks for an executor service in the following scheme: one task for each element in the first input column. The task then performed the reaction of this element (=reactant) with all reactants of a second input column. So the output of this task is the list of reaction results of all these pairings. The output needs to be kept in memory until it has been written out.

Now imagine the second input column has a lot of rows, meaning each task needs to keep a lot of results in memory.

The thread pool is configured to use ~2x as many threads as there are CPU cores, so if there's a 4 core CPU this means 8 tasks are running in parallel, so at least 8 large results need to be kept in memory.

Changed with this commit: each reaction is handled as individual task. While this might increase the bookkeeping overhead, it makes sure that way fewer results need to be kept in memory, which in practice showed much better performance because the operating system doesn't need to manage too memory (which is outside of the JVM, but RDKit molecules in C via JNI).

Copy link
Collaborator

@ptosco ptosco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chaubold Sorry for taking a while to complete this review.
This looks good to me; I have left in a few comments that you might want to take into consideration.

@chaubold chaubold force-pushed the reduce-memory-footprint branch from 260fadb to b5f436f Compare May 12, 2025 08:12
@chaubold
Copy link
Contributor Author

Thanks for the feedback @ptosco! Sorry for all the formatting changes, I should've removed them from the start. I applied the suggested improvements and also added your license file as header of Pair.java -- noticed that I forgot that one.

Copy link
Collaborator

@ptosco ptosco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chaubold This looks good to me; I only suggested using KNIME's copyright notice in org.rdkit.knime.nodes/src/org/rdkit/knime/nodes/twocomponentreaction2/Pair.java given you added this file, and also removing a further set of curly brackets in org.rdkit.knime.nodes/src/org/rdkit/knime/nodes/twocomponentreaction2/RDKitTwoComponentReactionNodeModel.java by changing else { if (...) { into else if (...) {.

…reduce memory usage

The TwoComponentReaction submitted tasks for an executor service in the following scheme:
one task for each element in the first input column. The task then performed the reaction
of this element (=reactant) with all reactants of a second input column. So the output of
this task is the list of reaction results of all these pairings. The output needs to be
kept in memory until it has been written out.

Now imagine the second input column has a lot of rows, meaning each task needs to keep a lot
of results in memory.

The thread pool is configured to use ~2x as many threads as there are CPU cores, so if
there's a 4 core CPU this means 8 tasks are running in parallel, so at least 8 large
results need to be kept in memory.

Changed with this commit: each reaction is handled as individual task. While this might
increase the bookkeeping overhead, it makes sure that way fewer results need to be kept in
memory, which in practice showed much better performance because the operating system doesn't
need to manage a much memory (which is outside of the JVM, but RDKit molecules in C via JNI).
@chaubold chaubold force-pushed the reduce-memory-footprint branch from d18a2a2 to b80988f Compare May 13, 2025 09:56
Copy link
Collaborator

@ptosco ptosco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chaubold Thank you for your contribution and for applying my suggested changes!
@greglandrum I am happy to merge.

@greglandrum
Copy link
Member

@chaubold Thank you for your contribution and for applying my suggested changes!

@greglandrum I am happy to merge.

Thanks to both of you. Merge away @ptosco !

@ptosco ptosco merged commit ab1b03a into rdkit:master May 17, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants