Adjust parallelization of TwoComponentReaction node to significantly reduce memory usage #162

chaubold · 2025-03-31T14:01:45Z

The TwoComponentReaction node submitted tasks to an executor service using the following scheme: one task for each element in the first input column. Each task then performed the reaction of this element (the reactant) with all reactants of the second input column, so the output of a task was the list of reaction results for all these pairings. That output needs to be kept in memory until it has been written out.

Now imagine the second input column has a lot of rows, meaning each task needs to keep a lot of results in memory.

The thread pool is configured to use ~2x as many threads as there are CPU cores, so on a 4-core CPU 8 tasks run in parallel and at least 8 large results need to be kept in memory at the same time.

Changed with this commit: each reaction is handled as an individual task. While this might increase the bookkeeping overhead, it ensures that far fewer results need to be kept in memory. In practice this showed much better performance, because the operating system doesn't need to manage as much memory (this memory lies outside of the JVM, since the RDKit molecules are allocated in native code via JNI).
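As a rough illustration of the two scheduling schemes described above, here is a minimal Java sketch. It is not the node's actual code: `react`, `perRowTasks`, `perPairTasks` and `largestTaskOutput` are hypothetical names, and plain strings stand in for RDKit molecules. It only shows how much output a single task produces under each scheme.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class TaskGranularitySketch {

    // Hypothetical stand-in for running the reaction on one pair of reactants.
    static String react(String a, String b) {
        return a + "+" + b;
    }

    // Old scheme: one task per element of the first column.
    // Each task produces one result per row of the second column.
    static List<Callable<List<String>>> perRowTasks(List<String> first, List<String> second) {
        List<Callable<List<String>>> tasks = new ArrayList<>();
        for (String a : first) {
            tasks.add(() -> {
                List<String> results = new ArrayList<>(second.size());
                for (String b : second) {
                    results.add(react(a, b));
                }
                return results;
            });
        }
        return tasks;
    }

    // New scheme: one task per pairing, so each task produces a single result.
    static List<Callable<List<String>>> perPairTasks(List<String> first, List<String> second) {
        List<Callable<List<String>>> tasks = new ArrayList<>();
        for (String a : first) {
            for (String b : second) {
                tasks.add(() -> Collections.singletonList(react(a, b)));
            }
        }
        return tasks;
    }

    // Runs all tasks on a fixed-size pool and reports the largest output of any single task.
    // Note: invokeAll waits for every task, so this measures per-task output size only,
    // not the peak memory of a streaming writer.
    static int largestTaskOutput(List<Callable<List<String>>> tasks, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            int largest = 0;
            for (Future<List<String>> future : pool.invokeAll(tasks)) {
                largest = Math.max(largest, future.get().size());
            }
            return largest;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

With two reactants in the first column and three in the second, the per-row scheme yields 2 tasks of 3 results each, while the per-pair scheme yields 6 tasks of 1 result each. The number of results held by the tasks running concurrently therefore scales with the thread count alone rather than with the thread count times the length of the second column.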

ptosco

@chaubold Sorry for taking a while to complete this review.
This looks good to me; I have left a few comments that you might want to take into consideration.

org.rdkit.knime.nodes/src/org/rdkit/knime/nodes/twocomponentreaction2/Pair.java

org.rdkit.knime.nodes/src/org/rdkit/knime/nodes/twocomponentreaction2/PairIterable.java

...odes/src/org/rdkit/knime/nodes/twocomponentreaction2/RDKitTwoComponentReactionNodeModel.java

chaubold · 2025-05-12T08:14:38Z

Thanks for the feedback @ptosco! Sorry for all the formatting changes, I should've removed them from the start. I applied the suggested improvements and also added your license file as header of Pair.java -- noticed that I forgot that one.

ptosco

@chaubold This looks good to me; I only suggested using KNIME's copyright notice in org.rdkit.knime.nodes/src/org/rdkit/knime/nodes/twocomponentreaction2/Pair.java, given that you added this file, and removing a further set of curly brackets in org.rdkit.knime.nodes/src/org/rdkit/knime/nodes/twocomponentreaction2/RDKitTwoComponentReactionNodeModel.java by changing `else { if (...) {` into `else if (...) {`.
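For readers unfamiliar with the bracket suggestion, the generic Java sketch below contrasts the nested and flattened forms. It is purely illustrative and not taken from RDKitTwoComponentReactionNodeModel.java; the class and method names are hypothetical.

```java
class ElseIfSketch {

    // Nested form: the inner `if` sits inside an extra `else { ... }` block.
    static String nested(int n) {
        if (n < 0) {
            return "negative";
        } else {
            if (n == 0) {
                return "zero";
            } else {
                return "positive";
            }
        }
    }

    // Flattened form: same behavior, one fewer pair of curly brackets.
    static String flat(int n) {
        if (n < 0) {
            return "negative";
        } else if (n == 0) {
            return "zero";
        } else {
            return "positive";
        }
    }
}
```

Both methods return the same value for every input; the change only removes one level of nesting.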

...odes/src/org/rdkit/knime/nodes/twocomponentreaction2/RDKitTwoComponentReactionNodeModel.java

org.rdkit.knime.nodes/src/org/rdkit/knime/nodes/twocomponentreaction2/Pair.java

ptosco

@chaubold Thank you for your contribution and for applying my suggested changes!
@greglandrum I am happy to merge.

greglandrum · 2025-05-14T05:40:44Z

> @chaubold Thank you for your contribution and for applying my suggested changes!
>
> @greglandrum I am happy to merge.

Thanks to both of you. Merge away @ptosco !

ptosco approved these changes May 10, 2025

chaubold force-pushed the reduce-memory-footprint branch from 260fadb to b5f436f Compare May 12, 2025 08:12

ptosco approved these changes May 12, 2025

chaubold force-pushed the reduce-memory-footprint branch from d18a2a2 to b80988f Compare May 13, 2025 09:56

ptosco approved these changes May 13, 2025

ptosco merged commit ab1b03a into rdkit:master May 17, 2025
