I tried to process a data set of 1.4M molecules with a small Pipeline looking like this:
This leads to RAM issues because Molpipeline tries to hold the RDKit data structures for all 1.4M molecules in RAM at the same time. This happens because, for instance-based processing, Molpipeline splits the pipeline elements into syncing and non-syncing parts.
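To illustrate the effect, here is a minimal sketch in plain Python. It is not Molpipeline code: `run_streaming`, `run_with_sync`, `parse`, and `describe` are hypothetical stand-ins, and the sketch assumes that a syncing step needs to see every intermediate object before any later step can run.

```python
from typing import Callable, Iterable, Iterator, List

Step = Callable[[object], object]


def apply_steps(item: object, steps: List[Step]) -> object:
    """Apply a chain of per-instance steps to a single item."""
    for step in steps:
        item = step(item)
    return item


def run_streaming(items: Iterable[object], steps: List[Step]) -> Iterator[object]:
    """No syncing step: each item passes through all steps on its own,
    so only one intermediate object is alive at a time."""
    for item in items:
        yield apply_steps(item, steps)


def run_with_sync(
    items: Iterable[object],
    pre_steps: List[Step],
    fit: Callable[[List[object]], None],
    post_steps: List[Step],
    held: List[int],
) -> Iterator[object]:
    """Syncing step (assumed behaviour): all intermediates are collected
    before fitting, so they are all in memory at once."""
    intermediates = [apply_steps(item, pre_steps) for item in items]
    held.append(len(intermediates))  # record how many objects were held
    fit(intermediates)
    for intermediate in intermediates:
        yield apply_steps(intermediate, post_steps)


# Hypothetical stand-ins: `parse` plays the role of building a heavy
# molecule object, `describe` the role of computing a descriptor.
parse: Step = lambda smiles: {"mol": smiles}
describe: Step = lambda mol: len(mol["mol"])

smiles_list = ["CCO", "c1ccccc1", "CC(=O)O"]
held: List[int] = []

streamed = list(run_streaming(smiles_list, [parse, describe]))
synced = list(run_with_sync(smiles_list, [parse], lambda mols: None, [describe], held))
```

Both variants return the same descriptors, but the second one holds one intermediate object per input at the sync point, which is what becomes a problem at 1.4M molecules.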
In the constructor of `MolToDescriptorPipelineElement`, `_requires_fitting` is set when the standardizer is not None:

The RAM issues can be avoided by doing this:
It would be better if the standardization were implemented in a way that does not lead to RAM issues.