RAM usage grows without bound when using pyabc.sampler.SingleCoreSampler() #626

Open
Gabriel-p opened this issue Jan 16, 2024 · 18 comments

@Gabriel-p

Gabriel-p commented Jan 16, 2024

Bug description
When I use pyabc.ABCSMC() with sampler=pyabc.sampler.SingleCoreSampler(), RAM usage sometimes grows until all available RAM is consumed. This happens rarely, but I tested it enough times to reproduce it. The issue goes away if I instead use sampler=pyabc.sampler.MulticoreEvalParallelSampler(n_procs=1).

Script with sampler=pyabc.sampler.SingleCoreSampler()

[Screenshot from 2024-01-16 10 37 50]

Exact same script but using sampler=pyabc.sampler.MulticoreEvalParallelSampler(n_procs=1)

[Screenshot from 2024-01-16 10 38 11]

Expected behavior
RAM usage should stay bounded instead of consuming all available memory.

To reproduce
I can't; my script is very large and the problem does not happen every time.

Environment

Name: pyabc
Version: 0.12.13
Summary: Distributed, likelihood-free ABC-SMC inference
Home-page: https://github.com/icb-dcm/pyabc
Author: The pyABC developers
Author-email: yannik.schaelte@gmail.com
License: BSD-3-Clause
Location: /home/gabriel/miniconda3/envs/asteca/lib/python3.12/site-packages
Requires: click, cloudpickle, distributed, gitpython, jabbar, matplotlib, numpy, pandas, redis, scikit-learn, scipy, sqlalchemy
Required-by:
/home/gabriel/miniconda3/envs/asteca/bin/python
Python 3.12.0

elementary OS 7.1 (based on Ubuntu 22.04.3 LTS); Linux 6.5.0-14-generic

Gabriel-p added the bug label Jan 16, 2024
@stephanmg
Collaborator

Thanks @Gabriel-p for reporting this.

So are you saying you cannot provide the script for us to test to reproduce the results? It would be good to confirm it on another installation.

@Gabriel-p
Author

Let me see if I can clean it up and reduce the number of files to the minimum required

@Gabriel-p
Author

Ok, here's the compressed file with everything needed to reproduce the issue. You'll need a conda environment with:

python 3.12.0
pyABC 0.12.13
numpy 1.26.2
scipy 1.11.13
astropy 5.3.4
pandas 2.1.1
fastparquet 2023.10.1
fast_histogram 0.12

Then just run the test_pyABC.py script, changing lines 90 and 91 to switch between samplers.

Let me know if something does not work.
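
To confirm that a local environment matches the versions listed above before running the script, a short check along these lines may help. This is a minimal sketch using only the standard library; installed_versions is a hypothetical helper, and the distribution names are assumed from the list above (a distribution can be registered under a name that differs from its import name).

```python
from importlib import metadata

# Distribution names assumed from the list above; adjust them if your
# environment registers a package under a different name.
PACKAGES = ["pyabc", "numpy", "scipy", "astropy", "pandas",
            "fastparquet", "fast_histogram"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name}: {version or 'not installed'}")
```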

pyABC_test.zip

@stephanmg
Collaborator

Ah, perfect, we will have a look at this.

stephanmg self-assigned this Jan 17, 2024
@stephanmg
Collaborator

@Gabriel-p, I can't reproduce your issue here. How often does this error occur?

@Gabriel-p
Author

Hi @stephanmg, I think I sent the files improperly packaged. I'm not sure whether you managed to run test_pyABC.py; if not, let me know.

I can reproduce the issue 100% of the time, even after restarting the system. Another thing I've noticed is that the script sometimes keeps running in the background even after I close my IDE (Sublime Text).

@stephanmg
Collaborator

Yes, please re-package if possible and I will give it another try. Thanks for your patience.

@Gabriel-p
Author

Now it should work
pyABC_test.zip

@stephanmg
Collaborator

Hi @Gabriel-p, I can't reproduce it here. I will also assign @arrjon to check the issue.

@Gabriel-p
Author

Ok, I can still reproduce this issue 100% of the time, so let me know what I can do to help.
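
When a leak reproduces on one machine but not on another, one generic way to gather evidence is to compare allocation snapshots taken before and after a short run. The sketch below uses the standard-library tracemalloc module. The names measure_growth and workload are hypothetical and not part of pyABC. Note also that tracemalloc only sees allocations made through Python's allocator, so memory held by native threads or compiled libraries may not show up.

```python
import tracemalloc


def measure_growth(workload, limit=5):
    """Run `workload` and return the source lines whose allocations grew most.

    `workload` is a hypothetical zero-argument callable standing in for a
    short run of the real script.
    """
    tracemalloc.start()
    try:
        before = tracemalloc.take_snapshot()
        result = workload()  # hold a reference so allocations stay alive
        after = tracemalloc.take_snapshot()
    finally:
        tracemalloc.stop()
    del result
    # compare_to sorts by the absolute size difference, largest first.
    return after.compare_to(before, "lineno")[:limit]


if __name__ == "__main__":
    # Stand-in workload that keeps about 1 MB alive while it is measured.
    for diff in measure_growth(lambda: [bytearray(1024) for _ in range(1000)]):
        print(diff)
```

Running this around a single generation with each sampler and comparing the top entries could show whether the growth comes from Python objects at all.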

@arrjon
Collaborator

arrjon commented Feb 5, 2024

I have now checked it on macOS, and it seems that SingleCoreSampler() is opening more threads than it should. This might explain your issue and seems to be a bug. Using MulticoreEvalParallelSampler(n_procs=1) works as expected.

@stephanmg
Collaborator

Hi @Gabriel-p,

could you show the value of OMP_NUM_THREADS, e.g. with echo $OMP_NUM_THREADS?

@stephanmg
Collaborator

... and could you try the branch fix_singlecore, and let me know if it works?

@Gabriel-p
Author

echo $OMP_NUM_THREADS returns nothing.
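
An empty result from echo means OMP_NUM_THREADS is unset, in which case OpenMP-based libraries typically pick their own default, often one thread per available core. The same check can be made from inside the Python process that runs the sampler. A minimal sketch follows. OPENBLAS_NUM_THREADS and MKL_NUM_THREADS are not mentioned in this thread and are included only as commonly used related variables, and thread_settings is a hypothetical helper name.

```python
import os

# OMP_NUM_THREADS is the variable asked about above; the other two are
# assumptions added for illustration (used by OpenBLAS and Intel MKL).
THREAD_ENV_VARS = ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS")


def thread_settings(environ=os.environ):
    """Return each thread-related variable, or None where it is unset."""
    return {name: environ.get(name) for name in THREAD_ENV_VARS}


if __name__ == "__main__":
    print(f"Logical CPUs: {os.cpu_count()}")
    for name, value in thread_settings().items():
        print(f"{name} = {value if value is not None else '(unset)'}")
```

Setting the variable before Python starts, for example OMP_NUM_THREADS=1 python test_pyABC.py, would be one way to test whether native threading is involved.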

This is the output to screen with the fix_singlecore branch and sampler=pyabc.sampler.MulticoreEvalParallelSampler(n_procs=1):

ABC.Sampler INFO: Parallelize sampling on 1 processes.
ABC.Sampler INFO: Parallelize sampling on 1 processes.
ABC.History INFO: Start <ABCSMC id=5, start_time=2024-02-06 08:38:41>
ABC.History INFO: Start <ABCSMC id=5, start_time=2024-02-06 08:38:41>
ABC INFO: Calibration sample t = -1.
ABC INFO: Calibration sample t = -1.
ABC INFO: t: 0, eps: 1.32229323e-01.
ABC INFO: t: 0, eps: 1.32229323e-01.
ABC INFO: Accepted: 500 / 1031 = 4.8497e-01, ESS: 5.0000e+02.
ABC INFO: Accepted: 500 / 1031 = 4.8497e-01, ESS: 5.0000e+02.
ABC INFO: t: 1, eps: 1.00988341e-01.
ABC INFO: t: 1, eps: 1.00988341e-01.
ABC INFO: Accepted: 500 / 972 = 5.1440e-01, ESS: 4.2383e+02.
ABC INFO: Accepted: 500 / 972 = 5.1440e-01, ESS: 4.2383e+02.
ABC INFO: t: 2, eps: 8.23765786e-02.
ABC INFO: t: 2, eps: 8.23765786e-02.
ABC INFO: Accepted: 500 / 1098 = 4.5537e-01, ESS: 4.1058e+02.
ABC INFO: Accepted: 500 / 1098 = 4.5537e-01, ESS: 4.1058e+02.
ABC INFO: t: 3, eps: 7.20554730e-02.
ABC INFO: t: 3, eps: 7.20554730e-02.
ABC INFO: Accepted: 500 / 1096 = 4.5620e-01, ESS: 4.2701e+02.
ABC INFO: Accepted: 500 / 1096 = 4.5620e-01, ESS: 4.2701e+02.
ABC INFO: t: 4, eps: 6.45272070e-02.
ABC INFO: t: 4, eps: 6.45272070e-02.
ABC INFO: Accepted: 500 / 1144 = 4.3706e-01, ESS: 4.2139e+02.
ABC INFO: Accepted: 500 / 1144 = 4.3706e-01, ESS: 4.2139e+02.
ABC INFO: Stop: Maximum walltime.
ABC INFO: Stop: Maximum walltime.
ABC.History INFO: Done <ABCSMC id=5, duration=0:02:05.371858, end_time=2024-02-06 08:40:47>
ABC.History INFO: Done <ABCSMC id=5, duration=0:02:05.371858, end_time=2024-02-06 08:40:47>

It appears to be running the sampler twice? The RAM usage stays low as expected.

This is the output to screen with the fix_singlecore branch and sampler=pyabc.sampler.SingleCoreSampler():

ABC.History INFO: Start <ABCSMC id=6, start_time=2024-02-06 08:41:40>
ABC.History INFO: Start <ABCSMC id=6, start_time=2024-02-06 08:41:40>
ABC INFO: Calibration sample t = -1.
ABC INFO: Calibration sample t = -1.
Active threads: <function active_count at 0x7f4dbc321120>
[<_MainThread(MainThread, started 139971898045504)>]
Active threads: <function active_count at 0x7f4dbc321120>
[<_MainThread(MainThread, started 139971898045504)>]
Active threads: <function active_count at 0x7f4dbc321120>
[<_MainThread(MainThread, started 139971898045504)>]
Active threads: <function active_count at 0x7f4dbc321120>
[<_MainThread(MainThread, started 139971898045504)>]
Active threads: <function active_count at 0x7f4dbc321120>
[<_MainThread(MainThread, started 139971898045504)>]
Active threads: <function active_count at 0x7f4dbc321120>
[<_MainThread(MainThread, started 139971898045504)>]
Active threads: <function active_count at 0x7f4dbc321120>
[<_MainThread(MainThread, started 139971898045504)>]
Active threads: <function active_count at 0x7f4dbc321120>
....

The RAM usage immediately starts climbing.
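
The repeated "Active threads: <function active_count at 0x...>" lines suggest that the debug statement printed the function object rather than its return value, presumably because the call parentheses were missing. This is an inference from the log, not something confirmed in the thread. A minimal sketch of the difference, with describe_threads as a hypothetical helper:

```python
import threading


def describe_threads():
    """Return a summary line and the list of live Python threads."""
    # Calling active_count() yields an integer; without the parentheses
    # the f-string would show "<function active_count at 0x...>" instead.
    summary = f"Active threads: {threading.active_count()}"
    return summary, threading.enumerate()


if __name__ == "__main__":
    summary, threads = describe_threads()
    print(summary)
    print(threads)
```

Both calls only report threads created through Python's threading module, so a list containing just MainThread does not rule out native threads started by compiled libraries.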

@stephanmg
Collaborator

Thanks for the information, @Gabriel-p. We are still troubleshooting the issue and will push the fix to the fix_singlecore branch for you once it is ready.

@stephanmg
Collaborator

stephanmg commented Feb 29, 2024

@Gabriel-p, this might be related to this issue: ICB-DCM/pyPESTO#1312

Could you please try the fix_singlecore branch again?

@Gabriel-p
Author

@stephanmg, I just tested the fix_singlecore branch; the issue is still there.

@stephanmg
Collaborator

Thanks for testing so quickly. I had hoped the issue would go away in light of this, but it seems we need to dig deeper.
