Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Moves pysmurf-controller functions that load lots of data to subprocess #669

Merged
merged 10 commits into from
May 13, 2024

Conversation

jlashner
Copy link
Collaborator

@jlashner jlashner commented May 1, 2024

This PR changes pysmurf-controller operations that load TODs such that instead of running in the main process, they call out to a subprocess.

Description

This introduces the smurf_subprocess_util module, and a protocol for running functions inside a twisted subprocess in such a way that a general RunCfg can be passed to it, and a general RunResult can be returned. Any operation that involves TOD data loading is modified so that the loading and analysis occurs in a subprocess rather than in the main one.

Motivation and Context

This should hopefully solve the memory leak, which I believe is coming from instances where TOD data is not being released to the system after it leaves scope. Moving TOD loading and analysis into subprocesses instead of the main one should enforce that this memory is returned to the OS when the subprocess exits. This discussion has some links that motivate this solution.

How Has This Been Tested?

I have tested the subprocess protocol locally through the use of the test process (and the new run_test_func operation).
I will need to test all other operations with hardware on SATp1 or SATp3.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

Checklist:

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.

@jlashner jlashner changed the title Moves functions that load lots of data to subprocess Moves pysmurf-controller functions that load lots of data to subprocess May 1, 2024
return threads.blockingCallFromThread(
reactor, _run_func_in_subprocess_reactor, cfg)
else:
return _run_func_in_subprocess_reactor(cfg)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool... I believe that if you are in reactor context, then you can't block -- i.e you need to return a Deferred here, which would only be possible if _run_func_in_subprocess_reactor(cfg) returned prot.deferred or some wrapped thing that applies the Runresult.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point... I changed to run_func_in_subprocess_from_thread and had it only handle the from-thread case

@jlashner
Copy link
Collaborator Author

jlashner commented May 2, 2024

This has been tested on satp1 and it works! It seems to have fixed the large memory leak. Notes from my testing is here: https://simonsobs.atlassian.net/wiki/spaces/~55705801070cc775254537ab15f2dcf6702b78/pages/412778522/2024-05-02+pysmurf-controller+subprocessing+tests

Notably, memory returns to base-level after IVs and bias steps, which it did not previously:
image

I want to note that there does still seem to be a memory leak caused by the functions that have not been moved to subprocesses. I did not move the stream data function to a subprocess because it isn't loading TODs, however just the creation of the pysmurf object seems to be leaking memory, at a rate of ~15 MB per call:
image

I think we should probably move all logic involving pysmurf to a subprocess, however I this this PR solves the most pressing issues and largest memory leaks, so I think we should merge this first.

Copy link
Member

@BrianJKoopman BrianJKoopman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Glad the subprocess route is working! Can you address the unused imports caught by pre-commit: https://results.pre-commit.ci/run/github/186511668/1714677193.715HK9dWRnKVTIpCyOTUcg

Copy link
Member

@BrianJKoopman BrianJKoopman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me, a few small comments/questions below.

socs/agents/pysmurf_controller/agent.py Outdated Show resolved Hide resolved
socs/agents/pysmurf_controller/agent.py Outdated Show resolved Hide resolved
socs/agents/pysmurf_controller/agent.py Show resolved Hide resolved
socs/agents/pysmurf_controller/agent.py Outdated Show resolved Hide resolved
socs/agents/pysmurf_controller/smurf_subprocess_util.py Outdated Show resolved Hide resolved
socs/agents/pysmurf_controller/smurf_subprocess_util.py Outdated Show resolved Hide resolved
Copy link
Member

@BrianJKoopman BrianJKoopman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed comments look good, merge conflicts still need resolving though.

@BrianJKoopman BrianJKoopman self-requested a review May 13, 2024 20:01
@BrianJKoopman BrianJKoopman merged commit b95aed3 into main May 13, 2024
5 checks passed
@BrianJKoopman BrianJKoopman deleted the controller-subprocessing branch May 13, 2024 20:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants