Fix OOMs during server rebalancing #150

Merged
borzunov merged 3 commits into main from rebalancing-stability on Dec 12, 2022

@borzunov (Collaborator) commented Dec 12, 2022

The OOMs were caused by the cyclic references TransformerBackend <-> PrioritizedTaskPool, which could not be garbage collected properly:

[Image: Screenshot 2022-12-13 at 00 31 30]

Still, I've added explicit tensor removal just in case.
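
For readers unfamiliar with the failure mode, here is a minimal sketch of why a reference cycle can keep memory alive and how explicit cleanup avoids that. It is not the code from this PR: `Backend`, `TaskPool`, `shutdown`, and `is_freed_without_gc` are hypothetical stand-ins, a `bytearray` plays the role of the tensors, and the behavior shown assumes CPython's reference counting.

```python
import gc
import weakref


class TaskPool:
    """Hypothetical stand-in for a task pool that points back at its backend."""

    def __init__(self, backend):
        self.backend = backend  # strong back-reference: completes the cycle


class Backend:
    """Hypothetical stand-in for a backend that owns a pool and a large buffer."""

    def __init__(self, n_bytes):
        self.buffer = bytearray(n_bytes)  # plays the role of the tensors
        self.pool = TaskPool(self)

    def shutdown(self):
        # Explicit cleanup: release the buffer and break the cycle by hand.
        self.buffer = None
        self.pool.backend = None
        self.pool = None


def is_freed_without_gc(explicit_cleanup):
    """Return True if the backend is freed by reference counting alone."""
    was_enabled = gc.isenabled()
    gc.disable()  # keep the cyclic collector out of the experiment
    try:
        backend = Backend(1024)
        probe = weakref.ref(backend)  # lets us observe the object's lifetime
        if explicit_cleanup:
            backend.shutdown()
        del backend  # drop the last external reference
        return probe() is None
    finally:
        if was_enabled:
            gc.enable()
        gc.collect()  # clean up whatever the cycle left behind
```

Under these assumptions, `is_freed_without_gc(False)` reports that the object survives `del` because the pool still references it, while `is_freed_without_gc(True)` reports that it is freed immediately once the cycle is broken. Until the cyclic collector happens to run, anything the object holds stays allocated, which is why releasing large buffers explicitly is a reasonable safeguard.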

@borzunov changed the title from "Fix OOMs during rebalancing" to "Fix OOMs during server rebalancing" on Dec 12, 2022
@borzunov force-pushed the rebalancing-stability branch from 8495e63 to 4698367 on December 12, 2022 19:40
@borzunov force-pushed the rebalancing-stability branch from 4698367 to b251516 on December 12, 2022 19:48
@borzunov merged commit e4dc938 into main on Dec 12, 2022
@borzunov deleted the rebalancing-stability branch on December 12, 2022 20:44

2 participants