
Memory exhaustion #272

Closed
2 tasks done
sgc-code opened this issue Dec 5, 2023 · 9 comments · Fixed by #499
Labels: Hard, help wanted, OnlyDust, performance

Comments

@sgc-code

sgc-code commented Dec 5, 2023

Describe the bug (observed vs expected behavior)

We completed our migration from devnet python to devnet rust, and after a few adjustments we have all tests running again. There is only one issue: our tests take around 20 minutes to run, and during that time the devnet memory keeps increasing. After 5 minutes it reaches 7.99 GB and starts to slow down; after another minute the container stops (with no specific error I could find).

As a workaround we are calling the restart endpoint every 2 minutes, and that seems to help. Although I still see the memory increasing, it's good enough for our test suite.
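A minimal sketch of that workaround, assuming a local devnet on port 5050 and its /restart endpoint, written with the reqwest blocking client; both the crate choice and the URL are assumptions, not part of the original report:

```rust
// Hedged sketch: periodically hit the devnet restart endpoint so memory is
// reclaimed between test batches. The /restart path and the port are assumptions.
use std::{thread, time::Duration};

fn main() -> Result<(), reqwest::Error> {
    let client = reqwest::blocking::Client::new();
    loop {
        thread::sleep(Duration::from_secs(120)); // every 2 minutes
        let response = client.post("http://127.0.0.1:5050/restart").send()?;
        eprintln!("devnet restart -> HTTP {}", response.status());
    }
}
```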

Not reproducible on alpha-goerli

  • This issue is only present on Devnet and cannot be reproduced on alpha-goerli (check the box if true).

To Reproduce
Steps to reproduce the behavior:

  1. Keep using devnet rust for 20 minutes

Devnet version

  • I am using Devnet version: shardlabs/starknet-devnet-rs:64c425b832b96ba09b49646fe0fbf49862c0fb6d
  • This happens with a dockerized Devnet (check the box if true).
  • This does not appear on the following Devnet version:

System specifications

  • OS: macOS; also running GitHub Actions on Ubuntu
@mikiw mikiw self-assigned this Dec 5, 2023
@mikiw mikiw linked a pull request Dec 7, 2023 that will close this issue
@mikiw
Contributor

mikiw commented Dec 7, 2023

I reproduced it with the starknet.js tests and I can confirm that behavior: after 3 test runs, starknet-devnet-rs takes 22 GB of memory, but it shouldn't.

@ivpavici suggested checking if it's not accidentally downloading Netflix movies in the background.

@FabijanC
Contributor

FabijanC commented Dec 7, 2023

@mikiw I think you might be onto something with that last idea:
[image attachment]

@FabijanC
Contributor

FabijanC commented Dec 11, 2023

While minding my own business, I was digging through the code and had some ideas about this issue.

  • I saw that we clone the state in several places (look for state.clone())
    • Those are refactoring candidates, maybe something is not properly dropped due to our use of Arc
  • If visual analysis tools aren't of help, there are Rust APIs that return resource usage information (a sketch follows this list)
    • They could be used to insert function calls as a sort of breakpoint inside the code
    • That way we could check where exactly memory grows most
  • Do we know since when this has been a problem?
    • If there is a simple enough testing procedure for the reproduction of this problem (e.g. sending N requests which eventually cause the memory to grow; N being sufficiently large), it could be reused to backtrack revisions and potentially find one without these issues
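A minimal sketch of such a memory "breakpoint", assuming a Linux environment where /proc/self/status is readable; the helper names are hypothetical and only the standard library is used:

```rust
// Hedged sketch: log the process's resident set size (VmRSS) at chosen points
// in the code to see where memory grows. Linux-only; reads /proc/self/status.
use std::fs;

/// Returns the current resident set size in kB, if it can be read.
fn rss_kb() -> Option<u64> {
    let status = fs::read_to_string("/proc/self/status").ok()?;
    status
        .lines()
        .find(|line| line.starts_with("VmRSS:"))
        .and_then(|line| line.split_whitespace().nth(1))
        .and_then(|value| value.parse().ok())
}

/// Call this around suspected spots, e.g. before and after a state.clone().
fn mem_breakpoint(tag: &str) {
    match rss_kb() {
        Some(kb) => eprintln!("[mem] {tag}: {} MiB", kb / 1024),
        None => eprintln!("[mem] {tag}: RSS unavailable on this platform"),
    }
}

fn main() {
    mem_breakpoint("startup");
    let big = vec![0u8; 50 * 1024 * 1024]; // simulate a suspicious allocation
    mem_breakpoint("after 50 MiB allocation");
    drop(big);
    mem_breakpoint("after drop");
}
```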

@mikiw mikiw removed their assignment Dec 13, 2023
@FabijanC
Contributor

@sgc-code hey, do you need old state history? Like, being able to query old states with an old block_id?

@sgc-code
Author

@sgc-code hey, do you need old state history? Like, being able to query old states with an old block_id?

Hi @FabijanC, a flag to disable old history would work around our issue for now; we don't currently need to access the older blocks. Maybe you want to look into some more efficient storage options later. Thanks

@FabijanC
Contributor

@sgc-code Sure, we are also considering more efficient storage options, but we were just considering conditional disabling of state storing as the quickest fix.

Thanks for the reply

@mikiw mikiw linked a pull request Dec 14, 2023 that will close this issue
@mikiw mikiw mentioned this issue Dec 15, 2023
@mikiw mikiw removed a link to a pull request Dec 15, 2023
@mikiw
Contributor

mikiw commented Dec 15, 2023

So the flag was added in #290, but I'm still not sure whether the current code is correct when the flag is set to StateArchiveCapacity::Full. I would treat that PR as an enable/disable feature PR; I didn't check whether the enabled feature is implemented correctly.
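A rough sketch of how such a capacity setting could gate per-block state archiving; only the StateArchiveCapacity::Full name comes from this thread, and the other variant and surrounding types are assumptions rather than the actual Devnet code:

```rust
// Hedged sketch: archive a per-block state snapshot only when full archiving
// is requested, so the default configuration avoids unbounded memory growth.
use std::collections::HashMap;

#[derive(Clone, Copy, PartialEq, Eq)]
enum StateArchiveCapacity {
    None, // keep only the latest state (assumed default)
    Full, // store a snapshot for every block
}

#[derive(Clone, Default)]
struct StateSnapshot; // placeholder for the per-block state

struct Starknet {
    capacity: StateArchiveCapacity,
    latest_state: StateSnapshot,
    archived_states: HashMap<u64, StateSnapshot>, // block number -> snapshot
}

impl Starknet {
    fn on_block_created(&mut self, block_number: u64) {
        // Clone and archive the state only when the archive capacity is Full.
        if self.capacity == StateArchiveCapacity::Full {
            self.archived_states
                .insert(block_number, self.latest_state.clone());
        }
    }
}
```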

@ivpavici ivpavici moved this from 🆕 New to 🏗 In progress in starknet-devnet-rs Dec 15, 2023
@ivpavici ivpavici added the help wanted and performance labels Dec 15, 2023
@mikiw
Contributor

mikiw commented Dec 15, 2023

@sgc-code #290 was merged, so memory should no longer be growing insanely

@FabijanC
Contributor

FabijanC commented Feb 2, 2024

I suspect the largest part of the state history is contract classes. As has already been mentioned somewhere, storing a clone of the state (together with declared classes) for each block is highly inefficient. That's why we are currently defaulting to not storing the state history.

Proposal

Instead of cloning all of the classes, we could have one global storage of type HashMap<ClassHash, (ContractClass, BlockNumber)> (wrapped in e.g. Arc<Mutex<...>> or Arc<RwLock<...>>) - meaning we could store the class together with the block number when it was added.

Why

So cloning a state would only clone the reference (Arc) to the contract class map.

Implementation details

When retrieving a class (in the response of the JSON-RPC method starknet_getClass), we would return the class only if stored_block_number <= query_block_number.

Refactoring might be needed to store a contract class together with the next block number, because currently we store the class before the next block is created.
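A minimal sketch of this proposal; ClassHash, ContractClass, BlockNumber and ClassStorage are illustrative stand-ins rather than the actual Devnet types. It shows the shared Arc<RwLock<HashMap<...>>> and the block-number check described above:

```rust
// Hedged sketch of the proposal: one shared class store, so cloning a state
// snapshot clones only the Arc, not the contract classes themselves.
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

type ClassHash = [u8; 32];
type BlockNumber = u64;

#[derive(Clone)]
struct ContractClass(Vec<u8>); // placeholder for the compiled class data

#[derive(Clone, Default)]
struct ClassStorage {
    inner: Arc<RwLock<HashMap<ClassHash, (ContractClass, BlockNumber)>>>,
}

impl ClassStorage {
    /// Record a class together with the block at which it becomes visible.
    fn insert(&self, hash: ClassHash, class: ContractClass, block: BlockNumber) {
        self.inner.write().unwrap().insert(hash, (class, block));
    }

    /// starknet_getClass-style lookup: return the class only if it was
    /// declared at or before the queried block.
    fn get_at(&self, hash: &ClassHash, query_block: BlockNumber) -> Option<ContractClass> {
        let map = self.inner.read().unwrap();
        map.get(hash).and_then(|(class, stored_block)| {
            (*stored_block <= query_block).then(|| class.clone())
        })
    }
}

/// Cloning a per-block state snapshot now clones only the Arc to the class map.
#[derive(Clone, Default)]
struct StateSnapshot {
    classes: ClassStorage,
    // ... storage diffs, nonces, etc.
}

fn main() {
    let store = ClassStorage::default();
    store.insert([0u8; 32], ContractClass(vec![1, 2, 3]), 5);
    assert!(store.get_at(&[0u8; 32], 4).is_none()); // not yet visible at block 4
    assert!(store.get_at(&[0u8; 32], 7).is_some()); // visible from block 5 onward
}
```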

@ivpavici ivpavici moved this from 🏗 In progress to 📋 Backlog in starknet-devnet-rs Apr 8, 2024
@ivpavici ivpavici added the ODHack and Hard labels Apr 19, 2024
@ivpavici ivpavici removed the ODHack label May 6, 2024
@mikiw mikiw self-assigned this May 7, 2024
@FabijanC FabijanC self-assigned this May 31, 2024
@github-project-automation github-project-automation bot moved this from 📋 Backlog to ✅ Done in starknet-devnet-rs Jun 28, 2024
Projects
Status: Done
4 participants