Loadgen stall #4155

Closed
mhofman opened this issue Dec 7, 2021 · 2 comments
Labels: bug (Something isn't working), SwingSet (package: SwingSet), telemetry

@mhofman
Member

mhofman commented Dec 7, 2021

Describe the bug

Under fairly heavy loadgen load, the chain or the solo sometimes seems to stop making progress on loadgen tasks, even after a restart of the solo and loadgen client.

This has so far only been reproduced on a fairly old version of the sdk (6dc7152 / agoricdev-17), and only on the benchmark machine (my personal laptop never encountered such an issue).

To Reproduce

Steps to reproduce the behavior:

  1. Connect to benchmark machine
  2. docker run -e SDK_REVISION=6dc7152 loadgen-runner --stages=26 --stage.duration=60 --stage.loadgen.vault.interval=12 --stage.loadgen.amm.interval=12 --stage.loadgen.amm.wait=6 --stage.loadgen.vault.limit=10 --stage.loadgen.amm.limit=10
  3. Observe chain getting into a state where no loadgen tasks occur (empty blocks)
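
The stalled state in step 3 shows up as a long run of empty blocks. As a rough sketch of how that could be checked mechanically, the snippet below scans a list of per-block task counts and reports where the first sufficiently long run of empty blocks starts. The input format, the `find_stall` name, and the `min_empty_blocks` threshold are all assumptions made for illustration; the issue does not describe the runner's actual log format.

```python
from typing import Optional, Sequence


def find_stall(tx_counts: Sequence[int], min_empty_blocks: int = 10) -> Optional[int]:
    """Return the index of the first block of the first run of at least
    `min_empty_blocks` consecutive empty blocks, or None if no such run exists.

    `tx_counts` is a hypothetical list of loadgen task counts, one per block.
    """
    run_start = None
    run_length = 0
    for index, count in enumerate(tx_counts):
        if count == 0:
            if run_length == 0:
                run_start = index  # remember where this run of empty blocks began
            run_length += 1
            if run_length >= min_empty_blocks:
                return run_start
        else:
            run_length = 0  # any activity resets the run
    return None


# Hypothetical data: some activity (with one isolated empty block), then a stall.
counts = [3, 2, 0, 4, 1] + [0] * 12
print(find_stall(counts))
```

An isolated empty block is ignored here; only a sustained run counts as a stall. The right threshold would depend on the configured task intervals.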

Expected behavior

Forward progress on pending tasks

Platform Environment

Additional context

I did not see any reported errors, but it's possible that a rejection happened and wasn't handled. I haven't ruled out a bug in the loadgen solo agent failing to catch an error, but a cursory look indicates that all results are awaited or returned. Restarting the solo and loadgen client does not unblock anything: the loadgen deploy script never reaches the ready state.

Screenshots

The full logs with chain storage captured by the runner can be found on the benchmark machine at /mnt/volume_sfo3_03/manual-phase45-fix/manual-6dc7152-with-storage

@mhofman mhofman added the bug Something isn't working label Dec 7, 2021
@mhofman mhofman added performance Performance related issues SwingSet package: SwingSet labels Jan 20, 2022
@Tartuffo
Contributor

Tartuffo commented Feb 2, 2022

@mhofman has not seen this occur again after a medium number of runs. Will close if we get more confidence that it is not an issue.

@Tartuffo Tartuffo added MN-1 and removed MN-1 labels Feb 2, 2022
@mhofman mhofman added telemetry and removed performance Performance related issues labels Feb 10, 2022
@mhofman
Member Author

mhofman commented Feb 10, 2022

Closing, we can reopen if needed

@mhofman mhofman closed this as completed Feb 10, 2022
@Tartuffo Tartuffo added this to the Mainnet 1 milestone Mar 23, 2022