R4R: Multi-seed parallel simulation #2313

Merged 8 commits on Sep 26, 2018


@cwgoes
Contributor

cwgoes commented Sep 12, 2018

  • Targeted PR against correct branch (see CONTRIBUTING.md)

  • Linked to github-issue with discussion and accepted design OR link to spec that describes this work.

  • Wrote tests

  • Updated relevant documentation (docs/)

  • Added entries in PENDING.md with issue #

  • Re-reviewed Files changed in the GitHub PR explorer

Ref #1924


For Admin Use:

  • Added appropriate labels to PR (ex. wip, ready-for-review, docs)
  • Reviewers Assigned
  • Squashed all commits, uses message "Merge pull request #XYZ: [title]" (coding standards)

@ValarDragon
Contributor

While we're at it, can we also get make test_gaia_sim_fast to use a different seed on each run? That way the randomized testing would cover a wider surface. (The procedure on finding a bug would be to test on develop as well.)

@cwgoes
Contributor Author

cwgoes commented Sep 12, 2018

> While we're at it, can we also get make test_gaia_sim_fast to use a different seed on each run? That way the randomized testing would cover a wider surface. (The procedure on finding a bug would be to test on develop as well.)

Maybe. I like having deterministic failures; they make CI easier to work with. Definitely in favor of running lots of random seeds before any release.

@cwgoes
Contributor Author

cwgoes commented Sep 12, 2018

Seems some of the Amino serialization code is not goroutine-safe, or we're using singleton codecs which aren't goroutine-safe. Presently, running multiple GaiaApps in goroutines doesn't work (run make test_sim_gaia_full to reproduce). I wonder if we want to fix this or not (we can also run multiple seeds from a bash script or something).

@ValarDragon
Contributor

Not sure why we need this in Go; running multiple simulations in parallel within bash works fine on my machine.

@cwgoes
Contributor Author

cwgoes commented Sep 12, 2018

> Not sure why we need this in Go; running multiple simulations in parallel within bash works fine on my machine.

Immediately, probably not, but I could see it being advantageous in the future - running multiple instances of different Gaia apps during version upgrades, or maybe multiple "Apps" at once for a very fast IBC relay program.

@ValarDragon
Contributor

I think we can punt figuring out how to test multiple chains together safely to post-launch. I think we should go with parallel calls within CI / bash for now.

Testing on my system as well, your analysis seems right; it's probably a singleton map instance somewhere.

@codecov

codecov bot commented Sep 12, 2018

Codecov Report

Merging #2313 into develop will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff            @@
##           develop    #2313   +/-   ##
========================================
  Coverage    61.53%   61.53%           
========================================
  Files          122      122           
  Lines         7472     7472           
========================================
  Hits          4598     4598           
  Misses        2554     2554           
  Partials       320      320

@cwgoes
Contributor Author

cwgoes commented Sep 12, 2018

> I think we can punt figuring out how to test multiple chains together safely to post-launch. I think we should go with parallel calls within CI / bash for now.

OK, I'm not entirely sure where the issue is so it might not be an easy fix - I still think we should find out though and make an informed decision.

Switched to a bash script for now.
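
For readers following the thread, here is a minimal sketch of how a bash-side multi-seed runner could be structured. It is not the script merged in this PR: `run_sim`, `run_seeds_parallel`, and the per-seed output file names are hypothetical, and `run_sim` is a stand-in that fails for seed 3 so the failure path is visible. In a real script its body would be the `go test` invocation with `-SimulationSeed=$seed`.

```shell
# Hypothetical stand-in for one simulation run; replace the body with the
# real `go test ... -SimulationSeed=$1` command. Seed 3 fails on purpose.
run_sim() {
  [ "$1" -ne 3 ]
}

# Start one background job per seed, then wait for each and report the result.
# Usage: run_seeds_parallel <output-dir> <seed> [<seed> ...]
run_seeds_parallel() {
  tmpdir="$1"; shift
  started=""
  for seed in "$@"; do
    # Each run is a separate process writing to its own file.
    run_sim "$seed" > "$tmpdir/seed-$seed.stdout" 2>&1 &
    started="$started $!:$seed"
  done
  failed=0
  for job in $started; do
    pid="${job%%:*}"
    seed="${job#*:}"
    if wait "$pid"; then
      echo "seed $seed: ok"
    else
      echo "seed $seed: FAILED"
      failed=1
    fi
  done
  return "$failed"
}
```

Calling `run_seeds_parallel "$(mktemp -d)" 1 2 3 5 8` returns non-zero if any seed failed. Because every seed runs in its own process, this layout sidesteps the shared-state problem discussed above rather than fixing it.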

@cwgoes changed the title from "WIP: Multi-seed parallel simulation" to "R4R: Multi-seed parallel simulation" Sep 26, 2018
@cwgoes
Contributor Author

cwgoes commented Sep 26, 2018

This should be run as the "24-hour simulation" before cutting a release, possibly with more seeds / a higher number of blocks - and on a large multi-core machine.

@alexanderbez
Contributor

alexanderbez left a comment

utACK -- just left a minor comment. Side note, are we not afraid we'll miss out on some potential bug discoveries by not using random seeds or will the fast sim use random seeds?

echo "Running full Gaia simulation with seed $seed. This may take awhile!"
file="$tmpdir/gaia-simulation-seed-$seed-date-$(date -Iseconds -u).stdout"
echo "Writing stdout to $file..."
go test ./cmd/gaia/app -run TestFullGaiaSimulation -SimulationEnabled=true -SimulationNumBlocks=1000 -SimulationVerbose=true -SimulationCommit=true -SimulationSeed=$seed -v -timeout 24h > $file

Contributor

Can we break this command across multiple lines with \ to make it easier to read?

Contributor Author

Yes, done.
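
The thread does not show the reformatted command, so the following is only a sketch of how the line under review might look once split with `\` continuations. The flags are copied from the diff above; the `run_full_sim` wrapper and its two parameters are hypothetical, added so the command can be shown in isolation, and the variables are quoted as a precaution.

```shell
# Hypothetical wrapper around the command from the diff, one flag per line.
# $1 is the seed, $2 is the file that receives stdout.
run_full_sim() {
  seed="$1"
  file="$2"
  go test ./cmd/gaia/app \
    -run TestFullGaiaSimulation \
    -SimulationEnabled=true \
    -SimulationNumBlocks=1000 \
    -SimulationVerbose=true \
    -SimulationCommit=true \
    -SimulationSeed="$seed" \
    -v -timeout 24h > "$file"
}
```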

@cwgoes
Contributor Author

cwgoes commented Sep 26, 2018

> Side note, are we not afraid we'll miss out on some potential bug discoveries by not using random seeds or will the fast sim use random seeds?

We should have a separate script which sequentially runs simulations with new random seeds and stops if it finds a failing seed - let's do that separately though - #2409.
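
A rough sketch of the kind of script described here, under stated assumptions: it draws a fresh random seed per run, runs the simulations one after another, and stops at the first failing seed. It is not the script tracked in #2409. `next_seed`, `run_sim`, and `find_failing_seed` are hypothetical names, and `run_sim` is a placeholder that always succeeds; the real body would call `go test` with `-SimulationSeed=$1`.

```shell
# Draw a 32-bit unsigned seed from /dev/urandom.
next_seed() {
  od -An -N4 -tu4 /dev/urandom | tr -d ' \n'
}

# Placeholder for one simulation run; replace with the real go test call.
run_sim() {
  return 0
}

# Run up to $1 simulations sequentially, each with a new random seed.
# Prints the first failing seed and returns 1, or returns 0 if all pass.
find_failing_seed() {
  max_runs="$1"
  i=0
  while [ "$i" -lt "$max_runs" ]; do
    seed=$(next_seed)
    if ! run_sim "$seed"; then
      echo "$seed"
      return 1
    fi
    i=$((i + 1))
  done
  return 0
}
```

Printing the failing seed keeps the failure reproducible: the same value can be passed back as a fixed seed, which preserves the deterministic behaviour preferred for CI earlier in the thread.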

@cwgoes merged commit 91ee6b0 into develop Sep 26, 2018
@cwgoes deleted the cwgoes/multi-seed-parallel-sim branch September 26, 2018 15:04