Add network integration test for quick post sapling sync testing #1229
Conversation
I'd like to document the alternatives we've considered, so we retain this knowledge for when we modify these tests in the future. Can we write a narrowly-scoped design RFC focusing on restartable tests with cached state? (I don't have all the context here, so I'd need pointers to tickets, PRs, or other discussions.)
All of this is infra, work, and setup that can be handled for us by our cloud provider.
That's to be expected. This was just me trying to set something up that would download the chain, but that takes around 8 hours, so it's definitely not going to work from within a GitHub Actions job without the cached data.
Right, so this should be behind a flag or environment variable.
We're already using an env var for gating tests, so I think another env var that enables the cached data tests would be a good idea. Can we commit a zebrad config file for the initial cached data setup? There isn't much point in doing the cached data setup inside the acceptance test framework.
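As a minimal sketch of the env-var gating suggested above (the variable name `ZEBRA_CACHED_STATE_PATH` is a hypothetical placeholder, not an existing zebrad variable):

```rust
use std::env;
use std::path::PathBuf;

/// Returns the cached state directory if the hypothetical
/// `ZEBRA_CACHED_STATE_PATH` env var is set. Tests that need cached
/// data can return early when this is `None`.
fn cached_state_path() -> Option<PathBuf> {
    env::var_os("ZEBRA_CACHED_STATE_PATH").map(PathBuf::from)
}

fn main() {
    match cached_state_path() {
        Some(path) => println!("running cached-data test against {}", path.display()),
        None => println!("skipping cached-data test: ZEBRA_CACHED_STATE_PATH not set"),
    }
}
```

An early `return` in the test body keeps the test green on runners without cached data, while CI machines that export the variable run the full sync check.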
I don't agree on this point. There are advantages to doing it alongside our acceptance tests, or at least within Rust code. We have a reasonably well-polished framework for creating tests, interacting with the zebrad daemon, and inspecting its output, and keeping all of our tests in one place and one style reduces the maintenance burden. The issues with logging are not inherent to the test framework, and they're not difficult to bypass. Alternatively, we could move them from tests to binaries or examples, which would let us keep the same structure while removing the test framework's stderr/stdout capturing from the equation. But we'd need some code reorganization in that case, which is why I'd prefer to just write them as acceptance tests.
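One way to bypass the output-capturing issue mentioned above is to spawn the daemon with inherited stdio, so its logs go straight to the terminal rather than through the test harness. A minimal sketch (using `echo` as a stand-in for the real zebrad binary):

```rust
use std::process::{Command, Stdio};

fn main() -> std::io::Result<()> {
    // Spawn a child process (stand-in for zebrad) with inherited stdio,
    // so the test harness's output capturing never sees its logs.
    let status = Command::new("echo")
        .arg("zebrad stand-in: logs go straight to the terminal")
        .stdout(Stdio::inherit())
        .stderr(Stdio::inherit())
        .status()?;
    assert!(status.success());
    Ok(())
}
```

`Stdio::inherit()` is the key choice: child output bypasses the pipes that `cargo test` would otherwise capture.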
What tests do we have that we're ignoring but still running in CI? My feeling is that we should use ignored tests sparingly.
Motivation
This change implements the second of three tests described in the basic network integration testing RFC. This test is meant to act as a quick check of sync behavior immediately after checkpointing ends.
Solution
This test is implemented by persisting a copy of the database that has been synced to the Sapling checkpoint height. That snapshot is then used as the starting point to sync the next few thousand blocks immediately after Sapling activation.
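A minimal sketch of the snapshot-restore step, assuming the cached state lives in an ordinary directory (the directory names are placeholders, not the real layout):

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Recursively copy a cached state directory into a fresh per-test
/// directory, so each run starts from the Sapling-height snapshot
/// without mutating the shared copy.
fn copy_state(src: &Path, dst: &Path) -> io::Result<()> {
    fs::create_dir_all(dst)?;
    for entry in fs::read_dir(src)? {
        let entry = entry?;
        let to = dst.join(entry.file_name());
        if entry.file_type()?.is_dir() {
            copy_state(&entry.path(), &to)?;
        } else {
            fs::copy(entry.path(), &to)?;
        }
    }
    Ok(())
}

fn main() -> io::Result<()> {
    // Hypothetical layout: the snapshot lives in ./cached-sapling-state
    // and each test run gets its own working copy.
    // copy_state(Path::new("cached-sapling-state"), Path::new("test-state"))?;
    Ok(())
}
```

If the snapshot doesn't exist yet, the test would instead sync from scratch to the Sapling height and write the snapshot for future runs.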
Related Issues
Unresolved Questions
This design is currently an incomplete sketch of just the initial pieces of the final integration test: syncing to a certain point, backing up the database, and starting from the backup if one already exists. This doesn't help us on our CI as it is currently implemented. GitHub Actions and Google Cloud Build both have little to no support for persistent data on test runners, so we would have to copy the entire database over the network before each test run. There are a couple of solutions that I'm currently investigating:
VM + Custom Job Queue
Set up a persistent VM to run these tests, plus some basic CI infrastructure that can dispatch and queue test jobs to that VM. The VM would own the persisted copy of the up-to-Sapling database. Test jobs would be sent from GitHub Actions and/or Google Cloud Build, telling the remote VM to download the precompiled zebrad image and run the test. Once the test completes, the VM would reply to the runner that requested the job, passing back the test results and letting its process exit.
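To make the dispatch idea concrete, here is a tiny sketch of the job message the CI runner might send to the VM's queue worker. Everything here — the struct, field names, and the command the worker runs — is an assumption for illustration, not existing infrastructure:

```rust
/// Hypothetical test-job message sent from GitHub Actions / Cloud Build
/// to the persistent VM's queue worker.
struct TestJob {
    zebrad_image_url: String,
    test_name: String,
}

impl TestJob {
    /// Render the (hypothetical) shell command the queue worker would run:
    /// download the precompiled zebrad image, then run the named test.
    fn command(&self) -> String {
        format!(
            "curl -sLO {url} && ./zebrad-test --test {name}",
            url = self.zebrad_image_url,
            name = self.test_name,
        )
    }
}

fn main() {
    let job = TestJob {
        zebrad_image_url: "https://example.com/zebrad-abc123.tar.gz".into(),
        test_name: "sync_past_sapling_mainnet".into(),
    };
    println!("{}", job.command());
}
```

The worker would run jobs one at a time, so the single cached database copy is never contended.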
Custom FUSE
Alternatively, we might be able to set up a simple FUSE filesystem that we can mount in our GitHub Actions or Google Cloud Build containers. The FUSE would implement only the minimum API needed to represent the single database file used by sled, with copy-on-write semantics: if sled reads a block that hasn't been updated locally, that block is read from the database file over the network; if sled writes to an existing block or writes a new one, the new block data is stored in memory or in another local file, rather than being written back to the database file on the network.
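The copy-on-write logic described above, stripped of any actual FUSE plumbing, can be sketched as an overlay map in front of a remote fetch (the block size and `fetch_remote` placeholder are assumptions):

```rust
use std::collections::HashMap;

const BLOCK_SIZE: usize = 4096; // assumed block size

/// Sketch of the copy-on-write layer: writes land in a local overlay,
/// reads prefer the overlay and otherwise fall back to a remote fetch.
/// `fetch_remote` stands in for the network read of the shared database.
struct CowFile<F: Fn(u64) -> Vec<u8>> {
    overlay: HashMap<u64, Vec<u8>>,
    fetch_remote: F,
}

impl<F: Fn(u64) -> Vec<u8>> CowFile<F> {
    /// Writes never touch the shared remote file; they go to the overlay.
    fn write_block(&mut self, index: u64, data: Vec<u8>) {
        self.overlay.insert(index, data);
    }

    /// Reads are served locally when possible, remotely otherwise.
    fn read_block(&self, index: u64) -> Vec<u8> {
        match self.overlay.get(&index) {
            Some(local) => local.clone(),
            None => (self.fetch_remote)(index),
        }
    }
}

fn main() {
    let mut file = CowFile {
        overlay: HashMap::new(),
        fetch_remote: |_| vec![0u8; BLOCK_SIZE], // pretend network read
    };
    file.write_block(7, vec![1u8; BLOCK_SIZE]);
    assert_eq!(file.read_block(7)[0], 1); // served from the local overlay
    assert_eq!(file.read_block(8)[0], 0); // falls back to the remote fetch
}
```

A real FUSE implementation would expose this through read/write callbacks, but the overlay-then-fetch decision is the whole trick.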
Edit: I've been doing more research on this option. It looks like all we'd need is a webserver to store the synced sled database files; we could then read pieces of those files with HTTP range requests (https://developer.mozilla.org/en-US/docs/Web/HTTP/Range_requests). We'd then have a binary that we run in CI to mount the FUSE in the tempdir, and depending on which parts of the database file it needs to read, it would either use locally cached data (from previous writes) or do an HTTP fetch to get the piece of the database it needs over the network for first-time reads of old data.
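Mapping a database block to the corresponding HTTP `Range` header is straightforward; a small sketch (the block size is an assumption, and note that `Range` byte offsets are inclusive):

```rust
const BLOCK_SIZE: u64 = 4096; // assumed block size

/// Build the HTTP `Range` header for one database block, for use
/// against a plain webserver hosting the synced sled files.
fn range_header(block_index: u64) -> String {
    let start = block_index * BLOCK_SIZE;
    let end = start + BLOCK_SIZE - 1; // Range end offsets are inclusive
    format!("Range: bytes={start}-{end}")
}

fn main() {
    assert_eq!(range_header(0), "Range: bytes=0-4095");
    assert_eq!(range_header(2), "Range: bytes=8192-12287");
}
```

The server answers such requests with `206 Partial Content`, so the FUSE binary only ever transfers the blocks sled actually reads.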