How to do detailed integration testing and benchmarking of built-in actors – import ref-fvm? #678
Some challenges with importing ref-fvm into builtin-actors as a dev dependency:
Considerations like the above motivated the development of the testing VM in specs-actors (rather than depending on Lotus), which worked great. It is essentially a fully functional VM with testing hooks and fake chain data, and it was mostly ported to builtin-actors. A key difference in this case is that WASM build and execution are necessary to accurately profile gas consumption. One possible alternative is to build an integration VM in the builtin-actors repo that executes WASM, i.e., replicate a skeleton FVM. One might object that it would be a waste of effort to "duplicate" that code. However, the same argument applied to the Go case, and implementing a thin VM specialised for testing turned out to be easier than expected and a great investment. Note: I am neither committed to nor firmly against any approach at the moment, but I think it's worth seriously weighing the options before settling on one.
In this case, probably. But really, the goal is to find the right balance, so we need to benchmark both runtime and storage costs. In general, the assumption that storage costs dominate no longer holds in many cases; this is where we ran into trouble in the nv17 upgrade.
Ideally, benchmarks would run in a separate job and wouldn't block merges. We could even consider running them (or some of them) on master/next directly rather than on every PR. Also note: the builds would (should) be heavily cached.
The alternative is to implement another VM with the same gas semantics. This worked before because the VM was relatively simple, but the VM and gas model are now more complex and will only get more complex. I'd really like to have multiple implementations, but that's not really something any of us have time to make happen.
I agree that this is really frustrating. We should try to keep the shared dependencies to a minimum. Unfortunately, we do have a shared blockstore interface because we share the HAMT/AMT data types. One solution (actually, something we had in the past) would be to:
However, we moved away from that because it quickly got confusing (and matching the abstractions had quite a few rough edges). Also note: from what I can tell, this isn't really relevant to this issue. We'd be using the integration testing framework as a black box in the benchmark suite.
The key here is isolation. The ref-fvm dependency must be isolated to the benchmarks. If we do that, it shouldn't affect anything else.
Note: if/when the FVM stabilizes a bit more, we'll be in a better place to re-implement simpler versions. But at the moment, I think we'd spend too much time trying to suss out subtle differences.
Note on build time: I'm more concerned about local dev than CI. If we can reliably ensure that FVM isn't built when running common testing commands (
We can exclude it from the We can also exclude tests by default with
Yeah, but will benchmarks or
My impression of gas was that it should be subject to continuous estimation, exercising a diverse set of contracts to calibrate prices so that they approximate execution time. It will probably never be perfect, not least because it's done on an idealised configuration that validators may or may not be running.

In the case of the built-in actor in the linked issue, we are talking about very specific storage layout strategies used by the Solidity compiler, which are at odds with the HAMT. We have a chance to measure the actual underlying storage-access variables by tracking the number and volume of the data blocks we read and write. This ignores time, because it uses an in-memory backend. I would argue that using gas here would muddy the waters: it applies coefficients to these values and makes it harder to tell whether a high cost comes from bad pricing or from inefficiencies in the data structure. Gas may be doubly confusing for storage if we consider rent, i.e., whether storage should be charged not just for access time but also for occupying the disk for a long time, with a related refund for deletions.

Long story short, I think it's okay to treat at least this instance (the EVM built-in actor) in isolation to minimise the main cost drivers, and to look at gas calibration in the context of all contracts in the ref-fvm repo, which already has some example contracts for integration testing.
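To make the "number and volume of data blocks" measurement concrete, here is a minimal Rust sketch of an in-memory blockstore that counts reads, writes, and bytes moved. The `Blockstore` trait, `StorageStats`, and `TrackingMemoryBlockstore` are hypothetical, simplified stand-ins. They use byte-slice keys instead of CIDs and do no error handling, so they are not the actual `fvm_ipld_blockstore` API.

```rust
use std::cell::RefCell;
use std::collections::HashMap;

/// Hypothetical, simplified blockstore interface (illustration only; the real
/// FVM blockstore trait is keyed by CIDs and returns Results).
pub trait Blockstore {
    fn get(&self, key: &[u8]) -> Option<Vec<u8>>;
    fn put(&self, key: &[u8], block: &[u8]);
}

/// Raw storage variables: how many blocks were touched and how many bytes moved.
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
pub struct StorageStats {
    pub reads: u64,
    pub read_bytes: u64,
    pub writes: u64,
    pub write_bytes: u64,
}

/// In-memory backend that records every access; no timing, no gas coefficients.
#[derive(Default)]
pub struct TrackingMemoryBlockstore {
    blocks: RefCell<HashMap<Vec<u8>, Vec<u8>>>,
    stats: RefCell<StorageStats>,
}

impl TrackingMemoryBlockstore {
    pub fn new() -> Self {
        Self::default()
    }

    /// Snapshot of the counters accumulated so far.
    pub fn stats(&self) -> StorageStats {
        *self.stats.borrow()
    }
}

impl Blockstore for TrackingMemoryBlockstore {
    fn get(&self, key: &[u8]) -> Option<Vec<u8>> {
        let found = self.blocks.borrow().get(key).cloned();
        let mut stats = self.stats.borrow_mut();
        // Every lookup counts as a read; only hits contribute bytes.
        stats.reads += 1;
        if let Some(block) = &found {
            stats.read_bytes += block.len() as u64;
        }
        found
    }

    fn put(&self, key: &[u8], block: &[u8]) {
        let mut stats = self.stats.borrow_mut();
        stats.writes += 1;
        stats.write_bytes += block.len() as u64;
        self.blocks.borrow_mut().insert(key.to_vec(), block.to_vec());
    }
}

fn main() {
    let bs = TrackingMemoryBlockstore::new();
    bs.put(b"root", &[0u8; 32]);
    bs.put(b"leaf", &[1u8; 100]);
    let _ = bs.get(b"root");
    let _ = bs.get(b"missing");
    println!("{:?}", bs.stats());
}
```

Because the backend lives in memory, the counters capture only access counts and volume. This keeps data-structure inefficiencies separate from pricing, as argued above.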
On the other hand, if I were developing a specific contract where I know exactly what my access patterns will be from the domain, but I don't know the best storage schema for it, it would be great to be able to use the actual gas prices to tune the model. There are certain tradeoffs that only contract authors can make, and for them a gas cost breakdown by label is easier to work with than fragmented sets of variables from storage and other areas that track them in ad-hoc ways.
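As a rough illustration of a gas breakdown by label, the sketch below accumulates charges under labels and compares two hypothetical storage schemas for the same 320 bytes of state: one large block versus ten small ones. `LabelledGasTracker`, `charge_reads`, the label names, and all prices are invented for illustration. They are not the FVM gas API or its price list.

```rust
use std::collections::BTreeMap;

// Illustrative prices only (hypothetical, NOT the FVM price list).
const READ_BASE: u64 = 1_000;
const READ_PER_BYTE: u64 = 10;
const CALL_COMPUTE: u64 = 500;

/// Hypothetical tracker that records every gas charge under a label.
#[derive(Debug, Default)]
struct LabelledGasTracker {
    total: u64,
    by_label: BTreeMap<&'static str, u64>,
}

impl LabelledGasTracker {
    /// Adds `amount` to the running total and to the bucket for `label`.
    fn charge(&mut self, label: &'static str, amount: u64) {
        self.total += amount;
        *self.by_label.entry(label).or_insert(0) += amount;
    }

    fn total(&self) -> u64 {
        self.total
    }

    /// Returns (label, gas, share of total), ordered by label name.
    fn breakdown(&self) -> Vec<(&'static str, u64, f64)> {
        self.by_label
            .iter()
            .map(|(&label, &gas)| {
                let share = if self.total == 0 { 0.0 } else { gas as f64 / self.total as f64 };
                (label, gas, share)
            })
            .collect()
    }
}

/// Charges one read per block: a fixed base cost plus a per-byte cost.
fn charge_reads(tracker: &mut LabelledGasTracker, block_sizes: &[u64]) {
    for &len in block_sizes {
        tracker.charge("storage_read", READ_BASE + READ_PER_BYTE * len);
    }
}

fn main() {
    // Schema A: the state stored in a single 320-byte block.
    let mut one_big = LabelledGasTracker::default();
    one_big.charge("compute", CALL_COMPUTE);
    charge_reads(&mut one_big, &[320]);

    // Schema B: the same state split across ten 32-byte blocks.
    let mut many_small = LabelledGasTracker::default();
    many_small.charge("compute", CALL_COMPUTE);
    charge_reads(&mut many_small, &[32; 10]);

    for (name, tracker) in [("one_big", &one_big), ("many_small", &many_small)] {
        println!("{}: total {}", name, tracker.total());
        for (label, gas, share) in tracker.breakdown() {
            println!("  {:<14} {:>7} ({:.0}%)", label, gas, share * 100.0);
        }
    }
}
```

With a per-block base cost, the many-small-blocks schema pays the base charge ten times. The breakdown surfaces this directly under `storage_read`, which is the kind of signal a contract author could act on when choosing a schema.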
I believe this is answered by the anorth/fvm-workbench repository. We have used it to profile gas usage on the FVM multiple times, and we can run many of the existing integration tests directly on the FVM. (The crates in this repo will probably be moved into filecoin-project repositories over time.)
This discussion branched from filecoin-project/ref-fvm#888, which discusses gathering metrics on EVM actor storage specifically. Such costs are probably dominated by storage access (which is easy to account for with mock/test runtimes), but they reasonably want to profile the whole gas cost too. This is very likely to be useful for future built-in actor development in general, and it could also serve as an example for other actor developers.
One option is to
What are the tradeoffs of this option? What are the alternatives?
@ZenGround0 @Stebalien @raulk @Kubuxu @aakoshh