
How to do detailed integration testing and benchmarking of built-in actors – import ref-fvm? #678

Closed
anorth opened this issue Sep 19, 2022 · 9 comments

Comments

@anorth
Member

anorth commented Sep 19, 2022

This discussion is branched from filecoin-project/ref-fvm#888, which discusses gathering metrics on the EVM actor storage specifically. Such costs are probably dominated by storage access (which is easy to account for with mock/test runtimes), but they reasonably want to profile the whole gas cost too. This is likely to be useful for future built-in actor development in general, and could also serve as an example to other actor developers.

One option is to

Add a dev-dependency on ref-fvm and call the contract through Wasm to run all the gas cost machinery.

What are the tradeoffs of this option? What are the alternatives?

@ZenGround0 @Stebalien @raulk @Kubuxu @aakoshh

@anorth
Member Author

anorth commented Sep 19, 2022

Some challenges with importing ref-fvm into builtin-actors as a dev dependency:

  • Loss of abstraction. It's supposed to be an abstract machine. It might be worth it, but my starting point is to resist.
  • Added build-time cost of building a whole VM in addition to the actors.
  • Friction of depending on tooling that's in another repo and versioning scheme. I'm continually running into utility code that I want to change for actor development but that's in the ref-fvm repo. The originating issue prompted me to discover there's already a TrackingBlockstore in fvm_ipld_blockstore, but it can't be dropped into the mock runtime as-is, and can't be modified easily from the actors side. We need a whole FVM release train to make tweaks. The obvious path forward is to copy and customise it to builtin-actors' needs. These issues will increase if we depend on more such code.
  • Complexity of synchronising versions here. It's already proven challenging to import consistent versions of the various deps that builtin-actors have on crates like fvm-shared. This compounds with the above point, where pulling in new utility/dev code might also trigger a wholesale upgrade of unrelated APIs. I would generally aim to break these dependencies rather than add to them.
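For illustration, here is a minimal sketch of what such tracking does (hypothetical names and a simplified interface, not the real fvm_ipld_blockstore API): a store wrapper that counts operations and bytes moved, which is the kind of hook a mock runtime needs for storage profiling.

```rust
use std::cell::RefCell;
use std::collections::HashMap;

// Stand-in for a real CID; illustrative only.
type Cid = u64;

#[derive(Default, Debug)]
struct Stats {
    reads: usize,
    writes: usize,
    bytes_read: usize,
    bytes_written: usize,
}

// A toy in-memory blockstore that records access statistics as it goes.
#[derive(Default)]
struct TrackingStore {
    blocks: RefCell<HashMap<Cid, Vec<u8>>>,
    stats: RefCell<Stats>,
}

impl TrackingStore {
    fn get(&self, k: &Cid) -> Option<Vec<u8>> {
        let data = self.blocks.borrow().get(k).cloned();
        let mut s = self.stats.borrow_mut();
        s.reads += 1;
        s.bytes_read += data.as_ref().map_or(0, |d| d.len());
        data
    }

    fn put(&self, k: Cid, data: Vec<u8>) {
        let mut s = self.stats.borrow_mut();
        s.writes += 1;
        s.bytes_written += data.len();
        self.blocks.borrow_mut().insert(k, data);
    }
}

fn main() {
    let store = TrackingStore::default();
    store.put(1, vec![0u8; 32]);
    store.get(&1);
    let s = store.stats.borrow();
    assert_eq!((s.reads, s.writes, s.bytes_read, s.bytes_written), (1, 1, 32, 32));
}
```

A wrapper like this is small, which is why copying and customising it on the actors side is tempting despite the duplication.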

Considerations like the above motivated the development of the testing VM in specs-actors (rather than depending on Lotus), which worked great. This is basically a fully-functional VM with testing hooks and fake chain data. This was mostly ported to builtin-actors.

A key difference in this case is that the Wasm build and execution are necessary for accurately profiling gas consumption.

One possible alternative path here is to build an integration VM in the builtin-actors repo that executes Wasm, i.e. replicate a skeleton FVM. A possible objection is that it would be wasted effort to "duplicate" that code, but that argument applied to the Go case too, and implementing a thin VM specialised to testing turned out to be easier than expected, and a great investment.

Note: I am not committed to, nor hard against, any approach at the moment, but I think it's worth seriously weighing options before embedding one.

@Stebalien
Member

Such costs are probably dominated by storage access (which is easy to account for with mock/test runtimes) but they reasonably want to profile the whole gas cost too.

In this case, probably. But really, the goal is to find the right balance so we need to benchmark both runtime and storage costs.

In general, the idea that storage costs dominate no longer holds in many cases. This is where we ran into trouble in the nv17 upgrade.

Added build-time cost of building a whole VM in addition to the actors.

Ideally, benchmarks would happen in a separate job and wouldn't block merge. We could even consider running them (or some of them) on master/next directly rather than on every PR.

Also note: the builds would (should) be heavily cached.

Loss of abstraction. It's supposed to be an abstract machine. It might be worth it, but my starting point is to resist.

The alternative is to implement another VM with the same gas semantics. This worked before because the VM was relatively simple, but the VM and gas model are now more complex and will only get more complex.

I'd really like to have multiple implementations, but that's not really something any of us have time to make happen.

Friction of depending on tooling that's in another repo and versioning scheme.

I agree that this is really frustrating. We should try to keep the shared dependencies to a minimum.

Unfortunately, we do have a shared blockstore interface because we share the HAMT/AMT data types. One solution (actually, something we had in the past) would be to:

  1. Define a blockstore trait just for the HAMT/AMT.
  2. Implement that blockstore trait for the FVM's blockstore trait, and the actor's blockstore trait.
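The two steps above can be sketched as follows (illustrative trait and type names only, not the real crates' APIs): the HAMT/AMT crate owns a minimal blockstore trait, and a blanket impl adapts any store satisfying the FVM-side trait to it.

```rust
use std::cell::RefCell;
use std::collections::HashMap;

// Step 1: a minimal blockstore trait owned by the HAMT/AMT crate.
trait HamtBlockstore {
    fn get_block(&self, cid: &[u8]) -> Option<Vec<u8>>;
    fn put_block(&self, cid: &[u8], data: &[u8]);
}

// Pretend this is the FVM's own blockstore trait.
trait FvmBlockstore {
    fn read(&self, cid: &[u8]) -> Option<Vec<u8>>;
    fn write(&self, cid: &[u8], data: &[u8]);
}

// Step 2: a blanket adapter so anything implementing the FVM trait
// also works for the HAMT/AMT (a symmetric impl would cover the
// actor-side trait).
impl<T: FvmBlockstore> HamtBlockstore for T {
    fn get_block(&self, cid: &[u8]) -> Option<Vec<u8>> {
        self.read(cid)
    }
    fn put_block(&self, cid: &[u8], data: &[u8]) {
        self.write(cid, data)
    }
}

// A toy in-memory store implementing only the FVM-side trait.
#[derive(Default)]
struct MemStore(RefCell<HashMap<Vec<u8>, Vec<u8>>>);

impl FvmBlockstore for MemStore {
    fn read(&self, cid: &[u8]) -> Option<Vec<u8>> {
        self.0.borrow().get(cid).cloned()
    }
    fn write(&self, cid: &[u8], data: &[u8]) {
        self.0.borrow_mut().insert(cid.to_vec(), data.to_vec());
    }
}

fn main() {
    let s = MemStore::default();
    // Used through the HAMT-facing trait via the adapter.
    s.put_block(b"k", b"v");
    assert_eq!(s.get_block(b"k"), Some(b"v".to_vec()));
}
```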

However, we moved away from that because it quickly got confusing (and matching the abstractions had quite a few rough edges).

Also note: this isn't really relevant to this issue, from what I can tell. We'd be using the integration testing framework as a black-box in the benchmark suite.

Complexity of synchronising versions here.

The key here is isolation. The ref-fvm dependency must be isolated to the benchmarks. If we do that, it shouldn't affect anything else.

@Stebalien
Member

Note: if/when the FVM stabilizes a bit more, we'll be in a better place to re-implement simpler versions. But at the moment, I think we'd spend too much time trying to suss out subtle differences.

@anorth
Member Author

anorth commented Sep 19, 2022

Note on build time: I'm more concerned about local dev than CI. If we can reliably ensure that the FVM isn't built when running common testing commands (make test, cargo test --workspace, etc.) and only when specifically invoked, then this will be reasonably mitigated. But I confess I'm not confident.
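One way to get that guarantee is to keep the FVM-backed benchmarks in their own crate, excluded from the workspace entirely, so no workspace-wide command ever builds ref-fvm. A sketch of the top-level Cargo.toml, assuming a hypothetical integration-bench crate (the member paths are illustrative):

```toml
[workspace]
members = ["actors/*", "runtime", "test_vm"]
# Excluded crates are invisible to `cargo test --workspace` and friends;
# they build only when invoked explicitly.
exclude = ["integration-bench"]
```

The excluded crate is then built only via something like cargo test --manifest-path integration-bench/Cargo.toml, or a dedicated make target wrapping it.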

@Stebalien
Member

We can exclude it from the make test target, and benchmarks won't be run on cargo test.

We can also exclude tests by default with #[ignore] (passing cargo test -- --include-ignored to also run ignored tests).
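For concreteness, a hypothetical example of gating an expensive FVM-backed test this way (the function names are illustrative): plain cargo test skips it, while cargo test -- --include-ignored runs it alongside everything else, and cargo test -- --ignored runs only the ignored set.

```rust
// Stand-in for booting the FVM and executing an actor; illustrative only.
fn run_fvm_gas_profile() -> u64 {
    42
}

#[test]
#[ignore = "builds and boots the full FVM; run with -- --include-ignored"]
fn profiles_gas_via_fvm() {
    assert_eq!(run_fvm_gas_profile(), 42);
}

fn main() {
    // Outside the test harness the helper can still be invoked directly.
    assert_eq!(run_fvm_gas_profile(), 42);
}
```

Note that #[ignore] only skips execution: the test is still compiled as part of the test binary, which is why keeping the heavy dependency in a separate crate matters for build time.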

@anorth
Member Author

anorth commented Sep 20, 2022

Yeah, but will benchmarks or #[ignore]d tests still be built (thus building the FVM), e.g. when running individual unit tests from an IDE?

@aakoshh

aakoshh commented Sep 20, 2022

My impression of gas was that it should be subject to continuous estimation, exercising a diverse set of contracts to calibrate prices to approximate execution time. It will probably never be perfect, not least because it's done on an idealised configuration that validators may or may not be running.

In the case of the built-in actors in the linked issue, we are talking about the very specific storage layout strategies the Solidity compiler uses, which are at odds with the HAMT. We have a chance to measure the actual underlying variables of storage access by tracking the number and volume of the data blocks we read and write; using an in-memory backend deliberately ignores time.

I would argue that using gas here would be muddying the water by applying coefficients on these values and making it harder to understand if the high cost comes from bad pricing or inefficiencies in the data structure.

Gas may be doubly confusing when it comes to storage if we think about the concept of rent, i.e. whether storage should be charged not just for access time but also for the fact that it will occupy the disk for a long time, and the related refund for deletions.

Long story short, I think it's okay to treat at least this instance of the EVM built-in actor in isolation to minimise the main cost drivers, and look at gas calibration in the context of all contracts in the ref-fvm repo, which already has some example contracts for integration testing.

@aakoshh

aakoshh commented Sep 20, 2022

On the other hand if I were developing a specific contract where I know exactly what my access patterns are going to be from the domain, but I don't know what would be the best storage schema for it, it would be great to be able to use the actual gas prices to tune the model. There are certain tradeoffs that only the contract authors can make, and to them it's easier to present a gas cost breakdown by labels than fragmented sets of variables from storage and other areas that track them in ad-hoc ways.

@anorth
Member Author

anorth commented Jan 22, 2024

I believe this is answered by the repository anorth/fvm-workbench. We have used this to profile gas usage on the FVM multiple times, and we can run many of the existing integration tests directly on the FVM.

(The crates in this repo will probably be moved into filecoin-project repositories over time).

@anorth anorth closed this as completed Jan 22, 2024