Replies: 2 comments 4 replies
Thanks for writing this up @kantai! A few thoughts below.
I just want to double-check this benchmark. It takes nearly a second to do a function call in the Clarity VM? As in, just calling a function in the same contract? Are there confounding factors in Clarinet that could be causing this, or are we certain that this is solely due to work done within the VM itself? Can you explain in more detail how this performance metric was obtained (i.e. where did you start/stop measuring, and what function(s) did you test)?
I'm very skeptical that the MARF is the bottleneck for VM performance. I'll have more data next week with reproducible experiments, but some preliminary measurements I took tonight suggest that MARF key/value-hash lookups are pretty fast (on the order of 10ms or lower). What I suspect is a major cause of slowness in the VM's storage layer is the Clarity sqlite DB, for a few reasons:
What this effectively means is that the bytecode spec and interpreter implementation are consensus-critical code paths as well. I point this out because an alternative (but naive and brittle) approach could be to support multiple bytecode back-ends from a consensus-critical Clarity IR as a means of supporting EVM, WASM, and other execution environments. I just want to make sure everyone reading this understands that this is not going to be supported, and should not be supported, because there is exactly one correct sequence of state-transitions in the VM's memory and disk state for any piece of Clarity code. Making sure that multiple bytecode back-ends do this would be extremely difficult. But that said, ...
Do we even care about EVM compatibility? We don't want TC smart contracts in the first place (so there's little gain in compatibility with existing EVM-compiled code or EVM-compiled languages), and the EVM isn't particularly well-designed to leverage real hardware (so it's not the performance win it could be). The former concern applies to WASM as well. Also, the bytecode VM (whatever it happens to be) will be somewhat tightly integrated with the Clarity DB and burnchain DB, due to the fact that we have many Clarity built-ins that load data about the blockchain itself. I'm not sure how much using an off-the-shelf VM helps us here, since the integration work will likely be substantial.
I think this is great in principle, but the devil will be in the details. We'll need to think long and hard about how |
I'd like to propose two major changes to the Clarity VM with the introduction of Clarity 3. The first of these is focused on performance improvements to the VM. The second of these, which is probably a necessary step toward the first, is focused on backwards-compatibility and safety.
Improving VM Performance
Fundamentally, the Clarity VM is an eval/apply mutually recursive interpreter. This is a very easy kind of interpreter to build and maintain. However, there are generally performance consequences to this kind of interpreter: the mutual recursion means that the VM isn't very stack friendly, there are costs associated with all of the necessary dispatching, and the interaction of recursion and the borrow checker means that the interface to eval/apply requires a fair amount of cloning. In addition, VM execution exhibits very little spatial locality because each AST node could reside in a different heap-allocated chunk. There's a good reason few production interpreters are implemented this way: it's slow.
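To make the eval/apply shape concrete, here is a minimal sketch in Rust. It is not the Clarity VM's code: Expr, Function, Env, eval, and apply are hypothetical names, and values are reduced to plain integers. It only illustrates the mutual recursion, the by-name dispatch, and the per-call copying mentioned above.

```rust
use std::collections::HashMap;

// Hypothetical, heavily simplified AST (not the real Clarity types).
// Every nested Vec and String is a separate heap allocation.
#[derive(Debug, Clone)]
enum Expr {
    Int(i128),
    Var(String),
    Call(String, Vec<Expr>), // (function-name arg1 arg2 ...)
}

struct Function {
    params: Vec<String>,
    body: Expr,
}

type Env = HashMap<String, i128>;

// eval: walk one AST node. Native stack depth grows with expression depth.
fn eval(expr: &Expr, env: &Env, funcs: &HashMap<String, Function>) -> Result<i128, String> {
    match expr {
        Expr::Int(n) => Ok(*n),
        Expr::Var(name) => env
            .get(name)
            .copied()
            .ok_or_else(|| format!("unbound variable: {}", name)),
        Expr::Call(name, args) => {
            let mut values = Vec::with_capacity(args.len());
            for arg in args {
                values.push(eval(arg, env, funcs)?);
            }
            apply(name, &values, funcs)
        }
    }
}

// apply: dispatch on the function name, then recurse back into eval
// for user-defined functions.
fn apply(name: &str, args: &[i128], funcs: &HashMap<String, Function>) -> Result<i128, String> {
    match name {
        "+" => args
            .iter()
            .try_fold(0i128, |acc, x| acc.checked_add(*x))
            .ok_or_else(|| "arithmetic overflow".to_string()),
        "*" => args
            .iter()
            .try_fold(1i128, |acc, x| acc.checked_mul(*x))
            .ok_or_else(|| "arithmetic overflow".to_string()),
        _ => {
            let f = funcs
                .get(name)
                .ok_or_else(|| format!("undefined function: {}", name))?;
            if f.params.len() != args.len() {
                return Err(format!("wrong argument count for {}", name));
            }
            // Parameter names are cloned into a fresh environment on every call.
            let env: Env = f.params.iter().cloned().zip(args.iter().copied()).collect();
            eval(&f.body, &env, funcs)
        }
    }
}
```

With a user-defined square function whose body is (* x x), evaluating (+ (square 7) 1) goes eval, apply, eval, apply, and so on, one native stack frame per step.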
To improve on Clarity's runtime, I'm proposing moving to bytecode interpretation. To be clear, contract publish transactions will continue to publish source code, and the source code would itself still be thought of as the consensus critical component: just as a bug today in the Clarity VM is a "consensus bug", so too would a bug in the bytecode compilation (and any bug in the subsequent bytecode execution).
This is not necessarily as dramatic a change as it sounds. The bytecode change could be as simple as a linearization of the AST, with the implementation impacting mostly the "special" Clarity functions (e.g., fold, filter, contract-call?) and, of course, the eval loop. In any event, the static analysis passes, which operate on the AST, could be largely unchanged.
However, this would also present a very good inflection point for deciding on targeting an existing bytecode language. Production of WASM or EVM compatible bytecode would present a clear pathway toward compatibility support for other blockchains.
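As a rough illustration of what "linearization of the AST" could mean, the sketch below flattens a tiny expression tree into a contiguous instruction vector using a post-order traversal, then executes it with a single loop over an explicit value stack. The Expr and Op types and the compile and run functions are hypothetical. The sketch says nothing about what a real Clarity bytecode format would look like, nor how the special functions would be encoded.

```rust
// Hypothetical AST and instruction set, for illustration only.
#[derive(Debug)]
enum Expr {
    Int(i128),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Op {
    Push(i128),
    Add,
    Mul,
}

// Post-order traversal: emit both operands, then the operator.
// This runs once, at compile time, rather than on every execution.
fn compile(expr: &Expr, code: &mut Vec<Op>) {
    match expr {
        Expr::Int(n) => code.push(Op::Push(*n)),
        Expr::Add(a, b) => {
            compile(a, code);
            compile(b, code);
            code.push(Op::Add);
        }
        Expr::Mul(a, b) => {
            compile(a, code);
            compile(b, code);
            code.push(Op::Mul);
        }
    }
}

// A flat loop over contiguous instructions: no recursion at run time,
// and the next instruction is always adjacent in memory.
fn run(code: &[Op]) -> Result<i128, String> {
    let mut stack: Vec<i128> = Vec::new();
    for op in code {
        match *op {
            Op::Push(n) => stack.push(n),
            Op::Add | Op::Mul => {
                let rhs = stack.pop().ok_or("stack underflow")?;
                let lhs = stack.pop().ok_or("stack underflow")?;
                let result = match *op {
                    Op::Add => lhs.checked_add(rhs),
                    _ => lhs.checked_mul(rhs),
                };
                stack.push(result.ok_or("arithmetic overflow")?);
            }
        }
    }
    // A well-formed program leaves exactly one value on the stack.
    match (stack.pop(), stack.is_empty()) {
        (Some(value), true) => Ok(value),
        _ => Err("malformed program".to_string()),
    }
}
```

In this sketch, the expression 2 + 3 * 4 compiles to Push(2), Push(3), Push(4), Mul, Add. The tree itself is untouched, which is why passes that operate on the AST would not need to change.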
I fully realize that this proposal is opening a big change in the Clarity VM; however, I think it's worthwhile to engage with this. Testing moderately complicated contracts such as https://github.com/hirosystems/stacks-pyth-bridge in Clarinet (where any MARF runtime overhead is eliminated, because it does not use a MARF) reveals function calls that take up to 800-900ms on modern hardware. While this may not be the major bottleneck right now (that is still MARF speed), it is likely to be the next bottleneck, and if the Stacks blockchain is going to support anywhere near the speeds of other blockchains, the Clarity VM will need to be an order of magnitude faster (at least). The obvious path to that is bytecode execution.
Black-Box Versioning
The Clarity 1 to Clarity 2 upgrade occurred in Stacks 2.1. Stacks 2.1 continues to support both Clarity versions. The implementation of this upgrade involves the same Clarity VM executing both Clarity 1 and Clarity 2 contracts. This is a somewhat risky implementation strategy, because the Clarity 2 changes must be made in such a way that existing Clarity 1 contract behavior does not change. While this is risky, the risk in Clarity 2 was somewhat limited due to the narrower scope of the Clarity 2 changes. The type system remains largely the same, with some tweaks to trait invocation (which did create a consensus bug in Stacks 2.2, necessitating a fix in Stacks 2.3); beyond that, the changes were limited to new functions and variables.
If more changes are proposed, however, the risk of inadvertently breaking backwards compatibility, or of introducing new bugs due to esoteric implementation requirements of a single VM handling three different contract versions, increases dramatically. For example, if the type system were to expand with new 256-bit integer types (or smaller integer types like 32/64), the impact on the codebase could be quite damaging: now we'd need to manage two incompatible type hierarchies, and check on every function call whether or not the current value matches the expected type hierarchy of the current contract version.
The alternative to this is a kind of black-box versioning. In a perfect world, the way this would work is that the blockstack_lib output in the stacks-blockchain repo would include two different clarity dependencies (e.g., "2.0.0" and "3.0.0"). The transaction handler would invoke the correct library depending on the version in a contract publish transaction, or based on the stored contract's version. There are some difficulties here. For example, contract-call? invocations would need to invoke a different VM. However, these difficulties would make the compatibility code explicit, and the boundaries between Clarity versions would therefore be well defined.
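A sketch of what that explicit boundary might look like follows. Everything in it is hypothetical: the clarity_v2 and clarity_v3 modules stand in for two separately versioned clarity crates (Cargo can pull in two versions of one package under different names via the package key in a dependency entry), the 64-bit integer variant is borrowed from the example above, and execute is a placeholder that just echoes its argument. The point is only that each VM keeps its own value type, and that anything crossing between versions, such as a contract-call? into an older contract, passes through one conversion layer.

```rust
// Stand-ins for two independently versioned `clarity` crates (hypothetical).
mod clarity_v2 {
    #[derive(Debug, Clone, PartialEq)]
    pub enum Value {
        Int(i128),
    }
    // Placeholder: a real VM would evaluate `source`; this one echoes `arg`.
    pub fn execute(_source: &str, arg: Value) -> Result<Value, String> {
        Ok(arg)
    }
}

mod clarity_v3 {
    #[derive(Debug, Clone, PartialEq)]
    pub enum Value {
        Int(i128),
        Int64(i64), // hypothetical new integer type
    }
    // Placeholder: a real VM would evaluate `source`; this one echoes `arg`.
    pub fn execute(_source: &str, arg: Value) -> Result<Value, String> {
        Ok(arg)
    }
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum ClarityVersion {
    Clarity2,
    Clarity3,
}

// Version-neutral value used only at the boundary between VMs.
#[derive(Debug, Clone, PartialEq)]
enum BoundaryValue {
    Int(i128),
    Int64(i64),
}

// All cross-version compatibility rules live in these four functions.
fn to_v2(v: BoundaryValue) -> Result<clarity_v2::Value, String> {
    match v {
        BoundaryValue::Int(n) => Ok(clarity_v2::Value::Int(n)),
        BoundaryValue::Int64(_) => Err("type not representable in Clarity 2".to_string()),
    }
}

fn from_v2(v: clarity_v2::Value) -> BoundaryValue {
    match v {
        clarity_v2::Value::Int(n) => BoundaryValue::Int(n),
    }
}

fn to_v3(v: BoundaryValue) -> clarity_v3::Value {
    match v {
        BoundaryValue::Int(n) => clarity_v3::Value::Int(n),
        BoundaryValue::Int64(n) => clarity_v3::Value::Int64(n),
    }
}

fn from_v3(v: clarity_v3::Value) -> BoundaryValue {
    match v {
        clarity_v3::Value::Int(n) => BoundaryValue::Int(n),
        clarity_v3::Value::Int64(n) => BoundaryValue::Int64(n),
    }
}

// Dispatch on the stored contract's version. Neither VM ever sees the
// other's value type.
fn call_contract(
    version: ClarityVersion,
    source: &str,
    arg: BoundaryValue,
) -> Result<BoundaryValue, String> {
    match version {
        ClarityVersion::Clarity2 => clarity_v2::execute(source, to_v2(arg)?).map(from_v2),
        ClarityVersion::Clarity3 => clarity_v3::execute(source, to_v3(arg)).map(from_v3),
    }
}
```

In this sketch, passing a 64-bit integer to a Clarity 2 contract fails at the boundary instead of inside the older VM, so the version check happens once per cross-version call rather than on every function call.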