NEP-481: Synchronous wasm submodules #481
Conversation
Thank you @mooori for submitting this NEP. As a moderator, I reviewed this NEP and it meets the proposed template guidelines. I am moving this NEP to the REVIEW stage and would like to ask the @near/wg-protocol members to assign 2 Technical Reviewers to complete a technical review (see expectations below). Just for clarity: Technical Reviewers play a crucial role in scaling the NEAR ecosystem, as they provide their in-depth expertise in a niche topic while working group members stay on guard over the NEAR ecosystem as a whole. The discussions may get too deep, and it would be inefficient for each WG member to dive into every single comment, so NEAR Developer Governance designed this process, in which subject matter experts help us scale by writing a summary of the raised concerns and how they were addressed. See the Technical Review Guidelines.
Please tag the @near/nep-moderators once you are done, so we can move this NEP to the voting stage. Thanks again.
@mooori thanks for the submission! Do you mind explaining the permission model in more detail? For example, how would a contract decide who can deploy submodules to it?
By default, there's only the contract itself that can deploy submodules, via a method along these lines:

```rust
impl Contract {
    pub fn deploy_submodule(&mut self, key: Vec<u8>, wasm: Vec<u8>) {
        // Check permissions and, if they are satisfied, trigger the
        // `DeploySubmodule` action to store submodule `wasm` under `key`.
    }
}
```

With this approach, each contract can (and must) implement custom logic to check permissions. If a contract operates under the assumption that submodules are untrusted code, it might even allow anyone to deploy submodules. This should be possible without introducing vulnerabilities, since the set of host functions available to submodules is very limited (ref. section Trustless). For contracts that want to restrict who is permitted to deploy submodules, the AccessControllable contract plugin might be a helpful tool. It makes it possible to restrict public methods so that they can be invoked successfully only by accounts that have been granted user-defined roles.
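As an illustration, here is a minimal sketch of gating `deploy_submodule` with AccessControllable, assuming the near-plugins macro API; the `Role` enum and the role name are made up for this example:

```rust
use near_plugins::{access_control, access_control_any, AccessControlRole, AccessControllable};
use near_sdk::borsh::{self, BorshDeserialize, BorshSerialize};
use near_sdk::near_bindgen;
use near_sdk::serde::{Deserialize, Serialize};

// Hypothetical role set; any user-defined enum deriving `AccessControlRole` works.
#[derive(AccessControlRole, Deserialize, Serialize, Copy, Clone)]
#[serde(crate = "near_sdk::serde")]
pub enum Role {
    SubmoduleDeployer,
}

#[access_control(role_type(Role))]
#[near_bindgen]
#[derive(Default, BorshDeserialize, BorshSerialize)]
pub struct Contract {}

#[near_bindgen]
impl Contract {
    /// Fails unless the caller has been granted `Role::SubmoduleDeployer`.
    #[access_control_any(roles(Role::SubmoduleDeployer))]
    pub fn deploy_submodule(&mut self, key: Vec<u8>, wasm: Vec<u8>) {
        // Permission check passed; trigger the proposed `DeploySubmodule`
        // action to store `wasm` under `key`.
        let _ = (key, wasm);
    }
}
```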
@mooori another question: does Aurora plan to only store wasm submodules compiled from EVM bytecode on-chain, or does it plan to store EVM bytecode in the engine as well?
@bowenwang1996 this question is a little tangential to the NEP itself since Aurora's usage only motivated the proposal, but the proposal is more general than only our usage. That said, our idea is to have a sort of "upgrade" process where initially new EVM contracts are interpreted via SputnikVM (EVM bytecode on chain as it is today), but if the EVM contract gets a lot of usage then we will compile it to a submodule. This will make the Aurora Engine more efficient on Near overall because compiled EVM contracts will use less Near gas when they execute.
@mooori, @encody: would it be possible for one or both of you to write down a few sentences comparing this NEP and https://github.com/near/NEPs/pull/480/files please? There seems to be quite a bit of overlap.
From my perspective of being more familiar with NEP-481, some of the differences are:

**Visibility and privacy**

A submodule can be executed only if the parent contract has a function which triggers the execution of the submodule. A function of wasm deployed to a namespace can be invoked directly by specifying the namespace in the `FunctionCall` action.

**State**

Submodules do not have their own state and cannot access the state of the parent contract. However, a contract may implement a custom protocol granting submodules access to state on top of the data that can be exchanged between parent and submodule. Each namespace has its own state, which is isolated from the state of other namespaces on the same account.

**Host functions**

The set of host functions available to submodules is limited to those that allow yielding back to the parent and exchanging data with the parent. Reading through NEP-480, I would assume that a contract deployed to a namespace has access to the same set of host functions as a regular contract (with state being separate, as mentioned above).

**Synchronous execution**

Submodules are executed synchronously, which is a key feature of this proposal and supported in the PoC implementation. Namespaced contracts should also be executable synchronously, though I am not sure if it is a top priority there as well. For instance, maybe a first implementation of account namespaces enables only asynchronous execution, with synchronous execution added later on.

@encody In case I misunderstood any details of the account namespaces proposal, please let me know.
One thing I'm missing in this NEP is a limit on submodule call depth. It seems like the limit is implicitly 2, because submodules are specified to have limited access to host functions, but then there's a fair bit of text about integrating with account extensions and/or expanding the set of host functions available to submodules.
I wonder if it would be better to make the presence and configuration of such a limit explicit somehow. It could be configurable by the contract that's using submodules, or in the runtime, or both.
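A minimal sketch of what an explicit, runtime-level limit could look like; the name and placement are assumptions for illustration, not something this NEP specifies:

```rust
/// Hypothetical runtime configuration entry making the submodule
/// call-depth limit explicit instead of implicit.
pub struct SubmoduleConfig {
    /// Maximum nesting depth of synchronous submodule execution.
    /// 2 matches the implicit limit of the current proposal
    /// (parent contract -> submodule); raising it would only matter
    /// if submodules ever gained the ability to start submodules.
    pub max_call_depth: u32,
}

impl Default for SubmoduleConfig {
    fn default() -> Self {
        Self { max_call_depth: 2 }
    }
}
```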
neps/nep-0481.md (outdated)

> ### Submodule host functions
>
> The submodule can import host functions to read data received from the parent, yield back to the parent and set a return value on termination. A `gas` host function must be available for gas accounting and to meet the [requirements](https://github.com/near/nearcore/blob/83b1f80313ec982a6e16a1941d07701e46b7fc35/runtime/near-vm-runner/src/instrument/gas/mod.rs#L396-L402) of nearcore wasm instrumentation.
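For concreteness, a sketch of what the import surface described above could look like from a submodule's point of view. Only `gas` is grounded in the text (its `i32` parameter matches the link error quoted later in this thread); the other names and signatures are invented for illustration and are not fixed by the NEP:

```rust
#[link(wasm_import_module = "env")]
extern "C" {
    /// Injected gas accounting, required by nearcore's wasm instrumentation.
    fn gas(ops: u32);
    /// Hypothetical: read the bytes received from the parent (the initial
    /// input or the data passed on the last resume).
    fn parent_input(ptr: u64, max_len: u64) -> u64;
    /// Hypothetical: suspend execution and hand `len` bytes at `ptr`
    /// back to the parent.
    fn yield_to_parent(ptr: u64, len: u64);
    /// Hypothetical: set the submodule's return value before terminating.
    fn submodule_return(ptr: u64, len: u64);
}
```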
The use of the `gas` function is entirely internal to the runtime and its exposure to contracts is accidental. If we're setting up distinct lists of exported host functions for parent contract modules and submodules as proposed here, there doesn't seem to be much value in exposing that function either (unless there's value in allowing these submodules to waste some gas ^^)
(The way code is structured today may still require `gas`, but I think the preferable option would be to refactor so that it isn't, rather than entrenching this mistake any further.)
Yes, currently the PoC implementation requires a `gas` host function. Without it, preprocessing of the submodule's wasm fails with:

```
Link(Import("env", "gas", UnknownImport(Function(FunctionType { params: [I32], results: [] }))))
```

Given that this is accidental, would the refactoring of nearcore such that the `gas` function is no longer required be a change separate from the implementation of this NEP? Then we could leave a note in the NEP to remove `gas` once that becomes possible. For the PoC implementation it should be a rather small change; presumably it only requires reverting these changes.
I think the only ask from my end is to not accidentally expose this to contracts again for no good reason. And yeah, it would mean modifying the way the contract runtime works in this respect if the implementation of this NEP happens to precede the landing of the finite-wasm work.
neps/nep-0481.md (outdated)

> The proposed wasm submodules can be interpreted as account extensions with tight restrictions. A wasm submodule is an account extension that can be executed only by the parent contract itself. It can access only submodule host functions which allow it to yield back to the parent and exchange data with it. Host functions available to regular contracts cannot be invoked by the submodule. A submodule cannot read or write the parent's storage and has no storage associated with itself.
>
> Due to its complexity, account extension functionality might be implemented in multiple stages. Synchronous wasm submodules could be an initial stage, which progresses towards account extensions as the restrictions mentioned above are lifted.
I would rather look at these two different proposals not as two features that might potentially compose some day, but rather as proposals that must compose. I think keeping this in mind we will end up with something that's more generally useful and might even find new uses in future improvements to the protocol.
In fact, I believe there is very little that needs to be done to make sure this holds true. Today the two proposals seem quite analogous to me in their attempt to introduce two distinct (OOP) objects with their own slightly differing semantics. If we instead separate data and methods, we might quite easily end up in a place where we don't even need to draw any comparisons between the NEPs, because each one is useful on its own. In particular, one way forward I could see is focusing entirely on introducing a mechanism to make a synchronous function call to an isolated wasm-core-1 module with a communication channel of some sort. Even without changes to the data model this feature can be used in some way by contracts making self-calls (and Aurora specifically could temporarily bake the compiled contracts into their own contract while the data model changes are underway; this would likely make this proposal much more straightforward to think about and implement, too).
> If we instead separate data and methods we might quite easily end up in a place where we don't even need to draw any comparisons between the NEPs, because each one is useful on their own.

Intuitively I'm thinking changes to the data model, such as separate state, are useful only once there exist some kind of separate entities (like submodules or contracts deployed to a namespace) which can be called. Which makes me wonder how changes to the data model along the lines of NEP-480 could be useful on their own? Probably I'm missing or misunderstanding something here.
When I was saying "data model changes" I meant specifically extending the protocol with a concept of an account extension or a submodule as described here. But I also think you misunderstood the order of steps I was suggesting be taken. My thinking was that the data model changes would be the very last step, once the actions and APIs necessary for both this and account extensions are in place. But this is just my brainstorming. Ultimately the substance of the earlier message is:

> I would rather look at these two different proposals not as two features that might potentially compose some day, but as proposals that must compose. I think keeping this in mind we will end up with something that's more generally useful and might even find new uses in future improvements to the protocol.

I don't have a particularly strong opinion on how this is achieved, but I think it is important that we don't end up with both submodules and account extensions in the protocol, because unifying them is likely to be infeasible.
neps/nep-0481.md (outdated)

> Limiting the interface between a parent and its submodules to passing bytes introduces overhead in cases that require interaction. For instance, instead of directly persisting data in storage, a submodule has to serialize data and send it back to the parent. The parent then needs to deserialize the data, write it to storage and resume the submodule. Besides requiring extra logic, this pattern also increases the number of host function calls.
>
> That overhead could be reduced by extending the interface, for instance by making more host functions available to submodules. The trade-off with giving submodules direct access to more host functions is complexity in permissions.
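For concreteness, a sketch of the round-trip pattern described above, seen from the parent's side. `start_submodule` and `resume_submodule` are hypothetical stand-ins for the proposed host interface, and the request format is made up for this example:

```rust
use near_sdk::env;

enum SubmoduleState {
    Yielded(Vec<u8>),  // submodule yielded, carrying a serialized request
    Returned(Vec<u8>), // submodule terminated with a return value
}

fn start_submodule(key: &[u8], input: &[u8]) -> SubmoduleState {
    unimplemented!("stand-in for the proposed host interface")
}

fn resume_submodule(reply: &[u8]) -> SubmoduleState {
    unimplemented!("stand-in for the proposed host interface")
}

/// One possible custom protocol: each yield carries a serialized storage
/// request which the parent executes against its own state before
/// resuming the submodule with the result.
fn run_with_storage_proxy(key: &[u8], input: &[u8]) -> Vec<u8> {
    let mut state = start_submodule(key, input);
    loop {
        match state {
            SubmoduleState::Yielded(request) => {
                let reply = handle_storage_request(&request);
                state = resume_submodule(&reply);
            }
            SubmoduleState::Returned(output) => return output,
        }
    }
}

fn handle_storage_request(request: &[u8]) -> Vec<u8> {
    // The request format is contract-defined; a single tag byte is used
    // here. Every round trip costs (de)serialization plus extra host
    // function calls -- exactly the overhead the excerpt describes.
    match request.split_first() {
        Some((&b'w', rest)) => {
            // deserialize key/value from `rest`, then env::storage_write(...)
            let _ = rest;
            Vec::new()
        }
        Some((&b'r', rest)) => {
            // deserialize key from `rest`, then env::storage_read(...)
            let _ = rest;
            Vec::new()
        }
        _ => env::panic_str("unknown submodule request"),
    }
}
```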
This seems like something that could be implemented by introducing a mechanism for the "start submodule" operation to specify which host functions are available to the callee.
The scope of the implementation of this NEP would increase significantly by:
How would this affect the requirements for the reference implementation and the PoC, respectively? Landing all these features in nearcore will probably be a multi-step process. Would a reference implementation for something that could be landed in a first step suffice, or is the reference implementation required to support all features?
The only concern to be aware of is that any contract-facing breaking changes are, unfortunately, infeasible on NEAR. With that in mind, it is important that the end result being striven for is clear, so that a plan to implement the feature the right way can be laid out. Outside of that, the introduction and implementation of the feature may take as many steps as necessary or convenient.
@mooori I agree with what Simonas said. If we can agree on the end goal (on the spec level) and there is a poc to demonstrate general feasibility, the implementation can take many steps. I think it is important, as Simonas pointed out, to align on how we want to reconcile this NEP with the account extension NEP. cc @encody @firatNEAR
As far as I'm aware, there is some discussion that still needs to be had to hash this out between these two NEPs. I will be sure to report back with updates.
@birchmd and I had a fruitful discussion about combining NEPs #480 and #481. Here are my notes from that meeting.

**Namespaces/Submodules priorities:**

**NEP layers:**

**Edge cases:**
Thanks for the comments @nagisa and @bowenwang1996. I agree that it is important to have a clear vision of how this NEP and #480 align with each other. I just had a call with @encody where we discussed this. He has posted more detailed notes above, but here is my high-level summary. We will consider Account Namespaces to be a logical prerequisite for synchronous execution (I say "logical" because we could work on getting the implementations done in parallel). Therefore this NEP will be an extension of the previous one. The additional functionality this NEP adds is to allow specifying another field in the `DeployContract` action. There are still some details to work out, but we'll do that next week. I'll make another comment once we have finished updating this NEP to be an extension on top of #480.
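For illustration, a hypothetical shape of the extended action being discussed; none of these field names are normative:

```rust
// NEP-480 contributes the namespace field; this NEP would add the flag.
pub struct DeployContractAction {
    pub code: Vec<u8>,
    /// Namespace to deploy `code` under (per NEP-480).
    pub namespace: Option<String>,
    /// Deploy-time flag marking the code as synchronously callable;
    /// whether this belongs here or in the code itself is debated below.
    pub synchronous: bool,
}
```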
I'm glad to hear positive news here. I do have a question though: what is the reason to add a flag for whether the contract is synchronous-vs-asynchronous to the `DeployContract` action? I guess, personally, considering the developer UX, I would prefer synchronicity of the contract to be a property of the contract code, rather than of the deploy action.
Thanks for the comment @nagisa! I'll keep this in mind as I am revising the NEP. Part of the purpose of the deploy-time flag was also to mark external visibility (is this namespace callable from other accounts using the `FunctionCall` action?).
@nagisa @encody @bowenwang1996 I've revised the NEP to be based on the changes proposed in NEP-480. Please take another look and let me know what you think.
Do we know how much this would actually save Aurora in gas per month? I can see one benchmark, which is numeric + writing to memory, that has been hand-written in Rust rather than cross-compiled from EVM code. This seems like the most favorable compiled/interpreted benchmark possible and is likely not representative of the speedup they're actually going to see. How does this compare to WebAssembly Components? Since we're going to have to implement them eventually anyway, is it worth us holding off until then? We'd then get education, tooling and runtime support for free.
Thanks for getting involved in the discussion @DavidM-D!

You are right that the benchmark is not typical, but it gives us a ceiling for this approach. It turns out that ceiling is quite high, as that example had a 1500x speed-up. We have another example which sort of gives us the floor. In that example I manually translate the ERC-20 contract into Rust (keeping the Solidity ABI) and compile it to Wasm. Since ERC-20 is almost entirely IO (which will be identical in both the EVM interpreter and Wasm cases, because IO is done via Near host functions), we expect there to be very little improvement from this approach. But even in that case we saw a 15% improvement, which is a pretty good floor all things considered. Without doing a more comprehensive study (which is something we hope to do eventually) we can't say for sure how much gas Aurora will save. But with a floor of 15% and a ceiling of 99.9% savings, we could reasonably expect 40% or 50%. It's also worth mentioning that it's not just about current savings, but also about creating new possibilities. There are some use cases that are impossible on Aurora today because of gas limitations. Near's 300 Tgas limit translates into around a 3 million EVM gas limit for transactions on Aurora, which is lower than the block gas limit on Ethereum. There are real use cases that require more than 3 million EVM gas; for example, flash loans can use a lot of gas as they interact with multiple DeFi components. The only way for Aurora to enable these use cases is to make our contract more efficient at executing EVM contracts, and this synchronous Wasm approach is by far the most promising direction we have.
This proposal intentionally does not interact with Wasm Components. Our proposal provides a way for the Near runtime to compose multiple independent Wasm VMs, as opposed to some kind of dynamic loading within a single VM. The reason we are avoiding Components is that it is unclear what the timeline would be for the standard to be finalized and then included in the production VM Near uses. A 2-year timeline, for example, is not acceptable given that Aurora wants to make use of this feature sooner than that. Of course, when the Components feature is eventually released, it should be possible to implement the host functions described in this proposal using that feature instead of independent VMs. This would be a welcome performance improvement which should lower the gas overhead of making a synchronous call on Near.
@birchmd: some clarifying questions please. If I understand it properly, Aurora would like to have this feature because currently they are executing EVM bytecode in an interpreter, which is quite slow compared to compiling it down to wasm. Is this correct? If so, just a bunch of questions on the flow, if you do not mind.
I suppose, in general, I am trying to understand what the Aurora flow is today and how it will change after this NEP is accepted.
We do have a compiler in progress, and the plan would be to use this compiler to translate EVM contracts to Wasm. Using the same benchmark where we saw the 1500x speedup with hand-written Rust, the compiler sees only a 15x speedup. But still, getting an order-of-magnitude speedup from the EVM bytecode directly (no manual intervention required) is promising. We currently have two engineers on staff at Aurora actively working on the compiler.
This is an approach we have considered before. It is technically feasible, but certainly not ideal. The main reasons are (1) deploying an update to Aurora Engine requires approval of the DAO, creating additional administrative overhead; (2) the Aurora Engine is a large contract (almost 1 MB), so it would be pretty inefficient to deploy the whole thing frequently.
No. The idea is similar to how you do not need to re-install VS Code when you add a new extension (or think of any other example where plug-ins are used regularly). The core app is designed to know how to communicate with any extension that follows the right interface. We would define the interface between the Aurora Engine and any module being used as an EVM contract. Then, to add a new module, we would only need to call an admin-only function to dynamically redirect the Engine's control flow for the address we are adding the Wasm module for. This control-flow redirect mechanism already exists in the Engine, since it is how custom precompiles (like those involved in bridging) are implemented.
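A rough sketch of that redirect mechanism; the types and names here are invented for illustration and are not the actual Aurora Engine code:

```rust
use std::collections::HashMap;

type Address = [u8; 20];

enum Handler {
    /// Status quo: interpret stored EVM bytecode with SputnikVM.
    InterpretedEvm(Vec<u8>),
    /// Redirect: synchronously call a wasm module deployed under a
    /// namespace of the Engine account.
    WasmModule(String),
}

struct Engine {
    handlers: HashMap<Address, Handler>,
}

impl Engine {
    /// Admin-only in practice: register a compiled module for `address`
    /// without redeploying the Engine contract itself.
    fn set_wasm_module(&mut self, address: Address, namespace: String) {
        self.handlers.insert(address, Handler::WasmModule(namespace));
    }
}
```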
The issues would be mitigated by a gas limit increase. I have suggested this before, and increasing the limit is also not a trivial protocol change. I say "mitigated", not "solved", because we want at least a factor-of-10 increase in the amount of EVM gas we can process in a single Near transaction, but there is no way we can increase Near's transaction gas limit by a factor of 10 (the computation needed would exceed the block time).
Users deploying EVM contracts using EVM transactions sent to the Aurora network. Essentially it is the equivalent of the source of bytecode on Ethereum mainnet or even Near itself (of course Near contracts are Wasm bytecode not EVM bytecode).
Yes, there are. Probably a few hundred per week. But we would not compile all the EVM contracts to Wasm. We would only compile the ones that are used frequently because those are the ones that matter for gas usage. Another case would be a specific partner who knows in advance the gas limit will be a problem, then we may compile their EVM contract to Wasm from day 1.
No. As of today EVM bytecode is persisted in the Aurora Engine contract storage. In a world with the proposed synchronous Wasm feature new contracts would be deployed under new namespaces so the core Engine contract does not need to change.
We will do the compilation off-chain and deploy the result. Obviously there are security considerations here, but we would make the compilation process deterministic so that users could verify the Wasm module matches its source EVM bytecode. The correctness of the compiler is also important of course, but we will have extensive testing (possibly including fuzzing or formal methods) to ensure this.
The compiler today is quite slow (it can take multiple minutes for a single contract), but as an off-chain process that is not a problem. Getting the compiler to the point where compilation could happen on-chain is not feasible in my opinion.
@birchmd: thanks for the detailed response. Just to make sure I understand, the flow is the following:
Is the above correct? Another naïve question, please: would it not be possible to deploy the compiled wasm contracts to separate accounts and call out to these accounts to execute them? I suppose implementation-wise it should not be so complicated, but the problem there would be the latency of cross-contract calls?
This isn't quite right. We will not store the Wasm modules in the Aurora Engine contract storage because there is no (proposed) way to use them from there. The modules will need to be deployed to namespaces of the Aurora Engine account (following the specification of NEP-480).
No, this does not work with synchronous execution because separate accounts may be on separate shards. Aurora's use case requires synchronous execution because the EVM is synchronous, and therefore we need synchronous calls for these Wasm modules to be usable as part of an EVM execution. I also maintain that synchronous calls are important for other use cases as well. The main thing they provide is atomicity (all actions are committed or none are). Composing programs into an atomic result is easier (safer) than trying to compose programs in a non-atomic way, because you do not need to worry whether all intermediate states maintain whatever invariants are important to the security of the overall interaction.
@birchmd: thanks for the response. It seems to me that a more general problem to solve here might be how to do atomic cross-contract calls between contracts on a single shard. It seems like namespaces are being used to ensure that the contracts are on the same shard. So I would imagine that if we had a mechanism for doing atomic cross-contract transactions on a single shard, then that might cover Aurora's use case. Having said that, I do believe that implementing that would be quite a big task and would also require quite a significant change to the user experience, i.e. exposing the concept of shards. WDYT?
Exactly, allowing multiple Wasm modules on a single account is specifically to ensure all data is available on the same shard. So yes, a more general mechanism allowing synchronous cross-contract calls between any two accounts in the same shard would work for Aurora (and any other use case I can think of, since we would essentially replace the concept of namespaces with the concept of sub-accounts, which already exists). However, I also agree that this is a departure from Near's original design philosophy and complicates the user experience. Shards have always been invisible to users, both because it simplifies their experience and because it means that there are no restrictions on resharding (the shard boundaries can theoretically be moved at any time to optimize execution). If we expose shards at the user level we lose both of these benefits. Allowing multiple Wasm modules on a single account also introduces complexity for users, so whether we expose shards or introduce namespaces is maybe equivalent from a UX complexity standpoint. But resharding is an important point to consider. Is it worth giving up potential future optimizations in exchange for having synchronous execution between some accounts? I am leaning towards 'no', but if resharding is not anywhere on the horizon then maybe I could be convinced otherwise.
Based on experience, I have my reservations about abstracting away sharding. Generally, power users can use a network / system more efficiently when it is possible to bypass such abstractions. Still, I think we are going off on a tangent. I think we should identify exposing sharding as an alternative solution to this problem and let the WG decide on the best path forward. In that spirit, I would like to see a brief discussion of the exposing-sharding approach under https://github.com/near/NEPs/pull/481/files#diff-c42ac558a9ca73f718e12332e4d86a94478776ee3fa8780dfef53bf8c13a4268R142.
@akhi3030 the protocol working group discussed the topic of whether we should allow synchronous execution within the same shard and the answer is no, mostly because it deviates from homogeneous sharding, which is a fundamental design philosophy of NEAR. While the opinion of the working group may change in the future, this is considered the final decision for now. I also think that if we allow synchronous execution within the same shard, then naturally we will shift towards a model where almost all the calls are synchronous because of the desire to take advantage of it. Then we essentially move towards an appchain-centric model like many other blockchains.
Thank you for the clarifications @bowenwang1996. This helps me as an SME in figuring out how to make progress on this proposal.
Have you also considered including a WASM interpreter inside the smart contract? The main question that I see is performance. My first guess would be to try and benchmark WAMR in the "fast interpreter" setting.
Source: https://bytecodealliance.github.io/wamr.dev/blog/introduction-to-wamr-running-modes/ (how they achieved the fast interpretation is an interesting read, too: https://www.intel.com/content/www/us/en/developer/articles/technical/webassembly-interpreter-design-wasm-micro-runtime.html)
@jakmeier thank you for the suggestion and references. We are looking into it and will try to come up with some benchmarks.
**Interpreting wasm inside wasm**

**Assessing**
Thanks @mooori for the detailed analysis! I'm glad you checked whether this is an option, but I can see it is not really viable, at least for Aurora. Other use cases that want sync execution might still be able to use it, I think. Offtopic/FYI: your research regarding the performance of different interpreters is also useful when considering whether we should move to an interpreter for validators, which is listed as an option in near/nearcore#9379.
This work has been put on hold for now. Closing this NEP and we can open a new one if the work resumes.
This NEP proposes to enable the synchronous execution of wasm submodules in the NEAR runtime.
Context
Acknowledgements
This is a team effort of Aurora and we thank Pagoda and community members for their input in previous conversations.
NEP Status (Updated by NEP moderators)
SME reviews:
Protocol WG voting indications:
Voting hasn't started yet.