refactor: streamline contract caching code #7851
Conversation
```rust
vm.engine
    .load_universal_executable(&executable)
    .map(|v| Arc::new(v) as _)
    .map_err(|err| panic!("could not load the executable: {}", err.to_string()))
```
This is very misleading: this is not a `map_err`, it's effectively an `.unwrap_or_else`. I think this is correct code, but it is on the non-prod path where the Cache is None.
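A minimal sketch of the objection (simplified types, not nearcore's actual code): `.map_err` with a panicking closure can never produce an `Err`, so spelling it as `.unwrap_or_else` states the intent directly.

```rust
// `load` stands in for the call site above: the error branch panics,
// so the function always returns the success value or aborts.
fn load(v: Result<i32, String>) -> i32 {
    // Clearer than `.map_err(|err| panic!(...))`: the closure diverges,
    // so `.unwrap_or_else` matches the actual semantics.
    v.unwrap_or_else(|err| panic!("could not load the executable: {}", err))
}

fn main() {
    assert_eq!(load(Ok(42)), 42);
    println!("{}", load(Ok(7)));
}
```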
```rust
Ok(artifact) => Ok(Ok(Arc::new(artifact) as _)),
Err(err) => {
    let err = CompilationError::WasmerCompileError { msg: err.to_string() };
    cache_error(&err, key, cache)?;
```
And here's the "cached" path counterpart of https://github.com/near/nearcore/pull/7851/files#r997413310

I changed the behavior here to return a `CacheError`, rather than caching the error. I think this is what we should do (i.e., the current code on master is buggy). `load_universal_executable` fails only if we fail to allocate memory, and that's definitely not an error we should cache (@nagisa could you double-check this?)
Yeah, I believe `load_universal_executable` can only fail for obscure resource reasons. The `ref` variant can additionally fail if the backing serialized storage is corrupted in some way, but that's also something we shouldn't be caching, I feel.
```rust
let stored_artifact: Option<VMArtifact> = match cache_record {
    None => None,
    Some(CompiledContract::CompileModuleError(err)) => return Ok(Err(err)),
    Some(CompiledContract::Code(serialized_module)) => {
        unsafe {
            // (UN-)SAFETY: the `serialized_module` must have been produced by a prior call to
            // `serialize`.
            //
            // In practice this is not necessarily true. One could have forgotten to change the
            // cache key when upgrading the version of the wasmer library or the database could
            // have had its data corrupted while at rest.
            //
            // There should definitely be some validation in wasmer to ensure we load what we think
            // we load.
            let executable = UniversalExecutableRef::deserialize(&serialized_module)
                .map_err(|_| CacheError::DeserializationError)?;
            let artifact = self
                .engine
                .load_universal_executable_ref(&executable)
                .map(Arc::new)
                .map_err(|_| CacheError::DeserializationError)?;
            Some(artifact)
        }
    }
};
```
This pattern seems to be shared between VMs and feels somewhat error-prone. Can this be abstracted into a function?
Maybe, but I decided not to, at least not in this PR:
- smaller conceptual diff
- we probably won't be touching wasmer1 code much (e.g., we might want to change wasmer2 behavior here under a protocol feature, but not wasmer1)
- with wasmer2, we have separate code paths for `Executable` vs `ExecutableRef`, which I think raises the cost of abstraction here
Yeah, it seems fine not to roll that up into this PR, at least...
That said, I was thinking more about the `match { None => None, Some(...) => Ok(Err(err)), _ => whatever }` part. The matching on the structure could go away (much like e.g. the `map_err`/`transpose`/etc. convenience methods do for `Result`), and the VM-specific portion could be just a closure or some such thing.
Ah, I see. Yeah, we could add some helper to `CompiledContractCache`; I'll note this in the relevant issue.
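A sketch of the closure-based helper being discussed (all names here are hypothetical, not nearcore's actual API): the `None` / `CompileModuleError` / `Code` match lives in one shared function, and only the VM-specific deserialization is passed in as a closure.

```rust
// Simplified stand-in for the real cache record type.
enum CompiledContract {
    CompileModuleError(String),
    Code(Vec<u8>),
}

// Shared helper: hides the structural match; `deserialize` is the only
// VM-specific part. Outer Err = cache-level failure, inner Err = cached
// compilation error, Option = cache miss.
fn lookup_artifact<A>(
    record: Option<CompiledContract>,
    deserialize: impl FnOnce(&[u8]) -> Result<A, String>,
) -> Result<Result<Option<A>, String>, String> {
    match record {
        None => Ok(Ok(None)),
        Some(CompiledContract::CompileModuleError(err)) => Ok(Err(err)),
        Some(CompiledContract::Code(bytes)) => Ok(Ok(Some(deserialize(&bytes)?))),
    }
}

fn main() {
    // A trivial "deserializer" standing in for wasmer's load step.
    let hit = lookup_artifact(Some(CompiledContract::Code(vec![1, 2, 3])), |bytes| {
        Ok(bytes.len())
    });
    assert_eq!(hit, Ok(Ok(Some(3))));

    let miss = lookup_artifact(None, |bytes: &[u8]| Ok(bytes.len()));
    assert_eq!(miss, Ok(Ok(None)));
    println!("ok");
}
```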
@jakmeier could you take a look here? This is super tricky (though hopefully less than it used to be), I'd love some extra hand-holding! (and feel free to slap auto-merge if it looks ok)
Heh, the test failure here is a prime example of why global state is bad ^^ The failing test creates two runtimes, and checks that contracts are properly compiled when one runtime syncs to another. The two runtimes have separate db-level caches (b/c they are explicitly passed in), but share the same in-memory cache (b/c it's a static, duh).
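A hedged illustration of that failure mode (all names are made up, not nearcore's actual types): each runtime gets its own db-level cache, but a `static` in-memory cache is process-wide, so a lookup in runtime B can be satisfied by an entry runtime A inserted.

```rust
use std::collections::HashMap;
use std::sync::{Mutex, OnceLock};

// Shared process-wide: this is the "it's a static, duh" part.
static MEM_CACHE: OnceLock<Mutex<HashMap<String, String>>> = OnceLock::new();

fn mem_cache() -> &'static Mutex<HashMap<String, String>> {
    MEM_CACHE.get_or_init(|| Mutex::new(HashMap::new()))
}

struct Runtime {
    // Per-runtime, explicitly passed in, like the db-level cache in the test.
    db_cache: HashMap<String, String>,
}

impl Runtime {
    fn get(&self, key: &str) -> Option<String> {
        // In-memory cache is consulted first, and it is shared.
        mem_cache()
            .lock()
            .unwrap()
            .get(key)
            .cloned()
            .or_else(|| self.db_cache.get(key).cloned())
    }

    fn put(&mut self, key: &str, val: &str) {
        mem_cache().lock().unwrap().insert(key.to_string(), val.to_string());
        self.db_cache.insert(key.to_string(), val.to_string());
    }
}

fn main() {
    let mut a = Runtime { db_cache: HashMap::new() };
    let b = Runtime { db_cache: HashMap::new() };
    a.put("contract", "artifact");
    // Runtime `b` never compiled anything, but the static cache leaks
    // `a`'s entry into it, masking the code path the test meant to cover.
    assert_eq!(b.get("contract"), Some("artifact".to_string()));
    println!("leaked");
}
```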
This does two rather big code motions at the same time (sorry!):
- precompilation code is moved from "let's match on VMKind" style to "let's call Runtime's virtual method". To do so, we re-purpose the existing `precompile` functions in the trait. Remember, those fns are essentially dead code from "pipelined" compilation
- code to deal with our two-layered caching is moved from `cache.rs` into impls of VM
This happens either due to resource exhaustion, or when zero-copy deserialized data is invalid
This fixes `test_two_deployments`. The fix is to ensure that `precompile` *only* compiles and caches the contract, without loading it. This reshuffles errors yet again, as `LoadingError` moves from `CacheError` to `VMRunnerError`, where it actually belongs.
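A minimal sketch of that separation, with illustrative names only (the real `precompile`, `CacheError`, and `VMRunnerError` look different): compile-and-cache is one step, and loading is a separate fallible step whose failure is a runner error, never a cached one.

```rust
use std::collections::HashMap;

struct Cache(HashMap<String, Vec<u8>>);

// Stand-in for real wasm compilation; deterministic, so its errors
// *could* be cached in the real code.
fn compile(wasm: &[u8]) -> Result<Vec<u8>, String> {
    Ok(wasm.to_vec())
}

// Step 1: compile and write to the cache. Crucially, no loading happens
// here, so `precompile` cannot fail for transient loading reasons.
fn precompile(cache: &mut Cache, key: &str, wasm: &[u8]) -> Result<(), String> {
    let serialized = compile(wasm)?;
    cache.0.insert(key.to_string(), serialized);
    Ok(())
}

// Step 2: loading is separate; its failure is a runner-level error
// (the PR's `VMRunnerError::LoadingError`), not a cache-level one.
fn load(cache: &Cache, key: &str) -> Result<Vec<u8>, String> {
    cache.0.get(key).cloned().ok_or_else(|| "LoadingError".to_string())
}

fn main() {
    let mut cache = Cache(HashMap::new());
    precompile(&mut cache, "c1", b"\0asm").unwrap();
    assert_eq!(load(&cache, "c1").unwrap(), b"\0asm".to_vec());
    println!("ok");
}
```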
Yeah, this all looks good to me. Definitely like the changes here for code readability!