cranelift-wasm: Only allocate if vectors need bitcasts #4543

Merged: 2 commits, Jul 27, 2022
117 changes: 73 additions & 44 deletions cranelift/wasm/src/code_translator.rs
@@ -89,6 +89,7 @@ use cranelift_codegen::ir::{
 };
 use cranelift_codegen::packed_option::ReservedValue;
 use cranelift_frontend::{FunctionBuilder, Variable};
+use itertools::Itertools;
 use smallvec::SmallVec;
 use std::cmp;
 use std::convert::TryFrom;
@@ -540,10 +541,7 @@ pub fn translate_operator<FE: FuncEnvironment + ?Sized>(
             };
             {
                 let return_args = state.peekn_mut(return_count);
-                let return_types = wasm_param_types(&builder.func.signature.returns, |i| {
-                    environ.is_wasm_return(&builder.func.signature, i)
-                });
-                bitcast_arguments(return_args, &return_types, builder);
+                bitcast_wasm_returns(environ, return_args, builder);
                 match environ.return_mode() {
                     ReturnMode::NormalReturns => builder.ins().return_(return_args),
                     ReturnMode::FallthroughReturn => {
@@ -575,13 +573,13 @@ pub fn translate_operator<FE: FuncEnvironment + ?Sized>(
             let (fref, num_args) = state.get_direct_func(builder.func, *function_index, environ)?;

             // Bitcast any vector arguments to their default type, I8X16, before calling.
-            let callee_signature =
-                &builder.func.dfg.signatures[builder.func.dfg.ext_funcs[fref].signature];
             let args = state.peekn_mut(num_args);
-            let types = wasm_param_types(&callee_signature.params, |i| {
-                environ.is_wasm_parameter(&callee_signature, i)
-            });
-            bitcast_arguments(args, &types, builder);
+            bitcast_wasm_params(
+                environ,
+                builder.func.dfg.ext_funcs[fref].signature,
+                args,
+                builder,
+            );

             let call = environ.translate_call(
                 builder.cursor(),
@@ -612,12 +610,8 @@ pub fn translate_operator<FE: FuncEnvironment + ?Sized>(
             let callee = state.pop1();

             // Bitcast any vector arguments to their default type, I8X16, before calling.
-            let callee_signature = &builder.func.dfg.signatures[sigref];
             let args = state.peekn_mut(num_args);
-            let types = wasm_param_types(&callee_signature.params, |i| {
-                environ.is_wasm_parameter(&callee_signature, i)
-            });
-            bitcast_arguments(args, &types, builder);
+            bitcast_wasm_params(environ, sigref, args, builder);

             let call = environ.translate_call_indirect(
                 builder,
@@ -3024,40 +3018,75 @@ fn pop2_with_bitcast(
     (bitcast_a, bitcast_b)
 }

-/// A helper for bitcasting a sequence of values (e.g. function arguments). If a value is a
-/// vector type that does not match its expected type, this will modify the value in place to point
-/// to the result of a `raw_bitcast`. This conversion is necessary to translate Wasm code that
-/// uses `V128` as function parameters (or implicitly in block parameters) and still use specific
-/// CLIF types (e.g. `I32X4`) in the function body.
-pub fn bitcast_arguments(
-    arguments: &mut [Value],
-    expected_types: &[Type],
-    builder: &mut FunctionBuilder,
-) {
-    assert_eq!(arguments.len(), expected_types.len());
-    for (i, t) in expected_types.iter().enumerate() {
-        if t.is_vector() {
+fn bitcast_arguments<'a>(
+    builder: &FunctionBuilder,
+    arguments: &'a mut [Value],
+    params: &[ir::AbiParam],
+    param_predicate: impl Fn(usize) -> bool,
+) -> Vec<(Type, &'a mut Value)> {
Member
Does it help at all to use a SmallVec here, tuned to some size that captures most cases?

Contributor Author
I thought about using SmallVec but figured first I'd try the experiment that doesn't require any tuning. I'm not sure: how would you suggest picking a size? In the past I've usually picked however many elements will fit in two usize, because that's the minimum size the inline part of a SmallVec will consume anyway. But in this case that's 1, which hardly seems worth doing.

Member
The SmallVec in this case will be on-stack so we can be a bit looser with its size I think (and sub rsp, 128 is almost always faster than malloc(128)). For some cases like this I have just picked a reasonable value but to be more objective about it we could try 1, 4, 16 and see what the difference is...

Contributor Author
On the intgemm-simd benchmark, Sightglass says plain Vec is 53-73% faster than a size-16 SmallVec, and 8-22% faster than a size-4 SmallVec. I don't understand that result, do you?

Unless there's a simple explanation I'm not seeing, I'm inclined to not think further about SmallVec right now.

Member
That's somewhat surprising, but I can hypothesize at least one potential explanation: if the return value is almost always an empty list, then the empty SmallVec takes more stack space than an empty Vec and hence creates a little more spread in the memory usage (so bad cache effects). I'm stretching a bit with that though.

Anyway if it's rare enough then a simple Vec is totally fine here; it's certainly better than what we had before!

Contributor Author
I decided to give size-1 SmallVec a shot too for completeness: if it's a matter of different stack frame sizes, then that should be no worse than Vec. But Vec was 44-58% faster on intgemm-simd than even size-1 SmallVec. So I remain thoroughly puzzled. 🤷
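
For readers following the stack-size hypothesis in this thread, here is a rough sketch using only the standard library. `Pair` and `InlineBuf` are hypothetical stand-ins: `Pair` imitates the shape of the `(Type, &mut Value)` tuples collected in this PR, and `InlineBuf` is a simplified inline buffer, not the `smallvec` crate's real layout. The sketch only compares footprints and shows that collecting an empty filtered iterator into a `Vec` leaves its capacity at zero in current standard library implementations. It does not attempt to explain the Sightglass numbers above.

```rust
use std::mem::size_of;

// Hypothetical stand-in for the `(Type, &mut Value)` pairs collected in this PR.
// The element sizes are assumptions chosen for illustration only.
type Pair<'a> = (u16, &'a mut u32);

// Hypothetical inline buffer: N slots stored in place plus a length field.
// This is NOT smallvec's actual layout.
#[allow(dead_code)]
struct InlineBuf<'a, const N: usize> {
    len: usize,
    slots: [Option<Pair<'a>>; N],
}

// Footprint of a `Vec` handle: pointer, capacity, and length.
fn vec_bytes() -> usize {
    size_of::<Vec<Pair<'static>>>()
}

// Footprint of the inline buffer, which grows with N.
fn inline_bytes<const N: usize>() -> usize {
    size_of::<InlineBuf<'static, N>>()
}

// Collects only the slots matching a predicate. When nothing matches,
// current std implementations return `Vec::new()`, which has capacity 0.
fn slots_over(limit: u32, slots: &mut [u32]) -> Vec<&mut u32> {
    slots.iter_mut().filter(|v| **v > limit).collect()
}

fn main() {
    println!("Vec handle:     {} bytes", vec_bytes());
    println!("inline, N = 1:  {} bytes", inline_bytes::<1>());
    println!("inline, N = 4:  {} bytes", inline_bytes::<4>());
    println!("inline, N = 16: {} bytes", inline_bytes::<16>());

    let mut slots = [1u32, 2, 3];
    let none = slots_over(100, &mut slots);
    println!("empty result capacity: {}", none.capacity());
}
```

The inline buffer's footprint scales with its capacity whether or not anything is stored, while the `Vec` handle stays the same size and defers any cost until the first push.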

+    let filtered_param_types = params
+        .iter()
+        .enumerate()
+        .filter(|(i, _)| param_predicate(*i))
+        .map(|(_, param)| param.value_type);
+
+    // zip_eq, from the itertools::Itertools trait, is like Iterator::zip but panics if one
+    // iterator ends before the other. The `param_predicate` is required to select exactly as many
+    // elements of `params` as there are elements in `arguments`.
+    let pairs = filtered_param_types.zip_eq(arguments.iter_mut());
+
+    // The arguments which need to be bitcasted are those which have some vector type but the type
+    // expected by the parameter is not the same vector type as that of the provided argument.
+    pairs
+        .filter(|(param_type, _)| param_type.is_vector())
+        .filter(|(param_type, arg)| {
+            let arg_type = builder.func.dfg.value_type(**arg);
             assert!(
-                builder.func.dfg.value_type(arguments[i]).is_vector(),
+                arg_type.is_vector(),
                 "unexpected type mismatch: expected {}, argument {} was actually of type {}",
-                t,
-                arguments[i],
-                builder.func.dfg.value_type(arguments[i])
+                param_type,
+                *arg,
+                arg_type
             );
-            arguments[i] = optionally_bitcast_vector(arguments[i], *t, builder)
-        }
+
+            // This is the same check that would be done by `optionally_bitcast_vector`, except we
+            // can't take a mutable borrow of the FunctionBuilder here, so we defer inserting the
+            // bitcast instruction to the caller.
+            arg_type != *param_type
+        })
+        .collect()
+}
+
+/// A helper for bitcasting a sequence of return values for the function currently being built. If
+/// a value is a vector type that does not match its expected type, this will modify the value in
+/// place to point to the result of a `raw_bitcast`. This conversion is necessary to translate Wasm
+/// code that uses `V128` as function parameters (or implicitly in block parameters) and still use
+/// specific CLIF types (e.g. `I32X4`) in the function body.
+pub fn bitcast_wasm_returns<FE: FuncEnvironment + ?Sized>(
+    environ: &mut FE,
+    arguments: &mut [Value],
+    builder: &mut FunctionBuilder,
+) {
+    let changes = bitcast_arguments(builder, arguments, &builder.func.signature.returns, |i| {
+        environ.is_wasm_return(&builder.func.signature, i)
+    });
+    for (t, arg) in changes {
+        *arg = builder.ins().raw_bitcast(t, *arg);
     }
 }

-/// A helper to extract all the `Type` listings of each variable in `params`
-/// for only parameters the return true for `is_wasm`, typically paired with
-/// `is_wasm_return` or `is_wasm_parameter`.
-pub fn wasm_param_types(params: &[ir::AbiParam], is_wasm: impl Fn(usize) -> bool) -> Vec<Type> {
-    let mut ret = Vec::with_capacity(params.len());
-    for (i, param) in params.iter().enumerate() {
-        if is_wasm(i) {
-            ret.push(param.value_type);
-        }
+/// Like `bitcast_wasm_returns`, but for the parameters being passed to a specified callee.
+fn bitcast_wasm_params<FE: FuncEnvironment + ?Sized>(
+    environ: &mut FE,
+    callee_signature: ir::SigRef,
+    arguments: &mut [Value],
+    builder: &mut FunctionBuilder,
+) {
+    let callee_signature = &builder.func.dfg.signatures[callee_signature];
+    let changes = bitcast_arguments(builder, arguments, &callee_signature.params, |i| {
+        environ.is_wasm_parameter(&callee_signature, i)
+    });
+    for (t, arg) in changes {
+        *arg = builder.ins().raw_bitcast(t, *arg);
     }
-    ret
 }
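
The two-phase shape of the new helpers can be sketched outside Cranelift. The first phase takes only a shared borrow of the builder and collects the slots that need rewriting. The second phase takes the exclusive borrow and applies the changes. Everything below (`Ty`, `Builder`, `pending_casts`, `cast_all`) is a hypothetical toy model, not Cranelift's API. It uses plain `zip` with an explicit length assertion where the PR uses `itertools`' `zip_eq`.

```rust
// Toy stand-ins for illustration only; these are not Cranelift types.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Ty {
    I32,
    I8X16,
    I32X4,
}

impl Ty {
    fn is_vector(self) -> bool {
        matches!(self, Ty::I8X16 | Ty::I32X4)
    }
}

// A tiny "builder" that records the type of every value it has created.
// Values are plain indices into `types`.
struct Builder {
    types: Vec<Ty>,
}

impl Builder {
    fn value_type(&self, v: usize) -> Ty {
        self.types[v]
    }

    // Models inserting a bitcast instruction: creates a new value of type `to`.
    fn bitcast(&mut self, to: Ty, _v: usize) -> usize {
        self.types.push(to);
        self.types.len() - 1
    }
}

// Phase 1: shared borrow of the builder only. Returns the argument slots whose
// vector type differs from the expected one. Empty (no allocation) if none do.
fn pending_casts<'a>(
    b: &Builder,
    args: &'a mut [usize],
    expected: &[Ty],
) -> Vec<(Ty, &'a mut usize)> {
    assert_eq!(args.len(), expected.len());
    expected
        .iter()
        .copied()
        .zip(args.iter_mut())
        .filter(|(want, _)| want.is_vector())
        .filter(|(want, arg)| b.value_type(**arg) != *want)
        .collect()
}

// Phase 2: exclusive borrow of the builder. Rewrites each collected slot in
// place and returns how many casts were inserted.
fn cast_all(b: &mut Builder, args: &mut [usize], expected: &[Ty]) -> usize {
    let changes = pending_casts(b, args, expected);
    let inserted = changes.len();
    for (want, arg) in changes {
        *arg = b.bitcast(want, *arg);
    }
    inserted
}

fn main() {
    let mut b = Builder {
        types: vec![Ty::I32, Ty::I32X4, Ty::I8X16],
    };
    let mut args = [0, 1, 2];
    let expected = [Ty::I32, Ty::I8X16, Ty::I8X16];
    let inserted = cast_all(&mut b, &mut args, &expected);
    println!("{} cast(s) inserted, args now {:?}", inserted, args);
}
```

Because the returned pairs borrow only from `args`, the shared borrow of the builder ends when `pending_casts` returns, which is what lets the loop in `cast_all` mutate the builder.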
7 changes: 2 additions & 5 deletions cranelift/wasm/src/func_translator.rs
@@ -4,7 +4,7 @@
 //! function to Cranelift IR guided by a `FuncEnvironment` which provides information about the
 //! WebAssembly module and the runtime environment.

-use crate::code_translator::{bitcast_arguments, translate_operator, wasm_param_types};
+use crate::code_translator::{bitcast_wasm_returns, translate_operator};
 use crate::environ::{FuncEnvironment, ReturnMode};
 use crate::state::FuncTranslationState;
 use crate::translation_utils::get_vmctx_value_label;
@@ -255,10 +255,7 @@ fn parse_function_body<FE: FuncEnvironment + ?Sized>(
         if !builder.is_unreachable() {
             match environ.return_mode() {
                 ReturnMode::NormalReturns => {
-                    let return_types = wasm_param_types(&builder.func.signature.returns, |i| {
-                        environ.is_wasm_return(&builder.func.signature, i)
-                    });
-                    bitcast_arguments(&mut state.stack, &return_types, builder);
+                    bitcast_wasm_returns(environ, &mut state.stack, builder);
                     builder.ins().return_(&state.stack)
                 }
                 ReturnMode::FallthroughReturn => builder.ins().fallthrough_return(&state.stack),