`is_x86_feature_detected` in Rust SGX SDK
Crates often use `is_x86_feature_detected` to select an appropriate implementation (such as AVX/SSE/SSSE3/FMA). In the default libstd implementation on x86_64, it triggers the `cpuid` instruction. We want to avoid SGX-incompatible instructions of this kind and the unnecessary AEX events they cause.
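For context, the usual dispatch pattern outside an enclave looks roughly like the sketch below. The function names (`sum`, `sum_avx2`, `sum_fallback`) are made up for illustration and are not taken from any particular crate. Only `is_x86_feature_detected!` and `#[target_feature]` are real std/language features.

```rust
/// Sum a slice, picking an implementation at run time.
pub fn sum(values: &[f32]) -> f32 {
    #[cfg(target_arch = "x86_64")]
    {
        // In stock libstd this check is backed by `cpuid`,
        // which is exactly what we want to avoid inside an enclave.
        if is_x86_feature_detected!("avx2") {
            // Safety: AVX2 support was verified just above.
            return unsafe { sum_avx2(values) };
        }
    }
    sum_fallback(values)
}

/// Same logic, compiled with AVX2 enabled so the compiler may vectorize it.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn sum_avx2(values: &[f32]) -> f32 {
    values.iter().sum()
}

/// Portable implementation used when AVX2 is not available.
fn sum_fallback(values: &[f32]) -> f32 {
    values.iter().sum()
}
```

Any crate written this way ends up executing the feature check the first time `sum` is called, so the check itself has to be replaced for SGX.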
We found that Intel's SDK initializes its optimized libraries as follows:
- In uRTS, enumerate the CPU features and pass them to the enclave as an initialization parameter
//Since CPUID instruction is NOT supported within enclave, we enumerate the cpu features here and send to tRTS.
get_cpu_features(&info.cpu_features);
get_cpu_features_ext(&info.cpu_features_ext);
init_cpuinfo((uint32_t *)info.cpuinfo_table);
- In tRTS, initialize the optimized libraries according to the global CPU feature indicator
// optimized libs
if (SDK_VERSION_2_0 < g_sdk_version || sys_features.size != 0)
{
    if (0 != init_optimized_libs(cpu_features, (uint32_t*)sys_features.cpuinfo_table, xfrm))
    {
        return -1;
    }
}
We found that in `init_optimized_libs`, a global variable `g_cpu_feature_indicator` is initialized to store the `feature_bit_array`, which contains everything we need!
static int set_global_feature_indicator(uint64_t feature_bit_array, uint64_t xfrm) {
    ......
    g_cpu_feature_indicator = feature_bit_array;
    return 0;
}
Since Rust SGX SDK depends on trts, we can simply reuse `g_cpu_feature_indicator` and simulate the `is_x86_feature_detected` macro easily! First we import the value from trts:
#[link(name = "sgx_trts")]
extern {
    static g_cpu_feature_indicator: uint64_t;
    static EDMM_supported: c_int;
}

#[inline]
pub fn rsgx_get_cpu_feature() -> u64 {
    unsafe { g_cpu_feature_indicator }
}
Then parse `g_cpu_feature_indicator` like std_detect:
#[macro_export]
macro_rules! is_cpu_feature_supported {
    ($feature:expr) => (
        (($feature & $crate::enclave::rsgx_get_cpu_feature()) != 0)
    )
}

#[macro_export]
macro_rules! is_x86_feature_detected {
    ("ia32") => {
        $crate::cpu_feature::check_for($crate::cpu_feature::Feature::ia32)
    };
    ...
}
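To make the idea concrete, here is a minimal, self-contained sketch of how a `check_for` helper and a feature-name macro could sit on top of a bit indicator. It is not the SDK's actual code. The `MOCK_CPU_FEATURE_INDICATOR` static stands in for `g_cpu_feature_indicator` so the sketch runs outside an enclave, and the `Feature` variants, their bit positions, `set_mock_indicator`, and the macro name `enclave_feature_detected!` are all assumptions for illustration.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Stand-in for the `g_cpu_feature_indicator` exported by sgx_trts
// (assumption: mocked here so the sketch runs outside an enclave).
static MOCK_CPU_FEATURE_INDICATOR: AtomicU64 = AtomicU64::new(0);

/// Hypothetical feature list; the bit positions are illustrative only.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[repr(u64)]
pub enum Feature {
    Sse2 = 1 << 0,
    Avx = 1 << 1,
    Fma = 1 << 2,
}

/// Test helper that plays the role of enclave initialization.
pub fn set_mock_indicator(bits: u64) {
    MOCK_CPU_FEATURE_INDICATOR.store(bits, Ordering::Relaxed);
}

/// Returns true if the bit for `feature` is set in the indicator.
pub fn check_for(feature: Feature) -> bool {
    (MOCK_CPU_FEATURE_INDICATOR.load(Ordering::Relaxed) & (feature as u64)) != 0
}

// Maps a feature-name literal to a bit test, one arm per name.
// An exported macro would normally use `$crate::` paths instead.
macro_rules! enclave_feature_detected {
    ("sse2") => { check_for(Feature::Sse2) };
    ("avx") => { check_for(Feature::Avx) };
    ("fma") => { check_for(Feature::Fma) };
    ($other:tt) => { compile_error!("unknown feature name") };
}
```

After `set_mock_indicator((Feature::Sse2 as u64) | (Feature::Avx as u64))`, `enclave_feature_detected!("avx")` evaluates to true and `enclave_feature_detected!("fma")` to false, with no `cpuid` involved.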
We observed that some crates (such as matrixmultiply) tend to use the most advanced instruction set available for speed. But that may not be the best choice. For example, the "machine-learning" SGX sample depends on rusty-machine and matrixmultiply, which use AVX instructions if supported. However, if we use the "fallback" mode, it is about 10x faster than the AVX version. AVX optimization is pretty complicated, and I have not had time to read the Intel® 64 and IA-32 Architectures Optimization Reference Manual. I also don't think either the crate's owner or the LLVM backend can optimize it ideally. I recommend choosing the appropriate instruction set per workload.
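One way to make a per-workload choice is to intersect the detected features with a mask that each workload opts into. This is only a sketch. `FEATURE_SSE2`, `FEATURE_AVX`, `Kernel`, and `select_kernel` are hypothetical names with illustrative bit values, not part of Rust SGX SDK or matrixmultiply.

```rust
/// Hypothetical bit assignments, for illustration only.
pub const FEATURE_SSE2: u64 = 1 << 0;
pub const FEATURE_AVX: u64 = 1 << 1;

/// Which implementation a workload ends up running.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Kernel {
    Fallback,
    Avx,
}

/// `detected` holds the hardware feature bits; `allowed` holds the bits
/// this workload opts into. AVX is used only if both agree.
pub fn select_kernel(detected: u64, allowed: u64) -> Kernel {
    if (detected & allowed & FEATURE_AVX) != 0 {
        Kernel::Avx
    } else {
        Kernel::Fallback
    }
}
```

A workload that measured better with the fallback path would pass an `allowed` mask without `FEATURE_AVX`, while others keep the full mask.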