Add a SSE2 fast path for AMD GPU #827
This saves memory because of the difference in structure padding. As a side effect, storage for unused fields has been removed, which saves further time and effort.
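To illustrate the padding point, here is a minimal sketch comparing an array-of-structures layout with a structure-of-arrays layout. The struct names, the `gpu_clock_mhz` field, and `N_SAMPLES` are hypothetical and chosen only for illustration; they are not the structs touched by this PR.

```c
#include <stddef.h>
#include <stdint.h>

#define N_SAMPLES 64 /* hypothetical sample count */

/* Array-of-structures: each record mixes 16-bit and 32-bit fields, so the
 * compiler typically inserts padding after each uint16_t to keep the
 * uint32_t (and the next record) aligned. */
struct sample_aos {
    uint16_t gpu_temp_c;
    uint32_t gpu_clock_mhz; /* hypothetical field */
    uint16_t soc_temp_c;
};

/* Structure-of-arrays: fields of the same type sit next to each other,
 * so no per-record padding is needed. */
struct samples_soa {
    uint32_t gpu_clock_mhz[N_SAMPLES];
    uint16_t gpu_temp_c[N_SAMPLES];
    uint16_t soc_temp_c[N_SAMPLES];
};

/* Total bytes needed to hold N_SAMPLES records in each layout. */
size_t aos_bytes(void) { return sizeof(struct sample_aos) * N_SAMPLES; }
size_t soa_bytes(void) { return sizeof(struct samples_soa); }
```

On common x86-64 ABIs `struct sample_aos` occupies 12 bytes for 8 bytes of payload, so the SoA layout should come out smaller; the exact sizes depend on the target's alignment rules.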
I can see moving from AOS to SOA making a difference, but does the SSE2 stuff actually make any difference? I am guessing not. Compilers are good at vectorizing in $CURRENT_YEAR (pls provide numbers + the compile flags you used).
uint16_t soc_temp_c;
uint16_t gpu_temp_c;
uint16_t apu_cpu_temp_c;
#ifdef AMG_GPU_TEMP_MONITORING
AMG? Did you mean AMD?
# Check for SSE2
if cc.compiles('''#include <emmintrin.h>
int main() {
__m128 v1 = _mm_set1_ps(-1.0f);
__m128 v2 = _mm_set1_ps(1.0f);
v1 = _mm_add_ps(v1, v2);
float sum[4];
_mm_store_ps(sum, v1);
return (int)sum[0];
}''',
name : 'SSE2 support',
args : '-msse2')
pre_args += '-DUSE_SSE2'
pre_args += '-msse2'
endif
Maybe use the SIMD module?
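For context, a minimal sketch of how a code path guarded by the `USE_SSE2` define from the check above might look, with a scalar fallback. The helper `add_offset_u16` is hypothetical and not taken from this PR. It uses integer intrinsics that are genuinely SSE2 (`_mm_set1_epi16`, `_mm_add_epi16`); the `_mm_*_ps` intrinsics in the configure test belong to the original SSE set, so the test as written may not exercise anything specific to SSE2.

```c
#include <stddef.h>
#include <stdint.h>

/* Take the vector path when the build system defined USE_SSE2 (as in the
 * check above) or when the compiler already targets SSE2. */
#if defined(USE_SSE2) || defined(__SSE2__)
#include <emmintrin.h>
#define HAVE_SSE2_PATH 1
#else
#define HAVE_SSE2_PATH 0
#endif

/* Hypothetical helper: add a fixed offset to n 16-bit readings in place. */
void add_offset_u16(uint16_t *vals, size_t n, uint16_t offset)
{
    size_t i = 0;
#if HAVE_SSE2_PATH
    /* Process eight 16-bit lanes per iteration using unaligned loads. */
    const __m128i off = _mm_set1_epi16((short)offset);
    for (; i + 8 <= n; i += 8) {
        __m128i v = _mm_loadu_si128((const __m128i *)(vals + i));
        v = _mm_add_epi16(v, off);
        _mm_storeu_si128((__m128i *)(vals + i), v);
    }
#endif
    /* Scalar tail, and the whole loop when SSE2 is unavailable. */
    for (; i < n; i++)
        vals[i] = (uint16_t)(vals[i] + offset);
}
```

A loop this simple is something current compilers usually auto-vectorize at `-O2` or `-O3`, which is why benchmark numbers and compile flags matter before keeping a hand-written path.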
This does a couple of things:
`memcpy`ing it. This saves us from some handling of fields that aren't actually exported, and is a bit less future maintenance.

Open questions with this work: