Question about batch_fitness and SIMD #521
Comments
@adriendelsalle Thanks for the kind words! Indeed you are correct, the data layout was designed with thread/process-based BFEs in mind. I suppose that, as the adoption of AVX512 increases, the availability of gather/scatter instructions would at least alleviate the issue. As an alternative, we could think about extending the BFE API to give the user the ability to signal how the data is stored (i.e., row-major vs column-major). Of course we would need to ensure that such an extension does not break existing uses of the BFE API, which could be tricky.
I love the idea of trying to signal/flag the layout. I'll have to take a deeper look at the project to see how I could make a relevant PR for that. It will probably take a few weeks before I can find time to really investigate further, but if you're fine with contributions I can give it a try!
@adriendelsalle Of course, it would be great if you wanted to take a stab at this. Feel free to ping me if you need assistance (here or on the gitter channel: https://gitter.im/pagmo2/Lobby).
Description
This project looks really nice, thanks for that!
I have a really simple question regarding the compatibility of batch_fitness and SIMD computation, due to the layout of the input/decision vector (and the output/fitness one).

From the docs:

Is it really possible to do vectorized operations when concatenating the input as described, without having to allocate a new vector and reorder it internally before calling some SIMD intrinsics (probably lowering the benefits of the vectorization, or even making it slower than a naive sequential implementation)?

I was expecting contiguous storage of each input element: for a batch of size b, the first component of the decision vectors occupies the index range [0, b), etc.

I understand this layout is handy for a multithreaded BFE to be able to work concurrently on different portions of the input vector. Is it really compatible with SIMD?

Thanks for your help, and sorry if I missed something :)
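To make the two layouts concrete, here is a minimal C++ sketch of the reordering step the question refers to. It assumes the batch arrives as decision vectors concatenated one after another (row-major), as discussed in this thread, and copies it into the component-contiguous (column-major) layout before running a unit-stride loop over the batch. The helper names (row_major_index, col_major_index, to_component_contiguous, batch_sum_of_squares) and the sum-of-squares objective are hypothetical, chosen only for illustration; none of them is part of the pagmo API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helpers for illustration only; not part of pagmo.

// Concatenated (row-major) layout: component j of decision vector i.
inline std::size_t row_major_index(std::size_t i, std::size_t j, std::size_t dim)
{
    return i * dim + j;
}

// Component-contiguous (column-major) layout: component j of all `batch`
// decision vectors occupies the index range [j * batch, (j + 1) * batch).
inline std::size_t col_major_index(std::size_t i, std::size_t j, std::size_t batch)
{
    return j * batch + i;
}

// Copy a row-major batch into a newly allocated column-major buffer.
// This is the extra allocation and reordering mentioned in the question.
std::vector<double> to_component_contiguous(const std::vector<double> &dvs, std::size_t dim)
{
    assert(dim > 0 && dvs.size() % dim == 0);
    const std::size_t batch = dvs.size() / dim;
    std::vector<double> out(dvs.size());
    for (std::size_t i = 0; i < batch; ++i) {
        for (std::size_t j = 0; j < dim; ++j) {
            out[col_major_index(i, j, batch)] = dvs[row_major_index(i, j, dim)];
        }
    }
    return out;
}

// Example objective f(x) = sum_j x_j^2 evaluated on a column-major batch.
// The inner loop runs over the batch with unit stride, which is the access
// pattern compilers can typically auto-vectorize.
std::vector<double> batch_sum_of_squares(const std::vector<double> &soa, std::size_t dim)
{
    assert(dim > 0 && soa.size() % dim == 0);
    const std::size_t batch = soa.size() / dim;
    std::vector<double> f(batch, 0.0);
    for (std::size_t j = 0; j < dim; ++j) {
        const double *xj = soa.data() + j * batch;
        for (std::size_t i = 0; i < batch; ++i) {
            f[i] += xj[i] * xj[i];
        }
    }
    return f;
}
```

For two 3-dimensional decision vectors (1, 2, 3) and (4, 5, 6), the concatenated input is {1, 2, 3, 4, 5, 6} and the component-contiguous copy is {1, 4, 2, 5, 3, 6}. Whether the cost of this transpose outweighs the vectorization gain depends on the objective and the batch size, which is exactly the concern raised above.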