
Begin migrate ScalarQuantizer to simdlib #3613

Open — wants to merge 1 commit into base: main

Conversation

mdouze
Contributor

@mdouze commented Jul 5, 2024

Summary:
As a demo for Mengdi.

The steps to fully migrate to simdlib are:

  1. change all function interfaces to use the generic simd8float32 and friends prototypes -- make sure it compiles on fbcode.

  2. make sure it also compiles on ARM

  3. see which functions can be migrated to only use the generic codepath

  4. benchmark whether the SIMD-emulated path is competitive with the scalar one (for platforms without specific SIMD support)

Differential Revision: D59395882

@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D59395882


mdouze added a commit to mdouze/faiss that referenced this pull request Jul 5, 2024
Summary:
Pull Request resolved: facebookresearch#3613

As a demo for Mengdi.

The steps to fully migrate to simdlib are:

1. change all function interfaces to use the generic simd8float32 and friends prototypes -- make sure it compiles on fbcode.

2. make sure it also compiles on ARM

3. see which functions can be migrated to only use the generic codepath

4. benchmark whether the SIMD-emulated path is competitive with the scalar one (for platforms without specific SIMD support)

The rationale here is that many SIMD instructions are straightforward -- adding or subtracting registers, for example -- and can be shared between implementations. The only code that may need to remain arch-specific intrinsics is where the way of doing things is very different between AVX and NEON.

Differential Revision: D59395882

@alexanderguzhva
Contributor

@mdouze Do you have any plans to support ARM SVE, if possible? The primary problem of simdlib with ARM SVE is that it implies SIMD registers of a variable size. Technically, there are two popular models on the market: Amazon Graviton 3 with a 256-bit SIMD width and the upcoming Graviton 4 with a 512-bit SIMD width, so maybe one could stick with 256 bits for now.

@mdouze
Contributor Author

mdouze commented Jul 29, 2024

@alexanderguzhva IMO it would be great to support SVE.
What I don't understand is whether the SVE size needs to be known at compile time. In that case, we could just add it as additional SIMD compile targets for the 256 and 512 versions.
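If the vector length can indeed be fixed at build time, those extra compile targets could look roughly like this (hypothetical file names; `-msve-vector-bits` is the GCC/Clang flag that fixes the SVE vector length at compile time):

```sh
# Hypothetical: build the same simdlib source once per assumed SVE width,
# mirroring how separate AVX2 objects are built today.
g++ -c simdlib_sve.cpp -march=armv8.2-a+sve -msve-vector-bits=256 -o simdlib_sve256.o
g++ -c simdlib_sve.cpp -march=armv8.2-a+sve -msve-vector-bits=512 -o simdlib_sve512.o
```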

@alexanderguzhva
Contributor

alexanderguzhva commented Jul 29, 2024

@mdouze Yes, the SVE size is known at compile time. Usually it is obtained via the svcntb() instruction. The PROBLEM is that on x86 you can have registers such as __m256 be a member of a class or struct, but you cannot do that with SVE registers such as svuint8_t. This will trigger a compiler error O_o. So you will have to use workarounds, such as keeping std::uint8_t tmp[16]; inside your simdlib for SVE256 and doing loads / stores between a register and the buffer. I'm not sure whether the compiler will be able to optimize that; I hope it will.

@alexanderguzhva
Contributor

What is the status of this diff? Should I wait before I bring some updates to ScalarQuantizer?

@mengdilin
Contributor

@alexanderguzhva I'm starting to work on this, but it's going to take some time. If you want to make your changes now, feel free to, and I can work on refactoring later down the line.

@alexanderguzhva
Contributor

@mengdilin Any time estimates on your end? Basically, are you at a stage where you know exactly what to do, or are you still in a research stage?

mengdilin added a commit to mengdilin/faiss that referenced this pull request Aug 26, 2024
@mengdilin
Contributor

@alexanderguzhva I think I can finish up AVX2/Neon in ScalarQuantizer around October (I have other work items at hand atm). My understanding is that I should move the respective parts of the AVX2 and Neon code in ScalarQuantizer into faiss/utils/simdlib_avx2.h and faiss/utils/simdlib_neon.h as part of my SIMD ramp-up. I've made some progress on the refactor, but I have not thought about how simdlib can be extended to support SVE. Before committing my progress, I'm building out a performance regression test suite to ensure my changes don't introduce regressions across AVX2, Neon, and the unoptimized build.

I'm a SIMD noob here. Let me know if I'm moving in the right direction for the refactor or if I'm missing anything major.

mengdilin added a commit to mengdilin/faiss that referenced this pull request Sep 25, 2024
mengdilin added a commit to mengdilin/faiss that referenced this pull request Sep 25, 2024