Releases: DLTcollab/sse2neon
v1.7.0
What's Changed
- refactor: Add missing ARM64 implementation by @howjmay in #576
- test: Build/run with crypto and/or crc by @howjmay in #574
- doc: Describe the right coverage of SSE2NEON_PRECISE_MINMAX by @howjmay in #578
- refactor: Reimplement _mm_movelh_ps for Arm64 by @howjmay in #579
- tests: Cover all immediate numbers by @howjmay in #584
- test: Use macro for validate results by @howjmay in #585
- Improve precision of _mm_{rsqrt,sqrt,rcp,div}_{ps,ss} conversions by @Cuda-Chen in #580
- Fix MSVC compile issues by @toxieainc in #588
- Tweak MSVC ifdef guard for _BitScanForward64 by @aqrit in #592
- Add notice that NEON handles certain IEEE single-precision values by @Cuda-Chen in #593
- Add infinity test in test_mm_{max,min}_{pd,sd} by @Cuda-Chen in #594
- Remove Kahan algorithm in _mm_dp_ps by @Cuda-Chen in #597
- MSVC support by @anthony-linaro in #596
- test: Cover all the valid imm range in tests by @howjmay in #586
- Add test running for MSVC to CI by @anthony-linaro in #598
- Align result to SSE when input is 0.0f/-0.0f in _mm_rsqrt_{ps,ss} by @Cuda-Chen in #599
- fix: Fix exceeding width of type warning by @howjmay in #601
- docs: Fix the typos by @howjmay in #603
- docs: Fix the typos by @spacemiqote in #605
- Fix build for gcc-13 and 32 bit arm systems. by @balister in #609
- Fix unused parameters warning by @anakinxc in #610
- Fixed gcc strict prototype and other build errors by @mnjdhl in #611
- Fix _mm_cmplt_sd and _mm_cmpnlt_sd test cases by @Cuda-Chen in #612
- disambiguate vector type to avoid errors depending on lax conversion … by @JoachimSchurig in #614
- docs: fix typo failback by @howjmay in #616
- Introduce fast and deterministic RNG by @Cuda-Chen in #615
- fix: Fix typo nand by @howjmay in #617
- fix: Fix MSVC warnings by @howjmay in #604
- Add A32 support in CI by @Cuda-Chen in #620
- Fix _mm_test_mix_ones_zeros and _mm_testnzc_si128 by @aqrit in #621
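For readers unfamiliar with the two intrinsics named in the last item, the following is a minimal scalar sketch of the result Intel documents for _mm_testnzc_si128 (and for _mm_test_mix_ones_zeros, which has the same semantics). It is not the sse2neon implementation; the type vec128_model and the function testnzc_model are hypothetical names introduced only for illustration.

```c
#include <stdint.h>

/* Hypothetical stand-in for a 128-bit vector: two 64-bit halves. */
typedef struct {
    uint64_t lo;
    uint64_t hi;
} vec128_model;

/* Scalar model of the documented result of _mm_testnzc_si128(a, mask):
 *   ZF is set when (a & mask) is all zeros,
 *   CF is set when (~a & mask) is all zeros,
 * and the intrinsic returns 1 only when both flags are clear, i.e. the
 * masked bits of `a` contain a mix of ones and zeros. */
static int testnzc_model(vec128_model a, vec128_model mask)
{
    int zf = ((a.lo & mask.lo) | (a.hi & mask.hi)) == 0;
    int cf = ((~a.lo & mask.lo) | (~a.hi & mask.hi)) == 0;
    return !zf && !cf;
}
```

With a mask of 0xFF in the low half, an input of 0x0F yields 1 (mixed bits), while inputs of 0xFF or 0x00 yield 0.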
New Contributors
- @anthony-linaro made their first contribution in #596
- @spacemiqote made their first contribution in #605
- @anakinxc made their first contribution in #610
- @mnjdhl made their first contribution in #611
- @JoachimSchurig made their first contribution in #614
Full Changelog: v1.6.0...v1.7.0
v1.6.0
What's Changed
- 100% intrinsics coverage for SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2 and AES extension.
- Implement _rdtsc by @Cuda-Chen in #532
- Improve _mm_srai_epi32 to handle complex arguments by @Developer-Ecosystem-Engineering in #533
- Implement _mm_cmpestri and _mm_cmpestrm by @Cuda-Chen in #534
- Implement five _mm_cmpestr by @Cuda-Chen in #552
- Implement _mm_cmpistri and _mm_cmpistrm by @Cuda-Chen in #553
- Implement five _mm_cmpistr by @Cuda-Chen in #555
- tests: Fix warnings raised by clang++ by @Cuda-Chen in #540
- Exclude _mm_malloc/free definitions on Windows by @invertego in #541
- Remove designated initialization of an array by @invertego in #542
- Reintroduce ext-based implementations for shift intrinsics by @AymenQ in #543
- Improve performance of float-to-integer intrinsics by @AymenQ in #546
- Support __builtin_shuffle as an alternative to __builtin_shufflevector by @AymenQ in #545
- Improve performance of various intrinsics by @AymenQ in #549
- Vectorize _mm_minpos_epu16 by @AymenQ in #551
- Align _mm_prefetch behavior to document by @howjmay in #550
- Add clang/Windows build by @invertego in #556
- Test all valid immediates in _mm_dp_pd by @Cuda-Chen in #557
- Optimize _mm_aesenclast_si128 for Arm64 by @howjmay in #561
- Implement _mm_aesdec_si128 by @howjmay in #559
- Implement _mm_aesdeclast_si128 by @howjmay in #565
- Implement _mm_aesimc_si128 by @howjmay in #567
- Optimize aeskeygenassist_si128 for Arm64 by @howjmay in #569
- Update Intel intrinsics document links by @howjmay in #570
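As context for the _mm_minpos_epu16 item above, here is a scalar sketch of the result Intel documents for that intrinsic, which is what any vectorized version has to reproduce. It is only a reference model, not the NEON code from the pull request, and the function name minpos_epu16_model is a hypothetical one chosen for this example.

```c
#include <stdint.h>

/* Scalar model of the documented _mm_minpos_epu16 result:
 *   out[0]    = smallest of the eight unsigned 16-bit inputs
 *   out[1]    = index (0..7) of the first occurrence of that minimum
 *   out[2..7] = 0 */
static void minpos_epu16_model(const uint16_t in[8], uint16_t out[8])
{
    uint16_t min = in[0];
    uint16_t idx = 0;

    /* Strict '<' keeps the lowest index when the minimum repeats. */
    for (uint16_t i = 1; i < 8; i++) {
        if (in[i] < min) {
            min = in[i];
            idx = i;
        }
    }

    out[0] = min;
    out[1] = idx;
    for (int i = 2; i < 8; i++)
        out[i] = 0;
}
```

For the input {9, 3, 7, 3, 8, 10, 65535, 4}, the model returns a minimum of 3 at index 1, with the remaining six lanes zeroed.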
New Contributors
- @Cuda-Chen made their first contribution in #532
- @Developer-Ecosystem-Engineering made their first contribution in #533
- @balister made their first contribution in #535
- @invertego made their first contribution in #541
- @AymenQ made their first contribution in #543
Full Changelog: v1.5.1...v1.6.0
v1.5.1
What's Changed
- fix: Fix dividing zero error in validateFloatError by @howjmay in #515
- Fix compilation with standardized C compilers by @jserv in #516
- Fix _mm_storel_epi64 by @andrewevstyukhin in #517
- Add support for 32-bit targets on ARMv8 architectures by @jonathanhue in #520
- Use CRC and directed rounding intrinsics on A32 by @jonathanhue in #522
- fix: Fix alignment in tests by @howjmay in #523
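The notes above do not say what was wrong with _mm_storel_epi64, so the following is only a scalar sketch of the behavior Intel documents for it: the low 64 bits of the source vector are written to memory and nothing else is touched. The function name storel_epi64_model and the two-element array standing in for a 128-bit vector are assumptions made for this example.

```c
#include <stdint.h>
#include <string.h>

/* Scalar model of the documented _mm_storel_epi64 behavior.
 * src[0] plays the role of the low 64 bits of the vector, src[1] the
 * high 64 bits. Exactly 8 bytes are written; memcpy is used so the
 * destination does not need any particular alignment. */
static void storel_epi64_model(void *dst, const uint64_t src[2])
{
    memcpy(dst, &src[0], sizeof(uint64_t));
}
```

Calling the model on a 16-byte buffer changes only its first 8 bytes; bytes 8 through 15 keep their previous contents.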
New Contributors
- @sleepybishop made their first contribution in #508
- @luzpaz made their first contribution in #509
- @andrewevstyukhin made their first contribution in #517
- @jonathanhue made their first contribution in #520
Full Changelog: v1.5.0...v1.5.1