Add AVX2 Support #431

Open
wants to merge 1 commit into
base: main

wx257osn2
Contributor

  • Add ADA_AVX2 macro in include/ada/common_defs.h enabled if defined(__AVX2__)
  • Implement has_tabs_or_newline using AVX2
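For readers unfamiliar with the technique, here is a minimal sketch of what an AVX2 scan of this kind can look like. It is not the code from this PR. It assumes the function reports whether the input contains a tab, line feed, or carriage return; the names `SKETCH_AVX2`, `has_tabs_or_newline_scalar`, and `has_tabs_or_newline_sketch` are hypothetical stand-ins (the PR's own macro is `ADA_AVX2`). The guard mirrors the `defined(__AVX2__)` condition above, so the block falls back to a scalar loop when AVX2 is not enabled at compile time.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

#if defined(__AVX2__)
#include <immintrin.h>
#define SKETCH_AVX2 1  // hypothetical stand-in for the PR's ADA_AVX2 macro
#endif

// Scalar reference: true if s contains '\t', '\n' or '\r' (assumed semantics).
inline bool has_tabs_or_newline_scalar(std::string_view s) noexcept {
  for (char c : s) {
    if (c == '\t' || c == '\n' || c == '\r') return true;
  }
  return false;
}

// Sketch of a vectorized variant: compares 32 bytes per iteration.
inline bool has_tabs_or_newline_sketch(std::string_view s) noexcept {
#if defined(SKETCH_AVX2)
  const __m256i tab = _mm256_set1_epi8('\t');
  const __m256i lf = _mm256_set1_epi8('\n');
  const __m256i cr = _mm256_set1_epi8('\r');
  std::size_t i = 0;
  for (; i + 32 <= s.size(); i += 32) {
    // Unaligned load of the next 32 input bytes.
    const __m256i chunk = _mm256_loadu_si256(
        reinterpret_cast<const __m256i*>(s.data() + i));
    // Each compare yields 0xFF in every byte lane that matches.
    const __m256i hit = _mm256_or_si256(
        _mm256_or_si256(_mm256_cmpeq_epi8(chunk, tab),
                        _mm256_cmpeq_epi8(chunk, lf)),
        _mm256_cmpeq_epi8(chunk, cr));
    // movemask gathers the top bit of each lane; nonzero means a match.
    if (_mm256_movemask_epi8(hit) != 0) return true;
  }
  // Fewer than 32 bytes remain: finish with the scalar loop.
  return has_tabs_or_newline_scalar(s.substr(i));
#else
  return has_tabs_or_newline_scalar(s);
#endif
}
```

Because most URLs are much shorter than a few 32-byte blocks, a variant like this spends much of its time in the scalar tail, which is one reason the end-to-end gain may be small.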

@lemire
Member

lemire commented Jun 4, 2023

I built this PR while passing -mavx2 to the compiler. Note that, right now, this optimization is not compiled in otherwise: the new code is guarded by __AVX2__, which is only defined when AVX2 is enabled at compile time (it is not enabled by default).

I use GCC 11 on an Ice Lake processor.

Before this PR:

BasicBench_AdaURL_aggregator_href   23545729 ns     23504034 ns           30 GHz=3.18576 cycle/byte=8.54734 cycles/url=742.415 instructions/byte=25.7505 instructions/cycle=3.0127 instructions/ns=9.59773 instructions/url=2.23667k ns/url=233.042 speed=369.643M/s time/byte=2.70532ns time/url=234.982ns url/s=4.25565M/s
BasicBench_AdaURL_aggregator_href   23525308 ns     23484552 ns           30 GHz=3.18608 cycle/byte=8.54269 cycles/url=742.011 instructions/byte=25.7505 instructions/cycle=3.01433 instructions/ns=9.60391 instructions/url=2.23667k ns/url=232.892 speed=369.949M/s time/byte=2.70307ns time/url=234.787ns url/s=4.25918M/s
BasicBench_AdaURL_aggregator_href   23436257 ns     23393300 ns           30 GHz=3.18577 cycle/byte=8.5561 cycles/url=743.176 instructions/byte=25.7519 instructions/cycle=3.00977 instructions/ns=9.58844 instructions/url=2.23679k ns/url=233.28 speed=371.392M/s time/byte=2.69257ns time/url=233.875ns url/s=4.2758M/s

After this PR:

BasicBench_AdaURL_aggregator_href   23419502 ns     23375945 ns           30 GHz=3.18859 cycle/byte=8.75773 cycles/url=760.689 instructions/byte=25.7505 instructions/cycle=2.94032 instructions/ns=9.37548 instructions/url=2.23667k ns/url=238.566 speed=371.668M/s time/byte=2.69057ns time/url=233.701ns url/s=4.27897M/s
BasicBench_AdaURL_aggregator_href   23561403 ns     23517519 ns           30 GHz=3.18576 cycle/byte=8.56581 cycles/url=744.02 instructions/byte=25.7519 instructions/cycle=3.00636 instructions/ns=9.57755 instructions/url=2.23679k ns/url=233.545 speed=369.431M/s time/byte=2.70687ns time/url=235.116ns url/s=4.25321M/s
BasicBench_AdaURL_aggregator_href   23543378 ns     23502914 ns           30 GHz=3.18865 cycle/byte=8.56245 cycles/url=743.728 instructions/byte=25.7519 instructions/cycle=3.00754 instructions/ns=9.58998 instructions/url=2.23679k ns/url=233.242 speed=369.66M/s time/byte=2.70519ns time/url=234.97ns url/s=4.25586M/s

So I am not seeing any robust difference. It is possible that I made a mistake, but we need quantified benefits one way or another.
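One rough way to quantify this from the six runs above is to compare the gap between the mean time/url before and after with the run-to-run spread. The sketch below copies the time/url values (in ns) from the benchmark output; the helper names `mean`, `spread`, and `within_noise` are hypothetical, and the "gap smaller than spread" check is only a crude heuristic, not a proper statistical test.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// time/url in ns, copied from the three runs before and after this PR.
const std::vector<double> before_ns = {234.982, 234.787, 233.875};
const std::vector<double> after_ns = {233.701, 235.116, 234.970};

// Arithmetic mean of the samples (assumes v is non-empty).
inline double mean(const std::vector<double>& v) {
  double sum = 0.0;
  for (double x : v) sum += x;
  return sum / static_cast<double>(v.size());
}

// Difference between the slowest and fastest run (assumes v is non-empty).
inline double spread(const std::vector<double>& v) {
  const auto mm = std::minmax_element(v.begin(), v.end());
  return *mm.second - *mm.first;
}

// Crude heuristic: the change is indistinguishable from noise if the gap
// between the means is no larger than the larger run-to-run spread.
inline bool within_noise(const std::vector<double>& a,
                         const std::vector<double>& b) {
  return std::fabs(mean(a) - mean(b)) <= std::max(spread(a), spread(b));
}
```

With these numbers, `within_noise(before_ns, after_ns)` holds: the means differ by well under 0.1 ns while individual runs vary by more than 1 ns.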
