Range insertion speeds up generate_primes #123
You are right, this should theoretically be faster, since the current code does a size check for each inserted prime while your range-insert code only does a size check for each inserted range of primes. Funnily, I tried exactly the same code a few weeks ago; I then integrated this new libprimesieve version into my primecount project and ran the primecount benchmarks. However, I was not able to measure any speedup; primecount mostly uses … There is another issue: …
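The two strategies being compared can be sketched as follows. Both helper names are hypothetical, and the comment about a single capacity check describes what standard library implementations typically do for pointer ranges, not a guarantee of the C++ standard.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Variant A (hypothetical helper): one push_back per prime. Every push_back
// checks whether the vector still has spare capacity before storing.
void append_one_at_a_time(std::vector<std::uint64_t>& dest,
                          const std::uint64_t* chunk, std::size_t size)
{
  assert(chunk != nullptr || size == 0);
  for (std::size_t i = 0; i < size; i++)
    dest.push_back(chunk[i]);
}

// Variant B (hypothetical helper): one range insert per chunk. Raw pointers
// are random-access iterators, so implementations typically compute the
// range length up front and do the capacity check (and any reallocation)
// once per chunk instead of once per prime.
void append_range(std::vector<std::uint64_t>& dest,
                  const std::uint64_t* chunk, std::size_t size)
{
  assert(chunk != nullptr || size == 0);
  dest.insert(dest.end(), chunk, chunk + size);
}
```

Both variants produce identical vectors; only the number of capacity checks per chunk differs, which is why any speedup is expected to be modest.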
I tested with … As for whether the standard guarantees that the conversion is handled correctly, I tried to dig into that here. What I got from the standard (which I'm hopefully reading correctly) is that the handling of … But other cases might not work. If you want to support destination vectors with custom allocators, then my impression is that a caller could be using …
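Regarding custom allocators, here is a minimal sketch (assuming C++17) that checks one simple case: a hypothetical `append_chunk` helper templated on the whole vector type, so `std::vector<T, Alloc>` with any allocator is accepted, inserting a `uint64_t` range into a `std::vector<uint32_t>` that uses a hypothetical `ForwardingAllocator`. It only shows that this case compiles and yields the expected elements; it does not settle the conformance questions raised above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical minimal C++17 allocator that forwards to std::allocator.
template <typename T>
struct ForwardingAllocator
{
  using value_type = T;

  ForwardingAllocator() = default;
  // Converting constructor so std::allocator_traits can rebind the allocator.
  template <typename U>
  ForwardingAllocator(const ForwardingAllocator<U>&) {}

  T* allocate(std::size_t n) { return std::allocator<T>().allocate(n); }
  void deallocate(T* p, std::size_t n) { std::allocator<T>().deallocate(p, n); }
};

// Stateless allocators: any two instances are interchangeable.
template <typename T, typename U>
bool operator==(const ForwardingAllocator<T>&, const ForwardingAllocator<U>&) { return true; }
template <typename T, typename U>
bool operator!=(const ForwardingAllocator<T>&, const ForwardingAllocator<U>&) { return false; }

// Hypothetical helper: append a chunk of uint64_t primes to any vector type.
// Vect::value_type may be narrower than uint64_t; the caller must ensure
// every prime in the chunk fits into it.
template <typename Vect>
void append_chunk(Vect& dest, const std::uint64_t* chunk, std::size_t size)
{
  assert(chunk != nullptr || size == 0);
  dest.insert(dest.end(), chunk, chunk + size);
}
```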
I have checked the STL (the excerpt below is from GCC's libstdc++):

```cpp
template<typename _Tp, typename _Alloc>
template<typename _InputIterator>
void vector<_Tp, _Alloc>::_M_range_insert(iterator __pos, _InputIterator __first,
                                          _InputIterator __last, std::input_iterator_tag)
{
  if (__pos == end())
  {
    // This does the actual insert for our use case
    for (; __first != __last; ++__first)
      insert(end(), *__first);
  }
  // ...
}

template<typename _Tp, typename _Alloc>
typename vector<_Tp, _Alloc>::iterator
vector<_Tp, _Alloc>::
// The vector's value_type differs from the value type of _InputIterator,
// hence it seems to me that an implicit conversion happens here from the
// iterator's value type (uint64_t) to the vector's value_type, which could
// be e.g. uint32_t. This could cause a compiler warning depending on the
// compiler & compiler flags.
insert(const_iterator __position, const value_type& __x)
{
  // ...
}
```

If we want to guarantee that our code cannot cause any warnings, then we can only use your … The question now is whether the added complexity is worth the effort. Can you please post your benchmark test program? I want to run it on my latest Intel Alder Lake CPU and check for myself how much faster it is...
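One way to make sure no implicit conversion happens in our own code, sketched here assuming C++17 `if constexpr`: a hypothetical helper that uses the range insert only when the vector's `value_type` is `uint64_t`, and otherwise falls back to an explicit `static_cast` per element. This gives up the speedup for narrower vectors, so it is a compromise rather than a full solution.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <vector>

// Hypothetical helper: range insert when no conversion is needed,
// explicit per-element conversion otherwise.
template <typename Vect>
void append_chunk_no_conversion(Vect& dest, const std::uint64_t* chunk, std::size_t size)
{
  assert(chunk != nullptr || size == 0);
  using T = typename Vect::value_type;

  if constexpr (std::is_same_v<T, std::uint64_t>)
  {
    // Same type: one range insert, no conversion anywhere.
    dest.insert(dest.end(), chunk, chunk + size);
  }
  else
  {
    // Different type: the static_cast makes the narrowing explicit, so
    // conversion warnings are not triggered, but this still performs one
    // capacity check per element.
    for (std::size_t i = 0; i < size; i++)
      dest.push_back(static_cast<T>(chunk[i]));
  }
}
```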
Another option that just came to my mind would be to disable implicit conversion warnings using … E.g. for MSVC we could use:

```cpp
// Disable warning C4244: conversion from X to Y, possible loss of data
#pragma warning(push)
#pragma warning(disable : 4244)
vect.insert(vect.end(), arr, arr + size);
#pragma warning(pop)
```

The latest versions of GCC & Clang do not warn even using …
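A hypothetical wrapper that guards the MSVC pragmas with `_MSC_VER` and adds the GCC/Clang equivalent (both compilers accept `#pragma GCC diagnostic`), in case an older compiler or stricter flags do warn. Whether a pragma at the call site silences a warning that the compiler attributes to code inside the standard library's own templates varies by compiler, so this is worth verifying on each target.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: suppress implicit-conversion warnings only around
// the range insert, then restore the previous warning state.
template <typename Vect>
void append_chunk_quiet(Vect& vect, const std::uint64_t* arr, std::size_t size)
{
  assert(arr != nullptr || size == 0);

#if defined(_MSC_VER)
  #pragma warning(push)
  #pragma warning(disable : 4244) // conversion from X to Y, possible loss of data
#elif defined(__GNUC__)           // also defined by Clang
  #pragma GCC diagnostic push
  #pragma GCC diagnostic ignored "-Wconversion"
#endif

  vect.insert(vect.end(), arr, arr + size);

#if defined(_MSC_VER)
  #pragma warning(pop)
#elif defined(__GNUC__)
  #pragma GCC diagnostic pop
#endif
}
```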
I have merged your feature request! Would you mind rerunning your benchmark using the latest libprimesieve?
Thanks! That looks good to me and gave the expected speedup in my benchmark. For reference, this is the benchmark I used (compiled with clang 12 with flags including -O3 -march=native, on an Azure cloud machine with a Xeon Platinum 8171M CPU running Ubuntu 20.04):
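A minimal, hypothetical timing harness in the same spirit (not the benchmark referred to above) could generate a realistic list of primes with a plain sieve and then copy it chunk by chunk, once with range inserts and once with per-prime `push_back`. The function names and the idea of a fixed chunk size are assumptions for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// Plain sieve of Eratosthenes returning all primes <= limit. It only serves
// as a source of realistic input data for the timing comparison.
std::vector<std::uint64_t> simple_sieve(std::uint64_t limit)
{
  std::vector<std::uint64_t> primes;
  if (limit < 2)
    return primes;
  std::vector<bool> composite(limit + 1, false);
  for (std::uint64_t i = 2; i <= limit; i++)
  {
    if (composite[i])
      continue;
    primes.push_back(i);
    // Cross off multiples starting at i*i (skipped once i*i > limit).
    if (i <= limit / i)
      for (std::uint64_t j = i * i; j <= limit; j += i)
        composite[j] = true;
  }
  return primes;
}

// Copies `src` into a fresh vector in chunks of `chunk_size` primes, either
// with one range insert per chunk or with one push_back per prime. The
// elapsed wall-clock time in microseconds is written to `micros`.
std::vector<std::uint64_t> copy_in_chunks(const std::vector<std::uint64_t>& src,
                                          std::size_t chunk_size,
                                          bool use_range_insert,
                                          long long& micros)
{
  assert(chunk_size > 0);
  std::vector<std::uint64_t> dest;
  auto start = std::chrono::steady_clock::now();

  for (std::size_t pos = 0; pos < src.size(); pos += chunk_size)
  {
    std::size_t n = std::min(chunk_size, src.size() - pos);
    const std::uint64_t* chunk = src.data() + pos;
    if (use_range_insert)
      dest.insert(dest.end(), chunk, chunk + n);
    else
      for (std::size_t i = 0; i < n; i++)
        dest.push_back(chunk[i]);
  }

  auto stop = std::chrono::steady_clock::now();
  micros = static_cast<long long>(
      std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count());
  return dest;
}
```

Calling `copy_in_chunks` on the same input with `use_range_insert` set to `true` and then `false` and comparing the two `micros` values gives a rough comparison. A single run is noisy and cache effects depend on call order, so repeating and alternating the measurements is advisable.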
I noticed a roughly 10% speedup for building vectors of primes when I use range insertion rather than copying one prime at a time. (Full disclosure: I only timed it on one system, with one compiler and one set of parameters.) I did this with a modified version of store_primes from the current master branch:
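A rough, hypothetical sketch of a store_primes-style loop with one range insert per chunk (not the modified store_primes itself). `ChunkedPrimeSource` is an assumed stand-in for a prime iterator that produces primes in buffered chunks; it is not primesieve's API and simply hands out slices of a precomputed list.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for a chunked prime iterator (NOT primesieve's API):
// hands out consecutive slices of a precomputed, ascending list of primes.
class ChunkedPrimeSource
{
public:
  ChunkedPrimeSource(std::vector<std::uint64_t> primes, std::size_t chunk_size)
    : primes_(std::move(primes)), chunk_size_(chunk_size)
  {
    assert(chunk_size_ > 0);
  }

  // Points [first, last) at the next chunk; returns false when exhausted.
  bool next_chunk(const std::uint64_t*& first, const std::uint64_t*& last)
  {
    if (pos_ >= primes_.size())
      return false;
    std::size_t n = std::min(chunk_size_, primes_.size() - pos_);
    first = primes_.data() + pos_;
    last = first + n;
    pos_ += n;
    return true;
  }

private:
  std::vector<std::uint64_t> primes_;
  std::size_t chunk_size_;
  std::size_t pos_ = 0;
};

// Hypothetical store_primes-style loop: append all primes <= stop using one
// range insert per chunk instead of one push_back per prime.
template <typename Vect>
void store_primes_sketch(ChunkedPrimeSource& source, std::uint64_t stop, Vect& primes)
{
  const std::uint64_t* first = nullptr;
  const std::uint64_t* last = nullptr;

  while (source.next_chunk(first, last))
  {
    // Primes arrive in ascending order, so only the chunk that crosses
    // `stop` needs trimming; upper_bound finds the first prime > stop.
    const std::uint64_t* end = std::upper_bound(first, last, stop);
    primes.insert(primes.end(), first, end);
    if (end != last)
      break;
  }
}
```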
This would speed up generate_primes, and something similar would presumably speed up generate_n_primes. I suspect it's not the best thing that could be done (in particular, if the iterator could be forced to put its primes directly into the target vector, the copy could be eliminated entirely), but since this is a fairly small change I wanted to go ahead and share it.
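For generate_n_primes, the same idea would need to stop after exactly n primes. A hypothetical helper, assuming the primes again arrive as `[first, last)` pointer ranges, could clamp each range insert to the number of primes still missing. The caller computes the target size before the loop so that primes already in the vector are preserved.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper for a generate_n_primes-style loop. The caller sets
// target_size = primes.size() + n before the loop. Each call appends at most
// the number of primes still missing, using a single range insert, and
// returns true once the target size has been reached.
template <typename Vect>
bool append_up_to_target(Vect& primes, std::size_t target_size,
                         const std::uint64_t* first, const std::uint64_t* last)
{
  assert(primes.size() <= target_size);
  assert(first <= last);
  std::size_t missing = target_size - primes.size();
  std::size_t available = static_cast<std::size_t>(last - first);
  std::size_t count = std::min(missing, available);
  primes.insert(primes.end(), first, first + count);
  return primes.size() == target_size;
}
```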