Add nmod_vec_invert: invert an array of nmod coefficients #2432
vneiger merged 7 commits into flintlib:main from
On my list to finalize this:
- reduce the length of the temporary vector to 3*n instead of 4*n
- check if len == 1 and use the naive approach in that case
This PR is ready for review. Performance measurements on three machines are attached. On recent-ish machines, the non-naive approach is almost always worthwhile. On older machines with slower integer division, the gain is smaller, and the naive approach would actually be faster for moduli of very small bitsize (up to about 5 bits). This was not enough to convince me that thresholds would be useful here, but this can be discussed.
Cool! I think it is okay; Cascade Lake is 6 years old anyway.
Co-authored-by: Albin Ahlbäck <albin.ahlback@gmail.com>
This is a nice speedup. Do you have an application? I know some functions where we need to construct [1, 1/2, 1/3, 1/4, ...], but this is a special case where one should be able to do a bit better than a general algorithm (there is a slightly-less-than-naive
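For the special case [1, 1/2, ..., 1/m], one well-known way to build the table modulo a prime p uses no modular inversion at all. It is not necessarily the technique alluded to above. Writing p = q*i + r with 0 < r < i gives q*i + r == 0 (mod p), hence 1/i == -q * (1/r) (mod p), and since r < i the value 1/r is already in the table. The C sketch below assumes p is prime, p < 2^32 (so that products fit in a uint64_t), and m < p; the name inverse_table is hypothetical and not a FLINT function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: fill inv[1..m] with 1/1, 1/2, ..., 1/m modulo p.
   Assumes p is prime, p < 2^32, and 1 <= m < p.
   inv must have room for m + 1 entries (inv[0] is left unused). */
static void inverse_table(uint64_t *inv, size_t m, uint64_t p)
{
    assert(m >= 1 && m < p);
    inv[1] = 1;
    for (size_t i = 2; i <= m; i++)
    {
        /* p = q*i + r with 0 < r < i (r != 0 because p is prime and i < p),
           so q*i + r == 0 (mod p), which gives 1/i == -q * (1/r) (mod p). */
        uint64_t q = p / i;
        uint64_t r = p % i;
        /* q < 2^32 and inv[r] < 2^32, so the product fits in 64 bits. */
        inv[i] = (p - (q * inv[r]) % p) % p;
    }
}
```

Each entry costs one division with remainder and one modular multiplication, which is the kind of saving a dedicated routine can exploit compared with a general vector inversion.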
This appears in functions for Cauchy and Cauchy-like matrices, but I'm not sure there are any plans for such structured linear algebra in FLINT in the near future. Recently I needed this when writing draft code for rational reconstruction, which I would like to add to FLINT in the not-too-distant future. More precisely, this was the Cauchy interpolation case of rational reconstruction (as in [von zur Gathen and Gerhard, Section 5.8]), where I had to invert a batch of evaluations of a polynomial. This might also be useful more generally for related algorithms that compute with 2 × 2 univariate polynomial matrices, such as half-gcd or Padé approximation, when multiplications are done through FFT evaluation-interpolation.
This is also useful for baby-step giant-step (BSGS) algorithms. Often one needs to patch the output of the main loop with an inverse taken from a sequence known a priori.
Thanks, that's good to know. I don't have more to add to this PR. Regarding the initial PR message, I am not sure about adding a function like the one suggested there, because I don't have a use case in mind. Unless someone has suggestions, this is ready for merge as far as I am concerned.
Thanks!
This inverts each entry in an `nmod_vec`, doing mostly multiplications instead of inversions. See the table below for efficiency comparisons.
In short, this is in most cases significantly faster than the naive approach, which calls `nmod_inv` repeatedly. As the vector length grows, the speed-up factor approaches the ratio between the time spent in a modular inversion and the time spent in a modular multiplication. The table below includes cases with factors beyond 20. The only exceptions to "faster" are very small moduli (bitsize 2 or 3, say), for which inversion is already quite fast, and very small vector lengths. But already for bitsize 3 and length 5, this starts to be beneficial.
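The description does not spell out the algorithm, but the standard way to trade len inversions for one inversion plus multiplications is Montgomery's simultaneous inversion trick: accumulate prefix products, invert the full product once, then walk backwards peeling off one factor at a time. The C sketch below illustrates that general technique under stated assumptions; it is not the code merged in this PR. The names inv_mod and vec_invert_batch are hypothetical, moduli are assumed below 2^32 so that products fit in a uint64_t, and all entries are assumed reduced and invertible. It performs one inversion and 3*(len - 1) multiplications, which is why the speed-up tends toward the inversion-to-multiplication cost ratio for long vectors, and it falls back to a direct inversion when len == 1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: a^(-1) mod n by the extended Euclidean algorithm.
   Assumes 0 < a < n and gcd(a, n) == 1. */
static uint64_t inv_mod(uint64_t a, uint64_t n)
{
    int64_t s0 = 0, s1 = 1;          /* invariant: r_k == s_k * a (mod n) */
    uint64_t r0 = n, r1 = a;
    while (r1 != 0)
    {
        uint64_t q = r0 / r1;
        uint64_t r2 = r0 - q * r1;
        int64_t s2 = s0 - (int64_t) q * s1;
        r0 = r1; r1 = r2;
        s0 = s1; s1 = s2;
    }
    assert(r0 == 1);                 /* fails if a is not invertible */
    return (s0 < 0) ? (uint64_t) (s0 + (int64_t) n) : (uint64_t) s0;
}

/* Hypothetical batch inversion: res[i] = 1/vec[i] mod n for 0 <= i < len.
   Assumes n < 2^32, len >= 1, entries reduced and invertible.
   res may alias vec. */
static void vec_invert_batch(uint64_t *res, const uint64_t *vec,
                             size_t len, uint64_t n)
{
    if (len == 1)                    /* nothing to share: invert directly */
    {
        res[0] = inv_mod(vec[0], n);
        return;
    }
    uint64_t *prefix = malloc(len * sizeof(uint64_t));
    assert(prefix != NULL);
    /* Forward pass: prefix[i] = vec[0] * ... * vec[i] mod n. */
    prefix[0] = vec[0];
    for (size_t i = 1; i < len; i++)
        prefix[i] = (prefix[i - 1] * vec[i]) % n;
    /* The single inversion: t = 1 / (vec[0] * ... * vec[len - 1]). */
    uint64_t t = inv_mod(prefix[len - 1], n);
    /* Backward pass: entering step i, t = 1 / (vec[0] * ... * vec[i]). */
    for (size_t i = len - 1; i > 0; i--)
    {
        uint64_t vi = vec[i];                /* read before res[i] overwrites it */
        res[i] = (t * prefix[i - 1]) % n;    /* 1/vec[i] */
        t = (t * vi) % n;                    /* now 1 / (vec[0] * ... * vec[i-1]) */
    }
    res[0] = t;
    free(prefix);
}
```

Keeping the prefix products in a separate buffer is what allows res to alias vec. If some entry is not invertible, the full product is not invertible either, so the single inversion fails (here, via the assertion in inv_mod).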
And actually, for moduli that are not too small (say bitsize 16 or more), this gives a worthwhile speed-up already for very small lengths (more than 1.5x for length 2, more than 2x for length 3). If inverting 2 or 3 (or some small number of) elements is not a rare operation (I'm not sure about this), would it make sense to add a function for this in `nmod`? Something like:
Any insight about this is welcome before this PR gets finalized.
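As a hypothetical illustration of the two-element case (not the author's proposed function): with t = 1/(a*b), one gets 1/a = t*b and 1/b = t*a, so two inversions become one inversion and three multiplications. The names inv_mod and inv_mod_pair are made up for this sketch, moduli are assumed below 2^32, and a and b are assumed reduced and invertible.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: a^(-1) mod n by the extended Euclidean algorithm.
   Assumes 0 < a < n and gcd(a, n) == 1. */
static uint64_t inv_mod(uint64_t a, uint64_t n)
{
    int64_t s0 = 0, s1 = 1;          /* invariant: r_k == s_k * a (mod n) */
    uint64_t r0 = n, r1 = a;
    while (r1 != 0)
    {
        uint64_t q = r0 / r1;
        uint64_t r2 = r0 - q * r1;
        int64_t s2 = s0 - (int64_t) q * s1;
        r0 = r1; r1 = r2;
        s0 = s1; s1 = s2;
    }
    assert(r0 == 1);                 /* fails if a is not invertible */
    return (s0 < 0) ? (uint64_t) (s0 + (int64_t) n) : (uint64_t) s0;
}

/* Hypothetical two-element inversion: *ia = 1/a and *ib = 1/b mod n,
   using one inversion and three multiplications. Assumes n < 2^32. */
static void inv_mod_pair(uint64_t *ia, uint64_t *ib,
                         uint64_t a, uint64_t b, uint64_t n)
{
    uint64_t t = inv_mod((a * b) % n, n);   /* t = 1/(a*b) */
    *ia = (t * b) % n;                       /* 1/a = b * 1/(a*b) */
    *ib = (t * a) % n;                       /* 1/b = a * 1/(a*b) */
}
```

The same pattern extends to three elements with one inversion and six multiplications, matching the 3*(len - 1) count of the general batch approach.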