Add nmod_vec_invert: invert an array of nmod coefficients #2432

Merged
vneiger merged 7 commits into flintlib:main from vneiger:add_nmod_vec_inv on Oct 28, 2025

@vneiger
Collaborator

vneiger commented Oct 27, 2025

This inverts each entry in an nmod_vec, doing mostly multiplications instead of inversions.

See the table below for efficiency comparisons.

In short, this is significantly faster in most cases than the naive approach, which calls nmod_inv on each entry. As the vector length grows, the speed-up factor approaches the ratio between the cost of a modular inversion and the cost of a modular multiplication. The table below includes cases where the factor exceeds 20.
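
For readers unfamiliar with the idea, here is a minimal sketch of the classical batch-inversion trick (often attributed to Montgomery), which is one standard way to replace most inversions by multiplications. It is not taken from this PR: the names `inv_mod` and `vec_invert` are hypothetical, it uses plain `uint64_t` arithmetic rather than FLINT's `nmod_t`, and it assumes a modulus below 2^32 (so that products fit in 64 bits) and entries that are all invertible.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: inverse of a modulo n by the extended Euclidean
   algorithm. Assumes 0 < a < n < 2^32 and gcd(a, n) == 1. */
static uint64_t inv_mod(uint64_t a, uint64_t n)
{
    int64_t t0 = 0, t1 = 1;
    uint64_t r0 = n, r1 = a;
    while (r1 != 0)
    {
        uint64_t q = r0 / r1;
        uint64_t r2 = r0 - q * r1;
        int64_t t2 = t0 - (int64_t) q * t1;
        r0 = r1; r1 = r2;
        t0 = t1; t1 = t2;
    }
    assert(r0 == 1); /* a must be invertible modulo n */
    return (t0 < 0) ? (uint64_t) (t0 + (int64_t) n) : (uint64_t) t0;
}

/* Hypothetical name. Sets res[i] = a[i]^(-1) mod n for 0 <= i < len,
   using a single call to inv_mod. res must not alias a. */
static void vec_invert(uint64_t *res, const uint64_t *a, size_t len, uint64_t n)
{
    assert(n < (1ULL << 32)); /* keeps every product below 2^64 */
    if (len == 0)
        return;

    /* Forward pass: res[i] = a[0] * a[1] * ... * a[i] mod n. */
    res[0] = a[0];
    for (size_t i = 1; i < len; i++)
        res[i] = (res[i - 1] * a[i]) % n;

    /* The only inversion: inverse of the product of all entries. */
    uint64_t inv = inv_mod(res[len - 1], n);

    /* Backward pass: peel off one entry at a time. */
    for (size_t i = len - 1; i > 0; i--)
    {
        res[i] = (inv * res[i - 1]) % n; /* 1 / a[i] */
        inv = (inv * a[i]) % n;          /* 1 / (a[0] * ... * a[i-1]) */
    }
    res[0] = inv;
}
```

For a vector of length len this sketch performs one inversion and 3*(len-1) multiplications, which is why the gain over len separate inversions grows with the length and with the cost of an inversion relative to a multiplication.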

The only exceptions are very small moduli (bitsize 2 or 3, say), for which inversion is already quite fast, and very small vector lengths. Even so, this already starts to be beneficial at bitsize 3 and length 5.

Moreover, for moduli that are not too small (say bitsize 16 or more), the speed-up is already noticeable at very small lengths: more than 1.5x for length 2 and more than 2x for length 3. If inverting 2 or 3 (or some other small number of) elements at once is not a rare operation (I'm not sure about this), it could make sense to add dedicated functions for it in nmod. Something like:

void nmod_inv2(ulong * inv_a1, ulong * inv_a2, ulong a1, ulong a2, nmod_t mod);
void nmod_inv3(ulong * inv_a1, ulong * inv_a2, ulong * inv_a3, ulong a1, ulong a2, ulong a3, nmod_t mod);
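
To make the two-element case concrete, here is a rough sketch of what such a function could compute. It is only an illustration under assumptions: `inv2` and `inv_mod` are hypothetical names, the code uses plain `uint64_t` values instead of `ulong`/`nmod_t`, and it assumes a modulus below 2^32 with both inputs invertible. The helper is repeated so that the block compiles on its own.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: inverse of a modulo n by the extended Euclidean
   algorithm. Assumes 0 < a < n < 2^32 and gcd(a, n) == 1. */
static uint64_t inv_mod(uint64_t a, uint64_t n)
{
    int64_t t0 = 0, t1 = 1;
    uint64_t r0 = n, r1 = a;
    while (r1 != 0)
    {
        uint64_t q = r0 / r1;
        uint64_t r2 = r0 - q * r1;
        int64_t t2 = t0 - (int64_t) q * t1;
        r0 = r1; r1 = r2;
        t0 = t1; t1 = t2;
    }
    assert(r0 == 1); /* a must be invertible modulo n */
    return (t0 < 0) ? (uint64_t) (t0 + (int64_t) n) : (uint64_t) t0;
}

/* Hypothetical two-element inversion: one inversion, three multiplications. */
static void inv2(uint64_t *inv_a1, uint64_t *inv_a2,
                 uint64_t a1, uint64_t a2, uint64_t n)
{
    assert(n < (1ULL << 32)); /* keeps every product below 2^64 */
    uint64_t inv = inv_mod((a1 * a2) % n, n); /* 1 / (a1 * a2) */
    *inv_a1 = (inv * a2) % n;                 /* a2 / (a1 * a2) = 1 / a1 */
    *inv_a2 = (inv * a1) % n;                 /* a1 / (a1 * a2) = 1 / a2 */
}
```

Compared with two separate inversions, this trades one inversion for three multiplications, so it only pays off when an inversion costs more than three multiplications.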

Any insight about this is welcome before this PR gets finalized.

╰─ ./p-invert
unit: all measurements in c/l (up to constant multiplicative factor)
profiled: naive | precomp shoup | generic
bit/len 1               2               3               4               5               6               7               8               9               10              11              12              13              14              15              16              1024            65536
2       0.26|0.56|0.45  0.22|0.36|0.33  0.22|0.30|0.30  0.20|0.28|0.29  0.20|0.25|0.30  0.19|0.24|0.30  0.19|0.23|0.29  0.18|0.23|0.29  0.18|0.21|0.28  0.18|0.21|0.29  0.18|0.22|0.30  0.19|0.21|0.28  0.18|0.21|0.29  0.18|0.20|0.28  0.17|0.20|0.28  0.19|0.21|0.29  0.17|0.18|0.28  0.17|0.18|0.28
3       0.33|0.72|0.53  0.32|0.45|0.40  0.31|0.36|0.34  0.29|0.32|0.34  0.28|0.28|0.33  0.27|0.27|0.34  0.27|0.25|0.31  0.27|0.25|0.31  0.28|0.22|0.30  0.27|0.22|0.31  0.27|0.23|0.31  0.27|0.22|0.31  0.27|0.22|0.30  0.27|0.21|0.29  0.27|0.21|0.31  0.27|0.21|0.30  0.25|0.18|0.28  0.24|0.18|0.28
4       0.38|0.80|0.60  0.37|0.49|0.43  0.36|0.37|0.37  0.36|0.33|0.35  0.34|0.29|0.34  0.33|0.27|0.35  0.33|0.26|0.32  0.34|0.25|0.33  0.33|0.23|0.33  0.33|0.23|0.31  0.34|0.23|0.31  0.33|0.24|0.31  0.32|0.23|0.30  0.32|0.22|0.30  0.32|0.22|0.30  0.32|0.22|0.30  0.30|0.18|0.28  0.30|0.18|0.28
5       0.48|0.89|0.68  0.48|0.55|0.48  0.45|0.40|0.41  0.44|0.35|0.37  0.43|0.31|0.36  0.42|0.29|0.38  0.42|0.27|0.34  0.43|0.26|0.34  0.43|0.25|0.33  0.42|0.24|0.33  0.42|0.25|0.32  0.42|0.24|0.32  0.41|0.23|0.31  0.41|0.23|0.31  0.42|0.22|0.31  0.43|0.22|0.31  0.39|0.18|0.29  0.39|0.18|0.28
6       0.50|0.91|0.70  0.48|0.56|0.49  0.47|0.41|0.40  0.46|0.36|0.38  0.47|0.31|0.37  0.44|0.29|0.37  0.44|0.29|0.35  0.46|0.26|0.34  0.44|0.25|0.33  0.44|0.24|0.34  0.44|0.24|0.32  0.45|0.24|0.32  0.44|0.24|0.32  0.43|0.23|0.31  0.43|0.23|0.31  0.43|0.23|0.32  0.41|0.18|0.28  0.41|0.18|0.28
7       0.61|1.02|0.79  0.59|0.59|0.53  0.57|0.45|0.44  0.57|0.38|0.41  0.55|0.33|0.39  0.54|0.30|0.38  0.54|0.29|0.35  0.56|0.28|0.35  0.55|0.26|0.35  0.54|0.25|0.33  0.54|0.25|0.33  0.54|0.25|0.34  0.56|0.24|0.32  0.53|0.23|0.32  0.53|0.24|0.32  0.53|0.24|0.32  0.51|0.19|0.28  0.52|0.18|0.28
8       0.67|1.06|0.85  0.65|0.61|0.56  0.63|0.46|0.45  0.64|0.40|0.42  0.63|0.34|0.40  0.61|0.32|0.40  0.61|0.29|0.36  0.61|0.29|0.36  0.64|0.27|0.36  0.61|0.26|0.34  0.60|0.26|0.34  0.60|0.25|0.34  0.60|0.25|0.32  0.61|0.24|0.32  0.60|0.24|0.33  0.59|0.24|0.32  0.57|0.18|0.28  0.57|0.19|0.29
9       0.79|1.19|0.97  0.76|0.67|0.61  0.76|0.50|0.49  0.73|0.42|0.45  0.73|0.37|0.42  0.73|0.33|0.41  0.71|0.31|0.37  0.72|0.30|0.37  0.73|0.28|0.35  0.71|0.28|0.36  0.72|0.27|0.36  0.72|0.26|0.34  0.71|0.25|0.34  0.71|0.24|0.34  0.72|0.24|0.33  0.71|0.24|0.33  0.69|0.18|0.29  0.68|0.18|0.28
10      0.85|1.25|1.09  0.83|0.73|0.65  0.83|0.52|0.51  0.82|0.44|0.46  0.81|0.39|0.43  0.80|0.33|0.44  0.81|0.32|0.39  0.79|0.31|0.38  0.78|0.29|0.36  0.78|0.28|0.36  0.79|0.28|0.35  0.79|0.27|0.35  0.78|0.26|0.34  0.78|0.26|0.34  0.78|0.25|0.33  0.77|0.25|0.34  0.78|0.18|0.28  0.75|0.18|0.28
11      0.93|1.32|1.11  0.92|0.73|0.69  0.92|0.55|0.54  0.96|0.47|0.48  0.90|0.39|0.45  0.89|0.35|0.44  0.94|0.33|0.40  0.91|0.32|0.39  0.91|0.31|0.37  0.94|0.30|0.37  0.90|0.28|0.36  0.90|0.28|0.36  0.93|0.28|0.36  0.89|0.26|0.34  0.89|0.25|0.34  0.91|0.25|0.34  0.91|0.19|0.29  0.88|0.18|0.28
12      0.92|1.35|1.18  0.99|0.77|0.72  0.99|0.57|0.57  0.99|0.48|0.51  0.94|0.40|0.45  0.96|0.36|0.44  0.92|0.33|0.42  0.96|0.32|0.39  0.92|0.31|0.38  0.93|0.29|0.38  0.92|0.28|0.36  0.92|0.27|0.36  0.92|0.27|0.36  0.93|0.26|0.36  0.94|0.25|0.34  0.92|0.25|0.35  0.89|0.18|0.28  0.89|0.18|0.28
13      1.09|1.59|1.35  1.14|0.83|0.78  1.13|0.61|0.60  1.12|0.52|0.53  1.10|0.43|0.48  1.09|0.38|0.46  1.10|0.37|0.42  1.10|0.34|0.42  1.09|0.32|0.40  1.09|0.30|0.38  1.13|0.32|0.39  1.10|0.29|0.38  1.09|0.28|0.36  1.10|0.27|0.35  1.08|0.26|0.35  1.07|0.26|0.37  1.09|0.18|0.28  1.06|0.18|0.28
14      1.30|1.73|1.58  1.32|0.94|0.89  1.29|0.70|0.68  1.29|0.55|0.59  1.29|0.48|0.53  1.25|0.42|0.50  1.25|0.39|0.46  1.30|0.38|0.46  1.32|0.35|0.42  1.25|0.33|0.41  1.29|0.32|0.40  1.25|0.31|0.39  1.24|0.31|0.38  1.30|0.28|0.37  1.26|0.28|0.37  1.26|0.28|0.36  1.27|0.18|0.28  1.23|0.19|0.29
15      1.36|1.79|1.58  1.37|0.97|0.92  1.34|0.69|0.69  1.33|0.58|0.60  1.35|0.52|0.54  1.32|0.42|0.51  1.30|0.40|0.46  1.36|0.38|0.45  1.30|0.36|0.43  1.30|0.35|0.42  1.36|0.32|0.40  1.30|0.32|0.39  1.29|0.30|0.38  1.33|0.29|0.38  1.29|0.28|0.39  1.32|0.28|0.37  1.31|0.18|0.28  1.27|0.18|0.28
16      1.49|1.93|1.72  1.50|1.09|1.00  1.47|0.74|0.74  1.45|0.60|0.64  1.47|0.52|0.57  1.43|0.45|0.54  1.42|0.42|0.49  1.51|0.39|0.47  1.43|0.37|0.45  1.42|0.35|0.42  1.42|0.34|0.41  1.42|0.32|0.42  1.45|0.31|0.40  1.42|0.31|0.39  1.41|0.29|0.38  1.41|0.29|0.38  1.39|0.18|0.29  1.41|0.18|0.28
17      1.53|1.97|1.76  1.51|1.05|1.01  1.49|0.76|0.75  1.48|0.64|0.67  1.53|0.53|0.58  1.47|0.46|0.54  1.46|0.42|0.50  1.47|0.39|0.47  1.47|0.38|0.47  1.53|0.36|0.43  1.46|0.34|0.42  1.46|0.33|0.41  1.53|0.32|0.40  1.46|0.30|0.39  1.45|0.30|0.38  1.51|0.29|0.38  1.43|0.18|0.28  1.42|0.18|0.28
18      1.64|2.07|1.81  1.57|1.08|1.04  1.53|0.77|0.77  1.57|0.63|0.65  1.50|0.54|0.57  1.48|0.46|0.57  1.56|0.43|0.48  1.51|0.40|0.47  1.49|0.38|0.45  1.49|0.36|0.43  1.48|0.34|0.42  1.57|0.34|0.41  1.49|0.32|0.40  1.48|0.30|0.39  1.48|0.30|0.38  1.47|0.30|0.39  1.48|0.18|0.28  1.46|0.18|0.28
19      1.70|2.10|1.87  1.62|1.13|1.08  1.62|0.80|0.86  1.69|0.67|0.68  1.60|0.55|0.60  1.59|0.48|0.57  1.58|0.45|0.51  1.59|0.42|0.49  1.63|0.39|0.47  1.59|0.38|0.45  1.58|0.35|0.43  1.58|0.34|0.42  1.58|0.33|0.41  1.59|0.31|0.40  1.56|0.30|0.39  1.56|0.30|0.39  1.54|0.18|0.28  1.54|0.18|0.28
20      1.76|2.21|1.94  1.68|1.15|1.12  1.69|0.83|0.83  1.67|0.67|0.69  1.66|0.56|0.61  1.68|0.50|0.56  1.65|0.46|0.52  1.66|0.42|0.49  1.65|0.39|0.47  1.64|0.38|0.46  1.70|0.36|0.43  1.66|0.35|0.43  1.65|0.34|0.41  1.66|0.32|0.40  1.64|0.31|0.39  1.68|0.31|0.38  1.62|0.18|0.28  1.62|0.18|0.28
21      1.85|2.25|2.03  1.77|1.19|1.16  1.86|0.86|0.86  1.75|0.68|0.71  1.73|0.58|0.62  1.72|0.50|0.58  1.72|0.47|0.53  1.73|0.45|0.52  1.75|0.39|0.47  1.72|0.39|0.47  1.72|0.36|0.44  1.71|0.35|0.43  1.71|0.34|0.42  1.72|0.33|0.42  1.72|0.32|0.40  1.71|0.32|0.39  1.70|0.18|0.28  1.68|0.20|0.28
22      1.98|2.56|2.16  1.91|1.27|1.22  1.88|0.90|0.89  1.87|0.75|0.75  1.86|0.63|0.66  1.86|0.52|0.61  1.85|0.48|0.57  1.90|0.45|0.52  1.85|0.41|0.49  1.85|0.39|0.49  1.86|0.37|0.45  1.83|0.36|0.44  1.82|0.34|0.44  1.85|0.33|0.42  1.81|0.32|0.42  1.92|0.31|0.39  1.79|0.18|0.28  1.80|0.18|0.28
23      2.03|2.48|2.27  1.96|1.27|1.27  1.95|0.91|0.92  1.96|0.73|0.76  1.90|0.61|0.66  1.90|0.54|0.61  1.89|0.49|0.57  1.96|0.46|0.52  1.91|0.41|0.49  1.89|0.39|0.48  1.95|0.37|0.45  1.88|0.37|0.45  1.93|0.34|0.43  1.95|0.34|0.42  1.88|0.33|0.40  1.87|0.32|0.40  1.92|0.18|0.30  1.90|0.18|0.28
24      2.12|2.53|2.29  2.09|1.31|1.28  2.00|0.93|0.93  1.98|0.74|0.78  2.11|0.65|0.68  1.97|0.54|0.62  1.97|0.50|0.57  1.98|0.46|0.53  1.97|0.42|0.50  2.10|0.40|0.49  1.97|0.38|0.46  1.97|0.37|0.46  1.96|0.35|0.44  1.98|0.34|0.43  1.98|0.33|0.41  1.96|0.33|0.41  1.94|0.18|0.28  1.93|0.18|0.28
25      2.15|2.55|2.31  2.05|1.36|1.31  2.05|0.95|0.95  2.02|0.77|0.78  2.01|0.63|0.68  2.00|0.56|0.63  2.00|0.52|0.58  2.05|0.47|0.54  2.01|0.43|0.51  2.01|0.41|0.49  2.01|0.38|0.46  1.99|0.38|0.48  2.11|0.36|0.45  1.99|0.35|0.43  1.99|0.33|0.41  1.99|0.33|0.43  2.02|0.18|0.30  2.01|0.18|0.28
26      2.27|2.66|2.49  2.18|1.38|1.35  2.13|0.98|0.97  2.12|0.78|0.81  2.18|0.66|0.71  2.11|0.58|0.65  2.11|0.52|0.59  2.18|0.49|0.54  2.11|0.44|0.52  2.24|0.42|0.50  2.21|0.39|0.47  2.11|0.39|0.46  2.11|0.36|0.44  2.21|0.35|0.43  2.09|0.34|0.43  2.11|0.33|0.41  2.07|0.18|0.28  2.07|0.18|0.28
27      2.34|2.73|2.48  2.34|1.48|1.42  2.24|1.00|1.00  2.20|0.79|0.82  2.19|0.68|0.72  2.16|0.58|0.66  2.17|0.52|0.60  2.20|0.51|0.54  2.18|0.45|0.53  2.18|0.43|0.50  2.17|0.40|0.49  2.18|0.39|0.48  2.20|0.37|0.45  2.17|0.36|0.45  2.18|0.34|0.43  2.16|0.33|0.42  2.14|0.18|0.29  2.17|0.18|0.28
28      2.41|2.79|2.55  2.28|1.50|1.43  2.27|1.01|1.01  2.25|0.82|0.89  2.29|0.68|0.73  2.24|0.60|0.67  2.23|0.54|0.60  2.24|0.50|0.56  2.24|0.46|0.53  2.38|0.45|0.52  2.26|0.41|0.49  2.26|0.40|0.47  2.26|0.38|0.46  2.25|0.37|0.45  2.30|0.35|0.44  2.26|0.34|0.43  2.23|0.18|0.28  2.22|0.18|0.28
29      2.49|2.85|2.64  2.41|1.50|1.46  2.36|1.07|1.05  2.45|0.85|0.86  2.33|0.69|0.75  2.32|0.61|0.69  2.46|0.57|0.62  2.33|0.52|0.58  2.33|0.48|0.54  2.33|0.46|0.53  2.32|0.41|0.49  2.32|0.41|0.48  2.33|0.38|0.48  2.32|0.37|0.45  2.30|0.35|0.44  2.31|0.35|0.44  2.30|0.19|0.30  2.32|0.18|0.28
30      2.59|2.99|2.77  2.47|1.54|1.50  2.44|1.10|1.08  2.42|0.86|0.93  2.48|0.71|0.77  2.42|0.62|0.71  2.42|0.57|0.63  2.48|0.51|0.59  2.41|0.48|0.58  2.45|0.44|0.53  2.49|0.41|0.50  2.49|0.41|0.49  2.46|0.40|0.48  2.51|0.38|0.46  2.45|0.36|0.45  2.41|0.36|0.44  2.49|0.19|0.28  2.36|0.18|0.28
31      2.62|2.99|2.87  2.60|1.59|1.53  2.48|1.08|1.09  2.45|0.87|0.90  2.45|0.73|0.77  2.43|0.63|0.69  2.55|0.59|0.64  2.46|0.54|0.60  2.44|0.49|0.56  2.44|0.45|0.53  2.44|0.43|0.51  2.55|0.41|0.50  2.44|0.39|0.47  2.43|0.39|0.47  2.44|0.36|0.45  2.42|0.36|0.44  2.40|0.19|0.29  2.38|0.18|0.28
32      2.72|3.10|2.86  2.60|1.68|1.60  2.58|1.15|1.13  2.57|0.93|0.93  2.59|0.74|0.80  2.56|0.65|0.73  2.55|0.59|0.65  2.56|0.55|0.61  2.56|0.51|0.58  2.60|0.47|0.55  2.55|0.43|0.52  2.55|0.42|0.49  2.55|0.40|0.48  2.54|0.39|0.49  2.67|0.38|0.46  2.54|0.36|0.45  2.52|0.18|0.29  2.58|0.18|0.29
33      2.79|3.17|2.94  2.68|1.64|1.59  2.63|1.17|1.14  2.61|0.93|0.94  2.59|0.75|0.81  2.58|0.65|0.74  2.68|0.60|0.66  2.59|0.54|0.61  2.59|0.51|0.57  2.59|0.46|0.56  2.59|0.43|0.52  2.74|0.43|0.51  2.60|0.40|0.50  2.60|0.39|0.47  2.57|0.37|0.46  2.57|0.36|0.46  2.60|0.18|0.28  2.55|0.18|0.28
34      2.87|3.29|3.03  2.73|1.68|1.64  2.71|1.20|1.17  2.70|0.96|1.00  2.74|0.77|0.82  2.67|0.67|0.75  2.67|0.61|0.67  2.75|0.54|0.63  2.69|0.54|0.61  2.82|0.48|0.56  2.79|0.44|0.53  2.70|0.43|0.51  2.70|0.41|0.49  2.79|0.40|0.49  2.72|0.38|0.47  2.67|0.37|0.46  2.75|0.18|0.28  2.63|0.18|0.28
35      2.91|3.41|3.09  2.85|1.68|1.66  2.74|1.18|1.18  2.73|0.93|0.96  2.83|0.77|0.82  2.71|0.68|0.76  2.76|0.61|0.67  2.86|0.57|0.63  2.72|0.51|0.59  2.71|0.48|0.56  2.71|0.46|0.56  2.87|0.44|0.51  2.72|0.41|0.49  2.73|0.41|0.49  2.74|0.38|0.46  2.73|0.37|0.46  2.76|0.19|0.28  2.71|0.18|0.28
36      2.98|3.33|3.08  2.82|1.74|1.68  2.79|1.21|1.26  2.83|0.96|0.99  2.82|0.79|0.85  2.80|0.70|0.77  2.79|0.63|0.68  2.80|0.57|0.64  2.95|0.53|0.59  2.78|0.48|0.56  2.76|0.45|0.53  2.77|0.43|0.51  2.77|0.41|0.49  2.76|0.41|0.49  2.77|0.38|0.47  2.76|0.37|0.46  2.73|0.18|0.28  2.73|0.18|0.28
37      3.10|3.51|3.25  2.92|1.76|1.72  2.90|1.25|1.22  2.87|0.98|1.00  2.86|0.80|0.85  2.85|0.72|0.79  2.89|0.64|0.69  2.86|0.57|0.63  2.86|0.53|0.60  2.85|0.48|0.59  2.88|0.46|0.55  2.90|0.44|0.53  2.87|0.42|0.53  2.91|0.41|0.49  2.84|0.39|0.47  2.84|0.38|0.49  2.90|0.18|0.28  2.82|0.18|0.28
38      3.14|3.57|3.34  3.03|1.82|1.77  3.00|1.30|1.27  3.00|0.99|1.06  3.05|0.81|0.87  2.93|0.71|0.78  2.93|0.63|0.70  3.08|0.59|0.65  3.14|0.56|0.63  2.97|0.50|0.59  2.97|0.48|0.56  2.96|0.45|0.54  2.96|0.43|0.51  2.96|0.43|0.51  2.96|0.40|0.48  2.96|0.39|0.47  2.94|0.19|0.29  2.93|0.18|0.28
39      3.27|3.63|3.38  3.22|1.85|1.82  3.08|1.33|1.29  3.07|1.02|1.05  3.06|0.86|0.90  3.04|0.73|0.82  3.06|0.66|0.73  3.06|0.60|0.67  3.05|0.55|0.62  3.04|0.52|0.58  3.04|0.48|0.59  3.14|0.46|0.54  3.04|0.44|0.52  3.04|0.43|0.52  3.08|0.40|0.49  3.03|0.40|0.50  3.13|0.18|0.28  2.98|0.18|0.28
40      3.29|3.66|3.39  3.12|1.88|1.88  3.14|1.32|1.34  3.16|1.03|1.06  3.10|0.86|0.91  3.09|0.74|0.81  3.09|0.66|0.72  3.09|0.60|0.68  3.17|0.56|0.63  3.10|0.51|0.60  3.09|0.48|0.56  3.08|0.45|0.54  3.09|0.44|0.52  3.27|0.44|0.50  3.08|0.40|0.49  3.07|0.40|0.48  3.05|0.18|0.28  3.05|0.18|0.28
41      3.41|3.73|3.48  3.21|1.92|1.88  3.20|1.35|1.33  3.18|1.05|1.07  3.18|0.86|0.92  3.16|0.78|0.86  3.22|0.70|0.73  3.17|0.62|0.69  3.22|0.56|0.64  3.18|0.53|0.63  3.21|0.49|0.58  3.21|0.46|0.55  3.16|0.44|0.55  3.25|0.43|0.51  3.15|0.41|0.49  3.15|0.40|0.51  3.31|0.19|0.28  3.13|0.18|0.28
42      3.45|3.80|3.69  3.35|1.94|1.90  3.25|1.35|1.42  3.31|1.06|1.09  3.40|0.88|0.93  3.22|0.76|0.84  3.22|0.67|0.73  3.23|0.64|0.68  3.49|0.58|0.66  3.31|0.53|0.61  3.29|0.51|0.59  3.28|0.47|0.56  3.29|0.45|0.54  3.30|0.46|0.54  3.32|0.42|0.50  3.28|0.40|0.49  3.27|0.19|0.29  3.26|0.18|0.28
43      3.53|3.79|3.58  3.30|2.01|1.91  3.26|1.38|1.35  3.25|1.05|1.09  3.25|0.88|0.96  3.29|0.79|0.85  3.29|0.69|0.75  3.26|0.62|0.73  3.33|0.58|0.65  3.31|0.53|0.61  3.30|0.50|0.61  3.42|0.47|0.56  3.29|0.46|0.54  3.26|0.44|0.52  3.37|0.41|0.50  3.26|0.41|0.52  3.27|0.19|0.28  3.29|0.18|0.28
44      3.61|3.98|3.70  3.44|2.03|1.99  3.53|1.39|1.49  3.60|1.12|1.14  3.41|0.92|0.97  3.41|0.79|0.87  3.39|0.71|0.77  3.41|0.65|0.72  3.63|0.59|0.66  3.39|0.54|0.62  3.35|0.51|0.59  3.40|0.49|0.57  3.39|0.46|0.55  3.39|0.45|0.54  3.41|0.43|0.51  3.39|0.41|0.50  3.37|0.19|0.28  3.35|0.18|0.28
45      3.73|4.07|3.82  3.54|2.08|2.04  3.50|1.46|1.43  3.49|1.14|1.15  3.47|0.93|0.98  3.46|0.84|0.88  3.52|0.75|0.76  3.47|0.65|0.73  3.48|0.60|0.67  3.46|0.55|0.65  3.48|0.52|0.63  3.65|0.50|0.58  3.45|0.47|0.57  3.51|0.45|0.53  3.44|0.43|0.52  3.45|0.42|0.50  3.53|0.19|0.28  3.43|0.18|0.28
46      3.73|4.15|3.91  3.57|2.09|2.05  3.54|1.43|1.45  3.58|1.13|1.18  3.62|0.93|0.99  3.51|0.81|0.88  3.50|0.72|0.79  3.67|0.67|0.73  3.49|0.61|0.69  3.52|0.55|0.64  3.50|0.54|0.61  3.47|0.49|0.57  3.46|0.46|0.55  3.45|0.47|0.54  3.48|0.43|0.51  3.45|0.41|0.50  3.43|0.18|0.29  3.43|0.18|0.28
47      3.76|4.27|4.04  3.71|2.18|2.06  3.61|1.46|1.45  3.54|1.13|1.17  3.53|0.97|1.01  3.54|0.80|0.90  3.56|0.73|0.79  3.56|0.66|0.76  3.59|0.60|0.68  3.52|0.55|0.64  3.52|0.53|0.64  3.63|0.50|0.58  3.52|0.47|0.55  3.52|0.46|0.54  3.61|0.43|0.52  3.52|0.42|0.51  3.62|0.19|0.28  3.50|0.18|0.28
48      3.90|4.24|3.96  3.69|2.17|2.24  3.85|1.52|1.66  3.82|1.18|1.20  3.68|0.96|1.02  3.65|0.83|0.91  3.65|0.74|0.80  3.65|0.67|0.74  3.91|0.64|0.71  3.69|0.56|0.65  3.65|0.54|0.61  3.64|0.52|0.59  3.64|0.48|0.56  3.64|0.48|0.55  3.69|0.44|0.53  3.65|0.43|0.51  3.62|0.18|0.28  3.62|0.18|0.28
49      4.02|4.29|4.03  3.76|2.18|2.13  3.71|1.51|1.51  3.70|1.20|1.20  3.68|0.97|1.02  3.68|0.87|0.92  3.71|0.78|0.80  3.67|0.68|0.75  3.72|0.61|0.70  3.67|0.58|0.68  3.72|0.54|0.65  3.75|0.51|0.60  3.68|0.48|0.59  3.75|0.47|0.54  3.66|0.44|0.53  3.66|0.43|0.53  3.88|0.19|0.28  3.68|0.18|0.28
50      4.04|4.42|4.22  3.88|2.23|2.20  3.84|1.55|1.55  3.88|1.21|1.29  3.79|0.99|1.05  3.80|0.86|0.94  3.80|0.77|0.83  3.97|0.69|0.76  3.98|0.63|0.72  3.82|0.58|0.67  3.83|0.56|0.64  3.80|0.53|0.61  3.80|0.49|0.57  3.79|0.50|0.57  3.85|0.45|0.54  3.80|0.43|0.53  3.76|0.19|0.29  3.74|0.18|0.28
51      4.17|4.51|4.23  3.96|2.25|2.24  3.91|1.59|1.57  3.90|1.22|1.26  3.89|1.03|1.07  3.86|0.90|0.99  3.98|0.78|0.84  3.88|0.70|0.79  3.89|0.63|0.71  3.86|0.59|0.68  3.86|0.56|0.67  4.07|0.55|0.62  3.85|0.50|0.58  3.86|0.48|0.59  3.95|0.45|0.54  3.85|0.44|0.53  3.98|0.19|0.29  3.84|0.18|0.28
52      4.22|4.57|4.28  4.03|2.31|2.34  4.05|1.60|1.69  4.07|1.26|1.29  3.98|1.03|1.09  3.97|0.89|0.96  3.98|0.79|0.86  3.97|0.72|0.78  3.98|0.65|0.74  3.99|0.60|0.69  3.97|0.57|0.65  3.96|0.54|0.62  3.96|0.51|0.59  3.96|0.51|0.60  4.08|0.46|0.55  3.97|0.45|0.54  3.93|0.19|0.28  3.93|0.18|0.28
53      4.32|4.56|4.38  4.09|2.36|2.31  4.05|1.62|1.62  4.03|1.28|1.30  4.00|1.03|1.09  4.00|0.90|0.98  4.06|0.80|0.85  4.02|0.72|0.79  4.00|0.65|0.73  4.01|0.61|0.72  4.05|0.57|0.68  4.08|0.54|0.63  4.01|0.51|0.59  4.14|0.49|0.57  4.00|0.47|0.55  4.00|0.46|0.54  4.18|0.19|0.29  3.97|0.18|0.28
54      4.34|4.70|4.52  4.18|2.37|2.33  4.11|1.63|1.64  4.32|1.30|1.32  4.21|1.05|1.10  4.08|0.91|0.96  4.07|0.80|0.87  4.27|0.75|0.80  4.05|0.67|0.75  4.10|0.62|0.70  4.08|0.59|0.66  4.06|0.54|0.63  4.06|0.51|0.60  4.07|0.52|0.59  4.10|0.47|0.56  4.07|0.46|0.54  4.04|0.19|0.29  4.03|0.18|0.28
55      4.51|4.78|4.47  4.22|2.39|2.35  4.17|1.65|1.66  4.19|1.28|1.32  4.17|1.06|1.14  4.19|0.90|1.04  4.19|0.82|0.89  4.19|0.75|0.85  4.27|0.66|0.74  4.16|0.63|0.71  4.18|0.58|0.67  4.39|0.58|0.65  4.17|0.52|0.61  4.17|0.51|0.58  4.35|0.48|0.56  4.15|0.47|0.55  4.26|0.19|0.29  4.15|0.18|0.29
56      4.52|4.84|4.55  4.29|2.46|2.44  4.36|1.68|1.69  4.34|1.32|1.36  4.27|1.09|1.14  4.25|0.93|1.01  4.25|0.83|0.89  4.24|0.75|0.82  4.24|0.68|0.75  4.22|0.63|0.71  4.20|0.58|0.67  4.20|0.55|0.63  4.18|0.52|0.60  4.19|0.49|0.59  4.24|0.48|0.56  4.19|0.46|0.55  4.17|0.18|0.28  4.17|0.18|0.28
57      4.57|5.03|4.77  4.39|2.46|2.43  4.30|1.71|1.69  4.30|1.35|1.36  4.28|1.09|1.14  4.26|0.93|1.03  4.31|0.84|0.91  4.32|0.77|0.83  4.32|0.69|0.77  4.30|0.64|0.75  4.43|0.58|0.67  4.39|0.56|0.64  4.27|0.53|0.61  4.41|0.50|0.58  4.24|0.49|0.58  4.32|0.46|0.56  4.51|0.19|0.29  4.29|0.18|0.28
58      4.64|5.01|4.79  4.46|2.52|2.48  4.40|1.75|1.74  4.55|1.34|1.40  4.36|1.11|1.16  4.38|0.95|1.03  4.38|0.85|0.91  4.59|0.78|0.84  4.36|0.73|0.81  4.56|0.65|0.74  4.44|0.62|0.71  4.43|0.57|0.66  4.46|0.55|0.62  4.44|0.51|0.62  4.48|0.50|0.59  4.42|0.47|0.57  4.39|0.19|0.29  4.37|0.18|0.28
59      4.69|5.14|4.79  4.51|2.61|2.48  4.44|1.77|1.76  4.44|1.35|1.40  4.42|1.12|1.19  4.42|0.96|1.08  4.49|0.86|0.92  4.43|0.78|0.87  4.48|0.69|0.77  4.40|0.64|0.73  4.41|0.60|0.69  4.51|0.57|0.65  4.42|0.54|0.62  4.42|0.51|0.60  4.55|0.49|0.58  4.50|0.49|0.57  4.40|0.19|0.30  4.37|0.18|0.28
60      4.82|5.14|4.83  4.57|2.60|2.65  4.62|1.78|1.79  4.65|1.39|1.44  4.56|1.15|1.19  4.53|0.98|1.06  4.53|0.86|0.92  4.53|0.79|0.86  4.55|0.71|0.79  4.58|0.67|0.75  4.54|0.61|0.70  4.54|0.58|0.66  4.53|0.55|0.63  4.52|0.54|0.62  4.56|0.51|0.59  4.53|0.49|0.57  4.51|0.19|0.28  4.49|0.18|0.28
61      4.90|5.15|4.98  4.68|2.65|2.61  4.64|1.83|1.82  4.63|1.44|1.45  4.59|1.15|1.21  4.60|1.03|1.12  4.76|0.88|0.92  4.60|0.80|0.87  4.61|0.71|0.79  4.59|0.67|0.78  4.69|0.62|0.74  4.67|0.59|0.67  4.60|0.56|0.64  4.78|0.54|0.61  4.58|0.51|0.59  4.58|0.48|0.58  4.78|0.19|0.29  4.58|0.18|0.28
62      4.93|5.31|5.12  4.75|2.65|2.62  4.70|1.82|1.91  4.72|1.42|1.47  4.67|1.17|1.22  4.67|1.00|1.08  4.66|0.88|0.95  4.88|0.82|0.87  4.68|0.73|0.81  4.68|0.68|0.76  4.67|0.64|0.72  4.64|0.59|0.68  4.65|0.56|0.64  4.65|0.55|0.65  4.79|0.51|0.60  4.68|0.50|0.58  4.62|0.19|0.29  4.61|0.18|0.28
63      5.02|5.25|5.11  4.85|2.69|2.67  4.79|1.85|1.87  4.78|1.43|1.48  4.77|1.23|1.26  4.74|1.05|1.11  4.83|0.89|0.97  4.78|0.82|0.91  4.82|0.73|0.81  4.76|0.69|0.77  4.75|0.63|0.76  4.93|0.60|0.68  4.78|0.57|0.65  4.75|0.54|0.62  4.91|0.52|0.60  4.73|0.49|0.59  4.69|0.19|0.28  4.68|0.18|0.28
64      5.10| na |5.10  4.81| na |2.79  4.90| na |1.96  5.06| na |1.50  4.84| na |1.25  4.82| na |1.11  4.80| na |0.96  4.80| na |0.89  5.06| na |0.82  4.83| na |0.78  4.83| na |0.72  4.80| na |0.69  4.79| na |0.66  4.79| na |0.64  4.83| na |0.61  4.80| na |0.59  4.76| na |0.29  4.76| na |0.28

@vneiger
Collaborator Author

vneiger commented Oct 27, 2025

On my list to finalize this:

  • add documentation
  • see if memory consumption of the Shoup variant can be reduced to 2*len instead of 4*len
  • add case for doing the naive approach when len == 1
  • check performance a bit more carefully on a few machines, and add some logic in the main function to refine the choice of variant depending on length/bitsize (notably for very small lengths and very small bitsize)
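
As background on the "precomp shoup" column and the extra memory it needs, here is a small sketch of modular multiplication with a precomputed quotient, in the style usually attributed to Shoup. It is not taken from this PR and does not claim to match how the variant is organized there: `shoup_precomp` and `shoup_mul` are hypothetical names, and the code assumes a compiler providing `unsigned __int128` (GCC or Clang) and a modulus below 2^63.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical name. For a fixed multiplier w with 0 <= w < n,
   precompute floor(w * 2^64 / n). This is the extra word stored per
   multiplier. */
static uint64_t shoup_precomp(uint64_t w, uint64_t n)
{
    assert(w < n);
    return (uint64_t) (((unsigned __int128) w << 64) / n);
}

/* Hypothetical name. Returns w * x mod n for any 64-bit x, given
   w_pre = shoup_precomp(w, n). Requires n < 2^63. */
static uint64_t shoup_mul(uint64_t w, uint64_t w_pre, uint64_t x, uint64_t n)
{
    assert(n < (1ULL << 63));
    /* Approximate quotient: high word of w_pre * x. */
    uint64_t q = (uint64_t) (((unsigned __int128) w_pre * x) >> 64);
    /* Computed modulo 2^64; the exact value w*x - q*n lies in [0, 2n). */
    uint64_t r = w * x - q * n;
    return (r >= n) ? r - n : r;
}
```

The precomputation costs a division, so it is only worthwhile when the same multiplier is reused for several products. The bound n < 2^63 in this sketch is consistent with the `na` entries in the bitsize 64 row of the table above.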

- reduce length of temporary vector to 3*n instead of 4*n
- check if len == 1 and do naive approach in that case
@vneiger
Collaborator Author

vneiger commented Oct 27, 2025

This PR is ready for review. Performance measurements on three machines are attached. On recent-ish machines, the non-naive approach is essentially always beneficial.

On older machines with a slower integer division, the gain is less significant, and the naive approach would actually be faster for moduli of very small bitsizes (up to bitsize 5 or so). This was not enough to convince me that having thresholds for this could be useful, but this can be discussed.

profile.txt

@vneiger vneiger marked this pull request as ready for review October 27, 2025 21:32
@albinahlback
Collaborator

> On older machines with a slower integer division, the gain is less significant, and the naive approach would actually be faster for moduli of very small bitsizes (up to bitsize 5 or so). This was not enough to convince me that having thresholds for this could be useful, but this can be discussed.

Cool! I think it is okay, Cascade Lake is 6 years old anyway.

vneiger and others added 3 commits October 27, 2025 23:57
@fredrik-johansson
Collaborator

This is a nice speedup. Do you have an application?

I know some functions where we need to construct [1, 1/2, 1/3, 1/4, ...], but this is a special case where one should be able to do a bit better than a general algorithm (there is a slightly-less-than-naive _gr_nmod_vec_reciprocals, which isn't really optimized).
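
For illustration, one well-known way to build such a table without any inversion is the recurrence 1/i = -(p / i) * 1/(p mod i) modulo p, which follows from writing p = (p / i) * i + (p mod i). The sketch below is an assumption-laden example, not FLINT code and not a description of `_gr_nmod_vec_reciprocals`: `reciprocals` is a hypothetical name, the modulus is assumed to be a prime p below 2^32, and the table length must not exceed p.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name. Sets inv[i] = 1/i mod p for 1 <= i < len
   (inv[0] is set to 0 and is meaningless). Assumes p is prime,
   p < 2^32 and len <= p. */
static void reciprocals(uint64_t *inv, size_t len, uint64_t p)
{
    assert(p < (1ULL << 32)); /* keeps every product below 2^64 */
    assert(len <= p);
    if (len > 0)
        inv[0] = 0;
    if (len > 1)
        inv[1] = 1;
    for (size_t i = 2; i < len; i++)
    {
        /* p = q*i + r with 0 < r < i, hence 1/i = -q * (1/r) mod p. */
        uint64_t q = p / i;
        uint64_t r = p % i;
        inv[i] = ((p - q) * inv[r]) % p;
    }
}
```

Each entry costs one division with remainder by a small integer and one modular multiplication. Because each entry depends on an earlier entry at an irregular index, this form does not apply to an arbitrary input vector.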

@vneiger
Collaborator Author

vneiger commented Oct 28, 2025

> This is a nice speedup. Do you have an application?

This appears in functions for Cauchy / Cauchy-like matrices, but I'm not sure about any plans for such structured linear algebra being in FLINT in the near future.

Recently I needed this when writing draft code for rational reconstruction, which I would like to add to FLINT in the fairly near future. More precisely, this was the Cauchy interpolation case of rational reconstruction (as in [von zur Gathen and Gerhard, Section 5.8]), where I had to invert a bunch of evaluations of a polynomial. This might also be useful more generally for related algorithms that compute with 2 x 2 univariate polynomial matrices, like the half-gcd or Padé approximation, when multiplications are done through FFT evaluation-interpolation.

@edgarcosta
Member

This is also useful for BSGS algorithms. Often one needs to patch the output of the main loop with an inverse coming from a sequence known a priori.

@vneiger
Collaborator Author

vneiger commented Oct 28, 2025

> This is also useful for BSGS algorithms. Often one needs to patch the output of the main loop with an inverse coming from a sequence known a priori.

Thanks, that's good to know.

I don't have more to add to this PR, and regarding the initial PR message, I am not sure about adding a function like

void nmod_inv2(ulong * inv_a1, ulong * inv_a2, ulong a1, ulong a2, nmod_t mod);

because I don't have a use case in mind. Unless someone has suggestions, for me this is ready for merge.

@vneiger vneiger merged commit c6147f4 into flintlib:main Oct 28, 2025
14 checks passed
@fredrik-johansson
Collaborator

Thanks!
