Add Groebner basis related functions for fmpz_mpoly_vec #191
Conversation
It is good to add this but I am not sure how well developed this part of Flint actually is. I haven't looked through the code or anything, but for example the only Groebner basis function is `buchberger_naive`. It looks like it computes a non-reduced Groebner basis by default:

```
In [13]: x, y, z = ctx.gens()

In [14]: polys = [x**2 - 2, y**2 - 3, z - x - y]

In [15]: vec = fmpz_mpoly_vec(polys, ctx)

In [17]: list(vec.buchberger_naive())
Out[17]:
[x^2 - 2,
 y^2 - 3,
 x + y - z,
 2*y*z - z^2 - 1,
 2*y + z^3 - 11*z,
 z^4 - 10*z^2 + 1]

In [18]: list(vec.buchberger_naive().autoreduction())
Out[18]: [z^4 - 10*z^2 + 1, 2*y + z^3 - 11*z, 2*x - z^3 + 9*z]
```

I'm no expert on these things but my understanding is that to make Buchberger's algorithm efficient the reduction is used as part of the algorithm, so you would never actually have a non-reduced basis like this. This looks like perhaps a very literal interpretation of Buchberger's algorithm as originally described, although presumably modified for integer coefficients.
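To make the interleaved reduction concrete, here is a minimal sketch of the basic Buchberger loop using SymPy's public functions. The names `s_poly` and `buchberger` are my own, not functions from SymPy or Flint, and real implementations add pair-selection criteria and keep the basis inter-reduced on top of this:

```python
from sympy import LT, cancel, expand, lcm, reduced, symbols


def s_poly(f, g, gens):
    # S-polynomial: scale f and g so that their leading terms cancel.
    ltf = LT(f, *gens, order='lex')
    ltg = LT(g, *gens, order='lex')
    m = lcm(ltf, ltg)
    return expand(cancel(m/ltf)*f - cancel(m/ltg)*g)


def buchberger(F, gens):
    G = list(F)
    pairs = [(i, j) for i in range(len(G)) for j in range(i)]
    while pairs:
        i, j = pairs.pop()
        s = s_poly(G[i], G[j], gens)
        if s == 0:
            continue
        # Reduce each S-polynomial against the current basis immediately
        # instead of leaving everything to a final autoreduction pass.
        _, r = reduced(s, G, *gens, order='lex')
        if r != 0:
            pairs.extend((len(G), k) for k in range(len(G)))
            G.append(r)
    return G


x, y, z = symbols('x y z')
print(buchberger([x**2 - 2, y**2 - 3, z - x - y], (x, y, z)))
```

Even this skeleton never carries an unreduced S-polynomial forward: everything is reduced against the basis as soon as it is produced.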
The SymPy default algorithm is Buchberger, but it is a modified version of the algorithm and computes a reduced basis automatically:

```
In [9]: print(list(groebner([x**2 - 2, y**2 - 3, z - x - y])))
[2*x - z**3 + 9*z, 2*y + z**3 - 11*z, z**4 - 10*z**2 + 1]

In [10]: print(list(groebner([x**2 - 2, y**2 - 3, z - x - y], method='f5b')))
[2*x - z**3 + 9*z, 2*y + z**3 - 11*z, z**4 - 10*z**2 + 1]
```

SymPy also has the FGLM algorithm for converting a Groebner basis from one ordering to another:

```
In [11]: print(list(groebner([x**2 - 2, y**2 - 3, z - x - y], order='grevlex').fglm('lex')))
[2*x - z**3 + 9*z, 2*y + z**3 - 11*z, z**4 - 10*z**2 + 1]
```

That is often a lot more efficient than computing a lex basis directly. It might be that at least for SymPy it actually makes more sense to use the existing Groebner basis implementation but just with Flint doing the underlying polynomial arithmetic.

The complexity growth for computing a Groebner basis is extreme, so actually using the best high-level algorithm ends up being more important than micro-optimising arithmetic if you have to choose between implementations. Ideally you would of course optimise the arithmetic and have the best algorithms, but that's why we want to use Flint for the arithmetic.

Here is a script for timing things:

```python
import flint
import sympy


def groebner_sympy(n, method='buchberger', order='lex', fglm=False):
    xs = sympy.symbols(f'x:{n}')
    a = sympy.symbols('a')
    ps = [sympy.prime(i) for i in range(1, n+1)]
    polys = [x**2 - p for x, p in zip(xs, ps)]
    polys.append(a - sum(xs))
    if fglm:
        # Compute a grevlex basis first, then convert it to lex with FGLM.
        gb = sympy.groebner(polys, [*xs, a], method=method, order='grevlex')
        gb = gb.fglm('lex')
    else:
        gb = sympy.groebner(polys, [*xs, a], method=method, order=order)
    return list(gb)


def groebner_flint(n):
    names = [f'x{i}' for i in range(n)]
    a = 'a'
    ctx = flint.fmpz_mpoly_ctx(n+1, flint.Ordering.lex, names + [a])
    ps = [int(sympy.prime(i)) for i in range(1, n+1)]
    [*xs, a] = ctx.gens()
    polys = [x**2 - p for x, p in zip(xs, ps)]
    polys.append(a - sum(xs))
    vec = flint.fmpz_mpoly_vec(polys, ctx)
    gb = vec.buchberger_naive().autoreduction()
    return list(gb)
```

This is basically computing the Groebner basis of:

```
x0^2 - 2
x1^2 - 3
...
xn^2 - prime(n)
a - (x0 + x1 + x2 + ...)
```

The example is a way to compute a primitive element for the extension field QQ(sqrt(2), sqrt(3), ..., sqrt(prime(n))).
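For example, in the n = 2 case this says that sqrt(2) + sqrt(3) generates QQ(sqrt(2), sqrt(3)), and the last element of the lex basis from the first session above, z^4 - 10*z^2 + 1, is its minimal polynomial. A quick sanity check (mine, not from the PR):

```python
import sympy

# z = sqrt(2) + sqrt(3) should be a root of z^4 - 10*z^2 + 1, the last
# element of the lex Groebner basis computed in the first session above.
z = sympy.sqrt(2) + sympy.sqrt(3)
print(sympy.expand(z**4 - 10*z**2 + 1))  # prints 0
```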
Here are some timings with SymPy using the default Buchberger algorithm:

```
In [2]: %time ok = groebner_sympy(2)
CPU times: user 90 ms, sys: 5 µs, total: 90 ms
Wall time: 89.1 ms
In [3]: %time ok = groebner_sympy(3)
CPU times: user 58.3 ms, sys: 3.57 ms, total: 61.8 ms
Wall time: 60.2 ms
In [4]: %time ok = groebner_sympy(4)
CPU times: user 351 ms, sys: 0 ns, total: 351 ms
Wall time: 348 ms
In [5]: %time ok = groebner_sympy(5)
CPU times: user 36.6 s, sys: 20.5 ms, total: 36.6 s
Wall time: 36.6 s
```

As you can see this scales very badly: going from n=4 to n=5 made it 100x slower. Likely going to n=6 would be more than 100x slower again, i.e. more than an hour.

These are timings with the PR here and python-flint:

```
In [8]: %time ok = groebner_flint(2)
CPU times: user 938 µs, sys: 6 µs, total: 944 µs
Wall time: 962 µs
In [9]: %time ok = groebner_flint(3)
CPU times: user 1.89 ms, sys: 0 ns, total: 1.89 ms
Wall time: 1.9 ms
In [10]: %time ok = groebner_flint(4)
CPU times: user 47.9 ms, sys: 1e+03 ns, total: 47.9 ms
Wall time: 47.5 ms
In [11]: %time ok = groebner_flint(5)
Terminated
```

I killed that one, so I don't know how long it would have taken for n=5. What this suggests is that while Flint's arithmetic is much faster, its naive algorithm scales much worse than SymPy's modified Buchberger. This is why I like the sound of combining the best high-level algorithms with Flint's fast arithmetic.

Let's try SymPy's f5b algorithm:

```
In [3]: %time ok = groebner_sympy(2, method='f5b')
CPU times: user 99.8 ms, sys: 7.99 ms, total: 108 ms
Wall time: 107 ms
In [4]: %time ok = groebner_sympy(3, method='f5b')
CPU times: user 49.9 ms, sys: 0 ns, total: 49.9 ms
Wall time: 48.7 ms
In [5]: %time ok = groebner_sympy(4, method='f5b')
CPU times: user 235 ms, sys: 4.01 ms, total: 239 ms
Wall time: 238 ms
In [6]: %time ok = groebner_sympy(5, method='f5b')
^C^C
KeyboardInterrupt
```

What about using FGLM?

```
In [7]: %time ok = groebner_sympy(2, method='f5b', fglm=True)
CPU times: user 43.1 ms, sys: 0 ns, total: 43.1 ms
Wall time: 42.4 ms
In [8]: %time ok = groebner_sympy(3, method='f5b', fglm=True)
CPU times: user 49.9 ms, sys: 0 ns, total: 49.9 ms
Wall time: 48.9 ms
In [9]: %time ok = groebner_sympy(4, method='f5b', fglm=True)
CPU times: user 199 ms, sys: 7.98 ms, total: 207 ms
Wall time: 204 ms
In [10]: %time ok = groebner_sympy(5, method='f5b', fglm=True)
CPU times: user 1.34 s, sys: 3.98 ms, total: 1.34 s
Wall time: 1.34 s
In [11]: %time ok = groebner_sympy(6, method='f5b', fglm=True)
CPU times: user 12.7 s, sys: 7.96 ms, total: 12.7 s
Wall time: 12.7 s
In [14]: %time ok = groebner_sympy(7, method='f5b', fglm=True)
CPU times: user 2min 57s, sys: 80.1 ms, total: 2min 57s
Wall time: 2min 58s
```

Now we got to n=7. Note that if SymPy's PolyElement could wrap fmpz_mpoly then these timings would improve as well. The main speedup here comes from first computing a grevlex basis, with the fglm part being less significant than the Groebner part I think.

Flint doesn't have the FGLM algorithm, but let's just try computing a grevlex basis with buchberger_naive; the assumed change is sketched below.
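The exact change isn't shown above, so as an assumption here is the one modification this implies: redefining groebner_flint with a grevlex context (I'm assuming python-flint names this ordering `Ordering.degrevlex`):

```python
def groebner_flint(n):
    # As above, but with a grevlex (degrevlex) monomial ordering.
    names = [f'x{i}' for i in range(n)] + ['a']
    ctx = flint.fmpz_mpoly_ctx(n+1, flint.Ordering.degrevlex, names)
    ps = [int(sympy.prime(i)) for i in range(1, n+1)]
    [*xs, a] = ctx.gens()
    polys = [x**2 - p for x, p in zip(xs, ps)]
    polys.append(a - sum(xs))
    vec = flint.fmpz_mpoly_vec(polys, ctx)
    return list(vec.buchberger_naive().autoreduction())
```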
```
In [20]: %time ok = groebner_flint(2)
CPU times: user 1.22 ms, sys: 2 µs, total: 1.22 ms
Wall time: 1.24 ms
In [21]: %time ok = groebner_flint(3)
CPU times: user 1.55 ms, sys: 2 µs, total: 1.55 ms
Wall time: 1.57 ms
In [22]: %time ok = groebner_flint(4)
CPU times: user 3.81 ms, sys: 0 ns, total: 3.81 ms
Wall time: 3.94 ms
In [23]: %time ok = groebner_flint(5)
^C^C
Terminated
```

Again I killed it after about 5 minutes. I think the problem here is that the unreduced Groebner basis explodes in size, which is why you want the modified Buchberger algorithm that is designed to reduce the basis efficiently as it progresses.

Something that the multivariate polynomials in python-flint do not yet have, but that would absolutely be useful, is resultants and discriminants, which are also used as an alternative to Groebner bases in some situations.
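As a sketch of that alternative, here is the elimination done with SymPy's `resultant` (python-flint doesn't expose resultants yet): eliminating one variable at a time from the n = 2 system recovers the same minimal polynomial as the Groebner basis above.

```python
from sympy import symbols, resultant, expand

x, y, a = symbols('x y a')

# Eliminate x, then y, from {x^2 - 2, y^2 - 3, a - x - y} via resultants.
r1 = resultant(a - x - y, x**2 - 2, x)  # (a - y)**2 - 2
r2 = resultant(r1, y**2 - 3, y)
print(expand(r2))  # a**4 - 10*a**2 + 1
```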
Thanks for such a detailed reply! Funny you should say this, because this is exactly what I'm currently working on measuring, with profiling and more benchmarks over (some of) the PHCpack database of polynomial systems. I'm making use of the functions added in this PR and thought it best to just upstream them.
Have you tried anything else besides SymPy and python-flint in your benchmarks? I would be interested to know how these things compare for asymptotic scaling, like if there are any systems that can go a lot higher than SymPy can. Note that a 20x speedup in this area might sound like a lot, but it also potentially doesn't buy you much: maybe you can go from n=7 to n=8.
Not on these benchmarks in particular. I compared SymPy's dmp_python, python-flint, and Mathematica a while back with gcd and factoring on a rather pathological polynomial (5 generators, 18k terms, 6 MB in text form) and both python-flint and Mathematica beat SymPy by about 280x (~100 ms compared to ~29 s). I've attempted to get access to Maple, but my university denied my application for a license.
As would I; I'm not aware of any personally.
That doesn't surprise me. It is possible that Mathematica actually uses Flint for this. You should probably expect a good algorithm well implemented in pure Python in SymPy to be at least 20x slower than Flint anyway, but also different algorithms make a big difference in different cases. It is probably not difficult to make some basic improvements to this in SymPy, but the main thing is that you need a few different algorithms and good heuristics for choosing between them. I am sure that there are cases where the difference is a lot worse than 280x.
Sage wraps a bunch of different libraries. I assume it has several different implementations and algorithms for computing Groebner bases, although I don't know how you can choose between them. The docs claim that Singular is mostly used.
msolve is another library supported by SageMath, which was added more recently: https://doc.sagemath.org/html/en/reference/polynomial_rings/sage/rings/polynomial/msolve.html I think the "winner" among CASes for these computations is Magma, which is very sad as it's closed source, so all the magic is essentially hidden behind licenses.
I've just turned the doc examples and the example shown here into tests, as well as adding equality comparison.
I'll mark this as ready for review; CI should pass.
Thank you both, I will check those out.
I left a few comments but we can just merge if you prefer.
As a total aside, would you be interested in adding some linting to the CI? I think something like ruff (https://docs.astral.sh/ruff/) has been designed to work with Cython (indeed, they measure performance against the Cython repository). If we decided on some config then we could quite easily format the codebase automatically with ruff.
That's fine by me if others are happy with it. I haven't used ruff much and didn't realise that it works on Cython. If you run ruff now does it completely transform the entire codebase? Perhaps a first step would be to use ruff's checking functionality rather than the auto-formatting part. Does it reveal any errors? It might need some configuration to limit things like reformatting auto-generated files.
I just checked and it indeed finds some small things. I have a few minutes now, let me run it.
For formatting, maybe Cython still isn't supported, but at least the linter is. For Cython formatting I don't know what's best, but there must be some kind of solution...
There might not be. Cython is not such a mainstream language as Python is. Some basic linting would be good though.
I don't believe there to be. There's cython-lint, which can catch common errors, unused variables/imports, and line lengths, but it can't format. It's what I use, and I have a small config for it in pyproject.toml.

But it can do a bit more than just that.
It sounds like we're all on board with using a linter then, so maybe we should just commit it. Or would it throw up complaints across the whole codebase?
It does indeed; I get ~1200 lines of warnings from cython-lint with this config:

```toml
[tool.cython-lint]
max-line-length = 120
ignore = ['E225', 'E231']
```

Most of these are line length warnings (~800).
I think we at least want the auto-generated files excluded in the config. Those files all have long lines and it is not worth trying to format the auto-generated code nicely I think. Excluding them brings the number of complaints down to about 500 already. Of those, E501 (line too long) is only 35 lines, though reducing the line length limit to 88 gives 480 lines too long. There are 73 E701, which is multiple statements on one line, as in the example below.
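A hypothetical example (not from the codebase):

```python
# E701: a statement on the same line as the `if`.
def absolute(x):
    if x < 0: return -x
    return x
```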
I prefer to see this split onto multiple lines myself but don't have strong feelings about it. There are 57 W605 (invalid escape sequence), which is something like the example below.
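Again a hypothetical example rather than one from the codebase:

```python
# W605: in a non-raw docstring the `\s` in `\sqrt` is an invalid escape
# sequence; making the docstring a raw string (r"""...""") fixes it.
def sqrt2():
    """Return an approximation of $\sqrt{2}$."""
    return 1.4142135623730951
```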
That is a valid complaint: the string should be raw because of the embedded LaTeX. Many of the complaints are about whitespace and indentation, which I think are mostly reasonable, although I haven't looked through what it wants for indentation. There are also a lot of complaints about things being defined or imported but not used anywhere, which definitely seem like reasonable complaints to me.

The only fixer it has is apparently changing single quotes to double quotes, which seems like the least useful kind of fixer. It should be trivially possible to implement fixers for things like trailing whitespace though. Maybe we can just add something like:

```python
#!/usr/bin/env python
# bin/strip_whitespace.py
#
# Strip trailing whitespace and trailing blank lines from source files.
# Note that Path.walk() requires Python 3.12+.
from pathlib import Path


def clean(filename):
    with open(filename) as fin:
        lines = fin.readlines()
    # Strip trailing whitespace from every line.
    lines = [line.rstrip() + '\n' for line in lines]
    # Drop any blank lines at the end of the file.
    while lines and lines[-1] == "\n":
        lines.pop()
    with open(filename, 'w') as fout:
        fout.writelines(lines)


src_dir = Path('src')
for root, dirs, files in src_dir.walk():
    for file in files:
        path = root / file
        if path.suffix in ('.py', '.pyx', '.pxd'):
            clean(path)
```

That script removes 50 of the complaints from the current code.
Okay, looks good to me. Thanks!
That all sounds pretty reasonable to me. A little upfront elbow (finger?) grease for a more consistent codebase. It's a shame there isn't better tooling, but parsing Cython doesn't seem easy; not even Cython gets it right sometimes.
Actual tests are currently missing (only doctests are present) only because it is late here; I'll write some up tomorrow.
I'm not entirely confident with Groebner bases, so I've lifted the Groebner basis, autoreduction, and Buchberger examples from Wikipedia, and used similar polynomials in the S-polynomial and reduction_primitive_part examples, though they may not be motivating.
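For reference, a worked S-polynomial computation in the spirit of the Wikipedia example (a quick SymPy check of my own, not the PR's test code): with f = x**3 - 2*x*y and g = x**2*y - 2*y**2 + x under grlex, the leading monomials are x**3 and x**2*y, their lcm is x**3*y, and the S-polynomial comes out to -x**2.

```python
from sympy import expand, symbols

x, y = symbols('x y')
f = x**3 - 2*x*y           # leading monomial under grlex: x**3
g = x**2*y - 2*y**2 + x    # leading monomial under grlex: x**2*y
# lcm of the leading monomials is x**3*y, so the S-polynomial is:
s = expand(y*f - x*g)
print(s)  # -x**2
```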