Optimize `lagrange_poly()` #432

mhostetter · 2022-11-08T20:31:03Z

Fixes #427.

@Popeyef5 I don't think it is possible to significantly beat the "vanilla" implementation you have for your case in #427 of a very large prime. The reason is that I'm doing everything you're doing (and in the same way, ie in pure Python). But I'm also doing more -- argument verification, bounds checking in FieldArray, and other overhead when creating the FieldArray and Poly classes -- that I can't remove. However, I should be able to come close to the "vanilla" implementation when ufunc_mode="python-calculate".

Now in cases where I can use Numba and JIT compile (ie ufunc_mode="jit-lookup" or ufunc_mode="jit-calculate"), I do outperform the "vanilla" version.

I agree galois is not currently close in your use case. I found a few issues:

There was an inefficiency in the division ufunc under the condition when the divisor was scalar -- fixed in Fix inefficiency when dividing by a scalar #429.
I discovered that the convolution being used in the lagrange_poly() routine was the O(N^2) version. That was replaced with the more efficient O(N*log(N)) version in Make convolution JIT function more efficient for prime fields #431.
I added a special JIT-able Lagrange polynomial routine that minimizes Poly object overhead during the routine.

Below are some performance examples.

Note lagrange_poly() is the old version and lagrange_poly2() is the new version. Once done testing, the old will be replaced with the new.

Your example modified `test_427.py`

import galois
import numpy as np


def mod_inv(x, p):
  x = x % p
  t = 0
  nt = 1
  r = p
  nr = x
  while nr != 0:
    q = r // nr
    t, nt = nt, t - q * nt
    r, nr = nr, r - q * nr
  if r != 1:
    raise ValeError("number has no inverse")
  return t % p


def polydivmod(u, v, p):
  m = len(u)
  n = len(v)
  s = mod_inv(v[0], p)
  q = np.array([0]*max(m-n+1, 1), dtype=object)
  r = np.copy(u)

  for i in range(m - n +1):
    d = s * r[i] % p
    q[i] = d
    r[i:i+n] = np.mod(r[i:i+n] - v * d, p)

  first = 0
  for i in range(len(r)-1):
    if r[i] != 0:
      break
    first += 1

  r = r[first:]

  return q, r


class fpoly1d:

  def __init__(self, c, prime):
    self.prime = prime

    if isinstance(c, fpoly1d):
      self.coeffs = c.coeffs
      return

    c = np.atleast_1d(c).astype(object)
    c = np.trim_zeros(c, 'f')
    self.coeffs = np.mod(c, self.prime)

  def __mul__(self, other):
    if isinstance(other, fpoly1d):
      return fpoly1d(np.convolve(self.coeffs, other.coeffs), self.prime)
    else:
      return fpoly1d(self.coeffs * other, self.prime)

  def __rmul__(self, other):
    if np.isscalar(other):
      return fpoly1d(other * self.coeffs, self.prime)
    else:
      other = fpoly1d(other, self.prime)
      return fpoly1d(np.convolve(self.coeffs, other.coeffs), self.prime)

  def __add__(self, other):
    other = fpoly1d(other, self.prime)
    return fpoly1d(np.polyadd(self.coeffs, other.coeffs), self.prime)

  def __radd__(self, other):
    other = fpoly1d(other, self.prime)
    return fpoly1d(np.polyadd(self.coeffs, other.coeffs), self.prime)

  def __div__(self, other):
    if np.isscalar(other):
      other_inv = mod_inv(other, self.prime)
      return fpoly1d(self.coeffs * other_inv, self.prime)
    q, r = polydivmod(self.coeffs, other.coeffs, self.prime)
    return fpoly1d(q, self.prime), fpoly1d(r, self.prime)

  __truediv__ = __div__

  def __rdiv__(self, other):
    if np.isscalar(other):
      other_inv = mod_inv(other, self.prime)
      return fpoly1d(self.coeffs * other_inv, self.prime)
    q, r = polydivmod(self.coeffs, other.coeffs, self.prime)
    return fpoly1d(q, self.prime), fpoly1d(r, self.prime)


def vanilla_interpolate(x, y, prime):
  # Emulating SciPy's Lagrange implementation
  poly = fpoly1d(0, prime)
  for i in np.flatnonzero(y):
    pt = fpoly1d(y[i], prime)
    for j in range(len(x)):
      if i == j:
        continue
      fac = x[i] - x[j]
      pt *= fpoly1d([1, -x[j]], prime) / fac
    poly += pt
  return poly

Example 1: JIT + lookup tables -- 50x

In [1]: %run test_427

In [2]: GF = galois.GF(13693)

In [3]: GF.ufunc_mode
Out[3]: 'jit-lookup'

In [4]: X = GF.Random(100, seed=1)

In [5]: Y = GF.Random(100, seed=2)

In [6]: %timeit p_g = galois.lagrange_poly(X, Y)
2.77 s ± 6.75 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [7]: %timeit p_g2 = galois.lagrange_poly2(X, Y)
4.64 ms ± 76.6 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [8]: %timeit p_v = vanilla_interpolate(X.view(np.ndarray).astype(object), Y.view(np.ndarray).astype(object), GF.order)
235 ms ± 3.07 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Example 2: JIT + explicit calculation -- 43x

In [1]: %run test_427

In [2]: GF = galois.GF(2424113)

In [3]: GF.ufunc_mode
Out[3]: 'jit-calculate'

In [4]: X = GF.Random(100, seed=1)

In [5]: Y = GF.Random(100, seed=2)

In [6]: %timeit p_g = galois.lagrange_poly(X, Y)
2.55 s ± 12.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [7]: %timeit p_g2 = galois.lagrange_poly2(X, Y)
5.35 ms ± 34.4 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [8]: %timeit p_v = vanilla_interpolate(X.view(np.ndarray).astype(object), Y.view(np.ndarray).astype(object), GF.order)
254 ms ± 1.76 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Example 3: Pure Python calculation -- 1.13x

In [1]: %run test_427

In [2]: GF = galois.GF(21888242871839275222246405745257275088696311157297823662689037894645226208583, primitive_element=3, verify=False)

In [3]: GF.ufunc_mode
Out[3]: 'python-calculate'

In [4]: X = GF.Random(100, seed=1)

In [5]: Y = GF.Random(100, seed=2)

In [6]: %timeit p_g = galois.lagrange_poly(X, Y)
3.9 s ± 6.96 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [7]: %timeit p_g2 = galois.lagrange_poly2(X, Y)
709 ms ± 8.83 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [8]: %timeit p_v = vanilla_interpolate(X.view(np.ndarray).astype(object), Y.view(np.ndarray).astype(object), GF.order)
802 ms ± 4.01 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

codecov · 2022-11-08T20:41:29Z

Codecov Report

Base: 97.06% // Head: 96.97% // Decreases project coverage by -0.08% ⚠️

Coverage data is based on head (65d4d75) compared to base (8b8a40e).
Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #432      +/-   ##
==========================================
- Coverage   97.06%   96.97%   -0.09%     
==========================================
  Files          42       42              
  Lines        5346     5396      +50     
==========================================
+ Hits         5189     5233      +44     
- Misses        157      163       +6

Impacted Files	Coverage Δ
src/galois/_fields/_array.py	`98.90% <100.00%> (-0.01%)`	⬇️
src/galois/_polys/_dense.py	`96.79% <100.00%> (-1.15%)`	⬇️
src/galois/_polys/_functions.py	`100.00% <100.00%> (ø)`
src/galois/_polys/_poly.py	`96.65% <100.00%> (ø)`
src/galois/_polys/_conversions.py	`98.19% <0.00%> (-0.91%)`	⬇️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

Fixes #434. This is done to be more consistent with NumPy's repr()/str() styling.

Popeyef5 · 2023-01-03T04:02:27Z

Awesome, thank you for replying so quickly and sorry for the late reply. I got caight up with work and had to put the project on hold for a bit.

Regarding this, my issue is that I'm working with large prime finite fields, so JIT compilation isn't an option. I'm looking into different alternatives because it's taking prohibitely long to interpolate.

mhostetter added performance Affects speed/performance poly Related to polynomials labels Nov 8, 2022

mhostetter added 3 commits November 9, 2022 11:54

Make lagrange_poly() unit tests more robust

baddcc3

JIT compile polynomial addition routine

db565a7

JIT compile polynomial subtraction routine

afd247d

mhostetter force-pushed the optimize-lagrange-poly branch from 771b8bb to a2b99f9 Compare November 9, 2022 21:17

JIT compile construction of Lagrange polys

7386de3

mhostetter force-pushed the optimize-lagrange-poly branch from a2b99f9 to 8b05a31 Compare November 9, 2022 22:24

mhostetter added 2 commits November 9, 2022 17:46

Don't display comma separator in FieldArray.__str__()

d881cc8

Fixes #434. This is done to be more consistent with NumPy's repr()/str() styling.

Add release notes for v0.1.2

65d4d75

mhostetter force-pushed the optimize-lagrange-poly branch from 8b05a31 to 65d4d75 Compare November 9, 2022 22:47

mhostetter merged commit e07461b into master Nov 10, 2022

mhostetter deleted the optimize-lagrange-poly branch November 10, 2022 01:08

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optimize `lagrange_poly()` #432

Optimize `lagrange_poly()` #432

mhostetter commented Nov 8, 2022

codecov bot commented Nov 8, 2022 •

edited

Loading

Popeyef5 commented Jan 3, 2023

Optimize lagrange_poly() #432

Optimize lagrange_poly() #432

Conversation

mhostetter commented Nov 8, 2022

Example 1: JIT + lookup tables -- 50x

Example 2: JIT + explicit calculation -- 43x

Example 3: Pure Python calculation -- 1.13x

codecov bot commented Nov 8, 2022 • edited Loading

Codecov Report

Popeyef5 commented Jan 3, 2023

Optimize `lagrange_poly()` #432

Optimize `lagrange_poly()` #432

codecov bot commented Nov 8, 2022 •

edited

Loading