-
The reason I didn't take this seriously 4 months ago was that the "holomorphic" requirement for the complex function seemed like a tight constraint that few functions would satisfy. What changed my mind is that regular autodiff blithely ignores not-actually-differentiable points such as ReLU'(0).

In particular, logarithms have a branch cut where they fail to be holomorphic. What does this method do there?

```python
>>> import numpy as np
>>> from eager_forward import *
>>> x = np.linspace(0, 100, 10000)
>>> da_x = diffarray(x)
>>> da_y = np.log(da_x)
>>> truth = 1/x
<stdin>:1: RuntimeWarning: divide by zero encountered in divide
>>> da_y.tangent
array([1.57079633e+04, 9.99866679e+01, 4.99945835e+01, ...,
       1.00020006e-02, 1.00010002e-02, 1.00000000e-02])
>>> truth
array([           inf, 9.99900000e+01, 4.99950000e+01, ...,
       1.00020006e-02, 1.00010002e-02, 1.00000000e-02])
>>> abs(da_y.tangent - truth)[1:].max()
0.0033321335475875458
>>> abs(da_y.tangent - truth)[1:].argmax()
0
```

Instead of reporting inf at x = 0, the method gives a large finite value. That value is explainable: on the branch cut, log(0 + ih) = log(h) + iπ/2, so the complex-step estimate Im(log(ih))/h is (π/2)/h, which is consistent with the 1.57079633e+04 above for a step size of h = 10⁻⁴ (an inference from the output, not checked against the source). Away from the branch point, the agreement is good.

How about sending the logarithm through the chain rule?

```python
>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> from eager_forward import *
>>> x = np.linspace(-10, 10, 10000)
>>> da_x = diffarray(x)
>>> da_y = np.log(np.sin(da_x)**2)
>>> truth = 2*np.sin(x)*np.cos(x) / np.sin(x)**2
>>> da_y.tangent
array([-3.08470206, -3.09826064, -3.11190351, ...,  3.11190351,
        3.09826064,  3.08470206])
>>> truth
array([-3.08470209, -3.09826068, -3.11190355, ...,  3.11190355,
        3.09826068,  3.08470209])
>>> plt.plot(x, da_y.tangent)
[<matplotlib.lines.Line2D object at 0x7d67681b0a50>]
>>> plt.plot(x, truth, ls="--")
[<matplotlib.lines.Line2D object at 0x7d67681b1650>]
>>> plt.show()
```
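For reference, the agreement above is what the complex-step identity predicts: Im(f(x + ih))/h recovers f'(x) for any composition of holomorphic pieces, chain rule included. A standalone check with plain NumPy (the step size h here is my own choice, not necessarily what eager_forward uses):

```python
import numpy as np

h = 1e-9  # assumed step size; no subtractive cancellation, so it can be tiny
x = np.linspace(-10, 10, 10000)
x = x[np.abs(np.sin(x)) > 1e-3]               # stay away from the log singularities
estimate = np.imag(np.log(np.sin(x + 1j*h)**2)) / h
truth = 2*np.sin(x)*np.cos(x) / np.sin(x)**2  # analytic chain-rule derivative
# estimate and truth agree to roughly 7 significant digits or better
```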
-
About ReLU'(0): @pfackeldey pointed me to this: https://dl.acm.org/doi/10.5555/3540261.3540297 But from my perspective, the ReLU'(0) case is exactly what changed my mind about considering this technique, which has known errors for functions that aren't perfectly smooth. If we can get away with saying that ReLU'(0) has a value, what else can we get away with?
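To make the ambiguity concrete, here's a sketch (plain NumPy, not eager_forward) of how a complex-step derivative silently assigns ReLU'(0) a value: whichever complex extension of ReLU you pick decides the answer at the kink. The `relu` below, which branches on the real part, is a hypothetical choice, not anything from the linked paper or from eager_forward.py.

```python
import numpy as np

def complex_step(f, x, h=1e-200):
    # f'(x) ≈ Im(f(x + ih)) / h; exact to machine precision for holomorphic f
    return np.imag(f(x + 1j * h)) / h

def relu(z):
    # one possible complex extension of ReLU: branch on the real part
    # (hypothetical; other extensions would behave differently at the kink)
    return np.where(np.real(z) > 0, z, 0.0 * z)

complex_step(relu, 1.0)   # 1.0, as expected away from the kink
complex_step(relu, -1.0)  # 0.0, likewise
complex_step(relu, 0.0)   # 0.0 -- this extension quietly picks ReLU'(0) = 0
```

A different extension (e.g. one that branches on `np.real(z) >= 0`) would just as quietly pick ReLU'(0) = 1, which is the crux of the "what else can we get away with?" question.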
-
As presented at https://indico.cern.ch/event/1387764/
and discussed in https://iris-hep.slack.com/archives/C0155BGPGE4/p1734127343384079
The idea is to replace Awkward Array's JAX backend with a custom autodiff implementation.
This is based on the observation that autodiff is not hard to implement, and all of the troubles we're going through to partially implement autodiff through JAX might be more easily solved by a custom autodiff with a friendlier API.
Here's a complete demonstrator of eager, forward-mode autodiff using the complex-step technique: awkward/studies/autodiff/eager_forward.py, lines 11 to 123 at commit c59a49c.
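As a rough illustration of what that file does, here is my own minimal stand-in, not the linked implementation: the `diffarray` name and `.tangent` attribute mirror the REPL sessions in this thread, the step size h = 10⁻⁴ is inferred from the tangent values shown there, and only NumPy ufuncs are supported.

```python
import numpy as np

H = 1e-4  # assumed step size, inferred from the (pi/2)/h value in the replies

class diffarray(np.lib.mixins.NDArrayOperatorsMixin):
    """Sketch of eager forward-mode autodiff via the complex-step trick."""

    def __init__(self, values, z=None):
        # seed the imaginary part with the tangent direction (all ones)
        self._z = np.asarray(values, float) + 1j * H if z is None else z

    @property
    def tangent(self):
        # the derivative rides along in the imaginary part
        return self._z.imag / H

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # run the ufunc on the complex carriers; holomorphic ufuncs propagate
        # the tangent automatically, no per-operation derivative rules needed
        args = [a._z if isinstance(a, diffarray) else a for a in inputs]
        return diffarray(None, z=getattr(ufunc, method)(*args, **kwargs))

da = diffarray(np.array([0.5, 1.0, 2.0]))
np.log(da).tangent  # ≈ 1/x = [2.0, 1.0, 0.5]
```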
However, we'll probably need backpropagation, which will require some sort of DAG to reverse the steps in the calculation, and maybe we'd use our typetracers or dask-awkward to make that DAG. (Our typetracers can be used to make a low-level DAG, with one node per buffer operation, whereas a dask-awkward DAG is high-level. Thinking about this now, I think it has to be a low-level DAG.)
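For concreteness, here is a toy of the kind of record a reverse pass needs: each operation appends its output, inputs, and local derivatives to a tape, and backpropagation walks the tape in reverse. This is a from-scratch illustration of the general idea, not a proposal for the actual typetracer/dask-awkward design; all names here are invented.

```python
import numpy as np

# Toy eager reverse-mode autodiff with an explicit tape.
TAPE = []  # records (output, [(input, local_gradient), ...]) in execution order

class Var:
    def __init__(self, value):
        self.value = value
        self.grad = 0.0

def record(value, parents):
    out = Var(value)
    TAPE.append((out, parents))
    return out

def vmul(a, b):
    # d(a*b)/da = b, d(a*b)/db = a
    return record(a.value * b.value, [(a, b.value), (b, a.value)])

def vlog(a):
    # d(log a)/da = 1/a
    return record(np.log(a.value), [(a, 1.0 / a.value)])

def backward(out):
    out.grad = 1.0
    for node, parents in reversed(TAPE):  # tape order is a valid topological order
        for parent, local in parents:
            parent.grad += local * node.grad

x = Var(2.0)
y = vlog(vmul(x, x))  # y = log(x*x); dy/dx = 2/x = 1.0
backward(y)
x.grad                # → 1.0
```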
I want to ask everyone (@matthewfeickert, @kratsg, @pfackeldey, @alexander-held, @gordonwatts) whether this is sufficient, and generally what your thoughts are on requirements. I'll try answering a few questions that came up in the talk in subsequent comments.