This section provides a list of examples demonstrating the use of autodiff in common scenarios.
The examples assume that you imported NumPy and the package as follows:

import numpy as np
from autodiff.array import Function, var, d

The elementwise math helpers used in the examples (exp, maximum, norm, matmul) are assumed here to be importable from autodiff.array as well.
# This function can take any combination of
# literals and variables (both scalar and array).
def logistic(x, k):
    return 1 / (1 + exp(-k * x)) # scalars broadcast to arrays

k = var(4)
y = var(logistic(0, k))
print("y =", y()) # y = 0.5

x = var(np.array([-1, 0, 1]))
y = var(logistic(x, 4))
print("y =", y()) # y = [0.01798621 0.5 0.98201379]

k = var(np.array([4, 4, 4]))
y = var(logistic(x, k))
print("y =", y()) # y = [0.01798621 0.5 0.98201379]
initial = var(0)
state = initial
for i in range(10):
    state = var(state + 1) # evaluate to a NEW variable
print("state =", state()) # state = 10.0

f = Function(state)
f.pull_gradient_at(state)
print("∂state/∂initial =", d(initial)) # ∂state/∂initial = 1.0 (state = initial + 10)
For more details on when to use var, see Variables vs expressions.
# Caution: if statements are not differentiable
def bad_relu(x):
    return x if x > 0 else 0 # cannot differentiate this

def good_relu(x):
    return maximum(x, 0)

x = var(np.array([-1, 0, 1]))
y = var(good_relu(x))
print("y =", y()) # y = [0. 0. 1.]
You can, however, use conditionals to decide which expression to evaluate:
# Caution: if statements cannot depend on variables
x = var(1)
if x() > 0: # true
    y = var(x)
else:
    y = var(0) # never evaluated!
# from now on y = var(x)

f = Function(y)
x.set(-1)
f.evaluate()
print("y =", y()) # y = -1.0
Given a function y = m · x of a vector x and a matrix m, the following code computes the Jacobians of y with respect to both arguments:
x = var(np.array([1, 2, 3]))
m = var(np.array([[1, 2, 3], [4, 5, 6]]))
y = var(m @ x) # matrix-vector product
f = Function(y)
f.pull_gradient_at(y)
print("∂f/∂x =\n", d(x)) # ∂f/∂x =
# [[1. 2. 3.]
# [4. 5. 6.]]
print("∂f/∂m =\n", d(m)) # ∂f/∂m =
# [[1. 0. 2. 0. 3. 0.]
# [0. 1. 0. 2. 0. 3.]]
Note
During differentiation, matrices are flattened column-wise.
Therefore, d(m) returns a 2×6 matrix in this example.
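To see concretely what the column-wise flattening means, here is a plain NumPy sketch (independent of autodiff, using x_val for the value of x) that rebuilds ∂f/∂m entry by entry and reproduces the matrix printed above:

# Plain NumPy check: perturb each entry of m in column-major
# (Fortran) order and record the effect on y = m @ x.
x_val = np.array([1.0, 2.0, 3.0])
jac = np.zeros((2, 6))
for col, (i, j) in enumerate((i, j) for j in range(3) for i in range(2)):
    e = np.zeros((2, 3))
    e[i, j] = 1.0           # unit perturbation of m[i, j]
    jac[:, col] = e @ x_val # ∂(m @ x)/∂m[i, j]
print(jac)
# [[1. 0. 2. 0. 3. 0.]
#  [0. 1. 0. 2. 0. 3.]]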
For more details on the pull_gradient_at method, see Reverse-mode differentiation (aka backpropagation).
Given a scalar function f : R³ → R, x ↦ ||x||, the following code computes its gradient:
x = var(np.array([1, 2, 3]))
y = var(norm(x)) # L²-norm
f = Function(y) # f : R³ → R, x ↦ y = ||x||
f.pull_gradient_at(y)
print("∇f =", d(x)) # ∇f = [[0.26726124 0.53452248 0.80178373]]
Note that the gradient of a scalar function is a row vector, i.e. a 1×3 matrix in this example.
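As a quick sanity check, the gradient of the L²-norm is xᵀ/||x||, which you can reproduce with plain NumPy (no autodiff, using x_val for the value of x):

# Plain NumPy check: ∇||x|| = xᵀ / ||x||.
x_val = np.array([1.0, 2.0, 3.0])
print(x_val / np.linalg.norm(x_val)) # [0.26726124 0.53452248 0.80178373]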
For more details, see Reverse-mode differentiation (aka backpropagation).
x = var(np.array([1, 2, 3]))
y = var(np.array([4, 5, 6]))
z = var(x * y) # element-wise multiplication
f = Function(z)
z.set_derivative(np.ones((1, 3))) # seed: row of ones, i.e. the gradient of f = sum(z)
f.pull_gradient()
print("∇_x f =", d(x)) # ∇_x f = [[4. 5. 6.]]
print("∇_y f =", d(y)) # ∇_y f = [[1. 2. 3.]]
If you used pull_gradient_at(z) (as in the previous example), you would instead get the full 3×3 Jacobian matrices ∂z/∂x and ∂z/∂y.
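For reference, those full Jacobians are diagonal, and pulling the row-of-ones seed back through them reproduces the gradients printed above; a plain NumPy illustration (not using autodiff, with x_val and y_val standing in for the values of x and y):

x_val = np.array([1.0, 2.0, 3.0])
y_val = np.array([4.0, 5.0, 6.0])
seed = np.ones((1, 3))
print(np.diag(y_val))        # ∂z/∂x for z = x * y
print(np.diag(x_val))        # ∂z/∂y
print(seed @ np.diag(y_val)) # [[4. 5. 6.]], matches ∇_x f above
print(seed @ np.diag(x_val)) # [[1. 2. 3.]], matches ∇_y f above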
Given a function f and a direction δx, if you only need the Jacobian-vector product (∂f/∂x) · δx, then the following code using the push_tangent method is much more efficient than computing the full Jacobian matrix first and then multiplying it with δx:
# Directional derivative (Jacobian-vector product)
x = var(np.array([1, 2, 3]))
m = np.array([[1, 2, 3], [4, 5, 6]])
y = var(matmul(m, x)) # matrix-vector product
f = Function(y) # f : R³ → R², x ↦ y = m @ x
δx = np.array([1, 1, 1]) # direction vector
x.set_derivative(δx)
f.push_tangent()
print("δy =\n", d(y)) # δy =
# [[ 6.]
# [15.]]
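You can verify the printed tangent with plain NumPy: since f is linear here, the Jacobian-vector product is simply m @ δx (both defined above):

# Plain NumPy check (no autodiff): δy should equal m @ δx.
print(m @ δx) # [ 6 15]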
For more details, see Forward-mode differentiation.