Numerically stable log_sigmoid
#1548
Conversation
LGTM
Codecov Report

Attention: Patch coverage is

Additional details and impacted files:

```
@@           Coverage Diff            @@
##             main    #1548    +/-  ##
========================================
  Coverage   86.35%   86.36%
========================================
  Files         682      683      +1
  Lines       77849    77898     +49
========================================
+ Hits        67230    67280     +50
+ Misses      10619    10618      -1
========================================
```

☔ View full report in Codecov by Sentry.
pub fn log_sigmoid<const D: usize, B: Backend>(tensor: Tensor<B, D>) -> Tensor<B, D> {
    /// To avoid overflow, we use the log-sum-exp trick.
    ///
    /// ```ignore
    /// log(sigmoid(x)) = log(1/(1 + exp(-x)))
    ///                 = log(1) - log(1 + exp(-x))
    ///                 = -log(1 + exp(-x))
    ///                 = -log(exp(0) + exp(-x))
    /// ```
    ///
    /// The `exp(t)` of even a moderate-magnitude positive number can be astronomically huge, so we
    /// subtract the `max(t, 0)` of each value (where `t = -x` in this case). This results in the
    /// following equivalence:
    ///
    /// ```ignore
    /// log(sigmoid(x)) = -(max(-x, 0) + log(exp(-max(-x, 0)) + exp(-x - max(-x, 0))))
    /// ```
    ///
    /// This extends the range of values for which we obtain accurate results.
    fn numerically_stable_log_sigmoid<const D: usize, B: Backend>(x: Tensor<B, D>) -> Tensor<B, D> {
        // max(-x, 0)
        let max_elem = x.clone().neg().max_pair(x.zeros_like());

        // log(exp(-max(-x, 0)) + exp(-x - max(-x, 0)))
        let z = (max_elem.clone().neg().exp() + (x.neg() - max_elem.clone()).exp()).log();

        z.neg() - max_elem
    }

    match B::FloatElem::precision() {
        Precision::Half => {
            let tensor_full = tensor.into_full_precision();
            let tensor_tmp = numerically_stable_log_sigmoid(tensor_full);
            Tensor::from_full_precision(tensor_tmp)
        }
        _ => numerically_stable_log_sigmoid(tensor),
    }
}
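To see why the shifted form stays finite, it helps to split on the sign of `x`. This case analysis isn't part of the PR, but it follows directly from the formula in the doc comment above:

```ignore
x >= 0:  max(-x, 0) = 0   =>  -(0 + log(exp(0) + exp(-x)))  = -log(1 + exp(-x))
x <  0:  max(-x, 0) = -x  =>  -(-x + log(exp(x) + exp(0)))  =  x - log(1 + exp(x))
```

In both branches every `exp` argument is non-positive, so nothing can overflow, and the two arms combine into `min(x, 0) - log(1 + exp(-|x|))`, which, as far as I can tell, is the same closed form the PyTorch kernel computes.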
I think a decent speedup for backends that don't implement fusion would be to move `log_sigmoid` and `sigmoid` into `burn_tensor::ops::activation` with a default implementation provided. We could then override those activations in backends that don't support fusion, such as tch and candle.
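As a rough illustration of that suggestion, here is a minimal sketch of the default-method pattern. The trait and backend names below are hypothetical stand-ins, not Burn's actual `ActivationOps` signatures:

```rust
// Hypothetical sketch: a trait with a default composed implementation that
// individual backends can override with a single fused kernel.
trait ActivationOpsSketch {
    // Default: built from primitive ops; correct everywhere, but dispatched
    // as several kernels on backends without fusion.
    fn log_sigmoid(x: f32) -> f32 {
        let max_elem = (-x).max(0.0);
        -(max_elem + ((-max_elem).exp() + (-x - max_elem).exp()).ln())
    }
}

struct ComposedBackend;
impl ActivationOpsSketch for ComposedBackend {} // inherits the default

struct FusedBackend;
impl ActivationOpsSketch for FusedBackend {
    fn log_sigmoid(x: f32) -> f32 {
        // Stand-in body: a backend like tch or candle would forward to its
        // own fused log_sigmoid kernel here instead of recomputing.
        let max_elem = (-x).max(0.0);
        -(max_elem + ((-max_elem).exp() + (-x - max_elem).exp()).ln())
    }
}
```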
Agreed, I just tackled the scope of the current `log_sigmoid` implementation, but that definitely came to mind. Btw, `sigmoid` is already in `ActivationOps`, just not `log_sigmoid` yet. Should I tackle this in a new PR or expand this one?
LGTM
While making progress on a fine-tuning classification example, I stumbled upon an issue with our `log_sigmoid` implementation, which returned `-inf` for large negative values. I first attempted the common log-sum-exp trick, which resulted in an implementation that worked on wgpu but gave me NaNs for large values near the min and max on ndarray. That's when I stumbled upon the PyTorch implementation, which goes a step further, as implemented in this PR.
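To make the failure mode concrete, here is a minimal scalar sketch comparing the two formulas. It uses plain `f32` rather than Burn tensors, and the helper names are mine:

```rust
fn naive_log_sigmoid(x: f32) -> f32 {
    // -log(1 + exp(-x)): exp(-x) overflows f32 once -x exceeds roughly 88.7.
    -(1.0 + (-x).exp()).ln()
}

fn stable_log_sigmoid(x: f32) -> f32 {
    // -(max(-x, 0) + log(exp(-max(-x, 0)) + exp(-x - max(-x, 0))))
    let max_elem = (-x).max(0.0);
    -(max_elem + ((-max_elem).exp() + (-x - max_elem).exp()).ln())
}

fn main() {
    let x = -100.0_f32;
    println!("naive:  {}", naive_log_sigmoid(x)); // -inf: exp(100) overflows
    println!("stable: {}", stable_log_sigmoid(x)); // ≈ -100.0
}
```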
Checklist

- [x] The `run-checks all` script has been executed.

Changes

Changed our `log_sigmoid` implementation to be numerically stable for large values.

Testing

Added unit tests for `log_sigmoid`.
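For reference, a hedged sketch of the kind of edge-case test this covers, written against the plain-`f32` helper from the sketch above rather than Burn's actual test harness:

```rust
fn stable_log_sigmoid(x: f32) -> f32 {
    let max_elem = (-x).max(0.0);
    -(max_elem + ((-max_elem).exp() + (-x - max_elem).exp()).ln())
}

#[test]
fn log_sigmoid_stays_finite_at_extremes() {
    for &x in &[-300.0_f32, -100.0, 0.0, 100.0, 300.0] {
        let y = stable_log_sigmoid(x);
        assert!(y.is_finite(), "log_sigmoid({x}) was not finite");
        assert!(y <= 0.0, "log of a probability must be non-positive");
    }
    // log(sigmoid(0)) = ln(0.5)
    assert!((stable_log_sigmoid(0.0) - 0.5_f32.ln()).abs() < 1e-6);
}
```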