
Add binary remainder for tensor #2427

Merged: 25 commits into tracel-ai:main on Nov 11, 2024

Conversation

@med1844 (Contributor) commented Oct 27, 2024

Pull Request Template

Checklist

  • Confirmed that the run-checks all script has been executed.
  • Made sure the book is up to date with changes in this PR.
    • Note: The burn book already documents remainder without specifying that only remainder_scalar is implemented. This PR implements the expected behavior, so no book changes are needed (see the sketch after this list).
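
For context, a minimal sketch of the two call forms, assuming the ndarray backend and the PyTorch-style semantics described under Changes below (the tensor values are illustrative, not taken from the PR):

use burn::backend::NdArray;
use burn::tensor::Tensor;

fn main() {
    let device = Default::default();
    let lhs = Tensor::<NdArray, 1>::from_floats([2.0, -2.0], &device);
    let rhs = Tensor::<NdArray, 1>::from_floats([-1.5, 1.5], &device);

    // Existing op, already documented in the book: tensor % scalar.
    println!("{}", lhs.clone().remainder_scalar(-1.5)); // expected [-1.0, -0.5]

    // Op added by this PR: elementwise tensor % tensor.
    println!("{}", lhs.remainder(rhs)); // expected [-1.0, 1.0]
}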

Related Issues/PRs

Closes #1510

Changes

Added binary remainder to existing backends, including tests & autodiff.

The operation follows PyTorch's behavior where the remainder has the same sign as the divisor (e.g., 2.0 % -1.5 = -1.0), rather than Rust's default behavior (2.0 % -1.5 = 0.5).
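
Concretely, the floored remainder can be expressed in a few lines of plain Rust (a minimal sketch of the semantics only, not the PR's actual backend implementations):

fn floored_remainder(a: f64, b: f64) -> f64 {
    // Floored-division remainder, as used by PyTorch's remainder.
    a - b * (a / b).floor()
}

fn main() {
    assert_eq!(floored_remainder(2.0, -1.5), -1.0); // sign follows the divisor
    assert_eq!(2.0_f64 % -1.5, 0.5); // Rust's %: sign follows the dividend
}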

Testing

Added tests for burn-candle, burn-ndarray, burn-autodiff, and burn-tensor.

Questions

(The questions are quoted and answered in the reviewer's comment below.)


codecov bot commented Oct 27, 2024

Codecov Report

Attention: Patch coverage is 82.06785% with 111 lines in your changes missing coverage. Please review.

Project coverage is 82.79%. Comparing base (69de0ef) to head (300d392).
Report is 3 commits behind head on main.

Files with missing lines                        Patch %   Missing lines
crates/burn-fusion/src/ops/int.rs                 0.00%   26 ⚠️
crates/burn-router/src/ops/op_int.rs              0.00%   18 ⚠️
crates/burn-candle/src/ops/tensor.rs              0.00%   14 ⚠️
crates/burn-candle/src/ops/int_tensor.rs          0.00%   12 ⚠️
crates/burn-tch/src/ops/int_tensor.rs             0.00%   12 ⚠️
crates/burn-tch/src/ops/base.rs                   0.00%    9 ⚠️
crates/burn-tensor/src/tensor/api/numeric.rs     80.00%    4 ⚠️
crates/burn-autodiff/src/ops/int_tensor.rs        0.00%    3 ⚠️
crates/burn-jit/src/ops/int_ops.rs                0.00%    3 ⚠️
crates/burn-ndarray/src/ops/int_tensor.rs         0.00%    3 ⚠️
... and 3 more
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2427      +/-   ##
==========================================
- Coverage   82.79%   82.79%   -0.01%     
==========================================
  Files         809      810       +1     
  Lines      104191   104804     +613     
==========================================
+ Hits        86270    86773     +503     
- Misses      17921    18031     +110     


@laggui (Member) left a comment

Thanks for adding the operation 🙂

Implementation looks good overall, just a few comments below.

To answer your questions...

How to display Instruction

I don't believe this is required for your PR. All of the compiler stuff mentioned in the linked PR has been moved to cubecl anyway.

Quantization tests
I tried to add tests for quantized tensors (also for the rounding functions, see #2372), but couldn't figure out how to generate quantized tensors that exactly match the existing tests. Please shed some light on this!

The values were manually calculated to represent the float tensors with the correct quantized values and parameters. In your tests, you can reuse the existing values for the inputs and, as with the other tests, check for equality on the dequantized values. That way the tests are easy to validate.

Take the exp() test for example:

#[test]
fn should_support_exp_ops() {
    // Quantized [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0]]
    // NOTE: we use affine quantization to reduce quantization errors since `exp()` amplifies the error
    let data = TensorData::quantized(
        vec![-128i8, -77, -26, 25, 76, 127],
        [2, 3],
        QuantizationStrategy::PerTensorAffineInt8(AffineQuantization::init(0.019607844, -128)),
    );
    let tensor = TestTensor::<2>::from_data(data, &Default::default());

    let output = tensor.exp();
    let expected = TensorData::from([[1.0, 2.71830, 7.3891], [20.0855, 54.5981, 148.4132]]);

    // Precision 1 to approximate de/quantization errors
    output
        .dequantize()
        .into_data()
        .assert_approx_eq(&expected, 1);
}
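
As a quick sanity check on those quantized values (a standalone sketch using only the scale and zero point from the test above), dequantizing q as (q - zero_point) * scale recovers the original floats:

fn main() {
    let (scale, zero_point) = (0.019607844_f32, -128_i32);
    for q in [-128_i32, -77, -26, 25, 76, 127] {
        // Prints 0.000 1.000 2.000 3.000 4.000 5.000 (up to rounding error).
        print!("{:.3} ", (q - zero_point) as f32 * scale);
    }
    println!();
}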

Update on segfault when running run-checks.sh std
Turns out it's caused by bridge::byte::tests::should_support_dual_byte_bridge when trying to create a tensor on device2, which panics at cubecl-wgpu/src/runtime.rs:297:17: No adapter found for graphics API AutoGraphicsApi. Not sure how to solve this issue.

Ahhh this might just be due to your setup. For the multi-backend tests we use ndarray and wgpu as test backends, but wgpu might not be able to detect any device on your machine (possibly missing drivers). It could be fixed by installing Vulkan drivers, but otherwise you can ignore the failure locally and let it run on CI.

Resolved review threads: crates/burn-tensor/src/tensor/api/numeric.rs (two threads), crates/burn-tensor/src/tests/ops/remainder.rs
@med1844 (Contributor, Author) commented Oct 31, 2024

Thank you for the extra info on the quantization part! I have been able to produce the exact same output using Python:

import numpy as np
import numpy.typing as npt
from dataclasses import dataclass


@dataclass
class QuantizedTensor:
    quantized_tensor: npt.NDArray[np.int8]
    scale: float
    zero_point: int

    @classmethod
    def from_tensor(
        cls, in_tensor: npt.NDArray[np.float32] | list[float | int]
    ) -> "QuantizedTensor":
        dtype = np.int8
        tensor = np.array(in_tensor, dtype=np.float32)
        qmin, qmax = np.iinfo(dtype).min, np.iinfo(dtype).max
        vmin, vmax = tensor.min(), tensor.max()

        # Affine quantization: map [vmin, vmax] linearly onto [qmin, qmax].
        scale = (vmax - vmin) / (qmax - qmin)
        v_zero_point = -((vmax - vmin) / 2 + vmin)
        q_zero_point = (qmax - qmin) / 2 + qmin
        zero_point = round(v_zero_point / scale + q_zero_point)

        # If the zero point falls outside the i8 range, clamp it and stretch
        # the scale so the quantized values still fit into [qmin, qmax].
        if not qmin <= zero_point <= qmax:
            zero_point = min(qmax, max(qmin, zero_point))
            oob_tensor = np.round(tensor / scale + zero_point)
            omin, omax = min(oob_tensor.min(), qmin), max(oob_tensor.max(), qmax)
            scale /= (qmax - qmin) / (omax - omin)

        quantized_values = (
            np.round(tensor / scale + zero_point).clip(qmin, qmax).astype(dtype)
        )

        return cls(quantized_values, scale, int(zero_point))

    def to_tensor(self) -> npt.NDArray[np.float32]:
        return (self.quantized_tensor.astype(np.float32) - self.zero_point) * self.scale


if __name__ == "__main__":
    quant = QuantizedTensor.from_tensor(list(map(float, range(6))))
    print(quant)
    print(quant.to_tensor())

Which prints:

QuantizedTensor(quantized_tensor=array([-128,  -77,  -26,   25,   76,  127], dtype=int8), scale=0.0196078431372549, zero_point=-128.0)
[0. 1. 2. 3. 4. 5.]

I will use this to generate tests. It might also help other contributors who add new operators in the future.

Edit: turns out zero_point is an int. Modified the code to round zero_point.
Edit 2: turns out zero_point is an i8. Modified the code to recalculate scale.
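
For reference, the parameters generated above plug directly into the constructor pattern from the exp() example earlier in the thread; a hypothetical skeleton (the merged remainder test's exact inputs may differ):

// Quantized [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0]], parameters from the script above.
let data = TensorData::quantized(
    vec![-128i8, -77, -26, 25, 76, 127],
    [2, 3],
    QuantizationStrategy::PerTensorAffineInt8(AffineQuantization::init(0.019607844, -128)),
);
let tensor = TestTensor::<2>::from_data(data, &Default::default());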

@laggui (Member) commented Nov 1, 2024

If you reuse the quantized input values already generated for an existing test, you can assume that the dequantized output values will be approximately equal to the floating-point results of the operation. So the reference can be kept as floats and you don't necessarily need to dig into quantization, but it is a good learning experience 😄

@med1844 (Contributor, Author) commented Nov 4, 2024

I have added quantized tests for the new operators (remainder, rounding functions). Maybe they should be in a separate PR, I'm not sure.

I have also looked at the previous CI checks. It seems that remainder is not working properly on macOS: it returns the input when it should return exactly 0. I don't own a Mac and thus have no idea how to debug this. Any help would be great!

@laggui (Member) left a comment

Just missing a few things, mostly some stuff in the added quantization tests.

Regarding the macOS CI failing... I'm not quite sure why, honestly 🤔 It's failing for the candle implementation of the op, but the implementation looks correct. A bit weird, though I also do not own a mac to check this out further.

Resolved review threads: crates/burn-ndarray/src/ops/base.rs, crates/burn-tensor/src/tests/quantization/ops/remainder.rs (three threads)
@med1844 (Contributor, Author) commented Nov 8, 2024

I have updated the code according to the comments. Please review again, thank you.
For the macOS CI failure, maybe we should find someone with Apple hardware to reproduce it and figure out what happens; or we could simply note in the book that the remainder operator may not work properly on macOS.

@laggui (Member) left a comment

LGTM! Thanks for addressing 🙏

Regarding the macOS test failure, it appears to always fail with candle, so perhaps we could simply disable the test for that backend specifically.

Once this is addressed it'll be good to merge!

Resolved review threads: crates/burn-candle/src/lib.rs (two threads)
@med1844 (Contributor, Author) commented Nov 9, 2024

I have commented out both tests and CI is finally happy. Please review, thank you!

@laggui (Member) left a comment

Thank you 🎉

@laggui merged commit 2d874ab into tracel-ai:main on Nov 11, 2024. 11 checks passed.