Add binary remainder for tensor #2427

Conversation
Codecov Report

Attention: Patch coverage is

Additional details and impacted files

```
@@            Coverage Diff             @@
##             main    #2427      +/-   ##
==========================================
- Coverage   82.79%   82.79%   -0.01%
==========================================
  Files         809      810       +1
  Lines      104191   104804     +613
==========================================
+ Hits        86270    86773     +503
- Misses      17921    18031     +110
```

☔ View full report in Codecov by Sentry.
Thanks for adding the operation 🙂
Implementation looks good overall, just a few comments below.
To answer your questions...
How to display the instruction
I don't believe this is required for your PR. All of the compiler stuff mentioned in the linked PR has been moved to cubecl anyway.
Quantization tests
I tried to add tests for quantized tensors (also for rounding functions, see #2372) but couldn't figure out how to generate quantized tensors exactly the same as in the existing tests. Please shed some light on this!
The values were manually calculated so that the quantized values and parameters represent the float tensors correctly. In your tests, you can reuse the existing input values and, as with the other tests, check for equality on the dequantized values. That way the tests are easy to validate.
Take the `exp()` test for example:
```rust
#[test]
fn should_support_exp_ops() {
    // Quantized [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0]]
    // NOTE: we use affine quantization to reduce quantization errors since `exp()` amplifies the error
    let data = TensorData::quantized(
        vec![-128i8, -77, -26, 25, 76, 127],
        [2, 3],
        QuantizationStrategy::PerTensorAffineInt8(AffineQuantization::init(0.019607844, -128)),
    );
    let tensor = TestTensor::<2>::from_data(data, &Default::default());

    let output = tensor.exp();
    let expected = TensorData::from([[1.0, 2.71830, 7.3891], [20.0855, 54.5981, 148.4132]]);

    // Precision 1 to approximate de/quantization errors
    output
        .dequantize()
        .into_data()
        .assert_approx_eq(&expected, 1);
}
```
Update on segfault when running `run-checks.sh std`
Turns out it's caused by `bridge::byte::tests::should_support_dual_byte_bridge` when trying to create a tensor on device2, which panics at `cubecl-wgpu/src/runtime.rs:297:17`: No adapter found for graphics API AutoGraphicsApi. Not sure how to solve this issue.
Ahhh this might just be due to your setup. For the multi-backend tests we use ndarray and wgpu as test backends, but wgpu might not be able to detect any device on your machine (possibly missing drivers). It could be fixed by installing Vulkan drivers; otherwise you can ignore the failure locally and let it run on CI.
Thank you for the extra info on the quantization part! I have been able to produce the exact same output using Python:

```python
import numpy as np
import numpy.typing as npt
from dataclasses import dataclass


@dataclass
class QuantizedTensor:
    """Per-tensor affine int8 quantization: tensor ≈ (q - zero_point) * scale."""

    quantized_tensor: npt.NDArray[np.int8]
    scale: float
    zero_point: int

    @classmethod
    def from_tensor(
        cls, in_tensor: npt.NDArray[np.float32] | list[float | int]
    ) -> "QuantizedTensor":
        dtype = np.int8
        tensor = np.array(in_tensor, dtype=np.float32)
        qmin, qmax = np.iinfo(dtype).min, np.iinfo(dtype).max
        vmin, vmax = tensor.min(), tensor.max()
        # Map the float range [vmin, vmax] onto the int8 range [qmin, qmax].
        scale = (vmax - vmin) / (qmax - qmin)
        v_zero_point = -((vmax - vmin) / 2 + vmin)
        q_zero_point = (qmax - qmin) / 2 + qmin
        zero_point = round(v_zero_point / scale + q_zero_point)
        if not qmin <= zero_point <= qmax:
            zero_point = min(qmax, max(qmin, zero_point))
        # If rounding pushed any value out of range, widen the scale accordingly.
        oob_tensor = np.round(tensor / scale + zero_point)
        omin, omax = min(oob_tensor.min(), qmin), max(oob_tensor.max(), qmax)
        scale /= (qmax - qmin) / (omax - omin)
        quantized_values = (
            np.round(tensor / scale + zero_point).clip(qmin, qmax).astype(dtype)
        )
        return cls(quantized_values, scale, int(zero_point))

    def to_tensor(self) -> npt.NDArray[np.float32]:
        return (self.quantized_tensor.astype(np.float32) - self.zero_point) * self.scale


if __name__ == "__main__":
    quant = QuantizedTensor.from_tensor(list(map(float, range(6))))
    print(quant)
    print(quant.to_tensor())
```

This prints the same quantized values, scale, and zero point as the existing test (`-128, -77, -26, 25, 76, 127` with scale `0.019607844` and zero point `-128`).
I will use this to generate tests. It might also help other contributors who add new operators in the future. Edit: turns out zero_point is an int; I modified the code to round zero_point.
If you keep the same quantized input values as those already generated for an existing test, you can assume that the dequantized output values will be approximately equal to the floating point results for the operation. So the reference can be kept as float and you don't necessarily need to dig into quantization, but it is a good learning experience 😄
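Following that advice, a quantized test for the new op could look something like this. This is a minimal sketch, not code from the PR: it reuses the quantized input from the `exp()` test above and assumes `remainder_scalar` is exposed on the quantized test tensors like the other ops; the actual tests added in this PR may differ.

```rust
#[test]
fn should_support_remainder_scalar_ops() {
    // Quantized [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0]] (same input as the exp() test)
    let data = TensorData::quantized(
        vec![-128i8, -77, -26, 25, 76, 127],
        [2, 3],
        QuantizationStrategy::PerTensorAffineInt8(AffineQuantization::init(0.019607844, -128)),
    );
    let tensor = TestTensor::<2>::from_data(data, &Default::default());

    let output = tensor.remainder_scalar(2.0);

    // Reference stays in float: elementwise remainder of the inputs by 2.0
    let expected = TensorData::from([[0.0, 1.0, 0.0], [1.0, 0.0, 1.0]]);

    // Compare on the dequantized values with loose precision
    output
        .dequantize()
        .into_data()
        .assert_approx_eq(&expected, 1);
}
```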
I have added quantized tests for the new operators. I have also looked at the previous CI checks. It seems that remainder is not working properly on Mac: it returns the input when it should return all zeros. I don't own a Mac and thus have no idea how to debug this. Any help on this would be great!
Just missing a few things, mostly some stuff in the added quantization tests.
Regarding the macOS CI failing... I'm honestly not quite sure why 🤔 It's failing for the candle implementation of the op, but the implementation looks good. A bit weird, though I also do not own a mac to check this out further.
I have updated the code according to the comments. Please review again, thank you.
LGTM! Thanks for addressing 🙏
Regarding the macos test failure, it appears to always fail with candle so perhaps we could simply disable the test there specifically.
Once this is addressed it'll be good to merge!
I have commented out both of them and CI is finally happy. Please review, thank you!
Thank you 🎉
Pull Request Template

Checklist

- Confirmed that the `run-checks all` script has been executed.
- The book already mentions `remainder_scalar`; this PR implements the expected behavior, so no book changes are needed.

Related Issues/PRs

Closes #1510
Changes

Added binary `remainder` to existing backends, including tests & autodiff. The operation follows PyTorch's behavior where the remainder has the same sign as the divisor (e.g., 2.0 % -1.5 = -1.0), rather than Rust's default behavior (2.0 % -1.5 = 0.5); see the sketch below.
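To make the semantic difference concrete, here is a minimal standalone sketch (not code from this PR; the `remainder` helper is hypothetical) contrasting Rust's built-in `%` with the floored-division definition used here:

```rust
/// Floored-division remainder, matching PyTorch's `torch.remainder`:
/// the result takes the sign of the divisor.
fn remainder(lhs: f64, rhs: f64) -> f64 {
    lhs - (lhs / rhs).floor() * rhs
}

fn main() {
    // Rust's `%` is a truncated-division remainder: the sign follows the dividend.
    assert_eq!(2.0 % -1.5, 0.5);
    // Floored-division remainder: the sign follows the divisor.
    assert_eq!(remainder(2.0, -1.5), -1.0);
    // The `((lhs % rhs) + rhs) % rhs` form used by `remainder_scalar` agrees here.
    assert_eq!(((2.0_f64 % -1.5) + -1.5) % -1.5, -1.0);
}
```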
Testing

Added tests for `burn-candle`, `burn-ndarray`, `burn-autodiff`, and `burn-tensor`.

Questions
Instruction

https://github.com/tracel-ai/burn/pull/1597/files#diff-f71eb223e26e9a2944748c2e095167cb4ed9e7fcb6ccfda110c6c3f3869eab99R258-R259

Currently, `remainder_scalar` is implemented as `((lhs % rhs) + rhs) % rhs`, whereas `remainder` is implemented as `lhs - (lhs / rhs).floor() * rhs`. I'm not sure which to display.

Quantization tests

I tried to add tests for quantized tensors (also for rounding functions, see Add `round`, `floor`, `ceil` for float tensor #2372) but couldn't figure out how to generate quantized tensors exactly the same as in the existing tests. Please shed some light on this!

Segfault when running `run-checks.sh std`
Turns out it's caused by `bridge::byte::tests::should_support_dual_byte_bridge` when trying to create a tensor on device2, which panics at `cubecl-wgpu/src/runtime.rs:297:17`: No adapter found for graphics API AutoGraphicsApi. Not sure how to solve this issue.