Floating point square root

sweeksBigBlue · jcfarwe · curtisan-intel · lucasphillips · commit cf0ccefa6759 · 2025-03-07T15:27:28.000-08:00
Add a base class for floating point square root.

Implement a basic floating point square root module. Mantissa is variable, but limited to odd numbers from 1 to 51. No rounding implemented.

Implement a fixed point square root for values with three integral bits and up to 51 fractional bits. Add a fixed point square root method. Add tests for fixed-point and floating-point square root.

Implements intel#171

Co-authored-by: James Farwell <james.c.farwell@intel.com>
Co-authored-by: Curtis Anderson <curtis.anderson@intel.com>
Signed-off-by: Stephen Weeks <stephen.weeks@intel.com>
Signed-off-by: James Farwell <james.c.farwell@intel.com>
Signed-off-by: Curtis Anderson <curtis.anderson@intel.com>
diff --git a/doc/components/fixed_point.md b/doc/components/fixed_point.md
@@ -25,3 +25,7 @@ Currently, the FloatToFixed converter, when in lossy mode, is not performing any
 ## Float8ToFixed
 
 This component converts an 8-bit floating-point (FP8) representation ([FloatingPoint8E4M3Value](https://intel.github.io/rohd-hcl/rohd_hcl/FloatingPoint8E4M3Value-class.html) or [FloatingPoint8E5M2Value](https://intel.github.io/rohd-hcl/rohd_hcl/FloatingPoint8E5M2Value-class.html)) to a signed fixed-point representation. This component offers using the same hardware for both FP8 formats. Therefore, both input and output are of type [Logic](https://intel.github.io/rohd/rohd/Logic-class.html) and can be cast from/to floating point/fixed point by the producer/consumer based on the selected `mode`. Infinities and NaN's are not supported. The output width is 33bits to accommodate [FloatingPoint8E5M2Value](https://intel.github.io/rohd-hcl/rohd_hcl/FloatingPoint8E5M2Value-class.html) without loss.
+
+## FixedPointSqrt
+
+This component computes the square root of a 3.x fixed-point value, returning a result in the same format. The result is rounded to the requested number of bits. The integral part must be 3 bits wide, and the fractional part may be any odd width up to 51 bits. Even fractional widths and integral widths other than 3 are not currently supported.
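
The 3.x format described above can be modeled in plain Dart. This is a minimal sketch, assuming an unsigned value with 3 integral bits and `n` fractional bits. The helpers `toRaw`, `fromRaw`, and `sqrtRaw` are hypothetical names for illustration, not ROHD-HCL APIs, and the model truncates rather than reproducing the component's rounding.

```dart
// Sketch only: software model of an unsigned 3.n fixed-point value.
// toRaw, fromRaw, and sqrtRaw are hypothetical helpers, not ROHD-HCL APIs.

/// Encodes [value] (0 <= value < 8) as a raw 3.n integer, truncating.
int toRaw(double value, int n) => (value * (1 << n)).floor();

/// Decodes a raw 3.n integer back to a double.
double fromRaw(int raw, int n) => raw / (1 << n);

/// Truncated square root that stays in the same 3.n format.
/// Since value = raw / 2^n, sqrt(value) * 2^n = sqrt(raw * 2^n).
int sqrtRaw(int raw, int n) {
  final target = raw << n; // keep n small so this fits in a 64-bit int
  var lo = 0;
  var hi = 1 << (n + 2); // sqrt(8) < 4, so the raw result is below 4 * 2^n
  // Binary search for the largest s with s * s <= target.
  while (lo < hi) {
    final mid = (lo + hi + 1) >> 1;
    if (mid * mid <= target) {
      lo = mid;
    } else {
      hi = mid - 1;
    }
  }
  return lo;
}

void main() {
  const n = 5; // an odd fractional width, as the component requires
  final raw = toRaw(2.25, n); // 2.25 * 32 = 72
  final root = sqrtRaw(raw, n); // sqrt(72 * 32) = sqrt(2304) = 48
  print(fromRaw(root, n)); // 48 / 32 = 1.5
}
```

Because the square root of any value below 8 is below 4, the result always fits back into the same 3.n format.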
diff --git a/doc/components/floating_point.md b/doc/components/floating_point.md
@@ -83,6 +83,17 @@ Currently, the [FloatingPointAdderSimple](https://intel.github.io/rohd-hcl/rohd_
 
 A second [FloatingPointAdderRound](https://intel.github.io/rohd-hcl/rohd_hcl/FloatingPointAdderRound-class.html) component is available which does perform rounding.  It is based on "Delay-Optimized Implementation of IEEE Floating-Point Addition", by Peter-Michael Seidel and Guy Even, using an R-path and an N-path to process far-apart exponents and use rounding and an N-path for exponents within 2 and subtraction, which is exact.  If you pass in an optional clock, a pipe stage will be added to help optimize frequency; an optional reset and enable can control the pipe stage.
 
+## FloatingPointSqrt
+
+A very basic [FloatingPointSqrtSimple] component is available which does not perform any
+rounding. It only supports odd mantissa widths (1, 3, 5, etc.), up to a maximum of 51 bits. It takes one
+[FloatingPoint](https://intel.github.io/rohd-hcl/rohd_hcl/FloatingPoint-class.html) [LogicStructure](https://intel.github.io/rohd/rohd/LogicStructure-class.html) and
+computes its square root, returning the [FloatingPoint](https://intel.github.io/rohd-hcl/rohd_hcl/FloatingPoint-class.html) result on the output.
+
+Currently, the [FloatingPointSqrtSimple](https://intel.github.io/rohd-hcl/rohd_hcl/FloatingPointSqrtSimple-class.html) is only approximately accurate (as it performs no rounding) and is not
+optimized for circuit performance, but it provides the key functionality of floating-point square root. This component is a starting point for more realistic
+floating-point components that leverage the logical [FloatingPoint](https://intel.github.io/rohd-hcl/rohd_hcl/FloatingPoint-class.html) and literal [FloatingPointValue](https://intel.github.io/rohd-hcl/rohd_hcl/FloatingPointValue-class.html) type abstractions.
+
 ## FloatingPointMultiplier
 
 A very basic [FloatingPointMultiplierSimple] component is available which does not perform any rounding. It takes two [FloatingPoint](https://intel.github.io/rohd-hcl/rohd_hcl/FloatingPoint-class.html) [LogicStructure](https://intel.github.io/rohd/rohd/LogicStructure-class.html)s and multiplies them, returning a normalized [FloatingPoint](https://intel.github.io/rohd-hcl/rohd_hcl/FloatingPoint-class.html) on the output 'product'.  
diff --git a/lib/src/arithmetic/arithmetic.dart b/lib/src/arithmetic/arithmetic.dart
@@ -6,6 +6,7 @@ export 'arithmetic_utils.dart';
 export 'carry_save_mutiplier.dart';
 export 'compound_adder.dart';
 export 'divider.dart';
+export 'fixed_sqrt.dart';
 export 'fixed_to_float.dart';
 export 'float_to_fixed.dart';
 export 'floating_point/floating_point.dart';
diff --git a/lib/src/arithmetic/fixed_sqrt.dart b/lib/src/arithmetic/fixed_sqrt.dart
@@ -0,0 +1,100 @@
+// Copyright (C) 2025 Intel Corporation
+// SPDX-License-Identifier: BSD-3-Clause
+//
+// fixed_sqrt.dart
+// An abstract base class and an implementation for fixed-point square root.
+//
+// 2025 March 3
+// Authors: James Farwell <james.c.farwell@intel.com>, Stephen
+// Weeks <stephen.weeks@intel.com>
+
+/// An abstract API for fixed point square root.
+library;
+
+import 'package:meta/meta.dart';
+import 'package:rohd/rohd.dart';
+import 'package:rohd_hcl/rohd_hcl.dart';
+
+/// Abstract base class for fixed-point square root modules.
+abstract class FixedPointSqrtBase extends Module {
+  /// Width of the input and output fields.
+  final int numWidth;
+
+  /// The value [a], named this way to allow for a local variable 'a'.
+  @protected
+  late final FixedPoint a;
+
+  /// getter for the computed output.
+  late final FixedPoint sqrtF = a.clone(name: 'sqrtF')..gets(output('sqrtF'));
+
+  /// Square root a fixed point number [a], returning result in [sqrtF].
+  FixedPointSqrtBase(FixedPoint a,
+      {super.name = 'fixed_point_square_root', String? definitionName})
+      : numWidth = a.width,
+        super(
+            definitionName:
+                definitionName ?? 'FixedPointSquareRoot${a.width}') {
+    this.a = a.clone(name: 'a')..gets(addInput('a', a, width: a.width));
+
+    addOutput('sqrtF', width: numWidth);
+  }
+}
+
+/// Fixed-point square root using a shift-and-subtract algorithm that
+/// processes the input two bits at a time. The algorithm is explained here:
+/// https://projectf.io/posts/square-root-in-verilog/
+class FixedPointSqrt extends FixedPointSqrtBase {
+  /// Computes the square root of [a], driving the result on [sqrtF].
+  FixedPointSqrt(super.a) {
+    Logic solution =
+        FixedPoint(signed: a.signed, name: 'solution', m: a.m + 1, n: a.n + 1);
+    Logic remainder =
+        FixedPoint(signed: a.signed, name: 'remainder', m: a.m + 1, n: a.n + 1);
+    Logic subtractionValue =
+        FixedPoint(signed: a.signed, name: 'subValue', m: a.m + 1, n: a.n + 1);
+    Logic aLoc =
+        FixedPoint(signed: a.signed, name: 'aLoc', m: a.m + 1, n: a.n + 1);
+
+    solution = Const(0, width: aLoc.width);
+    remainder = Const(0, width: aLoc.width);
+    subtractionValue = Const(0, width: aLoc.width);
+    aLoc = [Const(0), a, Const(0)].swizzle();
+
+    final outputSqrt = a.clone(name: 'sqrtF');
+    output('sqrtF') <= outputSqrt;
+
+    // loop once through input value
+    for (var i = 0; i < ((numWidth + 2) >> 1); i++) {
+      // append bits from a, two at a time
+      remainder = [
+        remainder.slice(numWidth + 2 - 3, 0),
+        aLoc.slice(aLoc.width - 1 - (i * 2), aLoc.width - 2 - (i * 2))
+      ].swizzle();
+      subtractionValue =
+          [solution.slice(numWidth + 2 - 3, 0), Const(1, width: 2)].swizzle();
+      solution = [
+        solution.slice(numWidth + 2 - 2, 0),
+        subtractionValue.lte(remainder)
+      ].swizzle();
+      remainder = mux(subtractionValue.lte(remainder),
+          remainder - subtractionValue, remainder);
+    }
+
+    // loop again to finish remainder
+    for (var i = 0; i < ((numWidth + 2) >> 1) - 1; i++) {
+      // don't try to append bits from a, they are done
+      remainder =
+          [remainder.slice(numWidth + 2 - 3, 0), Const(0, width: 2)].swizzle();
+      subtractionValue =
+          [solution.slice(numWidth + 2 - 3, 0), Const(1, width: 2)].swizzle();
+      solution = [
+        solution.slice(numWidth + 2 - 2, 0),
+        subtractionValue.lte(remainder)
+      ].swizzle();
+      remainder = mux(subtractionValue.lte(remainder),
+          remainder - subtractionValue, remainder);
+    }
+    solution = solution + 1;
+    outputSqrt <= solution.slice(aLoc.width - 1, aLoc.width - a.width);
+  }
+}
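
The loop structure above can be followed more easily in a software model. This is a minimal sketch of a restoring square root that consumes two bits per step, assuming an unsigned integer input of even width. `isqrtBitPairs` is a hypothetical name, and the sketch returns the truncated root only; it does not reproduce the module's final increment and slice.

```dart
// Sketch only: software model of a restoring, two-bits-per-step square root.
// isqrtBitPairs is a hypothetical helper, not part of ROHD-HCL.

/// Returns floor(sqrt(value)) for an unsigned [value] of even [width] bits.
int isqrtBitPairs(int value, int width) {
  assert(width.isEven && width > 0 && width <= 62);
  var remainder = 0;
  var solution = 0;
  // Consume the input two bits at a time, most significant pair first.
  for (var i = width - 2; i >= 0; i -= 2) {
    // Shift the next bit pair into the running remainder.
    remainder = (remainder << 2) | ((value >> i) & 0x3);
    // Trial value: the current solution with the bits "01" appended.
    final trial = (solution << 2) | 1;
    if (trial <= remainder) {
      remainder -= trial; // subtraction succeeds, so the next root bit is 1
      solution = (solution << 1) | 1;
    } else {
      solution = solution << 1; // keep the remainder; the next root bit is 0
    }
  }
  return solution;
}

void main() {
  print(isqrtBitPairs(144, 8)); // 12
  print(isqrtBitPairs(200, 8)); // 14, since 14 * 14 = 196 <= 200 < 225
}
```

Appending zero bit pairs to the input yields fractional root bits, which is the role of the second loop in the module. For example, `isqrtBitPairs(2 << 8, 10)` returns 22, which is the square root of 2 truncated to four fractional bits (22 / 16 = 1.375).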
diff --git a/lib/src/arithmetic/floating_point/floating_point.dart b/lib/src/arithmetic/floating_point/floating_point.dart
@@ -8,4 +8,6 @@ export 'floating_point_converter.dart';
 export 'floating_point_multiplier.dart';
 export 'floating_point_multiplier_simple.dart';
 export 'floating_point_rounding.dart';
+export 'floating_point_sqrt.dart';
+export 'floating_point_sqrt_simple.dart';
 export 'floating_point_utilities.dart';
diff --git a/lib/src/arithmetic/floating_point/floating_point_sqrt.dart b/lib/src/arithmetic/floating_point/floating_point_sqrt.dart
@@ -0,0 +1,83 @@
+// Copyright (C) 2025 Intel Corporation
+// SPDX-License-Identifier: BSD-3-Clause
+//
+// floating_point_sqrt.dart
+// An abstract base class defining the API for floating-point square root.
+//
+// 2025 March 3
+// Authors: James Farwell <james.c.farwell@intel.com>,
+// Stephen Weeks <stephen.weeks@intel.com>,
+// Curtis Anderson <curtis.anderson@intel.com>
+
+import 'package:meta/meta.dart';
+import 'package:rohd/rohd.dart';
+import 'package:rohd_hcl/rohd_hcl.dart';
+
+/// An abstract API for floating point square root.
+abstract class FloatingPointSqrt<FpType extends FloatingPoint> extends Module {
+  /// Width of the output exponent field.
+  final int exponentWidth;
+
+  /// Width of the output mantissa field.
+  final int mantissaWidth;
+
+  /// The [clk] : if a non-null clock signal is passed in, a pipestage is added
+  /// to the square root to help optimize frequency.
+  @protected
+  late final Logic? clk;
+
+  /// Optional [reset], used only if a [clk] is not null to reset the pipeline
+  /// flops.
+  @protected
+  late final Logic? reset;
+
+  /// Optional [enable], used only if a [clk] is not null to enable the pipeline
+  /// flops.
+  @protected
+  late final Logic? enable;
+
+  /// The value [a], named this way to allow for a local variable 'a'.
+  @protected
+  late final FpType a;
+
+  /// getter for the computed [FloatingPoint] output.
+  late final FloatingPoint sqrtR = (a.clone(name: 'sqrtR') as FpType)
+    ..gets(output('sqrtR'));
+
+  /// getter for the [error] output.
+  late final Logic error = Logic(name: 'error')..gets(output('error'));
+
+  /// The internal error signal to pass through
+  late final Logic errorSig;
+
+  /// Square root a floating point number [a], returning result in [sqrtR].
+  /// - [clk], [reset], [enable] are optional inputs to control a pipestage
+  /// (only inserted if [clk] is provided)
+  FloatingPointSqrt(FpType a,
+      {Logic? clk,
+      Logic? reset,
+      Logic? enable,
+      super.name = 'floating_point_square_root',
+      String? definitionName})
+      : exponentWidth = a.exponent.width,
+        mantissaWidth = a.mantissa.width,
+        super(
+            definitionName: definitionName ??
+                'FloatingPointSquareRoot_E${a.exponent.width}'
+                    'M${a.mantissa.width}') {
+    this.clk = (clk != null) ? addInput('clk', clk) : null;
+    this.reset = (reset != null) ? addInput('reset', reset) : null;
+    this.enable = (enable != null) ? addInput('enable', enable) : null;
+    this.a = (a.clone(name: 'a') as FpType)
+      ..gets(addInput('a', a, width: a.width));
+
+    addOutput('sqrtR', width: exponentWidth + mantissaWidth + 1);
+    errorSig = Logic(name: 'error');
+    addOutput('error');
+    output('error') <= errorSig;
+  }
+
+  /// Pipelining helper that uses the context for signals clk/enable/reset
+  Logic localFlop(Logic input) =>
+      condFlop(clk, input, en: enable, reset: reset);
+}
diff --git a/lib/src/arithmetic/floating_point/floating_point_sqrt_simple.dart b/lib/src/arithmetic/floating_point/floating_point_sqrt_simple.dart
@@ -0,0 +1,102 @@
+// Copyright (C) 2025 Intel Corporation
+// SPDX-License-Identifier: BSD-3-Clause
+//
+// floating_point_sqrt_simple.dart
+// A simple implementation of floating-point square root, without rounding.
+//
+// 2025 March 4
+// Authors: James Farwell <james.c.farwell@intel.com>,
+// Stephen Weeks <stephen.weeks@intel.com>,
+// Curtis Anderson <curtis.anderson@intel.com>
+
+import 'package:rohd/rohd.dart';
+import 'package:rohd_hcl/rohd_hcl.dart';
+
+/// A square root module for [FloatingPoint] values.
+class FloatingPointSqrtSimple<FpType extends FloatingPoint>
+    extends FloatingPointSqrt<FpType> {
+  /// Square root one floating point number [a], returning results
+  /// [sqrtR] and [error]
+  FloatingPointSqrtSimple(super.a,
+      {super.clk,
+      super.reset,
+      super.enable,
+      super.name = 'floatingpoint_square_root_simple'})
+      : super(
+            definitionName: 'FloatingPointSquareRootSimple_'
+                'E${a.exponent.width}M${a.mantissa.width}') {
+    final outputSqrt = FloatingPoint(
+        exponentWidth: exponentWidth,
+        mantissaWidth: mantissaWidth,
+        name: 'sqrtR');
+    output('sqrtR') <= outputSqrt;
+
+    // check to see if we do sqrt at all or just return a
+    final isInf = a.isAnInfinity.named('isInf');
+    final isNaN = a.isNaN.named('isNan');
+    final isZero = a.isAZero.named('isZero');
+    // Parenthesize the inversion so the name attaches to the inverted signal.
+    final enableSqrt =
+        (~((isInf | isNaN | isZero) | a.sign)).named('enableSqrt');
+
+    // debias the exponent
+    final deBiasAmt = (1 << a.exponent.width - 1) - 1;
+
+    // deBias math
+    final deBiasExp = a.exponent - deBiasAmt;
+
+    // shift exponent
+    final shiftedExp =
+        [deBiasExp[-1], deBiasExp.slice(a.exponent.width - 1, 1)].swizzle();
+
+    // check if exponent was odd
+    final isExpOdd = deBiasExp[0];
+
+    // use fixed sqrt unit
+    final aFixed = FixedPoint(signed: false, m: 3, n: a.mantissa.width);
+    aFixed <= [Const(1, width: 3), a.mantissa.getRange(0)].swizzle();
+
+    // mux if we shift left by 1 if exponent was odd
+    final aFixedAdj = aFixed.clone()
+      ..gets(mux(isExpOdd, [aFixed.slice(-2, 0), Const(0)].swizzle(), aFixed)
+          .named('oddMantissaMux'));
+
+    // mux to choose if we do square root or not
+    final fixedSqrt = aFixedAdj.clone()
+      ..gets(mux(enableSqrt, FixedPointSqrt(aFixedAdj).sqrtF, aFixedAdj)
+          .named('sqrtMux'));
+
+    // convert back to floating point representation
+    final fpSqrt = FixedToFloat(fixedSqrt,
+        exponentWidth: a.exponent.width, mantissaWidth: a.mantissa.width);
+
+    // final calculation results
+    Combinational([
+      errorSig < Const(0),
+      If.block([
+        Iff(isInf & ~a.sign, [
+          outputSqrt < outputSqrt.inf(),
+        ]),
+        ElseIf(isInf & a.sign, [
+          outputSqrt < outputSqrt.inf(negative: true),
+          errorSig < Const(1),
+        ]),
+        ElseIf(isNaN, [
+          outputSqrt < outputSqrt.nan,
+        ]),
+        ElseIf(isZero, [
+          outputSqrt.sign < a.sign,
+          outputSqrt.exponent < a.exponent,
+          outputSqrt.mantissa < a.mantissa,
+        ]),
+        ElseIf(a.sign, [
+          outputSqrt < outputSqrt.nan,
+          errorSig < Const(1),
+        ]),
+        Else([
+          outputSqrt.sign < a.sign,
+          outputSqrt.exponent < (shiftedExp + deBiasAmt),
+          outputSqrt.mantissa < fpSqrt.float.mantissa,
+        ])
+      ])
+    ]);
+  }
+}
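
The exponent handling above relies on the identity sqrt(m * 2^e) = sqrt(m) * 2^(e/2) for even e, and sqrt(2m) * 2^((e-1)/2) for odd e. The sketch below models that in double precision, assuming a significand in [1, 2) and an already unbiased exponent. `sqrtByExponentSplit` is a hypothetical name, and the model skips the special cases (infinity, NaN, zero, negative inputs) that the module handles separately.

```dart
import 'dart:math' as math;

// Sketch only: double-precision model of the exponent handling.
// sqrtByExponentSplit is a hypothetical helper, not part of ROHD-HCL.

/// Computes sqrt(significand * 2^exponent) for a significand in [1, 2).
double sqrtByExponentSplit(double significand, int exponent) {
  // An arithmetic shift right halves the exponent, rounding toward -infinity.
  final halfExponent = exponent >> 1;
  // The dropped low bit says whether the unbiased exponent was odd.
  final isOdd = (exponent & 1) == 1;
  // An odd exponent hands its leftover factor of 2 to the significand.
  final adjusted = isOdd ? significand * 2 : significand;
  return math.sqrt(adjusted) * math.pow(2.0, halfExponent);
}

void main() {
  print(sqrtByExponentSplit(1.0, 4)); // sqrt(16) = 4.0
  print(sqrtByExponentSplit(1.0, -2)); // sqrt(0.25) = 0.5
  print(sqrtByExponentSplit(1.5, 3)); // sqrt(12), roughly 3.4641
}
```

After the adjustment the significand lies in [1, 4), so its square root lies in [1, 2) and the result is already normalized. This is also why a fixed-point operand with 3 integral bits is wide enough for the shifted significand.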
diff --git a/lib/src/arithmetic/signals/fixed_point_logic.dart b/lib/src/arithmetic/signals/fixed_point_logic.dart
@@ -123,6 +123,13 @@ class FixedPoint extends Logic {
     }
   }
 
+  /// Multiplies this [FixedPoint] by [other] and returns the product.
+  Logic fpMultiply(dynamic other) {
+    _verifyCompatible(other);
+    final product = Multiply(this, other).out;
+    return FixedPoint.of(product, signed: false, m: product.width - n, n: n);
+  }
+
   /// Greater-than.
   @override
   Logic operator >(dynamic other) => gt(other);
@@ -131,6 +138,10 @@ class FixedPoint extends Logic {
   @override
   Logic operator >=(dynamic other) => gte(other);
 
+  /// Multiplication operator; see [fpMultiply].
+  @override
+  Logic operator *(dynamic other) => fpMultiply(other);
+
   @override
   Logic eq(dynamic other) {
     _verifyCompatible(other);
diff --git a/test/arithmetic/fixed_sqrt_test.dart b/test/arithmetic/fixed_sqrt_test.dart
diff --git a/test/arithmetic/floating_point/floating_point_sqrt_test.dart b/test/arithmetic/floating_point/floating_point_sqrt_test.dart