proposal to left-justify the mantissa to make vf128 more consistent with IEEE 754 floating-point. this is a significant change, but it aligns the vf128 encoding more closely with the IEEE 754 encoding.
mantissa justification background
ASN.1 Real format uses a right-justified mantissa with the fraction point on the right and explicit leading one.
IEEE 754 floating-point format uses a left-justified mantissa with the fraction point on the left and implicit leading one.
vf128 variable-length floating-point format currently uses a right-justified mantissa and explicit leading one.
vf128 was created as an evolution of the ASN.1 Real format with the intention of creating a more succinct representation that maps more closely to the IEEE 754 floating-point format. the ASN.1 Real format was modelled first, so at least initially it seemed natural to adopt its right-justified mantissa with explicit leading one convention. the primary differences from the ASN.1 Real format are the addition of the float7 header byte with an external bit, which compacts the encoding of several inlined values (+/-0.0, +/-0.5, +/-1.0, +/-2.0, +/-NaN, +/-Infinity, ...), and support for out-of-line exponent and mantissa values.
compact normals
the root of this issue is the representation of fixed point values using a succinct encoding with implicit exponent.
values in the range -0.99999.. to +0.99999.. are normal values encoded with an out-of-line mantissa and a zero-length exponent, where the implied exponent saves one byte. it made sense to use this encoding for fixed-point values in this range because the one-byte saving means many more single-precision values are encodable in four bytes, and many more double-precision values are encodable in eight bytes.
the text currently reads:
### normal values with unary exponent
Normal values in the range -0.99999.. to +0.99999.. with a binary exponent
from e-1 to e-8 inclusive are encoded with zero in the exponent field, and the
exponent is encoded as a unary prefix of trailing zeros in the mantissa field.
this is a relatively complex issue so here is some background on the current encoding and how it came about.
normal values encoded with an out-of-line mantissa and zero-length exponent
reconstructing a left-justified mantissa with implicit leading one from a right-justified mantissa with an explicit leading one requires a count-leading-zeros operation to realign the point from the right of the least significant bit to the right of the explicit leading one.
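the realignment described above can be sketched as follows. this is a minimal illustration, not the code in vf128.cc: the names `clz64` and `realign_right_justified` are hypothetical, the target is assumed to be the 52-bit fraction field of an IEEE 754 binary64 value, and the mantissa is assumed to fit in 53 bits so that nothing is truncated.

```cpp
#include <cassert>
#include <cstdint>

// portable count-leading-zeros for a non-zero 64-bit value
// (hypothetical helper; production code would likely use a builtin)
int clz64(std::uint64_t x)
{
    assert(x != 0);
    int n = 0;
    while ((x >> 63) == 0) { x <<= 1; n++; }
    return n;
}

// hypothetical helper: `mant` is right-justified and its most significant
// set bit is the explicit leading one. returns the binary64 fraction field
// with that leading one made implicit.
std::uint64_t realign_right_justified(std::uint64_t mant)
{
    int lz = clz64(mant);
    std::uint64_t top = mant << lz; // explicit leading one is now at bit 63
    return (top << 1) >> 12;        // drop the leading one, keep 52 bits
}
```

for example, the right-justified mantissa `0b101` (1.01 in binary) yields a fraction field with only bit 50 set. the shift amount depends on the data, which is the property the rest of this issue is concerned with.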
trailing unary coded suffix work-around for right-justified normal values
this realignment is applied to all out-of-line mantissa values to keep the code simple and consistent. the problem is that realigning the fraction based on the explicit leading one loses information about leading zeros that would otherwise be present in a fixed-point fraction. for this reason, a trailing unary coded suffix was added as a work-around to recover the leading zero count for these quasi fixed-point fractions, or more specifically for normal values in the range -0.99999.. to +0.99999..
left-justified mantissa with implied leading zero or one
after analysis, it becomes evident that a left-justified mantissa with an implied leading zero or one is a more natural representation for IEEE 754 floating-point values under byte quantisation. when the mantissa retains its left justification it also retains the leading zero count, which is otherwise lost and is what necessitated the explicit leading one used for realignment.
with a left-justified mantissa, the position of the point remains the same in its encoded form, so it is no longer necessary to append a unary coded suffix to remember the leading zeros lost during realignment. note, however, that with a left-justified mantissa it makes sense to use an implied leading zero for the special case of succinct coding of fixed-point fractions with a zero-length exponent, i.e. values in the range -0.99999.. to +0.99999.., similarly to what is done for subnormals.
this requires a special case to adjust fixed point values back to the implied leading one needed by IEEE 754 floating-point, but it simplifies the case for all other mantissa values as it is no longer necessary to count leading zeros during decoding if the exponent is present. the shift offset is based purely on the width in bytes of the mantissa.
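a minimal sketch of the simpler case, where the exponent is present and the mantissa is left-justified with an implied leading one. the name `place_left_justified` is hypothetical, the mantissa bytes are assumed to have been read big-endian into the low bits of a 64-bit word, and the target is assumed to be the 52-bit binary64 fraction field (so a mantissa wider than 52 bits would be truncated here).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// hypothetical helper: `mant` holds `nbytes` bytes of left-justified
// fraction; the implied leading one is not stored.
std::uint64_t place_left_justified(std::uint64_t mant, std::size_t nbytes)
{
    assert(nbytes >= 1 && nbytes <= 8);
    // move the first mantissa byte to the top of the 64-bit word;
    // the shift depends only on the byte width, not on the data
    std::uint64_t top = mant << (64 - 8 * nbytes);
    // keep the top 52 bits as the binary64 fraction field
    return top >> 12;
}
```

no count of leading zeros appears here: a one-byte mantissa `0x80` and a two-byte mantissa `0x8000` both land on bit 51 of the fraction field.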
this example gives an overview of the location of the point for a 12-bit fraction starting with a one:
12-bit fraction with right-justified mantissa and explicit leading one
[byte 1] [byte 2]
____1NNN NNNNNNNN.
12-bit fraction with left-justified mantissa and implicit leading zero
[byte 1] [byte 2]
[0].NNNNNNNN NNNN____
12-bit fraction with left-justified mantissa and implicit leading one
[byte 1] [byte 2]
[1].NNNNNNNN NNN_____
note that the renormalization of the mantissa and exponent for fractions whose first digit is not a one is not shown. the case where a fixed-point compressed normal is reformatted to IEEE 754 normal form with implicit leading one requires a count-leading-zeros operation and an adjustment to the exponent.
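the renormalization that the note above mentions could look roughly like this. it is a sketch under assumptions, not the proposed implementation: `leading_zeros`, `normalized`, `renormalize` and `to_double` are hypothetical names, the mantissa is assumed to be a non-zero left-justified fraction with implied leading zero read big-endian into a 64-bit word, and `double` is assumed to be IEEE 754 binary64.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// portable count-leading-zeros for a non-zero 64-bit value
int leading_zeros(std::uint64_t x)
{
    assert(x != 0);
    int n = 0;
    while ((x >> 63) == 0) { x <<= 1; n++; }
    return n;
}

struct normalized { int exponent; std::uint64_t fraction; };

// value represented is 0.<mant bits>, i.e. mant / 2^(8 * nbytes)
normalized renormalize(std::uint64_t mant, std::size_t nbytes)
{
    assert(nbytes >= 1 && nbytes <= 8);
    std::uint64_t top = mant << (64 - 8 * nbytes); // point sits left of bit 63
    int lz = leading_zeros(top);
    int exponent = -(lz + 1);                      // first set bit is 2^-(lz+1)
    std::uint64_t fraction = ((top << lz) << 1) >> 12; // drop the leading one
    return { exponent, fraction };
}

// assemble a positive binary64 value from the normalized parts
double to_double(normalized n)
{
    static_assert(sizeof(double) == sizeof(std::uint64_t), "binary64 assumed");
    std::uint64_t bits = (std::uint64_t(n.exponent + 1023) << 52) | n.fraction;
    double d;
    std::memcpy(&d, &bits, sizeof d);
    return d;
}
```

with a two-byte mantissa, `0xC000` (0.11 in binary) renormalizes to 1.1 x 2^-1, which is 0.75, and `0x0400` renormalizes to 1.0 x 2^-6.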
proposed convention
if we had reasoned about the encoding of fixed-point normals at the outset, we would have started with a left-justified mantissa. the result of this analysis is a proposal to left-justify all out-of-line mantissa values:
for out-of-line mantissa and non-zero length exponent use left-justified mantissa with implied leading one.
for out-of-line mantissa and zero-length exponent use left-justified mantissa with implied leading zero.
it would be possible to only change normal values encoded with an out-of-line mantissa and zero-length exponent to use a left-justified fixed point fraction with implied leading zero, and remove the unary coded suffix special case, as that is the use case that prompted this analysis. but changing only the zero-length exponent encoding introduces more complexity overall because some parts of the format would have a left-justified fraction point with implied leading zero, and other parts would have a right-justified fraction point with an explicit leading one. it is simpler if the justification scheme is consistent.
ultimately a left-justified mantissa with implied leading digit leads to saving one bit of information. this increases the set of single-precision values that can be encoded in four bytes. the exponent calculation code also becomes simpler because there is no special case to append the unary coded suffix to recover alignment for fixed-point values.
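the one-bit saving can be illustrated by counting mantissa bytes under each convention for a single-precision fraction field. this is only a model: it assumes that trailing zero bits are dropped before rounding up to whole bytes, and the function names are hypothetical rather than taken from vf128.

```cpp
#include <cassert>
#include <cstdint>

// number of fraction bits left after dropping trailing zeros from a
// 23-bit binary32 fraction field
int significant_fraction_bits(std::uint32_t frac23)
{
    assert(frac23 < (std::uint32_t(1) << 23));
    int bits = 23;
    while (bits > 0 && (frac23 & 1) == 0) { frac23 >>= 1; bits--; }
    return bits;
}

// right-justified: the explicit leading one occupies one extra bit
int bytes_explicit_one(std::uint32_t frac23)
{
    return (significant_fraction_bits(frac23) + 1 + 7) / 8;
}

// left-justified: the leading one is implied and costs nothing
int bytes_implied_one(std::uint32_t frac23)
{
    return (significant_fraction_bits(frac23) + 7) / 8;
}
```

under this model the two conventions differ exactly when the number of significant fraction bits is a multiple of eight. a fraction with 16 significant bits, such as `0x7FFF80`, needs three mantissa bytes with an explicit leading one but only two with an implied leading one.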
this is a relatively intrusive change because it requires changing the exponent calculation and shifts for all encodings that use an out-of-line mantissa. on the whole, though, it seems like a worthwhile change. it brings the format a lot closer to the IEEE 754 floating-point format and requires fewer adjustments when unpacking the mantissa. that potentially makes it easier to implement in hardware, which is something that would be unlikely for the ASN.1 Real format.
conclusion
this issue serves as a notice of intent to change the format. it is a significant change but the format is not yet v1.0 so it is okay.
michaeljclark changed the title several times on Feb 21, 2022, and on Sep 16, 2022 settled on "proposal to left-justify the mantissa".
to paraphrase a long story with lots of nuances: the implied exponent encoding encodes the mantissa a little like a denormal, only the exponent is zero. the problem is that this scheme only works if the mantissa is left-justified.
the current code referenced by this issue: vf128/src/vf128.cc, lines 1322 to 1328 in 429d24b.