Add a method to read the first bytes of a float (and assume the rest are 0) #70
See #71 |
@BurntSushi I don't think this is related, is it (other than the fact that this method would also have a similar problem)? This PR is for a general float method that reads less than 8 bytes (and assumes the rest are 0), but it's likely that I'm just misunderstanding what you meant. |
@SamWhited It's related in that it would be adding more methods with the same problem as the existing FWIW, we do already have |
Ah yes, if you consider reading signaling NaNs a problem (although that sounds like it's up in the air)?
Yes, it's more or less analogous |
One interesting idea brought up in the other thread was that we could mask out the signaling NaN bit before doing the transmute to a float. I don't really like that, but we either do that, or we need to change the API to return a cc @est31 @retep998 @nagisa @Amanieu @petrochenkov @dwrensha @valarauca @alexcrichton |
I don't love the idea of losing that information; if we are decoding something to a signaling NaN we either want to know in case it was intentional, or we want to know because it's a bug (or maybe it actually means something) in the encoded stream and we need to display some error, or take some action. Having float operations return a result feels poor to me too because there's the overhead of unwrapping a result on what appears at first glance to be a simple operation that should "just work", but since it's not actually that simple under the hood I think this is the lesser of two evils personally. I haven't really thought through any use cases but my own (decoding Go's Gob format), so I'll be curious to see what others say. |
I think masking out the signal NaN is fine. Flush to zero is fairly
standard behavior.
As long as an alternative function exists that *may* return a signal NaN.
…On Tue, Mar 28, 2017 at 11:56 AM Sam Whited ***@***.***> wrote:
we could mask out the signaling NaN bit
I don't love the idea of losing the error information; if we *are*
decoding something to a signaling NaN we either want to know in case it was
intentional, or we want to know because it's a bug (or maybe it actually
means something) in the encoded stream and we need to display some error,
or take some action.
Having float operations return a result feels poor to me too because
there's the overhead of unwrapping a result on what appears at first glance
to be a simple operation that should "just work", but since it's not
actually that simple under the hood I think this is the lesser of two evils
personally. I haven't really thought through any use cases but my own
though (decoding Go's Gob format), so I'll be curious to see what others
say.
|
I didn't think about that, having a second unsafe version of the function that can return a signaling NaN also makes sense to me, but also bloats the API: I'm not sure if the tradeoff there for something people may never (or rarely) use is worth it. |
At the moment, I am somewhat inclined to clarify the contract of Whether we should add more (unsafe) methods to the API to support getting the |
The discussion on rust-lang/rust#39271 also seems to have gone that way; I pushed a new commit that I think should turn sNaN's into qNaN's by flipping the most significant fraction bit. Review by someone who's more comfortable with floating point math than I am would be appreciated. I'm also not 100% sure that the test I wrote for it doesn't introduce undefined behavior itself. EDIT: Yup, I did; fixed. |
Hmm maybe I should do it like you and just flip that single bit. |
@est31 I couldn't think of any cases where it would matter one way or the other, but I can't claim to have any real domain knowledge here. I'm not really sure who to go to for advice either; maybe some other project has done something similar and we could copy it or ask them? |
@SamWhited for my use cases it wouldn't be really useful either, but apparently you can use the lower part of the fraction field for a payload. I think I'll just do something like:
|
Hmm just realized that is_nan is not defined on u64/u32. I'll keep the old version. |
@est31 If you have a u64 you should be able to do something like: // The exponent is 1's && the mantissa has at least one bit set
(n & 0x7FF == 0x7FF) && (n & 0x000FFFFFFFFFFFFF != 0) to check for NaN EDIT: oops, probably need to shift the exponent left 52 places. |
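The check described above, with the exponent mask shifted into place as the EDIT suggests, might look roughly like the sketch below. It assumes the IEEE 754 binary64 layout (1 sign bit, 11 exponent bits, 52 mantissa bits); the name `is_nan_bits` is hypothetical and is not part of this crate.

```rust
/// Hypothetical helper: returns true if `bits` is the IEEE 754 binary64
/// encoding of a NaN (quiet or signaling).
fn is_nan_bits(bits: u64) -> bool {
    // The 11 exponent bits sit directly above the 52 mantissa bits.
    const EXP_MASK: u64 = 0x7FF << 52;
    // The low 52 bits hold the mantissa (trailing significand).
    const MANT_MASK: u64 = (1 << 52) - 1;
    // NaN: exponent is all ones and at least one mantissa bit is set.
    // (All ones with a zero mantissa is an infinity instead.)
    (bits & EXP_MASK) == EXP_MASK && (bits & MANT_MASK) != 0
}
```

Working on the raw bits means no float value is ever created from a possibly signaling pattern before the check has run.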
src/lib.rs
Outdated
#[test]
fn uint_bigger_buffer() {
    use {ByteOrder, LittleEndian};
    let n = LittleEndian::read_uint(&[1, 2, 3, 4, 5, 6, 7, 8], 5);
    assert_eq!(n, 0x0504030201);
}

// TODO: How is transmute implemented? Does it count as an operation on the sNaN?
// Is this test undefined behavior?
By the time you call transmute, I think you've already converted the sNaN to a qNaN, so I think you're safe.
(Even so, since it's only a test, I'm fine with being in murky territory.)
Rebased and removed TODO comments based on @BurntSushi's feedback. I think the last open question is which variants of this method we want and what they should be named. E.g. do we want both an f32 and an f64 version of this method? Do we want write methods (I haven't even thought about those), etc.
|
@SamWhited Thanks! The Also, since you're adding the sNaN masking to And yes, since we have |
Sure thing |
Done ⤴ To reiterate, I'm reasonably sure this is correct and that all my lengths for the various parts of IEEE floats are correct (thanks to the lovely diagrams on Wikipedia), but review by someone who knows floats would be appreciated. |
@SamWhited I'm not a floating point expert either unfortunately, but I will do some reading and try to do an independent review before merging. It's important that this is right. (It might take me a little bit to get to though.) |
src/lib.rs
Outdated
let mut u = Self::read_u32(buf);
// The exponent is 1's && the mantissa has at least one bit set (aka. is_nan):
if (u & 0xFF<<23 == 0xFF<<23) && (u & 0x3FFFFF != 0) {
    u |= 1;
Citing ieee754-2008:
All binary NaN bit strings have all the bits of the biased exponent field E set to 1 (see 3.4). A quiet NaN bit
string should be encoded with the first bit (d₁) of the trailing significand field T being 1. A signaling NaN
bit string should be encoded with the first bit of the trailing significand field being 0. If the first bit of the
trailing significand field is 0, some other bit of the trailing significand field must be non-zero to distinguish
the NaN from infinity. In the preferred encoding just described, a signaling NaN shall be quieted by setting
d₁ to 1, leaving the remaining bits of T unchanged.
I believe the d₁ here is the first bit after the exponent, not the least significant bit of the encoding, as per:
making this code not actually mask out the signalling bit.
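To make the point about d₁ concrete, here is a minimal sketch of quieting a binary32 NaN in the way the quoted passage describes. It assumes the preferred encoding from that passage, where d₁ is bit 22 (the most significant of the 23 mantissa bits); the function name is hypothetical and this is not the code in the PR.

```rust
/// Hypothetical helper: if `bits` encodes a binary32 NaN, set d1 (the most
/// significant mantissa bit) so the NaN is quiet; otherwise return `bits`
/// unchanged.
fn quiet_nan_bits_f32(bits: u32) -> u32 {
    // 8 exponent bits sit directly above the 23 mantissa bits.
    const EXP_MASK: u32 = 0xFF << 23;
    // All 23 mantissa bits: 0x007F_FFFF.
    const MANT_MASK: u32 = (1 << 23) - 1;
    // d1: the first mantissa bit after the exponent, not the LSB.
    const QUIET_BIT: u32 = 1 << 22;

    let is_nan = (bits & EXP_MASK) == EXP_MASK && (bits & MANT_MASK) != 0;
    if is_nan {
        // The remaining mantissa bits (the payload) are left untouched.
        bits | QUIET_BIT
    } else {
        bits
    }
}
```

Note that the mantissa mask covers all 23 bits, so a NaN whose only set mantissa bit is d₁ is still detected, and infinities (zero mantissa) pass through unchanged.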
facepalm You're right, it should be the MSB. Thanks!
Fixed for the read_f{32,64} methods (I think, please double check my bit widths :) Thanks!).
Note to self: Outstanding bug: the read_float implementations don't actually detect NaNs properly (they need to check every byte in the mantissa and see if any of them contain a set bit). EDIT: Also fixed; it should be accounting for the entire mantissa now |
Another open question: I just noticed that |
Endianness is handled by |
Oh right, never mind. *disappears* |
@BurntSushi ping; just wanted to make sure this didn't fall off the radar. No rush though. |
@BurntSushi friendly ping :) You said that you wanted to review the sNaN masking in this PR before we stabilize the transmute functions: rust-lang/rust#39271 (comment) Would be nice to have stable float<-> int transmute in Rust 1.20. |
I'm no expert, but according to wikipedia, some implementations of IEEE 754 use opposite meaning for the signalling/quiet bit. |
@le-jzr great link! Setting the value to a known quiet NaN without any masking might indeed be the way forward. |
Also of interest: https://sourceware.org/binutils/docs/as/MIPS-NaN-Encodings.html |
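The "known quiet NaN" idea could be sketched as follows. This is only an illustration, not the implementation that was merged: the function name is hypothetical, and it assumes that Rust's `f64::NAN` constant is an acceptable quiet NaN on the target platform.

```rust
/// Hypothetical helper: turns raw binary64 bits into an `f64`, replacing
/// every NaN encoding with the `f64::NAN` constant instead of flipping a
/// platform-dependent quiet/signaling bit.
fn f64_from_bits_canonical_nan(bits: u64) -> f64 {
    const EXP_MASK: u64 = 0x7FF << 52;
    const MANT_MASK: u64 = (1 << 52) - 1;

    let is_nan = (bits & EXP_MASK) == EXP_MASK && (bits & MANT_MASK) != 0;
    if is_nan {
        // Assumption: f64::NAN is quiet on this target. Any payload carried
        // in the original mantissa bits is discarded.
        f64::NAN
    } else {
        f64::from_bits(bits)
    }
}
```

The trade-off is that this sidesteps the question of which bit means "quiet" on a given platform, but it loses the sign and payload of the original NaN, which matters to anyone using the lower fraction bits for a payload as mentioned earlier in the thread.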
@SamWhited @est31 In the interest of moving things forward, I've just merged the part of this PR that makes reading floats safe. In particular, I updated it to use @est31's implementation that is now in @SamWhited I tried to salvage the |
I'm not sure if this is broadly useful enough to be worth adding to the library, but I've found myself using a similar function to this several times in a recent project so I thought I'd submit a PR in case you wanted it.
The idea is to read the first n bytes of an f64 as a uint and then assume that the rest of the bytes are 0. This is very useful if you're parsing lots of compressed floats in little-endian format where the low bits (which are often zero) can be dropped (e.g. due to a form of run-length encoding, or other compression that drops sequences of zeros).
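As a rough illustration of the idea (not the method proposed in this PR), the sketch below uses only the standard library. It assumes the surviving bytes arrive most significant first (sign and exponent, then the high mantissa bytes) and that every dropped byte is a trailing low-order mantissa byte equal to zero; for a stream in the opposite byte order the slice would need to be reversed first. The function name is hypothetical.

```rust
/// Hypothetical helper: rebuilds an `f64` from its leading (most
/// significant) bytes, treating any missing trailing bytes as zero.
fn f64_from_leading_bytes(buf: &[u8]) -> f64 {
    // Use at most 8 bytes; extra input is ignored.
    let n = buf.len().min(8);
    // Start from all zeros so the dropped low-order bytes stay zero.
    let mut bytes = [0u8; 8];
    bytes[..n].copy_from_slice(&buf[..n]);
    // Interpret the padded array as a big-endian u64 bit pattern.
    // Note: no NaN handling is done here.
    f64::from_bits(u64::from_be_bytes(bytes))
}
```

For example, 17.0 has the bit pattern `0x4031_0000_0000_0000`, so under these assumptions the two bytes `[0x40, 0x31]` are enough to recover it.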