Closes #5510: undefined behaviour #5832

HughParsonage · 2023-12-17T12:28:35Z

Closes #5510.

As issue #5510 notes, the affected code triggers undefined behaviour. The cause is the condition used to check whether a double can be converted to an integer without losing precision:

(int)v == v

However, the cast itself is undefined behaviour if v cannot be represented as an int, so the condition cannot safely be used to detect that case. A helper function, within_int32_repres, is added to the top of utils.c to test whether v is in range, and it is now invoked before any use of the above condition. In addition, there was one attempt to convert nan to int64_t (aka long), which is now guarded in gsumm.c; this closes the issue completely.
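To illustrate the idea of checking the range before casting, here is a minimal sketch. It is not the code merged in this PR: the names fits_in_int32 and is_exact_int32 are hypothetical, and the literal bounds assume a 32-bit int.

```c
#include <math.h>
#include <stdbool.h>
#include <stdint.h>

// Hypothetical helper: true when the integral part of x is representable as a
// 32-bit int, so that the cast (int32_t)x is well defined.
// NaN and +/-Inf fail isfinite() and are rejected before any comparison.
static bool fits_in_int32(double x) {
  return isfinite(x) && x > -2147483649.0 && x < 2147483648.0;
}

// Hypothetical wrapper: true when x converts to int32_t with no loss of precision.
// The range check comes first, so the cast is only evaluated when it is defined.
static bool is_exact_int32(double x) {
  return fits_in_int32(x) && (double)(int32_t)x == x;
}
```

Because && short-circuits, the cast is never reached for out-of-range, infinite or nan inputs.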

A consequence of this was a change to some of the warnings involving a loss of precision in integer64 conversions. I have modified 4 tests to reflect this change, which amounted to changing the warning text to include the possibility the value was "out-of-range".

Some of the tests have been suspended via comments to reflect bugs in the tests themselves uncovered by this PR, and addressed elsewhere. (#5835 at the time of writing)

Note that some double -> integer64 conversions are collapsed into the one warning, affecting tests, so these have been updated.

codecov · 2023-12-17T12:34:45Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (92105e8) 97.47% compared to head (6f2769f) 97.47%.

Additional details and impacted files

@@           Coverage Diff           @@
##           master    #5832   +/-   ##
=======================================
  Coverage   97.47%   97.47%           
=======================================
  Files          80       80           
  Lines       14823    14827    +4     
=======================================
+ Hits        14448    14452    +4     
  Misses        375      375



MichaelChirico · 2023-12-17T12:54:33Z

great find! seems obvious in retrospect... in terms of completeness, have you grepped through to other (int) casts that might be affected but untested?

in terms of future-proofing, any ideas how we might prevent this issue from recurring? feels like there could be a compiler warning

HughParsonage · 2023-12-17T13:02:58Z

I believe I caught the ones relevant to the issue. I did a brief grep using

hutils::goto_pattern_in("(?<!(f))\\(int\\)", "src", file.ext = ".c")

The ones I saw that might benefit were checked by other, more stringent conditions (e.g. val<0.0). There were others but I could not easily tell whether it was possible for the rvalue to be out-of-range.

In terms of future-proofing, all I would do is just be mindful whenever I use (int) that I need to have already directed flow away from the case where it's not possible to convert. It's a similar scenario with arithmetic, where a check for defined behaviour can often itself be the cause of undefined behaviour.

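int minus(int a, int b) {
  return a - b; // wrong: a - b can overflow int, which is itself undefined behaviour
}

int minus2(int a, int b) {
  int64_t o = a - b; // still wrong: a - b is evaluated as int and can overflow before it is widened
  return (o >= INT_MIN && o <= INT_MAX) ? o : NA_INTEGER;
}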
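For contrast, one way to make the subtraction well defined is to widen both operands before subtracting. This is only a sketch under stated assumptions: minus_checked and NA_INT_SKETCH are hypothetical names, int is assumed to be 32 bits, and NA_INT_SKETCH stands in for R's NA_INTEGER (which R defines as INT_MIN) so that the example compiles without the R headers.

```c
#include <limits.h>
#include <stdint.h>

// Stand-in for R's NA_INTEGER, which R defines as INT_MIN; assumed here so the
// sketch compiles without the R headers.
#define NA_INT_SKETCH INT_MIN

// Hypothetical name. Widen *before* subtracting: the difference of two 32-bit
// ints always fits in int64_t, so the subtraction itself cannot overflow.
static int minus_checked(int a, int b) {
  int64_t o = (int64_t)a - (int64_t)b;
  // INT_MIN doubles as the NA sentinel, so it is excluded from the valid range.
  return (o > INT_MIN && o <= INT_MAX) ? (int)o : NA_INT_SKETCH;
}
```

The range test now runs on a value that was computed without overflow, so the check for defined behaviour no longer depends on undefined behaviour.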


MichaelChirico · 2023-12-18T14:47:45Z

needs moar LLM


tdhock · 2023-12-18T16:40:45Z

hi @HughParsonage please consider volunteering as a reviewer for these files, by adding yourself in CODEOWNERS, so you will be notified if there are any future PRs which affect them.


HughParsonage · 2023-12-19T11:55:13Z

I've run on rhub with sanitizers and all appears to be ok: https://builder.r-hub.io/status/data.table_1.14.99.tar.gz-4e4196719d354e02850532ae9cf1ad5d

jangorecki · 2023-12-19T12:30:34Z

> I've run on rhub with sanitizers and all appears to be ok: https://builder.r-hub.io/status/data.table_1.14.99.tar.gz-4e4196719d354e02850532ae9cf1ad5d

And running master branch detects those?

HughParsonage · 2023-12-19T23:58:30Z

I'm struggling to reproduce the UBSAN issues, but in my view this satisfies the undefined behaviour issue identified.

jangorecki · 2023-12-20T05:59:05Z

Before merging, it would be nice to reproduce the issues on master and confirm that they no longer occur on this PR. Let's keep it open while there are still other open issues on the milestone. Maybe in the meantime someone will set up UBSAN locally to confirm it.

MichaelChirico · 2023-12-20T14:08:20Z

Anyone want to have a go at making a UBSAN GitHub Codespace? Would be quite useful.

ben-schwen · 2023-12-20T14:32:02Z

> Anyone want to have a go at making a UBSAN GitHub Codespace? Would be quite useful.

Can't we just mirror the current dev container and use the devel-clang docker image instead? Will do later!

MichaelChirico · 2023-12-22T15:56:37Z

As noted in #5850, there's still one issue left, I found it's from this bug:

Running test id 6.48
assign.c:912:47: runtime error: -1.84467e+19 is outside the range of representable values of type 'long'
assign.c:1040:21: runtime error: -1.84467e+19 is outside the range of representable values of type 'long'

data.table/inst/tests/nafill.Rraw, line 185 in eace83a:

test(6.48, identical(coerceFill(-(2^64)), list(NA_integer_, -(2^64), as.integer64(NA))), warning=c("precision lost","precision lost"))

HughParsonage · 2023-12-22T16:06:12Z

It appears to be the same problem, just with int64. I can attend to it after Christmas.
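The same kind of guard for a 64-bit target might look like the sketch below. It is an illustration only, not the fix that was merged: fits_in_int64 and to_int64_or are hypothetical names, and the bounds are written as double literals for the reason given in the comment.

```c
#include <math.h>
#include <stdbool.h>
#include <stdint.h>

// Hypothetical helper: true when the integral part of x is representable as int64_t.
// 2^63 is written as a double literal because INT64_MAX (2^63 - 1) is not exactly
// representable as a double; under the default rounding mode (double)INT64_MAX
// rounds up to 2^63, which is already out of range.
static bool fits_in_int64(double x) {
  return isfinite(x) && x >= -9223372036854775808.0 && x < 9223372036854775808.0;
}

// Hypothetical wrapper: returns fallback instead of performing an undefined cast.
static int64_t to_int64_or(double x, int64_t fallback) {
  return fits_in_int64(x) ? (int64_t)x : fallback;
}
```

With a guard of this shape, an input such as -(2^64) from test 6.48 takes the fallback path and the cast is never evaluated.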


MichaelChirico · 2023-12-22T16:50:37Z

> It appears to be the same problem, just with int64. I can attend to it after Christmas.

NP, I have a fix just about ready.

MichaelChirico · 2023-12-22T17:21:51Z

Tried this as well to see if there's any residual issues, seems OK:

vals <- 2^(0:64)
vals <- c(-rev(vals), 0, vals)
vals <- c(vals, vals-1, vals+1)
data.table:::coerceAs(vals, bit64::as.integer64(1))

MichaelChirico

Think it's ready to go. Thanks! h/t @ben-schwen for getting the UBSAN codespace up & running to make debugging the last bit so much easier 🙇


HughParsonage · 2023-12-26T03:05:04Z

LGTM. I propose that this be now merged.

MichaelChirico · 2023-12-26T03:08:16Z

Thanks @HughParsonage! If you get a minute it might help to clean up the PR description to reflect the final state as submitted, I think I see a few notes about the interim state of the PR there.

HughParsonage added 2 commits December 17, 2023 23:20

Fix UB via int conversion (#5510)

0471c4d

Note that some double -> integer64 conversions are collapsed into the one warning, affecting tests, so these have been updated.

Ensure nan is not coerced. Closes #5510

e3c6a78

HughParsonage requested a review from jangorecki December 17, 2023 12:28

HughParsonage self-assigned this Dec 17, 2023

HughParsonage requested review from ben-schwen and mattdowle as code owners December 17, 2023 12:28

MichaelChirico reviewed Dec 17, 2023

inst/tests/tests.Rraw (resolved)

Fix '1' mistakenly removed from earlier commit

90b167e

MichaelChirico reviewed Dec 17, 2023

src/utils.c (outdated, resolved)

jangorecki reviewed Dec 17, 2023

inst/tests/tests.Rraw (resolved)

ben-schwen reviewed Dec 17, 2023

src/assign.c (outdated, resolved)

ben-schwen reviewed Dec 17, 2023

src/assign.c (resolved)

ben-schwen reviewed Dec 17, 2023

src/assign.c (resolved)

ben-schwen reviewed Dec 18, 2023

src/assign.c (outdated, resolved)

HughParsonage added 4 commits December 18, 2023 22:39

Re #5834

47c87fd

Rename function more sensibly

772906d

Incorporate finite checks into the logic

c1bf61c

Suspend int64 coerce tests subject to #5834

2d76efd

jangorecki modified the milestones: 1.16.0, 1.15.0 Dec 18, 2023


jangorecki reviewed Dec 18, 2023

src/utils.c (resolved)

Update CODEOWNERS

f0238b4

HughParsonage added 4 commits December 19, 2023 19:25

Simplify logic for integer check

9d05feb

Merge branch 'hp-int-undefined-behaviour' of https://github.com/Rdatatable/data.table into hp-int-undefined-behaviour

a27ccf2

Check for NA_INTEGER superfluous

50056ef

Test 6.53 should emit warning on -2^31

cd73555

jangorecki approved these changes Dec 20, 2023


ben-schwen mentioned this pull request Dec 21, 2023

Add ubsan devcontainer #5850

Merged

Merge branch 'master' into hp-int-undefined-behaviour

d32470e

MichaelChirico added 2 commits December 22, 2023 16:49

C++ toolkit included by default

46cf79c

first pass at fix

3e2f1c3

Simplify/unite with within_int64_repres()

7701738

MichaelChirico requested a review from jangorecki December 22, 2023 17:23

MichaelChirico approved these changes Dec 22, 2023


jangorecki reviewed Dec 22, 2023

src/utils.c (outdated, resolved)

MichaelChirico and others added 2 commits December 22, 2023 18:23

weak inequality

b269b39

Merge branch 'master' into hp-int-undefined-behaviour

6f2769f

MichaelChirico approved these changes Dec 26, 2023


MichaelChirico merged commit 19e1798 into master Dec 26, 2023
4 checks passed

MichaelChirico deleted the hp-int-undefined-behaviour branch December 26, 2023 03:07
