Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix type coercion in bmerge #6603

Open
wants to merge 5 commits into
base: master
Choose a base branch
from
Open

fix type coercion in bmerge #6603

wants to merge 5 commits into from

Conversation

ben-schwen
Copy link
Member

@ben-schwen ben-schwen commented Nov 3, 2024

Closes #6602

NB: I'm not happy that a user can get different messages, depending on the number of join conditions, but coercing both Dates to double when multiple column conditions exist, seems like the right decision.

Base does not encounter this problem since one join column can not be in multiple join conditions.

Copy link

github-actions bot commented Nov 3, 2024

Comparison Plot

Generated via commit 1d7a8c2

Download link for the artifact containing the test results: ↓ atime-results.zip

Task Duration
R setup and installing dependencies 4 minutes and 39 seconds
Installing different package versions 7 minutes and 26 seconds
Running and plotting the test cases 2 minutes and 13 seconds

Copy link

codecov bot commented Nov 3, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 98.60%. Comparing base (6a15f86) to head (1d7a8c2).

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #6603   +/-   ##
=======================================
  Coverage   98.60%   98.60%           
=======================================
  Files          79       79           
  Lines       14516    14520    +4     
=======================================
+ Hits        14314    14318    +4     
  Misses        202      202           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@ben-schwen ben-schwen marked this pull request as ready for review November 3, 2024 21:52
@ben-schwen
Copy link
Member Author

ben-schwen commented Nov 3, 2024

@MichaelChirico is it worth to make the if more specific to this corner case e.g.

(length(unique(icols))!=length(icols) || length(unique(xcols))!=length(xcols))

ensuring that either icols has a double condition or xcols has a double condition?

@@ -71,7 +71,13 @@ bmerge = function(i, x, icols, xcols, roll, rollends, nomatch, mult, ops, verbos
stopf("Incompatible join types: %s (%s) and %s (%s). Factor columns must join to factor or character columns.", xname, xclass, iname, iclass)
}
if (xclass == iclass) {
if (verbose) catf("%s has same type (%s) as %s. No coercion needed.\n", iname, xclass, xname)
if (length(icols)>1 && is(x[[xc]], "Date") && is(i[[ic]], "Date")) {
Copy link
Contributor

@iago-pssjd iago-pssjd Nov 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To modify only when strictly needed, could be added the next condition?

&& !identical(storage.mode(x[[xc]]), storage.mode(i[[ic]]))

(following https://bugs.r-project.org/show_bug.cgi?id=18782#c9)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would use inherits(), is() is for S4 classes IMO (even though it happens to work for S3 classes).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, IINM POSIXct can have the same concern -- {base} doesn't care about storing POSIXct as integer sometimes if possible.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would use inherits(), is() is for S4 classes IMO (even though it happens to work for S3 classes).

Previously I checked if class(x[[xc]])=="Date" are equal and our linter nudged me towards is, so we might also want to adjust the linter :D

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

class(x)=="Date" is bad because class() can be length>1, c.f. class(Sys.time()), hence == is rarely intended.

The linter mentions is() as well as inherits() to cover all bases since there's no is.Date() and we didn't hand-code anything to cover base S3 classes:

lintr::lint("class(x) == 'Date'\n", lintr::class_equals_linter())
# <text>:1:1: warning: [class_equals_linter] Instead of comparing class(x) with ==, use inherits(x, 'class-name') or is.<class> or is(x, 'class').
# class(x) == 'Date'
# ^~~~~~~~~~~~~~~~~~

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bad, seems that I overlooked the inherits part.

@MichaelChirico
Copy link
Member

Sorry, I just saw that this is a CRAN requirement, checking now.

@@ -71,7 +71,13 @@ bmerge = function(i, x, icols, xcols, roll, rollends, nomatch, mult, ops, verbos
stopf("Incompatible join types: %s (%s) and %s (%s). Factor columns must join to factor or character columns.", xname, xclass, iname, iclass)
}
if (xclass == iclass) {
Copy link
Member

@MichaelChirico MichaelChirico Nov 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aside: {x,i}class seem like poor name choices here, given that they typically come from typeof() (and class() is never called, except maybe internally by inherits()).

{x,i}type would be more appropriate, but {x,i}_merge_type might be even better?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

another aside: It looks like we do x[[xc]] and i[[ic]] extraction at least one and up to several times. Should we just store the extracted column locally?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aside: {x,i}class seem like poor name choices here, given that they typically come from typeof() (and class() is never called, except maybe internally by inherits()).

Yes but didn't want to change more than needed for the quick fix

@MichaelChirico
Copy link
Member

Definitely thanks for triaging a fix here. I think we both agree it's pretty hack-y & ideally not needed.

I'm not sure I fully understand the bug yet, but as stated in OP we're doing as.Date().

IMO as.Date() should always coerce to double storage, even if it's possible to get Date objects with integer storage. Therefore I would hold off until (1) R-core confirms that yes, as.Date(x) might return integer sometimes; and/or (2) I get a clearer understanding of the underlying issue here.

@ben-schwen
Copy link
Member Author

IMO as.Date() should always coerce to double storage, even if it's possible to get Date objects with integer storage. Therefore I would hold off until (1) R-core confirms that yes, as.Date(x) might return integer sometimes; and/or (2) I get a clearer understanding of the underlying issue here.

Unfortunately this "contract" does not hold. I have a Windows dev version installed here and get the following

> typeof(.Date(0L))
[1] "integer"
> typeof(as.Date(.Date(0L)))
[1] "integer"

@MichaelChirico
Copy link
Member

I have a Windows dev version installed here and get the following

Thanks, also confirmed that on 4.4.1

@TysonStanley
Copy link
Member

This is strange as Kurt mentioned there wasn't an intention to do this but I figure it would have been fixed by now if that were the case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

as.Date can result in different underlying types
4 participants