bump stated R dependency? #5757

Open
jangorecki opened this issue Nov 24, 2023 · 12 comments

@jangorecki
Member

jangorecki commented Nov 24, 2023

I am quite happy staying on R 3.1.0, so it is not like we need to upgrade.

One practical aspect could be simplifying CI. We currently have a couple of CI jobs (3.1, 3.4.4, 3.5) testing different corner cases present in each of those versions. It turns out that CI minutes are now quite limited on free plans...

Bumping to 4.0.0 would allow us to use reference counting, although I am not sure we have the dev time to really work on that. It would also be a huge bump of 6 years, from supporting environments set up in 2014 to 2020.
Such a big change could also be postponed until major breaking changes land in the master branch as well. Otherwise, bumping to 3.5 is a middle step.

Before any change we should definitely investigate which R versions data.table users are running, based on data from http://cran-logs.rstudio.com (see @arunsrinivasan's https://github.com/arunsrinivasan/cran.stats).

My personal preference would be to support as old an R version as feasible, possibly removing the R 3.4.4 and R 3.5.0 CI jobs and leaving the R 3.1.0 job.

@tdhock
Member

tdhock commented Nov 24, 2023

I would think that if CI minutes are limited we should test

  • oldest supported R version
  • current R-release
  • current R-devel

Maybe keep the 3.4.4, 3.5 CI for now, and remove them later if we run into the CI time limit?

@jangorecki
Member Author

Rather than 3.4.4 and 3.5, I would prefer to have 6 jobs (win/macos * release/devel/oldrel) so we can provide more binaries.

Also, #5745 should help to save some minutes, so we don't have to freeze suggested deps and risk a breaking change sneaking in unnoticed.

@tdhock
Member

tdhock commented Nov 25, 2023

Are you able to check how many minutes we are using on GitHub Actions? I can see that for my own account under Settings -> Billing and plans (but it says zero, so I'm not sure that is correct), but I cannot see usage for the Rdatatable org.

@jangorecki
Member Author

No I am not able to check for Rdatatable org. I am able to check, same as you, for my own namespace.
It says

2,000 Actions minutes/month
500MB of Packages storage

On GitLab free plan, as Rdatatable org, we have

50 000 minutes/month
10 GB

As a user, I also have 2,000 minutes/month, but 10 GB of storage.

So the question is how many compute minutes we get as the Rdatatable org on GitHub. For that I believe we need Matt, as we have privileges only on the repository, and not on the org.

@MichaelChirico
Member

Keeping support for very old R is great, but I don't think it makes sense to keep tracking arbitrarily old R versions indefinitely. We are just signing ourselves up for ever-increasing tech debt with limited benefit. Eventually people need to upgrade R.

Better to set a policy of support and gradually start bringing our dependency forward.

R 3.1.0 is nearly 10 years old:

https://github.com/wch/r-source/tree/tags/R-3-1-0

I think even a policy of 5 years old (R 3.5.0) is quite generous, but 6 or 7 would also be fine.

I don't think it's reasonable to expect {data.table} to cater to decade-old installations -- archived versions of {data.table} exist for this reason.
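
As a rough illustration of what an "N years old" policy would imply, here is a minimal base R sketch. The release dates are listed from memory and should be checked against the official R release history, and `min_supported()` is a hypothetical helper, not something that exists in {data.table}. It picks the newest x.y.0 release that is at least N years old on a given date.

```r
# Release dates of R x.y.0, listed from memory: verify before relying on them.
releases <- data.frame(
  version  = c("3.1.0", "3.2.0", "3.3.0", "3.4.0", "3.5.0", "3.6.0", "4.0.0"),
  released = as.Date(c("2014-04-10", "2015-04-16", "2016-05-03", "2017-04-21",
                       "2018-04-23", "2019-04-26", "2020-04-24")),
  stringsAsFactors = FALSE
)

# Hypothetical helper: newest x.y.0 release that is at least `years` old on `as_of`.
min_supported <- function(years, as_of = Sys.Date()) {
  # Step back `years` calendar years from the reference date.
  cutoff <- seq(as_of, by = paste0("-", years, " years"), length.out = 2)[2]
  old_enough <- releases[releases$released <= cutoff, ]
  if (nrow(old_enough) == 0) return(NA_character_)
  old_enough$version[which.max(as.numeric(old_enough$released))]
}

as_of <- as.Date("2023-11-24")  # the date this issue was opened
for (years in 5:7) {
  cat(years, "years ->", min_supported(years, as_of), "\n")
}
```

Evaluated on the day the issue was opened, the 5-year rule lands on R 3.5.0, which matches the figure quoted above; 6 and 7 years move the floor back one minor release each.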

@jangorecki
Member Author

Let's see if there are active users of data.table on R 3.1.0. Sticking blindly to a 5 (or 6 or 7) year rule doesn't sound like a great idea.

I'm not sure which package it was, but once in production I had to use my own fork of a package, and the only change was lowering the stated dependency to an older version. Everything worked fine. We don't want our users to be forced into tricks like this because we blindly follow a "5 years rule".

I do believe we should bump stated dependency but when we are ready to follow up with benefits it gives.
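
For readers less familiar with where the "stated dependency" lives: it is the `R (>= x.y.z)` entry in the `Depends` field of a package's DESCRIPTION file. The sketch below reads that field from a made-up DESCRIPTION (the package name and version are placeholders, not data.table's actual file) and compares it with the running R.

```r
# A hypothetical DESCRIPTION file, written to a temporary path for illustration.
path <- tempfile("DESCRIPTION")
writeLines(c(
  "Package: examplepkg",
  "Version: 0.0.1",
  "Depends: R (>= 3.1.0)"
), path)

# read.dcf() parses the Debian-control format that DESCRIPTION files use.
depends <- read.dcf(path, fields = "Depends")[1, "Depends"]

# Pull the version number out of the "R (>= x.y.z)" entry.
pattern <- "R \\(>= ([0-9.]+)\\)"
stated <- sub(pattern, "\\1", regmatches(depends, regexpr(pattern, depends)))

# Compare the stated minimum with the R that is running right now.
cat("stated minimum:", stated, "\n")
cat("running R satisfies it:", getRversion() >= numeric_version(stated), "\n")
```

Lowering or raising that one entry is the whole of a "stated dependency" change; whether the code still works on the older R is a separate question.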

@MichaelChirico
Member

> when we are ready to follow up with benefits it gives

so far we are too "greedy" about this. 3.2.0 does not bring (us) many direct benefits vs 3.1.0. But it gets us closer to more recent R where there are larger benefits. And I would rather gradually hit 3.2.0, 3.3.0, ... than jump suddenly to 3.6.0.

> Everything worked fine

I'm not worried about this. we are quite good about earmarking which code can be updated once we depend on certain R version. We will very quickly become incompatible with older R upon upgrade.

@TysonStanley
Member

Is there a way (that I'm not aware of) for us to know what versions of R are downloading data.table without a survey?

@jangorecki
Member Author

Yes, described in the first post. CSV files have that field.

@jangorecki
Member Author

jangorecki commented Dec 6, 2023

Data from December 2022 to November 2023: of 365 days, 280 had valid logs (the rest were either missing or failed with a network error).

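library(data.table)
l = list.files() # daily CRAN log CSVs, obtained via cran.stats
# count data.table downloads per R version in each daily file
d = rbindlist(lapply(l, function(f) {cat(f,"\n",sep=""); fread(f, showProgress=FALSE)[package=="data.table", .N, r_version]}))
# total per minor version (2nd column) and share of all downloads in percent (3rd column)
d[, .(N=sum(N)), by=.(r_version=substr(r_version,1,3))][, pct := 100*N/sum(N)][order(-N)]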
 1: <NA> 2217752 3.364596e+01
 2:  4.2 2010680 3.050443e+01
 3:  4.3 1468433 2.227789e+01
 4:  4.1  446147 6.768585e+00
 5:  4.0  188071 2.853262e+00
 6:  3.6  149188 2.263361e+00
 7:  3.4   53199 8.070926e-01
 8:  3.5   32396 4.914862e-01
 9:  4.4   14310 2.170999e-01
10:  3.3   10819 1.641372e-01
11:  3.2     437 6.629814e-03
12:  3.1       5 7.585599e-05

About 0.8% of downloads came from R 3.4.
About 0.16% came from R 3.3.
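
To make the trade-off concrete, the counts above can be turned into "how many downloads would fall below a given minimum R version". This is only a sketch: it uses base R rather than data.table so it runs anywhere, it leaves out the `<NA>` row (so shares are relative to downloads with a known R version, not to the grand total), the candidate versions are picked for illustration, and `dropped_downloads()` is a hypothetical helper.

```r
# Download counts by R minor version, copied from the table above
# (rows with unknown r_version, shown as <NA>, are left out).
downloads <- c(
  "3.1" = 5, "3.2" = 437, "3.3" = 10819, "3.4" = 53199, "3.5" = 32396,
  "3.6" = 149188, "4.0" = 188071, "4.1" = 446147, "4.2" = 2010680,
  "4.3" = 1468433, "4.4" = 14310
)

# Parse the labels as versions so that comparison is by version, not by string.
versions <- numeric_version(names(downloads))

# Hypothetical helper: downloads that came from an R older than `min_version`.
dropped_downloads <- function(min_version) {
  sum(downloads[versions < numeric_version(min_version)])
}

candidates <- c("3.2", "3.3", "3.5", "4.0")
dropped <- vapply(candidates, dropped_downloads, numeric(1))
share_pct <- 100 * dropped / sum(downloads)

print(data.frame(min_R = candidates, dropped = dropped,
                 share_pct = signif(share_pct, 3), row.names = NULL))
```

These are download counts from a single mirror, not distinct users, so the shares are at best a rough proxy.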

@MichaelChirico
Member

What do we know about the source for this data?

E.g. we see 5 people on 3.1, how likely is it that's just someone like us running really old R for testing purposes?

(either way I think it's clear we can bump to 3.2 ASAP and 3.3 in the subsequent release)

@jangorecki
Member Author

This is from cloud.r-project.org, which is widely used in CI setups, so there may be a bias towards 4.4, 4.3 and 4.2. There are tens of different mirrors, so it is just a (slightly biased) sample :)
