Bind package version into .dll/.so #3056
Comments
I have some experience with this, and it's true that I have to be very careful when updating an R package that uses compiled code on Windows. Whenever an R package with compiled code is loaded, its DLL file is locked on Windows and can't be unlinked. If you are trying to upgrade the package via
UPDATE: Installing the binary version only shows a warning, but the package can't be used anymore in a new session. However, I remember that installing the package from source succeeds but with the wrong DLL, because I've fallen into this pit several times before (I will try later and report).
UPDATE-again: I can confirm my previous UPDATE: installing from binary only shows a warning, and the package won't be installed. However, installing from source shows a warning, but the installation succeeds and leads to a broken package that uses the wrong DLL file. (See the screenshot below)
It's not an RStudio-only issue: I did the following tests without using RStudio.
INSTALL FROM THE BINARY
INSTALL FROM THE SOURCE
The Steps to Reproduce
|
I think we have two options on this:
|
@shrektan thanks for confirming in R session. Your proposed solution looks interesting! |
Is this not an issue for, say |
I am fairly certain I recall snarky comments from Duncan Murdoch in the direction of Redmond, WA. This is AFAIK very much an OS-level issue R cannot do anything about [unless you consider the guerilla hacks outlined above legit; I am not sure I'd recommend going there]. (For what it is worth we get minor DLL hell on Linux too. Try loading package c (depending on a) when b (depending on a) is already loaded -- and you upgraded a. Happened to me yesterday when |
Both solutions mentioned above should be cross platform. The issue is pretty serious because it can lead to wrong results silently, so some solution should be implemented. The one with suffixing |
Principle of least surprise: don't do anything. Leaving dangling random copies of shared libraries on my system is not something I would want an R package I respect to do, sorry. |
As a follow-up, I really don't want to sound like the Grinch but if it cannot be done reliably then maybe it should not be done. Consider e.g. what Duncan just posted about trying to debug rgl by loading / unloading: https://stat.ethz.ch/pipermail/r-package-devel/2018q3/003150.html I still say "Don't do it". But that may be me. And how I work with littler on the command-line. |
Reading @shrektan's comment above, it seems R's @nilescbn Did you install from source or from binary? @shrektan tests show that if you installed from binary then the mismatch should not have occurred. You must have installed from source for this mismatch to happen. Did you see the message that @shrektan underlined in red in the screenshot? It would make sense you did install from source because you were very quick to report the problem just after 1.11.6 went to CRAN. So quick that it was likely that binaries weren't available yet and that's the very reason you installed from source. We need you to confirm so that we can suggest the correct fix to r-core. |
Do we know if Python, Ruby, ... manage this better? AFAIK (and I know little Windoze) this is a documented deficiency in the OS. Dancing around seems crazy. But then again, more power to you all as you have done some plainly "impossible" things before too. |
I'll be dancing if this gets fixed. That's for sure. |
@mattdowle
Anyway, I think the better solution is, as you said, |
@shrektan I've removed my misguided comment above and replaced it. I hadn't read your original comment properly when I wrote mine. Based on your earlier comment above with the screenshots, I think base R is already doing a pretty good job then, but there's a glitch in the case when installing from source. In other words, there's already a warning: R knows this dll-in-use problem has happened! It's not that the warning should be an error, because people will still miss it whether it's a warning or an error. The problem is that the package is left in a mismatching R/dll state when installed from source. When it is installed from binary, it is ok in the sense that the mismatch R/dll state does not occur (it's left in a non-working state instead). The difference being to do with type=source or type=binary is not directly to do with whether the dll was created locally or downloaded from CRAN, it's just that a different logic path happens inside install.packages due to the type and there's a logic weakness in one of those code paths. Just talking out loud. Relying completely on your comment earlier above with the screen shots. |
So the easiest solution would be for base R to raise an error instead of a warning in such cases. Then it won't be possible to upgrade if you are using the package in another session, and that is not as big a problem as silently getting wrong answers. |
No. That's why I wrote :
In other words, if install.packages() fails with error but still leaves the package in a mismatch R/dll state, that's still not good enough. You may be thinking that the warning message is at the beginning of install.packages() before it starts to upgrade and so turning it into error will solve the problem. Maybe that is the case but I doubt it. It feels to me that the R code gets installed first and then the .dll later. If that's the case and the .dll can't be installed, the just-upgraded R code should be removed or something like that so the package is left in a non-working state that can't be loaded. So that there's no possibility of users getting silent wrong results due to still using the old version's dll. Or the dll could be upgraded first by install.packages() and only if successful would it proceed to upgrade the R code. This seems to be this line and it happens at the end by the looks of it : with an ominous comment :
|
@mattdowle I've modified my previous comment and made it clearer. Actually, I double checked my steps (see the end of my first comment) and I'm pretty sure this is what happens (at least on my computer).
I agree with your options:
The key point is to prohibit users from using a mismatched DLL file silently. |
Maybe |
@eddelbuettel why (on earth) optional? |
Fair point. I was just wary of any change in behavior. For another classic on that from just yesterday, see here. |
I filed this issue with r-core : |
Looks like you all have moved well into the issue here... If still relevant, @mattdowle, I can't remember for certain but would bet I did install from source. I say this because if RStudio prompts me with the question of whether to install from source because the source version is "later" than the binary, I always choose source (unless it's a package like In trying to reproduce the error again this morning, RStudio did not prompt me. I understand that the binary "catches up" to the source version on CRAN after a few days, so perhaps that's what has happened. Also this morning, when attempting to install v 1.11.6 without restarting the session, I got this warning:
I don't recall ever seeing that message before. And I definitely didn't see it last week. |
@nilescbn Thanks for confirming. What you're describing all fits. |
Thanks to @gaborcsardi suggesting in R#17478 to use @shrektan When PR #3088 is merged, could you regenerate the broken state condition as you did before (having another session open with data.table loaded) but this time upgrading to v1.11.9 by setting type="source". You should see the warning as before about the in-use dll problem. But this time when you try to load v1.11.9 afterwards, data.table should emit the new error and halt asking you to reinstall data.table. I'm not sure this will work because the MD5 file is not created when installing from source, is it? At least, it's not created for me on Linux when I install from source. So if the user installs from source on Windows (as they are prompted to do just after a new release to CRAN when CRAN source version > CRAN binary version) I guess there is no MD5 file. In that case, we could emit a warning on load on Windows if there is no MD5 file, to ensure Windows users only use binaries from CRAN. But is that too strong? |
Definitely too strong. Many people use Rtools and the latest version of data.table. We could emit the warning once, or add some function to disable that warning easily, so users don't have to set options in Rprofile. Maybe we can have |
One possibility would be to error on load if (a) on Windows

.onLoad <- function(libname, pkgname) {
  # Runs when loaded but not attached to search() path; e.g., when a package
  # just Imports (not Depends on) data.table
  if (.Platform$OS.type == "windows" &&
      !isTRUE(tools::checkMD5sums("data.table")) &&
      !identical(Sys.getenv("_R_DATATABLE_REQUIRES_BINARY_"), "FALSE")) {
    # checkMD5sums outputs messages using cat() and returns NA when the MD5
    # file is not available. The MD5 file is included in the binary builds
    # that CRAN produces.
    stop("Using data.table on Windows but the MD5 checksum is invalid.\n\n",
         "On Windows, data.table should be installed from a binary ",
         "with a valid MD5 checksum. If you are an advanced user and ",
         "are certain the DLL will be properly updated, you can install ",
         "from source and use data.table by setting the following ",
         "environment variable before loading.\n\t",
         'Sys.setenv("_R_DATATABLE_REQUIRES_BINARY_" = "FALSE").\n')
    # https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17478
  }
}

This way the unusual case of installing from source can still be used by setting an environment variable but for the majority of cases it's caught. |
But it seems that MD5 is not present when installing from source, so it is pretty much useless to depend on that. And setting an environment variable is even more annoying than setting an R option. The next time the user boots the machine and tries to run a script they will see the error again, and then they will need to google (and learn very useful stuff of course, but they might not appreciate it) how to set it permanently. I would emit a warning once (per version), and mark somewhere internally that the user has been warned. Or go for a more reliable solution, such as the one mentioned above, to store version in C and compare in |
Warning (not error) on load would be better, agreed. I don't think it's 'pretty much useless' to use MD5 though. The vast majority of Windows users install and use the binary from CRAN. When they upgrade the day after a new release and CRAN source version > CRAN binary version, they are prompted to install from source, which many of them do. The absence of MD5 is probably (not yet confirmed) an indicator that they did that. It was a temporary measure for them to install from source. When the binary version is available on CRAN a day or two later, they should reinstall using the binary to remove the startup message/warning. I think they will appreciate that. Most users will never see the startup warning. |
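The "warn once per version" idea mentioned above could be sketched roughly as follows. This is only an illustration, not code from data.table: the function name `warn_once_per_version`, the marker-file approach, and the `marker_dir` argument are all assumptions, and where such a marker should live on a user's machine is left open.

```r
# Hypothetical helper (not part of data.table): warn at most once per
# installed version by leaving a small marker file behind.
warn_once_per_version <- function(version, msg, marker_dir) {
  marker <- file.path(marker_dir, paste0("warned-", version))
  # A marker for this version already exists: stay silent.
  if (file.exists(marker)) return(invisible(FALSE))
  dir.create(marker_dir, recursive = TRUE, showWarnings = FALSE)
  file.create(marker)
  warning(msg, call. = FALSE)
  invisible(TRUE)
}

# Usage sketch: the directory here is an arbitrary choice for the example.
# warn_once_per_version("1.11.9",
#                       "Installed from source; please reinstall the binary.",
#                       file.path(tempdir(), "dt-markers"))
```

A marker keyed on the version means the warning reappears after each upgrade, which is when the mismatch risk returns.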
I disagree jangorecki.
I claim that installing data.table from source is very unusual for Windows users. I for one have not been able to compile You can set system or user level environment variables, so if a user really really wants to install from source he only needs to set the environment variable once. It should persist between boots. And the error message I provided makes it clear how to install immediately. A big problem with a single warning for each install is that users might automatically install via I think the cost of inconvenience is worth the risk. |
OK, it was just me compiling data.table frequently. Back then (when I was on Windows) there was no OpenMP in data.table, so that might have been easier, just Rtools. It would be best to rely on MD5 actually, but that requires R to always compute it. Or even better, R should verify the MD5 after installation. |
Actually, I've never had difficulties when compiling from source on Windows (openmp is enabled by default?) but always on OSX (have to modify ~/.R/Makevars first)... @mattdowle I've tried installing 1.11.9 (since not on CRAN yet I compiled a source file and use |
@mattdowle It appears that when building a source package from package files, the file with the MD5 hashes (named MD5) is added to the .tar.gz in the top directory. When installing from that source, the MD5s are checked, but not saved into the library. |
What about renaming the The library name would need to be updated in the following places:
This could probably be automated with a configure script if need be. |
Thanks for the suggestion. I seem to remember something like that (version number in .so/.dll name) being suggested before but there were some reservations. How about adding this to the C: (I never got a response from R-core to the issue I raised.) |
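For readers following along, the general idea of comparing the version compiled into the library with the version of the R code at load time might look roughly like the sketch below. It is not the code from the PR: `check_dll_version` and its `get_dll_version` argument are hypothetical names, with `get_dll_version` standing in for whatever call reaches into the compiled library in a real package.

```r
# Hypothetical sketch: fail loudly if the compiled library and the R code
# disagree about the package version.
check_dll_version <- function(pkg_version, get_dll_version) {
  # If the lookup itself fails (e.g. the old library lacks the entry point),
  # treat that as a mismatch as well.
  dll_version <- tryCatch(as.character(get_dll_version()),
                          error = function(e) NA_character_)
  if (is.na(dll_version) ||
      !identical(dll_version, as.character(pkg_version))) {
    stop("The compiled library reports version ", dll_version,
         " but the R code is version ", as.character(pkg_version), ". ",
         "Please close all R sessions and reinstall the package.",
         call. = FALSE)
  }
  invisible(TRUE)
}

# Usage sketch with a stand-in for the compiled side:
# check_dll_version("1.11.9", function() "1.11.9")
```

Running the check once at load, rather than on every `.Call()`, keeps the cost negligible while still preventing the package from being used in a mismatched state.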
@shrektan Can you try once more to reproduce the mismatch state please to see if PR #3237 catches it? Since the |
@mattdowle I can confirm it works. Installing from source leads to the error saying object dllVersion can't be found. |
Thanks @shrektan ! Let's close for now then and monitor closely. As we see reports of the new data.table check, we can follow up with R-core on the root cause which affects all packages (which have compiled code) in the issue I raised : https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17478. |
So that R code in a package calling an old version of that package's .dll on Windows can't happen. Almost every time we upgrade, someone somewhere on Windows hits this problem; e.g. #3055.
The worry is that it happens more than we know, but users get wrong results silently. It's quite fortunate when a mismatch is severe enough to cause an error, so at least the user knows something is wrong.
But how to achieve this? It might mean checking the version matches between R package and its .dll on each and every .Call() call, which might be expensive, involve a lot of code changes at every interface point, and might be insufficient if we miss some interface points either now or in the future.
Maybe it's better to fix the issue upstream in R itself (install.packages) or RStudio if it's an RStudio-only problem, since it must affect other packages too. We need someone with i) the will and ii) Windows, to solve it once and for all, and we need precise information from users on Windows who hit this problem to tell us the conditions under which they upgraded and the problem happens. (My guess is that they upgrade in one R session while another R session has the package loaded using the .dll).