Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-38430: [R] Add test + fix corner cases after nixlibs.R refactor #38534

Merged
merged 8 commits into from
Nov 10, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
46 changes: 32 additions & 14 deletions r/tools/nixlibs.R
Original file line number Diff line number Diff line change
Expand Up @@ -35,7 +35,9 @@ exit <- function(..., .status = 1) {


# checks the nightly repo for the latest nightly version X.Y.Z.100<dev>
find_latest_nightly <- function(description_version) {
find_latest_nightly <- function(description_version,
list_uri = "https://nightlies.apache.org/arrow/r/src/contrib/PACKAGES",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh yeah that makes more sense ^^

hush = quietly) {
if (!startsWith(arrow_repo, "https://nightlies.apache.org/arrow/r")) {
lg("Detected non standard dev repo: %s, not checking latest nightly version.", arrow_repo)
return(description_version)
Expand All @@ -44,17 +46,28 @@ find_latest_nightly <- function(description_version) {
res <- try(
{
# Binaries are only uploaded if all jobs pass so can just look at the source versions.
urls <- readLines("https://nightlies.apache.org/arrow/r/src/contrib")
versions <- grep("arrow_.*\\.tar\\.gz", urls, value = TRUE)
versions <- sub(".*arrow_(.*)\\.tar\\.gz.*", "\\1", x = versions)
versions <- sapply(versions, package_version)
versions <- data.frame(do.call(rbind, versions))
matching_major <- versions[versions$X1 == description_version[1, 1], ]
latest <- matching_major[which.max(matching_major$X4), ]
package_version(paste0(latest, collapse = "."))
urls <- readLines(list_uri)
versions <- grep("Version:\\s*.*?", urls, value = TRUE)
versions <- sort(package_version(sub("Version:\\s*", "\\1", versions)))
major_versions <- versions$major

description_version_major <- as.integer(description_version[1, 1])
matching_major <- major_versions == description_version_major
if (!any(matching_major)) {
lg(
"No nightly binaries were found for version %s: falling back to libarrow build from source",
description_version
)

return(description_version)
}

versions <- versions[matching_major]
max(versions)
},
silent = quietly
silent = hush
)

if (inherits(res, "try-error")) {
lg("Failed to find latest nightly for %s", description_version)
latest <- description_version
Expand Down Expand Up @@ -832,16 +845,19 @@ quietly <- !env_is("ARROW_R_DEV", "true")

not_cran <- env_is("NOT_CRAN", "true")

if (is_release & !test_mode) {
if (is_release) {
VERSION <- VERSION[1, 1:3]
arrow_repo <- paste0(getOption("arrow.repo", sprintf("https://apache.jfrog.io/artifactory/arrow/r/%s", VERSION)), "/libarrow/")
} else if(!test_mode) {
} else {
not_cran <- TRUE
arrow_repo <- paste0(getOption("arrow.dev_repo", "https://nightlies.apache.org/arrow/r"), "/libarrow/")
}

if (!is_release && !test_mode) {
VERSION <- find_latest_nightly(VERSION)
}

# To collect dirs to rm on exit, use del() to add dirs
# To collect dirs to rm on exit, use cleanup() to add dirs
# we reset it to avoid errors on reruns in the same session.
options(.arrow.cleanup = character())
on.exit(unlink(getOption(".arrow.cleanup"), recursive = TRUE), add = TRUE)
Expand All @@ -867,6 +883,8 @@ build_ok <- !env_is("LIBARROW_BUILD", "false")
# https://arrow.apache.org/docs/developers/cpp/building.html#offline-builds)
download_ok <- !test_mode && !env_is("TEST_OFFLINE_BUILD", "true")

download_libarrow_ok <- download_ok && !env_is("LIBARROW_DOWNLOAD", "false")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this addition? Would not LIBARROW_BINARY=false do the same?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forget what exactly it caused to fail, but I think the idea is that LIBARROW_BINARY controls whether or not static libs are attempted, sort of, and LIBARROW_DOWNLOAD strictly controls whether or not an http request is issued to get that binary. Mostly I just noticed that it was already set in the CI job ( https://github.com/apache/arrow/actions/runs/6801463295/job/18492381028?pr=38534#step:5:1094 ) and it was somewhat complicated to figure out how to wire up LIBARROW_BINARY because of the levels of abstraction on top of the R CI jobs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah ok, fine with me.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you mind adding this to the other env-var documentation we have at

#### R package configuration
There are a number of other variables that affect the `configure` script and
the bundled build script. All boolean variables are case-insensitive.
| Name | Description | Default |
| --- | --- | :-: |
| `LIBARROW_BUILD` | Allow building from source | `true` |
| `LIBARROW_BINARY` | Try to install `libarrow` binary instead of building from source | (unset) |
| `LIBARROW_MINIMAL` | Build with minimal features enabled | (unset) |
| `NOT_CRAN` | Set `LIBARROW_BINARY=true` and `LIBARROW_MINIMAL=false` | `false` |
| `ARROW_R_DEV` | More verbose messaging and regenerates some code | `false` |
| `ARROW_USE_PKG_CONFIG` | Use `pkg-config` to search for `libarrow` install | `true` |
| `LIBARROW_DEBUG_DIR` | Directory to save source build logs | (unset) |
| `CMAKE` | Alternative CMake path | (unset) |
See below for more in-depth explanations of these environment variables.
* `LIBARROW_BINARY` : By default on many distributions, or if explicitly set to
`true`, the script will determine whether there is a prebuilt libarrow that
will work with your system. You can set it to `false` to skip this option
altogether, or you can specify a string "distro-version" that corresponds to
a binary that is available, to override what this function may discover by
default. Possible values are: "linux-openssl-1.0", "linux-openssl-1.1",
"linux-openssl-3.0".
* `LIBARROW_BUILD` : If set to `false`, the build script
will not attempt to build the C++ from source. This means you will only get
a working arrow R package if a prebuilt binary is found.
Use this if you want to avoid compiling the C++ library, which may be slow
and resource-intensive, and ensure that you only use a prebuilt binary.
* `LIBARROW_MINIMAL` : If set to `false`, the build script
will enable some optional features, including S3
support and additional alternative memory allocators. This will increase the
source build time but results in a more fully functional library. If set to
`true` turns off Parquet, Datasets, compression libraries, and other optional
features. This is not commonly used but may be helpful if needing to compile
on a platform that does not support these features, e.g. Solaris.
* `NOT_CRAN` : If this variable is set to `true`, as the `devtools` package does,
the build script will set `LIBARROW_BINARY=true` and `LIBARROW_MINIMAL=false`
unless those environment variables are already set. This provides for a more
complete and fast installation experience for users who already have
`NOT_CRAN=true` as part of their workflow, without requiring additional
environment variables to be set.
* `ARROW_R_DEV` : If set to `true`, more verbose messaging will be printed
in the build script. `arrow::install_arrow(verbose = TRUE)` sets this.
This variable also is needed if you're modifying C++
code in the package: see the developer guide article.
* `ARROW_USE_PKG_CONFIG`: If set to `false`, the configure script won't look for
Arrow libraries on your system and instead will look to download/build them.
Use this if you have a version mismatch between installed system libraries and
the version of the R package you're installing.
* `LIBARROW_DEBUG_DIR` : If the C++ library building from source fails (`cmake`),
there may be messages telling you to check some log file in the build directory.
However, when the library is built during R package installation,
that location is in a temp directory that is already deleted.
To capture those logs, set this variable to an absolute (not relative) path
and the log files will be copied there.
The directory will be created if it does not exist.
* `CMAKE` : When building the C++ library from source, you can specify a
`/path/to/cmake` to use a different version than whatever is found on the `$PATH`.
?

Doing so might also clarify better for all of us the differences between these / how they work together


# This "tools/thirdparty_dependencies" path, within the tar file, might exist if
# create_package_with_all_dependencies() was run, or if someone has created it
# manually before running make build.
Expand Down Expand Up @@ -904,7 +922,7 @@ if (!test_mode && !file.exists(api_h)) {
lg("File not found: %s ($ARROW_DOWNLOADED_BINARIES)", bin_zip)
bin_file <- NULL
}
} else if (download_ok) {
} else if (download_libarrow_ok) {
binary_flavor <- identify_binary()
if (!is.null(binary_flavor)) {
# The env vars say we can, and we've determined a lib that should work
Expand Down
65 changes: 65 additions & 0 deletions r/tools/test-nixlibs.R
Original file line number Diff line number Diff line change
Expand Up @@ -155,3 +155,68 @@ test_that("check_allowlist", {
expect_true(check_allowlist("redhat", tempfile())) # remote allowlist doesn't exist, so we fall back to the default list, which contains redhat
expect_false(check_allowlist("debian", tempfile()))
})

test_that("find_latest_nightly()", {
tf <- tempfile()
tf_uri <- paste0("file://", tf)
on.exit(unlink(tf))

writeLines(
c(
"Version: 13.0.0.100000333",
"Version: 13.0.0.100000334",
"Version: 13.0.0.100000335",
"Version: 14.0.0.100000001"
),
tf
)

expect_output(
expect_identical(
find_latest_nightly(package_version("13.0.1.9000"), list_uri = tf_uri),
package_version("13.0.0.100000335")
),
"Found latest nightly"
)

expect_output(
expect_identical(
find_latest_nightly(package_version("14.0.0.9000"), list_uri = tf_uri),
package_version("14.0.0.100000001")
),
"Found latest nightly"
)

expect_output(
expect_identical(
find_latest_nightly(package_version("15.0.0.9000"), list_uri = tf_uri),
package_version("15.0.0.9000")
),
"No nightly binaries were found for version"
)

# Check empty input
writeLines(character(), tf)
expect_output(
expect_identical(
find_latest_nightly(package_version("15.0.0.9000"), list_uri = tf_uri),
package_version("15.0.0.9000")
),
"No nightly binaries were found for version"
)

# Check input that will throw an error
expect_output(
expect_identical(
suppressWarnings(
find_latest_nightly(
package_version("15.0.0.9000"),
list_uri = "this is not a URI",
hush = TRUE
)
),
package_version("15.0.0.9000")
),
"Failed to find latest nightly"
)
})
1 change: 1 addition & 0 deletions r/vignettes/install.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -283,6 +283,7 @@ the bundled build script. All boolean variables are case-insensitive.
| --- | --- | :-: |
| `LIBARROW_BUILD` | Allow building from source | `true` |
| `LIBARROW_BINARY` | Try to install `libarrow` binary instead of building from source | (unset) |
| `LIBARROW_DOWNLOAD` | Set to `false` to explicitly forbid fetching a `libarrow` binary | (unset) |
| `LIBARROW_MINIMAL` | Build with minimal features enabled | (unset) |
| `NOT_CRAN` | Set `LIBARROW_BINARY=true` and `LIBARROW_MINIMAL=false` | `false` |
| `ARROW_R_DEV` | More verbose messaging and regenerates some code | `false` |
Expand Down
Loading