Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-38430: [R] Add test + fix corner cases after nixlibs.R refactor #38534

Merged
merged 8 commits into from
Nov 10, 2023

Conversation

paleolimbot
Copy link
Member

@paleolimbot paleolimbot commented Oct 31, 2023

Rationale for this change

A few rough edges exist after #38236 including:

  • When zero or 1 nightly with the matching major version exist, detection of the latest nightly might fail
  • At least one CI job is pulling nightlies instead of using the version from the current commit

What changes are included in this PR?

  • Clean up find_latest_nightly() + add test
  • Ensure all CI jobs are not using find_latest_nightly()

Are these changes tested?

Yes (test added)

Are there any user-facing changes?

No

Copy link

⚠️ GitHub issue #38430 has been automatically assigned in GitHub to PR creator.

@paleolimbot
Copy link
Member Author

@github-actions crossbow submit r-binary-packages

Copy link

github-actions bot commented Nov 2, 2023

Revision: 1c25889

Submitted crossbow builds: ursacomputing/crossbow @ actions-84648ca1ab

Task Status
r-binary-packages Github Actions

@paleolimbot paleolimbot marked this pull request as ready for review November 3, 2023 16:07
@paleolimbot
Copy link
Member Author

@github-actions crossbow submit r-binary-packages

Copy link

github-actions bot commented Nov 3, 2023

Revision: 1e8d8f4

Submitted crossbow builds: ursacomputing/crossbow @ actions-c6ce43e763

Task Status
r-binary-packages Github Actions

@paleolimbot
Copy link
Member Author

@assignUser Can you take a look at this?

I'd like to add in a comment regarding whether we do or do not intend to run detection when source()ing the nixlibs in the test-nixlibs.R. It's not a particularly good test if that is the intention, although I think it's fine if sourcing the file in test mode happens to run all the way though without warning.

Copy link
Member

@assignUser assignUser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some nice additions, looks good!

I think it makes sense to not look for latest nightly when sourceing the script. As that only happens for tests and there it makes more sense to test the function directly and not produce spurious output in the test file.

@@ -35,7 +35,9 @@ exit <- function(..., .status = 1) {


# checks the nightly repo for the latest nightly version X.Y.Z.100<dev>
find_latest_nightly <- function(description_version) {
find_latest_nightly <- function(description_version,
list_uri = "https://nightlies.apache.org/arrow/r/src/contrib/PACKAGES",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh yeah that makes more sense ^^

@@ -867,6 +887,8 @@ build_ok <- !env_is("LIBARROW_BUILD", "false")
# https://arrow.apache.org/docs/developers/cpp/building.html#offline-builds)
download_ok <- !test_mode && !env_is("TEST_OFFLINE_BUILD", "true")

download_libarrow_ok <- download_ok && !env_is("LIBARROW_DOWNLOAD", "false")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this addition? Would not LIBARROW_BINARY=false do the same?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forget what exactly it caused to fail, but I think the idea is that LIBARROW_BINARY controls whether or not static libs are attempted, sort of, and LIBARROW_DOWNLOAD strictly controls whether or not an http request is issued to get that binary. Mostly I just noticed that it was already set in the CI job ( https://github.com/apache/arrow/actions/runs/6801463295/job/18492381028?pr=38534#step:5:1094 ) and it was somewhat complicated to figure out how to wire up LIBARROW_BINARY because of the levels of abstraction on top of the R CI jobs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah ok, fine with me.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you mind adding this to the other env-var documentation we have at

#### R package configuration
There are a number of other variables that affect the `configure` script and
the bundled build script. All boolean variables are case-insensitive.
| Name | Description | Default |
| --- | --- | :-: |
| `LIBARROW_BUILD` | Allow building from source | `true` |
| `LIBARROW_BINARY` | Try to install `libarrow` binary instead of building from source | (unset) |
| `LIBARROW_MINIMAL` | Build with minimal features enabled | (unset) |
| `NOT_CRAN` | Set `LIBARROW_BINARY=true` and `LIBARROW_MINIMAL=false` | `false` |
| `ARROW_R_DEV` | More verbose messaging and regenerates some code | `false` |
| `ARROW_USE_PKG_CONFIG` | Use `pkg-config` to search for `libarrow` install | `true` |
| `LIBARROW_DEBUG_DIR` | Directory to save source build logs | (unset) |
| `CMAKE` | Alternative CMake path | (unset) |
See below for more in-depth explanations of these environment variables.
* `LIBARROW_BINARY` : By default on many distributions, or if explicitly set to
`true`, the script will determine whether there is a prebuilt libarrow that
will work with your system. You can set it to `false` to skip this option
altogether, or you can specify a string "distro-version" that corresponds to
a binary that is available, to override what this function may discover by
default. Possible values are: "linux-openssl-1.0", "linux-openssl-1.1",
"linux-openssl-3.0".
* `LIBARROW_BUILD` : If set to `false`, the build script
will not attempt to build the C++ from source. This means you will only get
a working arrow R package if a prebuilt binary is found.
Use this if you want to avoid compiling the C++ library, which may be slow
and resource-intensive, and ensure that you only use a prebuilt binary.
* `LIBARROW_MINIMAL` : If set to `false`, the build script
will enable some optional features, including S3
support and additional alternative memory allocators. This will increase the
source build time but results in a more fully functional library. If set to
`true` turns off Parquet, Datasets, compression libraries, and other optional
features. This is not commonly used but may be helpful if needing to compile
on a platform that does not support these features, e.g. Solaris.
* `NOT_CRAN` : If this variable is set to `true`, as the `devtools` package does,
the build script will set `LIBARROW_BINARY=true` and `LIBARROW_MINIMAL=false`
unless those environment variables are already set. This provides for a more
complete and fast installation experience for users who already have
`NOT_CRAN=true` as part of their workflow, without requiring additional
environment variables to be set.
* `ARROW_R_DEV` : If set to `true`, more verbose messaging will be printed
in the build script. `arrow::install_arrow(verbose = TRUE)` sets this.
This variable also is needed if you're modifying C++
code in the package: see the developer guide article.
* `ARROW_USE_PKG_CONFIG`: If set to `false`, the configure script won't look for
Arrow libraries on your system and instead will look to download/build them.
Use this if you have a version mismatch between installed system libraries and
the version of the R package you're installing.
* `LIBARROW_DEBUG_DIR` : If the C++ library building from source fails (`cmake`),
there may be messages telling you to check some log file in the build directory.
However, when the library is built during R package installation,
that location is in a temp directory that is already deleted.
To capture those logs, set this variable to an absolute (not relative) path
and the log files will be copied there.
The directory will be created if it does not exist.
* `CMAKE` : When building the C++ library from source, you can specify a
`/path/to/cmake` to use a different version than whatever is found on the `$PATH`.
?

Doing so might also clarify better for all of us the differences between these / how they work together

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting committer review Awaiting committer review labels Nov 9, 2023
@paleolimbot paleolimbot requested a review from jonkeane November 9, 2023 13:20
r/tools/nixlibs.R Outdated Show resolved Hide resolved
r/tools/nixlibs.R Outdated Show resolved Hide resolved
Co-authored-by: Jonathan Keane <jkeane@gmail.com>
@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Nov 9, 2023
@paleolimbot
Copy link
Member Author

@github-actions crossbow submit r-binary-packages

Copy link

github-actions bot commented Nov 9, 2023

Revision: 371e003

Submitted crossbow builds: ursacomputing/crossbow @ actions-2269729e3b

Task Status
r-binary-packages Github Actions

@paleolimbot
Copy link
Member Author

Hmm...I'm also getting

   *** Trying Arrow C++ in ARROW_HOME: /Users/deweydunnington/.r-arrow-dev-build/dist
   **** Not using: C++ library version (14.0.0-SNAPSHOT) does not match R package (14.0.0.9000)

locally, which is problematic!

@paleolimbot
Copy link
Member Author

Oh just kidding...the C++ version bumped since the last time I rebuilt.

@jonkeane
Copy link
Member

If the CI is good, I'm ok to merge this, though if you're looking for some other people to test this locally let me know and I'll pull it and try it (I will admit I have not yet!)

@paleolimbot
Copy link
Member Author

Thank you for taking a look! I checked this locally and I'm comfortable merging 🙂

@paleolimbot paleolimbot merged commit c260a24 into apache:main Nov 10, 2023
10 checks passed
@paleolimbot paleolimbot removed the awaiting change review Awaiting change review label Nov 10, 2023
Copy link

After merging your PR, Conbench analyzed the 5 benchmarking runs that have been run so far on merge-commit c260a24.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 3 possible false positives for unstable benchmarks that are known to sometimes produce them.

@jonkeane
Copy link
Member

Thanks for this!

loicalleyne pushed a commit to loicalleyne/arrow that referenced this pull request Nov 13, 2023
…tor (apache#38534)

### Rationale for this change

A few rough edges exist after apache#38236 including:

- When zero or 1 nightly with the matching major version exist, detection of the latest nightly might fail
- At least one CI job is pulling nightlies instead of using the version from the current commit

### What changes are included in this PR?

- Clean up `find_latest_nightly()` + add test
- Ensure all CI jobs are not using `find_latest_nightly()`

### Are these changes tested?

Yes (test added)

### Are there any user-facing changes?

No
* Closes: apache#38430

Lead-authored-by: Dewey Dunnington <dewey@fishandwhistle.net>
Co-authored-by: Jonathan Keane <jkeane@gmail.com>
Signed-off-by: Dewey Dunnington <dewey@fishandwhistle.net>
@paleolimbot paleolimbot deleted the r-nixlibs-improve branch November 15, 2023 13:12
raulcd pushed a commit that referenced this pull request Nov 28, 2023
…38534)

### Rationale for this change

A few rough edges exist after #38236 including:

- When zero or 1 nightly with the matching major version exist, detection of the latest nightly might fail
- At least one CI job is pulling nightlies instead of using the version from the current commit

### What changes are included in this PR?

- Clean up `find_latest_nightly()` + add test
- Ensure all CI jobs are not using `find_latest_nightly()`

### Are these changes tested?

Yes (test added)

### Are there any user-facing changes?

No
* Closes: #38430

Lead-authored-by: Dewey Dunnington <dewey@fishandwhistle.net>
Co-authored-by: Jonathan Keane <jkeane@gmail.com>
Signed-off-by: Dewey Dunnington <dewey@fishandwhistle.net>
assignUser pushed a commit to assignUser/arrow that referenced this pull request Dec 1, 2023
…tor (apache#38534)

### Rationale for this change

A few rough edges exist after apache#38236 including:

- When zero or 1 nightly with the matching major version exist, detection of the latest nightly might fail
- At least one CI job is pulling nightlies instead of using the version from the current commit

### What changes are included in this PR?

- Clean up `find_latest_nightly()` + add test
- Ensure all CI jobs are not using `find_latest_nightly()`

### Are these changes tested?

Yes (test added)

### Are there any user-facing changes?

No
* Closes: apache#38430

Lead-authored-by: Dewey Dunnington <dewey@fishandwhistle.net>
Co-authored-by: Jonathan Keane <jkeane@gmail.com>
Signed-off-by: Dewey Dunnington <dewey@fishandwhistle.net>
dgreiss pushed a commit to dgreiss/arrow that referenced this pull request Feb 19, 2024
…tor (apache#38534)

### Rationale for this change

A few rough edges exist after apache#38236 including:

- When zero or 1 nightly with the matching major version exist, detection of the latest nightly might fail
- At least one CI job is pulling nightlies instead of using the version from the current commit

### What changes are included in this PR?

- Clean up `find_latest_nightly()` + add test
- Ensure all CI jobs are not using `find_latest_nightly()`

### Are these changes tested?

Yes (test added)

### Are there any user-facing changes?

No
* Closes: apache#38430

Lead-authored-by: Dewey Dunnington <dewey@fishandwhistle.net>
Co-authored-by: Jonathan Keane <jkeane@gmail.com>
Signed-off-by: Dewey Dunnington <dewey@fishandwhistle.net>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[R] Add test for new nixlibs.R
3 participants