-
Notifications
You must be signed in to change notification settings - Fork 3.6k
/
Copy pathinstall.Rmd
554 lines (415 loc) · 27.7 KB
/
install.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
---
title: "Installing on Linux"
description: >
Installing arrow on linux usually just works, but occasionally poses
problems. Learn how to handle installation problems if and when they arise
output: rmarkdown::html_vignette
---
In most cases, `install.packages("arrow")` should just work. There are things you can do to make the installation faster, documented in this article. If for some reason installation does not work, set the environment variable `ARROW_R_DEV=true`, retry, and share the logs with us.
## Background
The Apache Arrow project is implemented in multiple languages, and the R package depends on the Arrow C++ library (referred to from here on as libarrow). This means that when you install arrow, you need both the R and C++ versions. If you install arrow from CRAN on a machine running Windows or MacOS, when you call `install.packages("arrow")`, a precompiled binary containing both the R package and libarrow will be downloaded. However, CRAN does not host R package binaries for Linux, and so you must choose from one of the alternative approaches.
This article outlines the recommend approaches to installing arrow on Linux, starting from the simplest and least customizable to the most complex but with more flexbility to customize your installation.
The primary audience for this document is arrow R package _users_ on Linux, and not Arrow _developers_. Additional resources for developers are listed at the end of this article.
## System dependencies
The arrow package is designed to work with very minimal system requirements,
but there are a few things to note.
### Compilers
As of version 10.0.0, arrow requires a C++17 compiler to build.
For `gcc`, this generally means version 7 or newer. Most contemporary Linux
distributions have a new enough compiler; however, CentOS 7 is a notable
exception, as it ships with gcc 4.8.
If you are on CentOS 7, to build arrow you will need to install a newer `devtoolset`, and you'll need to update R's Makevars to define the `CXX17` variables. This script installs `devtoolset-8` and configures R to be able to use C++17:
```
#!/usr/bin/env bash
yum install -y centos-release-scl
yum install -y devtoolset-8
# Optional: also install cloud storage dependencies, as described below
yum install -y libcurl-devel openssl-devel
source /opt/rh/devtoolset-8/enable
if [ ! `R CMD config CXX17` ]; then
mkdir -p ~/.R
echo "CC = $(which gcc) -fPIC" >> ~/.R/Makevars
echo "CXX17 = $(which g++) -fPIC" >> ~/.R/Makevars
echo "CXX17STD = -std=c++17" >> ~/.R/Makevars
echo "CXX17FLAGS = ${CXX11FLAGS}" >> ~/.R/Makevars
fi
```
Note that the C++17 compiler is only required at *build* time. You don't need
to enable the devtoolset every time you load the package. What's more, if you
install a binary package from RStudio Package Manager (see method 1a below), you
do not need to set up any of this. Likewise, if you `R CMD INSTALL --build`
arrow on a CentOS machine with the newer compilers, you can take the binary
package it produces and install it on any other CentOS machine without those
compilers.
### Libraries
Optional support for reading from cloud storage--AWS S3 and
Google Cloud Storage (GCS)--requires additional system dependencies:
* CURL: install `libcurl-devel` (rpm) or `libcurl4-openssl-dev` (deb)
* OpenSSL >= 1.0.2: install `openssl-devel` (rpm) or `libssl-dev` (deb)
The prebuilt binaries come with S3 and GCS support enabled, so you will need to meet these system requirements in order to use them. If you're building everything from source, the install script will check for the presence of these dependencies and turn off S3 and GCS support in the build if the prerequisites are not met--installation will succeed but without S3 or GCS functionality. If afterwards you install the missing system requirements, you'll need to reinstall the package in order to enable S3 and GCS support.
## Install release version (easy way)
On macOS and Windows, when you run `install.packages("arrow")` and install arrow from CRAN, you get an R binary package that contains a precompiled version of libarrow. Installing binaries is much easier that installing from source, but CRAN does not host binaries for Linux. This means that the default behaviour when you run `install.packages()` on Linux is to retrieve the source version of the R package and compile both the R package _and_ libarrow from source. We'll talk about this scenario in the next section (the "less easy" way), but first we'll suggest two faster alternatives that are usually much easier.
### Binary R package with libarrow binary via RSPM/conda
```{r, echo=FALSE, out.width="30%", fig.alt = "Graphic showing R and C++ logo inside the package icon"}
knitr::include_graphics("./r_binary_libarrow_binary.png")
```
If you want a quicker installation process, and by default a more fully-featured build, you could install arrow from [RStudio's public package manager](https://packagemanager.rstudio.com/client/#/), which hosts binaries for both Windows and Linux.
For example, if you are using Ubuntu 20.04 (Focal):
```r
options(
HTTPUserAgent =
sprintf(
"R/%s R (%s)",
getRversion(),
paste(getRversion(), R.version["platform"], R.version["arch"], R.version["os"])
)
)
install.packages("arrow", repos = "https://packagemanager.rstudio.com/all/__linux__/focal/latest")
```
Note that the User Agent header must be specified as in the example above.
Please check [the RStudio Package Manager: Admin Guide](https://docs.posit.co/rspm/admin/serving-binaries/#using-linux-binary-packages) for more details.
For other Linux distributions, to get the relevant URL, you can visit
[the RSPM site](https://packagemanager.rstudio.com/client/#/repos/1/overview),
click on 'binary', and select your preferred distribution.
Similarly, if you use `conda` to manage your R environment, you can get the
latest official release of the R package including libarrow via:
```shell
# Using the --strict-channel-priority flag on `conda install` causes very long
# solve times, so we add it directly to the config
conda config --set channel_priority strict
conda install -c conda-forge r-arrow
```
### R source package with libarrow binary
```{r, echo=FALSE, out.width="50%", fig.alt = "Graphic showing R logo in folder icon, then a plus sign, then C++ logo inside the package icon"}
knitr::include_graphics("./r_source_libarrow_binary.png")
```
Another way of achieving faster installation with all key features enabled is to use static libarrow binaries we host. These are used automatically on many Linux distributions (x86_64 architecture only), according to the [allowlist](https://github.com/apache/arrow/blob/main/r/tools/nixlibs-allowlist.txt). If your distribution isn't in the list, you can opt-in by setting the `NOT_CRAN` environment variable before you call `install.packages()`:
```{r, eval = FALSE}
Sys.setenv("NOT_CRAN" = "true")
install.packages("arrow")
```
This installs the source version of the R package, but during the installation process will check for compatible libarrow binaries that we host and use those if available. If no binary is available or can't be found, then this option falls back onto method 2 below (full source build), but setting the environment variable results in a more fully-featured build than default.
The libarrow binaries include support for AWS S3 and GCS, so they require the
libcurl and openssl libraries installed separately, as noted above.
If you don't have these installed, the libarrow binary won't be used, and you will fall back to the full source build (with S3 and GCS support disabled).
If the internet access of your computer doesn't allow downloading the libarrow binaries (e.g. if access is limited to CRAN), you can first identify the right source and version by trying to install on the offline computer:
```{r, eval = FALSE}
Sys.setenv("NOT_CRAN" = "true", "LIBARROW_BUILD" = FALSE, "ARROW_R_DEV" = TRUE)
install.packages("arrow")
# This will fail if no internet access, but will print the binaries URL
```
Then you can obtain the libarrow binaries (using a computer with internet access) and transfer the zip file to the target computer. Now you just have to tell the installer to use that pre-downloaded file:
```{r, eval = FALSE}
# Watchout: release numbers of the pre-downloaded libarrow must match CRAN!
Sys.setenv("ARROW_DOWNLOADED_BINARIES" = "/path/to/downloaded/libarrow.zip")
install.packages("arrow")
```
## Install release version (less easy)
```{r, echo=FALSE, out.width="50%", fig.alt = "Graphic showing R inside a folder icon, then a plus sign, then C++ logo inside a folder icon"}
knitr::include_graphics("./r_source_libarrow_source.png")
```
The "less easy" way to install arrow is to install both the R package and the underlying Arrow C++ library (libarrow) from source. This method is somewhat more
difficult because compiling and installing R packages with C++ dependencies
generally requires installing system packages, which you may not have privileges
to do, and/or building the C++ dependencies separately, which introduces all sorts
of additional ways for things to go wrong.
Installing from the full source build of arrow, compiling both C++ and R
bindings, will handle most of the dependency management for you, but it is
much slower than using binaries. However, if using binaries isn't an option
for you,or you wish to customize your Linux installation, the instructions in
this section explain how to do that.
### Basic configuration
If you wish to install libarrow from source instead of looking for pre-compiled
binaries, you can set the `LIBARROW_BINARY` variable.
```{r, eval = FALSE}
Sys.setenv("LIBARROW_BINARY" = FALSE)
```
By default, this is set to `TRUE`, and so libarrow will only be built from
source if this environment variable is set to `FALSE` or no compatible binary
for your OS can be found.
When compiling libarrow from source, you have the power to really fine-tune
which features to install. You can set the environment variable
`LIBARROW_MINIMAL` to `FALSE` to enable a more full-featured build including S3 support
and alternative memory allocators.
```{r, eval = FALSE}
Sys.setenv("LIBARROW_MINIMAL" = FALSE)
```
By default this variable is unset, which builds many commonly used features such as Parquet support but disables some features that are more costly to build, like S3 and GCS support. If set to `TRUE`, a trimmed-down version of
arrow is installed with all optional features disabled.
Note that in this guide, you will have seen us mention the environment variable
`NOT_CRAN` - this is a convenience variable, which when set to `TRUE`,
automatically sets `LIBARROW_MINIMAL` to `FALSE` and `LIBARROW_BINARY` to `TRUE`.
Building libarrow from source requires more time and resources than installing
a binary. We recommend that you set the environment variable `ARROW_R_DEV` to
`TRUE` for more verbose output during the installation process if anything goes
wrong.
```{r, eval = FALSE}
Sys.setenv("ARROW_R_DEV" = TRUE)
```
Once you have set these variables, call `install.packages()` to install arrow
using this configuration.
```{r, eval = FALSE}
install.packages("arrow")
```
The section below discusses environment variables you can set before calling
`install.packages("arrow")` to build from source and customise your configuration.
#### Handling libarrow dependencies
When you build libarrow from source, its dependencies will be automatically downloaded.
The environment variable `ARROW_DEPENDENCY_SOURCE` controls whether the libarrow
installation also downloads or installs all dependencies (when set to `BUNDLED`),
uses only system-installed dependencies (when set to `SYSTEM`) or checks
system-installed dependencies first and only installs dependencies which aren't
already present (when set to `AUTO`, the default).
These dependencies vary by platform; however, if you wish to install these
yourself prior to libarrow installation, we recommend that you take a look at
the [docker file for whichever of our CI builds](https://github.com/apache/arrow/tree/main/ci/docker)
(the ones ending in "cpp" are for building Arrow's C++ libaries, aka libarrow)
corresponds most closely to your setup. This will contain the most up-to-date
information about dependencies and minimum versions.
If downloading dependencies at build time is not an option, as when building on a system that is disconnected or behind a firewall, there are a few options. See "Offline builds" below.
#### Dependencies for S3 and GCS support
Support for working with data in S3 and GCS is not enabled in the default
source build, and it has additional system requirements as described above. To
enable it, set the environment variable `LIBARROW_MINIMAL=false` or
`NOT_CRAN=true` to choose the full-featured build, or more selectively set
`ARROW_S3=ON` and/or `ARROW_GCS=ON`.
When either feature is enabled, the install script will check for the presence
of the required dependencies, and if the prerequisites are met, it will turn
off S3 and GCS support--installation will succeed but without S3 or GCS
functionality. If afterwards you install the missing system requirements,
you'll need to reinstall the package in order to enable S3 and GCS support.
### Advanced configuration
In this section, we describe how to fine-tune your installation at a more granular level.
#### libarrow configuration
Some features are optional when you build Arrow from source - you can configure
whether these components are built via the use of environment variables. The
names of the environment variables which control these features and their
default values are shown below.
| Name | Description | Default Value |
| ---| --- | :-: |
| `ARROW_S3` | S3 support (if dependencies are met)* | `OFF` |
| `ARROW_GCS` | GCS support (if dependencies are met)* | `OFF` |
| `ARROW_JEMALLOC` | The `jemalloc` memory allocator | `ON` |
| `ARROW_MIMALLOC` | The `mimalloc` memory allocator | `ON` |
| `ARROW_PARQUET` | | `ON` |
| `ARROW_DATASET` | | `ON` |
| `ARROW_JSON` | The JSON parsing library | `ON` |
| `ARROW_WITH_RE2` | The RE2 regular expression library, used in some string compute functions | `ON` |
| `ARROW_WITH_UTF8PROC` | The UTF8Proc string library, used in many other string compute functions | `ON` |
| `ARROW_WITH_BROTLI` | Compression algorithm | `ON` |
| `ARROW_WITH_BZ2` | Compression algorithm | `ON` |
| `ARROW_WITH_LZ4` | Compression algorithm | `ON` |
| `ARROW_WITH_SNAPPY` | Compression algorithm | `ON` |
| `ARROW_WITH_ZLIB` | Compression algorithm | `ON` |
| `ARROW_WITH_ZSTD` | Compression algorithm | `ON` |
#### R package configuration
There are a number of other variables that affect the `configure` script and
the bundled build script. All boolean variables are case-insensitive.
| Name | Description | Default |
| --- | --- | :-: |
| `LIBARROW_BUILD` | Allow building from source | `true` |
| `LIBARROW_BINARY` | Try to install `libarrow` binary instead of building from source | (unset) |
| `LIBARROW_MINIMAL` | Build with minimal features enabled | (unset) |
| `NOT_CRAN` | Set `LIBARROW_BINARY=true` and `LIBARROW_MINIMAL=false` | `false` |
| `ARROW_R_DEV` | More verbose messaging and regenerates some code | `false` |
| `ARROW_USE_PKG_CONFIG` | Use `pkg-config` to search for `libarrow` install | `true` |
| `LIBARROW_DEBUG_DIR` | Directory to save source build logs | (unset) |
| `CMAKE` | Alternative CMake path | (unset) |
See below for more in-depth explanations of these environment variables.
* `LIBARROW_BINARY` : By default on many distributions, or if explicitly set to
`true`, the script will determine whether there is a prebuilt libarrow that
will work with your system. You can set it to `false` to skip this option
altogether, or you can specify a string "distro-version" that corresponds to
a binary that is available, to override what this function may discover by
default. Possible values are: "linux-openssl-1.0", "linux-openssl-1.1",
"linux-openssl-3.0".
* `LIBARROW_BUILD` : If set to `false`, the build script
will not attempt to build the C++ from source. This means you will only get
a working arrow R package if a prebuilt binary is found.
Use this if you want to avoid compiling the C++ library, which may be slow
and resource-intensive, and ensure that you only use a prebuilt binary.
* `LIBARROW_MINIMAL` : If set to `false`, the build script
will enable some optional features, including S3
support and additional alternative memory allocators. This will increase the
source build time but results in a more fully functional library. If set to
`true` turns off Parquet, Datasets, compression libraries, and other optional
features. This is not commonly used but may be helpful if needing to compile
on a platform that does not support these features, e.g. Solaris.
* `NOT_CRAN` : If this variable is set to `true`, as the `devtools` package does,
the build script will set `LIBARROW_BINARY=true` and `LIBARROW_MINIMAL=false`
unless those environment variables are already set. This provides for a more
complete and fast installation experience for users who already have
`NOT_CRAN=true` as part of their workflow, without requiring additional
environment variables to be set.
* `ARROW_R_DEV` : If set to `true`, more verbose messaging will be printed
in the build script. `arrow::install_arrow(verbose = TRUE)` sets this.
This variable also is needed if you're modifying C++
code in the package: see the developer guide article.
* `ARROW_USE_PKG_CONFIG`: If set to `false`, the configure script won't look for
Arrow libraries on your system and instead will look to download/build them.
Use this if you have a version mismatch between installed system libraries and
the version of the R package you're installing.
* `LIBARROW_DEBUG_DIR` : If the C++ library building from source fails (`cmake`),
there may be messages telling you to check some log file in the build directory.
However, when the library is built during R package installation,
that location is in a temp directory that is already deleted.
To capture those logs, set this variable to an absolute (not relative) path
and the log files will be copied there.
The directory will be created if it does not exist.
* `CMAKE` : When building the C++ library from source, you can specify a
`/path/to/cmake` to use a different version than whatever is found on the `$PATH`.
## Using install_arrow()
The previous instructions are useful for a fresh arrow installation, but arrow
provides the function `install_arrow()`. There are three common use cases for this function:
- You have arrow installed and want to upgrade to a different version
- You want to try to reinstall and fix issues with Linux C++ binaries
- You want to install a development build
Examples of using `install_arrow()` are shown below:
```r
install_arrow() # latest release
install_arrow(nightly = TRUE) # install development version
install_arrow(verbose = TRUE) # verbose output to debug install errors
```
Although this function is part of the arrow package, it is also available as
a standalone script, so you can access it without first installing the package:
```r
source("https://raw.githubusercontent.com/apache/arrow/main/r/R/install-arrow.R")
```
Notes:
- `install_arrow()` does not require environment variables to be set in order to
satisfy C++ dependencies.
- unlike packages like `tensorflow`, `blogdown`, and others that require external dependencies, you do not need to run `install_arrow()` after a successful arrow installation.
## Offline installation
The `install-arrow.R` file mentioned in the previous section includes a
function called `create_package_with_all_dependencies()`. Normally, when
installing on a computer with internet access, the build process will
download third-party dependencies as needed. This function provides a
way to download them in advance, which can be useful when installing Arrow
on a computer without internet access. The process is as follows:
**Step 1.** Using a computer with internet access, download dependencies:
* Install the arrow package **or** source the script directly using the following command:
```r
source("https://raw.githubusercontent.com/apache/arrow/main/r/R/install-arrow.R")
```
* Use the `create_package_with_all_dependencies()` function to create the installation bundle:
```r
create_package_with_all_dependencies("my_arrow_pkg.tar.gz")
```
* Copy the newly created `my_arrow_pkg.tar.gz` file to the computer without internet access
**Step 2.** On the computer without internet access, install the prepared package:
* Install the arrow package from the copied file:
```r
install.packages(
"my_arrow_pkg.tar.gz",
dependencies = c("Depends", "Imports", "LinkingTo")
)
```
This installation will build from source, so `cmake` must be available
* Run `arrow_info()` to check installed capabilities
Notes:
- arrow _can_ be installed on a computer without internet access
without using this function, but many useful features will be disabled,
as they depend on third-party components. More precisely,
`arrow::arrow_info()$capabilities()` will be `FALSE` for every
capability.
- If you are using binary packages you shouldn't need to this function. You
can download the appropriate binary from your package repository, transfer
that to the offline computer, and install that.
- If you're using RStudio Package Manager on Linux (RSPM), and you want to
make a source bundle with this function, make sure to set the first repository
in `options("repos")` to be a mirror that contains source packages. That is, the
repository needs to be something other than the RSPM binary mirror URLs.
## Offline installation (alternative)
A second method for offline installation is a little more hands-on. Follow these steps if you wish to try it:
* Download the dependency files (`cpp/thirdparty/download_dependencies.sh` may be helpful)
* Copy the directory of dependencies to the offline computer
* Create the environment variable `ARROW_THIRDPARTY_DEPENDENCY_DIR` on the offline computer, pointing to the copied directory.
* Install the arrow package as usual.
For offline installation using libarrow binaries, see Method 1b above.
## Troubleshooting
The intent is that `install.packages("arrow")` will just work and handle all C++
dependencies, but depending on your system, you may have better results if you
tune one of several parameters. Here are some known complications and ways to address them.
### Package failed to build C++ dependencies
If you see a message like
```
------------------------- NOTE ---------------------------
There was an issue preparing the Arrow C++ libraries.
See https://arrow.apache.org/docs/r/articles/install.html
---------------------------------------------------------
```
in the output when the package fails to install,
that means that installation failed to retrieve or build the libarrow version
compatible with the current version of the R package.
Please check the "Known installation issues" below to see if any apply, and if
none apply, set the environment variable `ARROW_R_DEV=TRUE` for more verbose
output and try installing again. Then,
please [report an issue](https://issues.apache.org/jira/projects/ARROW/issues)
and include the full installation output.
### Using system libraries
If a system library or other installed Arrow is found but it doesn't match the R package version
(for example, you have libarrow 1.0.0 on your system and are installing R package 2.0.0),
it is likely that the R bindings will fail to compile.
Because the Apache Arrow project is under active development,
it is essential that versions of libarrow and the R package matches.
When `install.packages("arrow")` has to download libarrow,
the install script ensures that you fetch the libarrow version that corresponds to your R package version.
However, if you are using a version of libarrow already on your system, version match isn't guaranteed.
To fix version mismatch, you can either update your libarrow system packages to match the R package version,
or set the environment variable `ARROW_USE_PKG_CONFIG=FALSE`
to tell the configure script not to look for system version of libarrow.
(The latter is the default of `install_arrow()`.)
System libarrow versions are available corresponding to all CRAN releases
but not for nightly or dev versions, so depending on the R package version you're installing,
system libarrow version may not be an option.
Note also that once you have a working R package installation based on system (shared) libraries,
if you update your system libarrow installation, you'll need to reinstall the R package to match its version.
Similarly, if you're using libarrow system libraries, running `update.packages()`
after a new release of the arrow package will likely fail unless you first
update the libarrow system packages.
### Using prebuilt binaries
If the R package finds and downloads a prebuilt binary of libarrow,
but then the arrow package can't be loaded, perhaps with "undefined symbols" errors,
please [report an issue](https://issues.apache.org/jira/projects/ARROW/issues).
This is likely a compiler mismatch and may be resolvable by setting some
environment variables to instruct R to compile the packages to match libarrow.
A workaround would be to set the environment variable `LIBARROW_BINARY=FALSE`
and retry installation: this value instructs the package to build libarrow from source
instead of downloading the prebuilt binary.
That should guarantee that the compiler settings match.
If a prebuilt libarrow binary wasn't found for your operating system but you think it should have been,
please [report an issue](https://issues.apache.org/jira/projects/ARROW/issues) and share the console output.
You may also set the environment variable `ARROW_R_DEV=TRUE` for additional
debug messages.
### Building libarrow from source
If building libarrow from source fails, check the error message.
(If you don't see an error message, only the `----- NOTE -----`,
set the environment variable `ARROW_R_DEV=TRUE` to increase verbosity and retry installation.)
The install script should work everywhere, so if libarrow fails to compile,
please [report an issue](https://issues.apache.org/jira/projects/ARROW/issues)
so that we can improve the script.
### Known installation issues
* On CentOS, building the package requires a more modern `devtoolset` than the default system compilers. See "System dependencies" above.
## Contributing
We are constantly working to make the installation process as painless as
possible. If you find ways to improve the process, please [report an issue](https://github.com/apache/arrow/issues) so that we can
document it. Similarly, if you find that your Linux distribution
or version is not supported, we would welcome the contribution of Docker
images (hosted on Docker Hub) that we can use in our continuous integration
and hopefully improve our coverage. If you do contribute a Docker image,
it should be as minimal as possible, containing only R and the dependencies it
requires. For reference, see the images that
[R-hub](https://github.com/r-hub/rhub-linux-builders) uses.
You can test the arrow R package installation using the `docker-compose`
setup included in the `apache/arrow` git repository. For example,
```
R_ORG=rhub R_IMAGE=ubuntu-gcc-release R_TAG=latest docker-compose build r
R_ORG=rhub R_IMAGE=ubuntu-gcc-release R_TAG=latest docker-compose run r
```
installs the arrow R package, including libarrow, on the
[rhub/ubuntu-gcc-release](https://hub.docker.com/r/rhub/ubuntu-gcc-release)
image.
## Further reading
- To learn about installing development versions, see the article on [installing nightly builds](./install_nightly.html).
- If you're contributing to the Arrow project, see the [Arrow R developers guide](./developing.html) for resources to help you on set up your development environment.
- Arrow developers may also wish to read a more detailed discussion of the code run during the installation process, described in the [install details article](./developers/install_details.html).