[Bug] R CI flaky #2906
Comments
I want to acknowledge significant work over time by @nguyenv, @mojaveazure, @eddelbuettel, and myself here; there has been substantial investigation in this direction. I would note that these tend to be CI-only failures, not occurring in sandbox development, and thus more difficult to debug. Non-trivial time investment has already been made, and this is a well-thought-out proposal. I would also add that the end goal of CI is to find bugs before users do. Meanwhile, the failures being tracked here are CI-only and do not affect end users, which not only makes them difficult to debug but also makes them a distraction from our true goals: (a) delivering reliable software to our end users, and (b) completing the refactor work that we have reasonable confidence will resolve these CI-only failures. The proposal to temporarily move to a 24-hour cadence is just that, temporary, and we will revisit once the R/cppification work is complete.
There is a Heisenbug that has driven a few of us to madness. But when you e.g. run (after `cd tests`) `CI=true Rscript testthat.R`, then all 3500+ tests pass. No skip, no fail. But it is frail; there is a possible interaction with the test-runner software (the widely-used `testthat` package).
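For context on why the `CI` variable matters at all: testthat keys some skips off the `CI` environment variable. Below is a minimal sketch, assuming this suite uses `skip_on_ci()`-style guards somewhere (an assumption, not verified against the actual helper files):

```r
# Minimal sketch of how the CI environment variable changes testthat behavior.
# Assumption: some tests in this suite are guarded by skip_on_ci() or an
# equivalent check; the actual helper files may do something different.
library(testthat)

Sys.setenv(CI = "true")                       # what `CI=true Rscript testthat.R` amounts to
isTRUE(as.logical(Sys.getenv("CI", "false"))) # roughly the check behind skip_on_ci()

test_that("a test that only runs locally", {
  skip_on_ci()      # skipped when CI is set, run otherwise
  expect_true(TRUE)
})
```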
I ran the following on an Ubuntu EC2 instance (m6a.4xlarge, AMI …):

git clone https://github.com/single-cell-data/TileDB-SOMA
cd TileDB-SOMA/apis/r
tools/r-ci.sh bootstrap
Rscript <(cat <<EOF
rversion <- paste(strsplit(as.character(getRversion()), split = '\\.')[[1L]][1:2], collapse = '.')
codename <- system('. /etc/os-release; echo ${VERSION_CODENAME}', intern = TRUE)
repo <- "https://tiledb-inc.r-universe.dev"
(opt <- sprintf('options(repos = c("%s/bin/linux/%s/%s", "%s", getOption("repos")), timeout = 300L)', repo, codename, rversion, repo))
cat(opt, "\n", file = "~/.Rprofile", append = TRUE)
EOF
)
Rscript tools/install-tiledb-r.R
tools/r-ci.sh install_bioc SingleCellExperiment
Rscript -e "remotes::install_deps(dependencies = TRUE, upgrade = FALSE)"
R CMD build --no-build-vignettes --no-manual .
R CMD INSTALL $(ls -1tr *.tar.gz | tail -1)
cd tests
Rscript testthat.R # non-CI mode: OK ✅
export CI=true; Rscript testthat.R # ❌
export CI=true; Rscript testthat.R # ❌
export CI=true; Rscript testthat.R # ❌

The 3 failing test runs had identical output (modulo timings):

Failing test output
Is that familiar to anyone?
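For anyone else trying to reproduce this, a small driver along these lines can quantify how often the failure shows up. This is only a sketch, not something in the repo; `run_suite()` and the log-file naming are made up here:

```r
# Hypothetical helper (not part of the repository): re-run the suite N times
# with CI=true and tally exit statuses. Run from apis/r/tests/.
run_suite <- function(n = 10L) {
  statuses <- vapply(seq_len(n), function(i) {
    log <- sprintf("run-%02d.log", i)
    system2("Rscript", "testthat.R",
            env = "CI=true",              # same env var the CI job sets
            stdout = log, stderr = log)   # both streams into one log file
  }, integer(1L))
  table(ifelse(statuses == 0L, "pass", "fail"))
}
# run_suite(10L)
```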
That's two out of 3496; they are in code undergoing current changes (timestamped read-write), so maybe we should just park these two. The Heisenbug is that things do fail earlier for some of us (SCEOutgest is a common one, sometimes later ones). One can run these test files individually too (that is tedious, and a fault of the overcomplicated setup); see the sketch after the output below. This may all be related to Context, and I have a change to handle that differently. Or it could be another race condition.

edd@ahmad:~/git/tiledb-soma/apis/r/tests(main)$ CI=TRUE Rscript testthat.R
tiledbsoma: 1.13.99.2
tiledb-r: 0.29.0.4
tiledb core: 2.26.0
libtiledbsoma: 2.26.0
R: R version 4.4.1 (2024-06-14)
OS: Ubuntu 24.04 LTS
✔ | F W S OK | Context
✔ | 25 | Arrow-utils
✔ | 140 | Blockwise [4.3s]
✔ | 14 | ConfigList
✔ | 115 | CoordsStrider
✔ | 26 | EphemeralCollection
✔ | 14 | EphemeralExperiment
✔ | 17 | EphemeralMeasurement
✔ | 5 | example-datasets
✔ | 49 | Factory
✔ | 35 | IntIndexer
✔ | 9 | membership-caching
✔ | 11 | OrderedAndFactor
✔ | 26 | PlatformConfig
✔ | 277 | reopen [1.0s]
✔ | 70 | ScalarMap
✔ | 172 | SCEOutgest [10.5s]
✔ | 379 | SeuratIngest [16.3s]
✔ | 99 | SeuratOutgest-assay [3.2s]
✔ | 169 | SeuratOutgest-command [5.9s]
✔ | 49 | SeuratOutgest-graph [2.2s]
✔ | 134 | SeuratOutgest-object [7.2s]
✔ | 180 | SeuratOutgest-reduction [4.2s]
✔ | 29 | SingleCellExperimentIngest [11.0s]
✔ | 13 | SOMAArrayReader-Arrow
✔ | 80 | SOMAArrayReader-Iterated [6.0s]
✔ | 16 | SOMAAxisQuery
✔ | 22 | SOMACollection [2.8s]
✔ | 161 | SOMADataFrame [7.2s]
✔ | 52 | SOMADenseNDArray [1.3s]
✔ | 80 | SOMAExperiment-query-m-p [4.1s]
✔ | 33 | SOMAExperiment-query-matrix-outgest [1.9s]
✔ | 62 | SOMAExperiment-query [4.5s]
✔ | 12 | SOMAExperiment [1.1s]
✔ | 11 | SOMAMeasurement
✔ | 9 | SOMAOpen
✔ | 156 | SOMASparseNDArray [2.2s]
✔ | 57 | SOMATileDBContext
✔ | 2 | Stats
✔ | 26 | SummarizedExperimentIngest [2.8s]
✔ | 16 | TileDBArray
✔ | 52 | TileDBCreateOptions
✔ | 37 | TileDBGroup
✔ | 21 | TileDBURI
✔ | 50 | utils-matrixZeroBasedView
✔ | 24 | utils-uris
✔ | 13 | utils
✔ | 3 | version
✔ | 190 | write-soma-objects [3.7s]
✔ | 315 | write-soma-resume [37.4s]
══ Results ════════════════════════════════════════════════════════════════════════════════
Duration: 144.5 s
[ FAIL 0 | WARN 0 | SKIP 0 | PASS 3557 ]
edd@ahmad:~/git/tiledb-soma/apis/r/tests(main)$
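As noted above, the test files can also be run one at a time, which helps isolate which context first triggers the segfault. A minimal sketch, assuming the standard testthat layout under apis/r/tests/testthat; the file name is hypothetical and should be matched to whichever context fails for you:

```r
# Sketch: run a single test file in isolation under the same CI setting.
# testthat::test_file() also loads the helper-*.R files in the same directory.
library(testthat)

setwd("apis/r/tests/testthat")   # adjust to your checkout
Sys.setenv(CI = "true")
test_file("test-SCEOutgest.R")   # hypothetical file name for the SCEOutgest context
```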
It's possible that #2926 reduces the frequency of these failures; here are two runs there that I re-ran several times this weekend: …

This is in contrast to …

I don't think there's a known reason for #2926 to reduce the frequency of these failures, and @eddelbuettel mentioned separately that the R CI has gone through previous phases of failing with more or less frequency. Just flagging: the failure frequency may be about to decrease (or we will find that failures occur preferentially on …).

There's also discussion around disabling failing tests in #2922.
I updated the issue description, but posting here as well for visibility:
Update: #2926 seems to have mitigated the failures significantly, though brute-force re-running the R CI on that PR still failed 4 of 22 attempts (#5814, #5815).
R CI currently shows mostly successes, though run #5864 (from #2919, commit 9cd956f, which includes #2926) failed.
We can continue using this issue to track failures as we see them. My understanding is that we don't expect #2926 to have actually fixed the issue; it has just made the failures less frequent.
@jp-dark also flagged recently that ASAN was inadvertently broken in #2773; fixing that (and running it as part of CI) may help root-cause the issue here. Related internal tracking: sc-34306, sc-53780.
Previous write-up
`r-ci.yml` has been failing on `main` since mid-June.

Example output

The last output is typically `70 | ScalarMap`, before a segfault. The segfault messages vary: "memory not mapped", "bad_function_call", "invalid permissions", "address (nil), cause 'unknown'". ryan-williams/tdbs-r-ci-dbg contains some analysis of the logs of failing runs, including frequencies of various error messages.

Breakdown by error message
Updated at 2024-08-29T09:01:20Z (by #17)
"memory not mapped"
Seen 52x, most recently #5816 (at 2024-08-23T03:41):
All runs
#5419, #5422, #5423, #5442, #5444, #5449, #5451, #5452, #5464, #5476, #5486, #5492, #5512, #5521, #5523, #5537, #5538, #5552, #5559, #5563, #5571, #5582, #5586, #5588, #5602, #5603, #5604, #5605, #5618, #5621, #5623, #5634, #5635, #5640, #5641, #5644, #5646, #5648, #5686, #5690, #5693, #5696, #5698, #5706, #5711, #5727, #5736, #5743, #5784, #5785, #5808, #5816
"bad_function_call"
Seen 12x, most recently #5820 (at 2024-08-23T16:04):
All runs
#5467, #5527, #5560, #5615, #5637, #5654, #5673, #5707, #5738, #5764, #5786, #5820
"invalid permissions"
Seen 12x, most recently #5790 (at 2024-08-20T13:45):
All runs
#5418, #5459, #5518, #5556, #5568, #5616, #5688, #5714, #5724, #5729, #5737, #5790
"address (nil), cause 'unknown'"
Seen 7x, most recently #5787 (at 2024-08-19T16:45):
All runs
#5511, #5532, #5550, #5657, #5746, #5750, #5787
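For reference, the tallies above come from grepping the failing runs' logs; below is a sketch of that kind of counting. The logs/ directory and *.txt naming are assumptions for illustration, not how ryan-williams/tdbs-r-ci-dbg actually stores things:

```r
# Sketch: count how many downloaded CI logs contain each known segfault
# message. Assumes logs were exported to logs/*.txt; adjust to your layout.
patterns <- c(
  "memory not mapped",
  "bad_function_call",
  "invalid permissions",
  "address \\(nil\\), cause 'unknown'"
)

logs <- list.files("logs", pattern = "\\.txt$", full.names = TRUE)

counts <- vapply(patterns, function(p) {
  sum(vapply(logs, function(f) {
    any(grepl(p, readLines(f, warn = FALSE)))
  }, logical(1L)))
}, integer(1L))

sort(counts, decreasing = TRUE)
```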
Occasional successful runs
A few runs have passed on PR branches (e.g. 1, 2). However, I tried re-running the first one and it failed, so it seems like it's just non-deterministic and usually fails. The `dark/images` branch did pass 3x in a row, 8/8-11.
Several folks on the team have made some progress debugging it, but a fix remains elusive.
Per discussions with @johnkerl, the ongoing work to remove our dependency on TileDB-R may resolve it.
Change workflow to 24hr cron?
It's been causing "spurious" ❌'s on PRs (and `main`) for 2 months, which can mask other failures. Disabling it altogether feels too aggressive; it's good to have some visibility into the state of the R tests. Setting it to run against `main` every 24hrs might be a good compromise.