Error in H5Fopen(file): Unable to open HDF5 file when using subsetArchRProject to create new arrow files #248
Comments
Hi,
I tried processing H5 files in other contexts and it was ok. I hope you guys can help me solve this problem. Thanks a lot! |
I encountered the exact same error with:
ArrowFiles <- createArrowFiles(
  inputFiles = inputFiles,
  sampleNames = names(inputFiles),
  filterTSS = 4, # Don't set this too high because you can always increase later
  filterFrags = 1000,
  addTileMat = TRUE,
  addGeneScoreMat = TRUE
)
doubScores <- addDoubletScores(
  input = ArrowFiles,
  k = 10, # Refers to how many cells near a "pseudo-doublet" to count.
  knnMethod = "UMAP", # Refers to the embedding to use for nearest neighbor search.
  LSIMethod = 1
)
and my errors
My env is:
R version 3.5.0 (2018-04-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)
Matrix products: default
BLAS: /disk/soft/R-3.5.0/lib/libRblas.so
LAPACK: /disk/soft/R-3.5.0/lib/libRlapack.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] BSgenome_1.50.0 ArchR_0.9.5 magrittr_1.5 rhdf5_2.26.2 Matrix_1.2-16 data.table_1.12.0
[7] SummarizedExperiment_1.12.0 DelayedArray_0.8.0 BiocParallel_1.14.2 matrixStats_0.54.0 Biobase_2.40.0 ggplot2_3.1.0
[13] rtracklayer_1.42.2 Biostrings_2.48.0 XVector_0.20.0 GenomicRanges_1.34.0 GenomeInfoDb_1.18.2 IRanges_2.16.0
[19] S4Vectors_0.20.1 BiocGenerics_0.28.0
loaded via a namespace (and not attached):
[1] Seurat_3.2.0 Rtsne_0.15 colorspace_1.4-1 deldir_0.1-28 ggridges_0.5.1 rstudioapi_0.9.0 spatstat.data_1.4-3
[8] leiden_0.3.1 listenv_0.7.0 npsurv_0.4-0 ggrepel_0.8.0 codetools_0.2-16 splines_3.5.0 lsei_1.2-0
[15] polyclip_1.10-0 jsonlite_1.6 Cairo_1.5-12.2 Rsamtools_1.34.1 ica_1.0-2 cluster_2.1.0 png_0.1-7
[22] uwot_0.1.8 shiny_1.2.0 sctransform_0.2.0 compiler_3.5.0 httr_1.4.0 assertthat_0.2.1 lazyeval_0.2.2
[29] later_0.8.0 htmltools_0.3.6 tools_3.5.0 rsvd_1.0.1 igraph_1.2.4 gtable_0.3.0 glue_1.3.1
[36] GenomeInfoDbData_1.1.0 RANN_2.6.1 reshape2_1.4.3 dplyr_0.8.0.1 Rcpp_1.0.1 spatstat_1.64-1 gdata_2.18.0
[43] ape_5.3 nlme_3.1-141 lmtest_0.9-37 stringr_1.4.0 globals_0.12.4 mime_0.6 miniUI_0.1.1.1
[50] irlba_2.3.3 gtools_3.8.1 XML_3.98-1.19 goftest_1.2-2 future_1.14.0 MASS_7.3-51.1 zlibbioc_1.26.0
[57] zoo_1.8-4 scales_1.0.0 promises_1.0.1 spatstat.utils_1.17-0 RColorBrewer_1.1-2 reticulate_1.11.1 pbapply_1.4-0
[64] gridExtra_2.3 rpart_4.1-15 stringi_1.4.3 caTools_1.17.1.2 rlang_0.4.0 pkgconfig_2.0.2 bitops_1.0-6
[71] lattice_0.20-38 Rhdf5lib_1.4.3 ROCR_1.0-7 purrr_0.3.2 tensor_1.5 GenomicAlignments_1.16.0 patchwork_1.0.1
[78] htmlwidgets_1.3 cowplot_0.9.4 tidyselect_0.2.5 RcppAnnoy_0.0.13 plyr_1.8.5 R6_2.4.0 gplots_3.0.1.1
[85] withr_2.1.2 pillar_1.3.1 mgcv_1.8-27 fitdistrplus_1.0-14 survival_2.43-3 abind_1.4-5 RCurl_1.95-4.12
[92] tibble_2.1.1 future.apply_1.3.0 crayon_1.3.4 KernSmooth_2.23-16 plotly_4.9.0 grid_3.5.0 digest_0.6.18
[99] xtable_1.8-3 tidyr_0.8.3 httpuv_1.5.0 munsell_0.5.0 viridisLite_0.3.0 |
Hi girls and guys, I am rather new to R and ArchR. Thanks in advance and have a nice day |
+1 to this when running |
I have been looking into this more; see pachterlab/sleuth#120. I still don't really know how best to diagnose this issue, besides recommending installing a more recent version of rhdf5. The versions I see in this thread are rhdf5_2.28.0, rhdf5_2.30.1, and rhdf5_2.26.2. I currently use rhdf5_2.30.1, and we are trying to put together a stable packrat for user download for these types of issues. The current Bioconductor version is rhdf5_2.32.4, which may be more stable with respect to these issues. Sorry I am not more helpful at this time. |
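A minimal sketch for checking which rhdf5 version a session has, relative to the rhdf5_2.32.4 release mentioned above. The helper name `rhdf5_status` is hypothetical and not part of ArchR or rhdf5; it relies only on `utils::packageVersion()` and `utils::compareVersion()`.

```r
# Hypothetical helper: report the installed rhdf5 version relative to a
# reference version string (default: the 2.32.4 release named in this thread).
rhdf5_status <- function(reference = "2.32.4") {
  if (!requireNamespace("rhdf5", quietly = TRUE)) {
    return("rhdf5 is not installed")
  }
  installed <- as.character(utils::packageVersion("rhdf5"))
  # compareVersion() returns -1, 0, or 1
  cmp <- utils::compareVersion(installed, reference)
  if (cmp < 0) {
    sprintf("rhdf5 %s is older than %s", installed, reference)
  } else {
    sprintf("rhdf5 %s is at least %s", installed, reference)
  }
}

print(rhdf5_status())
```

A newer rhdf5 is only a suggestion in this thread, not a confirmed fix, so treat the result as one data point when comparing environments.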
If any of you have known corrupt ArrowFiles, can you send one of them to archr.devs@gmail.com? I can look into trying to recover these files or finding something to stabilize this type of issue. Sorry for the troubles. |
I think the issue is based on HDF5 file locking. ArchR disables this to speed up these computations in parallel. I wonder if these issues are related to how your operating system handles this file locking procedure. I was able to recapitulate this error but found that just disabling file locking (which ArchR tries to do) worked. This is based on this type of error --
To confirm this is this type of error, try --
This should still show the contents, indicating that the file is not corrupted. Hope this helps. Jeff |
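The commands from the comment above did not survive in this copy of the thread, so the following is only a sketch of the general idea, not the original snippet. It assumes file locking is disabled through the `HDF5_USE_FILE_LOCKING` environment variable (the variable that `rhdf5::h5disableFileLocking()`, mentioned later in this thread, sets) and that the file contents are listed with `rhdf5::h5ls()`. The helper name `list_arrow_contents` is hypothetical.

```r
# Disable HDF5 file locking for this R session. Assumption: the underlying
# HDF5 build honors the HDF5_USE_FILE_LOCKING environment variable.
Sys.setenv(HDF5_USE_FILE_LOCKING = "FALSE")

# Hypothetical helper: list the contents of an Arrow (HDF5) file to check
# that it can be opened and read.
list_arrow_contents <- function(path) {
  if (!file.exists(path)) stop("File not found: ", path)
  if (!requireNamespace("rhdf5", quietly = TRUE)) stop("rhdf5 is required")
  rhdf5::h5ls(path)
}

# Usage (the path is a placeholder):
# list_arrow_contents("path/to/Sample.arrow")
```

If the listing succeeds once locking is disabled, the file itself is probably intact and the failure is more likely related to locking than to corruption.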
For |
Hi, I know this issue is closed, but I am also experiencing it when trying to run getMarkerFeatures. I disabled subthreading as above but am still getting the error. Any help would be much appreciated! Thanks |
Hi @leeanapeters - Sorry you're having trouble with this. The H5 errors are really hard to track down and we (and many other software developers) are still trying to figure this out. The best I can say is that HDF5 errors are often sporadic and environment specific. If you've tried |
I guess this issue happens when users try to run ArchR on an HPC cluster with a shared file system (e.g. I ran into this problem on Lustre, which is configured to disable flock for performance reasons).
I guess there are still some trivial errors in v1.0.1, described as follows:
Many thanks! |
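One way to check whether a given directory's file system supports HDF5 file locking is sketched below. It assumes an rhdf5 version that provides `h5testFileLocking()`; the wrapper name `locking_supported` is hypothetical, and it returns `NA` when rhdf5 is not installed.

```r
# Hypothetical wrapper: test whether files in `dir` can be locked.
# Returns TRUE/FALSE from rhdf5::h5testFileLocking(), or NA if rhdf5 is
# not installed.
locking_supported <- function(dir = tempdir()) {
  if (!requireNamespace("rhdf5", quietly = TRUE)) return(NA)
  rhdf5::h5testFileLocking(dir)
}

# Compare the local temp directory with the directory holding the ArrowFiles
# (replace "." with the shared file system path, e.g. a Lustre mount).
print(locking_supported(tempdir()))
print(locking_supported("."))
```

A `FALSE` result for the shared directory would be consistent with the flock explanation given above, though it does not prove that locking is the cause of a particular failure.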
thanks @xiaosuyu1997 - flagging this for @jgranja24 |
Thank you so much! I encountered the same problem and solved it with your solution. |
Just to clarify, if the arrow files have already been created without subThreading=F, then I can't access the HDF5 files unless I start over with the createArrowFiles() step? Currently, I'm trying to use the ArchR project file created by someone else (I have read permission for their arrow files), and I am getting the error "Error in H5Fopen(file) : HDF5. File accessibility. Unable to open file.\n". I've tried using h5disableFileLocking() to set the environment variables, but I still get the same error. I'm using the function getGroupBW(), but I've been seeing this error with other functions too. Thanks in advance for your help |
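When working with ArrowFiles owned by another user, it may be worth ruling out plain file permissions before looking at locking. Whether permissions are the cause in this case is an assumption; the thread does not confirm it. The sketch below uses only base R, and `check_file_access` is a hypothetical helper name.

```r
# Hypothetical helper: report existence, read access, and write access for
# each path. file.access() returns 0 on success and -1 on failure.
check_file_access <- function(paths) {
  data.frame(
    path = paths,
    exists = file.exists(paths),
    readable = file.access(paths, mode = 4) == 0,
    writable = file.access(paths, mode = 2) == 0,
    stringsAsFactors = FALSE
  )
}

# Demonstration with a temporary file and a path that does not exist.
tmp <- tempfile(fileext = ".arrow")
file.create(tmp)
print(check_file_access(c(tmp, "definitely_missing.arrow")))
```

If a file is readable but not writable, any function that opens it for writing may fail even though reading it would work, so copying the ArrowFiles to a writable location is one thing to try.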
Hey everyone, I am so sorry to bother you, but as many people have had this issue, I would like to share the error message and ask for help, as I am also new to computational work...
2021-12-06 11:46:50 : ERROR Found in .tabixToTmp for (/outs/atac_fragments.tsv.gz : 1 of 1)
<simpleError in H5Fcreate(file): HDF5. File accessibility. Unable to open file.>
2021-12-06 11:46:50 : createArrowFiles has encountered an error, checking if any ArrowFiles completed..
If someone has a solution, I would appreciate it! Thank you |
@xiaosuyu1997 - I know that your solution is about a year old and I'm sorry that we haven't implemented it yet. I'm trying to tidy up the current development branch ( Could you clarify your solution? From what I understand, the primary problem is that ArchR defaults to using A solution would be to set the default value for Would we need to additionally set default values ( |
Sorry to chime in, but I am also having this issue intermittently while running ArchR in Dockerized RStudio (Rocker Project). The issue seems to go away if I completely restart the R session or remove the loaded R project. For me, it occurs while plotting a genome track.
Error code:
|
@rcorces
I think cases where subThreading has to be FALSE are rare (defaulting to TRUE may be better for efficiency?). The most important question may be the consistency of lock use within one run (environment variable -> createArrowFiles -> later function calls that use these ArrowFiles); changing it midway may not be a good idea. This should be noted in the documentation, especially for network file system users who may run into these problems. |
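The consistency point above can be sketched as a small guard: fix the locking setting once at the start of a run and verify it before each later step. This is only an illustration in base R. The names `expected_locking` and `assert_locking_unchanged` are hypothetical, and the choice of `"FALSE"` (locking disabled) is an assumption about the desired setting.

```r
# Fix the locking setting once, before any ArrowFiles are created or opened.
expected_locking <- "FALSE"  # assumption: locking disabled for the whole run
Sys.setenv(HDF5_USE_FILE_LOCKING = expected_locking)

# Hypothetical guard: stop if the setting was changed or unset mid-run.
assert_locking_unchanged <- function(expected = expected_locking) {
  current <- Sys.getenv("HDF5_USE_FILE_LOCKING", unset = NA)
  if (is.na(current) || current != expected) {
    stop("HDF5_USE_FILE_LOCKING changed mid-run: expected ", expected,
         ", found ", current)
  }
  invisible(TRUE)
}

# Call this before createArrowFiles() and before each later step that
# reads or writes the ArrowFiles.
assert_locking_unchanged()
```

The guard does not change how ArchR handles locking internally; it only makes an accidental change of the environment variable visible early.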
I've had the same error when trying to call |
Hi,
I am definitely running out of ideas, so if anybody has a suggestion I would be glad.
ArchR-addGeneIntegrationMatrix-19443462eecb-Date-2022-07-05_Time-19-32-06.log
Logging With ArchR!
Start Time : 2022-07-05 19:32:06
------- ArchR Info
ArchRThreads = 16
------- System Info
Computer OS = unix
------- Session Info
R version 4.0.1 (2020-06-06)
Matrix products: default
locale:
attached base packages:
other attached packages:
loaded via a namespace (and not attached):
------- Log Info
2022-07-05 19:32:07 : Running Seurat's Integration Stuart* et al 2019, 0.011 mins elapsed.
2022-07-05 19:32:07 : Input-Parameters, Class = list
Input-Parameters$: length = 1
1 function (name)
Input-Parameters$ArchRProj: length = 1
Input-Parameters$useMatrix: length = 1
Input-Parameters$matrixName: length = 1
Input-Parameters$reducedDims: length = 1
Input-Parameters$seRNA: length = 33477
Input-Parameters$groupATAC: length = 0
Input-Parameters$groupRNA: length = 1
Input-Parameters$groupList: length = 0
Input-Parameters$sampleCellsATAC: length = 1
Input-Parameters$sampleCellsRNA: length = 1
Input-Parameters$embeddingATAC: length = 0
Input-Parameters$embeddingRNA: length = 0
Input-Parameters$dimsToUse: length = 30
Input-Parameters$scaleDims: length = 0
Input-Parameters$corCutOff: length = 1
Input-Parameters$plotUMAP: length = 1
Input-Parameters$nGenes: length = 1
Input-Parameters$useImputation: length = 1
Input-Parameters$reduction: length = 1
Input-Parameters$addToArrow: length = 1
Input-Parameters$scaleTo: length = 1
Input-Parameters$genesUse: length = 0
Input-Parameters$nameCell: length = 1
Input-Parameters$nameGroup: length = 1
Input-Parameters$nameScore: length = 1
Input-Parameters$threads: length = 1
Input-Parameters$verbose: length = 1
Input-Parameters$force: length = 1
Input-Parameters$logFile: length = 1
2022-07-05 19:32:07 : Checking ATAC Input, 0.017 mins elapsed.
2022-07-05 19:32:36 : GeneScoreMat-Block-1, Class = dgCMatrix
2022-07-05 19:32:36 : Block (1 of 3) : Imputing GeneScoreMatrix, 0.385 mins elapsed.
2022-07-05 19:32:36 : addImputeWeights Input-Parameters, Class = list
addImputeWeights Input-Parameters$ArchRProj: length = 1
addImputeWeights Input-Parameters$reducedDims: length = 1
addImputeWeights Input-Parameters$dimsToUse: length = 30
addImputeWeights Input-Parameters$scaleDims: length = 0
addImputeWeights Input-Parameters$corCutOff: length = 1
addImputeWeights Input-Parameters$td: length = 1
addImputeWeights Input-Parameters$ka: length = 1
addImputeWeights Input-Parameters$sampleCells: length = 1
addImputeWeights Input-Parameters$nRep: length = 1
addImputeWeights Input-Parameters$k: length = 1
addImputeWeights Input-Parameters$epsilon: length = 1
addImputeWeights Input-Parameters$useHdf5: length = 1
addImputeWeights Input-Parameters$randomSuffix: length = 1
addImputeWeights Input-Parameters$threads: length = 1
addImputeWeights Input-Parameters$seed: length = 1
addImputeWeights Input-Parameters$verbose: length = 1
addImputeWeights Input-Parameters$logFile: length = 1
2022-07-05 19:32:36 : Computing Impute Weights Using Magic (Cell 2018), 0 mins elapsed.
2022-07-05 19:33:07 : imputeMatrix Input-Parameters, Class = list
imputeMatrix Input-Parameters$mat: nRows = 2000, nCols = 10848
imputeMatrix Input-Parameters$threads: length = 1
imputeMatrix Input-Parameters$verbose: length = 1
imputeMatrix Input-Parameters$logFile: length = 1
2022-07-05 19:33:07 : mat, Class = dgCMatrix
2022-07-05 19:33:07 : weightList, Class = SimpleList
weightList$w1: length = 1
weightList$w2: length = 1
2022-07-05 19:33:07 : Imputing Matrix (1 of 2), 0 mins elapsed.
2022-07-05 19:33:07 :
2022-07-05 19:33:07 :
2022-07-05 19:33:07 :
2022-07-05 19:33:45 :
2022-07-05 19:33:45 :
2022-07-05 19:33:45 :
2022-07-05 19:34:22 : GeneScoreMat-Block-Impute-1, Class = dgeMatrix
2022-07-05 19:34:28 : Block (1 of 3) : Seurat FindTransferAnchors, 2.257 mins elapsed.
2022-07-05 19:35:42 : transferAnchors-1, Class = character
transferAnchors-1: length = 1
2022-07-05 19:35:42 : rDSub-1, Class = matrix
2022-07-05 19:35:42 : rDSub-1, Class = array
rDSub-1: nRows = 10848, nCols = 30
2022-07-05 19:35:42 : Block (1 of 3) : Seurat TransferData Cell Group Labels, 3.48 mins elapsed.
2022-07-05 19:37:29 : GeneScoreMat-Block-2, Class = dgCMatrix
2022-07-05 19:37:29 : Block (2 of 3) : Imputing GeneScoreMatrix, 5.273 mins elapsed.
2022-07-05 19:37:29 : addImputeWeights Input-Parameters, Class = list
addImputeWeights Input-Parameters$ArchRProj: length = 1
addImputeWeights Input-Parameters$reducedDims: length = 1
addImputeWeights Input-Parameters$dimsToUse: length = 30
addImputeWeights Input-Parameters$scaleDims: length = 0
addImputeWeights Input-Parameters$corCutOff: length = 1
addImputeWeights Input-Parameters$td: length = 1
addImputeWeights Input-Parameters$ka: length = 1
addImputeWeights Input-Parameters$sampleCells: length = 1
addImputeWeights Input-Parameters$nRep: length = 1
addImputeWeights Input-Parameters$k: length = 1
addImputeWeights Input-Parameters$epsilon: length = 1
addImputeWeights Input-Parameters$useHdf5: length = 1
addImputeWeights Input-Parameters$randomSuffix: length = 1
addImputeWeights Input-Parameters$threads: length = 1
addImputeWeights Input-Parameters$seed: length = 1
addImputeWeights Input-Parameters$verbose: length = 1
addImputeWeights Input-Parameters$logFile: length = 1
2022-07-05 19:37:29 : Computing Impute Weights Using Magic (Cell 2018), 0 mins elapsed.
2022-07-05 19:37:58 : imputeMatrix Input-Parameters, Class = list
imputeMatrix Input-Parameters$mat: nRows = 2000, nCols = 10848
imputeMatrix Input-Parameters$threads: length = 1
imputeMatrix Input-Parameters$verbose: length = 1
imputeMatrix Input-Parameters$logFile: length = 1
2022-07-05 19:37:59 : mat, Class = dgCMatrix
2022-07-05 19:37:59 : weightList, Class = SimpleList
weightList$w1: length = 1
weightList$w2: length = 1
2022-07-05 19:37:59 : Imputing Matrix (1 of 2), 0 mins elapsed.
2022-07-05 19:37:59 :
2022-07-05 19:37:59 :
2022-07-05 19:37:59 :
2022-07-05 19:38:36 :
2022-07-05 19:38:36 :
2022-07-05 19:38:36 :
2022-07-05 19:39:14 : GeneScoreMat-Block-Impute-2, Class = dgeMatrix
2022-07-05 19:39:21 : Block (2 of 3) : Seurat FindTransferAnchors, 7.129 mins elapsed.
2022-07-05 19:40:18 : transferAnchors-2, Class = character
transferAnchors-2: length = 1
2022-07-05 19:40:18 : rDSub-2, Class = matrix
2022-07-05 19:40:18 : rDSub-2, Class = array
rDSub-2: nRows = 10848, nCols = 30
2022-07-05 19:40:18 : Block (2 of 3) : Seurat TransferData Cell Group Labels, 8.092 mins elapsed.
2022-07-05 19:41:57 : GeneScoreMat-Block-3, Class = dgCMatrix
2022-07-05 19:41:57 : Block (3 of 3) : Imputing GeneScoreMatrix, 9.734 mins elapsed.
2022-07-05 19:41:57 : addImputeWeights Input-Parameters, Class = list
addImputeWeights Input-Parameters$ArchRProj: length = 1
addImputeWeights Input-Parameters$reducedDims: length = 1
addImputeWeights Input-Parameters$dimsToUse: length = 30
addImputeWeights Input-Parameters$scaleDims: length = 0
addImputeWeights Input-Parameters$corCutOff: length = 1
addImputeWeights Input-Parameters$td: length = 1
addImputeWeights Input-Parameters$ka: length = 1
addImputeWeights Input-Parameters$sampleCells: length = 1
addImputeWeights Input-Parameters$nRep: length = 1
addImputeWeights Input-Parameters$k: length = 1
addImputeWeights Input-Parameters$epsilon: length = 1
addImputeWeights Input-Parameters$useHdf5: length = 1
addImputeWeights Input-Parameters$randomSuffix: length = 1
addImputeWeights Input-Parameters$threads: length = 1
addImputeWeights Input-Parameters$seed: length = 1
addImputeWeights Input-Parameters$verbose: length = 1
addImputeWeights Input-Parameters$logFile: length = 1
2022-07-05 19:41:57 : Computing Impute Weights Using Magic (Cell 2018), 0 mins elapsed.
2022-07-05 19:42:31 : imputeMatrix Input-Parameters, Class = list
imputeMatrix Input-Parameters$mat: nRows = 2000, nCols = 10848
imputeMatrix Input-Parameters$threads: length = 1
imputeMatrix Input-Parameters$verbose: length = 1
imputeMatrix Input-Parameters$logFile: length = 1
2022-07-05 19:42:31 : mat, Class = dgCMatrix
2022-07-05 19:42:31 : weightList, Class = SimpleList
weightList$w1: length = 1
weightList$w2: length = 1
2022-07-05 19:42:31 : Imputing Matrix (1 of 2), 0 mins elapsed.
2022-07-05 19:42:31 :
2022-07-05 19:42:31 :
2022-07-05 19:42:32 :
2022-07-05 19:43:28 :
2022-07-05 19:43:28 :
2022-07-05 19:43:29 :
2022-07-05 19:44:18 : GeneScoreMat-Block-Impute-3, Class = dgeMatrix
2022-07-05 19:44:25 : Block (3 of 3) : Seurat FindTransferAnchors, 12.201 mins elapsed.
2022-07-05 19:45:49 : transferAnchors-3, Class = character
transferAnchors-3: length = 1
2022-07-05 19:45:49 : rDSub-3, Class = matrix
2022-07-05 19:45:49 : rDSub-3, Class = array
rDSub-3: nRows = 10848, nCols = 30
2022-07-05 19:45:49 : Block (3 of 3) : Seurat TransferData Cell Group Labels, 13.606 mins elapsed. |
@VDD58 - I don't have a solution for your problem, but we are hoping that an upcoming release will fix many of these HDF5 problems. That being said, I don't think the error you are reporting is the same as what is discussed in this issue post. Also, in the future, please upload your log file rather than pasting the full text into the issue. |
Thank you so much for responding! |
I believe this has now been addressed properly on the There is a more stable way of handling file locking that will be implemented in |
@rcorces I installed the version in |
@mdanb - I'll need more information to actually help. The |
@rcorces I'm good. It turns out it was because my |
Just wanted to confirm: is there currently no workaround for this issue when running downstream analysis? I am using ArrowFiles created by someone else, and I don't really want to re-process all the arrow files, but if there is no other way I will. |
Hi, none of the above things worked for me. The only thing that let me continue was to set
|
Describe the bug
When I used subsetArchRProject to create new arrow files, an error occurred, as shown in the attached KI.txt.
KI.txt
The error also occurs if I use only one core. I would be happy to get your comments on this. Thanks.
Session Info
R version 3.6.1 (2019-07-05)
Platform: x86_64-conda_cos6-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)
Matrix products: default
BLAS: /software/biosoft/software/python/anaconda3-python3-2018/lib/libblas.so.3.6.0
LAPACK: /software/biosoft/software/python/anaconda3-python3-2018/lib/liblapack.so.3.6.0
locale:
[1] C
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] ArchR_0.9.5 magrittr_1.5
[3] rhdf5_2.28.0 Matrix_1.2-17
[5] data.table_1.12.6 SummarizedExperiment_1.14.0
[7] DelayedArray_0.10.0 BiocParallel_1.18.0
[9] matrixStats_0.55.0 Biobase_2.44.0
[11] GenomicRanges_1.36.0 GenomeInfoDb_1.20.0
[13] IRanges_2.18.1 S4Vectors_0.22.0
[15] BiocGenerics_0.30.0 ggplot2_3.2.1
loaded via a namespace (and not attached):
[1] Rcpp_1.0.2 pillar_1.4.4
[3] compiler_3.6.1 XVector_0.24.0
[5] tools_3.6.1 bitops_1.0-6
[7] zlibbioc_1.30.0 BSgenome_1.52.0
[9] lifecycle_0.2.0 tibble_3.0.1
[11] gtable_0.3.0 lattice_0.20-38
[13] pkgconfig_2.0.3 rlang_0.4.6
[15] GenomeInfoDbData_1.2.1 stringr_1.4.0
[17] rtracklayer_1.44.2 withr_2.1.2
[19] dplyr_0.8.3 Biostrings_2.52.0
[21] vctrs_0.2.4 grid_3.6.1
[23] tidyselect_1.0.0 glue_1.4.1
[25] R6_2.4.0 BSgenome.Hsapiens.UCSC.hg19_1.4.0
[27] XML_3.98-1.20 Rhdf5lib_1.6.0
[29] purrr_0.3.3 GenomicAlignments_1.20.1
[31] Rsamtools_2.0.0 scales_1.0.0
[33] ellipsis_0.3.0 assertthat_0.2.1
[35] colorspace_1.4-1 stringi_1.4.3
[37] RCurl_1.95-4.12 lazyeval_0.2.2
[39] munsell_0.5.0 crayon_1.3.4
[41] Cairo_1.5-10