fail to create scTAR_cellranger env #10

adonatiucsd · 2021-10-05T16:01:06Z

Hi,
I am trying to get the fromcellranger pipeline to work on MacOS Catalina 10.15.7
When I run
conda env create -f scTAR_cellranger.yml
(after modifying the yml file prefix to /home/miniconda3/envs/scTAR_cellranger)
I get the following error message:

Collecting package metadata (repodata.json): done
Solving environment: failed

ResolvePackageNotFound:

libtiff==4.2.0=hdc55705_0
_openmp_mutex==4.5=1_gnu
xorg-libxrender==0.9.10=h7f98852_1003
xorg-libxext==1.3.4=h7f98852_1
make==4.3=hd18ef5c_1
curl==7.76.1=h979ede3_1
c-ares==1.17.1=h7f98852_1
libxcb==1.13=h7f98852_1003
libgcc-devel_linux-64==9.3.0=h7864c58_19
readline==8.0=he28a2e2_2
gfortran_impl_linux-64==9.3.0=hc4a2995_19
pthread-stubs==0.4=h36c2ea0_1001
xz==5.2.5=h516909a_1
xorg-kbproto==1.0.7=h7f98852_1002
libcurl==7.76.1=hc4aaa36_1
cairo==1.16.0=h6cf1ce9_1008
ucsc-gtftogenepred==377=h0b8a92a_4
tk==8.6.10=h21135ba_1
binutils_linux-64==2.35=h67ddf6f_30
libuuid==2.32.1=h7f98852_1000
xorg-libxau==1.0.9=h7f98852_0
libnghttp2==1.43.0=h812cca2_0
libgomp==9.3.0=h2828fa1_19
libstdcxx-devel_linux-64==9.3.0=hb016644_19
xorg-renderproto==0.11.1=h7f98852_1002
libedit==3.1.20191231=he28a2e2_2
htslib==1.12=h9093b5e_1
gxx_linux-64==9.3.0=h3fbe746_30
zstd==1.4.9=ha95c52a_0
mysql-connector-c==6.1.11=h6eb9d5d_1007
icu==68.1=h58526e2_0
libxml2==2.9.10=h72842e0_4
perl==5.32.0=h36c2ea0_0
libgfortran-ng==9.3.0=hff62375_19
jpeg==9d=h36c2ea0_0
pcre2==10.36=h032f7d1_1
lz4-c==1.9.3=h9c3ff4c_0
gfortran_linux-64==9.3.0=hdc58fab_30
pcre==8.44=he1b5a44_0
fribidi==1.0.10=h36c2ea0_0
gcc_linux-64==9.3.0=hf25ea35_30
gcc_impl_linux-64==9.3.0=h70c0ae5_19
samtools==1.12=h9aed4be_1
pixman==0.40.0=h36c2ea0_0
bzip2==1.0.8=h7f98852_4
xorg-libx11==1.7.0=h7f98852_0
bedtools==2.30.0=h7d7f7ad_1
libiconv==1.16=h516909a_0
openssl==1.1.1k=h7f98852_0
libev==4.33=h516909a_1
libglib==2.68.1=h3e27bee_0
xorg-libxdmcp==1.1.3=h7f98852_0
openjdk==10.0.2=h14c3975_1015
gxx_impl_linux-64==9.3.0=hd87eabc_19
ca-certificates==2020.12.5=ha878542_0
xorg-xproto==7.0.31=h7f98852_1007
ncurses==6.2=h58526e2_4
libwebp-base==1.2.0=h7f98852_2
gettext==0.19.8.1=h0b5b191_1005
xorg-libsm==1.2.3=hd9c2040_1000
harfbuzz==2.8.0=h83ec7ef_1
krb5==1.17.2=h926e7f8_0
pango==1.48.4=hb8ff022_0
libgcc-ng==9.3.0=h2828fa1_19
libdeflate==1.7=h7f98852_5
_libgcc_mutex==0.1=conda_forge
gsl==2.6=he838d99_2
r-base==4.0.3=h349a78a_8
ld_impl_linux-64==2.35.1=hea4e1c9_2
xorg-libice==1.0.10=h7f98852_0
xorg-libxt==1.2.1=h7f98852_2
libstdcxx-ng==9.3.0=h6de172a_19
sed==4.8=he412f7d_0
libffi==3.3=h58526e2_2
bwidget==1.9.14=ha770c72_0
binutils_impl_linux-64==2.35.1=h193b22a_2
xorg-xextproto==7.3.0=h7f98852_1002
tktable==2.10=hb7b940f_3
libgfortran5==9.3.0=hff62375_19
zlib==1.2.11=h516909a_1010
expat==2.3.0=h9c3ff4c_0
fontconfig==2.13.1=hba837de_1005
graphite2==1.3.13=h58526e2_1001
freetype==2.10.4=h0708190_1
libopenblas==0.3.12=pthreads_h4812303_1
libpng==1.6.37=h21135ba_2
libssh2==1.9.0=ha56f1ee_6

any idea about what is wrong?
I tried to run snakemake without creating the scTAR_cellranger environment but I get a bunch of errors
for example after doing
conda activate snakemake
snakemake -j
I get:
The flag 'directory' used in rule getMatsSteps is only valid for outputs, not inputs.
Building DAG of jobs...
MissingInputException in line 99 of /Users/perrylabmac/TAR-scRNA-seq/from_cellranger/Snakefile:
Missing input files for rule convertToRefFlat:
/Users/perrylabmac/Musca_ref_genome_Cellranger/genes/genes.gtf

thanks!

Antoine

fw262 · 2021-10-06T17:59:01Z

Hi Antoine,

The ResolvePackageNotFound is a known issue if you're trying to create the environment on Mac using an environment.yml file created on a different OS, in this case Linux. I suggest running this pipeline either in Linux or try running without creating the scTAR_cellranger environment.

It looks like the error you're getting is related to your /Users/perrylabmac/Musca_ref_genome_Cellranger folder. Can you make sure this folder contains genes/genes.gtf?

Best,
Michael

adonatiucsd · 2021-10-07T16:14:17Z

Hi Michael,

It turns out the genes.gtf was zipped; after unzipping and rerunning the snakemake I got this error:

Error in rule convertToRefFlat:
jobid: 9
output: /Users/perrylabmac/Musca_ref_genome_Cellranger/genes/genes.refFlat
shell:

            /Users/perrylabmac/miniconda3/pkgs/ucsc-gtftogenepred-377-h516baf0_3/bin/gtfToGenePred -genePredExt -geneNameAsName2 /Users/perrylabmac/Musca_ref_genome_Cellranger/genes/genes.gtf refFlat.tmp
            paste <(cut -f 12 refFlat.tmp) <(cut -f 1-10 refFlat.tmp) > /Users/perrylabmac/Musca_ref_genome_Cellranger/genes/genes.refFlat
            rm refFlat.tmp

    (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

thanks a lot for your help,

Antoine

fw262 · 2021-10-07T17:25:13Z

Can you share the error message you get when you run the following commands on the command line? The snakemake log file does not provide the error message info.

/Users/perrylabmac/miniconda3/pkgs/ucsc-gtftogenepred-377-h516baf0_3/bin/gtfToGenePred -genePredExt -geneNameAsName2 /Users/perrylabmac/Musca_ref_genome_Cellranger/genes/genes.gtf refFlat.tmp
            paste <(cut -f 12 refFlat.tmp) <(cut -f 1-10 refFlat.tmp) > /Users/perrylabmac/Musca_ref_genome_Cellranger/genes/genes.refFlat
            rm refFlat.tmp

Thanks,
Michael

adonatiucsd · 2021-10-07T20:20:44Z

I get this
(snakemake) Perrys-MacBook-Pro:log perrylabmac$ /Users/perrylabmac/miniconda3/pkgs/ucsc-gtftogenepred-377-h516baf0_3/bin/gtfToGenePred -genePredExt -geneNameAsName2 /Users/perrylabmac/Musca_ref_genome_Cellranger/genes/genes.gtf refFlat.tmp
dyld: Library not loaded: @rpath/libssl.1.1.dylib
Referenced from: /Users/perrylabmac/miniconda3/pkgs/ucsc-gtftogenepred-377-h516baf0_3/bin/gtfToGenePred
Reason: image not found
Abort trap: 6
(snakemake) Perrys-MacBook-Pro:log perrylabmac$ paste <(cut -f 12 refFlat.tmp) <(cut -f 1-10 refFlat.tmp) > /Users/perrylabmac/Musca_ref_genome_Cellranger/genes/genes.refFlat
cut: cut: refFlat.tmp: No such file or directory
refFlat.tmp: No such file or directory
(snakemake) Perrys-MacBook-Pro:log perrylabmac$ rm refFlat.tmp
rm: refFlat.tmp: No such file or directory

fw262 · 2021-10-07T20:46:47Z

Hmm, seems like an issue with the libssl library. Try conda installing with conda install libssh2 and re-runing.

Michael

adonatiucsd · 2021-10-08T20:37:17Z

So I installed libssh2 and rerun, here is the log file:
Select jobs to execute...

[Fri Oct 8 10:07:32 2021]
rule calcHMMbed:
input: /Users/perrylabmac/M1_Musca/Lib_10X_Musca_larva_2019_02_25_M1_1sttry/outs/possorted_genome_bam.bam
output: /Users/perrylabmac/M1_Musca/Lib_10X_Musca_larva_2019_02_25_M1_1sttry/TAR/TAR_reads.bed.gz
jobid: 8
wildcards: DATADIR=/Users/perrylabmac/M1_Musca, sample=Lib_10X_Musca_larva_2019_02_25_M1_1sttry
threads: 3
resources: tmpdir=/var/folders/kb/1djtgry52557wtjq8j15n_pm0000gn/T

[Fri Oct 8 12:18:55 2021]
Finished job 8.
1 of 11 steps (9%) done
Select jobs to execute...

[Fri Oct 8 12:18:55 2021]
rule calcHMMrefFlat:
input: /Users/perrylabmac/M1_Musca/Lib_10X_Musca_larva_2019_02_25_M1_1sttry/TAR/TAR_reads.bed.gz, /Users/perrylabmac/Musca_ref_genome_Ce$
output: /Users/perrylabmac/M1_Musca/Lib_10X_Musca_larva_2019_02_25_M1_1sttry/TAR/TAR_reads.bed.gz.withDir.genes.refFlat
jobid: 7
wildcards: DATADIR=/Users/perrylabmac/M1_Musca, sample=Lib_10X_Musca_larva_2019_02_25_M1_1sttry
resources: tmpdir=/var/folders/kb/1djtgry52557wtjq8j15n_pm0000gn/T

[Fri Oct 8 12:18:55 2021]
Error in rule calcHMMrefFlat:
jobid: 7
output: /Users/perrylabmac/M1_Musca/Lib_10X_Musca_larva_2019_02_25_M1_1sttry/TAR/TAR_reads.bed.gz.withDir.genes.refFlat
shell:

            Rscript scripts/generate_refFlat_script_both.R /Users/perrylabmac/Musca_ref_genome_Cellranger/genes/genes.refFlat /Users/per$

    (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /Users/perrylabmac/TAR-scRNA-seq/from_cellranger/.snakemake/log/2021-10-08T100732.520897.snakemake.log

and what I get during execution:

(snakemake) Perrys-MacBook-Pro:from_cellranger perrylabmac$ snakemake -j --rerun-incomplete
The flag 'directory' used in rule getMatsSteps is only valid for outputs, not inputs.
Building DAG of jobs...
Using shell: /usr/local/bin/bash
Provided cores: 8
Rules claiming more threads will be scaled down.
Job stats:
job count min threads max threads

HMM_refFlat_to_gtf_WITHDIR 1 1 1
all 1 1 1
calcHMMbed 1 3 3
calcHMMrefFlat 1 1 1
extract_HMM_expression_withDir 1 1 1
getDiffFeatures 1 1 1
getDiffRegionsToBlast 1 1 1
getDiffSeqsToBlastFa 1 1 1
labelDiffuTARs 1 1 1
ruleBlast 1 3 3
stage3_withDir 1 1 1
total 11 1 3

Select jobs to execute...

[Fri Oct 8 10:07:32 2021]
rule calcHMMbed:
input: /Users/perrylabmac/M1_Musca/Lib_10X_Musca_larva_2019_02_25_M1_1sttry/outs/possorted_genome_bam.bam
output: /Users/perrylabmac/M1_Musca/Lib_10X_Musca_larva_2019_02_25_M1_1sttry/TAR/TAR_reads.bed.gz
jobid: 8
wildcards: DATADIR=/Users/perrylabmac/M1_Musca, sample=Lib_10X_Musca_larva_2019_02_25_M1_1sttry
threads: 3
resources: tmpdir=/var/folders/kb/1djtgry52557wtjq8j15n_pm0000gn/T

Number of aligned reads is 594024319
mkdir: /Users/perrylabmac/M1_Musca/Lib_10X_Musca_larva_2019_02_25_M1_1sttry/TAR/possorted_genome_bam_HMM_features: File exists
tee: SingleCellHMM_Run_/Users/perrylabmac/M1_Musca/Lib_10X_Musca_larva_2019_02_25_M1_1sttry/TAR/possorted_genome_bam_HMM_features.log: No such file or directory
Path to SingleCellHMM.R /Users/perrylabmac/TAR-scRNA-seq/from_cellranger/scripts
INPUT_BAM /Users/perrylabmac/M1_Musca/Lib_10X_Musca_larva_2019_02_25_M1_1sttry/outs/possorted_genome_bam.bam
cellranger count folder /Users/perrylabmac/M1_Musca/Lib_10X_Musca_larva_2019_02_25_M1_1sttry/TAR
tmp folder /Users/perrylabmac/M1_Musca/Lib_10X_Musca_larva_2019_02_25_M1_1sttry/TAR/possorted_genome_bam_HMM_features
number Of thread 3
minimum coverage 59
thresholded at 1 in 10000000 reads

Reads spanning over splicing junction will join HMM blocks
To avoid that, split reads into small blocks before input to groHMM
Spliting and sorting reads...
awk: syntax error at source line 1
context is
{print $0 >> >>> "chr"$ <<< 1".bed"}
awk: illegal statement at source line 1
zcat: can't stat: possorted_genome_bam_split.sorted.bed.gz (possorted_genome_bam_split.sorted.bed.gz.Z): No such file or directory
find: illegal option -- n
usage: find [-H | -L | -P] [-EXdsx] [-f path] path ... [expression]
find [-H | -L | -P] [-EXdsx] -f path [path ...] [expression]

Start to run groHMM on each individual chromosome...
chr*.bed

Merging HMM blocks within 500bp...
sort: No such file or directory
mkdir: toremove: File exists

Calculating the coverage...
zcat: can't stat: possorted_genome_bam_split.sorted.bed.gz (possorted_genome_bam_split.sorted.bed.gz.Z): No such file or directory

Filtering the HMM blocks by coverage...

Please examine if major chromosomes are all present in the final TAR_reads.bed.gz file

zcat: can't stat: TAR_reads.bed.gz (TAR_reads.bed.gz.Z): No such file or directory

Link the final TAR_reads.bed.gz file to the working directory

Move intermediate files to /Users/perrylabmac/M1_Musca/Lib_10X_Musca_larva_2019_02_25_M1_1sttry/TAR/possorted_genome_bam_HMM_features/toremove ...

/Users/perrylabmac/M1_Musca/Lib_10X_Musca_larva_2019_02_25_M1_1sttry/TAR/possorted_genome_bam_HMM_features/toremove can be deleted if no error message in SingleCellHMM_Run log file and
all major chromosomes are present in the final TAR_reads.bed.gz file

gzip: toremove is a directory

done!
gzip: possorted_genome_bam_split.sorted.bed.gz already has .gz suffix -- unchanged
gzip: TAR_reads.bed.gz already has .gz suffix -- unchanged
[Fri Oct 8 12:18:55 2021]
Finished job 8.
1 of 11 steps (9%) done
Select jobs to execute...

[Fri Oct 8 12:18:55 2021]
rule calcHMMrefFlat:
input: /Users/perrylabmac/M1_Musca/Lib_10X_Musca_larva_2019_02_25_M1_1sttry/TAR/TAR_reads.bed.gz, /Users/perrylabmac/Musca_ref_genome_Cellranger/genes/genes.refFlat
output: /Users/perrylabmac/M1_Musca/Lib_10X_Musca_larva_2019_02_25_M1_1sttry/TAR/TAR_reads.bed.gz.withDir.genes.refFlat
jobid: 7
wildcards: DATADIR=/Users/perrylabmac/M1_Musca, sample=Lib_10X_Musca_larva_2019_02_25_M1_1sttry
resources: tmpdir=/var/folders/kb/1djtgry52557wtjq8j15n_pm0000gn/T

Error in read.table(file = file, header = header, sep = sep, quote = quote, :
no lines available in input
Calls: read.delim -> read.table
Execution halted
[Fri Oct 8 12:18:55 2021]
Error in rule calcHMMrefFlat:
jobid: 7
output: /Users/perrylabmac/M1_Musca/Lib_10X_Musca_larva_2019_02_25_M1_1sttry/TAR/TAR_reads.bed.gz.withDir.genes.refFlat
shell:

	Rscript scripts/generate_refFlat_script_both.R /Users/perrylabmac/Musca_ref_genome_Cellranger/genes/genes.refFlat /Users/perrylabmac/M1_Musca/Lib_10X_Musca_larva_2019_02_25_M1_1sttry/TAR/TAR_reads.bed.gz
	
    (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /Users/perrylabmac/TAR-scRNA-seq/from_cellranger/.snakemake/log/2021-10-08T100732.520897.snakemake.log

thanks!

Antoine

fw262 · 2021-10-12T16:06:38Z

Hi Antoine,

I think this issue may again be related to your MacOS system. The awk syntax seem to be a little different depending on the OS. I propose 2 temporary solutions you can try.

Try using gawk instead of awk (command: alias awk=gawk).
Go into the file scripts/SingleCellHMM_MW.bash and change the line bedtools bamtobed -i ${INPUT_BAM} -split |LC_ALL=C sort -k1,1V -k2,2n --parallel=30| awk '{print $0}' | gzip > ${TMPDIR}/${PREFIX}_split.sorted.bed.gz into bedtools bamtobed -i ${INPUT_BAM} -split |LC_ALL=C sort -k1,1V -k2,2n --parallel=30| awk '{print ($0)}' | gzip > ${TMPDIR}/${PREFIX}_split.sorted.bed.gz where there is a parenthesis added to the awk print command.

Let me know if that fixes it.

Michael

adonatiucsd · 2021-10-12T16:23:43Z

Hi Michael,

I tried both, I still get this error:

The flag 'directory' used in rule getMatsSteps is only valid for outputs, not inputs.
Building DAG of jobs...
Using shell: /usr/local/bin/bash
Provided cores: 8
Rules claiming more threads will be scaled down.
Job stats:
job count min threads max threads

HMM_refFlat_to_gtf_WITHDIR 1 1 1
all 1 1 1
calcHMMrefFlat 1 1 1
extract_HMM_expression_withDir 1 1 1
getDiffFeatures 1 1 1
getDiffRegionsToBlast 1 1 1
getDiffSeqsToBlastFa 1 1 1
labelDiffuTARs 1 1 1
ruleBlast 1 3 3
stage3_withDir 1 1 1
total 10 1 3

Select jobs to execute...

[Tue Oct 12 09:22:09 2021]
rule calcHMMrefFlat:
input: /Users/perrylabmac/M1_Musca/Lib_10X_Musca_larva_2019_02_25_M1_1sttry/TAR/TAR_reads.bed.gz, /Users/perrylabmac/Musca_ref_genome_Cellranger/genes/genes.refFlat
output: /Users/perrylabmac/M1_Musca/Lib_10X_Musca_larva_2019_02_25_M1_1sttry/TAR/TAR_reads.bed.gz.withDir.genes.refFlat
jobid: 7
wildcards: DATADIR=/Users/perrylabmac/M1_Musca, sample=Lib_10X_Musca_larva_2019_02_25_M1_1sttry
resources: tmpdir=/var/folders/kb/1djtgry52557wtjq8j15n_pm0000gn/T

Error in read.table(file = file, header = header, sep = sep, quote = quote, :
no lines available in input
Calls: read.delim -> read.table
Execution halted
[Tue Oct 12 09:22:09 2021]
Error in rule calcHMMrefFlat:
jobid: 7
output: /Users/perrylabmac/M1_Musca/Lib_10X_Musca_larva_2019_02_25_M1_1sttry/TAR/TAR_reads.bed.gz.withDir.genes.refFlat
shell:

	Rscript scripts/generate_refFlat_script_both.R /Users/perrylabmac/Musca_ref_genome_Cellranger/genes/genes.refFlat /Users/perrylabmac/M1_Musca/Lib_10X_Musca_larva_2019_02_25_M1_1sttry/TAR/TAR_reads.bed.gz
	
    (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /Users/perrylabmac/TAR-scRNA-seq/from_cellranger/.snakemake/log/2021-10-12T092208.761573.snakemake.log

Thanks!

Antoine

fw262 · 2021-10-12T18:58:38Z

Can you share the content of your /Users/perrylabmac/M1_Musca/Lib_10X_Musca_larva_2019_02_25_M1_1sttry/TAR directory with a ls -hlt command? It looks like you have an empty file somewhere along the TAR discover step.

Thanks,
Michael

adonatiucsd · 2021-10-13T01:13:34Z

Hi!

(base) Perrys-MacBook-Pro:TAR perrylabmac$ ls -hlt
total 0
lrwxr-xr-x 1 perrylabmac staff 123B Oct 12 15:23 TAR_reads.bed.gz -> /Users/perrylabmac/M1_Musca/Lib_10X_Musca_larva_2019_02_25_M1_1sttry/TAR/possorted_genome_bam_HMM_features/TAR_reads.bed.gz
drwxr-xr-x 7 perrylabmac staff 224B Oct 12 15:23 possorted_genome_bam_HMM_features

and the content of the inner folder:

(base) Perrys-MacBook-Pro:possorted_genome_bam_HMM_features perrylabmac$ ls -hlt
total 11993240
drwxr-xr-x 2 perrylabmac staff 64B Oct 12 15:23 toremove
-rw-r--r-- 1 perrylabmac staff 20B Oct 12 15:23 TAR_reads.bed.gz
-rw-r--r-- 1 perrylabmac staff 67B Oct 12 15:23 possorted_genome_bam_merge500.sorted.bed_count.gz
-rw-r--r-- 1 perrylabmac staff 61B Oct 12 15:23 possorted_genome_bam_merge500.sorted.bed.gz
-rw-r--r-- 1 perrylabmac staff 5.7G Oct 12 15:23 possorted_genome_bam_split.sorted.bed.gz

thanks,

Antoine

fw262 · 2021-10-13T15:16:37Z

Can you make sure you have the necessary R packages, listed below, installed?

BiocManager
rtracklayer
groHMM
Seurat, version >= 4.0
data.table
dplyr
stringr

Can you also share what is inside the toremove folder with ls -lht?

Thanks,
michael

adonatiucsd · 2021-10-13T20:22:04Z

Hi,

So I already had these R packages installed via Rstudio but I reinstalled them within miniconda3 just to make sure; I get the same error message as before.
The toremove folder is empty; nothing shows up when I type ls -lht

thanks,

Antoine

fw262 · 2021-10-14T15:08:29Z

Can you share the SingleCellHMM_Run*.log file generated after implementing the awk fix? I would also recommend running the pipeline on a Linux system if possible.

Michael

adonatiucsd · 2021-10-14T18:42:19Z

ok I will try to run this on Linux. Where can I find the SingleCellHMM log file?

fw262 · 2021-10-14T20:46:32Z

I made a change to scripts/SingleCellHMM_MW.bash to fix a bug with the TAR discovery log file. If you update this file and re-run your pipeline, you should find the log file in /Users/perrylabmac/M1_Musca/Lib_10X_Musca_larva_2019_02_25_M1_1sttry/TAR/possorted_genome_bam_HMM_features/SingleCellHMM_Run.log.

Michael

adonatiucsd · 2021-10-15T02:50:57Z

Here it is:

Path to SingleCellHMM.R /Users/perrylabmac/TAR-scRNA-seq/from_cellranger/scripts
INPUT_BAM /Users/perrylabmac/M1_Musca/Lib_10X_Musca_larva_2019_02_25_M1_1sttry/outs/possorted_genome_bam.bam
cellranger count folder /Users/perrylabmac/M1_Musca/Lib_10X_Musca_larva_2019_02_25_M1_1sttry/TAR
tmp folder /Users/perrylabmac/M1_Musca/Lib_10X_Musca_larva_2019_02_25_M1_1sttry/TAR/possorted_genome_bam_HMM_features
number Of thread 10
minimum coverage 59
thresholded at 1 in 10000000 reads

Reads spanning over splicing junction will join HMM blocks
To avoid that, split reads into small blocks before input to groHMM
Spliting and sorting reads...
zcat: can't stat: possorted_genome_bam_split.sorted.bed.gz (possorted_genome_bam_split.sorted.bed.gz.Z): No such file or directory
awk: syntax error at source line 1
context is
{print $0 >> >>> "chr"$ <<< 1".bed"}
awk: illegal statement at source line 1
find: illegal option -- n
usage: find [-H | -L | -P] [-EXdsx] [-f path] path ... [expression]
find [-H | -L | -P] [-EXdsx] -f path [path ...] [expression]

Start to run groHMM on each individual chromosome...
chr*.bed

Merging HMM blocks within 500bp...
sort: No such file or directory

Calculating the coverage...
zcat: can't stat: possorted_genome_bam_split.sorted.bed.gz (possorted_genome_bam_split.sorted.bed.gz.Z): No such file or directory

Filtering the HMM blocks by coverage...

Please examine if major chromosomes are all present in the final TAR_reads.bed.gz file

zcat: can't stat: TAR_reads.bed.gz (TAR_reads.bed.gz.Z): No such file or directory

Link the final TAR_reads.bed.gz file to the working directory

Move intermediate files to /Users/perrylabmac/M1_Musca/Lib_10X_Musca_larva_2019_02_25_M1_1sttry/TAR/possorted_genome_bam_HMM_features/toremove ...

/Users/perrylabmac/M1_Musca/Lib_10X_Musca_larva_2019_02_25_M1_1sttry/TAR/possorted_genome_bam_HMM_features/toremove can be deleted if no error message in SingleCellHMM_Run log file and
all major chromosomes are present in the final TAR_reads.bed.gz file

adonatiucsd · 2021-10-19T02:01:29Z

Hi,

So I still haven't managed to get it to work on MacOS but I was able to run the pipeline on Linux.
However when loading the TAR bed file on IGV I noticed that many scaffolds did not have anything in the TAR track; and from the groHMM log file it looks like the program did not go through all the 20487 scaffold of the Housefly genome:

Path to SingleCellHMM.R /home/adonati/TAR-scRNA-seq/from_cellranger/scripts
INPUT_BAM /home/adonati/Desktop/M1/Lib_10X_Musca_larva_2019_02_25_M1_1sttry/outs/possorted_genome_bam.bam
cellranger count folder /home/adonati/Desktop/M1/Lib_10X_Musca_larva_2019_02_25_M1_1sttry/TAR
tmp folder /home/adonati/Desktop/M1/Lib_10X_Musca_larva_2019_02_25_M1_1sttry/TAR/possorted_genome_bam_HMM_features
number Of thread 6
minimum coverage 59
thresholded at 1 in 10000000 reads

Reads spanning over splicing junction will join HMM blocks
To avoid that, split reads into small blocks before input to groHMM
Spliting and sorting reads...
Finished splitting input bam.
awk: cannot open "chrNW_004756140.1.bed" for output (Too many open files)

Start to run groHMM on each individual chromosome...
chrNW_004754939.1.bed
chrNW_004754940.1.bed
chrNW_004754941.1.bed
chrNW_004754943.1.bed
chrNW_004754965.1.bed
chrNW_004755053.1.bed
chrNW_004755054.1.bed
chrNW_004755111.1.bed
chrNW_004755164.1.bed
chrNW_004755198.1.bed
chrNW_004755220.1.bed
chrNW_004755231.1.bed
chrNW_004755318.1.bed
chrNW_004755333.1.bed
chrNW_004755352.1.bed
chrNW_004755353.1.bed
chrNW_004755386.1.bed
chrNW_004755398.1.bed
chrNW_004755497.1.bed
chrNW_004755531.1.bed
chrNW_004755586.1.bed
chrNW_004755622.1.bed
chrNW_004755631.1.bed
chrNW_004755653.1.bed
chrNW_004755684.1.bed
chrNW_004755784.1.bed
chrNW_004755797.1.bed
chrNW_004755819.1.bed
chrNW_004755838.1.bed
chrNW_004755853.1.bed
chrNW_004755919.1.bed
chrNW_004755924.1.bed
chrNW_004755942.1.bed
chrNW_004755953.1.bed
chrNW_004756008.1.bed
chrNW_004756052.1.bed
chrNW_004756054.1.bed
chrNW_004756076.1.bed
chrNW_004756109.1.bed
chrNW_004756120.1.bed

Merging HMM blocks within 500bp...

Calculating the coverage...

Filtering the HMM blocks by coverage...

Please examine if major chromosomes are all present in the final TAR_reads.bed.gz file

NW_004754939.1
NW_004754940.1
NW_004754941.1
NW_004754943.1
NW_004754965.1
NW_004755053.1
NW_004755054.1
NW_004755164.1
NW_004755198.1
NW_004755220.1
NW_004755231.1
NW_004755353.1
NW_004755386.1
NW_004755398.1
NW_004755497.1
NW_004755531.1
NW_004755586.1
NW_004755631.1
NW_004755653.1
NW_004755797.1
NW_004755819.1
NW_004755838.1
NW_004755853.1
NW_004755919.1
NW_004755924.1
NW_004755942.1
NW_004755953.1
NW_004756008.1
NW_004756052.1
NW_004756054.1
NW_004756076.1
NW_004756109.1

Link the final TAR_reads.bed.gz file to the working directory

Move intermediate files to /home/adonati/Desktop/M1/Lib_10X_Musca_larva_2019_02_25_M1_1sttry/TAR/possorted_genome_bam_HMM_features/toremove ...

/home/adonati/Desktop/M1/Lib_10X_Musca_larva_2019_02_25_M1_1sttry/TAR/possorted_genome_bam_HMM_features/toremove can be deleted if no error message in SingleCellHMM_Run log file and
all major chromosomes are present in the final TAR_reads.bed.gz file

gzip: possorted_genome_bam_split.sorted.bed.gz already has .gz suffix -- unchanged
gzip: TAR_reads.bed.gz already has .gz suffix -- unchanged

Looking at the TAR_reads. bed it seems many scaffold are absent indeed. Any idea why the program skipped so many scaffolds?

Thanks!

Antoine

fw262 · 2021-10-19T15:25:56Z

Hi Antoine,

I'm glad the pipeline worked better on Linux. I updated scripts/SingleCellHMM_MW.bash so please try re-running with this new script. With over 20k scaffolds, it's overloading memory so I modified that script to close bed files that are generated. This may take a little longer to run but should work.

Best,
Michael

adonatiucsd · 2021-10-21T19:32:38Z

Hi Michael,

I am trying to get the pipeline to work on UCSD supercomputer Expanse, I installed miniconda3 and all the software required except R, which I can't figure out how to install without sudo (which I can't use on Expanse); do you think the pipeline could work if I install R via conda?

Thanks!

Antoine

fw262 · 2021-10-21T20:10:22Z

A conda installed R should work fine, as long as you have all the required R packages as well. Let me know if it works out.

Michael

adonatiucsd · 2021-10-22T17:45:44Z

Hi Michael,

Sorry about the stupid question, but am I supposed to activate the scTAR_cellranger environment before running snakemake? I do have R and installed all the required libraries but within the scTAR_cellranger environement. The problem is that to run snakemake I need to use "conda activate snakemake", which deactivates scTAR_cellranger...
Thanks a lot!

Antoine

adonatiucsd · 2021-10-24T02:16:51Z

Hi Michael,

I get this error when I run the pipeline after installing R with conda:

wildcards: CR_REF=/expanse/lustre/scratch/adonati2/temp_project/10XMusca/mapping/Musca_ref_genome
resources: tmpdir=/tmp

[Sat Oct 23 18:45:20 2021]
Finished job 9.
1 of 11 steps (9%) done
Select jobs to execute...

[Sat Oct 23 18:45:20 2021]
rule calcHMMrefFlat:
input: /expanse/lustre/scratch/adonati2/temp_project/10XMusca/mapping/M1/Lib_10X_Musca_larva_2019_02_25_M1_1sttry/TAR/TAR_reads.bed.gz, /expanse/lustre/scratch/adon$
output: /expanse/lustre/scratch/adonati2/temp_project/10XMusca/mapping/M1/Lib_10X_Musca_larva_2019_02_25_M1_1sttry/TAR/TAR_reads.bed.gz.withDir.genes.refFlat
jobid: 7
wildcards: DATADIR=/expanse/lustre/scratch/adonati2/temp_project/10XMusca/mapping/M1, sample=Lib_10X_Musca_larva_2019_02_25_M1_1sttry
resources: tmpdir=/tmp

/usr/bin/bash: line 1: Rscript: command not found
[Sat Oct 23 18:45:20 2021]
Error in rule calcHMMrefFlat:
jobid: 7
output: /expanse/lustre/scratch/adonati2/temp_project/10XMusca/mapping/M1/Lib_10X_Musca_larva_2019_02_25_M1_1sttry/TAR/TAR_reads.bed.gz.withDir.genes.refFlat
shell:

        Rscript scripts/generate_refFlat_script_both.R /expanse/lustre/scratch/adonati2/temp_project/10XMusca/mapping/Musca_ref_genome/genes/genes.refFlat /expa$

(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /home/adonati2/TAR-scRNA-seq/from_cellranger/.snakemake/log/2021-10-23T184509.175761.snakemake.log

it looks like the Rscript command is not working from within the snakemake pipeline even if I do have Rscript in miniconda3/bin...

thanks!

Antoine

fw262 · 2021-10-25T14:40:04Z

Hi Antoine,

It looks like Rscript is not available within the snakemake environment. You can try simply replacing Rscript command within the rule calcHMMrefFlat of Snakefile to explicitly call Rscript in miniconda3/bin.

In regards to the snakemake environment, it may be worthwhile to install snakemake via pip so you can activate the scTAR_cellranger environment without conflicting with the snakemake environment.

Best,
Michael

adonatiucsd · 2021-10-26T20:23:35Z

Hi Michael,

I had a problem with R because for some reason conda installed R 3.2
I managed to install R 4.1 with mamba and then I was able to run the pipeline
It ran for a full 24h on the computer before stopping before completion (max time reached): it created tones of .bed files (looks like one per scaffold) in /TAR/possorted_genome_bam_HMM_features
along with "possorted_genome_bam_split.sorted.bed.gz" and "SingleCellHMM_Run.log" which reads like:

Path to SingleCellHMM.R /home/adonati2/TAR-scRNA-seq/from_cellranger/scripts
INPUT_BAM /expanse/lustre/scratch/adonati2/temp_project/10XMusca/mapping/M1/Lib_10X_Musca_larva_2019_02_25_$
cellranger count folder /expanse/lustre/scratch/adonati2/temp_project/10XMusca/mapping/M1/Lib_10X_Musca_larva_2019_02_25_$
tmp folder /expanse/lustre/scratch/adonati2/temp_project/10XMusca/mapping/M1/Lib_10X_Musca_larva_2019_02_25_$
number Of thread 24
minimum coverage 59
thresholded at 1 in 10000000 reads

Reads spanning over splicing junction will join HMM blocks
To avoid that, split reads into small blocks before input to groHMM
Spliting and sorting reads...
Finished splitting input bam.

I was wondering if the pipeline is really supposed to generate and keep all these .bed files? Is it normal that the pipeline took so long even running on Expanse (I ran it on just one Node, which is two 64-core AMD EPYC 7742 processors and contain 256 GB of DDR4 memory)?
Thanks!

Antoine

fw262 · 2021-10-27T15:42:00Z

Hi Antoine,

Yes, the pipeline is suppose to generate those bed files. The groHMM algorithm is run on individual chromosomes/scaffolds. There is a limitation with the tool if you have a very high number of scaffolds (over 20,000 in your case). Are you interested in the expression across all scaffolds? Perhaps you can filter for the major scaffolds where you'd expect the most expression.

Best,
Michael

adonatiucsd · 2021-10-27T16:56:52Z

Hi Michael,

I guess it would be nice to have the TARs for all scaffold; here I have many scaffolds indeed but most of them are small, only 35 of them are over 1Mb, and the total genome size is only 750Mb; will the pipeline run slower with a more fragmented genome or is the speed only dependent on genome size?

fw262 · 2021-10-27T19:30:24Z

Hi Antoine,

The pipeline would run slower with more fragmented genome. It wouldn't take as long if you ran the pipeline on, for example, the human or mouse genome.

Best,
Michael

adonatiucsd · 2021-10-29T21:11:55Z

Hi Michael,

Are all these bedfiles deleted at the end of the pipeline or are they all merged together and zipped?
thanks

Antoine

fw262 · 2021-11-01T14:10:25Z

Hi Antoine,

These bed files are zipped, not merged, and moved to the possorted_genome_bam_HMM_features/toremove folder which you can remove manually.

Best,
Michael

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fail to create scTAR_cellranger env #10

fail to create scTAR_cellranger env #10

adonatiucsd commented Oct 5, 2021

fw262 commented Oct 6, 2021

adonatiucsd commented Oct 7, 2021

fw262 commented Oct 7, 2021

adonatiucsd commented Oct 7, 2021

fw262 commented Oct 7, 2021

adonatiucsd commented Oct 8, 2021

fw262 commented Oct 12, 2021

adonatiucsd commented Oct 12, 2021

fw262 commented Oct 12, 2021

adonatiucsd commented Oct 13, 2021

fw262 commented Oct 13, 2021

adonatiucsd commented Oct 13, 2021

fw262 commented Oct 14, 2021

adonatiucsd commented Oct 14, 2021

fw262 commented Oct 14, 2021

adonatiucsd commented Oct 15, 2021

adonatiucsd commented Oct 19, 2021

fw262 commented Oct 19, 2021

adonatiucsd commented Oct 21, 2021

fw262 commented Oct 21, 2021

adonatiucsd commented Oct 22, 2021

adonatiucsd commented Oct 24, 2021

fw262 commented Oct 25, 2021

adonatiucsd commented Oct 26, 2021

fw262 commented Oct 27, 2021

adonatiucsd commented Oct 27, 2021

fw262 commented Oct 27, 2021

adonatiucsd commented Oct 29, 2021

fw262 commented Nov 1, 2021

fail to create scTAR_cellranger env #10

fail to create scTAR_cellranger env #10

Comments

adonatiucsd commented Oct 5, 2021

Hi, I am trying to get the fromcellranger pipeline to work on MacOS Catalina 10.15.7 When I run conda env create -f scTAR_cellranger.yml (after modifying the yml file prefix to /home/miniconda3/envs/scTAR_cellranger) I get the following error message:

fw262 commented Oct 6, 2021

adonatiucsd commented Oct 7, 2021

fw262 commented Oct 7, 2021

adonatiucsd commented Oct 7, 2021

fw262 commented Oct 7, 2021

adonatiucsd commented Oct 8, 2021

Please examine if major chromosomes are all present in the final TAR_reads.bed.gz file

fw262 commented Oct 12, 2021

adonatiucsd commented Oct 12, 2021

fw262 commented Oct 12, 2021

adonatiucsd commented Oct 13, 2021

fw262 commented Oct 13, 2021

adonatiucsd commented Oct 13, 2021

fw262 commented Oct 14, 2021

adonatiucsd commented Oct 14, 2021

fw262 commented Oct 14, 2021

adonatiucsd commented Oct 15, 2021

Please examine if major chromosomes are all present in the final TAR_reads.bed.gz file

adonatiucsd commented Oct 19, 2021

Please examine if major chromosomes are all present in the final TAR_reads.bed.gz file

fw262 commented Oct 19, 2021

adonatiucsd commented Oct 21, 2021

fw262 commented Oct 21, 2021

adonatiucsd commented Oct 22, 2021

adonatiucsd commented Oct 24, 2021

fw262 commented Oct 25, 2021

adonatiucsd commented Oct 26, 2021

fw262 commented Oct 27, 2021

adonatiucsd commented Oct 27, 2021

fw262 commented Oct 27, 2021

adonatiucsd commented Oct 29, 2021

fw262 commented Nov 1, 2021

Hi,
I am trying to get the fromcellranger pipeline to work on MacOS Catalina 10.15.7
When I run
conda env create -f scTAR_cellranger.yml
(after modifying the yml file prefix to /home/miniconda3/envs/scTAR_cellranger)
I get the following error message: