Skip to content

Commit

Permalink
Deploying to gh-pages from @ f25927b 🚀
Browse files Browse the repository at this point in the history
  • Loading branch information
kaizhang committed Sep 30, 2024
0 parents commit 3fc2554
Show file tree
Hide file tree
Showing 162 changed files with 21,850 additions and 0 deletions.
4 changes: 4 additions & 0 deletions .buildinfo
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
# Sphinx build info version 1
# This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
config: f2af386d6e99bb22829f120be7f0f3d7
tags: 645f666f9bcd5a90fca523b33c5a78b7
Binary file not shown.
Binary file added .doctrees/_autosummary/precellar.SeqSpec.doctree
Binary file not shown.
Binary file not shown.
Binary file added .doctrees/_autosummary/precellar.align.doctree
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file added .doctrees/api.doctree
Binary file not shown.
Binary file added .doctrees/environment.pickle
Binary file not shown.
Binary file added .doctrees/index.doctree
Binary file not shown.
268 changes: 268 additions & 0 deletions .doctrees/nbsphinx/tutorials/txg_multiome.ipynb
Original file line number Diff line number Diff line change
@@ -0,0 +1,268 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import precellar"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We first create a SeqSpec object from the 10X multiome seqspec template file."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"\u001b[90m[\u001b[0m2024-09-30T16:24:31Z \u001b[32mINFO \u001b[0m cached_path::cache\u001b[90m]\u001b[0m Cached version of https://raw.githubusercontent.com/regulatory-genomics/precellar/refs/heads/main/seqspec_templates/10x_rna_atac.yaml is up-to-date\n"
]
}
],
"source": [
"assay = precellar.SeqSpec(\"https://raw.githubusercontent.com/regulatory-genomics/precellar/refs/heads/main/seqspec_templates/10x_rna_atac.yaml\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The SeqSpec object can be visualized as a tree structure."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"\n",
"├── rna(153-1150)\n",
"│ ├── rna-illumina_p5(29)\n",
"│ ├── rna-truseq_read1(29)\n",
"│ ├── rna-cell_barcode(16)\n",
"│ ├── rna-umi(12)\n",
"│ ├── rna-cDNA(1-1000)\n",
"│ ├── rna-truseq_read2(34)\n",
"│ ├── rna-index7(8)\n",
"│ └── rna-illumina_p7(24)\n",
"└── atac(153-1150)\n",
" ├── atac-illumina_p5(29)\n",
" ├── atac-cell_barcode(16)\n",
" ├── atac-linker(8)\n",
" ├── atac-nextera_read1(33)\n",
" ├── atac-gDNA(1-1000)\n",
" ├── atac-nextera_read2(34)\n",
" ├── atac-index7(8)\n",
" └── atac-illumina_p7(24)"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"assay"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We then add ATAC data to the SeqSpec object. Specifically, we add R1, R2 and index fastq files. Note that the index read, ususally named \"*_R2_*\" in the 10X multiome data, can be sequenced in either direction, depending on the sequencing platform. Older platforms such as NovaSeq 6000 with v1.0 reagent kits, MiniSeq with Rapid Reagent kits, MiSeq, HiSeq 2500, and HiSeq 2000 use the forward strand workflow. iSeq 100, MiniSeq with Standard reagent kits, NextSeq Systems, NovaSeq 6000 with v1.5 reagent kits, HiSeq X, HiSeq 4000, and HiSeq 3000 use the reverse strand workflow.\n",
"\n",
"We need to pay special attention to the orientation of the index read. In this example, the index read is sequenced in the reverse direction."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"# Read 1\n",
"assay.add_read(\n",
" read_id=\"atac-R1\",\n",
" fastq=\"/data/kzhang/projects/workshop/data/PS_DNA01-1_S2_L001_R1_001.fq.zst\",\n",
" modality=\"atac\",\n",
" is_reverse=False,\n",
" primer_id=\"atac-nextera_read1\",\n",
")\n",
"\n",
"# Index Read\n",
"assay.add_read(\n",
" read_id=\"atac-I1\",\n",
" fastq=\"/data/kzhang/projects/workshop/data/PS_DNA01-1_S2_L001_R2_001.fq.zst\",\n",
" modality=\"atac\",\n",
" is_reverse=True,\n",
" primer_id=\"atac-nextera_read1\",\n",
")\n",
"\n",
"# Read 2\n",
"assay.add_read(\n",
" read_id=\"atac-R2\",\n",
" fastq=\"/data/kzhang/projects/workshop/data/PS_DNA01-1_S2_L001_R3_001.fq.zst\",\n",
" modality=\"atac\",\n",
" is_reverse=True,\n",
" primer_id=\"atac-nextera_read2\",\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can visualize the SeqSpec object again after adding the ATAC data. Note that read information is added to the tree."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"\n",
"├── rna(153-1150)\n",
"│ ├── rna-illumina_p5(29)\n",
"│ ├── rna-truseq_read1(29)\n",
"│ ├── rna-cell_barcode(16)\n",
"│ ├── rna-umi(12)\n",
"│ ├── rna-cDNA(1-1000)\n",
"│ ├── rna-truseq_read2(34)\n",
"│ ├── rna-index7(8)\n",
"│ └── rna-illumina_p7(24)\n",
"└── atac(153-1150)\n",
" ├── atac-illumina_p5(29)\n",
" ├── atac-cell_barcode(16)\n",
" ├── atac-linker(8)\n",
" ├── atac-nextera_read1(33) [↓atac-R1(150), ↑atac-I1(24)]\n",
" ├── atac-gDNA(1-1000)\n",
" ├── atac-nextera_read2(34) [↑atac-R2(150)]\n",
" ├── atac-index7(8)\n",
" └── atac-illumina_p7(24)"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"assay"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can save the SeqSpec object to a yaml file and load it next time."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"with open(\"10x_rna_atac.yaml\", \"w\") as f:\n",
" f.write(assay.to_yaml())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can now start reads mapping and create the fragment file. If you do not have the genome index, you can create it using the `make_genome_index` function."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"\u001b[90m[\u001b[0m2024-09-30T16:24:39Z \u001b[32mINFO \u001b[0m precellar::align\u001b[90m]\u001b[0m Counting barcodes...\n",
"\u001b[90m[\u001b[0m2024-09-30T16:24:39Z \u001b[32mINFO \u001b[0m cached_path::cache\u001b[90m]\u001b[0m Cached version of https://teichlab.github.io/scg_lib_structs/data/10X-Genomics/atac_737K-arc-v1.txt.gz is up-to-date\n",
"\u001b[90m[\u001b[0m2024-09-30T16:24:39Z \u001b[33mWARN \u001b[0m seqspec\u001b[90m]\u001b[0m Reads (atac-R1) may contain additional bases downstream of the variable-length region, e.g., adapter sequences.\n",
"\u001b[90m[\u001b[0m2024-09-30T16:24:46Z \u001b[32mINFO \u001b[0m precellar::align\u001b[90m]\u001b[0m Found 12387063 barcodes. 94.19% of them have an exact match in whitelist\n",
"\u001b[90m[\u001b[0m2024-09-30T16:24:46Z \u001b[32mINFO \u001b[0m precellar::align\u001b[90m]\u001b[0m Aligning reads...\n",
"\u001b[90m[\u001b[0m2024-09-30T16:24:46Z \u001b[33mWARN \u001b[0m seqspec\u001b[90m]\u001b[0m Reads (atac-R1) may contain additional bases downstream of the variable-length region, e.g., adapter sequences.\n",
"\u001b[90m[\u001b[0m2024-09-30T16:24:46Z \u001b[33mWARN \u001b[0m seqspec\u001b[90m]\u001b[0m Reads (atac-R2) may contain additional bases downstream of the variable-length region, e.g., adapter sequences.\n",
"100%|██████████| 12387063/12387063 [05:29<00:00, 37567.71it/s]"
]
}
],
"source": [
"qc = precellar.align(\n",
" assay, \"/data/kzhang/genome/mm10\",\n",
" modality=\"atac\",\n",
" output_fragment=\"atac_fragments.tsv.zst\",\n",
" num_threads=32,\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"QC metrics: "
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'frac_q30_bases_barcode': 0.9219054690365263, 'frac_nonnuclear': 7.443249461151526e-05, 'frac_valid_barcode': 0.9815396111249293, 'frac_unmapped': 0.008819422106194463, 'frac_confidently_mapped': 0.9292046064511015, 'frac_fragment_in_nucleosome_free_region': 4.518862917979508e-05, 'frac_duplicates': 0.33183332695206896, 'sequenced_reads': 24774126.0, 'frac_q30_bases_read1': 0.9173089515515771, 'frac_q30_bases_read2': 0.9229901653577338, 'sequenced_read_pairs': 12387063.0, 'frac_fragment_flanking_single_nucleosome': 2.912761037492815e-05}\n"
]
}
],
"source": [
"print(qc)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.13"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
Binary file added .doctrees/tutorials/index.doctree
Binary file not shown.
Binary file added .doctrees/tutorials/txg_multiome.doctree
Binary file not shown.
Empty file added .nojekyll
Empty file.
Loading

0 comments on commit 3fc2554

Please sign in to comment.