Skip to content

Commit

Permalink
Merge pull request #34 from pinellolab/dev
Browse files Browse the repository at this point in the history
update tutorial docs, fix tiling assignment for create-screen output
  • Loading branch information
jykr authored May 7, 2024
2 parents 5738498 + 295ff4c commit e6660b0
Show file tree
Hide file tree
Showing 8 changed files with 125 additions and 61 deletions.
2 changes: 2 additions & 0 deletions bean/framework/ReporterScreen.py
Original file line number Diff line number Diff line change
Expand Up @@ -142,6 +142,8 @@ def __init__(
self.uns["target_base_changes"] = target_base_changes
if tiling is not None:
self.uns["tiling"] = tiling
elif "tiling" not in self.uns:
self.uns["tiling"] = False

@property
def X_edits(self):
Expand Down
8 changes: 6 additions & 2 deletions bean/qc/utils.py
Original file line number Diff line number Diff line change
Expand Up @@ -16,9 +16,13 @@ def check_args(args):
raise ValueError(
f"Specified --condition-col `{args.condition_col}` does not exist in ReporterScreen.samples.columns ({bdata.samples.columns}). Please check your input."
)
if not bdata.tiling and args.target_pos_col not in bdata.guides.columns:
print("ARGS.TILING", args.tiling)
if (
(args.tiling is not None and args.tiling == False)
or (args.tiling is None and not bdata.tiling)
) and args.target_pos_col not in bdata.guides.columns:
raise ValueError(
f"Specified --target-pos-col `{args.target_pos_col}` does not exist in ReporterScreen.guides.columns ({bdata.guides.columns}). Please check your input."
f"Specified --target-pos-col `{args.target_pos_col}` does not exist in ReporterScreen.guides.columns ({bdata.guides.columns}). Please check your input. (--tiling {args.tiling}, ReporterScreen.tiling: {bdata.tiling})"
)
if args.posctrl_col and args.posctrl_col not in bdata.guides.columns:
raise ValueError(
Expand Down
4 changes: 2 additions & 2 deletions docs/_create-screen.md
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,7 @@
bean create-screen gRNA_library.csv sample_list.csv gRNA_counts_table.csv
```
## Input
* gRNA_library.csv ([`variant` screen example](https://github.com/pinellolab/crispr-bean/blob/main/tests/data/var_mini_guides.csv), [`tiling` screen example](https://github.com/pinellolab/crispr-bean/blob/main/tests/data/test_guide_info_tiling_chrom.csv))
* sample_list.csv ([`sorting` screen example](https://github.com/pinellolab/crispr-bean/blob/main/tests/data/sample_list_survival.csv), [`variant` screen example](https://github.com/pinellolab/crispr-bean/blob/main/tests/data/var_mini_samples.csv))
* gRNA_library.csv (`variant` screen [example](https://github.com/pinellolab/crispr-bean/blob/main/tests/data/var_mini_guides.csv), `tiling` screen [example](https://github.com/pinellolab/crispr-bean/blob/main/tests/data/test_guide_info_tiling_chrom.csv))
* sample_list.csv (`sorting` screen [example](https://github.com/pinellolab/crispr-bean/blob/main/tests/data/sample_list_survival.csv), `variant` screen [example](https://github.com/pinellolab/crispr-bean/blob/main/tests/data/var_mini_samples.csv))
* gRNA_counts_table.csv: Table with gRNA ID in the first column and sample IDs as the column names (first row)
`gRNA_library.csv` and `sample_list.csv` should be formatted as :ref:`input`. ([example](https://github.com/pinellolab/crispr-bean/blob/main/tests/data/var_mini_counts.csv))
54 changes: 33 additions & 21 deletions docs/_tutorial_cds.md
Original file line number Diff line number Diff line change
Expand Up @@ -16,36 +16,40 @@ Tiling screen that tiles gRNA densely across locus or multiple loci, selected ba

## Example workflow
```bash
screen_id=my_sorting_tiling_screen
working_dir=my_workdir
screen_id=tiling_mini_screen
working_dir=tests/workdir

# 1. Count gRNA & reporter
bean count-samples \
--input ${working_dir}/sample_list_tiling.csv `# Contains fastq file path; see test file for example.`\
-b A `# Base A is edited (into G)` \
-f ${working_dir}/test_guide_info_tiling_chrom.csv `# Contains gRNA metadata; see test file for example.`\
-o $working_dir `# Output directory` \
-r `# Quantify reporter edits` \
-n ${screen_id} `# ID of the screen` \
--input ${working_dir}/sample_list_tiling.csv `# Contains fastq file path; see test file for example.`\
-b A `# Base A is edited (into G)` \
-f ${working_dir}/test_guide_info_tiling_chrom.csv `# Contains gRNA metadata; see test file for example.`\
-o $working_dir `# Output directory` \
-r `# Quantify reporter edits` \
-n ${screen_id} `# ID of the screen` \
--tiling
# count-samples output from above test run is too low in read depth. Downstream processes can be run with test file included in the Github repo.

# (Optional) Profile editing patterns
bean profile tests/data/${screen_id}.h5ad --pam-col '5-nt PAM'

# 2. QC samples & guides
bean qc \
${working_dir}/bean_count_${screen_id}.h5ad `# Input ReporterScreen .h5ad file path` \
-o ${working_dir}/bean_count_${screen_id}_masked.h5ad `# Output ReporterScreen .h5ad file path` \
${working_dir}/${screen_id}.h5ad `# Input ReporterScreen .h5ad file path` \
-o ${working_dir}/${screen_id}_masked.h5ad `# Output ReporterScreen .h5ad file path` \
-r ${working_dir}/qc_report_${screen_id} `# Prefix for QC report` \

# 3. Filter & translate alleles
bean filter ${working_dir}/bean_count_${screen_id}_masked.h5ad \
-o ${working_dir}/bean_count_${screen_id}_alleleFiltered \
bean filter ${working_dir}/${screen_id}_masked.h5ad \
-o ${working_dir}/${screen_id}_alleleFiltered \
--filter-target-basechange `# Filter based on intended base changes. If -b A was provided in bean count, filters for A>G edit. If -b C was provided, filters for C>T edit.`\
--filter-window --edit-start-pos 0 --edit-end-pos 19 `# Filter based on editing window in spacer position within reporter.`\
--filter-allele-proportion 0.1 --filter-sample-proportion 0.3 `#Filter based on allele proportion larger than 0.1 in at least 0.3 (30%) of the control samples.` \
--translate --translate-genes-list ${working_dir}/gene_symbols.txt

# 4. Quantify variant effect
bean run sorting tiling \
${working_dir}/bean_count_${screen_id}_alleleFiltered.h5ad \
${working_dir}/${screen_id}_alleleFiltered.h5ad \
-o $working_dir \
--fit-negctrl \
--scale-by-acc \
Expand All @@ -71,13 +75,21 @@ bean count-samples \

Make sure you follow the [input file format](https://pinellolab.github.io/crispr-bean/input.html) for seamless downstream steps. This will produce `./bean_count_${screen_id}.h5ad`.

## (Optional) Profile editing pattern (:ref:`profile`)
You can profile the pattern of base editing based on the allele counts.
```bash
bean profile tests/data/${screen_id}.h5ad --pam-col '5-nt PAM'
```
Check the editing window, and consider feeding the start/end position of the editing window with the maximal editing rate into `bean qc` with `--edit-start-pos`, `--edit-end-pos` arguments.


## 2. QC (:ref:`qc`)
Base editing data will include QC about editing efficiency. As QC uses predefined column names and values, beware to follow the [input file guideline](https://pinellolab.github.io/crispr-bean/input.html), but you can change the parameters with the full argument list of [bean qc](https://pinellolab.github.io/crispr-bean/qc.html). (Common factors you may want to tweak is `--ctrl-cond=bulk` and `--lfc-conds=top,bot` if you have different sample condition labels.)

```bash
bean qc \
${working_dir}/bean_count_${screen_id}.h5ad `# Input ReporterScreen .h5ad file path` \
-o ${working_dir}/bean_count_${screen_id}_masked.h5ad `# Output ReporterScreen .h5ad file path` \
${working_dir}/${screen_id}.h5ad `# Input ReporterScreen .h5ad file path` \
-o ${working_dir}/${screen_id}_masked.h5ad `# Output ReporterScreen .h5ad file path` \
-r ${working_dir}/qc_report_${screen_id} `# Prefix for QC report` \
[--tiling] `# Not required if you have passed --tiling in counting step`
```
Expand All @@ -103,8 +115,8 @@ where `path_to_gene_names_file.txt` has one gene symbol per line, and gene symbo
Example allele filtering given we're translating based on MANE transcript exons of multiple gene symbols:

```bash
bean filter ${working_dir}/bean_count_${screen_id}_masked.h5ad \
-o ${working_dir}/bean_count_${screen_id}_alleleFiltered \
bean filter ${working_dir}/${screen_id}_masked.h5ad \
-o ${working_dir}/${screen_id}_alleleFiltered \
--filter-target-basechange `# Filter based on intended base changes. If -b A was provided in bean count, filters for A>G edit. If -b C was provided, filters for C>T edit.`\
--filter-window --edit-start-pos 0 --edit-end-pos 19 `# Filter based on editing window in spacer position within reporter.`\
--filter-allele-proportion 0.1 --filter-sample-proportion 0.3 `#Filter based on allele proportion larger than 0.1 in at least 0.3 (30%) of the control samples.` \
Expand All @@ -122,7 +134,7 @@ By default, `bean run [sorting,survival] tiling` uses most filtered allele count

```bash
bean run sorting tiling \
${working_dir}/bean_count_${screen_id}_alleleFiltered.h5ad \
${working_dir}/${screen_id}_alleleFiltered.h5ad \
-o $working_dir \
--fit-negctrl \
--scale-by-acc \
Expand All @@ -133,7 +145,7 @@ By default, `bean run [sorting,survival] tiling` uses most filtered allele count

```bash
bean run sorting tiling \
${working_dir}/bean_count_${screen_id}_alleleFiltered.h5ad \
${working_dir}/${screen_id}_alleleFiltered.h5ad \
-o $working_dir \
--fit-negctrl \
--scale-by-acc \
Expand All @@ -144,7 +156,7 @@ By default, `bean run [sorting,survival] tiling` uses most filtered allele count

```bash
bean run sorting tiling \
${working_dir}/bean_count_${screen_id}_alleleFiltered.h5ad \
${working_dir}/${screen_id}_alleleFiltered.h5ad \
-o $working_dir \
--fit-negctrl
```
Expand All @@ -154,7 +166,7 @@ By default, `bean run [sorting,survival] tiling` uses most filtered allele count
```bash
bean run sorting tiling \
${working_dir}/bean_count_${screen_id}_alleleFiltered.h5ad \
${working_dir}/${screen_id}_alleleFiltered.h5ad \
-o $working_dir \
--fit-negctrl \
--uniform-edit
Expand Down
38 changes: 25 additions & 13 deletions docs/_tutorial_gwas.md
Original file line number Diff line number Diff line change
Expand Up @@ -16,32 +16,36 @@ GWAS variant screen with per-variant gRNA tiling design, selected based on FACS

## Example workflow
```bash
screen_id=my_sorting_tiling_screen
working_dir=my_workdir
screen_id=var_mini_screen
working_dir=tests/data/

# 1. Count gRNA & reporter
bean count-samples \
--input ${working_dir}//sample_list.csv `# Contains fastq file path; see test file for example.`\
--input ${working_dir}/sample_list.csv `# Contains fastq file path; see test file for example.`\
-b A `# Base A is edited (into G)` \
-f ${working_dir}/test_guide_info.csv `# Contains gRNA metadata; see test file for example.`\
-o ./ `# Output directory` \
-r `# Quantify reporter edits` \
-n ${screen_id} `# ID of the screen to be counted`
# count-samples output from above test run is too low in read depth. Downstream processes can be run with test file included in the Github repo.

# (Optional) Profile editing patterns
bean profile tests/data/${screen_id}.h5ad --pam-col '5-nt PAM'

# 2. QC samples & guides
bean qc \
${working_dir}/bean_count_${screen_id}.h5ad `# Input ReporterScreen .h5ad file path` \
-o ${working_dir}/bean_count_${screen_id}_masked.h5ad `# Output ReporterScreen .h5ad file path` \
-r ${working_dir}/qc_report_${screen_id} `# Prefix for QC report` \
-b ` # Remove replicates with no good samples.
-b ` # Remove replicates with no good samples.
# 3. Quantify variant effect
bean run sorting variant \
${working_dir}/bean_count_${screen_id}_masked.h5ad \
${working_dir}/${screen_id}_masked.h5ad \
-o ${working_dir}/ \
--fit-negctrl \
--scale-by-acc \
--accessibility-col accessibility
--acc-bw-path tests/data/accessibility_signal_chr6.bw
```

See more details below.
Expand All @@ -62,18 +66,26 @@ bean count-samples \

Make sure you follow the [input file format](https://pinellolab.github.io/crispr-bean/input.html) for seamless downstream steps. This will produce `./bean_count_${screen_id}.h5ad`.


## (Optional) Profile editing pattern (:ref:`profile`)
You can profile the pattern of base editing based on the allele counts.

```bash
bean profile tests/data/${screen_id}.h5ad --pam-col '5-nt PAM'
```


## 2. QC samples & guides (:ref:`qc`)
Base editing data will include QC about editing efficiency. As QC uses predefined column names and values, beware to follow the [input file guideline](https://pinellolab.github.io/crispr-bean/input.html), but you can change the parameters with the full argument list of [bean qc](https://pinellolab.github.io/crispr-bean/qc.html). (Common factors you may want to tweak is `--ctrl-cond=bulk` and `--lfc-conds=top,bot` if you have different sample condition labels.)

```bash
bean qc \
bean_count_${screen_id}.h5ad `# Input ReporterScreen .h5ad file path` \
-o bean_count_${screen_id}_masked.h5ad `# Output ReporterScreen .h5ad file path` \
${screen_id}.h5ad `# Input ReporterScreen .h5ad file path` \
-o ${screen_id}_masked.h5ad `# Output ReporterScreen .h5ad file path` \
-r qc_report_${screen_id} `# Prefix for QC report`
```



If the data does not include reporter editing data, you can provide `--no-editing` flag to omit the editing rate QC.


Expand All @@ -84,7 +96,7 @@ If the data does not include reporter editing data, you can provide `--no-editin
If your gRNA metadata table (`${working_dir}/test_guide_info.csv` above) included per-gRNA accessibility score,
```bash
bean run sorting variant \
${working_dir}/bean_count_${screen_id}_masked.h5ad \
${working_dir}/${screen_id}_masked.h5ad \
-o ${working_dir}/ \
--fit-negctrl \
--scale-by-acc \
Expand All @@ -95,7 +107,7 @@ If the data does not include reporter editing data, you can provide `--no-editin

```bash
bean run sorting variant \
${working_dir}/bean_count_${screen_id}_masked.h5ad \
${working_dir}/${screen_id}_masked.h5ad \
-o ${working_dir}/ \
--fit-negctrl \
--scale-by-acc \
Expand All @@ -108,7 +120,7 @@ If the data does not include reporter editing data, you can provide `--no-editin

```bash
bean run sorting variant \
${working_dir}/bean_count_${screen_id}_masked.h5ad \
${working_dir}/${screen_id}_masked.h5ad \
-o ${working_dir}/ \
--fit-negctrl
```
Expand All @@ -118,7 +130,7 @@ If the data does not include reporter editing data, you can provide `--no-editin

```bash
bean run sorting variant \
${working_dir}/bean_count_${screen_id}_masked.h5ad \
${working_dir}/${screen_id}_masked.h5ad \
-o ${working_dir}/ \
--fit-negctrl \
--uniform-edit
Expand Down
18 changes: 9 additions & 9 deletions docs/_tutorial_no_edit.md
Original file line number Diff line number Diff line change
Expand Up @@ -17,25 +17,25 @@ Here, we consider an example where BEAN uses the **external count** without repo

## Example workflow
```bash
screen_id=my_sorting_screen
working_dir=my_workdir
screen_id=var_mini_screen_noedit
working_dir=tests/data/

# 1. Given that we have gRNA count for each sample, generate ReporterScreen (.h5ad) object for downstream analysis.
bean create-screen ${working_dir}/gRNA_info.csv ${working_dir}/sample_list.csv ${working_dir}/gRNA_counts.csv -o ${working_dir}/bean_count_${screen_id}
bean create-screen ${working_dir}/gRNA_info.csv ${working_dir}/sample_list.csv ${working_dir}/var_mini_counts.csv -o ${working_dir}/${screen_id}

# 2. QC samples & guides
bean qc \
${working_dir}/bean_count_${screen_id}.h5ad `# Input ReporterScreen .h5ad file path` \
-o ${working_dir}/bean_count_${screen_id}_masked.h5ad `# Output ReporterScreen .h5ad file path` \
-r ${working_dir}/qc_report_${screen_id} `# Prefix for QC report` \
-b ` # Remove replicates with no good samples.
${working_dir}/${screen_id}.h5ad `# Input ReporterScreen .h5ad file path` \
-o ${working_dir}/${screen_id}_masked.h5ad `# Output ReporterScreen .h5ad file path` \
-r ${working_dir}/qc_report_${screen_id} `# Prefix for QC report` \
-b ` # Remove replicates with no good samples.
# 3. Quantify variant effect
bean run sorting variant \
${working_dir}/bean_count_${screen_id}_masked.h5ad \
${working_dir}/${screen_id}_masked.h5ad \
-o ${working_dir}/ \
--uniform-edit --ignore-bcmatch `# As we have no edit/reporter information.` \
[--fit-negctrl [--negctrl-col target_group --negctrl-col-value NegCtrl]] `# If you have the negative control gRNAs.`
[--fit-negctrl [--negctrl-col target_group --negctrl-col-value NegCtrl]] `# If you have the negative control gRNAs.`
```
## Input file spec
Expand Down
Loading

0 comments on commit e6660b0

Please sign in to comment.