kircherlab · visze · Jan 2, 2025 · Sep 19, 2024 · Oct 17, 2024 · Oct 28, 2024
diff --git a/release-please-config.json → .github/release-please-config.json b/release-please-config.json → .github/release-please-config.json
@@ -8,8 +8,10 @@
             "release-type": "simple",
             "bump-minor-pre-major": true,
             "bump-patch-for-minor-pre-major": true,
-            "draft": true,
-            "prerelease": true
+            "draft": false,
+            "prerelease": true,
+            "tag-prefix": "v",
+            "include-component-in-tag": false
         }
     }
 }
diff --git a/.github/workflows/release-please.yml b/.github/workflows/release-please.yml
@@ -14,3 +14,8 @@ jobs:
     runs-on: ubuntu-latest
     steps:
       - uses: googleapis/release-please-action@v4
+        with:
+          token: ${{ secrets.GITHUB_TOKEN }}
+          target-branch: ${{ github.ref_name }}
+          config-file: .github/release-please-config.json
+          manifest-file: .release-please-manifest.json
diff --git a/.release-please-manifest.json b/.release-please-manifest.json
@@ -1,3 +1,3 @@
 {
-  ".": "0.3.0"
+  ".": "0.3.1"
 }
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -1,5 +1,12 @@
 # Changelog
 
+## [0.3.1](https://github.com/kircherlab/MPRAsnakeflow/compare/MPRAsnakeflow-v0.3.0...MPRAsnakeflow-v0.3.1) (2024-12-17)
+
+
+### Bug Fixes
+
+* Wrong experiment count plots in QC report ([#149](https://github.com/kircherlab/MPRAsnakeflow/issues/149)) ([d2be468](https://github.com/kircherlab/MPRAsnakeflow/commit/d2be46891650ff9aaab61f750a4b3bc3b65e3e88))
+
 ## [0.3.0](https://github.com/kircherlab/MPRAsnakeflow/compare/MPRAsnakeflow-v0.2.0...MPRAsnakeflow-v0.3.0) (2024-11-20)
 
 

diff --git a/docs/assignment.rst b/docs/assignment.rst
@@ -59,6 +59,10 @@ Example of an assignment file using exact matches and read 1 with BC, linker and
 .. literalinclude:: ../config/example_assignment_exact_linker.yaml
    :language: yaml
 
+
+To use the strand sensitivity option (e.g. when testing enhancers in both directions), add the following to the config file: :code:`strand_sensitive: {enable: true}`. Otherwise, MPRAsnakeflow will raise an error because it cannot handle the same sequences in both sense and antisense directions. The mappers do not consider the strand, so they would always call such reads ambiguous due to multiple matches.
+
+
 snakemake
 ============================
 

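Review note on the strand sensitivity paragraph added to docs/assignment.rst: the idea behind the adapters can be sketched as below. This is only an illustration, not the workflow's implementation. The adapter sequences are the defaults listed in docs/config.rst; the example oligo and the helper names `reverse_complement` and `add_adapters` are hypothetical.

```python
# Default adapters as documented for strand_sensitive in docs/config.rst.
FORWARD_ADAPTER = "AGGACCGGATCAACT"
REVERSE_ADAPTER = "TCGGTTCACGCAATG"

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def add_adapters(oligo: str,
                 forward: str = FORWARD_ADAPTER,
                 reverse: str = REVERSE_ADAPTER) -> str:
    """Hypothetical helper: put the forward adapter 5' and the reverse adapter 3' of the oligo."""
    return forward + oligo + reverse


# Hypothetical oligo that is tested in both orientations.
sense = "ACGTTGCAAGGCT"
antisense = reverse_complement(sense)

# Without adapters the two design entries are reverse complements of each
# other, so a strand-agnostic mapper matches a read to both of them.
assert reverse_complement(antisense) == sense

# With adapters the two entries are no longer reverse complements of each
# other, because the forward adapter is not the reverse complement of the
# reverse adapter.
tagged_sense = add_adapters(sense)
tagged_antisense = add_adapters(antisense)
assert tagged_antisense != reverse_complement(tagged_sense)
assert tagged_antisense != tagged_sense
```

The same adapters have to be added to the reads as well, as the documentation states, so that reads and reference stay comparable.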
diff --git a/docs/cluster.rst b/docs/cluster.rst
@@ -22,6 +22,33 @@ Having 30 cores and 10GB of memory.
 
     snakemake --sdm conda --configfile config/config.yaml -c 30 --resources mem_mb=10000  --workflow-profile profiles/default
 
+Performance tweaks: Running specific rules with different resources
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Some of the rules will benefit from multithreading or more memory. This can be specified within your profile, workflow profile, or on the command line using :code:`--set-resources RULE_NAME:RESOURCE_NAME=VALUE` or :code:`--set-threads RULE_NAME=VALUE`. Before changing resources, make sure that the rule is actually needed by doing a dry run that lists only the executed rules: :code:`snakemake -n --quiet rules`.
+
+Possible rules to tweak:
+
+:Assignment:
+
+    :assignment_hybridFWRead_get_reads_by_cutadapt:
+        Only needed when using the linker option in the config. You can add more threads using :code:`--set-threads assignment_hybridFWRead_get_reads_by_cutadapt=4`. Default is always 1 thread.
+
+    :assignment_mapping_bbmap:
+        Only needed when using bbmap for mapping. Memory and threads can be optimized, e.g. via :code:`--set-threads assignment_mapping_bbmap=30 --set-resources assignment_mapping_bbmap:mem_mb=10000`. Default is 1 thread and 4GB memory, but we recommend using 30 threads and 10GB if available.
+
+    :assignment_mapping_bwa:
+        Only needed when using bwa for mapping. Memory and threads can be optimized, e.g. via :code:`--set-threads assignment_mapping_bwa=30 --set-resources assignment_mapping_bwa:mem_mb=10000`. Default is 1 thread, but we recommend using 30 threads and 10GB if available.
+
+    :assignment_collectBCs:
+        Threads can be optimized, e.g. via :code:`--set-threads assignment_collectBCs=30`. Default is 1 thread, but we recommend using 30 threads if available.
+
+:Experiment:
+
+    :counts_onlyFW_raw_counts_by_cutadapt:
+        Only needed when you have only FW reads and use the adapter option. Threads can be optimized e.g. via :code:`--set-threads counts_onlyFW_raw_counts_by_cutadapt=30`. Default is 1 thread.
+
+
 Running on an HPC using SLURM
 -----------------------------
 

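Review note on the performance section added to docs/cluster.rst: the two option syntaxes described there (`--set-threads RULE_NAME=VALUE` and `--set-resources RULE_NAME:RESOURCE_NAME=VALUE`) are easy to mix up. A minimal sketch of composing them from a per-rule table could look like this. The helper `build_snakemake_args` is hypothetical and not part of MPRAsnakeflow or Snakemake; the rule names and values are the ones recommended in the docs.

```python
import shlex


def build_snakemake_args(threads: dict, resources: dict) -> list:
    """Hypothetical helper: turn per-rule settings into Snakemake CLI options."""
    args = []
    if threads:
        # --set-threads takes RULE_NAME=VALUE pairs.
        args.append("--set-threads")
        args += [f"{rule}={n}" for rule, n in threads.items()]
    if resources:
        # --set-resources takes RULE_NAME:RESOURCE_NAME=VALUE entries.
        args.append("--set-resources")
        for rule, res in resources.items():
            args += [f"{rule}:{name}={value}" for name, value in res.items()]
    return args


threads = {"assignment_mapping_bwa": 30, "assignment_collectBCs": 30}
resources = {"assignment_mapping_bwa": {"mem_mb": 10000}}

# Dry run first, as the docs suggest, to see which rules would actually run.
command = ["snakemake", "-n", "--quiet", "rules",
           *build_snakemake_args(threads, resources)]
print(shlex.join(command))
```

Keeping the settings in one table makes it harder to write `:` where `=` is required in the resource value.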
diff --git a/docs/config.rst b/docs/config.rst
@@ -4,11 +4,11 @@
 Config File
 =====================
 
-The config file is a yaml file that contains the configuration. Different runs can be configured. We recommend using one config file per MPRA experiment or MPRA project. But in theory, many different experiments can be configured in only one file. It is divided into :code:`version` (version of MPRAsnakeflow used), :code:`assignments` (assigment workflow), and :code:`experiments` (count workflow). This is a full example file with default configurations. :download:`config/example_config.yaml <../config/example_config.yaml>`.
+The config file is a yaml file that contains the configuration. Different runs can be configured. We recommend using one config file per MPRA experiment or MPRA project. But in theory, many different experiments can be configured in only one file. It is divided into :code:`version` (version of MPRAsnakeflow used), :code:`assignments` (assignment workflow), and :code:`experiments` (count workflow). This is a full example file with default configurations. :download:`config/example_config.yaml <../config/example_config.yaml>`.
 
 .. literalinclude:: ../config/example_config.yaml
-   :language: yaml
-   :linenos:
+    :language: yaml
+    :linenos:
 
 
 Note that the config file is controlled by json schema. This means that the config file is validated against the schema. If the config file is not valid, the program will exit with an error message. The schema is located in :download:`workflow/schemas/config.schema.yaml <../workflow/schemas/config.schema.yaml>`.
@@ -17,15 +17,15 @@ Note that the config file is controlled by json schema. This means that the conf
 Version settings
 ----------------
 
-Set the version of the of MPRAsnakeflow this configuration is used. This is important for future updates. The version is used to check if the config file is compatible with the current version of the workflow. If the version is not the same the workflow will exit with an error message.
+Set the version of MPRAsnakeflow that this configuration is written for. This is important for future updates. The version is used to check if the config file is compatible with the current version of the workflow. If the versions are not compatible, the workflow will exit with an error message.
 
 .. literalinclude:: ../workflow/schemas/config.schema.yaml
-   :language: yaml
-   :start-after: start_version
-   :end-before: start_assignments
+    :language: yaml
+    :start-after: start_version
+    :end-before: start_assignments
 
 :version:
-    A a string like "0.2.0" or "1.2". When major version "0" is used the minor version should fit with MPRAsnakeflow, e.g. "0.2.0" is compatible with MPRAsnakeflow 0.2.0. as well as 0.2.1 or 0.2.2. When major version greater 0 used then the major version have to fith with MPRAsnakeflow. E.g. config of "1.2.1" fits also with MPRAsnakeflow 1.7 or 1.0.
+     A string like "0.2.0" or "1.2". When the major version is "0", the minor version has to match MPRAsnakeflow, e.g. "0.2.0" is compatible with MPRAsnakeflow 0.2.0 as well as 0.2.1 or 0.2.2. When the major version is greater than 0, the major version has to match MPRAsnakeflow, e.g. a config of "1.2.1" also fits MPRAsnakeflow 1.7 or 1.0.
 
 --------------------
 Assignment workflow
@@ -95,6 +95,16 @@ For each assignment you want to process you have to give him a name like :code:`
         (Optional) Using a simple dictionary to find identical sequences. This is faster but uses only the whole (or center part depending on start/length) of the design file. Cannot find substrings as part of any sequence. Set to false for more correct, but slower, search. Default :code:`true`.
     :sequence_collitions:
         (Optional) Check if there are identical sequences in the design file. Default :code:`true`.
+:strand_sensitive:
+    (Optional) If enabled, the reads are mapped to the oligos in a strand-sensitive way by adding unique adapters to both ends of the oligo reference as well as the FASTQ files. MPRAsnakeflow is then able to distinguish between sense and antisense. By default this option is not enabled.
+
+    :enable:
+        (Optional) If set to :code:`true` the strand-sensitive mapping is enabled. Default is :code:`false`.
+    :forward_adapter:
+        (Optional) Adapter sequence added 5' of the oligo. Default is :code:`AGGACCGGATCAACT`.
+    :reverse_adapter:
+        (Optional) Adapter sequence added 3' of the oligo. Default is :code:`TCGGTTCACGCAATG`.
+
 
 :configs:
     After mapping the reads to the design file and extracting the barcodes per oligo, the configuration (using different names) can be used to generate multiple filtering and configuration settings of the final mapping oligo to barcode. Use `<your_config_name>: {}` to use the default values for the keys. Each configuration is a dictionary with the following keys:

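Review note on the `:version:` entry in docs/config.rst: the compatibility rule described there can be sketched as below. This is a reading of the documented rule under stated assumptions, not the workflow's actual check. The function names `parse_version` and `is_compatible` are hypothetical, and a missing minor version is assumed to mean 0.

```python
def parse_version(version: str) -> tuple[int, int]:
    """Split a version string like "0.2.0" or "1.2" into (major, minor)."""
    parts = version.split(".")
    major = int(parts[0])
    # Assumption: a missing minor component counts as 0.
    minor = int(parts[1]) if len(parts) > 1 else 0
    return major, minor


def is_compatible(config_version: str, workflow_version: str) -> bool:
    """Hypothetical check mirroring the rule in the docs."""
    c_major, c_minor = parse_version(config_version)
    w_major, w_minor = parse_version(workflow_version)
    if c_major != w_major:
        return False
    if c_major == 0:
        # Pre-1.0: the minor version has to match as well.
        return c_minor == w_minor
    # From 1.0 on: only the major version has to match.
    return True


# Examples taken from the documentation text.
assert is_compatible("0.2.0", "0.2.1")
assert is_compatible("1.2.1", "1.7")
assert is_compatible("1.2.1", "1.0")
assert not is_compatible("0.2.0", "0.3.0")
```

Under this reading, a config written for 0.2.x has to be updated before it runs with the 0.3.1 release in this PR.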
diff --git a/docs/experiment.rst b/docs/experiment.rst
@@ -31,6 +31,9 @@ We allow different flavours of experiment files because sometimes no UMI exists
     * :code:`Condition,Replicate,DNA_BC_F,RNA_BC_F`
 
 
+It is possible to use only one count experiment per condition across replicates (DNA or RNA, although this usually only makes sense for DNA), e.g. if you expect the same number of inserts/transfections across replicates. If you use the same files for :code:`DNA` or :code:`RNA` across replicates, MPRAsnakeflow will only run the first replicate and reuse its counts for all replicates later.
+
+
 Assignment File or configuration
 --------------------------------
 Tab separated gzipped file with barcode mapped to sequence. Can be generated using the :ref:`Assignment` workflow. Config file must be configured similar to this:

diff --git a/docs/faq.rst b/docs/faq.rst
@@ -7,32 +7,29 @@ Frequently Asked Questions
 If you have more question please write us a ticket on `github <https://github.com/kircherlab/MPRAsnakeflow/issues>`_.
 
 
-Is it possible to differntiate beteween sense and antisense?
-    No! Or not directly. The reason why we are not able to do this is that reads will map to both sequence strands equally. Then assignment of the barcode becomes ambigous and is discarded. But when dsigning oligos you can add short sequence fragment on the start and on the end of the sequence that ar edifferent sense and antisense. These sequences should not be trimmed away during demultiplexing and have to be in the design file. For the lentiMPRA dsign we have 15bp adpaters on both ends for integration of the sequence. They can be used for that purpose.
+Is it possible to differentiate between sense and antisense?
+    Usually not, because reads will map to both sequence strands equally. The assignment of the barcode then becomes ambiguous and the barcode is discarded. But we have a workaround that adds unique sequence adapters to both ends of the oligos, for the reference fasta and the fastqs. With this, all mapping strategies should be able to differentiate between sense and antisense. To enable it, use the config :code:`strand_sensitive: {enable: true}`.
 
-The design/reference file check faild, why?
+The design/reference file check failed, why?
     The design file has to have:
-        * Unique headers. Each sequence has to have a unique sequence/id strating from :code:`>` to the first whitespace or newline.
-        * No special characters within the headers. This is because mapping tools create a reference dictionary and cannot handle all characters. In addition most databases (like SRA) have their restricted character set for the header.
-        * Unique sequences. They have to be different. Otherwise mapper place the read to both IDs and the barcode get ambigous and is discarded. Wenn you allow min/max start/lengths for sequences (e.g. in BWA mapping) be aware that the smalles substring has to be unqiue across all other (sub) sequences.
-
+
+    * Unique headers. Each sequence has to have a unique sequence/id starting from :code:`>` to the first whitespace or newline.
+    * No special characters within the headers. This is because mapping tools create a reference dictionary and cannot handle all characters. In addition, most databases (like SRA) have their restricted character set for the header.
+    * Unique sequences. They have to be different in sense and antisense directions. Otherwise, the mapper places the read to both IDs and the barcode becomes ambiguous and is discarded. When you allow min/max start/lengths for sequences (e.g. in BWA mapping), be aware that the smallest substring has to be unique across all other (sub) sequences. If you have antisense collisions and want to keep the strand sensitivity, you can enable it by using the option :code:`strand_sensitive: {enable: true}` in the config file (see the previous question).
 
 MPRAsnakeflow is not able to create a Conda environment
-    If you get a message like::
+    If you get a message like:
 
         Caused by: json.decoder.JSONDecodeError: Extra data: line 1 column 2785 (char 2784)#
 
-    Try to do the following steps ::
+    Try to do the following steps:
 
         rm -r .snakemake/metadata .snakemake/incomplete
 
-    Afterwards try MPRAsnakeflow again. If the above error still occurs, rerun after deleting the entire ``.snakemake`` folder.
-
-
+    Afterwards try MPRAsnakeflow again. If the above error still occurs, rerun after deleting the entire :code:`.snakemake` folder.
 
 Can I use STARR-seq with MPRAsnakeflow?
     No! Not yet ;-)
 
-
 The pipeline is giving an error **"BUG: Out of jobs ready to be started, but not all files built yet."** and won't run. How can I fix this?
-    Please update snakemake, as this error is highly likely to have occured from snakemake internal issues. 
+    Please update snakemake, as this error is highly likely to have occurred from snakemake internal issues.
diff --git a/resources/count_basic/config.yml b/resources/count_basic/config.yml
@@ -12,6 +12,7 @@ experiments:
         type: file
         assignment_file: SRR10800986_barcodes_to_coords.tsv.gz
     design_file: design.fa
+    label_file: labels.tsv
     configs:
       default: {}
       outlierZscore: