Adds 'batch.rmEffect' parameter. #103

daynefiler · 2020-07-15T10:46:36Z

When the 'batch.rmEffect' is TRUE, the simulation randomly generates the batch effects to preserve the random seed, but then removes the batch effect by over-writing the generated batch effects by 1.

This allows creating simulation pairs with and without batch effects.

When the 'batch.rmEffect' is TRUE, the simulation randomly generates the batch effects to preserve the random seed, but then removes the batch effect by over-writing the generated batch effects by 1. This allows creating simulation pairs with and without batch effects.

daynefiler · 2020-07-15T10:48:49Z

Note, lines 348-349 in splat-simulate.R. For some reason, I commented out:

# batch.means.gene <- means.gene * batch.facs

I do not remember why I did that. I am guessing it should probably be uncommented?

lazappi · 2020-07-16T07:52:44Z

Hi @daynefiler

This is an interesting idea. Thanks for sharing it. I guess the idea is to use the simulation with the "removed" batch effects as the ground truth when evaluating a batch correction method? I can see how that might be useful for some things. Do you have any examples you can show of what this looks like?

lazappi · 2020-07-16T07:53:29Z

Note, lines 348-349 in splat-simulate.R. For some reason, I commented out:
# batch.means.gene <- means.gene * batch.facs
I do not remember why I did that. I am guessing it should probably be uncommented?

I'm guessing because the factors are all 1 now so there's no point multiplying them?

daynefiler · 2020-07-16T14:00:03Z

Looking, again, I think I commented it out because the batch.means.gene parameter doesn't show up anywhere else in the file. Maybe it was a holdover from previous work?

To answer your question, you are completely correct. The idea is to build truth sets for testing batch correction methodologies. Drawing the random batch effects, then overwriting them (I think) ensures the rest of the random effects stay consistent across the two datasets (with & without batch).

daynefiler · 2020-07-16T14:00:24Z

What would you like in terms of an example? I can try to put it together.

lazappi · 2020-07-17T06:36:53Z

Maybe just some pairs of t-SNEs/UMAPs coloured by batch/group with and without batch effects to start with? I guess I'm mostly curious what this looks like in practice.

…

On Thu, 16 Jul 2020 at 16:00, Dayne Filer ***@***.***> wrote: What would you like in terms of an example? I can try to put it together. — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#103 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ABUNCSFKD72RJ3E5MMD5D7LR34BYTANCNFSM4O2NBQNA> .

lazappi · 2020-09-28T12:34:41Z

Sorry I have forgotten about this a bit. Just checking if you are still interested in going ahead with this?

daynefiler · 2020-09-30T17:50:41Z

Hi, sorry this fell off my to-do list. I will send some examples tomorrow afternoon.

daynefiler · 2020-10-01T19:45:47Z

Hi, this illustrates the approach. Importance is using the same parameter object for each simulation, removing the batch multiplier in one.

library(splatter)
library(scater)

## Setup initial parameters
p1 <- newSplatParams()
p1 <- setParam(p1, "group.prob", c(0.2, 0.3, 0.5))
p1 <- setParam(p1, "batchCells", c(1e3, 1e3))

## Perform simulation w/ batch effect
s1 <- splatSimulate(p1, method = "groups")
s1 <- logNormCounts(s1)
s1 <- runPCA(s1)
plotPCA(s1, colour_by = "Batch", shape_by = "Group")

## Perform simulation w/o batch effect
p2 <- setParam(p1, "batch.rmEffect", TRUE)
s2 <- splatSimulate(p2, method = "groups")
s2 <- logNormCounts(s2)
s2 <- runPCA(s2)
plotPCA(s2, colour_by = "Batch", shape_by = "Group")

lazappi · 2020-10-05T12:44:41Z

Thanks for that, that's exactly what I wanted. I've had some time to think about this a bit more. You have probably been through this process already but I just wanted to double check.

For a while I thought what you were suggesting was equivalent to setting batch.facLoc and batch.facScale so that all the batch factors are 1:

p1 <- newSplatParams(group.prob = c(0.2, 0.3, 0.5), batchCells = c(1e3, 1e3))
s1 <- splatSimulate(p1, method = "groups")
s1 <- logNormCounts(s1)
s1 <- runPCA(s1)
plotPCA(s1, colour_by = "Batch", shape_by = "Group")

s2 <- splatSimulate(p1, method = "groups", batch.facLoc = log(1), batch.facScale = 0)
s2 <- logNormCounts(s2)
s2 <- runPCA(s2)
plotPCA(s2, colour_by = "Batch", shape_by = "Group")

That looks like it works but if you look for closely at the actual simulated values it's not quite true:

DataFrame with 10000 rows and 9 columns
                 Gene BaseGeneMean OutlierFactor   GeneMean BatchFacBatch1
          <character>    <numeric>     <numeric>  <numeric>      <numeric>
Gene1           Gene1    2.2696734        1.0000  2.2696734       0.987669
Gene2           Gene2    5.3948593        1.0000  5.3948593       1.088644
Gene3           Gene3    0.3470967       61.6692 63.2300864       0.994561
Gene4           Gene4    0.0364211        1.0000  0.0364211       1.179244
Gene5           Gene5    3.4357531        1.0000  3.4357531       0.840552
...               ...          ...           ...        ...            ...
Gene9996     Gene9996     0.506195             1   0.506195       1.073213
Gene9997     Gene9997     2.010122             1   2.010122       0.890210
Gene9998     Gene9998     2.581748             1   2.581748       1.220922
Gene9999     Gene9999     0.317081             1   0.317081       0.981184
Gene10000   Gene10000     1.217171             1   1.217171       1.306551
          BatchFacBatch2 DEFacGroup1 DEFacGroup2 DEFacGroup3
               <numeric>   <numeric>   <numeric>   <numeric>
Gene1           1.038262     1.00000           1     1.00000
Gene2           1.006354     1.00000           1     1.00000
Gene3           1.169444     1.00000           1     1.53158
Gene4           0.942942     1.00000           1     1.06636
Gene5           1.327734     1.11651           1     1.00000
...                  ...         ...         ...         ...
Gene9996        0.908150           1           1    1.000000
Gene9997        1.120104           1           1    1.287344
Gene9998        0.889335           1           1    0.513365
Gene9999        0.972102           1           1    1.000000
Gene10000       1.058190           1           1    1.000000

DataFrame with 10000 rows and 9 columns
                 Gene BaseGeneMean OutlierFactor   GeneMean BatchFacBatch1
          <character>    <numeric>     <numeric>  <numeric>      <numeric>
Gene1           Gene1    2.2696734        1.0000  2.2696734              1
Gene2           Gene2    5.3948593        1.0000  5.3948593              1
Gene3           Gene3    0.3470967       61.6692 63.2300864              1
Gene4           Gene4    0.0364211        1.0000  0.0364211              1
Gene5           Gene5    3.4357531        1.0000  3.4357531              1
...               ...          ...           ...        ...            ...
Gene9996     Gene9996     0.506195             1   0.506195              1
Gene9997     Gene9997     2.010122             1   2.010122              1
Gene9998     Gene9998     2.581748             1   2.581748              1
Gene9999     Gene9999     0.317081             1   0.317081              1
Gene10000   Gene10000     1.217171             1   1.217171              1
          BatchFacBatch2 DEFacGroup1 DEFacGroup2 DEFacGroup3
               <numeric>   <numeric>   <numeric>   <numeric>
Gene1                  1           1           1           1
Gene2                  1           1           1           1
Gene3                  1           1           1           1
Gene4                  1           1           1           1
Gene5                  1           1           1           1
...                  ...         ...         ...         ...
Gene9996               1    1.000000           1           1
Gene9997               1    1.000000           1           1
Gene9998               1    1.000000           1           1
Gene9999               1    0.913961           1           1
Gene10000              1    1.000000           1           1

You can see that everything before the BatchFac columns is the same but after that you get differences. I'm not exactly sure why that is but it seems that just changing a parameter is enough to mess with the random seed.

# First set of numbers
> set.seed(1)
> rnorm(10)
 [1] -0.6264538  0.1836433 -0.8356286  1.5952808  0.3295078 -0.8204684  0.4874291
 [8]  0.7383247  0.5757814 -0.3053884
> rlnorm(10)
 [1] 4.5348008 1.4767493 0.5372775 0.1091863 3.0800041 0.9560610 0.9839401 2.5698209
 [9] 2.2732743 1.8110401
# Changing the parameters for the first random function changes the values from the second
> set.seed(1)
> rnorm(10, mean = 1, sd = 0)
 [1] 1 1 1 1 1 1 1 1 1 1
> rlnorm(10)
 [1] 0.5344838 1.2015872 0.4336018 4.9297132 1.3902836 0.4402254 1.6281250 2.0924271
 [9] 1.7785196 0.7368371
# But if we use the same parameters we can get the first set back
> set.seed(1)
> rnorm(10)
 [1] -0.6264538  0.1836433 -0.8356286  1.5952808  0.3295078 -0.8204684  0.4874291
 [8]  0.7383247  0.5757814 -0.3053884
> rlnorm(10)
 [1] 4.5348008 1.4767493 0.5372775 0.1091863 3.0800041 0.9560610 0.9839401 2.5698209
 [9] 2.2732743 1.8110401

I then thought we might be able to come up some kind of splatRemoveBatch() function but I think we would run into the same seed issues.

The short story is I'm convinced this is the only way to do this 🎉 ! I have some very minor comments on the code but if you can fix those I'll be happy to merge.

lazappi

Super minor things. If you don't have time let me know and I can probably merge and fix them myself.

If you are feeling keen it would be great to have some tests for things, but only if you are feeling keen 😸.

lazappi · 2020-10-05T12:45:52Z

R/splat-simulate.R

    means.gene <- rowData(sim)$GeneMean

    for (idx in seq_len(nBatches)) {
        batch.facs <- getLNormFactors(nGenes, 1, 0.5, batch.facLoc[idx],
                                        batch.facScale[idx])
-        batch.means.gene <- means.gene * batch.facs
+        if (batch.rmEffect) batch.facs <- rep(1, length(batch.facs))
+        # batch.means.gene <- means.gene * batch.facs


Can you please just delete this line? I think this is just a hangover from some old code that isn't used any more.

lazappi · 2020-10-05T12:46:58Z

R/splat-simulate.R

    means.gene <- rowData(sim)$GeneMean

    for (idx in seq_len(nBatches)) {
        batch.facs <- getLNormFactors(nGenes, 1, 0.5, batch.facLoc[idx],
                                        batch.facScale[idx])
-        batch.means.gene <- means.gene * batch.facs
+        if (batch.rmEffect) batch.facs <- rep(1, length(batch.facs))


Can you please add brackets to this if statement? i.e.

if (...) { do things }

HelloWorldLTY · 2023-12-03T20:17:45Z

Hi, thanks for your great work!

I am curious about one case, that is, do I still need to consider batch effect information like batch.loc or batch.scale if I intend to have a dataset with no batch effect using rm.Effect? Thanks.

lazappi · 2023-12-04T09:00:10Z

Please open an issue if you have question and explain more about what you use case is.

lazappi mentioned this pull request Jul 16, 2020

batch.rmEffect parameter #102

Closed

lazappi reviewed Oct 5, 2020

View reviewed changes

lazappi added 2 commits October 25, 2020 15:03

Tidy batch.rmEffect check

bd2f403

Add batch.rmEffect to splat_params vignette

287d402

lazappi merged commit 09ebd62 into Oshlack:master Oct 25, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Adds 'batch.rmEffect' parameter. #103

Adds 'batch.rmEffect' parameter. #103

daynefiler commented Jul 15, 2020

daynefiler commented Jul 15, 2020

lazappi commented Jul 16, 2020

lazappi commented Jul 16, 2020

daynefiler commented Jul 16, 2020

daynefiler commented Jul 16, 2020

lazappi commented Jul 17, 2020 via email

lazappi commented Sep 28, 2020

daynefiler commented Sep 30, 2020

daynefiler commented Oct 1, 2020 •

edited

Loading

lazappi commented Oct 5, 2020

lazappi left a comment

lazappi Oct 5, 2020

lazappi Oct 5, 2020

HelloWorldLTY commented Dec 3, 2023

lazappi commented Dec 4, 2023

Adds 'batch.rmEffect' parameter. #103

Adds 'batch.rmEffect' parameter. #103

Conversation

daynefiler commented Jul 15, 2020

daynefiler commented Jul 15, 2020

lazappi commented Jul 16, 2020

lazappi commented Jul 16, 2020

daynefiler commented Jul 16, 2020

daynefiler commented Jul 16, 2020

lazappi commented Jul 17, 2020 via email

lazappi commented Sep 28, 2020

daynefiler commented Sep 30, 2020

daynefiler commented Oct 1, 2020 • edited Loading

lazappi commented Oct 5, 2020

lazappi left a comment

Choose a reason for hiding this comment

lazappi Oct 5, 2020

Choose a reason for hiding this comment

lazappi Oct 5, 2020

Choose a reason for hiding this comment

HelloWorldLTY commented Dec 3, 2023

lazappi commented Dec 4, 2023

daynefiler commented Oct 1, 2020 •

edited

Loading