Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

potential parents with a relative cooccurence < 1.0 are never accepted #8

Open
frederic-mahe opened this issue Apr 24, 2020 · 1 comment

Comments

@frederic-mahe
Copy link

Dear @tobiasgf

I think there is a undesired behavior with minimum_ratio when minimum_ratio_type == "min", and to some degree when minimum_ratio_type == "avg". It happens in the code below:

          if (relative_cooccurence >= minimum_relative_cooccurence) {
            cat(paste0(" which is sufficient!"), file = log_con)
            if (minimum_ratio_type == "avg") {
              relative_abundance <-
                mean(otutable[line2, ][daughter_samples > 0]/
                       daughter_samples[daughter_samples > 0])
              cat(paste0("\n", "------mean avg abundance: ",
                         relative_abundance), file = log_con)
            } else {
              relative_abundance <-
                min(otutable[line2, ][daughter_samples > 0]/
                      daughter_samples[daughter_samples > 0])
              cat(paste0("\n", "------min avg abundance: ",
                         relative_abundance), file = log_con)
            }

When minimum_relative_cooccurence is smaller than 1.0 (default is 0.95), the potential parent OTU can be absent from some samples where the daughter OTU is present. A missing parent means that the ratio for that particular sample will be zero. This is were the issue is, the way minimum_ratio is computed, it will identify the samples where the father is missing as the ones with the smallest ratio (zero). This is below minimum_ratio and the father OTU is rejected, despite using minimum_relative_cooccurence.

To solve that, the minimum_ratio should be searched among the non-null ratio values, not all ratio values.

When minimum_ratio_type == "avg", I think the average should be calculated on non-zeros ratio values too.

@frederic-mahe
Copy link
Author

From the README:

If the number of samples where the 'potential parent' has a positive presence is below the minimum_relative_cooccurence (default 95%, meaning that 1 in 20 samples are allowed to have no parent presence), the 'potential parent' is rejected.

Contrary to what is indicated in the README, it seems that when using the default minimum_ratio_type = min, cases when the parent is missing in one or more samples are always rejected by lulu (no merging). A parent missing in one sample means that the minimum_ratio value is null, which leads to rejection. In consequence, changing the minimum_relative_cooccurence` value has no effect.

I wrote a toy-example to demonstrate this behavior:

library(lulu)
library(dplyr)  # lulu requires dplyr

matchlist_name <- data.frame(x = "B" , y = "A", z = 99.0)
otutable_name <- data.frame(
    row.names = c("A", "B"),
    s01 = c(0, 1), # <= 'B' present, 'A' absent
    s02 = c(9, 1),
    s03 = c(9, 1),
    s04 = c(9, 1),
    s05 = c(9, 1),
    s06 = c(9, 1),
    s07 = c(9, 1),
    s08 = c(9, 1),
    s09 = c(9, 1),
    s10 = c(9, 1),
    s11 = c(9, 1),
    s12 = c(9, 1),
    s13 = c(9, 1),
    s14 = c(9, 1),
    s15 = c(9, 1),
    s16 = c(9, 1),
    s17 = c(9, 1),
    s18 = c(9, 1),
    s19 = c(9, 1),
    s20 = c(9, 1),
    s21 = c(9, 1),
    s22 = c(9, 0)) # <= 'A' present, 'B' absent

## Note that parent 'A' is absent in one sample where 'B is present',
## and present in one sample where 'B' is absent (same total
## spread). The relative co-occurence is 20 / 21 = 0.95238, which is
## greater than 0.95, the default threshold value.

## bug: no merging with default parameters
## (minimum_ratio = 0)
lulu::lulu(otutable_name, matchlist_name)

## with minimum_ratio_type = "avg": merging as expected
## (minimum_ratio (average) is 8.57)
lulu::lulu(otutable_name, matchlist_name, minimum_ratio_type = "avg")

quit(save = "no")

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant