Show values (sums) in venn diagram #14

Open
williamlai2 opened this issue Feb 2, 2021 · 6 comments

Comments

@williamlai2
Contributor

It would be good to be able to show values.

Something like this (not sure if I am representing this correctly):

a <- c(1, 3) # A, AB
b <- c(1, 2) # AB, B

venn <- c(a[1], # A
          a[2] + b[1], # AB
          b[2]) # B

> venn
[1] 1 4 2
@yanlinlin82
Owner

Sorry, I am confused about the example. Do the numbers in the 'a' and 'b' vectors mean element counts? If so, why are a[2] and b[1] not equal? Could you describe your idea more concretely?

@williamlai2
Contributor Author

Thanks for getting back to me. Imagine that they are dollars in groups, but the groups overlap (I am working with custom industry classifications and want to show the output for overlapping groups).

@yanlinlin82
Owner

In the example:

a <- c(1, 3) # A, AB
b <- c(1, 2) # AB, B

Since both a[2] and b[1] are AB, why not code like this:

a <- c(1, 3+1) # A, AB
b <- c(3+1, 2) # AB, B

Could you please provide a real example to explain why the AB values in the two vectors are different?

@williamlai2
Contributor Author

williamlai2 commented Feb 2, 2021

Let's say that the numbers are jobs.

df <- structure(list(a = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                           0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 468, 0, 0, 0, 1446, 
                           3, 0, 0, 1043, 1593, 0, 0, 0, 742, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                           0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 198, 0, 0, 0, 0, 0, 
                           0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                           0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                           922, 249, 0, 0, 2060, 93, 0, 605, 274, 24, 161, 417, 122, 3, 
                           1560, 0, 3, 0, 0, 55, 73, 363, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                           0, 0, 0, 0, 433, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 
                           0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                           0, 0, 0, 0, 0, 0, 0, 0, 0, 576, 34, 0, 0, 0, 0, 22, 0, 0, 0, 
                           0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                           0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1821, 4433, 
                           19062, 0, 0, 0, 0, 0, 0, 873, 89, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                           0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                           0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                           0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                           0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                           0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                           0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                           0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                           0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                           0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), 
                     b = c(0, 0, 0, 0, 0, 0, 0, 
                          0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                          0, 468, 0, 0, 0, 1446, 3, 0, 0, 1043, 1593, 0, 0, 0, 742, 0, 
                          0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                          2137, 198, 284, 1181, 14588, 100, 340, 1558, 211, 6431, 0, 0, 
                          0, 0, 0, 0, 0, 0, 0, 2022, 30939, 39, 169, 1845, 1811, 6088, 
                          2323, 1241, 1311, 13009, 1617, 6857, 0, 81, 63, 0, 124, 1642, 
                          537, 27404, 237, 1393, 1657, 0, 0, 620, 360, 152, 2922, 922, 
                          249, 410, 295, 2060, 93, 1724, 605, 274, 24, 161, 417, 122, 3, 
                          1560, 312, 3, 1785, 1053, 55, 73, 363, 13912, 1126, 0, 0, 217, 
                          626, 0, 10, 0, 0, 0, 0, 0, 0, 0, 108, 2635, 0, 0, 15, 0, 0, 6, 
                          135, 3, 0, 0, 0, 0, 830, 0, 0, 102, 0, 0, 397, 0, 0, 0, 0, 258, 
                          0, 0, 13, 128, 0, 0, 0, 0, 29, 0, 419, 0, 0, 0, 28, 0, 91, 0, 
                          0, 0, 0, 0, 0, 137, 0, 0, 8, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                          0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3158, 0, 0, 0, 0, 0, 0, 0, 
                          0, 2392, 0, 0, 0, 0, 0, 0, 13979, 1821, 4433, 19062, 1282, 7825, 
                          18692, 10279, 902, 1140, 873, 89, 5215, 951, 220, 529, 9144, 
                          712, 4212, 8, 630, 233, 538, 5747, 1780, 11, 7314, 1073, 16007, 
                          0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                          0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                          0, 0, 0, 0, 0, 0, 0, 0, 177, 358, 0, 563, 1006, 0, 0, 0, 1848, 
                          0, 281, 0, 1052, 0, 0, 0, 0, 0, 825, 0, 0, 0, 0, 0, 0, 0, 0, 
                          0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                          0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                          0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                          0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                          0, 0, 0, 0, 0, 0, 0, 0, 0, 0), 
                     c = c(10623, 25707, 3343, 279, 
                          4007, 5372, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2199, 0, 0, 0, 0, 0, 
                          0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                          0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                          0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                          0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                          0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                          0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                          0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                          0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                          0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                          0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                          0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                          0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                          0, 0, 168, 0, 0, 0, 102, 749, 85, 3110, 157, 648, 3204, 520, 
                          96, 50, 106, 846, 181, 290, 162, 183, 1, 337, 700, 191, 81, 23, 
                          378, 25, 93, 14, 459, 181, 257, 680, 802, 0, 1349, 10, 419, 306, 
                          1895, 167, 54, 908, 1252, 226, 177, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                          0, 0, 0, 0, 0, 0, 0, 0, 0, 12, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                          0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                          0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                          0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                          0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                          0, 0, 0, 0, 0, 0, 0)), 
                row.names = c(NA, -443L), 
                class = c("tbl_df", "tbl", "data.frame"))


library(ggvenn)
x <- list(`A` = df$a,
          `B` = df$b,
          `C` = df$c)

ggvenn(x,
       c("A", "B", "C"),
       show_percentage = FALSE)  

The Venn diagram shows the number of distinct values shared between the groups, rather than the sum of the jobs. Does that make sense?

image

Looking at the code in your package, you have a show_elements argument. It would just be the sum of that if the items are numeric.

@yanlinlin82
Owner

Thanks for the code! I understand now.

There are two ways to use ggvenn. One takes a list as input, and the other takes a data.frame.

In the former case (list), ggvenn treats the list elements (x$A, x$B, x$C) as sets, so values shared between sets are counted in the intersection. For example:

ggvenn(list(A = c(1,2,3,4), B = c(1,5,6)), show_percentage = FALSE)

Its result is exactly the same as:

ggvenn(list(A = c("A","B","C","D"), B = c("A","E","F")), show_percentage = FALSE)

For the same reason, duplicated elements will be removed before plotting:

ggvenn(list(A = c(1,1,1,2,3,4), B = c(1,5,6,6,6)), show_percentage = FALSE) 

The output plot is the same.
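
The set semantics described above can be checked without plotting. This is a minimal base R sketch, not ggvenn's internal code, that computes the region sizes one would expect for the two-set examples. `region_sizes` is a hypothetical helper introduced only for illustration.

```r
# Hypothetical helper: region sizes for two vectors treated as sets.
region_sizes <- function(a, b) {
  c(A_only = length(setdiff(a, b)),
    AB     = length(intersect(a, b)),
    B_only = length(setdiff(b, a)))
}

plain <- region_sizes(c(1, 2, 3, 4), c(1, 5, 6))
dupes <- region_sizes(c(1, 1, 1, 2, 3, 4), c(1, 5, 6, 6, 6))

print(plain)
print(identical(plain, dupes))  # duplicates do not change the sizes
```

Because `setdiff()` and `intersect()` return unique values, repeating an element changes nothing, which matches the behaviour described for the list input.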

In your example above, all zeros will be merged as one element before plotting. I guess treating numeric vectors as counts may lead to more confusion. I am not sure if an explicit argument (such as 'number_as_count') could help or not.

In the latter case (input as 'data.frame'), ggvenn so far picks up only logical columns for plotting. Your suggestion of treating numeric values as counts (and counting sum) is more intuitive and indeed a good idea, something like (using 'df' directly, rather than constructing another list 'x'):

ggvenn(df, c("a", "b", "c"))  # pick numeric columns

What do you think?
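
Until something like that exists, numeric columns could be converted to logical membership columns for the current data.frame input. The sketch below assumes that a value greater than zero means "belongs to the group" (an assumption made here, not something ggvenn does), and it uses a small made-up `toy` data frame rather than the `df` above. Note that this gives row counts per region, not sums.

```r
# Toy data (made up): one row per item, one numeric column per group.
toy <- data.frame(a = c(5, 0, 3, 0),
                  b = c(5, 2, 0, 0),
                  c = c(0, 2, 0, 7))

# Assumption: a value > 0 means the row belongs to that group.
toy_lgl <- as.data.frame(lapply(toy, function(v) v > 0))

print(toy_lgl)
# With ggvenn installed, the logical columns can then be plotted:
# ggvenn::ggvenn(toy_lgl, c("a", "b", "c"), show_percentage = FALSE)
```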

@williamlai2
Contributor Author

williamlai2 commented Feb 3, 2021

Thanks for the explanation. It could be an option like you have mentioned.

With the data, it would be something like this:

a <- df$a
b <- df$b
c <- df$c

A <- sum(as.numeric(setdiff(a, union(b,c))))
B <- sum(as.numeric(setdiff(b, union(a,c))))
C <- sum(as.numeric(setdiff(c, union(a,b))))
AB <- sum(as.numeric(setdiff(intersect(a,b),c)))
AC <- sum(as.numeric(setdiff(intersect(a,c),b)))
BC <- sum(as.numeric(setdiff(intersect(b,c),a)))
ABC <- sum(as.numeric(intersect(intersect(a,b),c)))
sum_ABC <- A + B + C + AB + AC + BC + ABC

Edit: Actually this won't work, because setdiff() and intersect() are set operations: duplicated values are dropped, so they are not counted in the sums.
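
One way to avoid the set-operation problem is to work row by row instead. The following is a sketch only, under two assumptions that are not confirmed in this thread: each row is one item (for example, one industry), and a row carries the same value in every column it belongs to, with 0 meaning "not a member". `region_sums` is a hypothetical helper and `toy` is made-up data.

```r
# Hypothetical helper: sum a value per Venn region, row by row.
region_sums <- function(d, cols) {
  m <- as.matrix(d[cols])
  member <- m > 0             # assumption: > 0 means membership
  value <- apply(m, 1, max)   # assumption: same value in each member column
  # Label each row by the groups it belongs to, e.g. "a&b".
  region <- apply(member, 1, function(r) paste(cols[r], collapse = "&"))
  keep <- region != ""        # drop rows that belong to no group
  tapply(value[keep], region[keep], sum)
}

toy <- data.frame(a = c(5, 0, 3, 0, 5, 0),
                  b = c(5, 2, 0, 0, 5, 0),
                  c = c(0, 2, 0, 7, 0, 0))

sums <- region_sums(toy, c("a", "b", "c"))
print(sums)
```

Rows 1 and 5 both hold the value 5 in the "a&b" region, and both are counted, which is the behaviour the set-based version loses.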
