Customized Upset Plots

This repository contains scripts to produce customized upset plots, an alternative to Venn diagrams.

  • Author: Chenxin Li Ph.D., Assistant Research Scientist at Department of Crop & Soil Sciences and Center for Applied Genetic Technologies, University of Georgia.

  • Contact: Chenxin.Li@uga.edu | @ChenxinLi2

The Scripts/ directory contains .Rmd files that generate the graphics shown below. Running them requires R, RStudio, and the rmarkdown package.

Use bar lengths to present set or subset sizes

In Venn diagrams, we use area to represent set or subset sizes. However, I have found it much easier to discern different lengths than different areas.

Example with 4 sets

This is a workflow for set/intersect visualization using UpSet plots. The upstream segment of the workflow (intersect size determination) is based on ComplexHeatmap's re-implementation of UpSetR. List, data frame, and plot handling is provided by the tidyverse. Lastly, composite plots are assembled with patchwork.

In traditional upset plots, intersects/subsets are indicated by dots. Dots connected by a line represent the distinct intersect of the corresponding sets. Set and intersect sizes are then represented by bars.

The workflow produces a customized upset plot in which intersects/subsets are indicated by a heatmap. The customized upset plot has 4 parts:

  • The upper left shows the total set sizes.
  • The upper right is the legend/color scheme.
  • The lower left is a matrix showing subsets. E.g., when only Set 1 and Set 2 are colored in a row, that row represents the elements found in both Set 1 and Set 2 but in no other set.
  • The lower right shows the sizes of subsets.
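
To make the "in these sets but not in any other" idea concrete, here is a minimal base R sketch. The set names and contents are made up for illustration; the workflow itself relies on ComplexHeatmap for this calculation rather than on code like this.

```r
# Three toy sets (hypothetical contents, for illustration only)
sets <- list(
  set1 = letters[1:10],  # a to j
  set2 = letters[3:13],  # c to m
  set3 = letters[6:18]   # f to r
)

# Everything shared by set1 and set2, regardless of set3
shared_1_2 <- intersect(sets$set1, sets$set2)

# The distinct subset: in set1 and set2, but not in set3
distinct_1_2 <- setdiff(shared_1_2, sets$set3)

# The subset found in all three sets
in_all <- Reduce(intersect, sets)

length(shared_1_2)  # 8
distinct_1_2        # "c" "d" "e"
length(in_all)      # 5
```

A matrix row with only Set 1 and Set 2 colored corresponds to `distinct_1_2`, not to `shared_1_2`.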

Subsetting which intersect to show

With an upset plot, you can choose which intersects to show. E.g., if I only want to show intersects involving Set 3, I can do that.

Example only involving Set 3

Extending the upset plot to visualize other variables

In addition, upset plots can be extended. Mean separation plots (e.g., box plot, bar plot) and annotations (heatmaps) can be added to the sides of the upset plot using patchwork.

Example with extended lower right corner

Try it out with real data!

I also provide some example data, taken from Li et al., 2020, Genome Research.

Example with real data

Dependencies

library(tidyverse) 
library(patchwork) 
library(ComplexHeatmap)

library(RVenn) # Only required if you want Venn diagrams 
library(RColorBrewer) # This is for the colors only, not actually necessary

Auxiliary dependencies:

  • For 2-3 sets, Venn diagrams can be made readily using the RVenn package. The ggvenn() function from RVenn produces a ggplot object that is a Venn diagram.
  • ComplexHeatmap can be installed from GitHub via devtools::install_github("jokergoo/ComplexHeatmap"), which requires the devtools package. It is also available from Bioconductor.
  • For mean separation plots, a suggested package is ggbeeswarm, which draws plots similar to violin plots but shows the actual data points.
  • For color palettes, the viridis and RColorBrewer packages are suggested.
  • If you want to save plots as .svg files, you may need the R package svglite. If you are using a Mac, you may also need to install XQuartz.
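
Before running the scripts, it can help to check which of the optional packages are already installed. The sketch below is one way to do that in base R; the variable names are illustrative and not part of the repository.

```r
# Optional packages mentioned above; none is needed for the core workflow
optional_pkgs <- c("RVenn", "ggbeeswarm", "viridis", "RColorBrewer", "svglite")

# requireNamespace() returns TRUE/FALSE without attaching the package
is_installed <- vapply(
  optional_pkgs,
  function(pkg) requireNamespace(pkg, quietly = TRUE),
  logical(1)
)

missing_pkgs <- optional_pkgs[!is_installed]

if (length(missing_pkgs) > 0) {
  message("Optional packages not installed: ",
          paste(missing_pkgs, collapse = ", "))
}
```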

Getting started

Here are example scripts for 3 sets. The workflow scales to more sets, as intersect size calculation is automatic (provided by ComplexHeatmap). However, the number of possible subsets grows exponentially with the number of sets (up to 2^n - 1 for n sets), so filtering for subsets of interest becomes important. The easiest way to use this workflow is to copy the code from this README file, or download one of the .Rmd files from the Scripts/ folder, and then modify the code to suit your data and taste.
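
To get a feel for how quickly the number of subsets grows: n sets can have at most 2^n - 1 non-empty distinct subsets, one for every non-empty combination of set memberships. A quick base R check (variable names are illustrative):

```r
# Upper bound on the number of non-empty distinct subsets for n sets
n_sets <- 2:6
max_subsets <- 2^n_sets - 1

data.frame(n_sets, max_subsets)
# max_subsets is 3, 7, 15, 31, 63
```

Real data usually produce fewer subsets than this upper bound, since empty combinations are not shown.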

Data

my_list <- list(
  data1 = letters[1:10], 
  data2 = letters[3:13], 
  data3 = letters[6:18])

The required input is a list of vectors.
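
If your data start out as a long-format table rather than a list, base R's split() can build the list. This is a sketch under assumed column names (`item` and `set` are hypothetical), not something taken from the repository's scripts.

```r
# Hypothetical long-format table: one row per (item, set) membership
memberships <- data.frame(
  item = c("geneA", "geneB", "geneC", "geneB", "geneC", "geneD"),
  set  = c("data1", "data1", "data1", "data2", "data2", "data3"),
  stringsAsFactors = FALSE
)

# split() returns a named list with one character vector per set
my_list_from_table <- split(memberships$item, memberships$set)

# Drop duplicates within each set so that sizes count unique items
my_list_from_table <- lapply(my_list_from_table, unique)

str(my_list_from_table)
```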

If you want a Venn diagram

my_object <- RVenn::Venn(my_list)

ggvenn(
  my_object, slice = 1:3, 
  thickness = 0.5,
  alpha = 0.5, 
  fill = brewer.pal(8, "Set2")
) +
  theme_void() +
  theme(
    legend.position = "none"
  )

ggsave("../Results/VennDiagram_quick_start.svg", height = 4, width = 4, bg = "white")
ggsave("../Results/VennDiagram_quick_start.png", height = 4, width = 4, bg = "white")

example venn diagram

ggvenn() only goes up to 3 sets. For more sets, it is better to use an upset plot.

ComplexHeatmap for heavy lifting

comb_mat <- make_comb_mat(my_list)
my_names <- set_name(comb_mat)

make_comb_mat() from ComplexHeatmap calculates intersect/subset sizes. make_comb_mat() produces a matrix object from the list of vectors. The matrix itself can be filtered for intersects/subsets of interest.
For examples, see this .Rmd file at section "Subsetting the intersects".

The rest of the code produces the 4 pieces that make up a customized upset plot. Since every step along the way is customizable, the result can be highly tailored towards the needs and taste of the user.
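
As a rough sketch of what such filtering can look like with the three-set example, the code below keeps only some of the intersects. It assumes that indexing the combination matrix with a single logical vector subsets the intersects, as described in the ComplexHeatmap documentation; the names `comb_mat_data3` and `comb_mat_shared` are illustrative, and the linked .Rmd file may do this differently.

```r
library(ComplexHeatmap)

my_list <- list(
  data1 = letters[1:10],
  data2 = letters[3:13],
  data3 = letters[6:18])

comb_mat <- make_comb_mat(my_list)

# Each intersect is named by a binary code, one digit per set, e.g. "110"
codes <- comb_name(comb_mat)

# Keep only intersects that involve the third set (data3)
involves_data3 <- substr(codes, 3, 3) == "1"
comb_mat_data3 <- comb_mat[involves_data3]

# Alternatively, keep only intersects shared by two or more sets
comb_mat_shared <- comb_mat[comb_degree(comb_mat) >= 2]

comb_size(comb_mat_data3)
comb_size(comb_mat_shared)
```

The filtered object can then be passed through the same downstream steps in place of `comb_mat`.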

Total set size

my_set_sizes <- set_size(comb_mat) %>% 
  as.data.frame() %>% 
  rename(sizes = ".") %>% 
  mutate(Set = row.names(.)) 

p1 <- my_set_sizes %>% 
  mutate(Set = reorder(Set, sizes)) %>% 
  ggplot(aes(x = Set, y= sizes)) +
  geom_bar(stat = "identity", aes(fill = Set), alpha = 0.8, width = 0.7) +
  geom_text(aes(label = sizes), 
            size = 5, angle = 90, hjust = 0, y = 1) +
  scale_fill_manual(values = brewer.pal(4, "Set2"),  # feel free to use some other colors  
                     limits = my_names) + 
  labs(x = NULL,
       y = "Set size",
       fill = NULL) +
  theme_classic() +
  theme(legend.position = "right",
        text = element_text(size= 14),
        axis.ticks.y = element_blank(),
        axis.text = element_blank()
        ) 

Legend

Extracting the legend from a ggplot object is not straightforward, but we can write a function for that.

get_legend <- function(p) {
  tmp <- ggplot_gtable(ggplot_build(p))
  leg <- which(sapply(tmp$grobs, function(x) x$name) == "guide-box")
  legend <- tmp$grobs[[leg]]
  legend
}

p2 <- get_legend(p1)

Overlap sizes

my_overlap_sizes <- comb_size(comb_mat) %>% 
  as.data.frame() %>% 
  rename(overlap_sizes = ".") %>% 
  mutate(category = row.names(.))

p3 <- my_overlap_sizes %>% 
  mutate(category = reorder(category, -overlap_sizes)) %>% 
  ggplot(aes(x = category, y = overlap_sizes)) +
  geom_bar(stat = "identity", fill = "grey80", color = NA, alpha = 0.8, width = 0.7) +
  geom_text(aes(label = overlap_sizes, y = 0), 
            size = 5, hjust = 0, vjust = 0.5) +
  labs(y = "Intersect sizes",
       x = NULL) +
  theme_classic() +
  theme(text = element_text(size= 14, color = "black"),
        axis.text =element_blank(),
        axis.ticks.x = element_blank(),
        axis.title.x = element_text(hjust = 0)
        ) +
  coord_flip()

Overlap matrix

my_overlap_matrix <- str_split(string = my_overlap_sizes$category, pattern = "", simplify = TRUE) %>% 
  as.data.frame() 

colnames(my_overlap_matrix) <- my_names

my_overlap_matrix_tidy <- my_overlap_matrix %>% 
  cbind(category = my_overlap_sizes$category) %>% 
  pivot_longer(cols = !category, names_to = "Set", values_to = "value") %>% 
  full_join(my_overlap_sizes, by = "category") %>% 
  full_join(my_set_sizes, by = "Set")

p4 <- my_overlap_matrix_tidy %>% 
  mutate(category = reorder(category, -overlap_sizes)) %>%  
  mutate(Set = reorder(Set, sizes)) %>%  
  ggplot(aes(x = Set, y = category))+
  geom_tile(aes(fill = Set, alpha = value), color = "grey30", size = 1) +
  scale_fill_manual(values = brewer.pal(4, "Set2"), # feel free to use other colors 
                    limits = my_names) +
  scale_alpha_manual(values = c(0.8, 0),  # color the grid for 1, don't color for 0. 
                     limits = c("1", "0")) +
  labs(x = "Sets",  
       y = "Overlap") +
  theme_minimal() +
  theme(legend.position = "none",
        text = element_text(color = "black", size= 14),
        panel.grid = element_blank(),
        axis.text = element_blank()
        )
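
The overlap matrix step works because each intersect is named by a binary code with one digit per set, so splitting the code into characters gives one 0/1 membership value per set. Here is the same idea in base R, using hypothetical codes for three sets rather than the output of comb_size():

```r
# Intersect codes for three sets (hypothetical example values)
codes <- c("111", "110", "011", "100", "001")
set_names <- c("data1", "data2", "data3")

# Split each code into single characters; one row per intersect
membership <- do.call(rbind, strsplit(codes, split = ""))
dimnames(membership) <- list(codes, set_names)

membership

# Sets that take part in the "110" intersect
set_names[membership["110", ] == "1"]  # "data1" "data2"
```

In the plot, a "1" becomes a colored tile and a "0" a transparent one.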

Put them together

wrap_plots(p1, p2, p4, p3, 
          nrow = 2, 
          ncol = 2,
          heights = c(1, 2), # the more rows in the lower part, the longer it should be
          widths = c(1, 0.8),
          guides = "collect") &
  theme(legend.position = "none")

ggsave("../Results/quick_start.svg", height = 3.5, width = 3, bg = "white") 
# this should be a tall & skinny plot 
# I prefer .svg, but you can also save as pdf or png 
# I will open up the .svg file and manually adjust the size until it's good
# check that nothing is cut off from the plot 
# png is for twitter posting 
ggsave("../Results/quick_start.png", height = 3.5, width = 3, bg = "white")

quick start

Conclusions

I hope you like it and find it pretty. If you use this code for a publication, I'd greatly appreciate it if you could cite or acknowledge this repository. DOI: 10.5281/zenodo.7555525