Enhancements: seq breaks symbols and scale #85

Closed
iferres opened this issue Sep 15, 2021 · 7 comments

@iferres

iferres commented Sep 15, 2021

Hi again!

Not reporting a bug, just to suggest a couple of enhancements for future releases.

  1. The ability to add sequence-break symbols, as in this comment (two parallel lines at roughly 45 degrees at the beginning/end of each break).

  2. The ability to draw a scale bar instead of the whole x axis. This is especially useful when using focus(), since it doesn't make sense to draw an axis for truncated contigs. A small scale for comparing relative sizes, as is usually done in phylogeny figures, would be nice. For example, see ggtree::geom_treescale. It is probably possible using ggplot2 magic, but it would be nice to have an example in the documentation.

Sorry for the spam :P
Bests!

@thackl
Owner

thackl commented Sep 19, 2021

Definitely good ideas! Challenge accepted ;)

I'm still playing around with some ideas. Would be curious about your thoughts on the following:

gggenomes(emale_genes, emale_seqs) %>% focus(name=="MCP") +
  geom_seq() + geom_gene() + geom_gene_tag(aes(label=name)) +
  geom_break() + # add // at ends of truncated seqs (alternatively: geom_seq(breaks=TRUE))
  geom_scale_bar() + no_x_axis() # add a scale bar

image

@thackl thackl self-assigned this Sep 19, 2021
@thackl thackl added the enhancement New feature or request label Sep 19, 2021
@iferres
Author

iferres commented Sep 20, 2021

Looks good!

I'm not sure about geom_break(), since it only makes sense when focus()ing, don't you think? How about focus(add_breaks=TRUE), or something like that? I'm not an expert on ggplot2's grammar, but I can't see what its behaviour would be if it isn't wrapped in a focus() call.

Regarding the scale bar, it also looks very good! Here I link the ggtree approach, which makes use of a custom theme if you want to remove the x axis. Maybe it can serve as inspiration. Using similar approaches probably helps users find what they saw in other packages. I don't remember gggenes's approach to this, but theme_genes() is probably doing the trick.

Thank you for your interest!

@thackl
Owner

thackl commented Sep 21, 2021

You don't necessarily need focus() to truncate sequences. A truncated sequence is defined by having - in addition to a length - a start >1 and/or an end < length. You can also manually set that to illustrate some more complex situations, see the example below.

library(gggenomes)
library(tibble)  # provides tribble()

s0 <- tribble(
   # start/end define regions, i.e. truncated contigs
  ~bin_id, ~seq_id, ~length, ~start, ~end,
  "complete_genome", "chromosome_1_long_trunc_2side", 1e5, 1e4, 2.1e4,
  "fragmented_assembly", "contig_1_trunc_1side", 1.3e4, .9e4, 1.3e4,
  "fragmented_assembly", "contig_2_short_complete", 0.3e4, 1, 0.3e4,
  "fragmented_assembly", "contig_3_trunc_2sides", 2e4, 1e4, 1.4e4
)

l0 <- tribble(
  ~seq_id, ~start, ~end, ~seq_id2, ~start2, ~end2,
  "chromosome_1_long_trunc_2side", 1.1e4, 1.4e4, 
    "contig_1_trunc_1side", 1e4, 1.3e4,
  "chromosome_1_long_trunc_2side", 1.4e4, 1.7e4,
    "contig_2_short_complete", 1, 0.3e4,
  "chromosome_1_long_trunc_2side", 1.7e4, 2e4,
    "contig_3_trunc_2sides", 1e4, 1.3e4
)

gggenomes(seqs=s0, links=l0) +
  geom_seq() + geom_break() + geom_seq_label(nudge_y=-.05) + geom_link()

image
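The truncation rule described above (start > 1 and/or end < length) can be checked directly on a sequence table. This is only a minimal sketch using tibble and dplyr on a reduced copy of the example data; the column names trunc_left, trunc_right and truncated are made up for illustration and are not something gggenomes is known to compute under those names.

```r
library(tibble)
library(dplyr)

# Reduced copy of the sequence table from the example above
seqs <- tribble(
  ~seq_id,                         ~length, ~start, ~end,
  "chromosome_1_long_trunc_2side", 1e5,     1e4,    2.1e4,
  "contig_1_trunc_1side",          1.3e4,   0.9e4,  1.3e4,
  "contig_2_short_complete",       0.3e4,   1,      0.3e4,
  "contig_3_trunc_2sides",         2e4,     1e4,    1.4e4
)

# Hypothetical helper columns: a sequence is truncated if its
# displayed region starts after position 1 or ends before its length
flagged <- seqs %>%
  mutate(
    trunc_left  = start > 1,
    trunc_right = end < length,
    truncated   = trunc_left | trunc_right
  )

print(flagged)
```

Only contig_2_short_complete comes out as not truncated, which matches the sequence names in the example.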

focus() computes start/end for sequences based on some criteria. It does not plot anything by itself. It is like mutate() for a tibble: it just adds/modifies start/end columns for sequences in a gggenomes object (and filters unused sequences). That's why focus(add_breaks) would not make sense (it always computes breaks, but has nothing to do with plotting them).

geom_break() adds breaks at the ends of truncated sequences. On a plot without any truncated sequences, it would plot nothing. It could, however, be shortened to geom_seq(add_breaks=TRUE) to automatically add breaks to every truncated sequence that is drawn. The drawback of that approach is that it would not be possible to further manipulate the breaks (change the icon, size, color, ...). But it would be faster. So it might make sense to have both options: geom_seq(add_breaks=TRUE) for default breaks and geom_break() for customized breaks.

gggenomes also uses a custom theme (theme_gggenomes). no_x_axis() is just a wrapper around functions to manipulate the theme. It would also be possible to create a theme_gggenomes_no_x_axis(). Alternatively, one can also just use theme_void() to remove everything.

no_x_axis <- function (){
  theme(axis.line.x = element_blank(), axis.title.x = element_blank(), axis.text.x = element_blank(), axis.ticks.x = element_blank())
}
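For anyone who wants to try this outside of gggenomes, here is a small sketch of the same helper applied to a plain ggplot2 plot. It assumes only that ggplot2 is installed; the mtcars scatter plot is just a stand-in and has nothing to do with genomes.

```r
library(ggplot2)

# Same helper as above, reformatted one element per line
no_x_axis <- function() {
  theme(
    axis.line.x  = element_blank(),
    axis.title.x = element_blank(),
    axis.text.x  = element_blank(),
    axis.ticks.x = element_blank()
  )
}

# Stand-in plot: the x axis line, title, labels and ticks are suppressed,
# everything else (y axis, panel, data) is left untouched
p <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point() +
  no_x_axis()

print(p)
```

Unlike theme_void(), this keeps the y axis and any other theme settings in place.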

Suppressing the axis could be made part of geom_scale_bar(remove_x_axis=TRUE) or similar, to automatically suppress the axis if the scale bar is used. However, I feel like removing the axis explicitly makes it more transparent.
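Until something like geom_scale_bar() exists, a scale bar can probably be approximated with plain ggplot2 annotations. The sketch below is an assumption-laden illustration, not the gggenomes implementation: scale_bar() is a hypothetical helper, the two segments are toy "contigs", and the bar position and width are picked by hand.

```r
library(ggplot2)

# Hypothetical helper (not part of gggenomes or ggplot2):
# a horizontal bar of a given width with a centered label above it
scale_bar <- function(x, y, width, label = paste(width, "bp")) {
  list(
    annotate("segment", x = x, xend = x + width, y = y, yend = y),
    annotate("text", x = x + width / 2, y = y, label = label, vjust = -0.5)
  )
}

# Toy data: two "contigs" of different lengths drawn as segments
contigs <- data.frame(
  x    = c(0, 0),
  xend = c(10000, 4000),
  y    = c(1, 2)
)

p <- ggplot(contigs) +
  geom_segment(aes(x = x, xend = xend, y = y, yend = y)) +
  scale_bar(x = 0, y = 0.4, width = 2000) +  # position chosen by hand
  theme_void()                               # drop all axes

print(p)
```

The bar is drawn in data coordinates, so its width stays comparable to the sequences even when the axis itself is hidden.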

@iferres
Author

iferres commented Sep 21, 2021

Ah I see, now it makes sense to me to have geom_break(). Thanks again for taking the time to explain it.

Regarding the scale bar, my two cents:

... + 
   theme_gggenomes_scalebar()

?

@thackl
Owner

thackl commented Sep 21, 2021

Thank you for taking the time to give feedback! Really appreciated. The theme option sounds good! I'll try to add this to the next release.

@iferres
Author

iferres commented Oct 8, 2021

I assume the following feature request is not trivial at all, but have you considered ... + coord_polar() to draw circularized contigs? Playing with the package (and diving into the source code) I found that the following kinda works:

library(gggenomes) 

s0 <- tibble(
  gene_id = letters[1:6],
  bin_id = c("A", "A", "B", "B", "B", "B"),
  seq_id = factor(c("A1", "A1", "B1", "B1", "B2", "B2"), levels = c("A1", "B2", "B1")), # set factor to order contigs
  feat_id = c("a1","a2","b3", "b4", "b1", "b2"),
  start = c(1, 20, 1, 50, 1, 20),
  end = c(10, 30, 40, 70, 10, 30),
  strand = c(1, 1, 1, 1, 1, 1),
  length = c(1000, 1000, 1000, 1000, 1000, 1000)
)

gggenomes(s0) + 
  geom_seq() + 
  gggenomes:::geom_gene2() + 
  coord_polar() # + 
  # facet_wrap(~bin_id)

but I guess it is experimental and there's a lot of work left to make it stable and user friendly, isn't there?

@thackl thackl mentioned this issue Oct 8, 2021
@thackl
Owner

thackl commented Oct 8, 2021

I've opened this as a separate issue so I can keep track of it more easily.
