About ggplot2 and tidy data #7

Open
Enchufa2 opened this issue Jul 9, 2019 · 5 comments

Comments

@Enchufa2

Enchufa2 commented Jul 9, 2019

Just one comment about these statements about ggplot2:

I don't consider it part of the Tidyverse, having been developed well before Tidy and thematically unrelated.

RStudio counts ggplot2 as being part of the Tidyverse, but it was developed much earlier, and does not follow the Tidy philosophy.

I don't think it's thematically unrelated, and I do think it follows the philosophy. First of all, ggplot2 was designed to receive its input in (Hadley's) tidy form, even before it was called tidy. I believe this fact shaped the idea of tidy data, which culminated in Hadley's Tidy Data paper (JSS 2014), and that paper was in fact the seed for the Tidyverse.

@nicholasjhorton

ggplot2 is an elegant system for professional graphics. But it has a number of features that are at odds with the overall tidyverse philosophy (and Hadley has publicly acknowledged these). I'd suggest noting that ggplot2 takes tidy data as input (though lattice and base graphics do as well).
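
As a rough illustration of the point that several plotting systems accept the same data frame, here is a minimal sketch using the built-in iris data. It assumes the ggplot2 and lattice packages are installed; the object names `p_gg` and `p_lat` are made up for this example and do not come from the thread.

```r
library(ggplot2)
library(lattice)

# The same data frame, unchanged, is passed to three plotting systems.

# ggplot2: columns are mapped to aesthetics.
p_gg <- ggplot(iris, aes(Petal.Length, Sepal.Length, color = Species)) +
  geom_point()

# lattice: formula interface with a grouping column.
p_lat <- xyplot(Sepal.Length ~ Petal.Length, data = iris, groups = Species)

# base graphics: formula interface; the factor is converted to color codes.
plot(Sepal.Length ~ Petal.Length, data = iris, col = as.integer(iris$Species))
```

Printing `p_gg` or `p_lat` draws the corresponding plot; the base call draws immediately.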

@matloff
Owner

matloff commented Jul 10, 2019

If one takes the definition of "tidy" to mean "row/column" data frames, then 99% of R is "tidy." The term then becomes meaningless. The ggplot2 package is no more "tidy" than is lm().

@Enchufa2
Author

I find myself constantly tidying and untidying data from modelling to visualisation and back to modelling again, because many modelling functions need all the features in columns (the model matrix), but ggplot2 needs many of them folded in long format, in order to be assigned to a layer. That's especially true for factors. The lm interface is pretty tidy in that sense, yes, but many are not.
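
To make that round trip concrete, here is a minimal sketch using the built-in iris data. It assumes the tidyr and ggplot2 packages are installed; the names `wide`, `long`, `fit` and `p`, and the choice of model, are illustrative assumptions rather than anything from the thread.

```r
library(tidyr)
library(ggplot2)

# Wide format: one column per feature, which is what the model matrix needs.
wide <- iris
fit <- lm(Sepal.Length ~ Petal.Length + Petal.Width, data = wide)

# Long format: the two petal measurements are folded into a pair of
# name/value columns, so "which feature" becomes a variable that can be
# mapped to a facet or an aesthetic.
long <- pivot_longer(
  wide,
  cols = c(Petal.Length, Petal.Width),
  names_to = "feature",
  values_to = "value"
)

# One panel per folded feature; print(p) draws the plot.
p <- ggplot(long, aes(value, Sepal.Length, color = Species)) +
  geom_point() +
  facet_wrap(~feature, scales = "free_x")
```

Going back to modelling means either keeping `wide` around or unfolding `long` again, which is the back-and-forth described above.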

@matloff
Owner

matloff commented Jul 10, 2019 via email

@drag05

drag05 commented Sep 17, 2021

@Enchufa2

Is there a need for tidying and untidying? The example below does the modeling and the plotting in one step, and the data format remains unchanged:

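library(data.table)
library(ggplot2)

dt = as.data.table(iris)

# One plot per smoothing method; geom_smooth() fits the model per Species,
# and the data stays in its original wide format.
lapply(
  list('loess', 'glm', 'lm'),
  function(i) {
    dt[, ggplot(.SD, aes(Petal.Length, Sepal.Length)) +
           geom_point() +
           geom_smooth(aes(color = Species), method = i)]
  }
)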


4 participants