stat_bin accepts functions for binwidth #1890

Merged: 14 commits, Jan 26, 2017

@botanize
Contributor

botanize commented Nov 3, 2016

Allowing binwidth to accept functions greatly improves histograms of faceted variables. Furthermore, the current default (30 bins) is somewhat arbitrary, and could be replaced with one of the three built-ins included here.

The built-ins include 'Sturges', 'Scott', and 'FD'. These match the options included in d3. More information is available on Wikipedia.

Examples are included in the pull request; here's a motivating example:

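library(ggplot2)
mtlong <- reshape2::melt(mtcars)
# default of 30 bins in every facet
ggplot(mtlong, aes(value)) + geom_histogram() + facet_wrap(~variable, scales = 'free_x')
# binwidth chosen per facet by the proposed 'Sturges' built-in
ggplot(mtlong, aes(value)) + geom_histogram(binwidth = 'Sturges') + facet_wrap(~variable, scales = 'free_x')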

@hadley
Member

hadley commented Nov 4, 2016

I wouldn't mind extending bin to accept functions, but I really dislike Sturges/Scott/FD. I think they give the illusion that you're picking a good binwidth, but they're just as arbitrary as picking 30 bins.

@botanize
Contributor Author

botanize commented Nov 4, 2016

None of the options are a default; they have to be chosen manually, which is still a level of intention beyond 30 bins.

Would you prefer to drop the built-in options entirely?

@hadley
Member

hadley commented Nov 4, 2016

Yes please
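
With the built-in names dropped, a rule such as Freedman-Diaconis or Scott can still be supplied as an ordinary function of the data. The following is a minimal sketch, not code from the pull request: `fd_width` and `scott_width` are hypothetical helper names, `stack()` is used instead of `reshape2::melt()` to stay within base R, and the plot is only built when ggplot2 (in a version that includes this change) is installed.

```r
# Hypothetical helpers: each takes the x values and returns one bin width.
# Freedman-Diaconis: 2 * IQR / n^(1/3)
fd_width <- function(x) 2 * IQR(x) / length(x)^(1 / 3)
# Scott: 3.49 * sd / n^(1/3)
scott_width <- function(x) 3.49 * sd(x) / length(x)^(1 / 3)

# Long form of mtcars using base R; the columns are named `values` and `ind`.
mtlong <- stack(mtcars)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # The function is passed in place of a numeric binwidth.
  p <- ggplot(mtlong, aes(values)) +
    geom_histogram(binwidth = fd_width) +
    facet_wrap(~ind, scales = "free_x")
}
```

Swapping `fd_width` for `scott_width`, or for any other one-argument function that returns a single positive number, should work the same way.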

Joey Reid added 2 commits November 3, 2016 22:14
@botanize
Contributor Author

Is there anything else preventing you from accepting this pull request?

Member

@hadley hadley left a comment

A few more suggestions

@@ -111,9 +113,10 @@ StatBin <- ggproto("StatBin", Stat,
       stop("Only one of `boundary` and `center` may be specified.", call. = FALSE)
     }

-    if (is.null(params$breaks) && is.null(params$binwidth) && is.null(params$bins)) {
+    if (is.null(params$breaks) && (is.null(params$binwidth) || !(is.numeric(params$binwidth) || is.function(params$binwidth))) && is.null(params$bins)) {
Member

Why does this need to be changed? If a function is supplied, binwidth will no longer be NULL.

Contributor Author

Thanks, that was left over from checking for a string input as well.

@@ -130,8 +133,11 @@ StatBin <- ggproto("StatBin", Stat,
     if (!is.null(breaks)) {
       bins <- bin_breaks(breaks, closed)
     } else if (!is.null(binwidth)) {
-      bins <- bin_breaks_width(scales$x$dimension(), binwidth, center = center,
-        boundary = boundary, closed = closed)
+      if (!is.numeric(binwidth)) {
Member

It would be better to check for is.function() here

Contributor Author

Got it.

-      bins <- bin_breaks_width(scales$x$dimension(), binwidth, center = center,
-        boundary = boundary, closed = closed)
+      if (!is.numeric(binwidth)) {
+        binwidth <- do.call(binwidth, list(data$x))
Member

I don't think you need do.call here
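
The two suggestions above (test with `is.function()` and call the function directly) can be illustrated in isolation. This is only a sketch in base R, not the code that was merged: `resolve_binwidth` and `width_fun` are hypothetical names introduced for the example.

```r
# Hypothetical helper: turn a user-supplied binwidth into a number.
resolve_binwidth <- function(binwidth, x) {
  # Check for the case we want to handle, rather than for "not numeric".
  if (is.function(binwidth)) {
    # A direct call is enough; do.call() is only needed when the
    # argument list has to be assembled programmatically.
    binwidth <- binwidth(x)
  }
  binwidth
}

# Hypothetical width function: one tenth of the data range.
width_fun <- function(x) diff(range(x)) / 10
x <- c(1, 4, 9, 16, 25)

via_do_call <- do.call(width_fun, list(x))  # form used in the diff
direct      <- resolve_binwidth(width_fun, x)
unchanged   <- resolve_binwidth(0.5, x)     # numeric input passes through
```

Both forms evaluate the same function on the same argument, so `via_do_call` and `direct` hold the same value.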

@@ -22,6 +22,14 @@ test_that("bins specifies the number of bins", {
   expect_equal(nrow(out(bins = 100)), 100)
 })

+test_that("binwidth computes widths for function input", {
+  df <- data.frame(x = 1:100)
+  out <- function(x, ...) {
Member

I think you can remove the indirection of this function now.

And a simpler binwidth function would just return a constant.

Contributor Author

Ok, I get the simpler function that just returns a constant, but I don't know what "indirection of this function" means.

Member

Sorry, that was a bit unclear. I meant eliminate the out function:

out <- layer_data(ggplot(...))
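
Put together, the simplified test might look roughly like the sketch below. It is an assumption about the shape of the test, not the version that was committed: `const_width` is a hypothetical name, the assertion on bin widths is chosen for illustration, and the test only runs when ggplot2 and testthat are both installed.

```r
# Hypothetical constant binwidth function, as suggested in the review.
const_width <- function(x) 5

if (requireNamespace("ggplot2", quietly = TRUE) &&
    requireNamespace("testthat", quietly = TRUE)) {
  library(ggplot2)
  library(testthat)

  test_that("binwidth computes widths for function input", {
    df <- data.frame(x = 1:100)
    # No wrapper function: compute the layer data directly.
    out <- layer_data(ggplot(df, aes(x)) + geom_histogram(binwidth = const_width))
    # Every bin should be as wide as the value the function returned.
    expect_equal(out$xmax - out$xmin, rep(5, nrow(out)))
  })
}
```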

@@ -16,6 +16,10 @@

## Major new features

### Binning
Member

Can you please move this up a section?

@hadley hadley merged commit 4c07082 into tidyverse:master Jan 26, 2017
@hadley
Member

hadley commented Jan 26, 2017

Thanks!

@lock

lock bot commented Jan 18, 2019

This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/

@lock lock bot locked and limited conversation to collaborators Jan 18, 2019
2 participants