separate drops attributes of data frames #102

Closed
CameronBieganek opened this issue Sep 3, 2015 · 2 comments

@CameronBieganek

This is similar to the closed dplyr issue tidyverse/dplyr#1064. Here's a minimal example:

data_frame(
   x = c('blue:circle', 'blue:circle', 'orange:square', 'orange:square')
) %>%
   `attr<-`('val', 101) %>%
   separate(x, c('color', 'shape'), sep = ':') %>%
   str
# Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 4 obs. of  2 variables:
#  $ color: chr  "blue" "blue" "orange" "orange"
#  $ shape: chr  "circle" "circle" "square" "square"

I haven't tested any of the other tidyr functions to see if they keep or drop attributes.

@hadley
Member

hadley commented Dec 30, 2015

The problem is that it's difficult in general to copy attributes, because you don't know if the attributes are global, per-row, or per-column.
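To make the global / per-row / per-column distinction concrete, here is a minimal base R sketch. The attribute names (`source`, `row_flags`, `col_units`) are hypothetical and chosen only for illustration; they are not something tidyr or dplyr defines.

```r
# Hypothetical attribute names, used only for illustration.
df <- data.frame(x = 1:3)
attr(df, "source")    <- "survey.csv"          # global: describes the whole table
attr(df, "row_flags") <- c("ok", "ok", "bad")  # per-row: one entry per row
attr(df, "col_units") <- c(x = "cm")           # per-column: one entry per column

# A new data frame with an extra column, as a column-inserting function might produce.
wide <- data.frame(x = 1:3, y = c(10, 20, 30))

# Blindly copy the three custom attributes across.
for (nm in c("source", "row_flags", "col_units")) {
  attr(wide, nm) <- attr(df, nm)
}

ncol(wide)                       # 2 columns ...
length(attr(wide, "col_units"))  # ... but units recorded for only 1 of them
```

The global attribute survives the copy unharmed, while the per-column one no longer lines up with the columns. Code that copies attributes cannot tell the two kinds apart.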

The root cause is append_df():

append_df <- function(x, values, after = length(x)) {
  y <- append(x, values, after = after)
  class(y) <- class(x)
  attr(y, "row.names") <- attr(x, "row.names")

  y
}

which presumably loses attributes because of the subsetting + c() inside append() (and you can see I've already done a little to repair them).
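The attribute loss can be reproduced with base R alone. This is a small sketch on a plain `data.frame` (not a `tbl_df`), with made-up column names and the `val` attribute from the original report:

```r
df <- data.frame(a = 1:2, b = c("x", "y"))
attr(df, "val") <- 101

# append() subsets x and concatenates the pieces with c(),
# which returns a plain list.
y <- append(df, list(c = c(TRUE, FALSE)), after = 1)

class(y)               # no longer a data frame
names(attributes(y))   # only "names" is left
attr(y, "val")         # the custom attribute is gone
```

This is why the existing `append_df()` has to restore `class` and `row.names` by hand. Anything it does not restore explicitly stays lost.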

I experimented with an alternative implementation:

append_df <- function(x, values, after = length(x)) {
  UseMethod("append_df")
}

#' @export
append_df.data.frame <- function(x, values, after = length(x)) {
  n <- length(x)
  if (after <= 0) {
    dplyr::bind_cols(values, x)
  } else if (after >= n) {
    dplyr::bind_cols(x, values)
  } else {
    dplyr::bind_cols(x[1L:after], values, x[(after + 1L):n])
  }
}
#' @export
append_df.tbl_dt <- function(x, values, after = length(x)) {
  tbl_dt(NextMethod())
}

but that doesn't preserve attributes, presumably because the rules for how you should combine when binding columns are non-obvious.

I also tried:

append_df <- function(x, values, after = length(x)) {
  n <- length(x)
  if (after <= 0) {
    x[] <- c(values, x)
  } else if (after >= n) {
    x[] <- c(x, values)
  } else {
    x[] <- c(x[1L:after], values, x[(after + 1L):n])
  }
  x
}

But that doesn't work because you can't use sub-assignment to increase the number of variables.
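The limitation can be seen in isolation with a plain base R `data.frame`. This sketch only checks that the column count does not grow. Whether R signals a warning or an error may depend on the R version and on the class of the data frame, so the code reports the condition rather than assuming one.

```r
df <- data.frame(a = 1:2)

# Try to grow the data frame from one column to two via `[]<-`.
outcome <- tryCatch(
  {
    df[] <- c(df, list(b = 3:4))
    "no condition"
  },
  warning = function(w) paste("warning:", conditionMessage(w)),
  error   = function(e) paste("error:", conditionMessage(e))
)

print(outcome)
ncol(df)  # still 1: the extra column was not added
```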

If you can think of a way to insert a column in an arbitrary position without losing attributes, I'd be happy to incorporate it.

@hadley hadley closed this as completed Dec 30, 2015
@CameronBieganek
Author

I agree that it's not always clear which attributes ought to be copied. However, for the specific case of separate, I think that it's more often correct to copy all the attributes than it is to drop them. Working under that assumption, the following seems like it should work:

append_df <- function(x, values, after = length(x)) {
   y <- append(x, values, after = after)
   ynames <- names(y)
   attributes(y) <- attributes(x)
   names(y) <- ynames    # Since the above line overwrites the correct names

   y
}
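
As a quick sanity check, the function above can be exercised on a plain base R `data.frame`. The sample data and the `val` attribute mirror the original report. This does not go through `separate()` itself, and behaviour on `tbl_df` or `tbl_dt` inputs is not tested here.

```r
# Same definition as proposed above, repeated so the example is self-contained.
append_df <- function(x, values, after = length(x)) {
  y <- append(x, values, after = after)
  ynames <- names(y)
  attributes(y) <- attributes(x)  # restores class, row.names and custom attributes
  names(y) <- ynames              # put back the names of the combined columns
  y
}

df <- data.frame(x = c("blue:circle", "orange:square"), id = 1:2)
attr(df, "val") <- 101

# Insert a new column after the first one.
out <- append_df(df, list(color = c("blue", "orange")), after = 1)

names(out)        # "x" "color" "id"
attr(out, "val")  # 101
```

Note that this copies every attribute unconditionally, so any per-column attribute on `x` would be carried over without being adjusted for the new columns.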

Unfortunately, I don't remember my original use case when I opened the issue. Perhaps I wanted to attach some meta-information to a data frame and then thought better of it.
