Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error: Internal error in vec_slice_impl(): Unexpected NULL #5749

Closed
leachandrew opened this issue Feb 10, 2021 · 14 comments
Closed

Error: Internal error in vec_slice_impl(): Unexpected NULL #5749

leachandrew opened this issue Feb 10, 2021 · 14 comments
Labels
bug an unexpected problem or unintended behavior vctrs ↗️

Comments

@leachandrew
Copy link

This is a similar issue to that raised by #5189 that @hadley closed. Since that issue didn't have a reprex, and this is an issue which didn't previously exist using this code with this API, I think it's worth another look.

@DavisVaughan replied here that advising the API to fix the NULL might be one course of action #5189 (comment)

This is previously functional code that now seems to throw an error when an API provides variables that are read as NULL, as I can no longer use bind_rows to move from a list to a data frame. Previous iterations of this code worked fine, so not sure what the change is - as far as I can tell, the API was not altered from previous versions. The last time I know for sure that this code ran without this error was early 2019, and I am only just revisiting it now for a new project.

A quick reproduction of the error:


library(httr)
library(tidyverse)
#reproducible example with a form from Kent Marketing Petroleum Survey
# The code is used recursively build the set of feasible options in a series of drop-down menus from 
 https://charting.kentgroupltd.com/

kent_report_types<-function(product_id=1){
  # for testing, use product_id=1
  #product_id=1
  #For Each Product ID, get a report type
  #Product Report Types
  report_types<-content(POST(
    url = "https://charting.kentgroupltd.com/Charting/GetReportTypes",
      query = list(
      ProductID= product_id
    )
  ))
  
  #what I used to be able to do and would like to be able to do again
  #report_types %>% bind_rows() %>% filter(Value!=0)
  
  #now, that creates this error
#> report_types %>% bind_rows() %>% filter(Value!=0)
#Error: Internal error in `vec_slice_impl()`: Unexpected `NULL`.
# Run `rlang::last_error()` to see where the error occurred.

  # what I have to do now to work around it
  text<-sapply(report_types, function(x) x[[4]])
  value<-sapply(report_types, function(x) x[[5]])
  report_types<-tibble(text,value) %>% filter(value!=0) %>% mutate(product_id=product_id)
  report_types
}

#run the function
kent_report_types(1)

As far as I can tell, the API has not changed from when I wrote the original version of this code using bind_rows, but the API definitely sends a NULL variable:
image

I'd welcome any suggestions of a better work-around or a way to omit the NULL types from the list at reading.

@batpigandme
Copy link
Contributor

batpigandme commented Feb 10, 2021

Could you run a reprex with the broken code? I'm trying to reproduce to get the traceback, but am having a little trouble recreating your original function.

OK, got it (didn't comment out all of the new code)

rlang::last_error()
# <error/rlang_error>
# Internal error in `vec_slice_impl()`: Unexpected `NULL`.
# Backtrace:
#  1. global::kent_report_types(1)
#  4. dplyr::bind_rows(.)
#  5. vctrs::vec_rbind(!!!dots, .names_to = .id)
# Run `rlang::last_trace()` to see the full context.
rlang::last_trace()
# <error/rlang_error>
# Internal error in `vec_slice_impl()`: Unexpected `NULL`.
# Backtrace:
#
#  1. ├─global::kent_report_types(1)
#  2. │ └─report_types %>% bind_rows() %>% filter(Value != 0) ~/.active-rstudio-document:20:4
#  3. ├─dplyr::filter(., Value != 0)
#  4. └─dplyr::bind_rows(.)
#  5.   └─vctrs::vec_rbind(!!!dots, .names_to = .id)
#  6.     └─(function () ...

@leachandrew
Copy link
Author

This is what I get when I run it:

image

@batpigandme
Copy link
Contributor

Yeah, using the reprex package https://reprex.tidyverse.org/ allows us to run the code with nicely formatted input and output. Highly recommend! The broken code is just as (if not more) informative as the working code, which is why I wanted to run the code that threw the error (see the traceback in my earlier comment).

@leachandrew
Copy link
Author

Oh, I didn't know that. Exploring now. Thanks, Mara!

In the meantime, here's my output with the error after uncommenting the older code:

image

@batpigandme
Copy link
Contributor

Yep, same as mine (see #5749 (comment)), I just got it into copy-pastable form which is much easier to read (and added the backtraces)! 😄

@leachandrew
Copy link
Author

This is now meta-procrastination. I decided to re-up this code to procrastinate from writing, and now I can learn reprex to procrastinate from fixing my procrastination code!

@batpigandme
Copy link
Contributor

Honestly, reprex is truly THE MOST VALUABLE package for me (obviously I'm biased by my job). But I guarantee you serious ROI!

@romainfrancois
Copy link
Member

Here's a more minimal reprex that captures the issue:

library(dplyr, warn.conflicts = FALSE)

bind_rows(
  list(x = 1, y = NULL), 
  list(x = 2, y = NULL)
)
#> Error: Internal error in `vec_slice_impl()`: Unexpected `NULL`.

Created on 2021-02-11 by the reprex package (v0.3.0)

@romainfrancois
Copy link
Member

Perhaps vctrs::vec_rbind() could do a better job of promoting this error to something more user friendly (ping @lionel-, @DavisVaughan), but otherwise it is something that can be dealt with before going to dplyr::bind_rows():

library(dplyr, warn.conflicts = FALSE)
library(purrr)
library(vctrs, warn.conflicts = FALSE)

data <- list(
  list(x = 1, y = NULL), 
  list(x = 2, y = NULL)
)

rm_nulls <- function(data) {
  data <- map(data, function(.x) {
    if (vec_is_list(.x)) {
      .x[!map_lgl(.x, is.null)]
    }
  })
  data
}
data %>% 
  rm_nulls() %>% 
  bind_rows()
#> # A tibble: 2 x 1
#>       x
#>   <dbl>
#> 1     1
#> 2     2

Created on 2021-02-11 by the reprex package (v0.3.0)

@DavisVaughan
Copy link
Member

@romainfrancois it is possible that instead of new_data_frame(x) here, you could use data_frame(!!!x):

dplyr/R/bind.r

Line 131 in a30c9e4

.x <- new_data_frame(as.list(.x))

That uses df_list() which would drop the NULL values altogether. Be aware that it also splices data frame columns and recycles using tidyverse recycling, so it may or may not be the right fit for what you want here.

Essentially that new_data_frame() call is making corrupt data frames because it doesn't do any validity checks

library(vctrs)

x1 <- list(x = 1, y = NULL)
x2 <- list(x = 2:3, y = NULL)

# corrupt data frames, shouldn't have a `NULL` columns
new_data_frame(x1)
#>   x    y
#> 1 1 NULL
new_data_frame(x2)
#> Warning in format.data.frame(if (omit) x[seq_len(n0), , drop = FALSE] else x, :
#> corrupt data frame: columns will be truncated or padded with NAs
#>   x    y
#> 1 2 NULL
#> 2 3 <NA>

vec_rbind(
  new_data_frame(x1),
  new_data_frame(x2)
)
#> Error: Internal error in `vec_slice_impl()`: Unexpected `NULL`.

vec_rbind(
  data_frame(!!!x1),
  data_frame(!!!x2)
)
#>   x
#> 1 1
#> 2 2
#> 3 3

Created on 2021-02-18 by the reprex package (v1.0.0)

@avsdev-cw
Copy link

avsdev-cw commented Apr 12, 2021

Could the rm_nulls function by @romainfrancois be added to dplyr? Or maybe a replace_nulls or nulls_to_na? Pulling in JSON files is an easy way to end up with NULLS on a frequent basis and causes me to re-implement the rm_nulls function in each project.

@lionel-
Copy link
Member

lionel- commented Apr 12, 2021

As Davis pointed out, dplyr is generating invalid data frames which is the problem. vec_rbind() should not treat this error as a normal user error since this sort of data frames are only created when there is a bug. Also the reprex should be added to to dplyr's set of unit tests.

@lionel- lionel- reopened this Apr 12, 2021
@hadley
Copy link
Member

hadley commented Apr 19, 2021

So bind_rows() is creating corrupt data frames because it's using the low-level constructor instead of the higher-level constructor. Making this change should only have localised performance consequences because it only arises when bind_rows() converts a list to a data frame. (I think this is just a legacy of the poorly thought out behaviour of bind_rows() that we're now stuck with)

@abdelhaknezzari
Copy link

data.table::rbindlist worked for me as an alternative

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug an unexpected problem or unintended behavior vctrs ↗️
Projects
None yet
Development

No branches or pull requests

8 participants