list sub-class with format() method prints full contents #5948
Comments
Hi Jesse, thanks for writing. Is there a way to fix your issue in a way that maintains backward compatibility with the existing print output? One of the guiding principles of data.table is stability/back-compatibility https://github.com/Rdatatable/data.table/blob/master/GOVERNANCE.md#the-r-package so it would be much easier to accept a code contribution that does not change what is printed.
Hey Toby, thanks for getting back. The current behavior is actually not backwards compatible - list sub-classes print identically to bare lists through v1.14.10, and it's only in v1.15.0 and later that this "print everything" behavior occurs. The proposal is essentially to revert to the previous behavior, because there are cases where the current behavior is pathological, and I'm not sure the current behavior was actually intended (I can't find anything in NEWS that mentions it).
This formats differently because This may be a frustrating issue. I like the notion that If you go down the road of over-riding for this class, then pandora's box of "override this class and that class and that other class" may not be desired, plus what happens if/when While frustrating, I wonder if the better path to resolve this aesthetic request is to request that
I don't think so, no. At the end of the day, we are simply following On the tibble side, both I do not think it is out of the question for you to provide
I'm good with including that in my current PR (if that doesn't get too complicated). If yes, I'm assuming
dt
#             list_col       list_of_col
#               <list>      <vctrs_list>
# 1: <data.table[3x1]> <data.table[3x1]>
# 2: <data.table[3x1]> <data.table[3x1]>
I added the header class, should it be
Okay. This is starting to become a little more than "add some rider code to a PR", so let me know (@MichaelChirico) if this is out of scope to be appended (to an already-approved PR). I think we can do this:
@@ -105,6 +105,14 @@ print.data.table = function(x, topn=getOption("datatable.print.topn"),
                   expression = "<expr>", ordered = "<ord>")
     classes = classes1(x)
     abbs = unname(class_abb[classes])
+    abb_na = is.na(abbs)
+    if (any(abb_na) && any(classes[abb_na] %in% c("vctrs_list_of", "vctrs_vctr"))) {
+      if (!requireNamespace("vctrs", quietly=TRUE)) {
+        warningf("Some columns are class 'vctrs_list_of' but package vctrs is not available. Those columns will print as regular objects. Simply install.packages('vctrs') to obtain the vctrs_list_of format method and print the data again.")
+      } else {
+        abbs[abb_na] <- paste0("<", vapply_1c(x, vctrs::vec_ptype_abbr)[abb_na], ">")
+      }
+    }
     if ( length(idx <- which(is.na(abbs))) ) abbs[idx] = paste0("<", classes[idx], ">")
     toprint = rbind(abbs, toprint)
     rownames(toprint)[1L] = ""
@@ -207,6 +215,10 @@ format_col.default = function(x, ...) {
     format(char.trunc(x), ...) # relevant to #37
 }
+format_col.vctrs_list_of = function(x, ...) {
+  vapply_1c(x, format_list_item, ...)
+}
+
 # #2842 -- different columns can have different tzone, so force usage in output
 format_col.POSIXct = function(x, ..., timezone=FALSE) {
   if (timezone) {
and
@@ -199,6 +199,7 @@ export(format_col)
 S3method(format_col, default)
 S3method(format_col, POSIXct)
 S3method(format_col, expression)
+S3method(format_col, vctrs_list_of)
 export(format_list_item)
 S3method(format_list_item, default)
 S3method(format_list_item, data.frame)
Which would result in
dt
#             list_col       list_of_col
#               <list>    <list<dt[,1]>>
# 1: <data.table[3x1]> <data.table[3x1]>
# 2: <data.table[3x1]> <data.table[3x1]>
@MichaelChirico thoughts on that?
That will induce a
Turns out there is a trivial fix for the bug at hand: #6637.
I ran into this when converting a nested tibble (i.e. with a list column of tibbles) to a data.table. Normally, list columns print with something like <tibble[3x1]> in data.table; however, these columns printed as a string displaying the entirety of the nested tibbles' contents (in my case, ~20k rows of data per tibble). This appears to be due to a check for the existence of a format() method for the column type in data.table:::format_col(). In this case, the list columns were vctrs_list_of class, which implements its own format.vctrs_list_of() method. See below reprex for an example.

Fixing this is easy enough (see below reprex for proposed solution), but it would change the default printing behavior of list-cols. Any list subclass would then have to implement a format_list_item() method to get special treatment. On the other hand, any list subclass could then implement that method and get special treatment.

To me, defaulting to the standard list print behavior is an improvement, given that format() methods for list-like classes are generally not going to produce output suitable for a column in a data.table (which is why format_list_item() is needed in the first place).

Reprex is below, along with proposed solution. I'm happy to file a PR but wanted to make sure the change in defaults is acceptable in principle first. Thanks!
Created on 2024-02-21 with reprex v2.0.2
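The behavior described in the issue can be illustrated with a minimal base-R sketch. This is not the original reprex and not data.table's actual code: the class verbose_list and the helpers summarise_item() and format_col_sketch() are hypothetical names, and the prefer_format_method switch is an assumption used only to contrast the reported behavior with the proposed one.

```r
# Hypothetical list subclass whose format() method renders every element in full.
# It stands in for any list-like class that ships its own format() method.
format.verbose_list <- function(x, ...) {
  vapply(unclass(x), function(item) {
    paste(utils::capture.output(print(item)), collapse = "\n")
  }, character(1), USE.NAMES = FALSE)
}

# Compact one-cell summary of a list item, e.g. <data.frame[3x1]>.
summarise_item <- function(item) {
  d <- dim(item)
  dims <- if (is.null(d)) length(item) else paste(d, collapse = "x")
  paste0("<", class(item)[1L], "[", dims, "]>")
}

# Hypothetical column formatter (NOT data.table's format_col()).
# prefer_format_method = TRUE mimics the reported behavior: a class-level
# format() method wins. FALSE mimics the proposal: list columns are always
# summarised item by item.
format_col_sketch <- function(col, prefer_format_method = TRUE) {
  has_method <- any(vapply(oldClass(col), function(cl) {
    !is.null(utils::getS3method("format", cl, optional = TRUE))
  }, logical(1)))
  if (prefer_format_method && has_method) return(format(col))
  vapply(unclass(col), summarise_item, character(1), USE.NAMES = FALSE)
}

# A list column of data frames carrying the hypothetical subclass.
nested <- structure(
  list(data.frame(a = 1:3), data.frame(a = 4:6)),
  class = c("verbose_list", "list")
)

full    <- format_col_sketch(nested)                                # full contents per cell
compact <- format_col_sketch(nested, prefer_format_method = FALSE)  # compact summaries
print(compact)
```

With the switch on, each cell holds the whole printed data frame, which is the pathological case for large nested tables; with it off, each cell is a short summary regardless of the column's class.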