Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add some base/S3 classes #434

Merged
merged 19 commits into from
Oct 7, 2024
Merged

add some base/S3 classes #434

merged 19 commits into from
Oct 7, 2024

Conversation

lawremi
Copy link
Collaborator

@lawremi lawremi commented Sep 1, 2024

This adds base/S3 classes for name, call, language (union of name and call), POSIXt (abstract), POSIXlt, matrix, array and formula. Still need to make tests.

The matrix/array validation seems somewhat pointless, since it is really hard to create an invalid one given the C-level checking during attribute setting.

Coming up with default values for some of these was fun. Interested in comments, of course.

@TimTaylor TimTaylor mentioned this pull request Sep 6, 2024
@mmaechler mmaechler marked this pull request as ready for review September 8, 2024 12:48
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should use >= : we do want to allow 0x0 but also nx0 etc. NB same for array validation

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this is referring to the checks on dim(). Yes, good catch. The error message even says "non-negative." Of course, it is impossible to set the "dim" attribute on any R object to a negative value outside of C. I guess this would catch alternative matrix implementations that override dim() and end up in an invalid state, which is really their problem, not ours.

# Define onload to avoid dependencies between files
on_load_define_union_classes <- function() {
class_numeric <<- new_union(class_integer, class_double)
class_atomic <<- new_union(class_logical, class_numeric, class_complex, class_character, class_raw)
class_vector <<- new_union(class_atomic, class_expression, class_list)
class_language <<- new_union(class_name, class_call)
Copy link
Member

@t-kalinowski t-kalinowski Sep 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the class_language union type include class_expression?

Genuine question. base::is.language(expression()) returns TRUE, but I also can see why we might want the S7 class_language union not to include class_expression.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO we shouldn't blindly copy base R here because IIRC the use of "language" is pretty inconsistent. It might be better to not bother with a union and instead force folks to be explicit. e.g. if you're trying to express "things that can appear in a parse tree generated from text", NULL and atomic vectors of length of 1 are also acceptable. And if you're trying to express "things that can appear in a parse tree", you could also encounter more complex inlined objects.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the class_language union type include class_expression?

Genuine question. base::is.language(expression()) returns TRUE, but I also can see why we might want the S7 class_language union not to include class_expression.

Yes, important question: Note that R's own C level isLanguage() *is* quite different from base is.language()already (with a "big Note" in the source code insrc/main/coerce.c.. ), but it also includes NILSXP in addition to LANGSXP. The help(is.language)example, i.e.,example(is.language)` shows a bit the conceptual "mess":

> ll <- list(a = expression(x^2 - 2*x + 1), b = as.name("Jim"),
             c = as.expression(exp(1)), d = call("sin", pi))
> sapply(ll, typeof)
          a            b            c            d 
"expression"     "symbol" "expression"   "language" 
> sapply(ll, mode)
           a            b            c            d 
"expression"       "name" "expression"       "call" 
> stopifnot(sapply(ll, is.language))

I'm not sure we should introduce yet another definition of "language" but rather the same as is.language, i.e., including class_expression.
Still, it may make sense to provide a class_<....> definition for the union of class_name and class_call
.. being aware that these two are actually also not the only constituents of an expression: That additionally may contain "atomic constants" such as TRUE, 3.14, or "NZ" .

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"things that can appear in a parse tree"

What if, instead of this, we frame the definition as: "things that are not reflected unmodified by base::eval()".

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For computing on the language looking at the C level eval() would probably be better. That treats EXPRSXP objects as self-evaluating, unlike base::eval() which does something I don't care for but we are stuck with. For C eval() the only no-self-evaluating things are SYMSXP, LANGSXP, PROMSXP and BCODESXP , and ideally the latter two should not escape to R level.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was just taking the cue from methods::getClass("language"), but I like the justification above.

R/S3.R Outdated
#' @order 3
class_formula <- new_S3_class("formula",
constructor = function(.data = NULL, env = parent.frame()) {
stats::formula(.data, env)
Copy link
Member

@t-kalinowski t-kalinowski Sep 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stats::formula() is itself an S3 generic. Should the default constructor for class_formula dispatch a generic? This breaks the pattern established with other classes we could choose to dispatch for, but don't, such as POSIXct, factor, array, etc.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see a problem with it. It's the only abstraction available. The other weird thing about formula is that it is the only "base" class not defined by the base package.

t-kalinowski

This comment was marked as resolved.

Copy link
Member

@t-kalinowski t-kalinowski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Only one small question below.

R/S3.R Outdated
#' @format NULL
#' @order 3
class_matrix <- new_S3_class("matrix",
constructor = function(.data = NA, nrow = 1, ncol = 1, byrow = FALSE, dimnames = NULL) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe nrow can default to length(.data)?

The nrow = 1 default in base::matrix() is a bit of a lie, because there is also a missing() check in the C code. The current constructor emits a warning and drops vector elements, given the convenient case of wanting to turn an length>1 atomic vector into a 1-col matrix.

E.g,

matrix(1:4)
#>      [,1]
#> [1,]    1
#> [2,]    2
#> [3,]    3
#> [4,]    4
matrix(1:4, 1, 1)
#> Warning in matrix(1:4, 1, 1): data length differs from size of matrix: [4 != 1
#> x 1]
#>      [,1]
#> [1,]    1


constructor = function(.data = NA, nrow = 1, ncol = 1, byrow = FALSE, dimnames = NULL) {
  matrix(.data, nrow, ncol, byrow, dimnames)
}

constructor(1:4)
#> Warning in matrix(.data, nrow, ncol, byrow, dimnames): data length differs from
#> size of matrix: [4 != 1 x 1]
#>      [,1]
#> [1,]    1

@t-kalinowski t-kalinowski added this to the v0.2.0 milestone Sep 16, 2024
@t-kalinowski
Copy link
Member

Still need to add tests.

lawremi and others added 4 commits September 21, 2024 12:02
…classes(), which is renamed to S4_basic_classes() for consistency with S4's terminology. Looking up S3 classes required a slight change to S4_to_S7_class() and the corresponding test. It is worth recognizing that this mechanism is now a general mapping from `class(x)[1]` to the corresponding `class_` object and could be used to implement a class_for_name() and even class_for_object().
…ethods that lack a corresponding `class_` object.
R/S3.R Outdated Show resolved Hide resolved
@t-kalinowski
Copy link
Member

@lawremi, I've added tests and additional fixes. I've also changed the default constructor for some base types. It would be great if you could take a look. (GitHub won't let me "request a review" from you because you opened the original PR.)

Barring any further discussion, I believe this is ready to merge.

R/S3.R Outdated Show resolved Hide resolved
@t-kalinowski t-kalinowski merged commit d804fe8 into main Oct 7, 2024
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants