-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
R6 v S3/S4 in HADES #82
Comments
A point on the syntax: I don't think the use of R6, in API form, has to use the So, for your above example you could have an API that is identical to the current version 2 of Capr - your functions
There would then be no difference with the syntactic sugar of R in this case. For the cases of People who prefer a java like declarative style could also just And honestly, I agree that the current API is good and shouldn't be changed. So to me this is just a point about the internal design of the package. Would R6 make it significantly better? The main reason I use R6 over S4 is because (as you say) I greatly prefer the model for encapsulation (which is the entire point of OOP design to me) . There are also some minor benefits with inheritance but most OOP developers agree that more than a single level of inheritance is often too confusing and prefer just encapsulating an internal object. Neither approach really enforces a "Safe Interface" approach like you would get in other languages (or even abstract base classes, like python has good support for). So in my opinion I find the differences between the two approaches are largely just programmer preference - in R6 you're encapsulating an object but its hardly type safe at runtime (Honestly, I still think of an R6 object as a jazzed up hash table pretending to be something more, while S4 is just a hash table with useful attributes). In conclusion, @chrisknoll and I come from a more traditional OOP background so R6 seems more natural to us. However, we're not maintaining this package... |
This sounds like a good topic for the next HADES meeting. My current two cents with all of HADES in mind: R code is hard to maintain (no strong typing, no finding errors at compile time, and lets not get into the ever-shifting dependencies). R6 might help in some areas, but also has its own limitations (e.g. can't use breakpoints in R6 methods). If I were to create a new package from scratch I might use R6, but I certainly wouldn't go back and change old code to use R6 just for the sake of slightly better OOP. |
Hi, @mdlavallee92 , I didn't mean to have #77 present a critique of the CaprAPI. At the core, the circe object model is a set of Java classes, (which is OO) and so I was just presenting the OO-view of creating instances and assigning values in an OO-way (which in R syntax would be via R6). CirceR (the R package) does not have any object model defined in it, it works only off the raw JSON string (or the list-of-lists that comes from jsonlite::parseJson). Makes it hard to programmatically build circe expressions in R (which is why we have Capr!). But CapR is solving a secondary problem of 'reusable fragments' such that when you merge a fragment (that can contain criteria and concept set), CapR 'figures it out' to merge those reference IDs into the larger expression. So, in my mind it makes sense that CapR has CapR-specific objects to support the functionality it uses. Back to the question of R6 use: the best example where I have made use of it is in the CohortIncidence package. It started off by a series of functions in the 2.x version, but 3.x introduced R6 classes (in case you are wondering what it was like to migrate). What I appreciate about R6 is not just the encapsulation and 'by reference' semantics of the object fields, but documentation is pretty nice too. Here's the documentation output of an IncidenceDesign, and here is the source code annotation that produces it. Note how the documentation annotations are very close to the implementation of the fields. The other reason I leveraged it is that jsonlite has some bad behavior when it comes to rendering single-element vectors: you can tell it to universally treat single element vectors as non-arrays, but it is such a thing (in JSON) to have a single element value that is actually an array:
The problem is the raw json clearly shows x is not an array, and Y is, but jsonlite will serialize both of those to either non-array or both array based on if you auto_unbox or not. I had a 'wonderful' discussion with the maintainer of json line in this issue. Needless to say, it didn't go the way I wanted. So, enter R6 classes, and implementing a Not having breakpoints in the R6 implementations hasn't been too much of a problem for me (I'm using these R6 classes as mostly data holders not function holders, so maybe i'm just dodging those issues). And I do have helper 'factory functions' (example: createincidenceDesign()) that can give you auto-complete prompts on the elements of the object, but this strategy does make it a bit of 'double work' such that if your object introduces a new field, you have to go back to your factory function and provide an argument for it. Anyways, happy to discuss my experiences with it on a call. I agree with the sentiment of Martijn that unless you see a big gain of using it, I wouldn't go back and switch. My primary need was controlling JSON serialization (and I liked the way it handled data validation via checkmate), so I went with it. |
Thanks guys! I did not take it as a critique @chrisknoll. I thought it was an interesting perspective to explore. With I agree @schuemie, maybe we can talk about this on the next HADES call or after the symposium as I know things get a bit hectic in the lead up. |
From #77 @chrisknoll has posed an interesting way of viewing
Capr
potentially through R6 via a pure OOP system. The purpose of this post is toa) understand the benefits of switching to
R6
forCapr
andb) consider the impact of
R6
within HADES...when/where to use it, why is it beneficial and ultimately does it even matterI am hoping to get some feedback or thoughts from others @chrisknoll, @azimov, @ablack3, @anthonysena, @schuemie. I know a post was made a while back referencing this same topic.
Thoughts on OOP in context of Capr
Currently
Capr
is written in S4 this was done for two (at this point, flimsy) reasons:R6
was not available at the time. R still usedReferenceClass
as its pure OOP system. I explored this in earlyCapr
development but opted to go theS4
routeS4
maintains the "feel" of R. Where asR6
is more amenable to programmers coming from the java and python worlds.S4
is a stricter version ofS3
which does a better job of working in a functional programming pipeline. WhenCapr
was originally created it was intended to heavily leverage the pipe operator%>%
however this proved to be rather awkward.Resources to give context to S4 and R6 can be found in chapters 14, 15, and 16 of Advanced R. While the strengths of S4 can be found here.
Starting at
Capr
v2, there was an intentional effort to transition the feel of the code away from piping and towards nested functions. The construction of cohorts would hence feel like building the ui of ashinyDashboard
. A dashboard requires a header, sidebar and body. Within each section the user provides context on the look by adding text, output, boxes etc. Similar to a cohort definition where the user is constructing sections of the definition...the entry, attrition, exit and era. An example of whatCapr
code should look like now is show below:If
Capr
were to switch toR6
the syntax would look more like this:Each class has a new object method where we describe its details. Classes can have further methods such as json coercion, sql builder, print statement, and plot functions. This would be quite nice. I am conscious of not overlapping too much with
CirceR
.Thoughts on R6
My main hesitation with
R6
is that it removes the "feel" of R. R works best inS3
when you take advantage of its "pipe-ability" and functional programming attributes. Forcing R code into a pure OOP system may tune out "tidy-verse" programmers trying to enter the OHDSI software space. Think there is legitimate fear here given the design of the DARWIN software which are quite "tidy-verse" heavy. Not that its any sort of competition.Having said this, I am beginning to realize the benefits of using
R6
particularly if we begin to think about complex objects (circe definition) and pipelines (strategus modules). Having a strictly encapsulated objects makes it easier to force a complex routine across a network.This post has gone way too long (of which I have accidentally deleted it twice) but maybe it starts a conversation to think about the HADES codebase as it becomes more and more complex :)
The text was updated successfully, but these errors were encountered: