simplistic example of base R v. dplyr #10
Comments
You've never heard of tapply()? |
I haven’t heard of tapply() and also fail to see how it is a relevant response given that you don’t seem to use it in the example referenced above. Maybe providing a counter example would prove to be more pedagogically useful rather than responding with a question about an obscure function? |
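For readers who have not met `tapply()`, a minimal sketch of the kind of counter-example being asked for might look like the following. It uses the built-in `mtcars` data set, which is an assumption made for illustration and not the data from the essay.

```r
# Mean miles-per-gallon for each cylinder count, using base R only.
# mtcars ships with R; it stands in for the essay's data (an assumption).
mpg_by_cyl <- tapply(mtcars$mpg, mtcars$cyl, mean)

# The result is a one-dimensional array with one named entry per group.
print(round(mpg_by_cyl, 2))
```

`tapply(values, groups, fun)` splits `values` by `groups` and applies `fun` to each piece, which is roughly what `group_by()` followed by `summarise()` does in dplyr.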
The issue still stands: you used a trivial example that does not represent piping well. Instead of addressing that, you're trying to make me feel bad for not using a different approach. For what it's worth, I used tapply before I had even heard of the tidyverse, and I clearly stated that there are multiple solutions in base R. Here is a link to many side-by-side comparisons of base R and tidyverse code; in general you can see that the tidyverse code is more readable and that pipes are useful (which becomes even more apparent when combining several functions to clean a dataset): https://tavareshugo.github.io/data_carpentry_extras/base-r_tidyverse_equivalents/base-r_tidyverse_equivalents.html Also, you state that debugging is harder with pipes. This is not true, since you can easily run smaller parts of piped code. |
As I see this, @matloff was only addressing the extensive use of pipes rather than comparing the whole tidyverse vs. base R (in this particular example at least), while you provided a specific example that contains grouping operations, which, as I think most of us will agree, is not base R's strongest side. On the other hand, if we stick with dplyr (tidyverse) vs. base R, we could also bring many examples where base R is much simpler than the dplyr idiom. You just conveniently picked one that matches the point you are trying to make, @ljanda |
@DavidArenburg, |
@DavidArenburg my example doesn't just include grouping operations. It also has the kable and kableExtra styling to render a nice table, showing that you can pipe the result of the grouping into the table styling functions, whereas without pipes you have to stop and assign multiple times. My point was that @matloff used a trivial example rather than something more meaty that actually shows a difference between the tidyverse and base R. I could have given even more complicated examples that use the full suite of dplyr functions and piping (e.g., selecting a few variables, mutating them, grouping, then summarizing, without having to stop to assign once), but here I gave a fairly simple one. |
@ljanda: Your point about debugging is exactly what I am saying: It's better to break things up as in my example. |
@ljanda I don't see anything special about making intermediate assignments. And I don't think pipes are really related to the tidyverse anyway. You can pipe base R and data.table too if you really want to. Nor do I think (my own opinion) that pipes are bad; sometimes they are even useful. But after spending about 5 years seeing all kinds of questions and answers on StackOverflow, I see that, in general, pipes are being abused by tidyverse users all the time. For instance, I find this absolutely ridiculous. |
@ljanda, thanks for the Tavares reference. The example is indeed one in which tapply is much clearer, more compact and more straightforward. I've added it to my essay. |
With debugging you are breaking things up regardless of whether you're running part of a pipe or parts of unpiped code |
Exactly! You have to revert to base-R to debug. Why not stay there? You'd
still get the "read left to right" benefit.
On Mon, Jul 15, 2019, 6:00 PM Ludmila Janda wrote:
> With debugging you are breaking things up regardless of whether you're running part of a pipe or parts of unpiped code
|
You don't have to revert to base R, you can just run part of the pipe. |
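A minimal sketch of what "running part of the pipe" can mean in practice is shown here. It assumes R >= 4.1 so that the native `|>` pipe can stand in for magrittr's `%>%` (which is what the thread discusses), and it uses the built-in `mtcars` data purely for illustration.

```r
# Assumption: R >= 4.1, so the native pipe |> stands in for magrittr's %>%.
# Full pipeline: keep 4-cylinder cars, then average their mpg.
full <- mtcars |> subset(cyl == 4) |> with(mean(mpg))

# To debug, run only the first stage and inspect the intermediate result.
partial <- mtcars |> subset(cyl == 4)
str(partial)
```

In an interactive session the same effect comes from selecting the pipeline up to a given stage and executing just that selection.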
@DavidArenburg
That's an example from someone I work with. It isn't representative of the population, but it also highlights an issue with users who aren't terribly versed in programming. |
@wbuchanan this is a great example and a really good point about multiple assignment consuming more memory |
@ljanda magrittr wasn't originally part of the tidyverse; it was basically contributed to it: tidyverse/magrittr@cf2e33f Regarding your debugging strategy, it basically means that you need to rerun your whole code over and over after adding each line, which will probably be time- and memory-consuming. |
@wbuchanan I think in your example it is better to persist each step, as your co-worker did, instead of piping everything together, which would probably cause an out-of-memory error. Also, if you work with data.table, each merge would update the data in place, which saves both time and memory and avoids piping. Finally, if someone piped all these joins and then wanted to pipe additional steps, they would need to rerun all of the joins each time they added a step, which would be a time and memory mess. All in all (if we ignore code cleanliness), piping would probably make it worse (in my opinion at least). |
Feels like this discussion is missing a few things... First - even the example by @ljanda is quite simplistic and can be achieved with base in an easier way:
Second - if pipe-like syntax (left to right) is more readable, this too can be achieved within base:
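As a rough illustration of this left-to-right style in base R (a sketch under assumptions, not the code originally posted), each step can sit on its own line in execution order. The `mtcars` data set and the three steps are hypothetical choices.

```r
# Each step is written on its own line, in execution order, with base R only.
# mtcars and the three steps below are illustrative assumptions.
d <- mtcars
d <- d[d$hp > 100, ]                                 # 1. filter rows
d$wt_kg <- d$wt * 1000 * 0.4536                      # 2. add a column (wt is in 1000 lbs)
res <- aggregate(wt_kg ~ cyl, data = d, FUN = mean)  # 3. summarise by group
print(res)
```

Because every step leaves an object behind, execution can stop after any line and resume from the stored result.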
This would also let you stop in the middle of the "pipeline" and continue from where you left off. So, as I see it, the advantages and disadvantages of the pipe should be weighed against this style of syntax instead, especially because all the advantages proposed so far seem to be about readability only. |
@KKPMW That said, there are still some potential overhead differences from reassigning values to the existing object in memory. While I don’t agree with everything @DavidArenburg mentioned above, I do agree that there are cases where data.table is definitely the right solution. What I’m less certain about is whether the same memory benefit is achieved if the object grows in memory consumption along the way. For example, if the data set were arbitrarily small and the aggregation result were several times larger (say, something analogous to a multidimensional cube in the world of relational databases), would it still perform as well, or would it run into memory corruption issues or overhead from reallocating memory, since the pointers would no longer provide access to the necessary amount of memory? |
Based on a few benchmarks I am getting that with small objects the Small object:
Large object:
Of course haven't tested this thoroughly. But a few advantages of |
@KKPMW try benching with |
@KKPMW |
Tried
|
@ljanda:
No Even simpler (and still no piping!):
|
You write:
The Tidyverse also makes heavy use of magrittr pipes, e.g. writing the function composition h(g(f(x))) as
Again, the pitch made is that this is "English," in this case reading left-to-right. But again, one might question just how valuable that is, and in any event, I personally tend to write such code left-to-right anyway, without using pipes:
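To make the three styles under discussion concrete, here is a small sketch. The functions `f`, `g` and `h` are hypothetical placeholders, and the native `|>` pipe (R >= 4.1) is assumed as a stand-in for magrittr's `%>%`.

```r
# f, g and h are hypothetical placeholder functions, defined only for illustration.
f <- function(x) x + 1
g <- function(x) x * 2
h <- function(x) sqrt(x)
x <- 7

# Nested composition, read inside-out.
nested <- h(g(f(x)))

# Pipe form, read left to right (|> needs R >= 4.1; it stands in for %>%).
piped <- x |> f() |> g() |> h()

# Left to right without a pipe: one assignment per step.
a <- f(x)
b <- g(a)
stepwise <- h(b)
```

All three forms compute the same value; they differ only in reading order and in whether the intermediate results `a` and `b` are kept around.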
This simplistic example does not demonstrate the pain point of stopping and assigning rather than piping and the improved readability that follows, as demonstrated below:
As you can see, the base R approach requires a deeper understanding of functions, the ability to use a less clear syntax, and the need to keep assigning rather than piping.