Move ram tests #5520
Conversation
Codecov Report
|          | master | #5520  | +/-    |
|----------|--------|--------|--------|
| Coverage | 98.30% | 97.53% | -0.77% |
| Files    | 80     | 80     |        |
| Lines    | 14795  | 14800  | +5     |
| Hits     | 14544  | 14435  | -109   |
| Misses   | 251    | 365    | +114   |
…) call that 1538 in #5520 bumped up against, perhaps combined with object.size. Anyway, both now improved.
The plot could have ylim starting from 0. There is big progress in reducing memory usage, but without looking at the y scale the plot does not look much different.
But I am looking at the y scale. Also I'm sometimes looking quite closely at the gaps, which is easier when the range of data fills the y axis. I'm also looking at the time …

Btw, for anybody reading commits or comments now or in the future: I do know that when a change is made and I write "this saved 9MB" (for example), that that is not strictly true. What is true is that the increase of 9MB in the plot of rss over time disappears from the top 10 when making that change, and that's the definition of "saving" that I mean: I'm writing roughly. All I know is that it's a change that lowers ram usage. The actual ram saved by that particular change is something less than 9MB (and perhaps significantly less); rss has just stepped up to accommodate the various actual heaps. These are tipping points and I'm using 9MB in the nominal sense (like 2x1 wood is not actually 2x1).

The other way would be to split up tests.Rraw into 2,000 test files, as Michael suggested. Each test would run in its own R session and therefore be isolated, only reach the peak rss that that test needed, and then close R. Closing R would clear down memory so thoroughly that the ram usage of one test couldn't affect future tests. That would lower peak rss usage, yes. But 2,000 separate R sessions would be started, which Michael said could then be parallelized. Well, now it's getting complicated: now we have a new problem of 2,000 R startups to manage. Although the peak rss of any individual R session would indeed be lower, the total rss of all 2,000 R sessions would be far higher. We don't need any parallelization of tests currently, because a single R session runs them all.

Further, one of the main reasons tests.Rraw is set up the way it is, as one big test file run in one single R session, is that that is closer to what users do in the real world. Real-world usage of data.table is to use it in lots and lots of different ways in the same R session. If one test affects future tests (e.g. a static variable in C code that doesn't get cleared up properly, and that's certainly happened before) then it's good that we find that impact. It is desirable for tests.Rraw to exhibit increasing rss because that's what happens in real user R sessions.

Now, yes, most of the time I'm spending now is wasted in that I'm mostly just fixing the tests themselves to be more memory efficient. But that is what users do too, so I'm trying to use data.table like they do. And then, here and there, rarely, this is sometimes leading to improvements in … I hope that explains why I think tests.Rraw has some benefits the way it is, benefits that have not been sufficiently appreciated.
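The "top 10" bookkeeping described above can be sketched as follows. This is not data.table's actual memtest code; it is a minimal, hypothetical illustration of sampling the R process's resident set size after each test id and ranking the step-ups. The `rss_mb` helper is an assumption of mine: it reads `VmRSS` from `/proc/self/status`, so it is Linux-only and returns `NA_real_` elsewhere; the test ids and MB values in `rss_after` are made up.

```r
# Hypothetical helper (not part of data.table): current RSS of this R
# process in MB, read from /proc/self/status on Linux; NA elsewhere.
rss_mb <- function() {
  status <- "/proc/self/status"
  if (!file.exists(status)) return(NA_real_)
  line <- grep("^VmRSS:", readLines(status), value = TRUE)
  as.numeric(gsub("[^0-9]", "", line)) / 1024   # kB -> MB
}

# Hypothetical usage: record rss after each test id, then rank the
# step-ups to find the tests attributed to the largest rss increases.
rss_after <- c(`1538` = 150, `1639` = 180, `1642` = 265)  # made-up MB values
steps <- diff(c(0, rss_after))                 # per-test rss increase
top <- sort(steps, decreasing = TRUE)          # "top 10" would be head(top, 10)
```

As the comment above notes, resolving the head test of a step often just reveals the next test behind it, so this ranking has to be re-run after each fix.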
1538 (top ID in the last table above) was due to … Then I was hoping #5501 might improve test 1642 (current top ID) as that does construct calls, albeit with …
For future reference, if anyone does this again: yes, even when comparing the y-axis to the last one above, there is actually no difference in the plot this time. When the top ID is resolved and disappears from the top 10, the ending rss can still be unchanged. That's because, as I think about it, you just sort of remove the head test in that section, revealing the next test behind it, which is now attributed to that step up in rss. You just have to keep tackling the top IDs until, after 3-10 of them, the final rss eventually does drop. Perhaps as the impact of the top IDs keeps dropping, it'll get harder and harder to do this. Hopefully before that point, the ending rss will be good enough.
Indeed, moving those functions outside …

Ending RSS now 120.6MB.
1639 has 143 parts. I don't see anything obviously large there. Will trace the parts ...
Another thing about changing the y-axis to start from zero is that, afaik, …
I don't quite follow. If you're doing plot(x, y) you already have y, so plot(x, y, ylim=c(0, max(y))) is pretty normal and, IIRC, equivalent for the upper bound to the default.
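A minimal sketch of the two calls being discussed, using made-up ending-RSS values (the real y data is rss over test number). It checks the plot region via par("usr"): with the default ylim the axis starts near min(y), whereas ylim=c(0, max(y)) anchors it at zero. A temporary pdf device is used so this runs headless.

```r
y <- c(150, 180, 200, 265)      # hypothetical ending-RSS values in MB
x <- seq_along(y)

f <- tempfile(fileext = ".pdf")
pdf(f)                          # off-screen device so this runs headless

plot(x, y)                      # default: ylim is range(y), so the
usr_default <- par("usr")       # y-axis starts near min(y), not at 0

plot(x, y, ylim = c(0, max(y))) # y-axis anchored at 0
usr_zero <- par("usr")          # plot region: c(x1, x2, y1, y2)

dev.off()
```

Note the default yaxs="r" pads the requested ylim by about 4% at each end, which is the "equivalent for the upper bound" point above: both calls end up with roughly the same headroom over max(y).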
Getting to a point of diminishing returns now. I'm scratching my head way too much over every test. Things like … Now starting R with …

[Same plot, but with ylim starting at 0, on the right]
It's where …
That's what I did above. Notice how the 120 tick mark appears on the right; ylim was supplied with the explicit max(expression) in that case. Whereas on the left, the 120 tick mark doesn't appear when the function is left to calculate the ylim max itself.
plot(x, y <- {...}, ylim=c(0, max(y))) ?
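The suggestion above can be sketched like this, with a hypothetical expression standing in for the elided {...}. It leans on R's lazy evaluation: plot.default forces the y argument (via xy.coords) before it evaluates ylim, so the assignment `y <- {...}` has already created y in the calling frame by the time `c(0, max(y))` runs. That forcing order is an implementation detail of plot.default, so this is a trick rather than guaranteed behaviour.

```r
x <- 1:5
f <- tempfile(fileext = ".pdf")
pdf(f)                                         # headless device

# y is assigned mid-call (hypothetical expression), then reused in ylim.
plot(x, y <- {x^2 + 1}, ylim = c(0, max(y)))

dev.off()
```

After the call, y also remains available in the calling environment, which may or may not be wanted.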
Adding support for … If the points are made larger, I think ylim[2] might take that into account when the function calculates it for itself. Otherwise the big points get chopped off if you pass ylim[2] as max(data) yourself. Not sure; that's what I seem to remember.
…ble.h thanks Jan, plotting in dev memtest need not require imports of standard functions (strange)
Closes #5517
Moved the 10 tests as planned. Ending RSS down from 265MB to 200MB. So moving in the right direction, but will need to rinse and repeat.