
Move ram tests #5520

Merged: 26 commits merged into master from move_ram_tests on Nov 15, 2022

Conversation

mattdowle
Member

@mattdowle mattdowle commented Nov 10, 2022

Closes #5517
Moved the 10 tests as planned. Ending RSS down from 265MB to 200MB. So it's moving in the right direction, but we'll need to rinse and repeat.

      ID nTest   RSS diff
 1: 1147     7 164.9 26.9
 2: 1549     1 169.3 22.7
 3: 1835     1 183.5 17.9
 4:  646     1 119.8 17.8
 5:  861     1 136.5 16.4
 6: 1888     9 198.7 14.4
 7: 1760     2 167.7  8.2
 8: 1779    13 175.6  7.9
 9: 1035    55 142.6  6.2
10: 1378     5 149.1  4.5

@mattdowle mattdowle added the WIP label Nov 10, 2022
@mattdowle mattdowle added this to the 1.14.5 milestone Nov 10, 2022
@codecov

codecov bot commented Nov 10, 2022

Codecov Report

Merging #5520 (3973983) into master (5affced) will decrease coverage by 0.76%.
The diff coverage is 0.00%.

❗ Current head 3973983 differs from pull request most recent head cc1a9b4. Consider uploading reports for the commit cc1a9b4 to get more accurate results

@@            Coverage Diff             @@
##           master    #5520      +/-   ##
==========================================
- Coverage   98.30%   97.53%   -0.77%     
==========================================
  Files          80       80              
  Lines       14795    14800       +5     
==========================================
- Hits        14544    14435     -109     
- Misses        251      365     +114     
Impacted Files Coverage Δ
R/test.data.table.R 92.06% <0.00%> (-2.51%) ⬇️
R/utils.R 100.00% <ø> (ø)
src/fsort.c 77.52% <0.00%> (-18.54%) ⬇️
src/forder.c 92.06% <0.00%> (-7.94%) ⬇️
src/fread.c 98.52% <0.00%> (-0.89%) ⬇️
R/data.table.R 99.74% <0.00%> (-0.16%) ⬇️


@mattdowle
Member Author

Current status: 265MB down to 160MB

      ID nTest   RSS diff
 1: 1750    36 141.1 14.5
 2: 2224     3 159.6 10.4
 3: 1300     1 115.4  6.9
 4: 1642    91 126.8  6.2
 5: 1826     2 147.0  5.8
 6: 1378     5 120.2  4.5
 7: 1850     1 146.9  3.7
 8: 2201     4 149.2  3.3
 9: 1241     1 106.5  3.0
10: 2114     8 145.9  2.7

(Review thread on inst/tests/tests.Rraw — outdated, resolved)
@mattdowle
Member Author

Current status: 265MB down to 130MB

      ID nTest   RSS diff
 1: 1538     2 116.3  9.4
 2: 1642    91 125.4  6.4
 3: 1378     5 108.0  4.5
 4: 1639   143 119.0  3.1
 5:   69     7 100.8  2.0
 6: 1989     1 127.8  1.9
 7: 1223   728 103.5  1.7
 8: 1437    36 109.2  1.2
 9: 1888     9 126.1  0.9
10: 1461     1 106.9  0.6

mattdowle added a commit that referenced this pull request Nov 12, 2022
…) call that 1538 in #5520 bumped up against; perhaps combined with object.size. Anyway, both now improved.
@jangorecki
Member

Plot could have ylim starting from 0. There is big progress in reducing memory usage but without looking at the y scale plot is not much different.

@mattdowle
Member Author

mattdowle commented Nov 12, 2022

But I am looking at the y scale. Also I'm sometimes looking quite closely at the gaps which is easier when the range of data fills the y axis.

I'm also looking at the time test.data.table(memtest=1) takes. That's perhaps a crude method to simulate a strained/overloaded/mismanaged server since every gc() will flush the cpu cache like a server that has too many tasks running concurrently which keeps context switching. That's a guess. Anyway that time is down from 13 mins to 9 mins on my laptop.
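The "ID nTest RSS diff" rankings in this thread can be derived roughly as follows (a minimal base-R sketch: the column names mirror the tables here, but the IDs and RSS numbers are illustrative, not from the actual memtest run):

```r
# Minimal sketch: given the ending RSS (in MB) recorded after each test,
# "diff" is the step-up relative to the previous test, and the tests are
# ranked by that step-up. All numbers are illustrative.
rss_log = data.frame(ID  = c(100, 101, 102, 103),
                     RSS = c(95.0, 95.2, 104.6, 105.1))
rss_log$diff = c(0, diff(rss_log$RSS))   # step-up vs previous test
top = rss_log[order(-rss_log$diff), ]
head(top, 10)                            # ID 102 tops the list here
```

The real run records RSS after every test in tests.Rraw; the principle of ranking by step-up is the same.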

Btw, for anybody reading commits or comments now or in the future: I do know that when a change is made and I write "this saved 9MB" (for example), that is not strictly true. What is true is that the 9MB increase in the plot of rss over time disappears from the top 10 when making that change, and that's the definition of "saving" I mean: I'm writing roughly. All I know is that it's a change that lowers ram usage, that's all. The actual ram saved by that particular change is something less than 9MB (and perhaps significantly less); rss has just stepped up to accommodate the various actual heaps. These are tipping points, and I'm using 9MB in the nominal sense (like 2x1 wood is not 2x1).

The other way would be to split up tests.Rraw into 2,000 test files, as Michael suggested. Each test would run in its own R session and therefore be isolated, only reach the peak rss that that test needed, and then close R. Closing R would clear down memory so thoroughly that the ram usage of one test couldn't affect future tests. That would lower peak rss usage, yes. But 2,000 separate R sessions would be started, which Michael said could then be parallelized. Well, now it's getting complicated: now we have a new problem of 2,000 R startups to manage. Although the peak rss of any individual R session would indeed be lower, the total rss of all 2,000 R sessions would be way more. We don't need any parallelization of tests currently, because it's just 1 R session that runs them.

Further, one of the main reasons tests.Rraw is set up as it is, one big test file run in one single R session, is that that is closer to what users do in the real world. Real-world usage of data.table is to use it in lots and lots of different ways in the same R session. If one test affects future tests (e.g. a static variable in C code that doesn't get cleared up properly, and that's certainly happened before) then it's good that we find that impact. It is desirable for tests.Rraw to exhibit increasing rss because that's what happens in real user R sessions. Now, yes, most of the time I'm spending is wasted in that I'm mostly just fixing the tests themselves to be more memory efficient. But that is what users do too, so I'm trying to use it like they do. And then, here and there, rarely, this is sometimes leading to improvements in data.table itself; e.g. the nested data.table() which should have been list() inside tables() came out of this exercise yesterday.
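For concreteness, the one-session-per-test alternative discussed above might look roughly like this (a hypothetical sketch: the file names are illustrative and `echo` stands in for the real `Rscript` invocation):

```shell
# Hypothetical sketch: split tests.Rraw into many small files and run each
# in a fresh R session, so all memory is released when the session exits.
# File names are illustrative; 'echo' stands in for: Rscript "$f" || exit 1
dir=$(mktemp -d)
touch "$dir/t0001.R" "$dir/t0002.R"
for f in "$dir"/*.R; do
  echo "fresh session: $f"
done
rm -r "$dir"
```

The isolation is real, but as noted above the startup cost and total footprint of thousands of sessions is the trade-off.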

I hope that explains why I think tests.Rraw has some benefits the way it is, that have not been sufficiently appreciated.

@mattdowle
Member Author

mattdowle commented Nov 13, 2022

1538 (top ID in the last table above) was due to tables(), so that was relieved by #5524 and its follow-up, without changing the test.

Then I was hoping #5501 might improve test 1642 (current top ID) as that does construct calls, albeit with as.call and eval rather than do.call, and not a data.table() call. But I didn't get lucky. I've left the tracing in for now and it's really odd, because the rss steps up on mere entry to nqjoin_test before it has done any work at all (and yes, there's a trace right before its call, so I don't think I've missed something before its call). So I'm now thinking that functions defined within functions have something to do with memory usage, which the non-equi tests 1641-1652 do. I was pretty sure it was going to be the lapply through rows, but it isn't. Or at least, the step up is earlier, on entry, currently. And then when that is resolved, maybe it will then be the lapply. But if function definitions within functions turn out to be significant, then we should sweep the entire codebase to see where else in the internals we do that. Package functions are byte compiled (we have that as TRUE in DESCRIPTION) but I don't know about functions within functions. The test file isn't byte compiled though, so it could be that functions within functions, just in the non-compiled R code here, are in play in 1642.
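The suspicion above can be sketched in miniature (illustrative names only, not the actual nqjoin_test code): a function defined inside another function is a fresh closure, with its own environment, allocated on every call, whereas a top-level helper is created once.

```r
# Illustrative sketch (not the actual test code): 'nested' allocates a new
# closure for f on every call; 'hoisted' reuses a helper created once at
# top level. Both compute the same result.
nested = function(x) {
  f = function(y) y + 1L   # new closure (and environment) per call
  f(x)
}
helper = function(y) y + 1L # created once
hoisted = function(x) helper(x)
stopifnot(identical(nested(1L), hoisted(1L)))
```

Hoisting helpers out, as done for nqjoin_test in 0f8e2d4, keeps behaviour identical while avoiding the per-call allocations.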

      ID nTest   RSS diff
 1: 1642    91 126.1  9.9
 2: 1378     5 108.4  4.5
 3: 1538     2 111.9  4.5
 4: 1639   143 116.2  3.5
 5:   69     7 101.0  2.0
 6: 1223   728 103.9  1.8
 7: 1989     1 128.1  1.6
 8: 1437    36 109.8  1.4
 9: 1888     9 126.5  1.3
10: 1982     2 126.5  1.2

For future reference, if anyone does this again: yes, even comparing the y-axis to the last one above, there is actually no difference in the plot this time. When the top ID is resolved and disappears from the top 10, the ending rss can still be unchanged. That's because, as I think about it, you just remove the head test in that section, revealing the next test behind it, which is then attributed to that step up in rss. You have to keep tackling the top IDs until, after 3-10 rounds, the final rss eventually does drop. Perhaps as the impact of the top IDs keeps dropping, it'll get harder and harder to do this. Hopefully before that point, the ending rss will be good enough.

@mattdowle
Member Author

mattdowle commented Nov 13, 2022

Indeed, moving those functions outside nqjoin_test in 0f8e2d4 moved 1642 from top id with 9.9MB to 9th with 1.4MB. It didn't change the ending rss which is still 127MB but I'll keep the change and move on to the new top ID and keep going.

      ID nTest   RSS diff
 1: 1982     2 126.1  6.5
 2: 1378     5 108.4  4.6
 3: 1538     2 111.9  4.6
 4: 1639   143 116.2  3.6
 5: 1647    14 120.1  2.5
 6:   69     7 101.1  2.0
 7: 1223   728 103.8  1.8
 8: 1437    36 109.8  1.4
 9: 1642    91 117.6  1.4
10: 1989     1 127.4  1.3

@mattdowle
Member Author

Ending RSS now 120.6MB

      ID nTest   RSS diff
 1: 1639   143 113.5  6.4
 2: 1378     5 108.1  4.5
 3: 1989     1 119.6  2.8
 4: 1647    14 116.7  2.1
 5:   69     7 100.8  2.0
 6: 1223   728 103.6  1.8
 7: 1437    36 109.7  1.6
 8: 1942    12 116.4  1.4
 9: 1641    14 114.6  1.1
10: 1888     9 116.0  1.0

1639 has 143 parts. I don't see anything obviously large there. Will trace the parts ...

@mattdowle
Member Author

mattdowle commented Nov 14, 2022

Another thing about changing the y-axis to start from zero is that, afaik, ylim= needs both the minimum and the maximum. But here we would just want to set the lower bound to 0 and let the function look at the data to find the upper bound. That's not how it works: you have to provide the maximum to ylim too. So either you make a crude guess for the maximum based on the existing plot and hard-code that number, or you call max() on the data again (which introduces scope for a bug by using the wrong object there), and then there's doubt too about whether you need to add a little to the max so that the maximum observation definitely doesn't get chopped off. Does anyone know how to tell plot just that the ylim should start at zero? (Now that I'm pretty much done, it does make sense to show a plot with y starting at 0 too.)

@MichaelChirico
Member

I don't quite follow. If you're doing plot(x, y) you have y already; then plot(x, y, ylim=c(0, max(y))) is pretty normal and, IIRC, equivalent for the upper bound to the default.

@mattdowle
Member Author

Getting to a point of diminishing returns now: I'm scratching my head way too much over every test. Things like sapply(sapply(sapply(...))) in tests generate a lot of intermediate objects, but they should be small on these small datasets and should be released anyway (and maybe they are). It is still troubling why the steps up in RSS happen towards the end, but at least they're under 5MB now. Two things it could be are the leak in fread (PR exists) and the finalizer on data.table objects. I'll do those next, see if either helps here, and either way get this on CRAN, since it's still possible the CRAN Windows fail is not even related to tests.Rraw. If it still fails on CRAN Windows then we won't know whether this work helped but was insufficient, or whether it's not tests.Rraw at all. Hoping we'll get lucky and it is tests.Rraw.

Now starting R with $ R_GC_MEM_GROW=0 R when running memtest. It doesn't make much difference, but this option should be used if ever doing this exercise again in future, just to avoid bigger jumps up getting in the way.
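For the record, the `VAR=value command` form scopes the environment variable to that single command only; a minimal shell sketch (the real invocation above is `R_GC_MEM_GROW=0 R`, then test.data.table(memtest=1) at the prompt):

```shell
# R_GC_MEM_GROW=0 set this way applies only to the one command it prefixes;
# the surrounding shell is unaffected. (Real use: R_GC_MEM_GROW=0 R)
R_GC_MEM_GROW=0 sh -c 'echo "inner=$R_GC_MEM_GROW"'
echo "outer=${R_GC_MEM_GROW:-unset}"
```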

Same plot but with ylim starting at 0 on the right

      ID nTest   RSS diff
 1: 1378     5 108.3  4.5
 2: 1639   149 111.2  4.0
 3: 1989     1 118.4  3.0
 4: 1647    14 114.9  2.2
 5:   69     7 101.0  2.1
 6: 1223   728 103.7  1.7
 7: 1641    14 112.7  1.5
 8: 1942    12 115.0  1.4
 9: 1437    36 109.5  1.2
10: 1834     2 113.9  1.2

@mattdowle
Member Author

mattdowle commented Nov 14, 2022

I don't quite follow. if you're doing plot(x,y) you have y already, then plot(x, y, ylim=c(0, max(y))) is pretty normal and IIRC equivalent for the upper bound to the default

It's where y is not just y but some complicated expression. Then you end up repeating the expression; e.g. a table name, a filter, and a column name. That's more maintenance and potential for bugs. What I would like is to avoid that by more simply writing ylim.min=0 and being done.

IIRC equivalent for the upper bound to the default

That's what I did above. Notice how the 120 tick mark appears on the right; ylim was supplied with the explicit max(expression) in that case. Whereas on the left, the 120 tick mark doesn't appear when the function is left to calculate the ylim.max itself.

@MichaelChirico
Member

plot(x, y <- {...}, ylim=c(0, max(y))) ?

@mattdowle
Member Author

mattdowle commented Nov 14, 2022

Adding support for ylim.min, yes, so it could be other minimums, not just 0, while leaving the max up to the function. I'm not sure that max is actually just max(y); doesn't it depend on tick mark settings, margins, and things I don't want to know about? Also, I'm not using y; I'm just providing the data to x, and that's it.

If the points are made larger, I think ylim[2] might take that into account when the function calculates it for itself; otherwise the big points get chopped off if you pass max(data) as ylim[2] yourself. Not sure; that's what I seem to remember.
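One base-R idiom that sidesteps repeating the expression (a sketch with illustrative data): range(0, y) evaluates to c(0, max(y)) for non-negative data, and with the default par(yaxs = "r") the axis is still extended about 4% beyond ylim at both ends, so the maximum observation isn't chopped off.

```r
# Sketch: pin only the lower y limit at 0 without repeating a long
# expression. range(0, y) is c(0, max(y)) for non-negative y; the default
# yaxs = "r" pads ylim ~4% at both ends. Data here is illustrative.
y = c(95.2, 104.6, 120.6)                 # e.g. ending RSS readings in MB
pdf(NULL)                                 # headless device for the sketch
plot(seq_along(y), y, ylim = range(0, y))
dev.off()
stopifnot(all.equal(range(0, y), c(0, 120.6)))
```

When y is a complicated expression, wrapping it once (e.g. y <- expr) and then range(0, y) avoids both the repetition and the hard-coded maximum.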

@mattdowle
Member Author

The fread leak fix didn't help here. I guess that's because fread didn't leak when the data is small. Certainly under 100 rows it didn't leak, and maybe under 1,000 too, because if the number of rows is less than the sample size then it doesn't need to estimate: the sample already reads all the rows and knows exactly how many rows there are before allocating. We didn't have any big test files anyway, and any large constructions have been moved to benchmark.Rraw.

[image: memtest RSS plot]

      ID nTest   RSS diff
 1: 1378     5 108.0  4.4
 2: 1639   149 111.0  4.0
 3: 1989     1 118.1  2.7
 4: 1647    14 114.5  2.1
 5:   69     7 100.8  2.0
 6: 1223   728 103.5  1.7
 7: 1641    14 112.4  1.4
 8: 1437    36 109.3  1.2
 9: 1942    12 114.9  0.8
10: 1461     1 107.0  0.6

@mattdowle mattdowle removed the WIP label Nov 15, 2022
@mattdowle mattdowle merged commit 2d2892c into master Nov 15, 2022
@mattdowle mattdowle deleted the move_ram_tests branch November 15, 2022 11:06
mattdowle added a commit that referenced this pull request Nov 15, 2022
…ble.h thanks Jan, plotting in dev memtest need not require imports of standard functions (strange)
mattdowle added a commit that referenced this pull request Nov 16, 2022
@jangorecki jangorecki modified the milestones: 1.14.9, 1.15.0 Oct 29, 2023
Successfully merging this pull request may close these issues.

Reduce or move top 10 tests by RAM usage