memory leak #2866
I'll add that the call to `get()` appears to be the culprit. With `sum(a)` in place of `sum(get(aref))` there is no leak:
```r
require(data.table)
n <- 2e6
df <- data.frame(a=rnorm(n),
b=factor(rbinom(n,5,prob=0.5),1:5,letters[1:5]),
c=factor(rbinom(n,5,prob=0.5),1:5,letters[1:5]))
dt <- setDT(df)
print(pryr::mem_used())
aref <- "a"
for(i in 1:10) {
ff <- lapply(1:5, function(i) {
dt2 <- dt[,list(sumA=sum(a)),by=b][,c:=letters[i]]
dt2
})
f <- rbindlist(ff)
rm("f")
gc()
print(pryr::mem_used())
}
# no memory leak: 66.6 MB (repeated)
```

But this, with `sum(get(aref))`, causes the memory leak:
```r
require(data.table)
n <- 2e6
df <- data.frame(a=rnorm(n),
b=factor(rbinom(n,5,prob=0.5),1:5,letters[1:5]),
c=factor(rbinom(n,5,prob=0.5),1:5,letters[1:5]))
dt <- setDT(df)
print(pryr::mem_used())
aref <- "a"
for(i in 1:10) {
ff <- lapply(1:5, function(i) {
dt2 <- dt[,list(sumA=sum(get(aref))),by=b][,c:=letters[i]]
dt2
})
f <- rbindlist(ff)
rm("f")
gc()
print(pryr::mem_used())
}
# 66.6 MB
# 170 MB
# 273 MB
# 375 MB
```

Interestingly, if the first code chunk is run after the second, it too has a memory leak; it just can't get the leak started on its own. |
I don't see the leak with these examples on Linux with the latest r-devel (well, as of 2nd May, and we've been seeing this problem before then). It looks likely to be Windows-only then, consistent with #2767. That will mean debugging on Windows, which I've never done before. I'm surprised a leak like this would be Windows-only, as it's more to do with R's internal heap, which is platform-independent as far as I know. |
We first saw this a few days ago, if that helps. |
@pdbailey0 Yes that helps. Have you been routinely using the R-devel daily snapshot every day for more than a few weeks, and only saw the problem a few days ago? |
@nguyentr17 did not see this problem with the 3.5.0 pre-release, and I did with the 3.6.0 pre-release; do we know when they switched over? It occurs to me I don't know how often she updated her R-devel. |
Ok thanks. I updated R-devel on my laptop to 2018-05-10 (r74708) but still no leak on Linux. I used your tests verbatim and increased the loops to 100 too. We'll just have to bite the bullet and use Windows to trace it... |
Can reproduce on Windows R-devel. |
When I tried to submit my package, the Windows check had an error that I believe indicates data.table is currently not available on CRAN's Windows install.
|
I was successfully able to update a package on 2018-05-13. I suspect that you encountered a race condition: try resubmitting. |
@HughParsonage, you were right: I resubmitted and the DLL was present. Unfortunately, the memory leak led to CRAN running out of memory and my submission failing a test. I posted asking for help about the immediate issue of my package submission on R-package-devel. |
Thanks @pdbailey0 : could you provide a link to the package you're attempting to submit? I couldn't see it on your GitHub profile. To clarify, I don't think this is a problem with your package. |
Hint: Running with Calling |
@ltierney why is it not consistent across OSes? |
@jangorecki it's a race condition. The risk exists on all platforms. When it bites varies from run to run within a platform; how likely it is to bite varies between platforms with performance differences, in particular in thread implementations. On my Ubuntu system, this fails:
```r
require(data.table)
n <- 20
df <- data.frame(a=rnorm(n),
b=factor(rbinom(n,5,prob=0.5),1:5,letters[1:5]),
c=factor(rbinom(n,5,prob=0.5),1:5,letters[1:5]))
dt <- setDT(df)
v <- gc()
print(sum(v[, 2]))
fff <- function(aref) {
ff <- lapply(1:5, function(i) {
dt2 <- dt[,list(sumA=sum(get(aref))),by=b][,c:=letters[i]]
dt2
})
return(rbindlist(ff))
}
for(i in 1:100) {
f <- fff("a")
rm("f")
v <- gc()
print(sum(v[, 2]))
}
```
|
Many thanks @ltierney ! I see DATAPTR usage within a parallel region in reorder.c. I'll start there ... |
I can confirm that using |
INTEGER, REAL, etc call DATAPTR so are also unsafe. I believe it was INTEGER that was involved in the specific report. It would be safest to get any calls into R out of parallel regions if you can.
|
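To make Luke's advice concrete, here is a minimal hypothetical C sketch of the hoisting pattern (illustrative only, not data.table's actual code; `colsums` and its all-integer-columns assumption are invented): the `INTEGER()` calls, which may allocate when handed an ALTREP object, run on the main thread before the OpenMP region, and the workers only ever touch plain C pointers.

```c
#include <Rinternals.h>

// Hypothetical example, not data.table code: column sums of a list of
// integer vectors. INTEGER() may allocate (e.g. to materialize an ALTREP)
// so it is called only on the main thread; the parallel region sees raw
// C pointers only and makes no calls into R's API.
SEXP colsums(SEXP dt)
{
  R_xlen_t ncol = Rf_xlength(dt);
  R_xlen_t nrow = Rf_xlength(VECTOR_ELT(dt, 0));
  SEXP ans = PROTECT(Rf_allocVector(REALSXP, ncol));
  double *ansp = REAL(ans);

  // Hoist the unsafe calls: gather the data pointers sequentially.
  const int **colp = (const int **)R_alloc(ncol, sizeof(int *));
  for (R_xlen_t j = 0; j < ncol; j++)
    colp[j] = INTEGER(VECTOR_ELT(dt, j));

  #pragma omp parallel for
  for (R_xlen_t j = 0; j < ncol; j++) {
    double s = 0.0;
    for (R_xlen_t i = 0; i < nrow; i++) s += colp[j][i];
    ansp[j] = s;
  }
  UNPROTECT(1);
  return ans;
}
```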
That commit just now fixes the reprex for me locally. This one was indeed INTEGER inside a parallel region on an ALTREP; in subset.c, not reorder.c, as it turned out. For now, it just calls duplicate() on any ALTREP it is passed. I see that duplicate(x) calls duplicate1(x, deep=TRUE) so I'm guessing I can rely on that to expand an ALTREP (since duplicate1 is hidden from package use). If this works I'll sweep through all parallel regions similarly. I saw you reduced n to 20 to reproduce this on Ubuntu; n was 2e6 in the Windows reprex originally reported. Did you make n smaller so that the vectors would be on the R heap? I think I recall you telling me before that Linux R now allocates large vectors on the OS heap but Windows R still puts them on the R heap. If so, that would be a contributing factor to the apparent Windows-only nature of this particular reprex at the 2e6 size? |
Nothing so sophisticated -- I was just tinkering to see if I could
shift the workload timings to make the global updates happen in the
wrong order. The version I posted on github now no longer fails for
me on Ubuntu -- that's the curse of race conditions. But this one
crashes and burns pretty badly and reliably on Ubuntu with two threads
(at least at the moment); it's fine single threaded. This also fails
for me in R-patched, also on Ubuntu.
```r
require(data.table)
n <- 20
mkDT <- function() {
df <- data.frame(a = 1:n,
b = 1:n,
c = 1:n,
d = 1:n,
e = 1:n,
f = 1:n,
h = 1:n,
i = 1:n)
setDT(df)
}
print(sum(gc()[, 2]))
fff <- function(aref) {
dt <- mkDT()
lapply(1:5, function(i) {
dt2 <- dt[,list(sumA=sum(get(aref))),by=b][,c:=letters[i]]
dt2
})
}
for(i in 1:100) {
fff("a")
print(sum(gc()[, 2]))
}
```
Best,
luke
|
Huge thanks Luke! We would have been stuck for weeks (perhaps longer) if it wasn't for you. This one fails for me too, but gracefully with `DT contains an altrep`. I just added that error in the last commit. But that error() call is within a parallel region so it's very possible it could crash and burn too. It was just a temporary line (I know error isn't thread-safe) and I'll address that now. It seems that the test suite is passing AppVeyor R-devel now, for the first time in a month or so!!! Time will tell. I'll tidy up subset.c and merge it to fix this particular reprex so that folks can start testing 1.11.3. I'll continue to sweep the other parallel regions. |
Just another quick note: you mentioned using duplicate for ALTREPs. duplicate isn't guaranteed to return a standard vector -- ALTREP classes can define their own methods that do something else. Rather than test for an ALTREP and duplicate, a better approach is to move calls to REAL and anything else that might not be thread-safe out of the parallel region, or put them in a critical section.
Best,
luke
|
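As a hypothetical sketch of the other option Luke mentions (a critical section), for a case where the call genuinely can't be hoisted. `first_elements` is invented for illustration; note that, as Luke explains below, anything that might signal an error still must not run on a worker thread.

```c
#include <Rinternals.h>

// Hypothetical example: fetch the first element of each numeric column.
// REAL() may allocate (ALTREP materialization) and may touch shared R
// state, so that call is serialized; the rest of the loop stays parallel.
SEXP first_elements(SEXP cols)
{
  R_xlen_t ncol = Rf_xlength(cols);
  SEXP ans = PROTECT(Rf_allocVector(REALSXP, ncol));
  double *ansp = REAL(ans);

  #pragma omp parallel for
  for (R_xlen_t j = 0; j < ncol; j++) {
    const double *p;
    #pragma omp critical
    p = REAL(VECTOR_ELT(cols, j));   // one thread at a time in R's API
    ansp[j] = p[0];
  }
  UNPROTECT(1);
  return ans;
}
```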
The trouble with moving them out is I'll then often have to allocate a new vector to hold the vector of safe pointers (what INTEGER and REAL return). For data.tables of 10,000 or more columns there's a RAM usage concern but it's also about the extra code required. It might be faster though I suppose to do that, since iiuc, INTEGER and REAL are slightly heavier now. As an option for package use, is there a reliable way to expand an ALTREP vector? |
I don't think that the memory allocation for a vector of pointers is an issue. A pointer is something like 64 bit maximum, so 1e6 columns would require ~8MB if I am not mistaken. In case the data.table contained all integer columns, this would correspond to the memory needed by two additional rows (assuming 32-bit integers). I think this is definitely acceptable.

What concerns me more is CHARSXPs and VECSXPs. Is it even possible to get rid of SET_STRING_ELT and SET_VECTOR_ELT and replace them by native C code? But then I read that the ELT methods do "Set_elt - Set the value of element i without using dataptr" and "You can only modify STRSXP and VECSXP through the ELT functions since the write barrier needs to see the modifications." More info is here: https://svn.r-project.org/R/branches/ALTREP/ALTREP.html (https://gmbecker.github.io/jsm2017/jsm2017.html). So probably this will not be a problem since they explicitly don't need DATAPTR?

I would suggest reviewing the C code in the long term to move R stuff as far as possible out of parallel code. I will be happy to help with that. On short notice, expanding ALTREPs seems like a fast and easy solution if reliably possible. |
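A quick check of that arithmetic, assuming 8-byte pointers and 4-byte integers as stated above:

$$10^6 \text{ columns} \times 8\,\text{B/pointer} = 8\,\text{MB}, \qquad 2 \text{ rows} \times 10^6 \text{ columns} \times 4\,\text{B/int} = 8\,\text{MB}.$$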
On Wed, 16 May 2018, MarkusBonsch wrote:
I don't think that the memory allocation for a vector of pointers is an
issue. A pointer is something like 64 bit maximum. So, 1e6 columns would
require ~8MB if I am not mistaken. In case the data.table contained all
integer columns, this would correspond to the memory needed by two
additional rows (assuming 32bit integer). I think, this is definitely
acceptable.
What concerns me more is CHARSXPs and VECSXPs. Is it even possible to get
rid of SET_STRING_ELT and SET_VECTOR_ELT and replace them by native C code?
But then I read that the ELT methods do
Set_elt - Set the value of element i without using dataptr
You can only modify STRSXP and VECSXP through the ELT functions since
the write barrier needs to see the modifications.
More info is here:
https://svn.r-project.org/R/branches/ALTREP/ALTREP.html
So probably this will not be a problem since they explicitly don't need DATAPTR?
They may still allocate. At this point, DATAPTR, STRING_ELT, and
SET_STRING_ELT are functions that now may allocate. There may be more
as we go forward.
Anything that may allocate may signal an error because the allocation
failed. DATAPTR can also signal an error because a particular ALTREP
class can't provide a pointer to a contiguous address range. There may
be some other functions that now might signal an error but might not
have previously.
Anything that might modify internal R data structures has to be
accessed by only one thread at a time. Anything that might signal an
error has to be called from the main thread or a jump will segfault.
Even a jump from the main thread could mess up OMP thread management
code if it happens inside a parallel region. So basically the only
safe option is to move everything that calls into R out of the
parallel regions.
I would suggest to review the C code in the long term to move R stuff as far
as possible out of parallel code. I will be happy to help on that. On short
notice, expanding ALTREP seems like a fast and easy solution if reliably
possible.
The notion of expanding isn't general enough. ALTREP is meant to allow
objects where obtaining a pointer to contiguous address space isn't
feasible (e.g. a large distributed array). Code that requires a
DATAPTR will fail on these things but code that has been written to
work a chunk at a time when a DATAPTR isn't available will work.
Best,
luke
|
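To make "work a chunk at a time" concrete, here is a hypothetical sketch (the function name is invented) using the `INTEGER_GET_REGION` accessor from R >= 3.5.0, which copies a block of values into a caller-supplied buffer and therefore works even for ALTREP classes that cannot provide a contiguous DATAPTR:

```c
#include <Rinternals.h>

// Hypothetical example: sum an integer vector chunk by chunk.
// INTEGER_GET_REGION copies up to 512 elements starting at i into buf
// and returns how many it copied, so no DATAPTR is ever required.
double sum_by_region(SEXP x)
{
  double s = 0.0;
  int buf[512];
  R_xlen_t n = Rf_xlength(x);
  for (R_xlen_t i = 0; i < n; i += 512) {
    R_xlen_t m = INTEGER_GET_REGION(x, i, 512, buf);
    for (R_xlen_t k = 0; k < m; k++) s += buf[k];
  }
  return s;
}
```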
For anyone looking for a work-around, I included this in my package:
```r
using_r_devel <- grepl(pattern = "devel", x = R.version$status)
if (using_r_devel) {
  requireNamespace("data.table")
  data.table::setDTthreads(threads = 1L)
}
```
It passed Winbuilder. |
Markus: I shouldn't have written RAM usage. What I had in mind more specifically was cache, not RAM. My laptop has 32KB of L1d, 256KB of L2 and 6MB of L3. When we use multiple threads, that same cache is shared by all the threads (and the OS and other processes), so divide those cache figures by 4 if data.table is using 4 threads. Cache isn't typically much larger on servers, so divide cache by 32 if using 32 threads on a server. If we create a new 10,000-item hop vector, that's 78KB of weight sloshing through cache, and each use of that vector will need that element's cache line (64 bytes on x86). However, thinking about it, in data.table's case most often it wouldn't be an extra hop vector with the base SEXP vector sloshing through cache as well; rather, the parallel code would use the hop vector instead of the base SEXP one. That should be fine then, and I agree ncol*8 working bytes is not a concern even for a 1e6-column dt (8MB). It's certainly cleaner and likely faster to use raw C pointers inside parallel regions, so I'm very comfortable with that.

Yes, any help is much appreciated. I'm looking too, so we should coordinate.

Thanks Luke - understood, and thanks again. The only place I have deliberately violated the write-barrier on STRSXP and VECSXP is in

For those reading who are wondering what the 'write barrier' means (it took me a while to understand it too, with help from Luke in the past): roughly speaking, SET_STRING_ELT doesn't just assign the pointer (to the CHARSXP in the global character cache) to the vector; it also increments the reference count of the string being assigned and decrements the reference count of the string being replaced. The reference count is on the single global CHARSXP (i.e. shared between threads). So if two threads do that at the same time, the reference count can get messed up (a race). Because an increment (e.g. |
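To illustrate the write-barrier point with a hypothetical sketch (`format_ints` is invented, not data.table code): do the thread-parallel work into plain C memory, then perform every `SET_STRING_ELT`, whose CHARSXP reference-count updates are exactly the shared state at risk, sequentially on the main thread.

```c
#include <Rinternals.h>
#include <stdio.h>

// Hypothetical example: format integers as strings. The parallel region
// writes only to a plain C buffer; Rf_mkChar (which allocates) and
// SET_STRING_ELT (which updates shared CHARSXP reference counts behind
// the write barrier) run single-threaded afterwards.
SEXP format_ints(SEXP x)
{
  R_xlen_t n = Rf_xlength(x);
  const int *xp = INTEGER(x);   // main thread: safe even for ALTREP
  char *buf = R_alloc(n, 16);   // transient scratch, reclaimed by R

  #pragma omp parallel for      // no R API calls inside
  for (R_xlen_t i = 0; i < n; i++)
    snprintf(buf + i * 16, 16, "%d", xp[i]);

  SEXP ans = PROTECT(Rf_allocVector(STRSXP, n));
  for (R_xlen_t i = 0; i < n; i++)
    SET_STRING_ELT(ans, i, Rf_mkChar(buf + i * 16));
  UNPROTECT(1);
  return ans;
}
```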
@mattdowle I thought only |
@ltierney and @mattdowle Thank you so much for your explanations and material, I learned a lot. |
Hugh: it's a bit more: |
sorry if you expected this, but I was about to test my package vs this new version. First I grabbed master, built it (in 3.5.0) and installed it in 3.6.0 and still got a substantial increase in memory size from my test case (in the OP).
and from Luke's example (longer output not shown here). |
@pdbailey0 Thanks for checking. Could you use the binary .zip that AppVeyor built please and reboot your Windows machine as per the paragraphs here. Suspect it's an upgrade/dll issue. Also, what does test.data.table() return please? |
OK, great, it works now, thanks! It also works great in my package. |
Hi, sorry, I just read your 1.14.4 NEWS and wanted to give credit where credit is due. @nguyentr17 (Trang Nguyen) came up with the minimal working example. Sorry that I wasn't clearer about that above. Can you please attribute this to her and not me? |
Sure. Hope that edit cab7c11 just now is ok. Many many thanks @nguyentr17 ! |
Thanks for fixing it! |
Only when using the current 3.6.0 nightly build, I found what appears to be a memory leak in data.table.

Minimal reproducible example

I would expect the printed memory usage to stay flat (since `f` is removed and `gc()` is called) but it goes up. Notice that it always takes some time: adding `[,c:=letters[i]]` makes it cause problems faster but is not necessary (you may have to increase 10 to 200 or something).

example output

Output of sessionInfo()

Also reported on SO. This is a problem on the CRAN version and the GitHub version of data.table.