BUG: segfault on large host arrays with timeit #184
Not specific to usage in |
Ah yeah, at |
Providing a smaller reproducer:

```python
import pykokkos as pk
import numpy as np


# `profile` is not imported here; it is expected to be injected by the
# memory profiler used to run this script.
@profile
def test_gh_184():
    arr = np.zeros(10000)
    view = pk.from_numpy(arr)
    for i in range(50000):
        pk.sqrt(view)


if __name__ == "__main__":
    test_gh_184()
```

And profiling the memory usage with
I hypothesized that PyKokkos is not properly freeing the new
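
One way to check whether a loop like this leaks, without depending on a particular profiler, is to watch the process's peak resident set size. The sketch below is a Unix-only, standard-library helper; `peak_rss_bytes` and `peak_rss_growth` are hypothetical names, not part of PyKokkos. Because `ru_maxrss` is a high-water mark, a loop that frees its temporaries should plateau quickly, while growth that keeps scaling with the iteration count points at a leak.

```python
import resource
import sys


def peak_rss_bytes():
    """Peak resident set size of this process so far, in bytes (Unix only)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux but in bytes on macOS.
    return peak if sys.platform == "darwin" else peak * 1024


def peak_rss_growth(fn, iterations):
    """Call fn() `iterations` times and return how much the peak RSS grew, in bytes."""
    before = peak_rss_bytes()
    for _ in range(iterations):
        fn()
    return peak_rss_bytes() - before
```

For the reproducer above, something like `peak_rss_growth(lambda: pk.sqrt(view), 50_000)` could be compared across a few iteration counts; if the result keeps rising with the count, memory is not being released.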
|
Simplifying even more, I think this can be summarized as:
For example:

```python
import pykokkos as pk
import numpy as np
from tqdm import tqdm


@pk.workunit
def nothing(tid: int, view: pk.View1D[pk.double]):
    view[tid] = view[tid]


def test_gh_184():
    for i in tqdm(range(500000)):
        v = pk.View((10_000,), pk.float64)
        pk.parallel_for(1, nothing, view=v)


if __name__ == "__main__":
    test_gh_184()
```

vs. hoisting the view to reduce the memory punishment (this also runs orders of magnitude slower...):

```python
import pykokkos as pk
import numpy as np
from tqdm import tqdm


@pk.workunit
def nothing(tid: int, view: pk.View1D[pk.double]):
    view[tid] = view[tid]


def test_gh_184():
    v = pk.View((10_000,), pk.float64)
    for i in tqdm(range(500000)):
        pk.parallel_for(1, nothing, view=v)


if __name__ == "__main__":
    test_gh_184()
```
|
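
The two loop shapes above (allocate inside the loop vs. hoist the allocation) can also be timed with the standard `timeit` module. In this sketch a plain `bytearray` stands in for `pk.View`, an assumption made so the code runs without PyKokkos; `per_iteration_alloc`, `hoisted_alloc`, and `best_time` are hypothetical names.

```python
import timeit


def per_iteration_alloc(n_iter, size):
    """Allocate a fresh buffer on every iteration (like `v = pk.View(...)` inside the loop)."""
    for _ in range(n_iter):
        v = bytearray(size)
        v[0] = v[0]  # touch the buffer, loosely like the `nothing` workunit


def hoisted_alloc(n_iter, size):
    """Allocate the buffer once and reuse it (like hoisting `v` out of the loop)."""
    v = bytearray(size)
    for _ in range(n_iter):
        v[0] = v[0]


def best_time(fn, *args, repeat=5):
    """Best-of-`repeat` wall-clock time, in seconds, for one call of fn(*args)."""
    return min(timeit.repeat(lambda: fn(*args), repeat=repeat, number=1))
```

Taking the minimum of several repeats reduces noise from other processes; substituting the two `test_gh_184` variants for the stand-ins would give the corresponding comparison for PyKokkos.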
This reverts commit 9fa8784.
Did some testing... to me this looks like a classic Python garbage collector problem. The gc is not obliged to delete an object immediately when it goes out of scope (the deletion would be costly, so why do it now?). But it is also not allowed to reuse the memory of an object on the death list until that object has actually been deleted. Furthermore, AFAIK it is not even obliged to reuse already-allocated memory at all (for externally defined types this might even be dangerous or lead to memory corruption). Thus my assessment is: the gc sees the view `v` that gets created in every iteration but does not delete it... because why should it do so now? As a result, every iteration allocates new space for a view, since this guarantees no memory corruption. That is why you see the time needed to allocate a new `v` in every iteration in the triangular plot. If the view is created before the loop, the gc does even less: it only deletes the one existing `v` when the function doing the iteration exits. |
I'm confused: do you see the same leak with NumPy? I think the gc is usually pretty fast, so I still think PyKokkos is the problem here.

On Thu, Mar 16, 2023 at 07:36 JBludau wrote:
> did some testing ... to me this looks like a classic python garbage collector problem.
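
For reference, in CPython an object that is not part of a reference cycle is freed as soon as its last reference disappears; the cyclic garbage collector only deals with cycles. The standard-library sketch below uses a hypothetical `FakeView` class (not a PyKokkos type), rebinds `v` in a loop while `gc.disable()` is in effect, and records how many of those objects are alive at once. If the loop holds the only reference, that number stays at 1, so memory that keeps growing in the PyKokkos loops would suggest that something else still references the views. This applies to CPython only; implementations such as PyPy do not use reference counting.

```python
import gc
import weakref


class FakeView:
    """Hypothetical stand-in for an object that owns a large buffer."""

    def __init__(self, n_bytes):
        self.buffer = bytearray(n_bytes)


def live_objects_during_loop(iterations):
    """Rebind `v` every iteration with the cyclic gc disabled.

    Returns the largest number of FakeView objects alive at the same time.
    """
    refs = []
    peak_alive = 0
    was_enabled = gc.isenabled()
    gc.disable()  # rule out the cyclic collector: only refcounting remains
    try:
        for _ in range(iterations):
            # Rebinding `v` drops the previous object's last reference,
            # so CPython frees it right here.
            v = FakeView(80_000)
            refs.append(weakref.ref(v))
            alive = sum(1 for r in refs if r() is not None)
            peak_alive = max(peak_alive, alive)
    finally:
        if was_enabled:
            gc.enable()
    return peak_alive
```
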
|
I still maintain that something is wrong here, and regardless of the details, we should be competitive with NumPy and other array API providers for usage of tools like
Consider, for example, that if we don't send the view to the workunit, the

```python
import pykokkos as pk
import numpy as np
from tqdm import tqdm


@pk.workunit
def nothing(tid: int, view: pk.View1D[pk.double]):
    view[tid] = view[tid]


def test_gh_184():
    for i in tqdm(range(500000)):
        v = pk.View((10_000,), pk.float64)
        # pk.parallel_for(1, nothing, view=v)


if __name__ == "__main__":
    test_gh_184()
```

And if you add the workunit back, suddenly the garbage collector is lazy?

```python
import pykokkos as pk
import numpy as np
from tqdm import tqdm


@pk.workunit
def nothing(tid: int, view: pk.View1D[pk.double]):
    view[tid] = view[tid]


def test_gh_184():
    for i in tqdm(range(500000)):
        v = pk.View((10_000,), pk.float64)
        pk.parallel_for(1, nothing, view=v)


if __name__ == "__main__":
    test_gh_184()
```

And if you switch the loop to use a NumPy ufunc, suddenly the

```python
import pykokkos as pk
import numpy as np
from tqdm import tqdm


def test_gh_184():
    for i in tqdm(range(500000)):
        v = np.ones(100_000, dtype=np.float64)
        np.sqrt(v)


if __name__ == "__main__":
    test_gh_184()
```
|
I've been looking into it, and I think the cause is the kernel argument caching in
I'm still figuring out the details, but @tylerjereddy could you try commenting out that line and see if it fixes your problem? |
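
The kernel argument caching code is not shown here, so the following is only a hypothetical model of how such a cache could leak: if it stores the argument objects themselves, every view passed to a launch stays alive as long as the cache does. `LaunchCache`, `FakeView`, and `survivors` are made-up names for illustration, and the "store only names and type names" branch is one possible way to avoid the problem, not the fix that was applied.

```python
import weakref


class FakeView:
    """Hypothetical stand-in for a view (not a PyKokkos type)."""

    def __init__(self, n_bytes):
        self.data = bytearray(n_bytes)


class LaunchCache:
    """Hypothetical per-workunit cache consulted on every kernel launch."""

    def __init__(self, keep_args):
        self.keep_args = keep_args
        self.entries = {}

    def launch(self, name, **kwargs):
        if self.keep_args:
            # Leaky: the cache holds strong references to every argument object.
            self.entries.setdefault(name, []).append(kwargs)
        else:
            # Keep only what is needed to look up a compiled kernel:
            # argument names and type names, never the objects themselves.
            self.entries[name] = tuple(
                sorted((k, type(a).__name__) for k, a in kwargs.items())
            )


def survivors(keep_args, iterations=100):
    """Create a fresh view per launch and count how many are still alive."""
    cache = LaunchCache(keep_args)
    refs = []
    for _ in range(iterations):
        v = FakeView(80_000)
        refs.append(weakref.ref(v))
        cache.launch("nothing", view=v)
    v = None  # drop the loop's own reference to the last view
    return sum(1 for r in refs if r() is not None)
```

If a cache genuinely needs access to the objects, `weakref` references are another option that avoids extending their lifetime.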
@NaderAlAwar unfortunately, I'm seeing identical memory saturation with that change on its own locally. |
Ah wait, that's for `reduction` instead of `parallel_for`, as @JBludau points out; let me try |
@NaderAlAwar ok, if I apply a similar patch for |
* fix the memory leak with small patch described in kokkosgh-184
@NaderAlAwar Well, the original |
Um... it also segfaults on my machine for sizes above 1e8. Running a debug build, I get mixed results in the debugger: with different numbers of threads it segfaults at different indices in the view. All of them are valid indices, though. |
OK... so I tried with valgrind, and it looks to me like the memory we are using from NumPy via:

```python
arr = rng.random(array_size_1d).astype(float)
view = pk.from_numpy(arr)
arr = view
```

gets freed while we are still doing our work, e.g.:

If I use a view allocated by PyKokkos as

```python
arr = pk.View([array_size_1d], pk.double)
```

Furthermore, if I get rid of the self-assignment on

```python
np_array = rng.random(array_size_1d)  # .astype(float)
view = pk.from_numpy(np_array)
arr = view
```

I guess we have to look at |
I guess the easy workaround until then is to avoid self-assignments on arrays that we grab from NumPy.
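
The valgrind observation is consistent with a view that borrows a NumPy buffer without holding a reference to the array that owns it: once `arr = view` drops the last reference to the array, its buffer is freed while the view still points into it. The standard-library sketch below illustrates that ownership pattern using `array.array` instead of NumPy; `BorrowedView`, `OwningView`, and `source_alive_after_rebinding` are hypothetical and do not describe how `pk.from_numpy` is actually implemented. The code never dereferences the dangling address; it only checks whether the source object is still alive.

```python
import array
import weakref


class BorrowedView:
    """Hypothetical non-owning view: records only the buffer address and length."""

    def __init__(self, owner):
        self.address, self.length = owner.buffer_info()
        # No reference to `owner` is kept, so its buffer can be freed
        # while this object still holds the (then dangling) address.


class OwningView:
    """Hypothetical view that keeps the object owning its memory alive."""

    def __init__(self, owner):
        self.address, self.length = owner.buffer_info()
        self._owner = owner  # same idea as NumPy's `ndarray.base`


def source_alive_after_rebinding(view_cls):
    """Reproduce the `arr = view` self-assignment and report whether the source survives."""
    arr = array.array("d", [0.0] * 1_000)
    source = weakref.ref(arr)
    view = view_cls(arr)
    arr = view  # drops the last strong reference held by this function
    return source() is not None
```

The workaround above works for the same reason `OwningView` does: keeping the NumPy array bound to its own name holds the reference that keeps the buffer alive. NumPy's own views record their source object in the `base` attribute for this purpose.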
This segfaults for me locally with PyKokkos at a `1e6` 1D (host) `array_size_1d` array size, but not with CuPy or NumPy, and not with `1e4` elements with PyKokkos. Clearly, PyKokkos should be able to handle this; it probably makes sense to cut the reproducer/regression test down and investigate (latest `develop` branch):