Thread-safety of device_context
#11
Comments
@PokhodenkoSA this requirement should be considered when we reevaluate the queue manager design. Would making `get_active_queues` re-entrant be enough? Basically, we have to remove the use of the static global stack for the queues. Adding this for gold.
@oleksandr-pavlyk since we are talking about the specs, these questions should be addressed.
Here is an example I used to verify that:

```python
import dpctl
import dpctl.memory as dpmem
import random
import asyncio

async def task1():
    q = dpctl.SyclQueue("opencl:gpu")
    abc = b"abcdefghijklmnopqrstuvwxyz"
    for _ in range(100):
        with dpctl.device_context(q):
            m = dpmem.MemoryUSMShared(len(abc))
            m.copy_from_host(abc)
        await asyncio.sleep(0.1*random.random())

async def task2():
    q = dpctl.SyclQueue("level_zero:gpu")
    for _ in range(100):
        with dpctl.device_context(q):
            m = dpmem.MemoryUSMShared(10)
            m.copy_from_host(b'\x00' * 10)
        await asyncio.sleep(0.1*random.random())

async def main():
    j1 = asyncio.create_task(task1())
    j2 = asyncio.create_task(task2())
    await j1
    await j2
    print("done")

asyncio.run(main())
```

It executes normally, and outputs `done`.
@oleksandr-pavlyk, the relevant test is one that uses `await` inside the `with` block.
Thank you @eric-wieser, having moved the `await` inside the `with` context I still see no issues:

```python
import dpctl
import dpctl.memory as dpmem
import random
import asyncio

async def task1():
    q = dpctl.SyclQueue("opencl:gpu")
    abc = b"abcdefghijklmnopqrstuvwxyz"
    m = dpmem.MemoryUSMShared(len(abc))
    for _ in range(100):
        with dpctl.device_context(q) as lq:
            cd = dpctl.get_current_queue().sycl_device
            assert cd.backend == q.sycl_device.backend
            m.copy_from_host(abc)
            await asyncio.sleep(0.1*random.random())

async def task2():
    q = dpctl.SyclQueue("level_zero:gpu")
    m = dpmem.MemoryUSMShared(10)
    for _ in range(100):
        with dpctl.device_context(q) as lq:
            cd = dpctl.get_current_queue().sycl_device
            assert cd.backend == q.sycl_device.backend
            m.copy_from_host(b'\x00' * 10)
            await asyncio.sleep(0.1*random.random())

async def main():
    j1 = asyncio.create_task(task1())
    j2 = asyncio.create_task(task2())
    await j1
    await j2
    print("done")

asyncio.run(main())
```

The code uses two queues with different backends, and checks that the current queue, which is being reset with the context, has the expected backend.
Yes, that code also is unlikely to detect any problems, because it doesn't actually use the context after resuming from the `await`. Swapping the `for` and `with` statements would expose the problem.

To be clear, #265 already outlines the solution here: the state must be stored in PEP 567's `contextvars`.
Ok, indeed, with the `for` and `with` statements swapped, the following trips:

```python
import dpctl
import dpctl.memory as dpmem
import random
import asyncio

async def task1():
    q = dpctl.SyclQueue("opencl:gpu")
    abc = b"abcdefghijklmnopqrstuvwxyz"
    m = dpmem.MemoryUSMShared(len(abc))
    with dpctl.device_context(q) as lq:
        for _ in range(100):
            cd = dpctl.get_current_queue().sycl_device
            assert cd.backend == q.sycl_device.backend
            m.copy_from_host(abc)
            await asyncio.sleep(0.1*random.random())

async def task2():
    q = dpctl.SyclQueue("level_zero:gpu")
    m = dpmem.MemoryUSMShared(10)
    with dpctl.device_context(q) as lq:
        for _ in range(100):
            cd = dpctl.get_current_queue().sycl_device
            assert cd.backend == q.sycl_device.backend
            m.copy_from_host(b'\x00' * 10)
            await asyncio.sleep(0.1*random.random())

async def main():
    j1 = asyncio.create_task(task1())
    j2 = asyncio.create_task(task2())
    await j1
    await j2
    print("done")

asyncio.run(main())
```
Here is a Cython file I used:

```cython
# filename: stack.pyx
# distutils: language = c++
# cython: language_level=3
import contextvars
from contextlib import contextmanager

from libcpp.vector cimport vector

ctypedef vector.vector[size_t] stack_t

cdef class Stack:
    cdef stack_t stack

    def __cinit__(self, size_t v=0):
        self.stack = stack_t()
        self.stack.push_back(v)

    def push(self, size_t v):
        self.stack.push_back(v)

    def pop(self):
        self.stack.pop_back()

    def top(self):
        return self.stack.at(0)

    def set_global(self, size_t v):
        self.stack[0] = v

    def current(self):
        return self.stack.back()

    def copy(self):
        cdef Stack _copy = Stack.__new__(Stack, self.stack[0])
        for i in range(1, self.stack.size()):
            _copy.stack.push_back(self.stack[i])
        return _copy

_st = Stack()

@contextmanager
def working_stack_context(v):
    tmp = None
    try:
        tmp = _st.copy()
        tmp.push(v)
        yield tmp
    finally:
        if tmp is not None:
            tmp.pop()
        else:
            raise TypeError("Argument {} is not of size_t".format(v))

@contextmanager
def broken_stack_context(v):
    tmp = None
    try:
        tmp = _st  # shares the global stack instead of copying it
        tmp.push(v)
        yield tmp
    finally:
        if tmp is not None:
            tmp.pop()
        else:
            raise TypeError("Argument {} is not of size_t".format(v))
```

Using the following driver script:

```python
# filename: asyncio_run.py
import random
import asyncio

import stack

stack_context = stack.working_stack_context

async def task1():
    v = 11
    with stack_context(v) as s:
        for i in range(100):
            c = s.current()
            assert c == v, "task1 check failed at i={}, (c, v) = ({}, {})".format(i, c, v)
            await asyncio.sleep(0.1*random.random())

async def task2():
    v = 7
    with stack_context(v) as s:
        for j in range(100):
            c = s.current()
            assert c == v, "task2 check failed at j={}, (c,v)=({},{})".format(j, c, v)
            await asyncio.sleep(0.1*random.random())

async def main():
    j1 = asyncio.create_task(task1())
    j2 = asyncio.create_task(task2())
    await j1
    await j2
    print("done")

asyncio.run(main())
```

The code executes just fine. As soon as I replace `working_stack_context` with `broken_stack_context`, the assertions trip. I was never able to make it work with `contextvars`.
Your Cython here is a distraction, I think, and not really relevant to what you're struggling with. This example from the Python docs seems reasonable.
You are right, the use of Cython can be avoided:

```python
# filename: stack2.py
import contextvars
from contextlib import contextmanager

class Stack:
    def __init__(self, v):
        self.stack = [v]

    def push(self, v):
        self.stack.append(v)

    def pop(self):
        self.stack.pop()

    def top(self):
        return self.stack[0]

    def set_global(self, v):
        self.stack[0] = v

    def current(self):
        return self.stack[-1]

    def __len__(self):
        return len(self.stack)

    def copy(self):
        _cpy = Stack(self.stack[0])
        for i in range(1, len(self)):
            _cpy.push(self.stack[i])
        return _cpy

_stack = contextvars.ContextVar('global stack', default=Stack(0))

@contextmanager
def stack_context(v):
    token = None
    try:
        tmp = _stack.get().copy()
        tmp.push(v)
        token = _stack.set(tmp)
        yield tmp
    finally:
        if token is not None:
            _stack.reset(token)
        else:
            raise TypeError("Argument {} cannot be used".format(v))
```

This context manager works as expected with the asyncio driver above. The thing I was missing was the concept of tokens: `ContextVar.set` returns a `Token`, which `ContextVar.reset` uses to restore the previous value.
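The token mechanics can be seen in isolation (pure stdlib, no dpctl): each `set` returns a `Token` remembering the value it displaced, so nested contexts unwind correctly in LIFO order:

```python
import contextvars

var = contextvars.ContextVar("var", default=0)

tok1 = var.set(1)   # Token remembers that the previous value was the default
tok2 = var.set(2)   # nested set
assert var.get() == 2

var.reset(tok2)     # unwind the inner set
assert var.get() == 1

var.reset(tok1)     # unwind the outer set, back to the default
assert var.get() == 0
```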
I think the mistake here is using
Yes, the whole issue arises because we are keeping the state in C rather than in Python. I initially did it this way to make it easy for us to get the current queue by just calling `dpctl.get_current_queue()`.

I support not storing the state globally in the C library. Also, we can use async context managers as syntactic sugar.

@shssf @PokhodenkoSA @reazulhoque I am adding you to the discussion here, as these changes will impact how numba-dppy and dpnp use dpctl.
For reference, there is a C API for contextvars too.
I see no benefit to using an async context manager here. Async context managers are for when your context manager requires access to the event loop to enter and leave the context. dpctl doesn't need access to the event loop, so there's no point in doing this.
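Indeed, an ordinary synchronous context manager is entered and exited without touching the event loop, so it composes with `async` code as-is. A quick stdlib-only illustration (unrelated to dpctl internals):

```python
import asyncio
from contextlib import contextmanager

events = []

@contextmanager
def sync_ctx(name):
    events.append(("enter", name))   # runs synchronously, no awaiting needed
    try:
        yield
    finally:
        events.append(("exit", name))

async def coro():
    with sync_ctx("task"):           # a plain `with`, even inside a coroutine
        await asyncio.sleep(0)

asyncio.run(coro())
assert events == [("enter", "task"), ("exit", "task")]
```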
This function caches queues by (context, device) key. The cache is stored in a `contextvars.ContextVar` variable, applying the lessons learned from issue gh-11.

    get_device_cached_queue(dev: dpctl.SyclDevice) -> dpctl.SyclQueue
    get_device_cached_queue((ctx: dpctl.SyclContext, dev: dpctl.SyclDevice)) -> dpctl.SyclQueue

The function retrieves the queue from the cache, or adds a new queue instance there if previously absent.
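A rough sketch of what such a cache could look like; the signature is quoted from the proposal above, but the body below and the `Queue` stand-in class are illustrative, not dpctl's actual implementation:

```python
import contextvars

# Illustrative stand-in for dpctl.SyclQueue; a real implementation would
# construct a queue from the given SyclContext and SyclDevice.
class Queue:
    def __init__(self, ctx, dev):
        self.ctx, self.dev = ctx, dev

# The cache lives in a ContextVar rather than a module-level global,
# so concurrent tasks work against their own context's cache slot
# (the gh-11 lesson) instead of racing on shared C state.
_cache = contextvars.ContextVar("queue_cache")

def get_device_cached_queue(ctx, dev):
    cache = _cache.get(None)
    if cache is None:
        cache = {}
        _cache.set(cache)          # install a cache in the current context
    key = (ctx, dev)
    if key not in cache:
        cache[key] = Queue(ctx, dev)   # create and cache on first request
    return cache[key]

q1 = get_device_cached_queue("ctx0", "gpu0")
q2 = get_device_cached_queue("ctx0", "gpu0")
assert q1 is q2                    # same (context, device) -> same queue
```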
Just to record the issues raised by @hameerabbasi and me in the numba dev call:

- If two threads use `with device_context` in parallel, do they interfere with each other?
- If two async tasks use `with device_context` in parallel, do they interfere with each other? (xref PEP 567)
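On the question of threads interfering: `contextvars` values are effectively thread-local as well; each thread starts from the variable's default, and a `set` in one thread is invisible to the others. A small stdlib-only check of that behavior (unrelated to dpctl internals):

```python
import contextvars
import threading

active = contextvars.ContextVar("active", default="none")
seen = {}

def worker(name):
    active.set(name)               # set only in this thread's context
    seen[name] = active.get()

t1 = threading.Thread(target=worker, args=("thread-a",))
t2 = threading.Thread(target=worker, args=("thread-b",))
t1.start(); t2.start()
t1.join(); t2.join()

assert seen == {"thread-a": "thread-a", "thread-b": "thread-b"}
assert active.get() == "none"      # the main thread's value is untouched
```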