
Making ocl safer to use #172

Open
aabizri opened this issue Oct 9, 2019 · 8 comments

Comments


aabizri commented Oct 9, 2019

Currently, there is no way to efficiently use ocl without `unsafe` everywhere (not only on kernel execution, but also on buffer creation, memory mapping, and so on), and for good reason: as these operations currently exist, they are fundamentally unsafe.

Knowing that Rust has a strong ownership model as well as robust memory and concurrency safety support, is there a way to encode some of those requirements statically/at compile time using the type system? This would also reduce or eliminate the amount of `unsafe` required to use ocl. We can take inspiration from what glium achieved, even though OpenGL shaders are easier to handle safely.

Does anyone have any ideas on how to achieve, or get closer to, that goal? I'm playing with the idea but not making much progress right now.

Non-exhaustive list of UB:

  • If a kernel reads from a buffer marked CL_MEM_WRITE_ONLY
  • If a kernel writes to a buffer marked CL_MEM_READ_ONLY
  • Buffers created with CL_MEM_USE_HOST_PTR whose backing memory overlaps
  • Concurrently reading from, writing to, and copying between a buffer object and its sub-buffer object(s) (concurrent reads alone are defined)
  • Concurrently reading from, writing to, and copying between sub-buffer objects with overlapping regions (concurrent reads alone are defined)
  • Reading from the host pointer of a buffer created with CL_MEM_USE_HOST_PTR while the buffer is mapped or in use by a command (writing has similar, though not identical, requirements)
  • Reading from a buffer mapped for writing only, or having a kernel or command read/write it, its sub-buffers, or its parent buffer
  • Writing to a buffer mapped for reading only

This also generally applies to images.
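Several of these cases map naturally onto Rust's borrowing rules. For instance, the overlapping CL_MEM_USE_HOST_PTR case could be ruled out entirely if buffer creation took an exclusive borrow of the host slice. A minimal sketch of that idea (`HostPtrBuffer` and `use_host_ptr` are hypothetical names, not ocl's current API):

```rust
// Hypothetical sketch: if buffer creation with CL_MEM_USE_HOST_PTR took
// an exclusive borrow of the host slice, two buffers could never share
// overlapping backing memory -- the borrow checker forbids it.
struct HostPtrBuffer<'a> {
    backing: &'a mut [u8],
}

impl<'a> HostPtrBuffer<'a> {
    // stands in for clCreateBuffer(..., CL_MEM_USE_HOST_PTR, ...)
    fn use_host_ptr(backing: &'a mut [u8]) -> Self {
        HostPtrBuffer { backing }
    }
}

fn main() {
    let mut memory = [0u8; 8];
    let (lo, hi) = memory.split_at_mut(4);
    let a = HostPtrBuffer::use_host_ptr(lo);
    let b = HostPtrBuffer::use_host_ptr(hi); // disjoint halves: accepted
    // let c = HostPtrBuffer::use_host_ptr(&mut memory); // overlaps a and b: rejected by borrowck
    assert_eq!(a.backing.len() + b.backing.len(), 8);
}
```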


dmarcuse commented Oct 9, 2019

I've thought of this a bit as well. I think at least for memory objects, this is actually not too difficult to implement using lifetimes and generics:

Proof of concept
use std::marker::PhantomData;

pub trait HostAccess {}

pub trait HostReadable: HostAccess {}
pub trait HostWritable: HostAccess {}

pub struct HostNoAccess;
pub struct HostReadOnly;
pub struct HostWriteOnly;
pub struct HostReadWrite;

impl HostAccess for HostNoAccess {}
impl HostAccess for HostReadOnly {}
impl HostReadable for HostReadOnly {}
impl HostAccess for HostWriteOnly {}
impl HostWritable for HostWriteOnly {}
impl HostAccess for HostReadWrite {}
impl HostReadable for HostReadWrite {}
impl HostWritable for HostReadWrite {}

pub struct Buffer<'a, H: HostAccess> {
    _lifetime: PhantomData<&'a ()>,
    _host_access: PhantomData<H>,
}

impl<'a, H: HostAccess> Buffer<'a, H> {
    fn create() -> Self {
        Self {
            _lifetime: PhantomData,
            _host_access: PhantomData,
        }
    }

    pub fn use_host_slice(_slice: &'a mut [u8]) -> Self {
        Self::create()
    }
}

impl<H: HostAccess> Buffer<'static, H> {
    pub fn copy_host_slice(_slice: &[u8]) -> Self {
        Self::create()
    }
}

impl<'a, H: HostReadable> Buffer<'a, H> {
    pub fn read(&self, _into: &mut [u8]) {}
}

impl<'a, H: HostWritable> Buffer<'a, H> {
    pub fn write(&mut self, _from: &[u8]) {}
}

fn main() {
    let mut memory = [1, 2, 3];

    // create a buffer backed by memory, with no read/write access
    let buffer = Buffer::<HostNoAccess>::use_host_slice(&mut memory);

    // uncommenting either of these lines will fail to compile
    // println!("memory: {:?}", &memory);
    // buffer.write(&[]);

    // drop the buffer, which then allows us to access the memory again...
    drop(buffer);

    println!("memory: {:?}", &memory);

    // create a buffer with read-only access, but with an initial copy
    let buffer = Buffer::<HostReadOnly>::copy_host_slice(&memory);

    // we can read the buffer, and still have mutable access to memory...
    buffer.read(&mut memory);

    // ...but we can't write it, so uncommenting this line would error
    // buffer.write(&[]);

    // drop the buffer - this is just to force the buffer to not be dropped early by NLL
    drop(buffer);
}

I haven't yet come up with an idea I like for kernel invocations/asynchronous operations, but I think these could still be implemented using wrapper objects and methods to "borrow back" mutable access to a buffer referenced by a kernel.
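The "borrow back" idea above could be sketched like this (all types here are toy stand-ins, not a proposed API): the kernel holds the buffer exclusively, and a method hands mutable access back to the host between launches.

```rust
// Toy sketch of "borrowing back" a buffer bound to a kernel: while the
// kernel holds the buffer, the host cannot touch it directly; the
// borrow_back method temporarily reclaims host access.
struct Buffer<T> {
    data: Vec<T>,
}

struct Kernel<'buf, T> {
    buffer: &'buf mut Buffer<T>,
}

impl<'buf, T> Kernel<'buf, T> {
    fn enqueue(&mut self) {
        // a real implementation would launch the kernel here
    }

    // reclaim host access; the kernel is unusable while the borrow lives
    fn borrow_back(&mut self) -> &mut Buffer<T> {
        &mut *self.buffer
    }
}

fn main() {
    let mut buffer = Buffer { data: vec![0u32; 4] };
    let mut kernel = Kernel { buffer: &mut buffer };
    kernel.enqueue();
    kernel.borrow_back().data[0] = 7; // host access between launches
    kernel.enqueue();
    assert_eq!(kernel.borrow_back().data[0], 7);
}
```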

In general, these are changes that I'd like to see, but would basically require rewriting the entire project, and breaking API compatibility. I'd be interested in working on this and some other fundamental issues (like #107) in a new experimental project.


aabizri commented Oct 22, 2019

Sorry for not getting back to you sooner,

Congrats on the idea! I played around with the sample code a bit and it seems like a good skeleton for the solution.

I agree that it would require either a very large breaking rewrite or a separate experiment. If you go ahead with the latter, I'd be interested in contributing.

I've also been experimenting locally on some other parts, including type-checking kernel invocations with macros and stub kernel functions, and I dabbled a little bit in the asynchronous kernel realm, as written below.

For kernel invocation

This indeed seems like a hard problem. I've been experimenting with procedural macros, which allow us to type-check kernel invocations when the kernel's parameters are known at compile time, which is most of the time (though not when using #define for kernel parameters).

struct KernelDefinition<'prog, 'name> {
    program: &'prog Program,
    name: &'name str,
}

fn main() {
    let my_kernel_def = KernelDefinition { /* ... */ };
    let my_kernel = instantiate_kernel!(my_kernel_def, mut buf<u32>, buf<f64>, scalar<u64>);
}

The procedural macro would generate something like:

fn main() {
    let my_kernel_def = KernelDefinition { /* ... */ };
    let my_kernel = |_a: &mut Buffer<u32>, _b: &Buffer<f64>, _x: &u64| -> Kernel {
        Kernel::builder()./* ... */.build()
    };
}

This could normally be combined with the code you posted, adding KernelWritable and KernelReadable traits for safety.

This also makes it possible in the future to have single-source type-checked GPGPU code using procedural macros, for example with rlsl.

For asynchronous kernels

I have no code to present here, but I think we should start by treating all queues as out-of-order, just as is done in SYCL. Instead of requirements we should use combinators (and_then(Kernel) -> KernelHandler and KernelHandler::join(KernelHandler)) with some special magic to correctly manage the event wait list & event creation.
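The and_then/join combinator idea could be modelled on the host side as handles carrying explicit event wait lists. A toy sketch under that assumption (Queue, KernelHandle, and the event numbering are all illustrative; real OpenCL events would replace the u32 ids):

```rust
// Toy model of an out-of-order queue with dependency combinators: each
// launched kernel gets an event id, and combinators build the wait list
// that a real enqueue call would pass to the OpenCL runtime.
struct KernelHandle {
    wait_list: Vec<u32>,
    event: u32,
}

struct Queue {
    next_event: u32,
}

impl Queue {
    fn launch(&mut self, wait_list: Vec<u32>) -> KernelHandle {
        let event = self.next_event;
        self.next_event += 1;
        KernelHandle { wait_list, event }
    }
}

impl KernelHandle {
    // run the next kernel only after `self` completes
    fn and_then(&self, queue: &mut Queue) -> KernelHandle {
        queue.launch(vec![self.event])
    }

    // run the next kernel only after both handles complete
    fn join(&self, other: &KernelHandle, queue: &mut Queue) -> KernelHandle {
        queue.launch(vec![self.event, other.event])
    }
}

fn main() {
    let mut queue = Queue { next_event: 0 };
    let a = queue.launch(vec![]);
    let b = queue.launch(vec![]);
    let c = a.and_then(&mut queue);  // c waits on a
    let d = c.join(&b, &mut queue);  // d waits on c and b
    assert_eq!(d.wait_list, vec![c.event, b.event]);
}
```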

For mapping/unmapping

We could do something like the following (see playground):

use std::ops::Deref;

pub struct ReadMappedBuffer<'a, H: HostReadable> {
    _buf: &'a Buffer<'a, H>,
}

impl<'a, H: HostReadable> Buffer<'a, H> {
    pub fn map_readonly(&'a self) -> ReadMappedBuffer<'a, H> {
        ReadMappedBuffer::<'a, H> { _buf: self }
    }
}

const FAKE_ARRAY: [u8; 3] = [1, 3, 5];

impl<'a, H: HostReadable> Deref for ReadMappedBuffer<'a, H> {
    type Target = [u8];

    fn deref(&self) -> &Self::Target {
        &FAKE_ARRAY
    }
}

fn main() {
    let mut memory = [1, 2, 3];

    // create a buffer backed by memory, with read-only host access
    let buffer = Buffer::<HostReadOnly>::use_host_slice(&mut memory);

    // uncommenting either of these lines will fail to compile
    // println!("memory: {:?}", &memory);
    // buffer.write(&[]);

    // but when mapped, this works
    let mapped = buffer.map_readonly();
    println!("memory: {:?}", mapped.deref());
}
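The write-only direction could follow the same pattern with DerefMut. A standalone sketch with toy types (WriteMappedBuffer and the plain-slice region are illustrative, not a real mapping):

```rust
use std::ops::{Deref, DerefMut};

// Hypothetical counterpart to ReadMappedBuffer: a mapping intended for
// writing, modelled here with a plain slice as the mapped region.
struct WriteMappedBuffer<'a> {
    region: &'a mut [u8],
}

impl<'a> Deref for WriteMappedBuffer<'a> {
    type Target = [u8];
    fn deref(&self) -> &[u8] {
        &*self.region
    }
}

impl<'a> DerefMut for WriteMappedBuffer<'a> {
    fn deref_mut(&mut self) -> &mut [u8] {
        &mut *self.region
    }
}

fn main() {
    let mut backing = [0u8; 3];
    let mut mapped = WriteMappedBuffer { region: &mut backing };
    mapped[0] = 42; // writes go through DerefMut
    drop(mapped);   // the unmap command would be enqueued here
    assert_eq!(backing[0], 42);
}
```

One caveat: DerefMut requires a Deref implementation, so this cannot express a strictly write-only mapping; a stricter design would expose explicit write methods instead of Deref.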

PS: you can drop me an e-mail at the address on my profile page

@calebwin

Even if you can address type safety, what about data races?
Data races are unsafe in Rust. How would you eliminate that unsafety?

I've been trying to figure out type safety for my Emu project. RLSL guarantees type safety within the kernel but I'm not sure about the CPU-GPU interface. Some of my thoughts on this are here.

Some things that I think are needed for safety:

  • macros/compile-time generation of code that rustc can strategically reject
  • generated Rust code that rustc can reject (a KernelDefinition/DSL is not enough, because you end up with stringly-typed variables). Emu, for example, generates a dead-code closure containing checked Rust code.
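The dead-code-closure trick mentioned above can be illustrated in a few lines (this is a hand-written sketch of the idea, not Emu's actual generated output): a macro emits the kernel body as a closure that is never called, so rustc type-checks the names and types even though nothing runs.

```rust
// Sketch of the "dead code closure" idea: the closure below is never
// invoked; it exists only so that a typo in an identifier or a wrong
// element type fails to compile on the host side.
fn main() {
    let _typecheck_scale = |data: &mut [f32], scale: f32| {
        for v in data.iter_mut() {
            *v *= scale;
        }
    };
}
```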


calebwin commented Oct 26, 2019

I might take back the two things I thought were needed for safety. Glium is really, really cool. I honestly can't believe I haven't seen it before. I'll have to look into it for sure.

Edit: I thought they generate GLSL from functions/DSL/AST. Still cool though. Rasen is interesting too.

@calebwin

By the way, I started experimenting with safety for kernels and programs (not buffers) here. This will probably go through multiple makeovers before arriving at the right design for a safe abstraction.

Admittedly, I haven't read through your comments here fully, @aabizri @dmarcuse. Apologies for jumping into the conversation.


aabizri commented Oct 26, 2019

Nice to see more and more interest, @calebwin!

Regarding your experiments on safe abstractions for kernel-based map operations on buffers, I've also been working on such a thing locally for one of my projects, and if I have time I'll put it on a gist this weekend.

My implementation is based on generalized functional kernels, with the goal of supporting the basic functional operations of map, fold (and thus reduce), and scan (and thus sum). Currently, only scan/sum is fully implemented.

However, these convenience methods should build on the broader work of encoding GPGPU semantics in Rust: for a more general Buffer::map method, we need to be able to provide a trait encoding kernel orchestration(1), as well as ergonomic and safe kernel & buffer async semantics.

(1): as some algorithms may require workgroup synchronization or multiple kernel passes, map can't just take a Kernel, but rather some object that orchestrates kernel launches.

That leads me to what I see as a to-do list of sorts:

A humble todo list for safe & convenient GPGPU on Rust

Foundational work (most important)

  • Safe & convenient Buffer creation, access, transmuting, mapping, sub-buffering, and other operations (@dmarcuse's ideas seem to be a good foundation for this)
  • Safe & convenient Kernel calling (not creation) in Rust, including data-race checking (no conflicting concurrent access to a buffer) and type-checking (correct buffer type & capabilities: read/write)
  • Safe & convenient Kernel creation: the trait system & macros I proposed earlier in the thread may very well work, but maybe there are better ideas out there
  • async-first: not only is GPGPU deeply asynchronous, but making everything async forces library authors to think about all the concurrency-related bugs that can arise!

Convenience operations (later, built on the former)

  • Functional operations like map, fold, reduce, scan, etc.
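The orchestration point above can be sketched as a trait that Buffer::map accepts instead of a bare Kernel, so that multi-pass algorithms fit the same interface. A toy sketch (Orchestrator, ScaleBy, and the CPU-side run are all illustrative; a real implementation would launch kernels):

```rust
// Sketch of a `map` built on an orchestrator trait rather than a single
// Kernel, so algorithms needing several passes or workgroup sync fit
// the same interface.
trait Orchestrator<T> {
    // may enqueue one kernel or several synchronized passes
    fn run(&self, input: &[T]) -> Vec<T>;
}

struct ScaleBy(f32);

impl Orchestrator<f32> for ScaleBy {
    fn run(&self, input: &[f32]) -> Vec<f32> {
        // modelled on the CPU here; a real impl would launch a kernel
        input.iter().map(|x| x * self.0).collect()
    }
}

struct Buffer<T> {
    data: Vec<T>,
}

impl<T> Buffer<T> {
    fn map(&self, orch: &impl Orchestrator<T>) -> Buffer<T> {
        Buffer { data: orch.run(&self.data) }
    }
}

fn main() {
    let buf = Buffer { data: vec![1.0f32, 2.0, 3.0] };
    let scaled = buf.map(&ScaleBy(2.0));
    assert_eq!(scaled.data, vec![2.0, 4.0, 6.0]);
}
```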

PS: Rasen seems very interesting, thanks for the link


dmarcuse commented Jan 5, 2020

For those interested, I've been working on a new library, dynamic_ocl, which implements some of the ideas discussed here. Of particular note, Kernel objects have a type parameter representing their arguments, to catch cases where the wrong argument index is used, and Buffer objects use lifetime and type parameters to prevent issues like unsafe use of USE_HOST_PTR or incorrect host accessibility (e.g. attempting to read from a HOST_WRITE_ONLY or HOST_NO_ACCESS buffer). A very early alpha release is available on crates.io.
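The typed-arguments idea can be illustrated with a tuple type parameter (this is a generic typestate sketch, not dynamic_ocl's actual API; Kernel, set_arg0, and set_arg1 are hypothetical names):

```rust
use std::marker::PhantomData;

// Illustrative typestate for kernel arguments: the tuple parameter
// fixes each argument's index and type, so passing the wrong type to
// the wrong index fails to compile.
struct Kernel<Args> {
    _args: PhantomData<Args>,
}

impl<A, B> Kernel<(A, B)> {
    fn new() -> Self {
        Kernel { _args: PhantomData }
    }
    fn set_arg0(&mut self, _value: A) {}
    fn set_arg1(&mut self, _value: B) {}
}

fn main() {
    let mut k: Kernel<(u32, f32)> = Kernel::new();
    k.set_arg0(1u32);
    k.set_arg1(2.0f32);
    // k.set_arg0(2.0f32); // wrong type for argument 0: rejected
}
```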


aabizri commented Jan 5, 2020 via email
