Skip to content
This repository has been archived by the owner on Mar 21, 2024. It is now read-only.

Add extended API documentation #53

Merged
merged 1 commit into from
Nov 17, 2020
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion docs/contributing.md
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
---
has_children: true
has_toc: true
nav_order: 4
nav_order: 5
---

# Contributing
Expand Down
29 changes: 29 additions & 0 deletions docs/extended_api.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,29 @@
---
has_children: true
has_toc: false
nav_order: 3
---

# Extended API

## [Headers](./extended_api/headers.md)

### [\<cuda/pipeline>](./extended_api/headers/pipeline.md)

## [Concepts](./extended_api/concepts.md)

### [Group](./extended_api/concepts/group.md)

### [Shape](./extended_api/concepts/shape.md)

## [Synchronization library](./extended_api/synchronization_library.md)

### [pipeline](./extended_api/synchronization_library/pipeline.md)

### [pipeline_shared_state](./extended_api/synchronization_library/pipeline_shared_state.md)

## [Asynchronous operations library](./extended_api/asynchronous_operations_library.md)

### [aligned_size_t](./extended_api/asynchronous_operations_library/aligned_size_t.md)

### [memcpy_async](./extended_api/asynchronous_operations_library/memcpy_async.md)
17 changes: 17 additions & 0 deletions docs/extended_api/asynchronous_operations_library.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,17 @@
---
parent: Extended API
has_children: true
has_toc: false
---

# Asynchronous operations library

The asynchronous operations library provides components for asynchronous data movement.

## Shapes

| [aligned_size_t](./asynchronous_operations_library/aligned_size_t.md) | defines an extent of bytes with a statically defined alignment `(class template)` |

## Asynchronous operations

| [memcpy_async](./asynchronous_operations_library/memcpy_async.md) | asynchronously copies one buffer to another `(function template)` |
Original file line number Diff line number Diff line change
@@ -0,0 +1,61 @@
---
grand_parent: Extended API
parent: Asynchronous operations library
---

# cuda::**aligned_size_t**

Defined in header [`<cuda/barrier>`](../headers/barrier.md)

Defined in header [`<cuda/pipeline>`](../headers/pipeline.md)

```c++
template<size_t Alignment>
struct aligned_size_t;
```

The class template `cuda::aligned_size_t` is a _shape_ representing an extent of bytes with a statically defined (address and size) alignment.

## Template parameters

| Alignment | the address & size alignement of the byte extent |

## Data members

| [align](./aligned_size_t/align.md) | the alignment of the byte extent |
| [value](./aligned_size_t/value.md) | the size of the byte extent |

## Member functions

| [(constructor)](./aligned_size_t/constructor.md) | constructs an _aligned size_ |
| (destructor) [implicitly declared] | trivial implicit destructor |
| operator= [implicitly declared] | trivial implicit copy/move assignment |
| operator size_t | implicit conversion to [`size_t`](https://en.cppreference.com/w/cpp/types/size_t) |

## Notes

If `value` is not a multiple of `align` the behavior is undefined.
c0riolis marked this conversation as resolved.
Show resolved Hide resolved

If `Alignment` is not a [valid alignment](https://en.cppreference.com/w/c/language/object#Alignment) the behavior is undefined.

c0riolis marked this conversation as resolved.
Show resolved Hide resolved
## Example

```c++
#include <cuda/barrier>

__global__ void example_kernel(void * dst, void * src, size_t size)
{
cuda::barrier<cuda::thread_scope_system> barrier;
init(&barrier, 1);

// Implementation cannot make assumptions about alignment
cuda::memcpy_async(dst, src, size, barrier);

// Implementation can assume that dst, src and size are 16-bytes aligned and may optimize accordingly
cuda::memcpy_async(dst, src, cuda::aligned_size_t<16>(size), barrier);

barrier.arrive_and_wait();
}
```
[See it on Godbolt](https://godbolt.org/z/v7Ev9E){: .btn }
Original file line number Diff line number Diff line change
@@ -0,0 +1,11 @@
---
nav_exclude: true
---

# cuda::aligned_size_t\<Alignment>::**align**

```c++
static constexpr size_t align = Alignment;
```

Represents the alignment (address and size) of the byte extent.
Original file line number Diff line number Diff line change
@@ -0,0 +1,15 @@
---
nav_exclude: true
---

# cuda::aligned_size_t\<Alignment>::**aligned_size_t**

```c++
explicit aligned_size_t(size_t size);
```
Constructs an `aligned_size_t` _shape_.
## Notes
If `size` is not a multiple of `Alignment` the behavior is undefined.
Original file line number Diff line number Diff line change
@@ -0,0 +1,11 @@
---
nav_exclude: true
---

# cuda::aligned_size_t\<Alignment>::**value**

```c++
size_t value;
```

Represents the size of the byte extent.
64 changes: 64 additions & 0 deletions docs/extended_api/asynchronous_operations_library/memcpy_async.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,64 @@
---
grand_parent: Extended API
parent: Asynchronous operations library
---

# cuda::**memcpy_async**

Defined in header [`<cuda/barrier>`](../../api/synchronization_library/barrier.md)

```c++
template<typename Shape, thread_scope Scope>
void memcpy_async(void * destination, void const * source, Shape size, barrier<Scope> & barrier); // (1)

template<typename Group, typename Shape, thread_scope Scope>
void memcpy_async(Group const & group, void * destination, void const * source, Shape size, barrier<Scope> & barrier); // (2)
```

Defined in header [`<cuda/pipeline>`](../headers/pipeline.md)

```c++
template<typename Shape, thread_scope Scope>
void memcpy_async(void * destination, void const * source, Shape size, pipeline<Scope> & pipeline); // (3)

template<typename Group, typename Shape, thread_scope Scope>
void memcpy_async(Group const & group, void * destination, void const * source, Shape size, pipeline<Scope> & pipeline); // (4)
```

Asynchronously copies `size` bytes from the memory location pointed to by `source` to the memory location pointed to by `destination`.
c0riolis marked this conversation as resolved.
Show resolved Hide resolved
Both objects are reinterpreted as arrays of `unsigned char`.

`cuda::memcpy_async` have similar constraints to [`std::memcpy`](https://en.cppreference.com/w/cpp/string/byte/memcpy), namely:
* If the objects overlap, the behavior is undefined.
* If either `destination` or `source` is an invalid or null pointer, the behavior is undefined (even if `count` is zero).
* If the objects are [potentially-overlapping](https://en.cppreference.com/w/cpp/language/object#Subobjects) the behavior is undefined.
* If the objects are not of [`TriviallyCopyable`](https://en.cppreference.com/w/cpp/named_req/TriviallyCopyable) type the program is ill-formed, no diagnostic required.

If _Shape_ is [`cuda::aligned_size_t`](./aligned_size_t.md)), `source` and `destination` are both required to be aligned on [`cuda::aligned_size_t::align`](./aligned_size_t/align.md), else the behavior is undefined.

If `pipeline` is in a _quitted state_ (see [`pipeline::quit`](../synchronization_library/pipeline/quit.md)), the behavior is undefined.

1. Binds the asynchronous copy completion to `barrier` and issues the copy in the current thread.
2. Binds the asynchronous copy completion to `barrier` and cooperatively issues the copy across all threads in `group`.
3. Binds the asynchronous copy completion to `pipeline` and issues the copy in the current thread
4. Binds the asynchronous copy completion to `pipeline` and cooperatively issues the copy across all threads in `group`.

## Template parameters

| Group | a type satisfying the [_group concept_](../concepts/group.md) |
| Shape | a type satisfying the [_shape concept_](../concepts/shape.md) (see [`size_t`](https://en.cppreference.com/w/c/types/size_t) and [`cuda::aligned_size_t`](./aligned_size_t.md)) |

## Parameters

| group | the group of threads |
| destination | pointer to the memory location to copy to |
| source | pointer to the memory location to copy from |
| size | the number of bytes to copy |
| barrier | the barrier object used to wait on the copy completion |
| pipeline | the pipeline object used to wait on the copy completion |

c0riolis marked this conversation as resolved.
Show resolved Hide resolved
## Example

```c++
TODO
c0riolis marked this conversation as resolved.
Show resolved Hide resolved
```
11 changes: 11 additions & 0 deletions docs/extended_api/concepts.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,11 @@
---
parent: Extended API
has_children: true
has_toc: false
nav_order: 1
---

# Concepts

| [Group](./concepts/group.md) | defines the requirements of a type that represents a group of cooperating threads |
| [Shape](./concepts/shape.md) | defines the requirements of a type that represents a byte extent with a particular memory layout |
46 changes: 46 additions & 0 deletions docs/extended_api/concepts/group.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,46 @@
---
grand_parent: Extended API
parent: Concepts
---

# Group

```c++
struct Group {
static constexpr cuda::thread_scope thread_scope;
integral size() const;
integral thread_rank() const;
void sync() const;
};
```

The _Group concept_ defines the requirements of a type that represents a group of cooperating threads.

## Data members

| thread_scope | the scope at which `Group::sync()` synchronizes memory operations and thread execution |

## Member functions

| size | returns the number of participating threads |
| thread_rank | returns a unique value for each participating thread (`0 <= Group::thread_rank() < Group::size()`) |
| sync | synchronizes the participating threads |

## Notes

This concept is defined for documentation purposes but is not materialized in the library.

## Example

```c++
#include <cuda/atomic>

struct single_thread_group {
static constexpr cuda::thread_scope thread_scope = cuda::thread_scope::thread_scope_thread;
size_t size() const { return 1; }
size_t thread_rank() const { return 0; }
void sync() const { }
};
```

[See it on Godbolt](https://godbolt.org/z/453r3s){: .btn }
37 changes: 37 additions & 0 deletions docs/extended_api/concepts/shape.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,37 @@
---
grand_parent: Extended API
parent: Concepts
---

# Shape

```c++
struct Shape {
operator size_t() const;
};
```
The _Shape concept_ defines the requirements of a type that represents a byte extent with a particular memory layout.
## Member functions
| operator size_t | implicit conversion to [`size_t`](https://en.cppreference.com/w/cpp/types/size_t) |
## Notes
This concept is defined for documentation purposes but is not materialized in the library.
## Example
```c++
// A size that carries an alignment hint
template <size_t Align>
struct aligned_size {
static constexpr size_t align = Align;
size_t size;
aligned_size(size_t s) : size(s) {}
operator size_t() const { return size; }
};
```

[See it on Godbolt](https://godbolt.org/z/hbajKo){: .btn }
12 changes: 12 additions & 0 deletions docs/extended_api/headers.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,12 @@
---
parent: Extended API
has_children: true
has_toc: false
nav_order: 0
---

# Headers

## Synchronization library

| [\<pipeline\>](./headers/pipeline.md) | [Pipelines](./synchronization_library/pipeline.md) and corresponding [memcpy_async](./asynchronous_operations_library/memcpy_async.md) overloads |
Loading