diff --git a/ext/cl_exp_tensor.asciidoc b/ext/cl_exp_tensor.asciidoc
new file mode 100644
index 000000000..5f8ac60b3
--- /dev/null
+++ b/ext/cl_exp_tensor.asciidoc
@@ -0,0 +1,811 @@
// Copyright 2023 The Khronos Group. This work is licensed under a
// Creative Commons Attribution 4.0 International License; see
// http://creativecommons.org/licenses/by/4.0/

= cl_exp_tensor

:source-highlighter: coderay

[[cl_exp_tensor]]
== Tensor Data Type

This extension provides a new opaque OpenCL datatype called
`cl_tensor`. It is used for storing N-dimensional tensor data in an
implementation-defined memory layout which may be optimized based on
the tensor's use cases. The datatype is designed to be used
efficiently with the `cl_khr_command_buffer` extension to capture
task graphs which can use tensors as input, output and temporary
storage.

=== General information

==== Name Strings

`cl_exp_tensor`

==== Version history

[cols="1,1,3",options="header",]
|====
| *Date* | *Version* | *Description*
| 2023-11-XX | 0.1.0 | First assigned version.
|====

==== Dependencies

This extension is written against the OpenCL Specification version 3.0.14.

This extension requires OpenCL 1.2 or later.

==== Contributors

Henry Linjamäki, Intel.

Pekka Jääskeläinen, Intel and Tampere University.

Ben Ashbaugh, Intel.


=== Overview

The new tensor object enables applications to describe N-dimensional
arrays whose memory layout is opaque to applications. The goals
of this extension are the following:

* Give implementations the freedom to place the tensors' data so as
  to improve the performance of the kernels which use them. The
  extension is designed so that implementations can determine optimal
  memory layouts for the tensors based on their use cases, for
  example by analyzing kernels' access patterns or, in the case of
  built-in kernels, by inspecting the tensor arguments they operate
  on.

* Reduce the details and boilerplate needed for performance-portable
  applications by making them less dependent on platform or device
  specifics of the memory layouts / data arrangements which matter
  for performance. Such specifics may include:

** alignment of data (e.g. for avoiding misaligned memory accesses)

** arrangement of data required by kernels (column-major vs row-major
   for matrix multiplication, NHWC vs NCHW for neural network
   convolution)

** arrangement of the data into tiles (or "packing") for improving
   cache and TLB hit rates

** arrangement of data into specific tiles in order to exploit complex
   HW operations such as matrix multiplications (Intel AMX, AMD matrix
   cores)

** arrangement of data into rows separated by a stride in order to
   avoid bank conflicts in GPUs.

The tensor data type is designed to be used efficiently together with
command buffers (cl_khr_command_buffer) and built-in kernels,
including kernels to be provided by the Defined Built-in Kernels
(cl_khr_defined_builtin_kernels) extension that is being prepared
together with this extension.

=== Modifications to OpenCL

==== New Section: 5.x Tensor Objects

A tensor object stores an N-dimensional array of elements. The memory
layout of the tensor is opaque to the application. When a tensor
object is created, it is initially not associated with any storage for
the tensor elements. Storage is bound to a tensor by creating a memory
buffer with CL_MEM_BIND_TO_TENSOR. Tensor objects without storage can
be set as kernel arguments for kernels which accept them. Tensors
passed as kernel arguments must have storage assigned to them prior to
enqueuing the kernels for execution.
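As a non-normative illustration, the following sketch shows the
intended flow using the API introduced in the rest of this section.
The `ctx` context and the `kernel` taking a tensor parameter are
assumed to exist, and the encoding of the tensor handle as a buffer
property value is an assumption of this sketch:

[source,c]
----
// Non-normative sketch: create a 2-D tensor, bind storage to it, and
// pass it to a kernel that declares a tensor parameter.
cl_int err;
size_t shape[2] = {128, 256};
cl_tensor t = clCreateTensor(ctx, NULL, 2, shape, CL_TENSOR_FLOAT, &err);

// Bind storage. The size argument must be zero for CL_MEM_BIND_TO_TENSOR;
// the implementation chooses the actual allocation size and layout.
cl_mem_properties props[] = {
  CL_MEM_BIND_TO_TENSOR, (cl_mem_properties)(uintptr_t)t, 0};
cl_mem t_mem = clCreateBufferWithProperties(
  ctx, props, CL_MEM_READ_WRITE, 0, NULL, &err);

// The tensor handle itself is passed as the kernel argument.
clSetKernelArg(kernel, 0, sizeof(cl_tensor), &t);
----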
==== New OpenCL Functions added to Tensor Objects section

To create a tensor, use:

[source,c]
----
cl_tensor clCreateTensor(
    cl_context context,
    const cl_tensor_properties *properties,
    size_t rank,
    const size_t* shape,
    cl_tensor_datatype dtype,
    cl_int *errcode_ret);
----

* _context_ is a valid OpenCL context used to create the tensor object.

* _properties_ is an optional list of properties for the tensor object
  and their corresponding values. The list is terminated with the
  special property 0. If no properties are required, _properties_ may
  be NULL. This extension does not define any optional properties for
  tensors.

* _rank_ is the number of dimensions. A zero value creates a "scalar"
  tensor which has no dimensions but has storage for one element.

* _shape_ is a list of the sizes of the dimensions. The length of the
  list must be _rank_ elements. _shape_ can be NULL if the _rank_
  value is zero. The first _rank_ values in the list must all be
  non-zero.

* _dtype_ is the element type of the tensor. Refer to the
  <<TensorDtypes>> table for the supported types.

* _errcode_ret_ may return an appropriate error code. If _errcode_ret_
  is NULL, no error code is returned.

The *clCreateTensor* function creates a _rank_-dimensional tensor with
`shape[0] * shape[1] * ... * shape[rank-1]` elements of _dtype_
type. At creation time the tensor does not have storage. Storage is
assigned to the tensor by calling clCreateBufferWithProperties() with
CL_MEM_BIND_TO_TENSOR.

A tensor referred to by a command must be bound to a valid buffer
object before the command is enqueued or recorded.

*clCreateTensor* returns a valid non-zero tensor object and
_errcode_ret_ is set to CL_SUCCESS if the tensor object is created
successfully. Otherwise, it returns a NULL value with one of the
following error values returned in _errcode_ret_:

* CL_INVALID_CONTEXT if _context_ is not a valid context.

* CL_INVALID_PROPERTY if a property name in _properties_ is not a
  supported property name, if the value specified for a supported
  property name is not valid, or if the same property name is
  specified more than once.

* CL_INVALID_VALUE if the value specified in _dtype_ is invalid.

* CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources
  required by the OpenCL implementation on the host.

.Tensor element types. The API type indicates the corresponding type for copying elements from a host allocation / buffer object to a tensor or vice versa.
[cols="1,1,1",stripes=even]
[#TensorDtypes]
|===
| *Tensor element data type* | *Description* | *API type*

| CL_TENSOR_BOOL | 1-bit signless integer. |
cl_uchar. footnote:[Only the least significant bit is considered when
writing data to a tensor. When reading data from a tensor, the boolean
value is written as 0 or 1. The boolean values in the tensor may be
packed densely.]
| CL_TENSOR_INT8 | 8-bit signed integer. | cl_char.
| CL_TENSOR_INT16 | 16-bit signed integer. | cl_short.
| CL_TENSOR_INT32 | 32-bit signed integer. | cl_int.
| CL_TENSOR_INT64 | 64-bit signed integer. | cl_long.
| CL_TENSOR_UINT8 | 8-bit unsigned integer. | cl_uchar.
| CL_TENSOR_UINT16 | 16-bit unsigned integer. | cl_ushort.
| CL_TENSOR_UINT32 | 32-bit unsigned integer. | cl_uint.
| CL_TENSOR_UINT64 | 64-bit unsigned integer. | cl_ulong.
| CL_TENSOR_HALF | Half precision floating-point. | cl_half.
| CL_TENSOR_BFLOAT16 | 16-bit brain floating-point. | cl_ushort.
| CL_TENSOR_FLOAT | Single precision floating-point. | cl_float.
| CL_TENSOR_DOUBLE | Double precision floating-point. | cl_double.
| CL_TENSOR_COMPLEX64 | 64-bit complex floating-point with
  32-bit real and imaginary parts. | cl_float2.
| CL_TENSOR_COMPLEX128 | 128-bit complex floating-point with
  64-bit real and imaginary parts. | cl_double2.
|===

To retain a tensor object, call the function

[source,c]
----
cl_int clRetainTensorObject(cl_tensor tensor);
----

* _tensor_ is the tensor object to be retained.

The _tensor_ reference count is incremented.

*clRetainTensorObject* returns CL_SUCCESS if the function is executed
successfully. Otherwise, it returns one of the following errors:

* CL_INVALID_TENSOR if _tensor_ is not a valid tensor object.

To release a tensor object, call the function

[source,c]
----
cl_int clReleaseTensorObject(cl_tensor tensor);
----

* _tensor_ is the tensor object to be released.

The _tensor_ reference count is decremented.

The tensor object is deleted once the number of instances that are
retained to _tensor_ becomes zero and the tensor object is no longer
needed by any enqueued or recorded commands that use _tensor_. Using
this function to release a reference that was not obtained by creating
the object or by calling *clRetainTensorObject* causes undefined
behavior.

*clReleaseTensorObject* returns CL_SUCCESS if the function is executed
successfully. Otherwise, it returns one of the following errors:

* CL_INVALID_TENSOR if _tensor_ is not a valid tensor object.

// TODO: add clSetTensorObjectDestructorCallback?

To return information about a tensor object, call the function

[source,c]
----
cl_int clGetTensorInfo(
    cl_tensor tensor,
    cl_tensor_info param_name,
    size_t param_value_size,
    void* param_value,
    size_t* param_value_size_ret);
----

* _tensor_ specifies the tensor object being queried.

* _param_name_ specifies the information to query. The list of
  supported _param_name_ values and the information returned in
  _param_value_ by clGetTensorInfo is described in the
  <<TensorObjectQueries>> table.

* _param_value_ is a pointer to memory where the appropriate result
  being queried is returned. If _param_value_ is NULL, it is ignored.

* _param_value_size_ is used to specify the size in bytes of memory
  pointed to by _param_value_. This size must be ≥ the size of the
  return type as described in the <<TensorObjectQueries>> table.

* _param_value_size_ret_ returns the actual size in bytes of data
  being queried by _param_name_. If _param_value_size_ret_ is NULL, it
  is ignored.

*clGetTensorInfo* returns CL_SUCCESS if the function is executed
successfully. Otherwise, it returns one of the following errors:

* CL_INVALID_TENSOR if _tensor_ is not a valid tensor object.

[#TensorObjectQueries]
.List of supported param_names by clGetTensorInfo
[cols="2,1,2",stripes=odd]
|===
| CL_TENSOR_RANK | size_t | Return the tensor rank.
| CL_TENSOR_SHAPE | size_t[] | Return the tensor shape.
| CL_TENSOR_DTYPE | cl_tensor_datatype | Return the tensor data type.

| CL_TENSOR_BOUND_TO_BUFFER | cl_bool | Return true if the tensor is
bound to a buffer.

| CL_TENSOR_BUFFER | cl_mem a| If CL_TENSOR_BOUND_TO_BUFFER is true,
return the buffer object the tensor is bound to. Otherwise, the
clGetTensorInfo call returns:

* CL_INVALID_MEM_OBJECT if the tensor is not bound to a buffer object.

* CL_INVALID_PROPERTY otherwise.

| CL_TENSOR_CONTEXT | cl_context | Return the context specified when
the tensor object was created.

| CL_TENSOR_REFERENCE_COUNT | cl_uint | Return the tensor reference
count.
|===
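As a non-normative illustration, the rank and then the shape of a
tensor could be queried as follows (error handling and includes
omitted; `tensor` is assumed to be a valid handle):

[source,c]
----
// Non-normative sketch: query a tensor's rank, then its shape.
size_t rank = 0;
clGetTensorInfo(tensor, CL_TENSOR_RANK, sizeof(rank), &rank, NULL);

size_t *shape = (size_t *)malloc(rank * sizeof(size_t));
clGetTensorInfo(tensor, CL_TENSOR_SHAPE, rank * sizeof(size_t), shape, NULL);
/* ... use rank and shape ... */
free(shape);
----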
The following functions read data from a tensor into host memory or a
buffer object, or write data to a tensor object from host memory or a
buffer object.

[source,c]
----
cl_int clEnqueueImportFromTensor(
    cl_command_queue command_queue,
    cl_tensor tensor,
    cl_bool blocking_command,
    const size_t* tensor_origin,
    const size_t* mem_origin,
    const size_t* region,
    const size_t* mem_pitch,
    cl_mem buffer,
    void* host_ptr,
    cl_uint num_events_in_wait_list,
    const cl_event* event_wait_list,
    cl_event* event);
----

[source,c]
----
cl_int clEnqueueExportToTensor(
    cl_command_queue command_queue,
    cl_tensor tensor,
    cl_bool blocking_command,
    const size_t* tensor_origin,
    const size_t* mem_origin,
    const size_t* region,
    const size_t* mem_pitch,
    cl_mem buffer,
    const void* host_ptr,
    cl_uint num_events_in_wait_list,
    const cl_event* event_wait_list,
    cl_event* event);
----

* _command_queue_ is a valid host command-queue in which the read /
  write command will be queued. _command_queue_ and _tensor_ must be
  created with the same OpenCL context.

* _tensor_ refers to a valid tensor object which is bound to a buffer.

* _blocking_command_ indicates whether the read and write operations
  are blocking or non-blocking (see below).

* _tensor_origin_ defines the offset coordinates in _tensor_ for the
  start of the region to read / write. The length of the array must
  be at least the rank of the _tensor_.

* _mem_origin_ defines the offset coordinates in the memory region
  pointed to by _buffer_ or _host_ptr_, expressed in elements of the
  _tensor_ data type. The length of the array must be at least the
  rank of the _tensor_.

* _region_ defines the region being read or written, expressed in
  elements of the _tensor_ data type. The length of the array must be
  at least the rank of the _tensor_. If _region_ is NULL then the
  _tensor_'s shape is used as the region.

* _mem_pitch_ defines the length of each dimension, in elements, to
  be used for the memory region of _buffer_ or _host_ptr_. The length
  of the array must be at least the rank of _tensor_ minus one. If
  _mem_pitch_ is NULL or _mem_pitch_[i] is zero, _mem_pitch_[i] is
  computed as _region_[i + 1].

* _buffer_ and _host_ptr_ refer to a valid buffer object / host
  allocation where data is to be read into or to be written from.
  Exactly one of _buffer_ and _host_ptr_ must be non-NULL; the
  non-NULL argument is used as the operand for the operation.

* _event_wait_list_ and _num_events_in_wait_list_ specify events that
  need to complete before this particular command can be executed. If
  _event_wait_list_ is NULL, then this particular command does not
  wait on any event to complete. If _event_wait_list_ is NULL,
  _num_events_in_wait_list_ must be 0. If _event_wait_list_ is not
  NULL, the list of events pointed to by _event_wait_list_ must be
  valid and _num_events_in_wait_list_ must be greater than 0. The
  events specified in _event_wait_list_ act as synchronization
  points. The context associated with events in _event_wait_list_ and
  _command_queue_ must be the same. The memory associated with
  _event_wait_list_ can be reused or freed after the function returns.
* _event_ returns an event object that identifies this read / write
  command and can be used to query or queue a wait for this command to
  complete. If _event_ is NULL or the enqueue is unsuccessful, no
  event will be created and therefore it will not be possible to query
  the status of this command or to wait for this command to
  complete. If _event_wait_list_ and _event_ are not NULL, _event_
  must not refer to an element of the _event_wait_list_ array.

The *clEnqueueExportToTensor* function copies the contents of the
buffer object / host allocation to the tensor's storage in an
implementation-defined, opaque memory layout. The
*clEnqueueImportFromTensor* function copies data from the tensor's
storage to the buffer object / host allocation.

The elements of the buffer object / host allocation are mapped to
tensor coordinates, and vice versa, as follows in pseudo C code:

[source,c]
----
tensor_element(
  tensor_origin[0] + i[0],
  tensor_origin[1] + i[1],
  ...,
  tensor_origin[N-2] + i[N-2],
  tensor_origin[N-1] + i[N-1]) ==
((TENSOR_DATATYPE *)buffer_or_host_ptr)[
  (mem_origin[0] + i[0]) * pitch(0) +
  (mem_origin[1] + i[1]) * pitch(1) +
  ... +
  (mem_origin[N-2] + i[N-2]) * pitch(N-2) +
  (mem_origin[N-1] + i[N-1])];
----

Where `N` is the tensor rank, `i[X]` is a tensor coordinate in the
inclusive range `0..region[X]-1`, and `pitch` is computed as follows
in pseudo C code:

[source,c]
----
size_t pitch(size_t dim) {
  size_t pitch = 1;
  for (size_t i = dim; i < tensor_rank - 1; i++)
    pitch *=
      (mem_pitch != NULL && mem_pitch[i] != 0) ? mem_pitch[i] : region[i + 1];
  return pitch;
}
----

For `dim` in `0..(tensor_rank - 1)`. The `tensor_element()` function
represents an abstract accessor of a tensor element in its storage at
the given coordinates. How the coordinates translate to tensor storage
addresses is unspecified.
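For example, the following non-normative sketch exports a 4×8 region
from a host array whose rows are padded to 16 elements, using
_mem_pitch_ to describe the padded row length. The command queue
`cmd_q` and the rank-2 CL_TENSOR_FLOAT tensor `t`, already bound to a
buffer, are assumed to exist:

[source,c]
----
// Non-normative sketch: the first 8 columns of each padded 16-element
// host row map to tensor elements (row, col) per the formula above,
// since pitch(0) == mem_pitch[0] == 16 and pitch(1) == 1.
float host_data[4][16] = {{0}};      // only columns 0..7 are exported
size_t tensor_origin[2] = {0, 0};
size_t mem_origin[2]    = {0, 0};
size_t region[2]        = {4, 8};
size_t mem_pitch[1]     = {16};      // row length of the host layout
                                     // (defaults to region[1] = 8 if zero)

clEnqueueExportToTensor(cmd_q, t, CL_TRUE, tensor_origin, mem_origin,
                        region, mem_pitch, NULL, host_data,
                        0, NULL, NULL);
----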
*clEnqueueImportFromTensor* and *clEnqueueExportToTensor* return
CL_SUCCESS if the function is executed successfully. Otherwise, they
return one of the following errors:

* CL_INVALID_COMMAND_QUEUE if _command_queue_ is not a valid host
  command-queue.

* CL_INVALID_CONTEXT if the context associated with _command_queue_
  and _buffer_ are not the same or if the context associated with
  _command_queue_ and events in _event_wait_list_ are not the same.

* CL_INVALID_MEM_OBJECT if _buffer_ is not a valid buffer object.

* CL_INVALID_VALUE if _tensor_origin_ or _mem_origin_ is NULL.

* CL_INVALID_VALUE if the region being read or written specified by
  (_mem_origin_, _region_, _mem_pitch_) is out of bounds.

* CL_INVALID_VALUE if any _region_ array element is 0.

* CL_INVALID_VALUE if _mem_pitch_ is not NULL and _mem_pitch_[i] is
  not 0 and _mem_pitch_[i] is less than _region_[i].

* CL_INVALID_VALUE if _buffer_ and _host_ptr_ are both NULL or both
  non-NULL.

* CL_INVALID_EVENT_WAIT_LIST if _event_wait_list_ is NULL and
  _num_events_in_wait_list_ > 0, or _event_wait_list_ is not NULL and
  _num_events_in_wait_list_ is 0, or if event objects in
  _event_wait_list_ are not valid events.

* CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST if the read and write
  operations are blocking and the execution status of any of the
  events in _event_wait_list_ is a negative integer value.

* CL_MEM_OBJECT_ALLOCATION_FAILURE if there is a failure to allocate
  memory for the data store associated with the memory object the
  _tensor_ is bound to.

* CL_OUT_OF_RESOURCES if there is a failure to allocate resources
  required by the OpenCL implementation on the device.

* CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources
  required by the OpenCL implementation on the host.

// TODO: add clEnqueueCopyTensor

// TODO: add clEnqueueFillTensor?

If *cl_khr_command_buffer* is supported, then the following
command-buffer counterparts of the *clEnqueueImportFromTensor* and
*clEnqueueExportToTensor* commands are available.

[source,c]
----
cl_int clCommandImportFromTensorKHR(
    cl_command_buffer_khr command_buffer,
    cl_command_queue command_queue,
    cl_tensor tensor,
    const size_t* tensor_origin,
    const size_t* mem_origin,
    const size_t* region,
    const size_t* mem_pitch,
    cl_mem buffer,
    void* host_ptr,
    cl_uint num_sync_points_in_wait_list,
    const cl_sync_point_khr* sync_point_wait_list,
    cl_sync_point_khr* sync_point,
    cl_mutable_command_khr* mutable_handle);
----

[source,c]
----
cl_int clCommandExportToTensorKHR(
    cl_command_buffer_khr command_buffer,
    cl_command_queue command_queue,
    cl_tensor tensor,
    const size_t* tensor_origin,
    const size_t* mem_origin,
    const size_t* region,
    const size_t* mem_pitch,
    cl_mem buffer,
    const void* host_ptr,
    cl_uint num_sync_points_in_wait_list,
    const cl_sync_point_khr* sync_point_wait_list,
    cl_sync_point_khr* sync_point,
    cl_mutable_command_khr* mutable_handle);
----

* _command_buffer_ refers to a valid command-buffer object.

* For the _command_queue_, _tensor_, _tensor_origin_, _mem_origin_,
  _region_, _mem_pitch_, _buffer_ and _host_ptr_ parameters, refer to
  *clEnqueueImportFromTensor*.

* For the _num_sync_points_in_wait_list_, _sync_point_wait_list_,
  _sync_point_ and _mutable_handle_ parameters, refer to
  *clCommandCopyBufferKHR*.

*clCommandImportFromTensorKHR* and *clCommandExportToTensorKHR* return
CL_SUCCESS if the function is executed successfully. Otherwise, they
return one of the following errors:

* CL_INVALID_COMMAND_QUEUE if _command_queue_ is not NULL.

* CL_INVALID_COMMAND_BUFFER_KHR if _command_buffer_ is not a valid
  command-buffer.

* CL_INVALID_CONTEXT if the context associated with _command_queue_
  and _command_buffer_ is not the same.

* CL_INVALID_OPERATION if _command_buffer_ has been finalized.

* CL_INVALID_VALUE if _mutable_handle_ is not NULL.

* CL_INVALID_SYNC_POINT_WAIT_LIST_KHR if _sync_point_wait_list_ is
  NULL and _num_sync_points_in_wait_list_ is > 0, or
  _sync_point_wait_list_ is not NULL and _num_sync_points_in_wait_list_
  is 0, or if synchronization-point objects in _sync_point_wait_list_
  are not valid synchronization-points.

* CL_OUT_OF_RESOURCES if there is a failure to allocate resources
  required by the OpenCL implementation on the device.

* CL_OUT_OF_HOST_MEMORY if there is a failure to allocate resources
  required by the OpenCL implementation on the host.

==== Add New Buffer Property in Section 5.2.1

[cols="2,1,2",stripes=odd]
|===
| CL_MEM_COMMAND_BUFFER_TEMPORARY | cl_bool
a| This property can be set if the *cl_khr_command_buffer* extension
is supported.

NOTE: This property temporarily lives here and will be moved to
a separate extension proposal.

If the value is true, create a "temporary" buffer object that can only
be used by commands recorded in command buffers. Non-recording
command enqueue functions must return CL_INVALID_OPERATION if the
command refers to a temporary buffer object.
The temporary buffer objects are managed by command buffers. When a
temporary buffer object is used by multiple command buffers, the
object receives disjoint storage for each command buffer.

// Consequently, data may not be exchanged between command buffers through
// temporary buffers.

Storage for the temporary buffer objects may be allocated on an
on-demand basis. At times when the buffer is not needed, OpenCL
implementations may reuse its storage for other tasks within the
command buffer.

Contents of the temporary buffers are not guaranteed to be preserved
across command buffer executions.

| CL_MEM_BIND_TO_TENSOR | cl_tensor a| Use the created buffer as
storage for the given valid tensor. For the buffer creation to
succeed, the target tensor must not already have storage and the
_size_ argument of clCreateBufferWithProperties() must be zero.

The size of the memory buffer is implementation-defined and can be
queried with clGetTensorInfo().

The memory layout of the tensor in the created memory buffer is
implementation-defined, opaque to applications, and may change at
unspecified points. Implementations may use non-contiguous allocations
to store the tensor data and may store auxiliary data within the
allocations. Therefore, reading from or writing to the memory buffer
directly through the cl_mem handle leads to undefined behavior.

If the tensor is already bound to a buffer object, the
clCreateBufferWithProperties call returns the
CL_TENSOR_BOUND_TO_BUFFER error code.
|===

==== Add New Memory Object Query in Section 5.5.5

[cols="2,1,2",stripes=odd]
|===
| CL_MEM_COMMAND_BUFFER_TEMPORARY | cl_bool | This property can be
queried if the *cl_khr_command_buffer* extension is supported.

Return true if _memobj_ is a temporary buffer object for command
buffers.
|===

==== Add New Error Codes in Appendix F

[cols="2,3", stripes=odd]
|===
| CL_TENSOR_BOUND_TO_BUFFER | Returned when attempting to bind a
  buffer object to a tensor which has already been bound to the same
  or another buffer object.
| CL_INVALID_TENSOR | Returned when the specified tensor is not a
  valid tensor object.
|===

=== Sample Codes

Helper functions used in the following tensor code samples:

[source,c]
----
cl_kernel create_matmul_kernel(
  cl_context ctx, std::span<cl_device_id> device_span,
  cl_tensor lhs, cl_tensor rhs, cl_tensor out) {
  // A hypothetical matmul kernel signature in pseudo OpenCL C for
  // illustrative purposes:
  //
  //   kernel void matmul(global read_only tensor_t, global read_only tensor_t,
  //                      global write_only tensor_t);

  cl_kernel matmul_kernel = /* Omitted. */;
  clSetKernelArg(matmul_kernel, 0, sizeof(cl_tensor), &lhs);
  clSetKernelArg(matmul_kernel, 1, sizeof(cl_tensor), &rhs);
  clSetKernelArg(matmul_kernel, 2, sizeof(cl_tensor), &out);
  return matmul_kernel;
}

cl_kernel create_add_kernel(
  cl_context ctx, std::span<cl_device_id> device_span,
  cl_tensor lhs, cl_tensor rhs, cl_tensor out) {
  // A hypothetical add kernel signature in pseudo OpenCL C for illustrative
  // purposes:
  //
  //   kernel void add(global read_only tensor_t, global read_only tensor_t,
  //                   global write_only tensor_t);

  cl_kernel add_kernel = /* Omitted. */;
  clSetKernelArg(add_kernel, 0, sizeof(cl_tensor), &lhs);
  clSetKernelArg(add_kernel, 1, sizeof(cl_tensor), &rhs);
  clSetKernelArg(add_kernel, 2, sizeof(cl_tensor), &out);
  return add_kernel;
}
----

An example usage of tensors on a command queue:

[source,c]
----
constexpr size_t b = 64, m = 100, n = 200, k = 50;

cl_int err;
cl_tensor in0 = clCreateTensor(ctx, nullptr, 3, {b, m, k}, CL_TENSOR_FLOAT, &err);
cl_tensor in1 = clCreateTensor(ctx, nullptr, 3, {b, k, n}, CL_TENSOR_FLOAT, &err);
cl_tensor in2 = clCreateTensor(ctx, nullptr, 3, {b, m, n}, CL_TENSOR_FLOAT, &err);
cl_tensor t0  = clCreateTensor(ctx, nullptr, 3, {b, m, n}, CL_TENSOR_FLOAT, &err);
cl_tensor out = clCreateTensor(ctx, nullptr, 3, {b, m, n}, CL_TENSOR_FLOAT, &err);

cl_kernel matmul_kernel = create_matmul_kernel(ctx, device_span, in0, in1, t0);
cl_kernel add_kernel = create_add_kernel(ctx, device_span, t0, in2, out);

// Allocate storage for the tensors. The buffer size must be set to
// zero when the buffer is bound to a tensor. The OpenCL implementation
// may determine the optimal data layout and the storage needed for it,
// based on the tensor's uses (the 'matmul' and 'add' kernels in this
// sample) so far.
cl_mem in0_mem = clCreateBufferWithProperties(
  ctx, {CL_MEM_BIND_TO_TENSOR, in0, 0}, CL_MEM_READ_ONLY,
  0 /* must be zero for CL_MEM_BIND_TO_TENSOR. */, nullptr, &err);
cl_mem in1_mem = clCreateBufferWithProperties(
  ctx, {CL_MEM_BIND_TO_TENSOR, in1, 0}, CL_MEM_READ_ONLY,
  0, nullptr, &err);
cl_mem in2_mem = clCreateBufferWithProperties(
  ctx, {CL_MEM_BIND_TO_TENSOR, in2, 0}, CL_MEM_READ_ONLY,
  0, nullptr, &err);
cl_mem t0_mem = clCreateBufferWithProperties(
  ctx, {CL_MEM_BIND_TO_TENSOR, t0, 0}, CL_MEM_READ_WRITE,
  0, nullptr, &err);
cl_mem out_mem = clCreateBufferWithProperties(
  ctx, {CL_MEM_BIND_TO_TENSOR, out, 0}, CL_MEM_WRITE_ONLY,
  0, nullptr, &err);

std::vector<float> in0_data = ...;
std::vector<float> in1_data = ...;
std::vector<float> out_data(b * m * n);

// Copies data into the in0 tensor while possibly rearranging the data
// to the optimal data layout.
clEnqueueExportToTensor(
  cmd_q, in0, false, {0, 0, 0}, {0, 0, 0}, {b, m, k},
  nullptr, nullptr, in0_data.data(), 0, nullptr, nullptr);
clEnqueueExportToTensor(
  cmd_q, in1, false, {0, 0, 0}, {0, 0, 0}, {b, k, n},
  nullptr, nullptr, in1_data.data(), 0, nullptr, nullptr);
clEnqueueNDRangeKernel(
  cmd_q, matmul_kernel, 3, nullptr, matmul_grid, nullptr, 0, nullptr, nullptr);
clEnqueueNDRangeKernel(
  cmd_q, add_kernel, 3, nullptr, add_grid, nullptr, 0, nullptr, nullptr);
clEnqueueImportFromTensor(
  cmd_q, out, false, {0, 0, 0}, {0, 0, 0}, {b, m, n},
  nullptr, nullptr, out_data.data(), 0, nullptr, nullptr);
----

An example use of tensors in a command buffer when the
cl_khr_command_buffer extension is supported:

[source,c]
----
constexpr size_t b = 64, m = 100, n = 200, k = 50;

cl_int err;
cl_tensor in0 = clCreateTensor(ctx, nullptr, 3, {b, m, k}, CL_TENSOR_FLOAT, &err);
cl_tensor in1 = clCreateTensor(ctx, nullptr, 3, {b, k, n}, CL_TENSOR_FLOAT, &err);
cl_tensor in2 = clCreateTensor(ctx, nullptr, 3, {b, m, n}, CL_TENSOR_FLOAT, &err);
cl_tensor t0  = clCreateTensor(ctx, nullptr, 3, {b, m, n}, CL_TENSOR_FLOAT, &err);
cl_tensor out = clCreateTensor(ctx, nullptr, 3, {b, m, n}, CL_TENSOR_FLOAT, &err);

cl_kernel matmul_kernel = create_matmul_kernel(ctx, device_span, in0, in1, t0);
cl_kernel add_kernel = create_add_kernel(ctx, device_span, t0, in2, out);

// Bind command-buffer-managed storage to the tensors.
//
// NOTE: the same temporary tensor handle used in multiple command buffers
//       will have separate storage. In other words, command buffers may not
//       exchange data with each other via temporary buffers.
cl_mem in0_mem = clCreateBufferWithProperties(
  ctx, {CL_MEM_COMMAND_BUFFER_TEMPORARY, true, CL_MEM_BIND_TO_TENSOR, in0, 0},
  CL_MEM_READ_ONLY, 0 /* must be zero for CL_MEM_BIND_TO_TENSOR. */,
  nullptr, &err);
cl_mem in1_mem = clCreateBufferWithProperties(
  ctx, {CL_MEM_COMMAND_BUFFER_TEMPORARY, true, CL_MEM_BIND_TO_TENSOR, in1, 0},
  CL_MEM_READ_ONLY, 0, nullptr, &err);
cl_mem in2_mem = clCreateBufferWithProperties(
  ctx, {CL_MEM_COMMAND_BUFFER_TEMPORARY, true, CL_MEM_BIND_TO_TENSOR, in2, 0},
  CL_MEM_READ_ONLY, 0, nullptr, &err);
cl_mem t0_mem = clCreateBufferWithProperties(
  ctx, {CL_MEM_COMMAND_BUFFER_TEMPORARY, true, CL_MEM_BIND_TO_TENSOR, t0, 0},
  CL_MEM_READ_WRITE, 0, nullptr, &err);
cl_mem out_mem = clCreateBufferWithProperties(
  ctx, {CL_MEM_COMMAND_BUFFER_TEMPORARY, true, CL_MEM_BIND_TO_TENSOR, out, 0},
  CL_MEM_WRITE_ONLY, 0, nullptr, &err);

std::vector<float> in0_data = ...;
std::vector<float> in1_data = ...;
std::vector<float> out_data(b * m * n);

cl_command_buffer_khr cmd_b =
  clCreateCommandBufferKHR(num_queues, queue_list, nullptr, &err);

cl_sync_point_khr in0_syncp, in1_syncp, matmul_syncp, add_syncp;
clCommandExportToTensorKHR(
  cmd_b, cmd_q, in0, {0, 0, 0}, {0, 0, 0}, {b, m, k},
  nullptr, nullptr, in0_data.data(), 0, nullptr, &in0_syncp, nullptr);
clCommandExportToTensorKHR(
  cmd_b, cmd_q, in1, {0, 0, 0}, {0, 0, 0}, {b, k, n},
  nullptr, nullptr, in1_data.data(), 0, nullptr, &in1_syncp, nullptr);
clCommandNDRangeKernelKHR(
  cmd_b, cmd_q, nullptr, matmul_kernel, 3, nullptr, matmul_grid, nullptr,
  2, {in0_syncp, in1_syncp}, &matmul_syncp, nullptr);
clCommandNDRangeKernelKHR(
  cmd_b, cmd_q, nullptr, add_kernel, 3, nullptr, add_grid, nullptr,
  1, {matmul_syncp}, &add_syncp, nullptr);
clCommandImportFromTensorKHR(
  cmd_b, cmd_q, out, {0, 0, 0}, {0, 0, 0}, {b, m, n},
  nullptr, nullptr, out_data.data(), 1, {add_syncp}, nullptr, nullptr);

// Finalize the command buffer. At this point the OpenCL
// implementation may reserve enough storage for all the tensor
// temporaries. Temporary tensors might be eliminated - for example,
// the OpenCL implementation could use the 'out' tensor to store the
// result of matmul_kernel, thus eliminating the need for the 't0'
// tensor.
clFinalizeCommandBufferKHR(cmd_b);

// Temporary tensors used in a command buffer can't be read or written
// into. A hypothetical reason is that the finalized command buffer
// might not use some of the tensors.
assert(clEnqueueImportFromTensor(..., t0, ...) == CL_INVALID_OPERATION);
----

=== Open Questions

. Should we have support for tensors with undefined shape and tensors
  with unknown / symbolic dimension sizes like in ONNX?
+
--
// https://onnx.ai/onnx/repo-docs/ShapeInference.html
*UNRESOLVED*
--

. Should we define OpenCL C language features for accessing tensors?
+
--
*RESOLVED*: OpenCL C support for tensors can be introduced later in a
separate extension. Built-in kernels may benefit from this extension
as it is.
--