v0.2.0

@charles-r-earp charles-r-earp released this 30 Mar 03:38
  • Removed async traits and methods.
  • Core functionality reimplemented in krnl:
    • Only targets Vulkan, more portable than Metal / DX12.
    • Metal is supported via MoltenVK.
    • GPGPU kernels implemented inline in Rust:
      • Kernels can be defined in the same file, near where they are invoked.
      • Modules allow sharing code between host and device.
      • Kernel bindings are type safe, checked at compile time.
      • Simple iterator patterns can be implemented without unsafe.
      • Supports specialization constants provided at runtime.
      • DeviceInfo includes useful properties:
        • Max / default threads per group.
        • Max / min threads per subgroup.
      • With DebugPrintf, kernel panics produce errors on the host.
      • krnlc generates a device crate and invokes spirv-builder.
        • spirv-builder / spirv-tools are compiled once on install.
        • Significantly streamlines and accelerates workflow.
      • Kernels are compressed to reduce package and binary size.
    • Device operations execute immediately:
      • Blocks until the kernel / transfer can be queued.
      • An operation can be queued while another is executing.
      • Reduces latency and improves repeatability, reliability, and performance.
    • Device buffers can be copied by the host if host visible.
    • Large buffer copies are streamed rather than allocating a large temporary.
      • Reuses a few small buffers for transfers.
      • Overlaps host and device copies.
      • Performance significantly closer to CUDA.
      • Also streams between devices.
    • Device buffers can be i32::MAX bytes (~2 GB, up from 256 MB).
    • Scalar / ScalarBufferBase replaces Float / FloatBuffer:
      • Streamlined conversions between buffers.
    • Buffers can be sliced.
    • Supports wasm (without device feature).
  • TensorBase and ScalarBufferBase implemented with krnl::BufferBase and krnl::ScalarBufferBase:
    • Streamlined conversions between tensor types.
    • Host ops accelerated with rayon.
    • Improved and streamlined device gemm kernel.
    • Device sum and sum_axis use subgroup reductions for improved performance.
  • Replaced Criterion trait with Accuracy / CrossEntropyLoss traits.
  • ops::AddAssign implemented by Tensor and Variable.
  • ndarray::linalg::Dot implemented for Tensor and Variable.
  • Direct convolution algorithm for better host performance.
  • Removed learn::kmeans.
  • Redesigned autograd:
    • Autograd replaced with VariableBuilder:
      • Nodes and edges applied when building a Variable.
      • Backward edges are simply f(output_grad) -> input_grad.
    • Gradients are automatically accumulated.
    • Parameter and Variable are separate types (instead of VertexBase).
      • Parameters can be converted to Variables.
  • Redesigned Layer trait:
    • Provides for_each_parameter fns instead of returning a Vec.
    • Cast layers to a ScalarType.
    • Removed enumeration of child layers.
  • Redesigned Forward trait:
    • Generic over input and output type.
  • Derive improvements:
    • Removed layer attribute.
    • Supports enums.
    • Fields can be skipped.
  • Redesigned Optimizer trait:
    • Added learning rate.
    • Accepts a single parameter instead of a slice.
  • Parameter optimizer::State:
    • Can be serialized / deserialized with serde.
  • Simplified Iris dataset.
  • MNIST dataset:
    • Replaced downloader with curl.
    • Decompress in parallel with rayon.
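
The streamed buffer copies noted above (large copies go through a few small reused buffers rather than one large temporary) can be illustrated with a host-only sketch. This is not krnl's implementation or API: `stream_copy` and `chunk_len` are hypothetical names, plain slices stand in for host and device memory, and the overlap of host and device copies is left out for brevity.

```rust
/// Copies `src` into `dst` through a small staging buffer that is reused for
/// every chunk, instead of allocating one temporary as large as `src`.
/// Returns the number of chunks transferred.
/// Hypothetical helper for illustration only, not part of krnl.
fn stream_copy(src: &[u8], dst: &mut [u8], chunk_len: usize) -> usize {
    assert_eq!(src.len(), dst.len(), "buffers must have equal length");
    assert!(chunk_len > 0, "chunk length must be nonzero");
    // One small allocation, reused for the whole transfer.
    let mut staging = vec![0u8; chunk_len.min(src.len())];
    let mut chunks = 0;
    for (src_chunk, dst_chunk) in src.chunks(chunk_len).zip(dst.chunks_mut(chunk_len)) {
        // The last chunk may be shorter than `chunk_len`.
        let staged = &mut staging[..src_chunk.len()];
        staged.copy_from_slice(src_chunk); // stands in for source -> staging
        dst_chunk.copy_from_slice(staged); // stands in for staging -> destination
        chunks += 1;
    }
    chunks
}

fn main() {
    let src: Vec<u8> = (0..10u8).collect();
    let mut dst = vec![0u8; src.len()];
    let chunks = stream_copy(&src, &mut dst, 4);
    println!("copied {} bytes in {} chunks", dst.len(), chunks);
}
```

Peak temporary memory in this sketch is bounded by `chunk_len` regardless of the buffer size, which is the point of streaming the copy.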

MSRV: 1.70.0
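
The redesigned autograd describes backward edges as f(output_grad) -> input_grad, with gradients accumulated automatically. A minimal scalar sketch of that idea follows. It does not use autograph's VariableBuilder or any of its types: `Node`, `Edge`, `leaf`, `mul`, `add`, and `backward` are hypothetical names, and gradients are propagated path by path for brevity, which is only valid because every edge here is linear in the output gradient.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// A backward edge: a closure mapping the output gradient to the gradient of
/// one input, plus the input node that receives it. Hypothetical type.
struct Edge {
    input: Rc<Node>,
    backward: Box<dyn Fn(f32) -> f32>,
}

/// A scalar node holding its value, its accumulated gradient, and its edges.
struct Node {
    value: f32,
    grad: Cell<f32>,
    edges: Vec<Edge>,
}

fn leaf(value: f32) -> Rc<Node> {
    Rc::new(Node { value, grad: Cell::new(0.0), edges: Vec::new() })
}

/// Edges are attached when the output node is built.
fn mul(a: &Rc<Node>, b: &Rc<Node>) -> Rc<Node> {
    let (av, bv) = (a.value, b.value);
    Rc::new(Node {
        value: av * bv,
        grad: Cell::new(0.0),
        edges: vec![
            Edge { input: a.clone(), backward: Box::new(move |g: f32| g * bv) },
            Edge { input: b.clone(), backward: Box::new(move |g: f32| g * av) },
        ],
    })
}

fn add(a: &Rc<Node>, b: &Rc<Node>) -> Rc<Node> {
    Rc::new(Node {
        value: a.value + b.value,
        grad: Cell::new(0.0),
        edges: vec![
            Edge { input: a.clone(), backward: Box::new(|g: f32| g) },
            Edge { input: b.clone(), backward: Box::new(|g: f32| g) },
        ],
    })
}

/// Sends one gradient contribution down every edge, summing into each input.
fn propagate(node: &Node, output_grad: f32) {
    for edge in &node.edges {
        let input_grad = (edge.backward)(output_grad);
        // Accumulate rather than overwrite: an input may be reached many times.
        edge.input.grad.set(edge.input.grad.get() + input_grad);
        propagate(&edge.input, input_grad);
    }
}

fn backward(output: &Rc<Node>) {
    output.grad.set(1.0);
    propagate(output, 1.0);
}

/// Builds y = x * x + 3 * x at x = 2 and returns (y, dy/dx).
fn demo() -> (f32, f32) {
    let x = leaf(2.0);
    let three = leaf(3.0);
    let y = add(&mul(&x, &x), &mul(&three, &x));
    backward(&y);
    (y.value, x.grad.get())
}

fn main() {
    let (y, dx) = demo();
    println!("y = {y}, dy/dx = {dx}");
}
```

Because `x` feeds three edges, its gradient is the sum of three contributions (2 + 2 + 3), which is the accumulation behavior the notes refer to.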