
Releases: NVIDIA/warp


01 Oct 07:23


[1.4.0] - 2024-10-01


  • Support for a new wp.static(expr) function that allows arbitrary Python expressions to be evaluated at the time of
    function/kernel definition (docs).
  • Support for stream priorities to hint to the device that it should process pending work
    in high-priority streams over pending work in low-priority streams when possible.
  • Add an adaptive sparse grid geometry to warp.fem (docs).
  • Support for defining wp.kernel and wp.func objects from within closures.
  • Support for defining multiple versions of kernels, functions, and structs without manually assigning unique keys.
  • Support for default argument values for user functions decorated with wp.func.
  • Allow passing custom launch dimensions to jax_kernel() (GH-310).
  • JAX interoperability examples for sharding and matrix multiplication (docs).
  • Interoperability support for the PaddlePaddle ML framework (GH-318).
  • Support wp.mod() for vector types (GH-282).
  • Expose the modulo operator % to Python's runtime scalar and vector types.
  • Support for fp64 atomic_add, atomic_max, and atomic_min (GH-284).
  • Support for quaternion indexing (e.g. q.w).
  • Support shadowing builtin functions (GH-308).
  • Support for redefining function overloads.
  • Add an ocean sample to the omni.warp extension.
  • warp.sim.VBDIntegrator now supports body-particle collision.
  • Add a contributing guide to the Sphinx docs.
  • Add documentation for dynamic code generation (docs).


  • wp.sim.Model.edge_indices now includes boundary edges.
  • wp.rand*(), wp.sample*(), and wp.poisson() are no longer exposed in the Python scope.
  • Skip unused functions in module code generation, improving performance.
  • Avoid reloading modules if their content does not change, improving performance.
  • wp.Mesh.points is now a property instead of a raw data member; its reference can be changed after the mesh is initialized.
  • Improve error message when invalid objects are referenced in a Warp kernel.
  • if/else/elif statements with constant conditions are resolved at compile time with no branches being inserted in the generated code.
  • Include all non-hidden builtins in the stub file.
  • Improve accuracy of symmetric eigenvalues routine in warp.fem.
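
The constant-condition item above can be pictured with a small sketch. This is not Warp's code generator: it uses only Python's standard ast module, the names ConstantIfFolder and fold_constant_ifs are hypothetical, and it handles only literal True/False tests. It shows how an if/elif/else chain with a constant test can be replaced by the branch that would run, so no branch is left in the output.

```python
import ast


class ConstantIfFolder(ast.NodeTransformer):
    """Replace `if <literal>:` statements with the branch that would run."""

    def visit_If(self, node):
        # Fold nested statements first. This also covers `elif` chains,
        # which the parser stores as an `If` inside `orelse`.
        self.generic_visit(node)
        if isinstance(node.test, ast.Constant):
            taken = node.body if node.test.value else node.orelse
            # Returning a list splices the statements in place of the `if`;
            # an empty branch becomes `pass` so the parent body stays valid.
            return taken or [ast.Pass()]
        return node


def fold_constant_ifs(source: str) -> str:
    tree = ConstantIfFolder().visit(ast.parse(source))
    return ast.unparse(ast.fix_missing_locations(tree))


src = """
def f(x):
    if False:
        x = x + 1
    elif True:
        x = x * 2
    else:
        x = 0
    return x
"""
folded = fold_constant_ifs(src)
print(folded)
```

The folded function body contains only the assignment from the elif branch followed by the return statement. Conditions that depend on runtime values are left untouched by this sketch.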


  • Fix for wp.func erroring out when defining a Tuple as a return type hint (GH-302).
  • Fix array in-place op (+=, -=) adjoints to compute gradients correctly in the backward pass.
  • Fix vector and matrix in-place assignment adjoints to compute gradients correctly in the backward pass (e.g. v[1] = x).
  • Fix a bug in which Python docstrings would be created as local function variables in generated code.
  • Fix a bug with autograd array access validation in functions from different modules.
  • Fix a rare crash during error reporting on some systems due to glibc mismatches.
  • Handle --num_tiles 1 in (GH-306).
  • Fix the computation of body contact forces in FeatherstoneIntegrator when bodies and particles collide.
  • Fix a bug in FeatherstoneIntegrator where eval_rigid_jacobian could give incorrect results or enter an infinite
    loop when the body and joint indices were not in the same order. Added Model.joint_ancestor to fix the indexing
    from a joint to its parent joint in the articulation.
  • Fix wrong vertex index passed to add_edges() called from ModelBuilder.add_cloth_mesh() (GH-319).
  • Add a workaround for uninitialized memory read warning in the compute-sanitizer initcheck tool when using wp.Mesh.
  • Fix name clashes when Warp functions and structs are returned from Python functions multiple times.
  • Fix name clashes between Warp functions and structs defined in different modules.
  • Fix code generation errors when overloading generic kernels defined in a Python function.
  • Fix issues with unrelated functions being treated as overloads (e.g., closures).
  • Fix handling of stream argument in array.__dlpack__().
  • Fix a bug related to reloading CPU modules.
  • Fix a crash when kernel functions are not found in CPU modules.
  • Fix conditions not being evaluated as expected in while statements.
  • Fix printing Boolean and 8-bit integer values.
  • Fix array interface type strings used for Boolean and 8-bit integer values.
  • Fix initialization error when setting struct members.
  • Fix Warp not being initialized upon entering a wp.Tape context.
  • Use kDLBool instead of kDLUInt for DLPack interop of Booleans.


04 Sep 20:54

[1.3.3] - 2024-09-04

  • Bug fixes
    • Fix an aliasing issue with zero-copy array initialization from NumPy introduced in Warp 1.3.0.
    • Fix wp.Volume.load_from_numpy() behavior when bg_value is a sequence of values.

[1.3.2] - 2024-08-30

  • Bug fixes
    • Fix accuracy of 3x3 SVD wp.svd3 with fp64 numbers (GH-281).
    • Fix module hashing when a kernel argument contained a struct array (GH-287).
    • Fix a bug in wp.bvh_query_ray() where the direction was used instead of the reciprocal direction.
    • Fix errors when launching a CUDA graph after a module is reloaded. Modules that were used during graph capture
      will no longer be unloaded before the graph is released.
    • Fix a bug in wp.sim.collide.triangle_closest_point_barycentric() where the returned barycentric coordinates may be
      incorrect when the closest point lies on an edge.
    • Fix 32-bit overflow when array shape is specified using np.int32.
    • Fix handling of integer indices in the input_output_mask argument to autograd.jacobian and
      autograd.jacobian_fd (GH-289).
    • Fix ModelBuilder.collapse_fixed_joints() to correctly update the body centers of mass and the
      ModelBuilder.articulation_start array.
    • Fix precedence of closure constants over global constants.
    • Fix quadrature point indexing in wp.fem.ExplicitQuadrature (regression from 1.3.0).
  • Documentation improvements
    • Add missing return types for built-in functions.
    • Clarify that atomic operations also return the previous value.
    • Clarify that wp.bvh_query_aabb() returns parts that overlap the bounding volume.
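
The 32-bit overflow fix listed above concerns a general class of bug that is easy to reproduce outside Warp. The sketch below is not Warp code: it uses ctypes.c_int32 from the standard library to emulate a 32-bit signed accumulator, and both helper names are hypothetical. It shows why dimensions should be widened to Python integers before they are multiplied into an element count.

```python
import ctypes
from math import prod


def element_count_int32(shape):
    """Multiply dimensions the way a 32-bit signed accumulator would."""
    total = 1
    for dim in shape:
        # c_int32 keeps only the low 32 bits, emulating wraparound.
        total = ctypes.c_int32(total * dim).value
    return total


def element_count_safe(shape):
    """Convert each dimension to an arbitrary-precision Python int first."""
    return prod(int(dim) for dim in shape)


shape = (70_000, 70_000)
wrapped = element_count_int32(shape)
exact = element_count_safe(shape)
print(wrapped, exact)  # 605032704 4900000000
```

The wrapped value is the exact product reduced modulo 2**32, which is why an allocation sized from it would be far too small.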

[1.3.1] - 2024-07-27

  • Remove wp.synchronize() from PyTorch autograd function example
  • Tape.check_kernel_array_access() and Tape.reset_array_read_flags() are now private methods.
  • Fix reporting unmatched argument types

[1.3.0] - 2024-07-25

  • Warp Core improvements

    • Update to CUDA 12.x by default (requires NVIDIA driver 525 or newer), please see for commands to install CUDA 11.x binaries for older drivers
    • Add information to the module load print outs to indicate whether a module was
      compiled (compiled), loaded from the cache (cached), or was unable to be
      loaded (error).
    • wp.config.verbose = True now also prints out a message upon the entry to a wp.ScopedTimer.
    • Add wp.clear_kernel_cache() to the public API. This is equivalent to
    • Add code-completion support for wp.config variables.
    • Remove usage of a static task (thread) index for CPU kernels to address multithreading concerns (GH-224)
    • Improve error messages for unsupported Python operations such as sequence construction in kernels
    • Update wp.matmul() CPU fallback to use dtype explicitly in np.matmul() call
    • Add support for PEP 563's from __future__ import annotations (GH-256).
    • Allow passing external arrays/tensors to wp.launch() directly via __cuda_array_interface__ and __array_interface__, up to 2.5x faster conversion from PyTorch
    • Add faster Torch interop path using return_ctype argument to wp.from_torch()
    • Handle incompatible CUDA driver versions gracefully
    • Add wp.abs() and wp.sign() for vector types
    • Expose scalar arithmetic operators to Python's runtime (e.g.: wp.float16(1.23) * wp.float16(2.34))
    • Add support for creating volumes with anisotropic transforms
    • Allow users to pass function arguments by keyword in a kernel using standard Python calling semantics
    • Add additional documentation and examples demonstrating wp.copy(), wp.clone(), and array.assign() differentiability
    • Add __new__() methods for all class __del__() methods to handle when a class instance is created but not instantiated before garbage collection
    • Implement the assignment operator for wp.quat
    • Make the geometry-related built-ins available only from within kernels
    • Rename the API-facing query types to remove their _t suffix: wp.BVHQuery, wp.HashGridQuery, wp.MeshQueryAABB, wp.MeshQueryPoint, and wp.MeshQueryRay
    • Add wp.array(ptr=...) to allow initializing arrays from pointer addresses inside of kernels (GH-206)
  • warp.autograd improvements:

    • New warp.autograd module with utility functions gradcheck(), jacobian(), and jacobian_fd() for debugging kernel Jacobians (docs)
    • Add array overwrite detection: if wp.config.verify_autograd_array_access is true, in-place operations on arrays on the Tape that could break gradient computation will be detected (docs)
    • Fix bug where modification of @wp.func_replay functions and native snippets would not trigger module recompilation
    • Add documentation for dynamic loop autograd limitations
  • warp.sim improvements:

    • Improve memory usage and performance for rigid body contact handling when self.rigid_mesh_contact_max is zero (default behavior).
    • The mask argument to wp.sim.eval_fk() now accepts both integer and boolean arrays to mask articulations.
    • Fix handling of ModelBuilder.joint_act in ModelBuilder.collapse_fixed_joints() (affected floating-base systems)
    • Fix and improve implementation of ModelBuilder.plot_articulation() to visualize the articulation tree of a rigid-body mechanism
    • Fix ShapeInstancer __new__() method (missing instance return and *args parameter)
    • Fix handling of upaxis variable in ModelBuilder and the rendering thereof in OpenGLRenderer
  • warp.sparse improvements:

    • Sparse matrix allocations (from bsr_from_triplets(), bsr_axpy(), etc.) can now be captured in CUDA graphs; exact number of non-zeros can be optionally requested asynchronously.
    • bsr_assign() now supports changing block shape (including CSR/BSR conversions)
    • Add Python operator overloads for common sparse matrix operations, e.g. A += 0.5 * B, y = x @ C
  • warp.fem new features and fixes:

    • Support for variable number of nodes per element
    • Global wp.fem.lookup() operator now supports wp.fem.Tetmesh and wp.fem.Trimesh2D geometries
    • Simplified defining custom subdomains (wp.fem.Subdomain), free-slip boundary conditions
    • New field types: wp.fem.UniformField, wp.fem.ImplicitField and wp.fem.NonconformingField
    • New streamlines, magnetostatics and nonconforming_contact examples, updated mixed_elasticity to use a nonlinear model
    • Function spaces can now export VTK-compatible cells for visualization
    • Fixed edge cases with NanoVDB function spaces
    • Fixed differentiability of wp.fem.PicQuadrature w.r.t. positions and measures
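
The warp.autograd utilities listed above are described as tools for debugging kernel Jacobians. The underlying idea, comparing a derivative obtained analytically with a finite-difference estimate, can be sketched in plain Python. None of the names below are Warp's API or signatures: jacobian_fd_sketch, jacobian_analytic, and the example function f are hypothetical stand-ins.

```python
def jacobian_fd_sketch(f, x, eps=1e-6):
    """Central finite-difference Jacobian of f: R^n -> R^m at x."""
    m = len(f(x))
    jac = [[0.0] * len(x) for _ in range(m)]
    for j in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[j] += eps
        x_minus[j] -= eps
        f_plus, f_minus = f(x_plus), f(x_minus)
        for i in range(m):
            jac[i][j] = (f_plus[i] - f_minus[i]) / (2.0 * eps)
    return jac


def f(x):
    # Example function: f(x0, x1) = (x0 * x1, x0**2 + 3 * x1)
    return [x[0] * x[1], x[0] ** 2 + 3.0 * x[1]]


def jacobian_analytic(x):
    # Hand-derived Jacobian of f, standing in for an autodiff result.
    return [[x[1], x[0]], [2.0 * x[0], 3.0]]


def max_abs_diff(a, b):
    return max(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


x0 = [1.5, -2.0]
err = max_abs_diff(jacobian_fd_sketch(f, x0), jacobian_analytic(x0))
print(f"max abs difference: {err:.2e}")
```

A large difference between the two matrices usually points at a broken gradient path, for example an array that was overwritten in place before the backward pass.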


04 Jul 19:07

[1.2.2] - 2024-07-04

  • Support for NumPy >= 2.0

[1.2.1] - 2024-06-14

  • Fix generic function caching
  • Fix Warp not being initialized when constructing arrays with wp.array()
  • Fix wp.is_mempool_access_supported() not resolving the provided device arguments to wp.context.Device

[1.2.0] - 2024-06-06

  • Add a not-a-number floating-point constant that can be used as wp.NAN or wp.nan.
  • Add wp.isnan(), wp.isinf(), and wp.isfinite() for scalars, vectors, matrices, etc.
  • Improve kernel cache reuse by hashing just the local module constants. Previously, a
    module's hash was affected by all wp.constant() variables declared in a Warp program.
  • Revised module compilation process to allow multiple processes to use the same kernel cache directory.
    Cached kernels will now be stored in a hash-specific subdirectory.
  • Add runtime checks for wp.MarchingCubes on field dimensions and size
  • Fix memory leak in wp.Mesh BVH (GH-225)
  • Use C++17 when building the Warp library and user kernels
  • Increase PTX target architecture up to sm_75 (from sm_70), enabling Turing ISA features
  • Extended NanoVDB support (see warp.Volume):
    • Add support for data-agnostic index grids, allocation at voxel granularity
    • New wp.volume_lookup_index(), wp.volume_sample_index() and generic wp.volume_sample()/wp.volume_lookup()/wp.volume_store() kernel-level functions
    • Zero-copy aliasing of in-memory grids, support for multi-grid buffers
    • Grid introspection and blind data access capabilities
    • warp.fem can now work directly on NanoVDB grids using warp.fem.Nanogrid
    • Fixed wp.volume_sample_v() and wp.volume_store_*() adjoints
    • Prevent wp.volume_store() from overwriting grid background values
  • Improve validation of user-provided fields and values in warp.fem
  • Support headless rendering of wp.render.OpenGLRenderer via pyglet.options["headless"] = True
  • wp.render.RegisteredGLBuffer can fall back to CPU-bound copying if CUDA/OpenGL interop is not available
  • Clarify terms for external contributions, please see for details
  • Improve performance of wp.sparse.bsr_mm() by ~5x on benchmark problems
  • Fix for XPBD incorrectly indexing into the joint actuation joint_act arrays
  • Fix for mass matrix gradients computation in wp.sim.FeatherstoneIntegrator()
  • Fix for handling of --msvc_path in build scripts
  • Fix for wp.copy() params to record dest and src offset parameters on wp.Tape()
  • Fix for wp.randn() to ensure return values are finite
  • Fix for slicing of arrays with gradients in kernels
  • Fix for function overload caching, ensure module is rebuilt if any function overloads are modified
  • Fix for handling of bool types in generic kernels
  • Publish CUDA 12.5 binaries for Hopper support, see for details
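
The kernel cache items above (hashing only local module constants and storing cached kernels in a hash-specific subdirectory) can be illustrated with a short sketch. It is not Warp's implementation: the hashing scheme, the seven-character digest, the directory naming, and the helper names module_hash and cache_dir are all assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def module_hash(source: str, local_constants: dict) -> str:
    """Hash the module source plus only the constants it references."""
    h = hashlib.sha256()
    h.update(source.encode("utf-8"))
    for name in sorted(local_constants):
        h.update(f"{name}={local_constants[name]!r}".encode("utf-8"))
    return h.hexdigest()[:7]


def cache_dir(root: Path, module_name: str, source: str, constants: dict) -> Path:
    # One subdirectory per hash, so different versions never overwrite
    # each other and concurrent processes can share the same root.
    return root / f"{module_name}_{module_hash(source, constants)}"


root = Path("kernel_cache")  # hypothetical cache root; nothing is written to disk
src = "def scale(x): return x * SCALE"
a = cache_dir(root, "demo", src, {"SCALE": 2.0})
b = cache_dir(root, "demo", src, {"SCALE": 2.0})
c = cache_dir(root, "demo", src, {"SCALE": 3.0})
print(a == b, a == c)  # True False
```

Because unrelated constants never enter the hash, changing one of them elsewhere in a program would leave this module's cache entry valid, which is the reuse improvement the item describes.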


[1.1.1] - 2024-05-24

  • wp.init() is no longer required to be called explicitly and will be performed on first call to the API
  • Speed up omni.warp.core's startup time


08 May 15:54

[1.1.0] - 2024-05-09

  • Support returning a value from @wp.func_native CUDA functions using type hints
  • Improved differentiability of the wp.sim.FeatherstoneIntegrator
  • Fix gradient propagation for rigid body contacts in wp.sim.collide()
  • Added support for event-based timing, see wp.ScopedTimer()
  • Added Tape visualization and debugging functions, see wp.Tape.visualize()
  • Support constructing Warp arrays from objects that define the __cuda_array_interface__ attribute
  • Support copying a struct to another device, use to migrate struct arrays
  • Allow rigid shapes to not have any collisions with other shapes in wp.sim.Model
  • Change default test behavior to test redundant GPUs (up to 2x)
  • Test each example in an individual subprocess
  • Polish and optimize various examples and tests
  • Allow non-contiguous point arrays to be passed to
  • Upgrade LLVM to 18.1.3 for from-source builds and Linux x86-64 builds
  • Build DLL source code as C++17 and require GCC 9.4 as a minimum
  • Array clone, assign, and copy are now differentiable
  • Use Ruff for formatting and linting
  • Various documentation improvements (infinity, math constants, etc.)
  • Improve URDF importer, handle joint armature
  • Allow builtins.bool to be used in Warp data structures
  • Use external gradient arrays in backward passes when passed to wp.launch()
  • Add Conjugate Residual linear solver, see
  • Fix propagation of gradients on aliased copy of variables in kernels
  • Facilitate debugging and speed up import warp by no longer raising any exceptions during import
  • Improve support for nested vec/mat assignments in structs
  • Recommend Python 3.9 or higher, which is required for JAX and soon PyTorch.
  • Support gradient propagation for indexing sliced multi-dimensional arrays, i.e. a[i][j] vs. a[i, j]
  • Provide an informative message if setting DLL C-types failed, instructing to try rebuilding the library
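
For readers unfamiliar with the Conjugate Residual method mentioned in the list above, here is a minimal pure-Python sketch of the textbook iteration for a symmetric system. It is not Warp's solver and does not use Warp's API; the function name, signature, dense list-of-lists matrix format, and the small symmetric positive definite test system are assumptions chosen to keep the example self-contained.

```python
def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def conjugate_residual(A, b, tol=1e-12, max_iters=100):
    """Solve A x = b for a symmetric matrix A, starting from x = 0."""
    x = [0.0] * len(b)
    r = list(b)          # residual b - A x for the initial guess x = 0
    p = list(r)          # search direction
    Ar = matvec(A, r)
    Ap = list(Ar)
    rAr = dot(r, Ar)
    for _ in range(max_iters):
        if dot(r, r) ** 0.5 < tol:
            break
        alpha = rAr / dot(Ap, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        Ar = matvec(A, r)
        rAr_new = dot(r, Ar)
        beta = rAr_new / rAr
        rAr = rAr_new
        # Update the direction and reuse A*p instead of a second mat-vec.
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        Ap = [ari + beta * api for ari, api in zip(Ar, Ap)]
    return x


A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = conjugate_residual(A, b)
print(x)
```

For this 2x2 system the exact solution is (1/11, 7/11), and the iteration reaches it in two steps up to rounding error.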

[1.0.3] - 2024-04-17

  • Add a support_level entry to the configuration file of the extensions


22 Mar 20:42

[1.0.2] - 2024-03-22

  • Make examples runnable from any location
  • Fix the examples not running directly from their Python file
  • Add the example gallery to the documentation
  • Update examples USD location
  • Update description