[Bug] [ROCm]: tests/kernels/mamba/test_mamba_ssm_ssd.py::test_mamba_chunk_scan_cont_batch: Core dump #20885

@tjtanaa

Your current environment

The output of python collect_env.py
==============================
        System Info
==============================
OS                           : Ubuntu 22.04.5 LTS (x86_64)
GCC version                  : (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version                : Could not collect
CMake version                : version 3.26.4
Libc version                 : glibc-2.35

==============================
       PyTorch Info
==============================
PyTorch version              : 2.7.0+git6fd4078
Is debug build               : False
CUDA used to build PyTorch   : N/A
ROCM used to build PyTorch   : 6.4.43483-a187df25c

==============================
      Python Environment
==============================
Python version               : 3.10.12 (main, Feb  4 2025, 14:57:36) [GCC 11.4.0] (64-bit runtime)
Python platform              : Linux-5.15.0-116-generic-x86_64-with-glibc2.35

==============================
       CUDA / GPU Info
==============================
Is CUDA available            : True
CUDA runtime version         : Could not collect
CUDA_MODULE_LOADING set to   : LAZY
GPU models and configuration : AMD Instinct MI300X (gfx942:sramecc+:xnack-)
Nvidia driver version        : Could not collect
cuDNN version                : Could not collect
HIP runtime version          : 6.4.43483
MIOpen runtime version       : 3.4.0
Is XNNPACK available         : True

==============================
Versions of relevant libraries
==============================
[pip3] conch-triton-kernels==1.2.1
[pip3] numpy==2.2.6
[pip3] pyzmq==26.4.0
[pip3] torch==2.7.0+git6fd4078
[pip3] torchao==0.11.0
[pip3] torchaudio==2.7.0a0+654fee8
[pip3] torchvision==0.22.0+9eb57cd
[pip3] transformers==4.52.4
[pip3] triton==3.3.0
[conda] Could not collect

==============================
         vLLM Info
==============================
ROCM Version                 : 6.4.43483-a187df25c
Neuron SDK Version           : N/A
vLLM Version                 : 0.1.dev7675+gf184e89 (git sha: f184e89)

==============================
     Environment Variables
==============================
TORCHINDUCTOR_MAX_AUTOTUNE_POINTWISE=1
NCCL_MIN_NCHANNELS=112
TORCHINDUCTOR_MAX_AUTOTUNE=1
PYTORCH_ROCM_ARCH=gfx942
TORCH_BLAS_PREFER_HIPBLASLT=1
LD_LIBRARY_PATH=/opt/rocm-6.4.1/lib:/usr/local/lib:
VLLM_USE_TRITON_FLASH_ATTN=0
NCCL_CUMEM_ENABLE=0
PYTORCH_NVML_BASED_CUDA_CHECK=1
TORCHINDUCTOR_COMPILE_THREADS=1
CUDA_MODULE_LOADING=LAZY

🐛 Describe the bug

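Running pytest -svvvv tests/kernels/mamba/test_mamba_ssm_ssd.py::test_mamba_chunk_scan_cont_batch aborts with a core dump. According to the trace below, the abort surfaces in torch.cuda.synchronize while the Triton autotuner is benchmarking the kernel launched from chunk_state_varlen.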

Failure output:

:0:rocdevice.cpp            :2991: 1164852989842 us:  Callback: Queue 0x7f56c1800000 aborting with error : HSA_STATUS_ERROR_MEMORY_APERTURE_VIOLATION: The agent attempted to access memory beyond the largest legal address. code: 0x29
Fatal Python error: Aborted

Thread 0x00007f60fbd081c0 (most recent call first):
  File "/usr/local/lib/python3.10/dist-packages/torch/cuda/__init__.py", line 1040 in synchronize
  File "/usr/local/lib/python3.10/dist-packages/triton/testing.py", line 146 in do_bench
  File "/usr/local/lib/python3.10/dist-packages/triton/runtime/autotuner.py", line 170 in _bench
  File "/usr/local/lib/python3.10/dist-packages/triton/runtime/autotuner.py", line 192 in <dictcomp>
  File "/usr/local/lib/python3.10/dist-packages/triton/runtime/autotuner.py", line 192 in run
  File "/usr/local/lib/python3.10/dist-packages/triton/runtime/jit.py", line 348 in <lambda>
  File "/app/upstreamupgradeaiter/mtp-v1/vllm/model_executor/layers/mamba/ops/ssd_chunk_state.py", line 712 in chunk_state_varlen
  File "/app/upstreamupgradeaiter/mtp-v1/vllm/model_executor/layers/mamba/ops/ssd_combined.py", line 155 in _mamba_chunk_scan_combined_fwd
  File "/app/upstreamupgradeaiter/mtp-v1/vllm/model_executor/layers/mamba/ops/ssd_combined.py", line 208 in mamba_chunk_scan_combined
  File "/app/upstreamupgradeaiter/mtp-v1/tests/kernels/mamba/test_mamba_ssm_ssd.py", line 281 in test_mamba_chunk_scan_cont_batch
  File "/usr/local/lib/python3.10/dist-packages/_pytest/python.py", line 156 in pytest_pyfunc_call
  File "/usr/local/lib/python3.10/dist-packages/pluggy/_callers.py", line 121 in _multicall
  File "/usr/local/lib/python3.10/dist-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/usr/local/lib/python3.10/dist-packages/pluggy/_hooks.py", line 512 in __call__
  File "/usr/local/lib/python3.10/dist-packages/_pytest/python.py", line 1670 in runtest
  File "/usr/local/lib/python3.10/dist-packages/_pytest/runner.py", line 178 in pytest_runtest_call
  File "/usr/local/lib/python3.10/dist-packages/pluggy/_callers.py", line 121 in _multicall
  File "/usr/local/lib/python3.10/dist-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/usr/local/lib/python3.10/dist-packages/pluggy/_hooks.py", line 512 in __call__
  File "/usr/local/lib/python3.10/dist-packages/_pytest/runner.py", line 246 in <lambda>
  File "/usr/local/lib/python3.10/dist-packages/_pytest/runner.py", line 344 in from_call
  File "/usr/local/lib/python3.10/dist-packages/_pytest/runner.py", line 245 in call_and_report
  File "/usr/local/lib/python3.10/dist-packages/_pytest/runner.py", line 136 in runtestprotocol
  File "/usr/local/lib/python3.10/dist-packages/_pytest/runner.py", line 117 in pytest_runtest_protocol
  File "/usr/local/lib/python3.10/dist-packages/pluggy/_callers.py", line 121 in _multicall
  File "/usr/local/lib/python3.10/dist-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/usr/local/lib/python3.10/dist-packages/pluggy/_hooks.py", line 512 in __call__
  File "/usr/local/lib/python3.10/dist-packages/_pytest/main.py", line 367 in pytest_runtestloop
  File "/usr/local/lib/python3.10/dist-packages/pluggy/_callers.py", line 121 in _multicall
  File "/usr/local/lib/python3.10/dist-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/usr/local/lib/python3.10/dist-packages/pluggy/_hooks.py", line 512 in __call__
  File "/usr/local/lib/python3.10/dist-packages/_pytest/main.py", line 343 in _main
  File "/usr/local/lib/python3.10/dist-packages/_pytest/main.py", line 289 in wrap_session
  File "/usr/local/lib/python3.10/dist-packages/_pytest/main.py", line 336 in pytest_cmdline_main
  File "/usr/local/lib/python3.10/dist-packages/pluggy/_callers.py", line 121 in _multicall
  File "/usr/local/lib/python3.10/dist-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/usr/local/lib/python3.10/dist-packages/pluggy/_hooks.py", line 512 in __call__
  File "/usr/local/lib/python3.10/dist-packages/_pytest/config/__init__.py", line 175 in main
  File "/usr/local/lib/python3.10/dist-packages/_pytest/config/__init__.py", line 201 in console_main
  File "/usr/local/bin/pytest", line 33 in <module>

Extension modules: numpy._core._multiarray_umath, numpy.linalg._umath_linalg, torch._C, torch._C._dynamo.autograd_compiler, torch._C._dynamo.eval_frame, torch._C._dynamo.guards, torch._C._dynamo.utils, torch._C._fft, torch._C._linalg, torch._C._nested, torch._C._nn, torch._C._sparse, torch._C._special, zstandard.backend_c, charset_normalizer.md, yaml._yaml, PIL._imaging, regex._regex, markupsafe._speedups, sklearn.__check_build._check_build, scipy._lib._ccallback_c, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, scipy.sparse._sparsetools, _csparsetools, scipy.sparse._csparsetools, scipy.linalg._fblas, scipy.linalg._flapack, scipy.linalg.cython_lapack, scipy.linalg._cythonized_array_utils, scipy.linalg._solve_toeplitz, scipy.linalg._decomp_lu_cython, scipy.linalg._matfuncs_sqrtm_triu, scipy.linalg._matfuncs_expm, scipy.linalg._linalg_pythran, scipy.linalg.cython_blas, scipy.linalg._decomp_update, scipy.sparse.linalg._dsolve._superlu, scipy.sparse.linalg._eigen.arpack._arpack, scipy.sparse.linalg._propack._spropack, scipy.sparse.linalg._propack._dpropack, scipy.sparse.linalg._propack._cpropack, scipy.sparse.linalg._propack._zpropack, scipy.sparse.csgraph._tools, scipy.sparse.csgraph._shortest_path, scipy.sparse.csgraph._traversal, scipy.sparse.csgraph._min_spanning_tree, scipy.sparse.csgraph._flow, scipy.sparse.csgraph._matching, scipy.sparse.csgraph._reordering, psutil._psutil_linux, psutil._psutil_posix, scipy.special._ufuncs_cxx, scipy.special._ufuncs, scipy.special._specfun, scipy.special._comb, scipy.special._ellip_harm_2, scipy.spatial._ckdtree, scipy._lib.messagestream, scipy.spatial._qhull, scipy.spatial._voronoi, scipy.spatial._distance_wrap, scipy.spatial._hausdorff, scipy.spatial.transform._rotation, scipy.optimize._group_columns, scipy.optimize._trlib._trlib, scipy.optimize._lbfgsb, _moduleTNC, scipy.optimize._moduleTNC, scipy.optimize._cobyla, scipy.optimize._slsqp, scipy.optimize._minpack, scipy.optimize._lsq.givens_elimination, scipy.optimize._zeros, scipy.optimize._cython_nnls, scipy._lib._uarray._uarray, scipy.linalg._decomp_interpolative, scipy.optimize._bglu_dense, scipy.optimize._lsap, scipy.optimize._direct, scipy.integrate._odepack, scipy.integrate._quadpack, scipy.integrate._vode, scipy.integrate._dop, scipy.integrate._lsoda, scipy.interpolate._fitpack, scipy.interpolate._dfitpack, scipy.interpolate._dierckx, scipy.interpolate._ppoly, scipy.interpolate._interpnd, scipy.interpolate._rbfinterp_pythran, scipy.interpolate._rgi_cython, scipy.interpolate._bspl, scipy.special.cython_special, scipy.stats._stats, scipy.stats._sobol, scipy.stats._qmc_cy, scipy.stats._biasedurn, scipy.stats._stats_pythran, scipy.stats._levy_stable.levyst, scipy.stats._ansari_swilk_statistics, scipy.stats._mvn, scipy.stats._rcont.rcont, scipy.ndimage._nd_image, scipy.ndimage._rank_filter_1d, _ni_label, scipy.ndimage._ni_label, pyarrow.lib, pandas._libs.tslibs.ccalendar, pandas._libs.tslibs.np_datetime, pandas._libs.tslibs.dtypes, pandas._libs.tslibs.base, pandas._libs.tslibs.nattype, pandas._libs.tslibs.timezones, pandas._libs.tslibs.fields, pandas._libs.tslibs.timedeltas, pandas._libs.tslibs.tzconversion, pandas._libs.tslibs.timestamps, pandas._libs.properties, pandas._libs.tslibs.offsets, pandas._libs.tslibs.strptime, pandas._libs.tslibs.parsing, pandas._libs.tslibs.conversion, pandas._libs.tslibs.period, pandas._libs.tslibs.vectorized, pandas._libs.ops_dispatch, pandas._libs.missing, pandas._libs.hashtable, pandas._libs.algos, pandas._libs.interval, pandas._libs.lib, pyarrow._compute, pandas._libs.ops, numexpr.interpreter, pandas._libs.hashing, pandas._libs.arrays, pandas._libs.tslib, pandas._libs.sparse, pandas._libs.internals, pandas._libs.indexing, pandas._libs.index, pandas._libs.writers, pandas._libs.join, pandas._libs.window.aggregations, pandas._libs.window.indexers, pandas._libs.reshape, pandas._libs.groupby, pandas._libs.json, pandas._libs.parsers, pandas._libs.testing, sklearn.utils._isfinite, sklearn.utils.sparsefuncs_fast, sklearn.utils.murmurhash, sklearn.utils._openmp_helpers, sklearn.metrics.cluster._expected_mutual_info_fast, sklearn.preprocessing._csr_polynomial_expansion, sklearn.preprocessing._target_encoder_fast, sklearn.metrics._dist_metrics, sklearn.metrics._pairwise_distances_reduction._datasets_pair, sklearn.utils._cython_blas, sklearn.metrics._pairwise_distances_reduction._base, sklearn.metrics._pairwise_distances_reduction._middle_term_computer, sklearn.utils._heap, sklearn.utils._sorting, sklearn.metrics._pairwise_distances_reduction._argkmin, sklearn.metrics._pairwise_distances_reduction._argkmin_classmode, sklearn.utils._vector_sentinel, sklearn.metrics._pairwise_distances_reduction._radius_neighbors, sklearn.metrics._pairwise_distances_reduction._radius_neighbors_classmode, sklearn.metrics._pairwise_fast, zmq.backend.cython._zmq, PIL._imagingft, hiredis.hiredis, msgspec._core, pybase64._pybase64, multidict._multidict, yarl._quoting_c, propcache._helpers_c, aiohttp._http_writer, aiohttp._http_parser, aiohttp._websocket.mask, aiohttp._websocket.reader_c, frozenlist._frozenlist, hip_utils, __triton_launcher (total: 192)
Aborted
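
Because the abort is only reported at torch.cuda.synchronize, the trace does not by itself show which kernel launch performed the out-of-range access. One way to narrow this down might be to rerun the same test with kernel launches serialized. The following is a minimal sketch, not something taken from the original report: the helper names (build_debug_env, build_command, rerun_serialized) are hypothetical, and it assumes that the ROCm debugging variables HIP_LAUNCH_BLOCKING and AMD_SERIALIZE_KERNEL take effect in this ROCm 6.4 / PyTorch 2.7 build.

```python
import os
import subprocess
import sys

# Test node ID copied from the report.
TEST_ID = (
    "tests/kernels/mamba/test_mamba_ssm_ssd.py"
    "::test_mamba_chunk_scan_cont_batch"
)


def build_debug_env(base_env=None):
    """Return a copy of the environment with kernel launches serialized.

    Assumption: HIP_LAUNCH_BLOCKING=1 and AMD_SERIALIZE_KERNEL=3 are honored
    by the installed ROCm runtime, so a faulting kernel is reported at its
    own launch rather than at a later synchronize call.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["HIP_LAUNCH_BLOCKING"] = "1"
    env["AMD_SERIALIZE_KERNEL"] = "3"
    return env


def build_command(test_id=TEST_ID):
    """Build the pytest invocation used in the report (-svvvv)."""
    return [sys.executable, "-m", "pytest", "-svvvv", test_id]


def rerun_serialized():
    """Hypothetical helper: rerun the failing test with the debug environment.

    A negative return code means the child process was killed by a signal,
    which is what an abort (core dump) looks like from the parent process.
    """
    result = subprocess.run(build_command(), env=build_debug_env(), check=False)
    return result.returncode
```

Calling rerun_serialized() from the vLLM repository root would launch the test in a child process, so the abort does not take down the calling interpreter. Serializing launches slows the run considerably and may change timing-dependent behavior, so a clean pass under these settings would not rule out the bug.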


Labels: bug (Something isn't working), stale (Over 90 days of inactivity)
