[Serve] DeepSeek-R1 mode load stuck in H20 #50975

Open
rainmaple opened this issue Feb 28, 2025 · 7 comments
Labels: bug (Something that is supposed to be working; but isn't), llm, serve (Ray Serve Related Issue)

Comments

@rainmaple

What happened + What you expected to happen

Env

Using vLLM 0.7.2 with the DeepSeek-R1 671B model on CUDA.

nvidia-smi

(base) [root@adbpg-h20-test ~]# nvidia-smi
Thu Feb 27 19:11:35 2025       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.183.06             Driver Version: 535.183.06   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA H20                     Off | 00000000:00:01.0 Off |                    0 |
| N/A   35C    P0             116W / 500W |  320MiB / 97871MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA H20                     Off | 00000000:00:02.0 Off |                    0 |
| N/A   32C    P0             113W / 500W |  320MiB / 97871MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   2  NVIDIA H20                     Off | 00000000:00:03.0 Off |                    0 |
| N/A   35C    P0             117W / 500W |  320MiB / 97871MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   3  NVIDIA H20                     Off | 00000000:00:04.0 Off |                    0 |
| N/A   32C    P0             113W / 500W |  320MiB / 97871MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   4  NVIDIA H20                     Off | 00000000:00:05.0 Off |                    0 |
| N/A   31C    P0             114W / 500W |  320MiB / 97871MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   5  NVIDIA H20                     Off | 00000000:00:06.0 Off |                    0 |
| N/A   35C    P0             120W / 500W |  320MiB / 97871MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   6  NVIDIA H20                     Off | 00000000:00:07.0 Off |                    0 |
| N/A   32C    P0             113W / 500W |  320MiB / 97871MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   7  NVIDIA H20                     Off | 00000000:00:08.0 Off |                    0 |
| N/A   35C    P0             114W / 500W |  320MiB / 97871MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
+---------------------------------------------------------------------------------------+

start

serve run serve-r1:build_app model="/data/DeepSeek-R1" pipeline-parallel-size=1 tensor-parallel-size=8 accelerator="GPU" max-model-len=4096
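
For reference, `serve run` passes the trailing `key=value` pairs to `build_app` as a dict of strings, and the reproduction script below flattens that dict into `--key value` argument strings before handing them to vLLM's argument parser. A minimal standalone sketch of that flattening step follows; the helper name `to_arg_strings` is hypothetical, and the real script additionally removes `accelerator` from the dict before parsing.

```python
from typing import Dict, List


def to_arg_strings(cli_args: Dict[str, str]) -> List[str]:
    """Flatten {"key": "value"} pairs into ["--key", "value", ...]."""
    arg_strings: List[str] = []
    for key, value in cli_args.items():
        arg_strings.extend([f"--{key}", str(value)])
    return arg_strings


# The same arguments as in the command above, minus `accelerator`,
# which the script pops before parsing.
cli_args = {
    "model": "/data/DeepSeek-R1",
    "pipeline-parallel-size": "1",
    "tensor-parallel-size": "8",
    "max-model-len": "4096",
}
print(to_arg_strings(cli_args))
```

Printing the flattened list is a quick way to confirm that every option reaches vLLM with the spelling its parser expects.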

problem

The serve log shows the model being loaded with the Triton MLA backend, and then the deployment hangs. The same setup works with vLLM 0.6.5, but only vLLM 0.7.2 supports DeepSeek-R1, so I switched to 0.7.2.

Versions / Dependencies

  • ray --version
    2025-02-27 19:19:42,359 - INFO - Note: detected 128 virtual cores but NumExpr set to maximum of 64, check "NUMEXPR_MAX_THREADS" environment variable.
    2025-02-27 19:19:42,359 - INFO - Note: NumExpr detected 128 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
    2025-02-27 19:19:42,359 - INFO - NumExpr defaulting to 8 threads.
    ray, version 2.40.0

  • vllm 0.7.2

  • nvcc:
    NVIDIA (R) Cuda compiler driver
    Copyright (c) 2005-2023 NVIDIA Corporation
    Built on Mon_Apr__3_17:16:06_PDT_2023
    Cuda compilation tools, release 12.1, V12.1.105
    Build cuda_12.1.r12.1/compiler.32688072_0

Reproduction script

Code (serve-r1.py):

import os
from typing import Dict, Optional, List
import logging

from fastapi import FastAPI
from starlette.requests import Request
from starlette.responses import StreamingResponse, JSONResponse

from ray import serve

from vllm.engine.arg_utils import AsyncEngineArgs
from vllm.engine.async_llm_engine import AsyncLLMEngine
from vllm.entrypoints.openai.cli_args import make_arg_parser
from vllm.entrypoints.openai.protocol import (
    ChatCompletionRequest,
    ChatCompletionResponse,
    ErrorResponse,
)
from vllm.entrypoints.openai.serving_models import OpenAIServingModels
from vllm.entrypoints.openai.serving_chat import OpenAIServingChat
#from vllm.entrypoints.openai.serving_model import LoRAModulePath, PromptAdapterPath
from vllm.utils import FlexibleArgumentParser
from vllm.entrypoints.logger import RequestLogger
from dataclasses import dataclass

logger = logging.getLogger("ray.serve")
@dataclass
class BaseModelPath:
    name: str
    model_path: str

local_models = [BaseModelPath(name="/data/DeepSeek-R1/", model_path="DeepSeek-R1")]
app = FastAPI()

@serve.deployment(
    autoscaling_config={
        "min_replicas": 1,
        "max_replicas": 10,
        "target_ongoing_requests": 5,
    },
    max_ongoing_requests=10,
)
@serve.ingress(app)
class VLLMDeployment:
    def __init__(
        self,
        engine_args: AsyncEngineArgs,
        response_role: str,
        request_logger: Optional[RequestLogger] = None,
        chat_template: Optional[str] = None,
        chat_template_content_format: Optional[str] = None,
    ):
        logger.info(f"Starting with engine args: {engine_args}")
        #os.environ.pop("CUDA_VISIBLE_DEVICES", None)
        self.openai_serving_chat = None
        self.engine_args = engine_args
        self.response_role = response_role
        self.request_logger = request_logger
        self.chat_template = chat_template
        self.chat_template_content_format = chat_template_content_format
        self.engine = AsyncLLMEngine.from_engine_args(engine_args)


    @app.post("/v1/chat/completions")
    async def create_chat_completion(
        self, request: ChatCompletionRequest, raw_request: Request
    ):
        if not self.openai_serving_chat:
            model_config1 = await self.engine.get_model_config()
            # Determine the name of the served model for the OpenAI client.
            if self.engine_args.served_model_name is not None:
                served_model_names = self.engine_args.served_model_name
            else:
                served_model_names = [self.engine_args.model]
            serving_models = OpenAIServingModels(
                    engine_client=self.engine,
                    model_config=model_config1,
                    base_model_paths=local_models,
                    lora_modules=None,
                    prompt_adapters=None,
                    )
            self.openai_serving_chat = OpenAIServingChat(
                self.engine,
                model_config1,
                serving_models,
                self.response_role,
                request_logger=self.request_logger,
                chat_template=self.chat_template,
                chat_template_content_format=self.chat_template_content_format,
            )
        logger.info(f"Request: {request}")

        generator = await self.openai_serving_chat.create_chat_completion(
            request, raw_request
        )
        if isinstance(generator, ErrorResponse):
            return JSONResponse(
                content=generator.model_dump(), status_code=generator.code
            )
        if request.stream:
            return StreamingResponse(content=generator, media_type="text/event-stream")
        else:
            assert isinstance(generator, ChatCompletionResponse)
            return JSONResponse(content=generator.model_dump())


def parse_vllm_args(cli_args: Dict[str, str]):
    arg_parser = FlexibleArgumentParser(
        description="vLLM OpenAI-Compatible RESTful API server."
    )

    parser = make_arg_parser(arg_parser)
    arg_strings = []
    for key, value in cli_args.items():
        arg_strings.extend([f"--{key}", str(value)])
    logger.info(arg_strings)
    parsed_args = parser.parse_args(args=arg_strings)
    return parsed_args


def build_app(cli_args: Dict[str, str]) -> serve.Application:
    # `accelerator` is not a vLLM argument, so remove it before parsing (defaults to "GPU").
    accelerator = cli_args.pop("accelerator", "GPU")
    parsed_args = parse_vllm_args(cli_args)
    engine_args = AsyncEngineArgs.from_cli_args(parsed_args)
    engine_args.worker_use_ray = True

    tp = engine_args.tensor_parallel_size
    pp = engine_args.pipeline_parallel_size
    logger.info(f"Tensor parallelism = {tp}")
    pg_resources = []
    pg_resources.append({"CPU": 1})  # for the deployment replica
    for i in range(tp*pp):
        pg_resources.append({"GPU": 1, accelerator: 1})  # for the vLLM actors

    # The "STRICT_PACK" strategy would place all vLLM actors on the same Ray node.
    # "SPREAD" is used here instead so the actors can be placed across multiple nodes.
    return VLLMDeployment.options(
        placement_group_bundles=pg_resources, placement_group_strategy="SPREAD"
    ).bind(
        engine_args,
        parsed_args.response_role,
        cli_args.get("request_logger"),
        parsed_args.chat_template,
    )
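
To make the resource request explicit, the bundle list that `build_app` hands to the placement group can be reproduced without Ray or vLLM. The sketch below mirrors the loop in the script; the function name `build_bundles` is hypothetical. Note that when `accelerator` is `"GPU"`, the two keys in the dict literal coincide, so each worker bundle is simply `{"GPU": 1}`.

```python
from typing import Dict, List


def build_bundles(tp: int, pp: int, accelerator: str = "GPU") -> List[Dict[str, int]]:
    """Return one CPU bundle for the replica plus one GPU bundle per vLLM worker."""
    bundles: List[Dict[str, int]] = [{"CPU": 1}]  # for the deployment replica
    for _ in range(tp * pp):
        # If accelerator == "GPU", the duplicate key collapses to {"GPU": 1}.
        bundles.append({"GPU": 1, accelerator: 1})
    return bundles


single_node = build_bundles(tp=8, pp=1)
two_stage = build_bundles(tp=8, pp=2)
print(len(single_node), sum(b.get("GPU", 0) for b in single_node))  # 9 bundles, 8 GPUs
print(len(two_stage), sum(b.get("GPU", 0) for b in two_stage))      # 17 bundles, 16 GPUs
```

With `tensor-parallel-size=8` and `pipeline-parallel-size=2`, the request is for 16 GPUs, which cannot fit on a single 8-GPU node like the one shown in the `nvidia-smi` output.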

Issue Severity

High: It blocks me from completing my task.

@rainmaple added the bug and triage labels on Feb 28, 2025
@jcotant1 added the serve label on Feb 28, 2025
@rainmaple
Author

Addendum:
vLLM 0.7.2 without Ray Serve works without problems.

@rainmaple changed the title from "[<Ray component: Serve>] DeepSeek-R1 mode load hang in H20" to "[<Ray component: Serve>] DeepSeek-R1 mode load stuck in H20" on Feb 28, 2025
@rainmaple
Author

The Log

(pid=391137, ip=172.16.3.243) 2025-02-27 23:02:23.633367: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT [repeated 15x across cluster]
(RayWorkerWrapper pid=752474) WARNING 02-27 23:02:34 triton_decode_attention.py:44] The following error message 'operation scheduled before its operands' can be ignored.
(RayWorkerWrapper pid=391137, ip=172.16.3.243) INFO 02-27 23:02:26 __init__.py:207] Automatically detected platform cuda. [repeated 15x across cluster]
(RayWorkerWrapper pid=391137, ip=172.16.3.243) INFO 02-27 23:02:28 cuda.py:160] Using Triton MLA backend. [repeated 15x across cluster]
(RayWorkerWrapper pid=752474) 2025-02-27 23:02:34,618 - INFO - flashinfer.jit: Prebuilt kernels not found, using JIT backend
(ServeController pid=752460) WARNING 2025-02-27 23:03:02,813 controller 752460 -- Deployment 'VLLMDeployment' in application 'default' has 1 replicas that have taken more than 30s to initialize.
(ServeController pid=752460) This may be caused by a slow __init__ or reconfigure method.
(RayWorkerWrapper pid=391137, ip=172.16.3.243) 2025-02-27 23:02:35,145 - INFO - flashinfer.jit: Prebuilt kernels not found, using JIT backend [repeated 15x across cluster]
(ServeController pid=752460) WARNING 2025-02-27 23:03:32,850 controller 752460 -- Deployment 'VLLMDeployment' in application 'default' has 1 replicas that have taken more than 30s to initialize.
(ServeController pid=752460) This may be caused by a slow __init__ or reconfigure method.
(ServeController pid=752460) WARNING 2025-02-27 23:04:02,872 controller 752460 -- Deployment 'VLLMDeployment' in application 'default' has 1 replicas that have taken more than 30s to initialize.
(ServeController pid=752460) This may be caused by a slow __init__ or reconfigure method.
(ServeController pid=752460) WARNING 2025-02-27 23:04:32,886 controller 752460 -- Deployment 'VLLMDeployment' in application 'default' has 1 replicas that have taken more than 30s to initialize.
(ServeController pid=752460) This may be caused by a slow __init__ or reconfigure method.
(ServeController pid=752460) WARNING 2025-02-27 23:05:02,907 controller 752460 -- Deployment 'VLLMDeployment' in application 'default' has 1 replicas that have taken more than 30s to initialize.
(ServeController pid=752460) This may be caused by a slow __init__ or reconfigure method.
(ServeController pid=752460) WARNING 2025-02-27 23:05:32,940 controller 752460 -- Deployment 'VLLMDeployment' in application 'default' has 1 replicas that have taken more than 30s to initialize.
(ServeController pid=752460) This may be caused by a slow __init__ or reconfigure method.
(ServeController pid=752460) WARNING 2025-02-27 23:06:02,962 controller 752460 -- Deployment 'VLLMDeployment' in application 'default' has 1 replicas that have taken more than 30s to initialize.
(ServeController pid=752460) This may be caused by a slow __init__ or reconfigure method.
(ServeController pid=752460) WARNING 2025-02-27 23:06:32,977 controller 752460 -- Deployment 'VLLMDeployment' in application 'default' has 1 replicas that have taken more than 30s to initialize.
(ServeController pid=752460) This may be caused by a slow __init__ or reconfigure method.
(ServeController pid=752460) WARNING 2025-02-27 23:07:02,979 controller 752460 -- Deployment 'VLLMDeployment' in application 'default' has 1 replicas that have taken more than 30s to initialize.
(ServeController pid=752460) This may be caused by a slow __init__ or reconfigure method.
(ServeController pid=752460) WARNING 2025-02-27 23:07:32,988 controller 752460 -- Deployment 'VLLMDeployment' in application 'default' has 1 replicas that have taken more than 30s to initialize.
(ServeController pid=752460) This may be caused by a slow __init__ or reconfigure method.
(ServeController pid=752460) WARNING 2025-02-27 23:08:02,999 controller 752460 -- Deployment 'VLLMDeployment' in application 'default' has 1 replicas that have taken more than 30s to initialize.
(ServeController pid=752460) This may be caused by a slow __init__ or reconfigure method.
(ServeController pid=752460) WARNING 2025-02-27 23:08:33,003 controller 752460 -- Deployment 'VLLMDeployment' in application 'default' has 1 replicas that have taken more than 30s to initialize.

@rainmaple
Author

rainmaple commented Feb 28, 2025

serve run serve-r1:build_app model="/data/model/DeepSeek-R1" pipeline-parallel-size=2 tensor-parallel-size=8 accelerator="GPU"

The log up to the point where it gets stuck:

(ServeReplica:default:VLLMDeployment pid=154481) INFO 2025-03-04 17:17:25,889 default_VLLMDeployment yy0677wb -- Starting with engine args: AsyncEngineArgs(model='/data/model/DeepSeek-R1', served_model_name=None, tokenizer='/data/model/DeepSeek-R1', task='auto', skip_tokenizer_init=False, tokenizer_mode='auto', trust_remote_code=True, allowed_local_media_path=None, download_dir=None, load_format='auto', config_format=<ConfigFormat.AUTO: 'auto'>, dtype='auto', kv_cache_dtype='auto', seed=0, max_model_len=None, distributed_executor_backend=None, pipeline_parallel_size=2, tensor_parallel_size=8, max_parallel_loading_workers=None, block_size=None, enable_prefix_caching=False, disable_sliding_window=False, use_v2_block_manager=True, swap_space=4, cpu_offload_gb=0, gpu_memory_utilization=0.9, max_num_batched_tokens=None, max_num_seqs=256, max_logprobs=20, disable_log_stats=False, revision=None, code_revision=None, rope_scaling=None, rope_theta=None, hf_overrides=None, tokenizer_revision=None, quantization=None, enforce_eager=False, max_seq_len_to_capture=8192, disable_custom_all_reduce=False, tokenizer_pool_size=0, tokenizer_pool_type='ray', tokenizer_pool_extra_config=None, limit_mm_per_prompt=None, mm_processor_kwargs=None, disable_mm_preprocessor_cache=False, enable_lora=False, enable_lora_bias=False, max_loras=1, max_lora_rank=16, enable_prompt_adapter=False, max_prompt_adapters=1, max_prompt_adapter_token=0, fully_sharded_loras=False, lora_extra_vocab_size=256, long_lora_scaling_factors=None, lora_dtype='auto', max_cpu_loras=None, device='auto', num_scheduler_steps=1, multi_step_stream_outputs=True, ray_workers_use_nsight=False, num_gpu_blocks_override=None, num_lookahead_slots=0, model_loader_extra_config=None, ignore_patterns=[], preemption_mode=None, scheduler_delay_factor=0.0, enable_chunked_prefill=None, guided_decoding_backend='xgrammar', logits_processor_pattern=None, speculative_model=None, speculative_model_quantization=None, 
speculative_draft_tensor_parallel_size=None, num_speculative_tokens=None, speculative_disable_mqa_scorer=False, speculative_max_model_len=None, speculative_disable_by_batch_size=None, ngram_prompt_lookup_max=None, ngram_prompt_lookup_min=None, spec_decoding_acceptance_method='rejection_sampler', typical_acceptance_sampler_posterior_threshold=None, typical_acceptance_sampler_posterior_alpha=None, qlora_adapter_name_or_path=None, disable_logprobs_during_spec_decoding=None, otlp_traces_endpoint=None, collect_detailed_traces=None, disable_async_output_proc=False, scheduling_policy='fcfs', override_neuron_config=None, override_pooler_config=None, compilation_config=None, worker_cls='auto', kv_transfer_config=None, generation_config=None, override_generation_config=None, enable_sleep_mode=False, model_impl='auto', calculate_kv_scales=False, disable_log_requests=False)
(RayWorkerWrapper pid=4192, ip=172.16.3.243) adbpg-h20-test-2:4192:4192 [5] NCCL INFO cudaDriverVersion 12020 [repeated 13x across cluster]
(RayWorkerWrapper pid=4192, ip=172.16.3.243) adbpg-h20-test-2:4192:4192 [5] NCCL INFO Bootstrap : Using eth0:172.16.3.243<0> [repeated 13x across cluster]
(RayWorkerWrapper pid=4192, ip=172.16.3.243) adbpg-h20-test-2:4192:4192 [5] NCCL INFO NET/Plugin: No plugin found (libnccl-net.so) [repeated 13x across cluster]
(RayWorkerWrapper pid=4192, ip=172.16.3.243) adbpg-h20-test-2:4192:4192 [5] NCCL INFO NET/Plugin: Plugin load returned 2 : libnccl-net.so: cannot open shared object file: No such file or directory : when loading libnccl-net.so [repeated 13x across cluster]
(RayWorkerWrapper pid=4192, ip=172.16.3.243) adbpg-h20-test-2:4192:4192 [5] NCCL INFO NET/Plugin: Using internal network plugin. [repeated 13x across cluster]
(RayWorkerWrapper pid=154466) adbpg-h20-test:154466:154466 [6] NCCL INFO Channel 23/0 : 6[6] -> 7[7] v [repeated 4x across cluster]
(RayWorkerWrapper pid=4189, ip=172.16.3.243) adbpg-h20-test-2:4189:4189 [0] NCCL INFO Connected all rings [repeated 6x across cluster]
(RayWorkerWrapper pid=4192, ip=172.16.3.243) adbpg-h20-test-2:4192:4192 [5] NCCL INFO Connected all trees [repeated 7x across cluster]
(RayWorkerWrapper pid=4192, ip=172.16.3.243) adbpg-h20-test-2:4192:4192 [5] NCCL INFO NVLS comm 0x105c64e0 headRank 5 nHeads 8 buffSize 1048576 memSize 2097152 nvlsPerRankSize 150994944 nvlsTotalSize 1207959552 [repeated 7x across cluster]
(RayWorkerWrapper pid=4192, ip=172.16.3.243) adbpg-h20-test-2:4192:4192 [5] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512 [repeated 7x across cluster]
(RayWorkerWrapper pid=4192, ip=172.16.3.243) adbpg-h20-test-2:4192:4192 [5] NCCL INFO 24 coll channels, 24 collnet channels, 16 nvls channels, 32 p2p channels, 32 p2p channels per peer [repeated 7x across cluster]
(RayWorkerWrapper pid=4192, ip=172.16.3.243) adbpg-h20-test-2:4192:4192 [5] NCCL INFO TUNER/Plugin: Plugin load returned 11 : libnccl-net.so: cannot open shared object file: No such file or directory : when loading libnccl-tuner.so [repeated 7x across cluster]
(RayWorkerWrapper pid=4192, ip=172.16.3.243) adbpg-h20-test-2:4192:4192 [5] NCCL INFO TUNER/Plugin: Using internal tuner plugin. [repeated 7x across cluster]
(RayWorkerWrapper pid=4192, ip=172.16.3.243) adbpg-h20-test-2:4192:4192 [5] NCCL INFO ncclCommInitRank comm 0x105c64e0 rank 5 nranks 8 cudaDev 5 nvmlDev 5 busId 60 commId 0x3a381d08a49786ca - Init COMPLETE [repeated 7x across cluster]
(ServeReplica:default:VLLMDeployment pid=153450) INFO 03-04 17:16:41 config.py:137] Replacing legacy 'type' key with 'rope_type'
(ServeReplica:default:VLLMDeployment pid=153450) INFO 03-04 17:16:50 config.py:542] This model supports multiple tasks: {'embed', 'generate', 'classify', 'score', 'reward'}. Defaulting to 'generate'.
(ServeReplica:default:VLLMDeployment pid=153450) INFO 03-04 17:16:51 config.py:1401] Defaulting to use ray for distributed inference
(ServeReplica:default:VLLMDeployment pid=153450) WARNING 03-04 17:16:51 arg_utils.py:1135] Chunked prefill is enabled by default for models with max_model_len > 32K. Currently, chunked prefill might not work with some features or models. If you encounter any issues, please disable chunked prefill by setting --enable-chunked-prefill=False.
(ServeReplica:default:VLLMDeployment pid=153450) INFO 03-04 17:16:51 config.py:1556] Chunked prefill is enabled with max_num_batched_tokens=2048.
(ServeReplica:default:VLLMDeployment pid=153450) WARNING 03-04 17:16:51 config.py:669] Async output processing can not be enabled with pipeline parallel
(ServeReplica:default:VLLMDeployment pid=153450) WARNING 03-04 17:16:51 fp8.py:52] Detected fp8 checkpoint. Please note that the format is experimental and subject to change.
(ServeReplica:default:VLLMDeployment pid=153450) INFO 03-04 17:16:51 config.py:3275] MLA is enabled; forcing chunked prefill and prefix caching to be disabled.
(ServeReplica:default:VLLMDeployment pid=153450) INFO 03-04 17:16:51 llm_engine.py:234] Initializing a V0 LLM engine (v0.7.2) with config: model='/data/model/DeepSeek-R1', speculative_config=None, tokenizer='/data/model/DeepSeek-R1', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=163840, download_dir=None, load_format=auto, tensor_parallel_size=8, pipeline_parallel_size=2, disable_custom_all_reduce=False, quantization=fp8, enforce_eager=False, kv_cache_dtype=auto,  device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='xgrammar'), observability_config=ObservabilityConfig(otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=0, served_model_name=/data/model/DeepSeek-R1, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=False, chunked_prefill_enabled=False, use_async_output_proc=False, disable_mm_preprocessor_cache=False, mm_processor_kwargs=None, pooler_config=None, compilation_config={"splitting_ops":[],"compile_sizes":[],"cudagraph_capture_sizes":[256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"max_capture_size":256}, use_cached_outputs=False, 
(ServeReplica:default:VLLMDeployment pid=153450) INFO 03-04 17:16:51 ray_distributed_executor.py:149] use_ray_spmd_worker: False
(pid=5081, ip=172.16.3.243) INFO 03-04 17:16:57 __init__.py:190] Automatically detected platform cuda.
(pid=154809) INFO 03-04 17:17:02 __init__.py:190] Automatically detected platform cuda. [repeated 9x across cluster]
(ServeReplica:default:VLLMDeployment pid=153450) INFO 03-04 17:17:07 cuda.py:161] Using Triton MLA backend.
(ServeReplica:default:VLLMDeployment pid=153450) WARNING 03-04 17:17:07 triton_decode_attention.py:44] The following error message 'operation scheduled before its operands' can be ignored.
(pid=155060) INFO 03-04 17:17:02 __init__.py:190] Automatically detected platform cuda. [repeated 6x across cluster]
(RayWorkerWrapper pid=5080, ip=172.16.3.243) INFO 03-04 17:17:07 cuda.py:161] Using Triton MLA backend. [repeated 15x across cluster]
(RayWorkerWrapper pid=5080, ip=172.16.3.243) WARNING 03-04 17:17:07 triton_decode_attention.py:44] The following error message 'operation scheduled before its operands' can be ignored. [repeated 15x across cluster]
(ServeReplica:default:VLLMDeployment pid=154481) INFO 03-04 17:17:24 __init__.py:190] Automatically detected platform cuda.
(ServeReplica:default:VLLMDeployment pid=154481) INFO 03-04 17:17:25 config.py:137] Replacing legacy 'type' key with 'rope_type'
(ServeReplica:default:VLLMDeployment pid=154481) INFO 03-04 17:17:33 config.py:542] This model supports multiple tasks: {'generate', 'reward', 'classify', 'embed', 'score'}. Defaulting to 'generate'.
(ServeReplica:default:VLLMDeployment pid=154481) INFO 03-04 17:17:34 config.py:1401] Defaulting to use ray for distributed inference
(ServeReplica:default:VLLMDeployment pid=154481) WARNING 03-04 17:17:34 arg_utils.py:1135] Chunked prefill is enabled by default for models with max_model_len > 32K. Currently, chunked prefill might not work with some features or models. If you encounter any issues, please disable chunked prefill by setting --enable-chunked-prefill=False.
(ServeReplica:default:VLLMDeployment pid=154481) INFO 03-04 17:17:34 config.py:1556] Chunked prefill is enabled with max_num_batched_tokens=2048.
(ServeReplica:default:VLLMDeployment pid=154481) WARNING 03-04 17:17:34 config.py:669] Async output processing can not be enabled with pipeline parallel
(ServeReplica:default:VLLMDeployment pid=154481) SIGTERM handler is not set because current thread is not the main thread.
(ServeReplica:default:VLLMDeployment pid=154481) Connecting to existing Ray cluster at address: 172.16.3.241:6379...
(ServeReplica:default:VLLMDeployment pid=154481) Calling ray.init() again after it has already been called.
(ServeReplica:default:VLLMDeployment pid=154481) SIGTERM handler is not set because current thread is not the main thread.
(ServeReplica:default:VLLMDeployment pid=154481) Connecting to existing Ray cluster at address: 172.16.3.241:6379...
(ServeReplica:default:VLLMDeployment pid=154481) Calling ray.init() again after it has already been called.
(pid=155233) 2025-03-04 17:17:38.699291: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
(pid=154590) 2025-03-04 17:17:38.744104: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
(pid=154590) 2025-03-04 17:17:38.744130: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
(pid=154590) 2025-03-04 17:17:38.745127: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
(pid=154590) 2025-03-04 17:17:38.750880: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
(pid=154590) To enable the following instructions: AVX2 AVX512F AVX512_VNNI AVX512_BF16 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
(pid=155233) 2025-03-04 17:17:39.716617: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
(base) [root@adbpg-h20-test ~]# serve status
proxies:
  db01d89de07c2e5bdec70c9a163d8f70543ede8b5322f5e449f4f3f4: HEALTHY
applications:
  default:
    status: DEPLOYING
    message: ''
    last_deployed_time_s: 1741079754.8409045
    deployments:
      VLLMDeployment:
        status: UPDATING
        status_trigger: CONFIG_UPDATE_STARTED
        replica_states:
          STARTING: 1
        message: ''
target_capacity: null
(base) [root@adbpg-h20-test ~]# 
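
The gdb backtrace of pid 154590 shows the process sleeping inside the `c10d::TCPStore` constructor, in the client-side socket connect path (`TCPClient::connect` calling `SocketConnectOp::tryConnect`). One way to narrow this down, assuming the store's host and port can be found from the worker's environment or logs, is to check from the stuck node whether that endpoint accepts TCP connections at all. The sketch below uses only the standard library; the helper name `can_connect` and the host and port in the usage lines are placeholders, not values taken from this deployment.

```python
import socket


def can_connect(host: str, port: int, timeout_s: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout_s."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Placeholder endpoint: replace with the rendezvous host and port
    # that the stuck worker is trying to reach.
    print(can_connect("127.0.0.1", 29500))
```

If the check fails from one node but succeeds from another, firewall rules or the choice of network interface between the nodes are worth examining before looking at vLLM or Ray Serve.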

gdb backtrace of pid=154590:

(gdb) bt
#0  0x00007feeabee0e05 in nanosleep () from /usr/lib64/libpthread.so.0
#1  0x00007fcc85229257 in c10d::detail::(anonymous namespace)::SocketConnectOp::tryConnect(int) () from /home/ray/anaconda3/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so
#2  0x00007fcc85229f36 in c10d::detail::(anonymous namespace)::SocketConnectOp::run() [clone .constprop.0] ()
   from /home/ray/anaconda3/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so
#3  0x00007fcc8522a3a4 in c10d::detail::Socket::connect(std::string const&, unsigned short, c10d::detail::SocketOptions const&) ()
   from /home/ray/anaconda3/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so
#4  0x00007fcc851e8016 in c10d::detail::TCPClient::connect(c10d::detail::SocketAddress const&, c10d::TCPStoreOptions const&, std::shared_ptr<c10d::Backoff>) ()
   from /home/ray/anaconda3/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so
#5  0x00007fcc851eaf7c in c10d::TCPStore::TCPStore(std::string, c10d::TCPStoreOptions const&) () from /home/ray/anaconda3/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so
#6  0x00007fccb179b885 in pybind11::cpp_function::initialize<pybind11::detail::initimpl::factory<torch::distributed::c10d::(anonymous namespace)::c10d_init(_object*, _object*)::{lambda(std::string const&, unsigned short, std::optional<int>, bool, std::chrono::duration<long, std::ratio<1l, 1000l> >, bool, bool, std::optional<int>, bool)#38}, pybind11::detail::void_type (*)(), c10::intrusive_ptr<c10d::TCPStore, pybind11::detail::void_type (*)()::detail::intrusive_target_default_null_type<c10d> > (std::string const&, unsigned short, std::optional<int>, bool, std::chrono::duration<long, std::ratio<1l, 1000l> >, bool, bool, std::optional<int>, bool), pybind11::detail::void_type>::execute<pybind11::class_<c10d, pybind11::detail::void_type (*)()::detail::intrusive_target_default_null_type<c10d> >, pybind11::arg, pybind11::class_<c10d, pybind11::detail::void_type (*)()::detail::intrusive_target_default_null_type<c10d> >, pybind11::arg_v, pybind11::arg, pybind11::arg, pybind11::arg, pybind11::arg, pybind11::arg, pybind11::arg, pybind11::call_guard<pybind11::gil_scoped_release> >(pybind11::class_<c10d, pybind11::detail::void_type (*)()::detail::intrusive_target_default_null_type<c10d> >&, pybind11::arg const&, pybind11::class_<c10d, pybind11::detail::void_type (*)()::detail::intrusive_target_default_null_type<c10d> > const&, pybind11::arg_v const&, pybind11::arg const&, pybind11::arg const&, pybind11::arg const&, pybind11::arg const&, pybind11::arg const&, pybind11::arg const&, pybind11::call_guard<pybind11::gil_scoped_release> const&) &&::{lambda(pybind11::detail::value_and_holder&, std::string const&, unsigned short, std::optional<int>, bool, std::chrono::duration<long, std::ratio<1l, 1000l> >, bool, bool, std::optional<int>, bool)#1}, void, pybind11::detail::value_and_holder, std::string const&, unsigned short, std::optional<int>, bool, std::chrono::duration<long, std::ratio<1l, 1000l> >, bool, bool, std::optional<int>, bool, pybind11::name, pybind11::is_method, 
pybind11::sibling, pybind11::detail::is_new_style_constructor, pybind11::class_<c10d, pybind11::detail::void_type (*)()::detail::intrusive_target_default_null_type<c10d> >, pybind11::class_<c10d, pybind11::detail::void_type (*)()::detail::intrusive_target_default_null_type<c10d> >, pybind11::arg, pybind11::arg, pybind11::arg, pybind11::arg, pybind11::arg, pybind11::arg, pybind11::arg, pybind11::gil_scoped_release>(pybind11::call_guard<pybind11::gil_scoped_release>&&, void (*)(pybind11::detail::value_and_holder, std::string const&, unsigned short, std::optional<int>, bool, std::chrono::duration<long, std::ratio<1l, 1000l> >, bool, bool, std::optional<int>, bool), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&, pybind11::detail::is_new_style_constructor const&, pybind11::class_<c10d, pybind11::detail::void_type (*)()::detail::intrusive_target_default_null_type<c10d> > const&, pybind11::class_<c10d, pybind11::detail::void_type (*)()::detail::intrusive_target_default_null_type<c10d> > const&, pybind11::arg const&, pybind11::arg const&, pybind11::arg const&, pybind11::arg const&, pybind11::arg const&, pybind11::arg const&, pybind11::arg const&, pybind11::gil_scoped_release const&)::{lambda(pybind11::detail::function_call&)#3}::_FUN ()
   from /home/ray/anaconda3/lib/python3.9/site-packages/torch/lib/libtorch_python.so
#7  0x00007fccb0ecb9ff in pybind11::cpp_function::dispatcher(_object*, _object*, _object*) () from /home/ray/anaconda3/lib/python3.9/site-packages/torch/lib/libtorch_python.so
#8  0x0000000000507767 in cfunction_call (func=0x7fee383b2a90, args=<optimized out>, kwargs=<optimized out>)
    at /croot/python-split_1733933781842/_build_env/x86_64-conda-linux-gnu/sysroot/usr/include/bits/descrobject.c:543
#9  0x00000000004f077c in _PyObject_MakeTpCall (tstate=0xdcb0a0, callable=0x7fee383b2a90, args=<optimized out>, nargs=<optimized out>, keywords=0x7fcb50fc7400)
    at /usr/local/src/conda/python-3.9.21/Modules/condvar.h:3876
#10 0x0000000000505690 in _PyObject_VectorcallTstate (kwnames=0x7fcb50fc7400, nargsf=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, 
    args=0x7fcb75f23a70, callable=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, 
    tstate=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>) at /usr/local/src/conda/python-3.9.21/Objects/pycore_pyerrors.h:116
#11 _PyObject_VectorcallTstate (kwnames=0x7fcb50fc7400, kwnames@entry=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, nargsf=<optimized out>, 
    args=0x7fcb75f23a70, args@entry=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, callable=0x7fee383b2a90, 
    callable@entry=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, tstate=0xdcb0a0, 
    tstate@entry=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>) at /usr/local/src/conda/python-3.9.21/Objects/pycore_pyerrors.h:103
#12 method_vectorcall (method=method@entry=0x7fcb75f23e40, args=args@entry=0x7fcb75f23a78, nargsf=<optimized out>, kwnames=0x7fcb50fc7400)
    at /usr/local/src/conda/python-3.9.21/Programs/_functoolsmodule.c:53
#13 0x0000000000505e37 in PyVectorcall_Call (callable=0x7fcb75f23e40, tuple=<optimized out>, kwargs=<optimized out>) at /usr/local/src/conda/python-3.9.21/Include/abstract.h:243
#14 0x0000000000502ae2 in slot_tp_init (self=<optimized out>, args=0x7fcb50fd2ae0, kwds=0x7fcb75f26680) at /usr/local/src/conda/python-3.9.21/Modules/condvar.h:6974
#15 0x00000000004f0c30 in type_call (type=<optimized out>, args=0x7fcb50fd2ae0, kwds=0x7fcb75f26680) at /usr/local/src/conda/python-3.9.21/Modules/condvar.h:3876
#16 0x00007fccb0eca2db in pybind11_meta_call () from /home/ray/anaconda3/lib/python3.9/site-packages/torch/lib/libtorch_python.so
#17 0x00000000004f077c in _PyObject_MakeTpCall (tstate=0xdcb0a0, callable=0x5b89390, args=<optimized out>, nargs=<optimized out>, keywords=0x7fee3830b500)
    at /usr/local/src/conda/python-3.9.21/Modules/condvar.h:3876
#18 0x00000000004ece2c in _PyObject_VectorcallTstate (kwnames=0x7fee3830b500, nargsf=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, 
    args=<optimized out>, callable=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, 
    tstate=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>) at /usr/local/src/conda/python-3.9.21/Objects/pycore_pyerrors.h:116
#19 _PyObject_VectorcallTstate (kwnames=0x7fee3830b500, nargsf=<optimized out>, args=<optimized out>, callable=<optimized out>, tstate=0xdcb0a0)
    at /usr/local/src/conda/python-3.9.21/Objects/pycore_pyerrors.h:103
#20 PyObject_Vectorcall (kwnames=0x7fee3830b500, nargsf=<optimized out>, args=<optimized out>, callable=<optimized out>)
    at /usr/local/src/conda/python-3.9.21/Objects/pycore_pyerrors.h:127
#21 call_function (kwnames=0x7fee3830b500, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=<optimized out>) at /usr/local/src/conda/python-3.9.21/Modules/ceval_gil.h:5077
#22 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=0xe6d2980, throwflag=<optimized out>) at /usr/local/src/conda/python-3.9.21/Modules/ceval_gil.h:3537
#23 0x00000000004e6afa in _PyEval_EvalFrame (throwflag=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, 
    f=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, 
    tstate=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>)
    at /croot/python-split_1733933781842/_build_env/x86_64-conda-linux-gnu/sysroot/usr/include/bits/frameobject.c:40
#24 _PyEval_EvalCode (tstate=<optimized out>, _co=<optimized out>, globals=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=<optimized out>, kwnames=0x0, 
    kwargs=0xde5b430, kwcount=<optimized out>, kwstep=1, defs=0x7fee38309778, defcount=<optimized out>, kwdefs=0x0, closure=0x0, name=0x7fee3830a710, qualname=0x7fee3830a710)
    at /usr/local/src/conda/python-3.9.21/Modules/ceval_gil.h:4329
#25 0x00000000004f7ef5 in _PyFunction_Vectorcall (func=<optimized out>, stack=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>)
    at /usr/local/src/conda/python-3.9.21/Include/abstract.h:396
#26 0x00000000004e7dd5 in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>, args=0xde5b400, callable=0x7fee383049d0, tstate=0xdcb0a0)
    at /usr/local/src/conda/python-3.9.21/Objects/pycore_pyerrors.h:118
#27 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0xde5b400, callable=0x7fee383049d0) at /usr/local/src/conda/python-3.9.21/Objects/pycore_pyerrors.h:127
#28 call_function (kwnames=0x0, oparg=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, pp_stack=<synthetic pointer>, tstate=0xdcb0a0)
    at /usr/local/src/conda/python-3.9.21/Modules/ceval_gil.h:5077
#29 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=0xde5b240, throwflag=<optimized out>) at /usr/local/src/conda/python-3.9.21/Modules/ceval_gil.h:3520
#30 0x000000000050bf0c in _PyEval_EvalFrame (throwflag=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, 
    f=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, 
    tstate=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>)
    at /croot/python-split_1733933781842/_build_env/x86_64-conda-linux-gnu/sysroot/usr/include/bits/frameobject.c:40
#31 gen_send_ex (gen=0x7fcb6efcc510, arg=<optimized out>, exc=<optimized out>, closing=<optimized out>)
    at /croot/python-split_1733933781842/_build_env/x86_64-conda-linux-gnu/sysroot/usr/include/bits/cellobject.c:215
#32 0x0000000000574090 in gen_iternext (gen=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>)
    at /croot/python-split_1733933781842/_build_env/x86_64-conda-linux-gnu/sysroot/usr/include/bits/cellobject.c:547
#33 builtin_next (self=<optimized out>, args=args@entry=0xd535da8, nargs=1) at /usr/local/src/conda/python-3.9.21/Modules/getplatform.c:1382
#34 0x00000000004f8974 in cfunction_vectorcall_FASTCALL (func=0x7feea55c8540, args=0xd535da8, nargsf=<optimized out>, kwnames=<optimized out>)
    at /croot/python-split_1733933781842/_build_env/x86_64-conda-linux-gnu/sysroot/usr/include/bits/descrobject.c:430
#35 0x00000000004e7dd5 in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>, args=0xd535da8, callable=0x7feea55c8540, tstate=0xdcb0a0)
    at /usr/local/src/conda/python-3.9.21/Objects/pycore_pyerrors.h:118
#36 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0xd535da8, callable=0x7feea55c8540) at /usr/local/src/conda/python-3.9.21/Objects/pycore_pyerrors.h:127
#37 call_function (kwnames=0x0, oparg=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, pp_stack=<synthetic pointer>, tstate=0xdcb0a0)
    at /usr/local/src/conda/python-3.9.21/Modules/ceval_gil.h:5077
#38 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=0xd535bc0, throwflag=<optimized out>) at /usr/local/src/conda/python-3.9.21/Modules/ceval_gil.h:3520
#39 0x00000000004e6afa in _PyEval_EvalFrame (throwflag=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, 
    f=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, 
    tstate=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>)
    at /croot/python-split_1733933781842/_build_env/x86_64-conda-linux-gnu/sysroot/usr/include/bits/frameobject.c:40
#40 _PyEval_EvalCode (tstate=<optimized out>, _co=<optimized out>, globals=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=<optimized out>, 
    kwnames=0x7fcb6ef8e9b8, kwargs=0x7fcb6ef93338, kwcount=<optimized out>, kwstep=1, defs=0x7fee382fdbb8, defcount=<optimized out>, kwdefs=0x0, closure=0x0, name=0x7fee3834a5d0, 
    qualname=0x7fee3834a5d0) at /usr/local/src/conda/python-3.9.21/Modules/ceval_gil.h:4329
#41 0x00000000004f7ef5 in _PyFunction_Vectorcall (func=<optimized out>, stack=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>)
    at /usr/local/src/conda/python-3.9.21/Include/abstract.h:396
#42 0x0000000000505bb4 in PyVectorcall_Call (kwargs=<optimized out>, tuple=<optimized out>, callable=0x7fee3831a700) at /usr/local/src/conda/python-3.9.21/Include/abstract.h:243
#43 _PyObject_Call (kwargs=<optimized out>, args=<optimized out>, callable=0x7fee3831a700, 
    tstate=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>) at /usr/local/src/conda/python-3.9.21/Include/abstract.h:266
#44 PyObject_Call (callable=0x7fee3831a700, args=<optimized out>, kwargs=<optimized out>) at /usr/local/src/conda/python-3.9.21/Include/abstract.h:293
#45 0x00000000004ebea1 in do_call_core (kwdict=0x7fcb75f252c0, callargs=0x7feea55f9040, func=0x7fee3831a700, tstate=<optimized out>)
    at /usr/local/src/conda/python-3.9.21/Modules/ceval_gil.h:5125
#46 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=0x7fcb889047c0, throwflag=<optimized out>) at /usr/local/src/conda/python-3.9.21/Modules/ceval_gil.h:3582
#47 0x00000000004e6afa in _PyEval_EvalFrame (throwflag=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, 
    f=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, 
    tstate=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>)
    at /croot/python-split_1733933781842/_build_env/x86_64-conda-linux-gnu/sysroot/usr/include/bits/frameobject.c:40
#48 _PyEval_EvalCode (tstate=<optimized out>, _co=<optimized out>, globals=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=<optimized out>, 
    kwnames=0x7fcb6ea2d558, kwargs=0x7fcb6ef93578, kwcount=<optimized out>, kwstep=1, defs=0x0, defcount=<optimized out>, kwdefs=0x0, closure=0x7fee383130d0, name=0x7fee3834a5d0, 
    qualname=0x7fee3834a5d0) at /usr/local/src/conda/python-3.9.21/Modules/ceval_gil.h:4329
#49 0x00000000004f7ef5 in _PyFunction_Vectorcall (func=<optimized out>, stack=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>)
    at /usr/local/src/conda/python-3.9.21/Include/abstract.h:396
#50 0x0000000000505bb4 in PyVectorcall_Call (kwargs=<optimized out>, tuple=<optimized out>, callable=0x7fee3831a790) at /usr/local/src/conda/python-3.9.21/Include/abstract.h:243
#51 _PyObject_Call (kwargs=<optimized out>, args=<optimized out>, callable=0x7fee3831a790, 
    tstate=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>) at /usr/local/src/conda/python-3.9.21/Include/abstract.h:266
#52 PyObject_Call (callable=0x7fee3831a790, args=<optimized out>, kwargs=<optimized out>) at /usr/local/src/conda/python-3.9.21/Include/abstract.h:293
#53 0x00000000004ebea1 in do_call_core (kwdict=0x7fcb70dcd100, callargs=0x7feea55f9040, func=0x7fee3831a790, tstate=<optimized out>)
    at /usr/local/src/conda/python-3.9.21/Modules/ceval_gil.h:5125
#54 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=0x7fcb70e73440, throwflag=<optimized out>) at /usr/local/src/conda/python-3.9.21/Modules/ceval_gil.h:3582
#55 0x00000000004e6afa in _PyEval_EvalFrame (throwflag=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, 
    f=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, 
    tstate=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>)
    at /croot/python-split_1733933781842/_build_env/x86_64-conda-linux-gnu/sysroot/usr/include/bits/frameobject.c:40
#56 _PyEval_EvalCode (tstate=<optimized out>, _co=<optimized out>, globals=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=<optimized out>, 
    kwnames=0x7fccd00d95f8, kwargs=0x7fcb6ef3af70, kwcount=<optimized out>, kwstep=1, defs=0x0, defcount=<optimized out>, kwdefs=0x0, closure=0x7fee38313400, name=0x7fee3834a5d0, 
    qualname=0x7fee3834a5d0) at /usr/local/src/conda/python-3.9.21/Modules/ceval_gil.h:4329
#57 0x00000000004f7ef5 in _PyFunction_Vectorcall (func=<optimized out>, stack=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>)
    at /usr/local/src/conda/python-3.9.21/Include/abstract.h:396
#58 0x00000000004e8c42 in _PyObject_VectorcallTstate (kwnames=0x7fccd00d95e0, nargsf=<optimized out>, args=<optimized out>, callable=0x7fee3831a820, tstate=0xdcb0a0)
    at /usr/local/src/conda/python-3.9.21/Objects/pycore_pyerrors.h:118
#59 PyObject_Vectorcall (kwnames=0x7fccd00d95e0, nargsf=<optimized out>, args=<optimized out>, callable=0x7fee3831a820)
    at /usr/local/src/conda/python-3.9.21/Objects/pycore_pyerrors.h:127
#60 call_function (kwnames=0x7fccd00d95e0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=<optimized out>) at /usr/local/src/conda/python-3.9.21/Modules/ceval_gil.h:5077
#61 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=0x7fcb6ef3add0, throwflag=<optimized out>) at /usr/local/src/conda/python-3.9.21/Modules/ceval_gil.h:3537
#62 0x00000000004e6afa in _PyEval_EvalFrame (throwflag=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, 
    f=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, 
    tstate=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>)
    at /croot/python-split_1733933781842/_build_env/x86_64-conda-linux-gnu/sysroot/usr/include/bits/frameobject.c:40
#63 _PyEval_EvalCode (tstate=<optimized out>, _co=<optimized out>, globals=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=<optimized out>, kwnames=0x0, 
    kwargs=0x7fcb6e8dab08, kwcount=<optimized out>, kwstep=1, defs=0x7fccd01c7eb8, defcount=<optimized out>, kwdefs=0x0, closure=0x0, name=0x7fccd00d9620, qualname=0x7fccd00d9620)
    at /usr/local/src/conda/python-3.9.21/Modules/ceval_gil.h:4329
#64 0x00000000004f7ef5 in _PyFunction_Vectorcall (func=<optimized out>, stack=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>)
    at /usr/local/src/conda/python-3.9.21/Include/abstract.h:396
#65 0x00000000004e7dd5 in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>, args=0x7fcb6e8daae8, callable=0x7fcbca118700, tstate=0xdcb0a0)
    at /usr/local/src/conda/python-3.9.21/Objects/pycore_pyerrors.h:118
#66 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0x7fcb6e8daae8, callable=0x7fcbca118700) at /usr/local/src/conda/python-3.9.21/Objects/pycore_pyerrors.h:127
#67 call_function (kwnames=0x0, oparg=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, pp_stack=<synthetic pointer>, tstate=0xdcb0a0)
    at /usr/local/src/conda/python-3.9.21/Modules/ceval_gil.h:5077
#68 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=0x7fcb6e8da950, throwflag=<optimized out>) at /usr/local/src/conda/python-3.9.21/Modules/ceval_gil.h:3520
#69 0x00000000004e6afa in _PyEval_EvalFrame (throwflag=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, 
    f=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, 
    tstate=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>)
    at /croot/python-split_1733933781842/_build_env/x86_64-conda-linux-gnu/sysroot/usr/include/bits/frameobject.c:40
#70 _PyEval_EvalCode (tstate=<optimized out>, _co=<optimized out>, globals=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=<optimized out>, kwnames=0x0, 
    kwargs=0x7fcb6eff49d8, kwcount=<optimized out>, kwstep=1, defs=0x7fcb6f1e2d58, defcount=<optimized out>, kwdefs=0x0, closure=0x0, name=0x7fcb6ef891b0, qualname=0x7fcb6ef891b0)
    at /usr/local/src/conda/python-3.9.21/Modules/ceval_gil.h:4329
#71 0x00000000004f7ef5 in _PyFunction_Vectorcall (func=<optimized out>, stack=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>)
    at /usr/local/src/conda/python-3.9.21/Include/abstract.h:396
#72 0x00000000004e7dd5 in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>, args=0x7fcb6eff49b8, callable=0x7fcb6eff2430, tstate=0xdcb0a0)
    at /usr/local/src/conda/python-3.9.21/Objects/pycore_pyerrors.h:118
#73 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0x7fcb6eff49b8, callable=0x7fcb6eff2430) at /usr/local/src/conda/python-3.9.21/Objects/pycore_pyerrors.h:127
#74 call_function (kwnames=0x0, oparg=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, pp_stack=<synthetic pointer>, tstate=0xdcb0a0)
    at /usr/local/src/conda/python-3.9.21/Modules/ceval_gil.h:5077
#75 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=0x7fcb6eff4840, throwflag=<optimized out>) at /usr/local/src/conda/python-3.9.21/Modules/ceval_gil.h:3520
#76 0x00000000004f81d3 in _PyEval_EvalFrame (throwflag=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, 
    f=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, 
    tstate=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>)
    at /croot/python-split_1733933781842/_build_env/x86_64-conda-linux-gnu/sysroot/usr/include/bits/frameobject.c:40
#77 function_code_fastcall (tstate=0xdcb0a0, co=<optimized out>, args=<optimized out>, nargs=<optimized out>, globals=0x7fcb6f270c00)
    at /usr/local/src/conda/python-3.9.21/Include/abstract.h:330
#78 0x0000000000505605 in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=1, args=0x7ffe0a3d51b8, callable=0x7fcb6e9cad30, tstate=0xdcb0a0)
    at /usr/local/src/conda/python-3.9.21/Objects/pycore_pyerrors.h:103
#79 method_vectorcall (method=<optimized out>, args=0x7feea55f9058, nargsf=<optimized out>, kwnames=0x0) at /usr/local/src/conda/python-3.9.21/Programs/_functoolsmodule.c:61
#80 0x00000000004ebea1 in do_call_core (kwdict=0x7fcb75f1b700, callargs=0x7feea55f9040, func=0x7fcb75f1f800, tstate=<optimized out>)
    at /usr/local/src/conda/python-3.9.21/Modules/ceval_gil.h:5125
#81 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=0x7fcb6eff7420, throwflag=<optimized out>) at /usr/local/src/conda/python-3.9.21/Modules/ceval_gil.h:3582
#82 0x00000000004f81d3 in _PyEval_EvalFrame (throwflag=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, 
    f=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, 
    tstate=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>)
    at /croot/python-split_1733933781842/_build_env/x86_64-conda-linux-gnu/sysroot/usr/include/bits/frameobject.c:40
#83 function_code_fastcall (tstate=0xdcb0a0, co=<optimized out>, args=<optimized out>, nargs=<optimized out>, globals=0x7fcbcaf0b800)
    at /usr/local/src/conda/python-3.9.21/Include/abstract.h:330
#84 0x00000000004e7dd5 in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>, args=0xd081158, callable=0x7fcbcabf3d30, tstate=0xdcb0a0)
    at /usr/local/src/conda/python-3.9.21/Objects/pycore_pyerrors.h:118
#85 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0xd081158, callable=0x7fcbcabf3d30) at /usr/local/src/conda/python-3.9.21/Objects/pycore_pyerrors.h:127
#86 call_function (kwnames=0x0, oparg=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, pp_stack=<synthetic pointer>, tstate=0xdcb0a0)
    at /usr/local/src/conda/python-3.9.21/Modules/ceval_gil.h:5077
#87 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=0xd080fb0, throwflag=<optimized out>) at /usr/local/src/conda/python-3.9.21/Modules/ceval_gil.h:3520
#88 0x00000000004e6afa in _PyEval_EvalFrame (throwflag=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, 
    f=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, 
    tstate=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>)
    at /croot/python-split_1733933781842/_build_env/x86_64-conda-linux-gnu/sysroot/usr/include/bits/frameobject.c:40
#89 _PyEval_EvalCode (tstate=<optimized out>, _co=<optimized out>, globals=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=<optimized out>, kwnames=0x0, 
    kwargs=0x7feea13358a8, kwcount=<optimized out>, kwstep=1, defs=0x0, defcount=<optimized out>, kwdefs=0x0, closure=0x0, name=0x7fcb88a37170, qualname=0x7fcb889c66f0)
    at /usr/local/src/conda/python-3.9.21/Modules/ceval_gil.h:4329
#90 0x00000000004f7ef5 in _PyFunction_Vectorcall (func=<optimized out>, stack=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>)
    at /usr/local/src/conda/python-3.9.21/Include/abstract.h:396
#91 0x00000000004ebea1 in do_call_core (kwdict=0x7fcb75f23580, callargs=0x7feea1335880, func=0x7fcb88999040, tstate=<optimized out>)
    at /usr/local/src/conda/python-3.9.21/Modules/ceval_gil.h:5125
#92 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=0x7fcb75f1d240, throwflag=<optimized out>) at /usr/local/src/conda/python-3.9.21/Modules/ceval_gil.h:3582
#93 0x00000000004e6afa in _PyEval_EvalFrame (throwflag=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, 
    f=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, 
    tstate=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>)
    at /croot/python-split_1733933781842/_build_env/x86_64-conda-linux-gnu/sysroot/usr/include/bits/frameobject.c:40
#94 _PyEval_EvalCode (tstate=<optimized out>, _co=<optimized out>, globals=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=<optimized out>, kwnames=0x0, 
    kwargs=0x7feea02be368, kwcount=<optimized out>, kwstep=1, defs=0x0, defcount=<optimized out>, kwdefs=0x7fcb75f66040, closure=0x7fcb75f58d60, name=0x7fcb75f63a30, 
    qualname=0x7fcb75fb7270) at /usr/local/src/conda/python-3.9.21/Modules/ceval_gil.h:4329
#95 0x00000000004f7ef5 in _PyFunction_Vectorcall (func=<optimized out>, stack=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>)
    at /usr/local/src/conda/python-3.9.21/Include/abstract.h:396
#96 0x00000000004ebea1 in do_call_core (kwdict=0x7fcb75f17b00, callargs=0x7feea02be340, func=0x7fcb75f603a0, tstate=<optimized out>)
    at /usr/local/src/conda/python-3.9.21/Modules/ceval_gil.h:5125
#97 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=0x7fcb6f00fcf0, throwflag=<optimized out>) at /usr/local/src/conda/python-3.9.21/Modules/ceval_gil.h:3582
#98 0x00000000004e6afa in _PyEval_EvalFrame (throwflag=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, 
    f=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, 
    tstate=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>)
    at /croot/python-split_1733933781842/_build_env/x86_64-conda-linux-gnu/sysroot/usr/include/bits/frameobject.c:40
#99 _PyEval_EvalCode (tstate=<optimized out>, _co=<optimized out>, globals=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=<optimized out>, kwnames=0x0, 
    kwargs=0x7fcb75f8c168, kwcount=<optimized out>, kwstep=1, defs=0x0, defcount=<optimized out>, kwdefs=0x0, closure=0x7fcb75f59380, name=0x7feea13c0210, qualname=0x7feea13be3b0)
    at /usr/local/src/conda/python-3.9.21/Modules/ceval_gil.h:4329
#100 0x00000000004f7ef5 in _PyFunction_Vectorcall (func=<optimized out>, stack=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>)
    at /usr/local/src/conda/python-3.9.21/Include/abstract.h:396
#101 0x00007feea3cbcd3f in __Pyx_PyObject_Call(_object*, _object*, _object*) () from /home/ray/anaconda3/lib/python3.9/site-packages/ray/_raylet.so
#102 0x00007feea3ce6411 in __pyx_pw_3ray_7_raylet_12execute_task_3function_executor(_object*, _object*, _object*) ()
   from /home/ray/anaconda3/lib/python3.9/site-packages/ray/_raylet.so
#103 0x00007feea3cbcd3f in __Pyx_PyObject_Call(_object*, _object*, _object*) () from /home/ray/anaconda3/lib/python3.9/site-packages/ray/_raylet.so
#104 0x00007feea3dac688 in __pyx_f_3ray_7_raylet_task_execution_handler(ray::rpc::Address const&, ray::rpc::TaskType, std::string, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&, std::string, std::string, std::vector<std::pair<ray::ObjectID, std::shared_ptr<ray::RayObject> >, std::allocator<std::pair<ray::ObjectID, std::shared_ptr<ray::RayObject> > > >*, std::vector<std::pair<ray::ObjectID, std::shared_ptr<ray::RayObject> >, std::allocator<std::pair<ray::ObjectID, std::shared_ptr<ray::RayObject> > > >*, std::vector<std::pair<ray::ObjectID, bool>, std::allocator<std::pair<ray::ObjectID, bool> > >*, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*, std::string*, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string, bool, bool, bool, long) () from /home/ray/anaconda3/lib/python3.9/site-packages/ray/_raylet.so
#105 0x00007feea3cc3149 in std::_Function_handler<ray::Status (ray::rpc::Address const&, ray::rpc::TaskType, std::string, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&, std::string const&, std::string const&, std::vector<std::pair<ray::ObjectID, std::shared_ptr<ray::RayObject> >, std::allocator<std::pair<ray::ObjectID, std::shared_ptr<ray::RayObject> > > >*, std::vector<std::pair<ray::ObjectID, std::shared_ptr<ray::RayObject> >, std::allocator<std::pair<ray::ObjectID, std::shared_ptr<ray::RayObject> > > >*, std::vector<std::pair<ray::ObjectID, bool>, std::allocator<std::pair<ray::ObjectID, bool> > >*, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*, std::string*, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string, bool, bool, bool, long), ray::Status (*)(ray::rpc::Address const&, ray::rpc::TaskType, std::string, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&, std::string, std::string, std::vector<std::pair<ray::ObjectID, std::shared_ptr<ray::RayObject> >, std::allocator<std::pair<ray::ObjectID, std::shared_ptr<ray::RayObject> > > >*, std::vector<std::pair<ray::ObjectID, std::shared_ptr<ray::RayObject> >, std::allocator<std::pair<ray::ObjectID, std::shared_ptr<ray::RayObject> > > >*, std::vector<std::pair<ray::ObjectID, bool>, std::allocator<std::pair<ray::ObjectID, 
bool> > >*, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*, std::string*, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string, bool, bool, bool, long)>::_M_invoke(std::_Any_data const&, ray::rpc::Address const&, ray::rpc::TaskType&&, std::string&&, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&, std::string const&, std::string const&, std::vector<std::pair<ray::ObjectID, std::shared_ptr<ray::RayObject> >, std::allocator<std::pair<ray::ObjectID, std::shared_ptr<ray::RayObject> > > >*&&, std::vector<std::pair<ray::ObjectID, std::shared_ptr<ray::RayObject> >, std::allocator<std::pair<ray::ObjectID, std::shared_ptr<ray::RayObject> > > >*&&, std::vector<std::pair<ray::ObjectID, bool>, std::allocator<std::pair<ray::ObjectID, bool> > >*&&, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*&&, std::string*&&, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string&&, bool&&, bool&&, bool&&, long&&) () from /home/ray/anaconda3/lib/python3.9/site-packages/ray/_raylet.so
#106 0x00007feea3eee64e in ray::core::CoreWorker::ExecuteTask(ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > > const&, std::vector<std::pair<ray::ObjectID, std::shared_ptr<ray::RayObject> >, std::allocator<std::pair<ray::ObjectID, std::shared_ptr<ray::RayObject> > > >*, std::vector<std::pair<ray::ObjectID, std::shared_ptr<ray::RayObject> >, std::allocator<std::pair<ray::ObjectID, std::shared_ptr<ray::RayObject> > > >*, std::vector<std::pair<ray::ObjectID, bool>, std::allocator<std::pair<ray::ObjectID, bool> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*, std::string*) ()
   from /home/ray/anaconda3/lib/python3.9/site-packages/ray/_raylet.so
#107 0x00007feea3e0c288 in std::_Function_handler<ray::Status (ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > >, std::vector<std::pair<ray::ObjectID, std::shared_ptr<ray::RayObject> >, std::allocator<std::pair<ray::ObjectID, std::shared_ptr<ray::RayObject> > > >*, std::vector<std::pair<ray::ObjectID, std::shared_ptr<ray::RayObject> >, std::allocator<std::pair<ray::ObjectID, std::shared_ptr<ray::RayObject> > > >*, std::vector<std::pair<ray::ObjectID, bool>, std::allocator<std::pair<ray::ObjectID, bool> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*, std::string*), std::_Bind<ray::Status (ray::core::CoreWorker::*(ray::core::CoreWorker*, std::_Placeholder<1>, std::_Placeholder<2>, std::_Placeholder<3>, std::_Placeholder<4>, std::_Placeholder<5>, std::_Placeholder<6>, std::_Placeholder<7>, std::_Placeholder<8>))(ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > > const&, std::vector<std::pair<ray::ObjectID, std::shared_ptr<ray::RayObject> >, std::allocator<std::pair<ray::ObjectID, std::shared_ptr<ray::RayObject> > > >*, std::vector<std::pair<ray::ObjectID, std::shared_ptr<ray::RayObject> >, std::allocator<std::pair<ray::ObjectID, std::shared_ptr<ray::RayObject> > > >*, std::vector<std::pair<ray::ObjectID, bool>, std::allocator<std::pair<ray::ObjectID, bool> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*, std::string*)> 
>::_M_invoke(std::_Any_data const&, ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > >&&, std::vector<std::pair<ray::ObjectID, std::shared_ptr<ray::RayObject> >, std::allocator<std::pair<ray::ObjectID, std::shared_ptr<ray::RayObject> > > >*&&, std::vector<std::pair<ray::ObjectID, std::shared_ptr<ray::RayObject> >, std::allocator<std::pair<ray::ObjectID, std::shared_ptr<ray::RayObject> > > >*&&, std::vector<std::pair<ray::ObjectID, bool>, std::allocator<std::pair<ray::ObjectID, bool> > >*&&, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*&&, bool*&&, std::string*&&) ()
   from /home/ray/anaconda3/lib/python3.9/site-packages/ray/_raylet.so
#108 0x00007feea3ef6ef6 in ray::core::TaskReceiver::HandleTask(ray::rpc::PushTaskRequest const&, ray::rpc::PushTaskReply*, std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>)::{lambda(ray::TaskSpecification const&, std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>)#1}::operator()(ray::TaskSpecification const&, std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>) const () from /home/ray/anaconda3/lib/python3.9/site-packages/ray/_raylet.so
#109 0x00007feea3ef82ca in std::_Function_handler<void (ray::TaskSpecification const&, std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>), ray::core::TaskReceiver::HandleTask(ray::rpc::PushTaskRequest const&, ray::rpc::PushTaskReply*, std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>)::{lambda(ray::TaskSpecification const&, std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>)#1}>::_M_invoke(std::_Any_data const&, ray::TaskSpecification const&, std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>&&) () from /home/ray/anaconda3/lib/python3.9/site-packages/ray/_raylet.so
#110 0x00007feea3f344a2 in ray::core::InboundRequest::Accept() () from /home/ray/anaconda3/lib/python3.9/site-packages/ray/_raylet.so
#111 0x00007feea3f356d4 in ray::core::ActorSchedulingQueue::AcceptRequestOrRejectIfCanceled(ray::TaskID, ray::core::InboundRequest&) ()
   from /home/ray/anaconda3/lib/python3.9/site-packages/ray/_raylet.so
#112 0x00007feea3f371bb in ray::core::ActorSchedulingQueue::ScheduleRequests() () from /home/ray/anaconda3/lib/python3.9/site-packages/ray/_raylet.so
#113 0x00007feea3f3a068 in ray::core::ActorSchedulingQueue::Add(long, long, std::function<void (ray::TaskSpecification const&, std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>)>, std::function<void (ray::TaskSpecification const&, ray::Status const&, std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>)>, std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>, ray::TaskSpecification) () from /home/ray/anaconda3/lib/python3.9/site-packages/ray/_raylet.so
#114 0x00007feea3ef9663 in ray::core::TaskReceiver::HandleTask(ray::rpc::PushTaskRequest const&, ray::rpc::PushTaskReply*, std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>) () from /home/ray/anaconda3/lib/python3.9/site-packages/ray/_raylet.so
#115 0x00007feea3e2bc8d in ray::core::CoreWorker::HandlePushTask(ray::rpc::PushTaskRequest, ray::rpc::PushTaskReply*, std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>)::{lambda()#1}::operator()() const () from /home/ray/anaconda3/lib/python3.9/site-packages/ray/_raylet.so
#116 0x00007feea4207ef8 in EventTracker::RecordExecution(std::function<void ()> const&, std::shared_ptr<StatsHandle>) ()
   from /home/ray/anaconda3/lib/python3.9/site-packages/ray/_raylet.so
#117 0x00007feea4202ece in std::_Function_handler<void (), instrumented_io_context::post(std::function<void ()>, std::string const&, long)::{lambda()#1}>::_M_invoke(std::_Any_data const&) () from /home/ray/anaconda3/lib/python3.9/site-packages/ray/_raylet.so
#118 0x00007feea4203346 in boost::asio::detail::completion_handler<std::function<void ()>, boost::asio::io_context::basic_executor_type<std::allocator<void>, 0ul> >::do_complete(void*, boost::asio::detail::scheduler_operation*, boost::system::error_code const&, unsigned long) () from /home/ray/anaconda3/lib/python3.9/site-packages/ray/_raylet.so
#119 0x00007feea47da9ab in boost::asio::detail::scheduler::do_run_one(boost::asio::detail::conditionally_enabled_mutex::scoped_lock&, boost::asio::detail::scheduler_thread_info&, boost::system::error_code const&) () from /home/ray/anaconda3/lib/python3.9/site-packages/ray/_raylet.so
#120 0x00007feea47dc329 in boost::asio::detail::scheduler::run(boost::system::error_code&) () from /home/ray/anaconda3/lib/python3.9/site-packages/ray/_raylet.so
#121 0x00007feea47dca32 in boost::asio::io_context::run() () from /home/ray/anaconda3/lib/python3.9/site-packages/ray/_raylet.so
#122 0x00007feea3e2917d in ray::core::CoreWorker::RunTaskExecutionLoop() () from /home/ray/anaconda3/lib/python3.9/site-packages/ray/_raylet.so
#123 0x00007feea3efbaf1 in ray::core::CoreWorkerProcessImpl::RunWorkerTaskExecutionLoop() () from /home/ray/anaconda3/lib/python3.9/site-packages/ray/_raylet.so
#124 0x00007feea3efbd0d in ray::core::CoreWorkerProcess::RunTaskExecutionLoop() () from /home/ray/anaconda3/lib/python3.9/site-packages/ray/_raylet.so
#125 0x00007feea3cbdad7 in __pyx_pw_3ray_7_raylet_10CoreWorker_7run_task_loop(_object*, _object*) () from /home/ray/anaconda3/lib/python3.9/site-packages/ray/_raylet.so
#126 0x00000000004fcb78 in method_vectorcall_NOARGS (func=0x7feea50944f0, args=0x7feea02b6be0, nargsf=<optimized out>, kwnames=<optimized out>)
    at /croot/python-split_1733933781842/_build_env/x86_64-conda-linux-gnu/sysroot/usr/include/bits/tupleobject.c:438
#127 0x00000000004e808f in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>, args=0x7feea02b6be0, callable=0x7feea50944f0, tstate=0xdcb0a0)
    at /usr/local/src/conda/python-3.9.21/Objects/pycore_pyerrors.h:118
#128 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0x7feea02b6be0, callable=0x7feea50944f0) at /usr/local/src/conda/python-3.9.21/Objects/pycore_pyerrors.h:127
#129 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0xdcb0a0) at /usr/local/src/conda/python-3.9.21/Modules/ceval_gil.h:5077
#130 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=0x7feea02b6a60, throwflag=<optimized out>) at /usr/local/src/conda/python-3.9.21/Modules/ceval_gil.h:3506
#131 0x00000000004f81d3 in _PyEval_EvalFrame (throwflag=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, 
    f=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, 
    tstate=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>)
    at /croot/python-split_1733933781842/_build_env/x86_64-conda-linux-gnu/sysroot/usr/include/bits/frameobject.c:40
#132 function_code_fastcall (tstate=0xdcb0a0, co=<optimized out>, args=<optimized out>, nargs=<optimized out>, globals=0x7feea1914a80)
    at /usr/local/src/conda/python-3.9.21/Include/abstract.h:330
#133 0x00000000004e808f in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>, args=0xdc3520, callable=0x7feea1370310, tstate=0xdcb0a0)
    at /usr/local/src/conda/python-3.9.21/Objects/pycore_pyerrors.h:118
#134 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0xdc3520, callable=0x7feea1370310) at /usr/local/src/conda/python-3.9.21/Objects/pycore_pyerrors.h:127
#135 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0xdcb0a0) at /usr/local/src/conda/python-3.9.21/Modules/ceval_gil.h:5077
#136 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=0xdc33b0, throwflag=<optimized out>) at /usr/local/src/conda/python-3.9.21/Modules/ceval_gil.h:3506
#137 0x00000000004e6afa in _PyEval_EvalFrame (throwflag=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, 
    f=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, 
    tstate=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>)
    at /croot/python-split_1733933781842/_build_env/x86_64-conda-linux-gnu/sysroot/usr/include/bits/frameobject.c:40
#138 _PyEval_EvalCode (tstate=<optimized out>, _co=<optimized out>, globals=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=<optimized out>, kwnames=0x0, 
    kwargs=0x0, kwcount=<optimized out>, kwstep=2, defs=0x0, defcount=<optimized out>, kwdefs=0x0, closure=0x0, name=0x0, qualname=0x0)
    at /usr/local/src/conda/python-3.9.21/Modules/ceval_gil.h:4329
#139 0x00000000004e6787 in _PyEval_EvalCodeWithName (_co=<optimized out>, globals=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=<optimized out>, 
    kwnames=<optimized out>, kwargs=0x0, kwcount=0, kwstep=2, defs=0x0, defcount=0, kwdefs=0x0, closure=0x0, name=0x0, qualname=0x0)
    at /usr/local/src/conda/python-3.9.21/Modules/ceval_gil.h:4361
#140 0x00000000004e6739 in PyEval_EvalCodeEx (_co=<optimized out>, globals=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=<optimized out>, 
    kws=<optimized out>, kwcount=0, defs=0x0, defcount=0, kwdefs=0x0, closure=0x0) at /usr/local/src/conda/python-3.9.21/Modules/ceval_gil.h:4377
#141 0x00000000005942bb in PyEval_EvalCode (co=co@entry=0x7feea54fc500, globals=globals@entry=0x7feea5566880, locals=locals@entry=0x7feea5566880)
    at /usr/local/src/conda/python-3.9.21/Modules/ceval_gil.h:828
#142 0x00000000005c1777 in run_eval_code_obj (tstate=0xdcb0a0, co=0x7feea54fc500, globals=0x7feea5566880, locals=0x7feea5566880)
    at /usr/local/src/conda/python-3.9.21/Objects/clinic/marshal.c.h:1221
#143 0x00000000005bd780 in run_mod (mod=<optimized out>, filename=<optimized out>, globals=0x7feea5566880, locals=0x7feea5566880, flags=<optimized out>, arena=<optimized out>)
    at /usr/local/src/conda/python-3.9.21/Objects/clinic/marshal.c.h:1242
#144 0x0000000000456695 in pyrun_file (fp=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, 
    filename=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, start=<optimized out>, 
    globals=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, 
    locals=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, 
    closeit=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, flags=0x7ffe0a3da4f8)
    at /usr/local/src/conda/python-3.9.21/Objects/clinic/marshal.c.h:1140
#145 0x00000000005b7462 in pyrun_simple_file (flags=0x7ffe0a3da4f8, closeit=1, filename=0x7feea53bc780, fp=0xdecd00)
    at /usr/local/src/conda/python-3.9.21/Objects/clinic/marshal.c.h:450
#146 PyRun_SimpleFileExFlags (fp=0xdecd00, filename=<optimized out>, closeit=1, flags=0x7ffe0a3da4f8) at /usr/local/src/conda/python-3.9.21/Objects/clinic/marshal.c.h:483
#147 0x00000000005b49de in pymain_run_file (cf=0x7ffe0a3da4f8, config=0xdc88e0) at /croot/python-split_1733933781842/work/build-static/python.c:379
#148 pymain_run_python (exitcode=0x7ffe0a3da4f0, exitcode@entry=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>)
    at /croot/python-split_1733933781842/work/build-static/python.c:608
#149 Py_RunMain () at /croot/python-split_1733933781842/work/build-static/python.c:687
#150 0x0000000000588369 in Py_BytesMain (argc=<optimized out>, argv=<optimized out>) at /croot/python-split_1733933781842/work/build-static/python.c:1133
#151 0x00007feeabbdb193 in __libc_start_main () from /usr/lib64/libc.so.6
#152 0x000000000058821e in _start () at /usr/local/src/conda/python-3.9.21/Modules/future.c:5313

pid=155233

(gdb) bt
#0  0x00007faaed750e05 in nanosleep () from /usr/lib64/libpthread.so.0
#1  0x00007f88c9229257 in c10d::detail::(anonymous namespace)::SocketConnectOp::tryConnect(int) () from /home/ray/anaconda3/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so
#2  0x00007f88c9229f36 in c10d::detail::(anonymous namespace)::SocketConnectOp::run() [clone .constprop.0] ()
   from /home/ray/anaconda3/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so
#3  0x00007f88c922a3a4 in c10d::detail::Socket::connect(std::string const&, unsigned short, c10d::detail::SocketOptions const&) ()
   from /home/ray/anaconda3/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so
#4  0x00007f88c91e8016 in c10d::detail::TCPClient::connect(c10d::detail::SocketAddress const&, c10d::TCPStoreOptions const&, std::shared_ptr<c10d::Backoff>) ()
   from /home/ray/anaconda3/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so
#5  0x00007f88c91eaf7c in c10d::TCPStore::TCPStore(std::string, c10d::TCPStoreOptions const&) () from /home/ray/anaconda3/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so
#6  0x00007f88e6f9b885 in pybind11::cpp_function::initialize<pybind11::detail::initimpl::factory<torch::distributed::c10d::(anonymous namespace)::c10d_init(_object*, _object*)::{lambda(std::string const&, unsigned short, std::optional<int>, bool, std::chrono::duration<long, std::ratio<1l, 1000l> >, bool, bool, std::optional<int>, bool)#38}, pybind11::detail::void_type (*)(), c10::intrusive_ptr<c10d::TCPStore, pybind11::detail::void_type (*)()::detail::intrusive_target_default_null_type<c10d> > (std::string const&, unsigned short, std::optional<int>, bool, std::chrono::duration<long, std::ratio<1l, 1000l> >, bool, bool, std::optional<int>, bool), pybind11::detail::void_type>::execute<pybind11::class_<c10d, pybind11::detail::void_type (*)()::detail::intrusive_target_default_null_type<c10d> >, pybind11::arg, pybind11::class_<c10d, pybind11::detail::void_type (*)()::detail::intrusive_target_default_null_type<c10d> >, pybind11::arg_v, pybind11::arg, pybind11::arg, pybind11::arg, pybind11::arg, pybind11::arg, pybind11::arg, pybind11::call_guard<pybind11::gil_scoped_release> >(pybind11::class_<c10d, pybind11::detail::void_type (*)()::detail::intrusive_target_default_null_type<c10d> >&, pybind11::arg const&, pybind11::class_<c10d, pybind11::detail::void_type (*)()::detail::intrusive_target_default_null_type<c10d> > const&, pybind11::arg_v const&, pybind11::arg const&, pybind11::arg const&, pybind11::arg const&, pybind11::arg const&, pybind11::arg const&, pybind11::arg const&, pybind11::call_guard<pybind11::gil_scoped_release> const&) &&::{lambda(pybind11::detail::value_and_holder&, std::string const&, unsigned short, std::optional<int>, bool, std::chrono::duration<long, std::ratio<1l, 1000l> >, bool, bool, std::optional<int>, bool)#1}, void, pybind11::detail::value_and_holder, std::string const&, unsigned short, std::optional<int>, bool, std::chrono::duration<long, std::ratio<1l, 1000l> >, bool, bool, std::optional<int>, bool, pybind11::name, pybind11::is_method, 
pybind11::sibling, pybind11::detail::is_new_style_constructor, pybind11::class_<c10d, pybind11::detail::void_type (*)()::detail::intrusive_target_default_null_type<c10d> >, pybind11::class_<c10d, pybind11::detail::void_type (*)()::detail::intrusive_target_default_null_type<c10d> >, pybind11::arg, pybind11::arg, pybind11::arg, pybind11::arg, pybind11::arg, pybind11::arg, pybind11::arg, pybind11::gil_scoped_release>(pybind11::call_guard<pybind11::gil_scoped_release>&&, void (*)(pybind11::detail::value_and_holder, std::string const&, unsigned short, std::optional<int>, bool, std::chrono::duration<long, std::ratio<1l, 1000l> >, bool, bool, std::optional<int>, bool), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&, pybind11::detail::is_new_style_constructor const&, pybind11::class_<c10d, pybind11::detail::void_type (*)()::detail::intrusive_target_default_null_type<c10d> > const&, pybind11::class_<c10d, pybind11::detail::void_type (*)()::detail::intrusive_target_default_null_type<c10d> > const&, pybind11::arg const&, pybind11::arg const&, pybind11::arg const&, pybind11::arg const&, pybind11::arg const&, pybind11::arg const&, pybind11::arg const&, pybind11::gil_scoped_release const&)::{lambda(pybind11::detail::function_call&)#3}::_FUN ()
   from /home/ray/anaconda3/lib/python3.9/site-packages/torch/lib/libtorch_python.so
#7  0x00007f88e66cb9ff in pybind11::cpp_function::dispatcher(_object*, _object*, _object*) () from /home/ray/anaconda3/lib/python3.9/site-packages/torch/lib/libtorch_python.so
#8  0x0000000000507767 in cfunction_call (func=0x7faa74475a90, args=<optimized out>, kwargs=<optimized out>)
    at /croot/python-split_1733933781842/_build_env/x86_64-conda-linux-gnu/sysroot/usr/include/bits/descrobject.c:543
#9  0x00000000004f077c in _PyObject_MakeTpCall (tstate=0x26f70a0, callable=0x7faa74475a90, args=<optimized out>, nargs=<optimized out>, keywords=0x7f879284f840)
    at /usr/local/src/conda/python-3.9.21/Modules/condvar.h:3876
#10 0x0000000000505690 in _PyObject_VectorcallTstate (kwnames=0x7f879284f840, nargsf=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, 
    args=0x7f87c83ef5b0, callable=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, 
    tstate=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>) at /usr/local/src/conda/python-3.9.21/Objects/pycore_pyerrors.h:116
#11 _PyObject_VectorcallTstate (kwnames=0x7f879284f840, kwnames@entry=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, nargsf=<optimized out>, 
    args=0x7f87c83ef5b0, args@entry=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, callable=0x7faa74475a90, 
    callable@entry=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, tstate=0x26f70a0, 
    tstate@entry=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>) at /usr/local/src/conda/python-3.9.21/Objects/pycore_pyerrors.h:103
#12 method_vectorcall (method=method@entry=0x7f87c83eff80, args=args@entry=0x7f87c83ef5b8, nargsf=<optimized out>, kwnames=0x7f879284f840)
    at /usr/local/src/conda/python-3.9.21/Programs/_functoolsmodule.c:53
#13 0x0000000000505e37 in PyVectorcall_Call (callable=0x7f87c83eff80, tuple=<optimized out>, kwargs=<optimized out>) at /usr/local/src/conda/python-3.9.21/Include/abstract.h:243
#14 0x0000000000502ae2 in slot_tp_init (self=<optimized out>, args=0x7f8792853b80, kwds=0x7f87c83e78c0) at /usr/local/src/conda/python-3.9.21/Modules/condvar.h:6974
#15 0x00000000004f0c30 in type_call (type=<optimized out>, args=0x7f8792853b80, kwds=0x7f87c83e78c0) at /usr/local/src/conda/python-3.9.21/Modules/condvar.h:3876
#16 0x00007f88e66ca2db in pybind11_meta_call () from /home/ray/anaconda3/lib/python3.9/site-packages/torch/lib/libtorch_python.so
#17 0x00000000004f077c in _PyObject_MakeTpCall (tstate=0x26f70a0, callable=0x74b4610, args=<optimized out>, nargs=<optimized out>, keywords=0x7faa743d0300)
    at /usr/local/src/conda/python-3.9.21/Modules/condvar.h:3876
#18 0x00000000004ece2c in _PyObject_VectorcallTstate (kwnames=0x7faa743d0300, nargsf=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, 
    args=<optimized out>, callable=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, 
    tstate=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>) at /usr/local/src/conda/python-3.9.21/Objects/pycore_pyerrors.h:116
#19 _PyObject_VectorcallTstate (kwnames=0x7faa743d0300, nargsf=<optimized out>, args=<optimized out>, callable=<optimized out>, tstate=0x26f70a0)
    at /usr/local/src/conda/python-3.9.21/Objects/pycore_pyerrors.h:103
#20 PyObject_Vectorcall (kwnames=0x7faa743d0300, nargsf=<optimized out>, args=<optimized out>, callable=<optimized out>)
    at /usr/local/src/conda/python-3.9.21/Objects/pycore_pyerrors.h:127
#21 call_function (kwnames=0x7faa743d0300, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=<optimized out>) at /usr/local/src/conda/python-3.9.21/Modules/ceval_gil.h:5077
#22 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=0xfef9870, throwflag=<optimized out>) at /usr/local/src/conda/python-3.9.21/Modules/ceval_gil.h:3537
#23 0x00000000004e6afa in _PyEval_EvalFrame (throwflag=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, 
    f=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, 
    tstate=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>)
    at /croot/python-split_1733933781842/_build_env/x86_64-conda-linux-gnu/sysroot/usr/include/bits/frameobject.c:40
#24 _PyEval_EvalCode (tstate=<optimized out>, _co=<optimized out>, globals=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=<optimized out>, kwnames=0x0, 
    kwargs=0xf786990, kwcount=<optimized out>, kwstep=1, defs=0x7faa743cd6e8, defcount=<optimized out>, kwdefs=0x0, closure=0x0, name=0x7faa743cf6c0, qualname=0x7faa743cf6c0)
    at /usr/local/src/conda/python-3.9.21/Modules/ceval_gil.h:4329
#25 0x00000000004f7ef5 in _PyFunction_Vectorcall (func=<optimized out>, stack=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>)
    at /usr/local/src/conda/python-3.9.21/Include/abstract.h:396
#26 0x00000000004e7dd5 in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>, args=0xf786960, callable=0x7faa743ca9d0, tstate=0x26f70a0)
    at /usr/local/src/conda/python-3.9.21/Objects/pycore_pyerrors.h:118
#27 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0xf786960, callable=0x7faa743ca9d0) at /usr/local/src/conda/python-3.9.21/Objects/pycore_pyerrors.h:127
#28 call_function (kwnames=0x0, oparg=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, pp_stack=<synthetic pointer>, tstate=0x26f70a0)
    at /usr/local/src/conda/python-3.9.21/Modules/ceval_gil.h:5077
#29 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=0xf7867a0, throwflag=<optimized out>) at /usr/local/src/conda/python-3.9.21/Modules/ceval_gil.h:3520
#30 0x000000000050bf0c in _PyEval_EvalFrame (throwflag=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, 
    f=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, 
    tstate=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>)
    at /croot/python-split_1733933781842/_build_env/x86_64-conda-linux-gnu/sysroot/usr/include/bits/frameobject.c:40
#31 gen_send_ex (gen=0x7f87b082e580, arg=<optimized out>, exc=<optimized out>, closing=<optimized out>)
    at /croot/python-split_1733933781842/_build_env/x86_64-conda-linux-gnu/sysroot/usr/include/bits/cellobject.c:215
#32 0x0000000000574090 in gen_iternext (gen=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>)
    at /croot/python-split_1733933781842/_build_env/x86_64-conda-linux-gnu/sysroot/usr/include/bits/cellobject.c:547
#33 builtin_next (self=<optimized out>, args=args@entry=0xfef90a8, nargs=1) at /usr/local/src/conda/python-3.9.21/Modules/getplatform.c:1382
#34 0x00000000004f8974 in cfunction_vectorcall_FASTCALL (func=0x7faae6e38540, args=0xfef90a8, nargsf=<optimized out>, kwnames=<optimized out>)
    at /croot/python-split_1733933781842/_build_env/x86_64-conda-linux-gnu/sysroot/usr/include/bits/descrobject.c:430
#35 0x00000000004e7dd5 in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>, args=0xfef90a8, callable=0x7faae6e38540, tstate=0x26f70a0)
    at /usr/local/src/conda/python-3.9.21/Objects/pycore_pyerrors.h:118
#36 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0xfef90a8, callable=0x7faae6e38540) at /usr/local/src/conda/python-3.9.21/Objects/pycore_pyerrors.h:127
#37 call_function (kwnames=0x0, oparg=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, pp_stack=<synthetic pointer>, tstate=0x26f70a0)
    at /usr/local/src/conda/python-3.9.21/Modules/ceval_gil.h:5077
#38 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=0xfef8ec0, throwflag=<optimized out>) at /usr/local/src/conda/python-3.9.21/Modules/ceval_gil.h:3520
#39 0x00000000004e6afa in _PyEval_EvalFrame (throwflag=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, 
    f=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, 
    tstate=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>)
    at /croot/python-split_1733933781842/_build_env/x86_64-conda-linux-gnu/sysroot/usr/include/bits/frameobject.c:40
#40 _PyEval_EvalCode (tstate=<optimized out>, _co=<optimized out>, globals=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=<optimized out>, 
    kwnames=0x7f87b07f2f58, kwargs=0x7f87b07f3698, kwcount=<optimized out>, kwstep=1, defs=0x7faa743c2bb8, defcount=<optimized out>, kwdefs=0x0, closure=0x0, name=0x7faa744105d0, 
    qualname=0x7faa744105d0) at /usr/local/src/conda/python-3.9.21/Modules/ceval_gil.h:4329
#41 0x00000000004f7ef5 in _PyFunction_Vectorcall (func=<optimized out>, stack=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>)
    at /usr/local/src/conda/python-3.9.21/Include/abstract.h:396
#42 0x0000000000505bb4 in PyVectorcall_Call (kwargs=<optimized out>, tuple=<optimized out>, callable=0x7faa743df700) at /usr/local/src/conda/python-3.9.21/Include/abstract.h:243
#43 _PyObject_Call (kwargs=<optimized out>, args=<optimized out>, callable=0x7faa743df700, 
    tstate=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>) at /usr/local/src/conda/python-3.9.21/Include/abstract.h:266
#44 PyObject_Call (callable=0x7faa743df700, args=<optimized out>, kwargs=<optimized out>) at /usr/local/src/conda/python-3.9.21/Include/abstract.h:293
#45 0x00000000004ebea1 in do_call_core (kwdict=0x7f87c83ef440, callargs=0x7faae6e69040, func=0x7faa743df700, tstate=<optimized out>)
    at /usr/local/src/conda/python-3.9.21/Modules/ceval_gil.h:5125
#46 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=0x7f87ca1847c0, throwflag=<optimized out>) at /usr/local/src/conda/python-3.9.21/Modules/ceval_gil.h:3582
#47 0x00000000004e6afa in _PyEval_EvalFrame (throwflag=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, 
    f=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, 
    tstate=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>)
    at /croot/python-split_1733933781842/_build_env/x86_64-conda-linux-gnu/sysroot/usr/include/bits/frameobject.c:40
#48 _PyEval_EvalCode (tstate=<optimized out>, _co=<optimized out>, globals=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=<optimized out>, 
    kwnames=0x7f87b07efc38, kwargs=0x7f87c844de18, kwcount=<optimized out>, kwstep=1, defs=0x0, defcount=<optimized out>, kwdefs=0x0, closure=0x7faa743d8040, name=0x7faa744105d0, 
    qualname=0x7faa744105d0) at /usr/local/src/conda/python-3.9.21/Modules/ceval_gil.h:4329
#49 0x00000000004f7ef5 in _PyFunction_Vectorcall (func=<optimized out>, stack=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>)
    at /usr/local/src/conda/python-3.9.21/Include/abstract.h:396
#50 0x0000000000505bb4 in PyVectorcall_Call (kwargs=<optimized out>, tuple=<optimized out>, callable=0x7faa743df790) at /usr/local/src/conda/python-3.9.21/Include/abstract.h:243
#51 _PyObject_Call (kwargs=<optimized out>, args=<optimized out>, callable=0x7faa743df790, 
    tstate=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>) at /usr/local/src/conda/python-3.9.21/Include/abstract.h:266
#52 PyObject_Call (callable=0x7faa743df790, args=<optimized out>, kwargs=<optimized out>) at /usr/local/src/conda/python-3.9.21/Include/abstract.h:293
#53 0x00000000004ebea1 in do_call_core (kwdict=0x7f87c826b5c0, callargs=0x7faae6e69040, func=0x7faa743df790, tstate=<optimized out>)
    at /usr/local/src/conda/python-3.9.21/Modules/ceval_gil.h:5125
#54 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=0x7f87c8311440, throwflag=<optimized out>) at /usr/local/src/conda/python-3.9.21/Modules/ceval_gil.h:3582
#55 0x00000000004e6afa in _PyEval_EvalFrame (throwflag=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, 
    f=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, 
    tstate=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>)
    at /croot/python-split_1733933781842/_build_env/x86_64-conda-linux-gnu/sysroot/usr/include/bits/frameobject.c:40
#56 _PyEval_EvalCode (tstate=<optimized out>, _co=<optimized out>, globals=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=<optimized out>, 
    kwnames=0x7f890819a648, kwargs=0x7f87b07a69a0, kwcount=<optimized out>, kwstep=1, defs=0x0, defcount=<optimized out>, kwdefs=0x0, closure=0x7faa743d8370, name=0x7faa744105d0, 
    qualname=0x7faa744105d0) at /usr/local/src/conda/python-3.9.21/Modules/ceval_gil.h:4329
#57 0x00000000004f7ef5 in _PyFunction_Vectorcall (func=<optimized out>, stack=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>)
    at /usr/local/src/conda/python-3.9.21/Include/abstract.h:396
#58 0x00000000004e8c42 in _PyObject_VectorcallTstate (kwnames=0x7f890819a630, nargsf=<optimized out>, args=<optimized out>, callable=0x7faa743df820, tstate=0x26f70a0)
    at /usr/local/src/conda/python-3.9.21/Objects/pycore_pyerrors.h:118
#59 PyObject_Vectorcall (kwnames=0x7f890819a630, nargsf=<optimized out>, args=<optimized out>, callable=0x7faa743df820)
    at /usr/local/src/conda/python-3.9.21/Objects/pycore_pyerrors.h:127
#60 call_function (kwnames=0x7f890819a630, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=<optimized out>) at /usr/local/src/conda/python-3.9.21/Modules/ceval_gil.h:5077
#61 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=0x7f87b07a6800, throwflag=<optimized out>) at /usr/local/src/conda/python-3.9.21/Modules/ceval_gil.h:3537
#62 0x00000000004e6afa in _PyEval_EvalFrame (throwflag=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, 
    f=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, 
    tstate=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>)
    at /croot/python-split_1733933781842/_build_env/x86_64-conda-linux-gnu/sysroot/usr/include/bits/frameobject.c:40
#63 _PyEval_EvalCode (tstate=<optimized out>, _co=<optimized out>, globals=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=<optimized out>, kwnames=0x0, 
    kwargs=0x7f87b015bb08, kwcount=<optimized out>, kwstep=1, defs=0x7f8908280058, defcount=<optimized out>, kwdefs=0x0, closure=0x0, name=0x7f890819a670, qualname=0x7f890819a670)
    at /usr/local/src/conda/python-3.9.21/Modules/ceval_gil.h:4329
#64 0x00000000004f7ef5 in _PyFunction_Vectorcall (func=<optimized out>, stack=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>)
    at /usr/local/src/conda/python-3.9.21/Include/abstract.h:396
#65 0x00000000004e7dd5 in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>, args=0x7f87b015bae8, callable=0x7f880b964700, tstate=0x26f70a0)
    at /usr/local/src/conda/python-3.9.21/Objects/pycore_pyerrors.h:118
#66 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0x7f87b015bae8, callable=0x7f880b964700) at /usr/local/src/conda/python-3.9.21/Objects/pycore_pyerrors.h:127
#67 call_function (kwnames=0x0, oparg=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, pp_stack=<synthetic pointer>, tstate=0x26f70a0)
    at /usr/local/src/conda/python-3.9.21/Modules/ceval_gil.h:5077
#68 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=0x7f87b015b950, throwflag=<optimized out>) at /usr/local/src/conda/python-3.9.21/Modules/ceval_gil.h:3520
#69 0x00000000004e6afa in _PyEval_EvalFrame (throwflag=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, 
    f=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, 
    tstate=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>)
    at /croot/python-split_1733933781842/_build_env/x86_64-conda-linux-gnu/sysroot/usr/include/bits/frameobject.c:40
#70 _PyEval_EvalCode (tstate=<optimized out>, _co=<optimized out>, globals=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=<optimized out>, kwnames=0x0, 
    kwargs=0x7f87b08539d8, kwcount=<optimized out>, kwstep=1, defs=0x7f87b0a46718, defcount=<optimized out>, kwdefs=0x0, closure=0x0, name=0x7f87b07eb030, qualname=0x7f87b07eb030)
    at /usr/local/src/conda/python-3.9.21/Modules/ceval_gil.h:4329
#71 0x00000000004f7ef5 in _PyFunction_Vectorcall (func=<optimized out>, stack=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>)
    at /usr/local/src/conda/python-3.9.21/Include/abstract.h:396
#72 0x00000000004e7dd5 in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>, args=0x7f87b08539b8, callable=0x7f87b0852160, tstate=0x26f70a0)
    at /usr/local/src/conda/python-3.9.21/Objects/pycore_pyerrors.h:118
#73 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0x7f87b08539b8, callable=0x7f87b0852160) at /usr/local/src/conda/python-3.9.21/Objects/pycore_pyerrors.h:127
#74 call_function (kwnames=0x0, oparg=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, pp_stack=<synthetic pointer>, tstate=0x26f70a0)
    at /usr/local/src/conda/python-3.9.21/Modules/ceval_gil.h:5077
#75 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=0x7f87b0853840, throwflag=<optimized out>) at /usr/local/src/conda/python-3.9.21/Modules/ceval_gil.h:3520
#76 0x00000000004f81d3 in _PyEval_EvalFrame (throwflag=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, 
    f=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, 
    tstate=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>)
    at /croot/python-split_1733933781842/_build_env/x86_64-conda-linux-gnu/sysroot/usr/include/bits/frameobject.c:40
#77 function_code_fastcall (tstate=0x26f70a0, co=<optimized out>, args=<optimized out>, nargs=<optimized out>, globals=0x7f87b09ef700)
    at /usr/local/src/conda/python-3.9.21/Include/abstract.h:330
#78 0x0000000000505605 in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=1, args=0x7ffe9d9366c8, callable=0x7f87b024ed30, tstate=0x26f70a0)
    at /usr/local/src/conda/python-3.9.21/Objects/pycore_pyerrors.h:103
#79 method_vectorcall (method=<optimized out>, args=0x7faae6e69058, nargsf=<optimized out>, kwnames=0x0) at /usr/local/src/conda/python-3.9.21/Programs/_functoolsmodule.c:61
#80 0x00000000004ebea1 in do_call_core (kwdict=0x7f87c83ed7c0, callargs=0x7faae6e69040, func=0x7faae0339a40, tstate=<optimized out>)
    at /usr/local/src/conda/python-3.9.21/Modules/ceval_gil.h:5125
#81 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=0x7f87b085a230, throwflag=<optimized out>) at /usr/local/src/conda/python-3.9.21/Modules/ceval_gil.h:3582
#82 0x00000000004f81d3 in _PyEval_EvalFrame (throwflag=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, 
    f=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, 
    tstate=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>)
    at /croot/python-split_1733933781842/_build_env/x86_64-conda-linux-gnu/sysroot/usr/include/bits/frameobject.c:40
#83 function_code_fastcall (tstate=0x26f70a0, co=<optimized out>, args=<optimized out>, nargs=<optimized out>, globals=0x7f880c773840)
    at /usr/local/src/conda/python-3.9.21/Include/abstract.h:330
#84 0x00000000004e7dd5 in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>, args=0x7ad3548, callable=0x7f880c6dbd30, tstate=0x26f70a0)
    at /usr/local/src/conda/python-3.9.21/Objects/pycore_pyerrors.h:118
#85 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0x7ad3548, callable=0x7f880c6dbd30) at /usr/local/src/conda/python-3.9.21/Objects/pycore_pyerrors.h:127
#86 call_function (kwnames=0x0, oparg=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, pp_stack=<synthetic pointer>, tstate=0x26f70a0)
    at /usr/local/src/conda/python-3.9.21/Modules/ceval_gil.h:5077
#87 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=0x7ad33a0, throwflag=<optimized out>) at /usr/local/src/conda/python-3.9.21/Modules/ceval_gil.h:3520
#88 0x00000000004e6afa in _PyEval_EvalFrame (throwflag=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, 
    f=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, 
    tstate=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>)
    at /croot/python-split_1733933781842/_build_env/x86_64-conda-linux-gnu/sysroot/usr/include/bits/frameobject.c:40
#89 _PyEval_EvalCode (tstate=<optimized out>, _co=<optimized out>, globals=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=<optimized out>, kwnames=0x0, 
    kwargs=0x7faae2ba7968, kwcount=<optimized out>, kwstep=1, defs=0x0, defcount=<optimized out>, kwdefs=0x0, closure=0x0, name=0x7f87ca2b4430, qualname=0x7f87ca243750)
    at /usr/local/src/conda/python-3.9.21/Modules/ceval_gil.h:4329
#90 0x00000000004f7ef5 in _PyFunction_Vectorcall (func=<optimized out>, stack=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>)
    at /usr/local/src/conda/python-3.9.21/Include/abstract.h:396
#91 0x00000000004ebea1 in do_call_core (kwdict=0x7f87c83ed140, callargs=0x7faae2ba7940, func=0x7f87ca216040, tstate=<optimized out>)
    at /usr/local/src/conda/python-3.9.21/Modules/ceval_gil.h:5125
#92 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=0x7f87c83e9240, throwflag=<optimized out>) at /usr/local/src/conda/python-3.9.21/Modules/ceval_gil.h:3582
#93 0x00000000004e6afa in _PyEval_EvalFrame (throwflag=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, 
    f=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, 
    tstate=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>)
    at /croot/python-split_1733933781842/_build_env/x86_64-conda-linux-gnu/sysroot/usr/include/bits/frameobject.c:40
#94 _PyEval_EvalCode (tstate=<optimized out>, _co=<optimized out>, globals=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=<optimized out>, kwnames=0x0, 
    kwargs=0x7faae032b4a8, kwcount=<optimized out>, kwstep=1, defs=0x0, defcount=<optimized out>, kwdefs=0x7f87c8431180, closure=0x7f87c8422c70, name=0x7f87c842db70, 
    qualname=0x7f87c8481270) at /usr/local/src/conda/python-3.9.21/Modules/ceval_gil.h:4329
#95 0x00000000004f7ef5 in _PyFunction_Vectorcall (func=<optimized out>, stack=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>)
    at /usr/local/src/conda/python-3.9.21/Include/abstract.h:396
#96 0x00000000004ebea1 in do_call_core (kwdict=0x7f87c83e1cc0, callargs=0x7faae032b480, func=0x7f87c842b3a0, tstate=<optimized out>)
    at /usr/local/src/conda/python-3.9.21/Modules/ceval_gil.h:5125
#97 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=0x7f87b0870cf0, throwflag=<optimized out>) at /usr/local/src/conda/python-3.9.21/Modules/ceval_gil.h:3582
#98 0x00000000004e6afa in _PyEval_EvalFrame (throwflag=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, 
    f=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, 
    tstate=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>)
    at /croot/python-split_1733933781842/_build_env/x86_64-conda-linux-gnu/sysroot/usr/include/bits/frameobject.c:40
#99 _PyEval_EvalCode (tstate=<optimized out>, _co=<optimized out>, globals=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=<optimized out>, kwnames=0x0, 
    kwargs=0x7f87c8456328, kwcount=<optimized out>, kwstep=1, defs=0x0, defcount=<optimized out>, kwdefs=0x0, closure=0x7f87c8424500, name=0x7faae2c332b0, qualname=0x7faae2c313b0)
    at /usr/local/src/conda/python-3.9.21/Modules/ceval_gil.h:4329
#100 0x00000000004f7ef5 in _PyFunction_Vectorcall (func=<optimized out>, stack=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>)
    at /usr/local/src/conda/python-3.9.21/Include/abstract.h:396
#101 0x00007faae56bcd3f in __Pyx_PyObject_Call(_object*, _object*, _object*) () from /home/ray/anaconda3/lib/python3.9/site-packages/ray/_raylet.so
#102 0x00007faae56e6411 in __pyx_pw_3ray_7_raylet_12execute_task_3function_executor(_object*, _object*, _object*) ()
   from /home/ray/anaconda3/lib/python3.9/site-packages/ray/_raylet.so
#103 0x00007faae56bcd3f in __Pyx_PyObject_Call(_object*, _object*, _object*) () from /home/ray/anaconda3/lib/python3.9/site-packages/ray/_raylet.so
#104 0x00007faae57ac688 in __pyx_f_3ray_7_raylet_task_execution_handler(ray::rpc::Address const&, ray::rpc::TaskType, std::string, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&, std::string, std::string, std::vector<std::pair<ray::ObjectID, std::shared_ptr<ray::RayObject> >, std::allocator<std::pair<ray::ObjectID, std::shared_ptr<ray::RayObject> > > >*, std::vector<std::pair<ray::ObjectID, std::shared_ptr<ray::RayObject> >, std::allocator<std::pair<ray::ObjectID, std::shared_ptr<ray::RayObject> > > >*, std::vector<std::pair<ray::ObjectID, bool>, std::allocator<std::pair<ray::ObjectID, bool> > >*, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*, std::string*, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string, bool, bool, bool, long) () from /home/ray/anaconda3/lib/python3.9/site-packages/ray/_raylet.so
#105 0x00007faae56c3149 in std::_Function_handler<ray::Status (ray::rpc::Address const&, ray::rpc::TaskType, std::string, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&, std::string const&, std::string const&, std::vector<std::pair<ray::ObjectID, std::shared_ptr<ray::RayObject> >, std::allocator<std::pair<ray::ObjectID, std::shared_ptr<ray::RayObject> > > >*, std::vector<std::pair<ray::ObjectID, std::shared_ptr<ray::RayObject> >, std::allocator<std::pair<ray::ObjectID, std::shared_ptr<ray::RayObject> > > >*, std::vector<std::pair<ray::ObjectID, bool>, std::allocator<std::pair<ray::ObjectID, bool> > >*, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*, std::string*, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string, bool, bool, bool, long), ray::Status (*)(ray::rpc::Address const&, ray::rpc::TaskType, std::string, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&, std::string, std::string, std::vector<std::pair<ray::ObjectID, std::shared_ptr<ray::RayObject> >, std::allocator<std::pair<ray::ObjectID, std::shared_ptr<ray::RayObject> > > >*, std::vector<std::pair<ray::ObjectID, std::shared_ptr<ray::RayObject> >, std::allocator<std::pair<ray::ObjectID, std::shared_ptr<ray::RayObject> > > >*, std::vector<std::pair<ray::ObjectID, bool>, std::allocator<std::pair<ray::ObjectID, 
bool> > >*, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*, std::string*, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string, bool, bool, bool, long)>::_M_invoke(std::_Any_data const&, ray::rpc::Address const&, ray::rpc::TaskType&&, std::string&&, ray::core::RayFunction const&, std::unordered_map<std::string, double, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, double> > > const&, std::vector<std::shared_ptr<ray::RayObject>, std::allocator<std::shared_ptr<ray::RayObject> > > const&, std::vector<ray::rpc::ObjectReference, std::allocator<ray::rpc::ObjectReference> > const&, std::string const&, std::string const&, std::vector<std::pair<ray::ObjectID, std::shared_ptr<ray::RayObject> >, std::allocator<std::pair<ray::ObjectID, std::shared_ptr<ray::RayObject> > > >*&&, std::vector<std::pair<ray::ObjectID, std::shared_ptr<ray::RayObject> >, std::allocator<std::pair<ray::ObjectID, std::shared_ptr<ray::RayObject> > > >*&&, std::vector<std::pair<ray::ObjectID, bool>, std::allocator<std::pair<ray::ObjectID, bool> > >*&&, std::shared_ptr<ray::LocalMemoryBuffer>&, bool*&&, std::string*&&, std::vector<ray::ConcurrencyGroup, std::allocator<ray::ConcurrencyGroup> > const&, std::string&&, bool&&, bool&&, bool&&, long&&) () from /home/ray/anaconda3/lib/python3.9/site-packages/ray/_raylet.so
#106 0x00007faae58ee64e in ray::core::CoreWorker::ExecuteTask(ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > > const&, std::vector<std::pair<ray::ObjectID, std::shared_ptr<ray::RayObject> >, std::allocator<std::pair<ray::ObjectID, std::shared_ptr<ray::RayObject> > > >*, std::vector<std::pair<ray::ObjectID, std::shared_ptr<ray::RayObject> >, std::allocator<std::pair<ray::ObjectID, std::shared_ptr<ray::RayObject> > > >*, std::vector<std::pair<ray::ObjectID, bool>, std::allocator<std::pair<ray::ObjectID, bool> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*, std::string*) ()
   from /home/ray/anaconda3/lib/python3.9/site-packages/ray/_raylet.so
#107 0x00007faae580c288 in std::_Function_handler<ray::Status (ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > >, std::vector<std::pair<ray::ObjectID, std::shared_ptr<ray::RayObject> >, std::allocator<std::pair<ray::ObjectID, std::shared_ptr<ray::RayObject> > > >*, std::vector<std::pair<ray::ObjectID, std::shared_ptr<ray::RayObject> >, std::allocator<std::pair<ray::ObjectID, std::shared_ptr<ray::RayObject> > > >*, std::vector<std::pair<ray::ObjectID, bool>, std::allocator<std::pair<ray::ObjectID, bool> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*, std::string*), std::_Bind<ray::Status (ray::core::CoreWorker::*(ray::core::CoreWorker*, std::_Placeholder<1>, std::_Placeholder<2>, std::_Placeholder<3>, std::_Placeholder<4>, std::_Placeholder<5>, std::_Placeholder<6>, std::_Placeholder<7>, std::_Placeholder<8>))(ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > > const&, std::vector<std::pair<ray::ObjectID, std::shared_ptr<ray::RayObject> >, std::allocator<std::pair<ray::ObjectID, std::shared_ptr<ray::RayObject> > > >*, std::vector<std::pair<ray::ObjectID, std::shared_ptr<ray::RayObject> >, std::allocator<std::pair<ray::ObjectID, std::shared_ptr<ray::RayObject> > > >*, std::vector<std::pair<ray::ObjectID, bool>, std::allocator<std::pair<ray::ObjectID, bool> > >*, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*, bool*, std::string*)> 
>::_M_invoke(std::_Any_data const&, ray::TaskSpecification const&, std::shared_ptr<std::unordered_map<std::string, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > >, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::vector<std::pair<long, double>, std::allocator<std::pair<long, double> > > > > > >&&, std::vector<std::pair<ray::ObjectID, std::shared_ptr<ray::RayObject> >, std::allocator<std::pair<ray::ObjectID, std::shared_ptr<ray::RayObject> > > >*&&, std::vector<std::pair<ray::ObjectID, std::shared_ptr<ray::RayObject> >, std::allocator<std::pair<ray::ObjectID, std::shared_ptr<ray::RayObject> > > >*&&, std::vector<std::pair<ray::ObjectID, bool>, std::allocator<std::pair<ray::ObjectID, bool> > >*&&, google::protobuf::RepeatedPtrField<ray::rpc::ObjectReferenceCount>*&&, bool*&&, std::string*&&) ()
   from /home/ray/anaconda3/lib/python3.9/site-packages/ray/_raylet.so
#108 0x00007faae58f6ef6 in ray::core::TaskReceiver::HandleTask(ray::rpc::PushTaskRequest const&, ray::rpc::PushTaskReply*, std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>)::{lambda(ray::TaskSpecification const&, std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>)#1}::operator()(ray::TaskSpecification const&, std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>) const () from /home/ray/anaconda3/lib/python3.9/site-packages/ray/_raylet.so
#109 0x00007faae58f82ca in std::_Function_handler<void (ray::TaskSpecification const&, std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>), ray::core::TaskReceiver::HandleTask(ray::rpc::PushTaskRequest const&, ray::rpc::PushTaskReply*, std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>)::{lambda(ray::TaskSpecification const&, std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>)#1}>::_M_invoke(std::_Any_data const&, ray::TaskSpecification const&, std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>&&) () from /home/ray/anaconda3/lib/python3.9/site-packages/ray/_raylet.so
#110 0x00007faae59344a2 in ray::core::InboundRequest::Accept() () from /home/ray/anaconda3/lib/python3.9/site-packages/ray/_raylet.so
#111 0x00007faae59356d4 in ray::core::ActorSchedulingQueue::AcceptRequestOrRejectIfCanceled(ray::TaskID, ray::core::InboundRequest&) ()
   from /home/ray/anaconda3/lib/python3.9/site-packages/ray/_raylet.so
#112 0x00007faae59371bb in ray::core::ActorSchedulingQueue::ScheduleRequests() () from /home/ray/anaconda3/lib/python3.9/site-packages/ray/_raylet.so
#113 0x00007faae593a068 in ray::core::ActorSchedulingQueue::Add(long, long, std::function<void (ray::TaskSpecification const&, std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>)>, std::function<void (ray::TaskSpecification const&, ray::Status const&, std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>)>, std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>, ray::TaskSpecification) () from /home/ray/anaconda3/lib/python3.9/site-packages/ray/_raylet.so
#114 0x00007faae58f9663 in ray::core::TaskReceiver::HandleTask(ray::rpc::PushTaskRequest const&, ray::rpc::PushTaskReply*, std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>) () from /home/ray/anaconda3/lib/python3.9/site-packages/ray/_raylet.so
#115 0x00007faae582bc8d in ray::core::CoreWorker::HandlePushTask(ray::rpc::PushTaskRequest, ray::rpc::PushTaskReply*, std::function<void (ray::Status, std::function<void ()>, std::function<void ()>)>)::{lambda()#1}::operator()() const () from /home/ray/anaconda3/lib/python3.9/site-packages/ray/_raylet.so
#116 0x00007faae5c07ef8 in EventTracker::RecordExecution(std::function<void ()> const&, std::shared_ptr<StatsHandle>) ()
   from /home/ray/anaconda3/lib/python3.9/site-packages/ray/_raylet.so
#117 0x00007faae5c02ece in std::_Function_handler<void (), instrumented_io_context::post(std::function<void ()>, std::string const&, long)::{lambda()#1}>::_M_invoke(std::_Any_data const&) () from /home/ray/anaconda3/lib/python3.9/site-packages/ray/_raylet.so
#118 0x00007faae5c03346 in boost::asio::detail::completion_handler<std::function<void ()>, boost::asio::io_context::basic_executor_type<std::allocator<void>, 0ul> >::do_complete(void*, boost::asio::detail::scheduler_operation*, boost::system::error_code const&, unsigned long) () from /home/ray/anaconda3/lib/python3.9/site-packages/ray/_raylet.so
#119 0x00007faae61da9ab in boost::asio::detail::scheduler::do_run_one(boost::asio::detail::conditionally_enabled_mutex::scoped_lock&, boost::asio::detail::scheduler_thread_info&, boost::system::error_code const&) () from /home/ray/anaconda3/lib/python3.9/site-packages/ray/_raylet.so
#120 0x00007faae61dc329 in boost::asio::detail::scheduler::run(boost::system::error_code&) () from /home/ray/anaconda3/lib/python3.9/site-packages/ray/_raylet.so
#121 0x00007faae61dca32 in boost::asio::io_context::run() () from /home/ray/anaconda3/lib/python3.9/site-packages/ray/_raylet.so
#122 0x00007faae582917d in ray::core::CoreWorker::RunTaskExecutionLoop() () from /home/ray/anaconda3/lib/python3.9/site-packages/ray/_raylet.so
#123 0x00007faae58fbaf1 in ray::core::CoreWorkerProcessImpl::RunWorkerTaskExecutionLoop() () from /home/ray/anaconda3/lib/python3.9/site-packages/ray/_raylet.so
#124 0x00007faae58fbd0d in ray::core::CoreWorkerProcess::RunTaskExecutionLoop() () from /home/ray/anaconda3/lib/python3.9/site-packages/ray/_raylet.so
#125 0x00007faae56bdad7 in __pyx_pw_3ray_7_raylet_10CoreWorker_7run_task_loop(_object*, _object*) () from /home/ray/anaconda3/lib/python3.9/site-packages/ray/_raylet.so
#126 0x00000000004fcb78 in method_vectorcall_NOARGS (func=0x7faae69044a0, args=0x7faae0327be0, nargsf=<optimized out>, kwnames=<optimized out>)
    at /croot/python-split_1733933781842/_build_env/x86_64-conda-linux-gnu/sysroot/usr/include/bits/tupleobject.c:438
#127 0x00000000004e808f in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>, args=0x7faae0327be0, callable=0x7faae69044a0, tstate=0x26f70a0)
    at /usr/local/src/conda/python-3.9.21/Objects/pycore_pyerrors.h:118
#128 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0x7faae0327be0, callable=0x7faae69044a0) at /usr/local/src/conda/python-3.9.21/Objects/pycore_pyerrors.h:127
#129 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x26f70a0) at /usr/local/src/conda/python-3.9.21/Modules/ceval_gil.h:5077
#130 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=0x7faae0327a60, throwflag=<optimized out>) at /usr/local/src/conda/python-3.9.21/Modules/ceval_gil.h:3506
#131 0x00000000004f81d3 in _PyEval_EvalFrame (throwflag=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, 
    f=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, 
    tstate=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>)
    at /croot/python-split_1733933781842/_build_env/x86_64-conda-linux-gnu/sysroot/usr/include/bits/frameobject.c:40
#132 function_code_fastcall (tstate=0x26f70a0, co=<optimized out>, args=<optimized out>, nargs=<optimized out>, globals=0x7faae3187a80)
    at /usr/local/src/conda/python-3.9.21/Include/abstract.h:330
#133 0x00000000004e808f in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>, args=0x26ef520, callable=0x7faae2be4310, tstate=0x26f70a0)
    at /usr/local/src/conda/python-3.9.21/Objects/pycore_pyerrors.h:118
#134 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0x26ef520, callable=0x7faae2be4310) at /usr/local/src/conda/python-3.9.21/Objects/pycore_pyerrors.h:127
#135 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x26f70a0) at /usr/local/src/conda/python-3.9.21/Modules/ceval_gil.h:5077
#136 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=0x26ef3b0, throwflag=<optimized out>) at /usr/local/src/conda/python-3.9.21/Modules/ceval_gil.h:3506
#137 0x00000000004e6afa in _PyEval_EvalFrame (throwflag=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, 
    f=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, 
    tstate=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>)
    at /croot/python-split_1733933781842/_build_env/x86_64-conda-linux-gnu/sysroot/usr/include/bits/frameobject.c:40
#138 _PyEval_EvalCode (tstate=<optimized out>, _co=<optimized out>, globals=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=<optimized out>, kwnames=0x0, 
    kwargs=0x0, kwcount=<optimized out>, kwstep=2, defs=0x0, defcount=<optimized out>, kwdefs=0x0, closure=0x0, name=0x0, qualname=0x0)
    at /usr/local/src/conda/python-3.9.21/Modules/ceval_gil.h:4329
#139 0x00000000004e6787 in _PyEval_EvalCodeWithName (_co=<optimized out>, globals=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=<optimized out>, 
    kwnames=<optimized out>, kwargs=0x0, kwcount=0, kwstep=2, defs=0x0, defcount=0, kwdefs=0x0, closure=0x0, name=0x0, qualname=0x0)
    at /usr/local/src/conda/python-3.9.21/Modules/ceval_gil.h:4361
#140 0x00000000004e6739 in PyEval_EvalCodeEx (_co=<optimized out>, globals=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=<optimized out>, 
    kws=<optimized out>, kwcount=0, defs=0x0, defcount=0, kwdefs=0x0, closure=0x0) at /usr/local/src/conda/python-3.9.21/Modules/ceval_gil.h:4377
#141 0x00000000005942bb in PyEval_EvalCode (co=co@entry=0x7faae6d6c500, globals=globals@entry=0x7faae6dd6880, locals=locals@entry=0x7faae6dd6880)
    at /usr/local/src/conda/python-3.9.21/Modules/ceval_gil.h:828
#142 0x00000000005c1777 in run_eval_code_obj (tstate=0x26f70a0, co=0x7faae6d6c500, globals=0x7faae6dd6880, locals=0x7faae6dd6880)
    at /usr/local/src/conda/python-3.9.21/Objects/clinic/marshal.c.h:1221
#143 0x00000000005bd780 in run_mod (mod=<optimized out>, filename=<optimized out>, globals=0x7faae6dd6880, locals=0x7faae6dd6880, flags=<optimized out>, arena=<optimized out>)
    at /usr/local/src/conda/python-3.9.21/Objects/clinic/marshal.c.h:1242
#144 0x0000000000456695 in pyrun_file (fp=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, 
    filename=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, start=<optimized out>, 
    globals=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, 
    locals=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, 
    closeit=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, flags=0x7ffe9d93ba08)
    at /usr/local/src/conda/python-3.9.21/Objects/clinic/marshal.c.h:1140
#145 0x00000000005b7462 in pyrun_simple_file (flags=0x7ffe9d93ba08, closeit=1, filename=0x7faae6c2c780, fp=0x2718d00)
    at /usr/local/src/conda/python-3.9.21/Objects/clinic/marshal.c.h:450
#146 PyRun_SimpleFileExFlags (fp=0x2718d00, filename=<optimized out>, closeit=1, flags=0x7ffe9d93ba08) at /usr/local/src/conda/python-3.9.21/Objects/clinic/marshal.c.h:483
#147 0x00000000005b49de in pymain_run_file (cf=0x7ffe9d93ba08, config=0x26f48e0) at /croot/python-split_1733933781842/work/build-static/python.c:379
#148 pymain_run_python (exitcode=0x7ffe9d93ba00, exitcode@entry=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>)
    at /croot/python-split_1733933781842/work/build-static/python.c:608
#149 Py_RunMain () at /croot/python-split_1733933781842/work/build-static/python.c:687
#150 0x0000000000588369 in Py_BytesMain (argc=<optimized out>, argv=<optimized out>) at /croot/python-split_1733933781842/work/build-static/python.c:1133
#151 0x00007faaed44b193 in __libc_start_main () from /usr/lib64/libc.so.6
#152 0x000000000058821e in _start () at /usr/local/src/conda/python-3.9.21/Modules/future.c:5313

This looks like it is stuck on a socket connection. Why doesn't plain vLLM have this problem?

@rainmaple
Author

And when pipeline_parallel_size = 2, the stack where it gets stuck is:

Image

@kouroshHakha kouroshHakha added llm and removed triage Needs triage (eg: priority, bug/not-bug, and owning component) labels Mar 3, 2025
@kouroshHakha kouroshHakha self-assigned this Mar 3, 2025
@rainmaple
Author

rainmaple commented Mar 4, 2025

pg_resources.append({"GPU": 1, accelerator: 1}) should be pg_resources.append({"CPU": 1, accelerator: 1}), but with 2 nodes and pp = 2 the problem remains: it still gets stuck.
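
To make the change above concrete, here is a minimal sketch of the bundle list after the fix. It only mirrors the one-line change described in this comment; the helper name build_pg_resources, the num_workers parameter, and the resource key "accelerator_type:H20" are hypothetical stand-ins for illustration, not the actual Ray Serve LLM code.

```python
from typing import Dict, List


def build_pg_resources(num_workers: int, accelerator: str) -> List[Dict[str, float]]:
    """Build one placement-group bundle per worker (hypothetical helper).

    `accelerator` is assumed to be a custom resource key naming the
    accelerator type; its real value in the Ray code is not shown here.
    """
    pg_resources: List[Dict[str, float]] = []
    for _ in range(num_workers):
        # Before the change the bundle was {"GPU": 1, accelerator: 1}.
        # After the change each bundle asks for one CPU instead.
        pg_resources.append({"CPU": 1, accelerator: 1})
    return pg_resources


bundles = build_pg_resources(num_workers=2, accelerator="accelerator_type:H20")
print(bundles)
```

Presumably a list of this shape is what ends up as the bundles of a placement group. As noted above, this change alone did not resolve the hang with 2 nodes and pp = 2.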

@rainmaple
Author

@kouroshHakha I've updated the info above about this problem. Could it be a socket error?

@rainmaple
Author

Sometimes it gets stuck after all models are loaded:

#0  0x00007f18da3258ee in epoll_wait () from /usr/lib64/libc.so.6
#1  0x00007f18d2fda429 in boost::asio::detail::epoll_reactor::run(long, boost::asio::detail::op_queue<boost::asio::detail::scheduler_operation>&) ()
   from /home/ray/anaconda3/lib/python3.9/site-packages/ray/_raylet.so
#2  0x00007f18d2fda6dd in boost::asio::detail::scheduler::do_run_one(boost::asio::detail::conditionally_enabled_mutex::scoped_lock&, boost::asio::detail::scheduler_thread_info&, boost::system::error_code const&) () from /home/ray/anaconda3/lib/python3.9/site-packages/ray/_raylet.so
#3  0x00007f18d2fdc329 in boost::asio::detail::scheduler::run(boost::system::error_code&) () from /home/ray/anaconda3/lib/python3.9/site-packages/ray/_raylet.so
#4  0x00007f18d2fdca32 in boost::asio::io_context::run() () from /home/ray/anaconda3/lib/python3.9/site-packages/ray/_raylet.so
#5  0x00007f18d262917d in ray::core::CoreWorker::RunTaskExecutionLoop() () from /home/ray/anaconda3/lib/python3.9/site-packages/ray/_raylet.so
#6  0x00007f18d26fbaf1 in ray::core::CoreWorkerProcessImpl::RunWorkerTaskExecutionLoop() () from /home/ray/anaconda3/lib/python3.9/site-packages/ray/_raylet.so
#7  0x00007f18d26fbd0d in ray::core::CoreWorkerProcess::RunTaskExecutionLoop() () from /home/ray/anaconda3/lib/python3.9/site-packages/ray/_raylet.so
#8  0x00007f18d24bdad7 in __pyx_pw_3ray_7_raylet_10CoreWorker_7run_task_loop(_object*, _object*) () from /home/ray/anaconda3/lib/python3.9/site-packages/ray/_raylet.so
#9  0x00000000004fcb78 in method_vectorcall_NOARGS (func=0x7f18d37064a0, args=0x7f18cc125be0, nargsf=<optimized out>, kwnames=<optimized out>)
    at /croot/python-split_1733933781842/_build_env/x86_64-conda-linux-gnu/sysroot/usr/include/bits/tupleobject.c:438
#10 0x00000000004e808f in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>, args=0x7f18cc125be0, callable=0x7f18d37064a0, tstate=0x18db0a0)
    at /usr/local/src/conda/python-3.9.21/Objects/pycore_pyerrors.h:118
#11 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0x7f18cc125be0, callable=0x7f18d37064a0) at /usr/local/src/conda/python-3.9.21/Objects/pycore_pyerrors.h:127
#12 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x18db0a0) at /usr/local/src/conda/python-3.9.21/Modules/ceval_gil.h:5077
#13 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=0x7f18cc125a60, throwflag=<optimized out>) at /usr/local/src/conda/python-3.9.21/Modules/ceval_gil.h:3506
#14 0x00000000004f81d3 in _PyEval_EvalFrame (throwflag=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, 
    f=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, 
    tstate=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>)
    at /croot/python-split_1733933781842/_build_env/x86_64-conda-linux-gnu/sysroot/usr/include/bits/frameobject.c:40
#15 function_code_fastcall (tstate=0x18db0a0, co=<optimized out>, args=<optimized out>, nargs=<optimized out>, globals=0x7f18cff87a40)
    at /usr/local/src/conda/python-3.9.21/Include/abstract.h:330
#16 0x00000000004e808f in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>, args=0x18d3520, callable=0x7f18cf9e3310, tstate=0x18db0a0)
    at /usr/local/src/conda/python-3.9.21/Objects/pycore_pyerrors.h:118
#17 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0x18d3520, callable=0x7f18cf9e3310) at /usr/local/src/conda/python-3.9.21/Objects/pycore_pyerrors.h:127
#18 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x18db0a0) at /usr/local/src/conda/python-3.9.21/Modules/ceval_gil.h:5077
#19 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=0x18d33b0, throwflag=<optimized out>) at /usr/local/src/conda/python-3.9.21/Modules/ceval_gil.h:3506
#20 0x00000000004e6afa in _PyEval_EvalFrame (throwflag=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, 
    f=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, 
    tstate=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>)
    at /croot/python-split_1733933781842/_build_env/x86_64-conda-linux-gnu/sysroot/usr/include/bits/frameobject.c:40
#21 _PyEval_EvalCode (tstate=<optimized out>, _co=<optimized out>, globals=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=<optimized out>, kwnames=0x0, 
    kwargs=0x0, kwcount=<optimized out>, kwstep=2, defs=0x0, defcount=<optimized out>, kwdefs=0x0, closure=0x0, name=0x0, qualname=0x0)
    at /usr/local/src/conda/python-3.9.21/Modules/ceval_gil.h:4329
#22 0x00000000004e6787 in _PyEval_EvalCodeWithName (_co=<optimized out>, globals=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=<optimized out>, 
    kwnames=<optimized out>, kwargs=0x0, kwcount=0, kwstep=2, defs=0x0, defcount=0, kwdefs=0x0, closure=0x0, name=0x0, qualname=0x0)
    at /usr/local/src/conda/python-3.9.21/Modules/ceval_gil.h:4361
#23 0x00000000004e6739 in PyEval_EvalCodeEx (_co=<optimized out>, globals=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=<optimized out>, 
    kws=<optimized out>, kwcount=0, defs=0x0, defcount=0, kwdefs=0x0, closure=0x0) at /usr/local/src/conda/python-3.9.21/Modules/ceval_gil.h:4377
#24 0x00000000005942bb in PyEval_EvalCode (co=co@entry=0x7f18d3b6e500, globals=globals@entry=0x7f18d3bd8880, locals=locals@entry=0x7f18d3bd8880)
    at /usr/local/src/conda/python-3.9.21/Modules/ceval_gil.h:828
#25 0x00000000005c1777 in run_eval_code_obj (tstate=0x18db0a0, co=0x7f18d3b6e500, globals=0x7f18d3bd8880, locals=0x7f18d3bd8880)
    at /usr/local/src/conda/python-3.9.21/Objects/clinic/marshal.c.h:1221
#26 0x00000000005bd780 in run_mod (mod=<optimized out>, filename=<optimized out>, globals=0x7f18d3bd8880, locals=0x7f18d3bd8880, flags=<optimized out>, arena=<optimized out>)
    at /usr/local/src/conda/python-3.9.21/Objects/clinic/marshal.c.h:1242
#27 0x0000000000456695 in pyrun_file (fp=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, 
    filename=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, start=<optimized out>, 
    globals=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, 
    locals=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, 
    closeit=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>, flags=0x7fff9edbbb48)
    at /usr/local/src/conda/python-3.9.21/Objects/clinic/marshal.c.h:1140
#28 0x00000000005b7462 in pyrun_simple_file (flags=0x7fff9edbbb48, closeit=1, filename=0x7f18d3a2e780, fp=0x18fcd00)
    at /usr/local/src/conda/python-3.9.21/Objects/clinic/marshal.c.h:450
#29 PyRun_SimpleFileExFlags (fp=0x18fcd00, filename=<optimized out>, closeit=1, flags=0x7fff9edbbb48) at /usr/local/src/conda/python-3.9.21/Objects/clinic/marshal.c.h:483
#30 0x00000000005b49de in pymain_run_file (cf=0x7fff9edbbb48, config=0x18d88e0) at /croot/python-split_1733933781842/work/build-static/python.c:379
#31 pymain_run_python (exitcode=0x7fff9edbbb40, exitcode@entry=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>)
    at /croot/python-split_1733933781842/work/build-static/python.c:608
#32 Py_RunMain () at /croot/python-split_1733933781842/work/build-static/python.c:687
#33 0x0000000000588369 in Py_BytesMain (argc=<optimized out>, argv=<optimized out>) at /croot/python-split_1733933781842/work/build-static/python.c:1133
#34 0x00007f18da24d193 in __libc_start_main () from /usr/lib64/libc.so.6
#35 0x000000000058821e in _start () at /usr/local/src/conda/python-3.9.21/Modules/future.c:5313
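
The gdb backtrace above only shows C-level interpreter frames, so it does not say which Python function is waiting. As a minimal sketch, assuming a few lines can be added to the script being run, the standard-library faulthandler module can print the Python-level stack of every thread. The helper names dump_python_stacks and install_hang_dump and the 300-second timeout are made up for illustration.

```python
import faulthandler
import tempfile


def dump_python_stacks() -> str:
    """Return the Python-level stack of every thread in this process."""
    # faulthandler writes straight to a file descriptor, so a real file is needed.
    with tempfile.TemporaryFile() as f:
        faulthandler.dump_traceback(file=f, all_threads=True)
        f.seek(0)
        return f.read().decode("utf-8", errors="replace")


def install_hang_dump(timeout_s: float = 300.0) -> None:
    """Print all thread stacks to stderr every `timeout_s` seconds.

    Call faulthandler.cancel_dump_traceback_later() to stop the watchdog.
    """
    faulthandler.dump_traceback_later(timeout_s, repeat=True)


stacks = dump_python_stacks()
print(stacks)
```

Calling install_hang_dump() early in the process that hangs would make it print its Python stacks periodically while it is stuck, which may help tell an idle worker from one blocked on a connection.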

@rainmaple rainmaple changed the title from "[<Ray component: Serve>] DeepSeek-R1 mode load stuck in H20" to "[Serve>] DeepSeek-R1 mode load stuck in H20" Mar 5, 2025
@rainmaple rainmaple changed the title from "[Serve>] DeepSeek-R1 mode load stuck in H20" to "[Serve] DeepSeek-R1 mode load stuck in H20" Mar 5, 2025