
[bug] After PR #60629, another "RuntimeError: (PreconditionNotMet)" occurs while running inference on the rec_r34_vd_tps_bilstm_attn model #60957

Closed
EmmonsCurse opened this issue Jan 18, 2024 · 5 comments
@EmmonsCurse

Describe the Bug

0. Error Description

After the merge of #60629 ("fix bug for program_converter"), another error occurred while running inference on the rec_r34_vd_tps_bilstm_attn model, as shown below:

1. GPU:

test_rec_r34_vd_tps_bilstm_attn_gpu.py:46: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
test_gpu_helper.py:75: in get_infer_results
    AnalysisPredictor = Predictor(
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <src.test_case.Predictor object at 0x7fdcbdbbacd0>
model_path = '/vdb1/112_workspace/continuous_integration/inference/inference_api_test/python_api_test/Data/python-ocr-infer/rec_r34_vd_tps_bilstm_attn'
predictor_mode = 'Analysis', config_type = 'gpu', batch_size = 1, min_subgraph_size = 1, trt_dynamic_shape_info = None

    def __init__(self,
                 model_path,
                 predictor_mode="Analysis",
                 config_type="cpu",
                 batch_size=1,
                 min_subgraph_size=1,
                 trt_dynamic_shape_info=None):
        """
        init configuration of predictor
        Args:
            model_path(string): the path of test model
            predictor_mode(strings): create native or analysis predictor
            config_type(strings): describe analysis prediction configuration
        """
        configs = DeployConfig(
            model_path=model_path,
            batch_size=batch_size,
            min_subgraph_size=min_subgraph_size,
            trt_dynamic_shape_info=trt_dynamic_shape_info)
        analysis_predictor_config = configs.analysis_config(config_type)
    
        logger.debug("analysis_predictor_config : {}".format(
            analysis_predictor_config))
        configs.summary_config(analysis_predictor_config)  # summary configs
    
        if predictor_mode == "Analysis":
            logger.info("current config is Analysis config")
>           predictor0 = base.core.create_paddle_predictor(
                analysis_predictor_config)
E           RuntimeError: (PreconditionNotMet) Tensor's dimension is out of bound.Tensor's dimension must be equal or less than the size of its memory.But received Tensor's dimension is 2116, memory's size is 0.
E             [Hint: Expected numel() * SizeOf(dtype()) <= memory_size(), but received numel() * SizeOf(dtype()):2116 > memory_size():0.] (at ../paddle/phi/core/dense_tensor_impl.cc:55)
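For context, the invariant that fails here can be restated in plain Python. This is an illustrative sketch of the check enforced in `dense_tensor_impl.cc`, not Paddle's actual code; the 23×23 float32 shape is a hypothetical example, chosen only because 529 * 4 = 2116 bytes matches the size in the log:

```python
def check_tensor_memory(shape, dtype_size, memory_size):
    """Return True if the tensor's declared bytes fit its allocation.

    Mirrors the idea of numel() * SizeOf(dtype()) <= memory_size():
    a tensor may not claim more elements than its buffer can hold.
    """
    numel = 1
    for dim in shape:
        numel *= dim
    return numel * dtype_size <= memory_size


# The failing case from the log: a parameter that claims 2116 bytes
# (e.g. 23 * 23 float32 elements) but whose allocation is empty.
print(check_tensor_memory((23, 23), 4, 0))     # claimed 2116 > held 0
print(check_tensor_memory((23, 23), 4, 2116))  # fits exactly
```

In other words, the predictor loads a persistable parameter whose shape metadata survived conversion but whose data buffer is empty, which is consistent with a program-converter regression.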

2. CPU:

test_rec_r34_vd_tps_bilstm_attn_cpu.py:45: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
test_cpu_helper.py:75: in get_infer_results
    res, ave_time = AnalysisPredictor.analysis_predict(data_path, repeats=2)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <src.test_case.Predictor object at 0x7fe670cfb0a0>
json_dir = '/vdb1/112_workspace/continuous_integration/inference/inference_api_test/python_api_test/Data/python-ocr-infer/word_rec_data_3_32_100/data.json'
repeats = 2

    def analysis_predict(self, json_dir, repeats=1):
        """
        use zero copy and analysis config to predict
        Args:
            json_dir(string) : "*.json"
            repeats(int)
        Returns:
            outputs(list|[numpy.array, numpy.array]): list of numpy array
            ave_time(float): infer speed
        """
        # parse json from data file
        input_info = JsonInfo().parse_json(json_dir)
        # assign data to Tensor
        input_names = self.predictor.get_input_names()
        for i, input_data_name in enumerate(input_names):
            record = Record().load_data_from_json(input_info[i])
            record = next(record)
            logger.info("====> input_names[{0}] = {1} <====".format(
                i, input_names[i]))
            input_tensor = self.predictor.get_input_tensor(input_data_name)
            logger.debug("record.data shape is {}".format(record.data.shape))
            input_tensor.copy_from_cpu(record.data)
            if hasattr(record, 'lod'):
                input_tensor.set_lod([record.lod])
    
        cost_time = []
        for i in range(repeats):
            t1 = time.time()
    
>           self.predictor.zero_copy_run()
E           RuntimeError: In user code:
E           
E               File "/opt/_internal/cpython-3.7.0/lib/python3.7/site-packages/paddle/fluid/framework.py", line 2610, in append_op
E               attrs=kwargs.get("attrs", None))
E           
E               File "/opt/_internal/cpython-3.7.0/lib/python3.7/site-packages/paddle/fluid/layer_helper.py", line 43, in append_op
E               return self.main_program.current_block().append_op(*args, **kwargs)
E           
E               File "/opt/_internal/cpython-3.7.0/lib/python3.7/site-packages/paddle/fluid/layers/nn.py", line 6416, in matmul
E               attrs=attrs)
E           
E               File "/workspace/PaddleOCR_deplpy/PaddleOCR/ppocr/modeling/stns/tps.py", line 242, in __call__
E               batch_T = layers.matmul(inv_delta_C_tensor, batch_C_prime_with_zeros)
E           
E               File "/workspace/PaddleOCR_deplpy/PaddleOCR/ppocr/modeling/stns/tps.py", line 256, in __call__
E               batch_P_prime = self.grid_generator(batch_C_prime, I_r_size)
E           
E               File "/workspace/PaddleOCR_deplpy/PaddleOCR/ppocr/modeling/architectures/rec_model.py", line 110, in __call__
E               inputs = self.tps(image)
E           
E               File "/workspace/PaddleOCR_deplpy/PaddleOCR/tools/program.py", line 198, in build_export
E               image, outputs = model(mode='export')
E           
E               File "tools/export_model.py", line 67, in main
E               config, eval_program, startup_prog)
E           
E               File "tools/export_model.py", line 93, in <module>
E               main()
E           
E           
E               PreconditionNotMetError: Tensor's dimension is out of bound.Tensor's dimension must be equal or less than the size of its memory.But received Tensor's dimension is 2116, memory's size is 0.
E                 [Hint: Expected numel() * SizeOf(dtype()) <= memory_size(), but received numel() * SizeOf(dtype()):2116 > memory_size():0.] (at ../paddle/phi/core/dense_tensor_impl.cc:55)
E                 [operator < matmul > error]

../../src/test_case.py:282: RuntimeError
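The harness flow that triggers the CPU error can be sketched without Paddle installed. The `MockTensor`/`MockPredictor` classes below are hypothetical stand-ins for the real analysis predictor, used only to show what `analysis_predict` does: feed each named input, run `repeats` times, and average the wall time:

```python
import time

class MockTensor:
    """Stand-in for an input tensor with copy_from_cpu/set_lod."""
    def __init__(self):
        self.data = None
        self.lod = None

    def copy_from_cpu(self, array):
        self.data = array

    def set_lod(self, lod):
        self.lod = lod

class MockPredictor:
    """Stand-in for the analysis predictor; zero_copy_run is a no-op."""
    def __init__(self, names):
        self.tensors = {name: MockTensor() for name in names}

    def get_input_names(self):
        return list(self.tensors)

    def get_input_tensor(self, name):
        return self.tensors[name]

    def zero_copy_run(self):
        pass  # the real predictor executes the graph here (and raises above)

def timed_predict(predictor, feeds, repeats=1):
    """Feed inputs once, then time `repeats` runs and return the average."""
    for name, data in feeds.items():
        predictor.get_input_tensor(name).copy_from_cpu(data)
    cost_time = []
    for _ in range(repeats):
        t1 = time.time()
        predictor.zero_copy_run()
        cost_time.append(time.time() - t1)
    return sum(cost_time) / len(cost_time)

pred = MockPredictor(["image"])
ave_time = timed_predict(pred, {"image": [[0.0] * 4]}, repeats=2)
print("average time:", ave_time)
```

With the real predictor, the exception is raised inside `zero_copy_run()` on the first repeat, before any timing result is produced.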

1. Operating environment

PaddlePaddle version: develop
OS version: CentOS
Python version: 3.8
GPU: T4
CPU info: Intel(R) Xeon(R) Gold 6271C CPU @ 2.60GHz

2. Steps to reproduce

git clone https://github.com/PaddlePaddle/continuous_integration.git --depth=1

cd ./continuous_integration/inference/inference_api_test/python_api_test/

project_path=`pwd`
export project_path
cd ${project_path}

# download Data

mkdir -p ./Data
cd ./Data

# download models

wget --no-proxy -q https://sys-p0.bj.bcebos.com/inference/python-ocr-infer.tgz --no-check-certificate
tar -xvf python-ocr-infer.tgz

cd -

# requirements
python -m pip install --upgrade pip -i https://mirror.baidu.com/pypi/simple
python -m pip install -r requirements.txt -i https://mirror.baidu.com/pypi/simple

# download paddlepaddle_whl

## error_pr(failed)
#wget -q https://paddle-qa.bj.bcebos.com/paddle-pipeline/Develop-GpuAll-LinuxCentos-Gcc82-Cuda112-Trtoff-Py38-Compile/d310158ecfaa49726af1c903e59ad535aa496808/paddlepaddle_gpu-0.0.0-cp38-cp38-linux_x86_64.whl

## the previous commit of error_pr(passed)
#wget -q https://paddle-qa.bj.bcebos.com/paddle-pipeline/Develop-GpuAll-LinuxCentos-Gcc82-Cuda112-Trtoff-Py38-Compile/2f5efb0287c1b66fb5446eb7fb8e5490dc1fd102/paddlepaddle_gpu-0.0.0-cp38-cp38-linux_x86_64.whl

# install paddlepaddle_whl
python -m pip install paddlepaddle_gpu-0.0.0-cp38-cp38-linux_x86_64.whl

# run case
cd ./tests/gpu
python -m pytest -sv test_rec_r34_vd_tps_bilstm_attn_gpu.py

cd ../cpu
python -m pytest -sv test_rec_r34_vd_tps_bilstm_attn_cpu.py
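Since the report hinges on comparing the failing build (d310158...) against its passing parent (2f5efb0...), it helps to confirm which build is actually installed before running the tests. A small hedged helper (assuming `paddle.version.commit` is how the wheel exposes its build commit; the fallback keeps the snippet runnable without Paddle):

```python
def installed_paddle_commit():
    """Return the installed Paddle build commit, or a fallback string."""
    try:
        import paddle
        return getattr(paddle.version, "commit", "unknown")
    except ImportError:
        return "paddle not installed"

print("paddle build:", installed_paddle_commit())
```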

Additional Supplementary Information

@zyt1024 Can you help me solve this?

zyt1024 (Contributor) commented Jan 18, 2024

OK, I will try to reproduce it and verify whether the error is caused by that pull request. #60629

zyt1024 (Contributor) commented Jan 22, 2024

@EmmonsCurse Hello, the changes in this PR do not touch matmul functionality at all. Could you confirm again that this problem is really caused by the PR "[fix bug] fix bug for program_converter"?

@wj-Mcat wj-Mcat removed their assignment Jan 22, 2024
@paddle-bot paddle-bot bot added status/following-up 跟进中 and removed status/new-issue 新建 labels Jan 22, 2024
EmmonsCurse (Author) commented

> @EmmonsCurse Hello, the changes in this PR do not touch matmul functionality at all. Could you confirm again that this problem is really caused by the PR "[fix bug] fix bug for program_converter"?

@zyt1024 Hello. First, please read the full error output: [operator < matmul > error] is only one part of the CPU error message, and there are other errors besides it. Second, I have attached the reproduction steps above, which clearly show that the problem appeared after your PR was merged, while the build from the previous commit runs normally. If you have doubts, please follow the reproduction steps to verify it yourself. Thank you.

## error_pr(failed)
#wget -q https://paddle-qa.bj.bcebos.com/paddle-pipeline/Develop-GpuAll-LinuxCentos-Gcc82-Cuda112-Trtoff-Py38-Compile/d310158ecfaa49726af1c903e59ad535aa496808/paddlepaddle_gpu-0.0.0-cp38-cp38-linux_x86_64.whl

## the previous commit of error_pr(passed)
#wget -q https://paddle-qa.bj.bcebos.com/paddle-pipeline/Develop-GpuAll-LinuxCentos-Gcc82-Cuda112-Trtoff-Py38-Compile/2f5efb0287c1b66fb5446eb7fb8e5490dc1fd102/paddlepaddle_gpu-0.0.0-cp38-cp38-linux_x86_64.whl

zyt1024 (Contributor) commented Jan 25, 2024

@EmmonsCurse Thank you for verifying. Could you also help check whether any other bugs appear after PR #61051 is merged?

EmmonsCurse (Author) commented

> @EmmonsCurse Thank you for verifying. Could you also help check whether any other bugs appear after PR #61051 is merged?

@zyt1024 OK 👌

@paddle-bot paddle-bot bot added status/close 已关闭 and removed status/following-up 跟进中 labels Feb 4, 2024
3 participants