
[bug] After PR #60629, another "RuntimeError: (PreconditionNotMet)" occurs while running inference on the rec_r34_vd_tps_bilstm_attn model #60957

Closed
EmmonsCurse opened this issue Jan 18, 2024 · 5 comments
@EmmonsCurse

Describe the Bug

0. Error Description

After the merge of #60629 ("fix bug for program_converter"), another error occurred while running inference on the rec_r34_vd_tps_bilstm_attn model, as shown below:

1. GPU:

test_rec_r34_vd_tps_bilstm_attn_gpu.py:46: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
test_gpu_helper.py:75: in get_infer_results
    AnalysisPredictor = Predictor(
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <src.test_case.Predictor object at 0x7fdcbdbbacd0>
model_path = '/vdb1/112_workspace/continuous_integration/inference/inference_api_test/python_api_test/Data/python-ocr-infer/rec_r34_vd_tps_bilstm_attn'
predictor_mode = 'Analysis', config_type = 'gpu', batch_size = 1, min_subgraph_size = 1, trt_dynamic_shape_info = None

    def __init__(self,
                 model_path,
                 predictor_mode="Analysis",
                 config_type="cpu",
                 batch_size=1,
                 min_subgraph_size=1,
                 trt_dynamic_shape_info=None):
        """
        init configuration of predictor
        Args:
            model_path(string): the path of test model
            predictor_mode(strings): create native or analysis predictor
            config_type(strings): describe analysis prediction configuration
        """
        configs = DeployConfig(
            model_path=model_path,
            batch_size=batch_size,
            min_subgraph_size=min_subgraph_size,
            trt_dynamic_shape_info=trt_dynamic_shape_info)
        analysis_predictor_config = configs.analysis_config(config_type)
    
        logger.debug("analysis_predictor_config : {}".format(
            analysis_predictor_config))
        configs.summary_config(analysis_predictor_config)  # summary configs
    
        if predictor_mode == "Analysis":
            logger.info("current config is Analysis config")
>           predictor0 = base.core.create_paddle_predictor(
                analysis_predictor_config)
E           RuntimeError: (PreconditionNotMet) Tensor's dimension is out of bound.Tensor's dimension must be equal or less than the size of its memory.But received Tensor's dimension is 2116, memory's size is 0.
E             [Hint: Expected numel() * SizeOf(dtype()) <= memory_size(), but received numel() * SizeOf(dtype()):2116 > memory_size():0.] (at ../paddle/phi/core/dense_tensor_impl.cc:55)
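For context, the invariant that fails here can be restated in plain Python. This is an illustrative sketch of the check enforced in `dense_tensor_impl.cc`, not Paddle's actual code; the 23×23 float32 shape is a hypothetical example, chosen only because 529 * 4 = 2116 bytes matches the size in the log:

```python
def check_tensor_memory(shape, dtype_size, memory_size):
    """Return True if the tensor's declared bytes fit its allocation.

    Mirrors the idea of numel() * SizeOf(dtype()) <= memory_size():
    a tensor may not claim more elements than its buffer can hold.
    """
    numel = 1
    for dim in shape:
        numel *= dim
    return numel * dtype_size <= memory_size


# The failing case from the log: a parameter that claims 2116 bytes
# (e.g. 23 * 23 float32 elements) but whose allocation is empty.
print(check_tensor_memory((23, 23), 4, 0))     # claimed 2116 > held 0
print(check_tensor_memory((23, 23), 4, 2116))  # fits exactly
```

In other words, the predictor loads a persistable parameter whose shape metadata survived conversion but whose data buffer is empty, which is consistent with a program-converter regression.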

2. CPU:

test_rec_r34_vd_tps_bilstm_attn_cpu.py:45: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
test_cpu_helper.py:75: in get_infer_results
    res, ave_time = AnalysisPredictor.analysis_predict(data_path, repeats=2)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <src.test_case.Predictor object at 0x7fe670cfb0a0>
json_dir = '/vdb1/112_workspace/continuous_integration/inference/inference_api_test/python_api_test/Data/python-ocr-infer/word_rec_data_3_32_100/data.json'
repeats = 2

    def analysis_predict(self, json_dir, repeats=1):
        """
        use zero copy and analysis config to predict
        Args:
            json_dir(string) : "*.json"
            repeats(int)
        Returns:
            outputs(list|[numpy.array, numpy.array]): list of numpy array
            ave_time(float): infer speed
        """
        # parse json from data file
        input_info = JsonInfo().parse_json(json_dir)
        # assign data to Tensor
        input_names = self.predictor.get_input_names()
        for i, input_data_name in enumerate(input_names):
            record = Record().load_data_from_json(input_info[i])
            record = next(record)
            logger.info("====> input_names[{0}] = {1} <====".format(
                i, input_names[i]))
            input_tensor = self.predictor.get_input_tensor(input_data_name)
            logger.debug("record.data shape is {}".format(record.data.shape))
            input_tensor.copy_from_cpu(record.data)
            if hasattr(record, 'lod'):
                input_tensor.set_lod([record.lod])
    
        cost_time = []
        for i in range(repeats):
            t1 = time.time()
    
>           self.predictor.zero_copy_run()
E           RuntimeError: In user code:
E           
E               File "/opt/_internal/cpython-3.7.0/lib/python3.7/site-packages/paddle/fluid/framework.py", line 2610, in append_op
E               attrs=kwargs.get("attrs", None))
E           
E               File "/opt/_internal/cpython-3.7.0/lib/python3.7/site-packages/paddle/fluid/layer_helper.py", line 43, in append_op
E               return self.main_program.current_block().append_op(*args, **kwargs)
E           
E               File "/opt/_internal/cpython-3.7.0/lib/python3.7/site-packages/paddle/fluid/layers/nn.py", line 6416, in matmul
E               attrs=attrs)
E           
E               File "/workspace/PaddleOCR_deplpy/PaddleOCR/ppocr/modeling/stns/tps.py", line 242, in __call__
E               batch_T = layers.matmul(inv_delta_C_tensor, batch_C_prime_with_zeros)
E           
E               File "/workspace/PaddleOCR_deplpy/PaddleOCR/ppocr/modeling/stns/tps.py", line 256, in __call__
E               batch_P_prime = self.grid_generator(batch_C_prime, I_r_size)
E           
E               File "/workspace/PaddleOCR_deplpy/PaddleOCR/ppocr/modeling/architectures/rec_model.py", line 110, in __call__
E               inputs = self.tps(image)
E           
E               File "/workspace/PaddleOCR_deplpy/PaddleOCR/tools/program.py", line 198, in build_export
E               image, outputs = model(mode='export')
E           
E               File "tools/export_model.py", line 67, in main
E               config, eval_program, startup_prog)
E           
E               File "tools/export_model.py", line 93, in <module>
E               main()
E           
E           
E               PreconditionNotMetError: Tensor's dimension is out of bound.Tensor's dimension must be equal or less than the size of its memory.But received Tensor's dimension is 2116, memory's size is 0.
E                 [Hint: Expected numel() * SizeOf(dtype()) <= memory_size(), but received numel() * SizeOf(dtype()):2116 > memory_size():0.] (at ../paddle/phi/core/dense_tensor_impl.cc:55)
E                 [operator < matmul > error]

../../src/test_case.py:282: RuntimeError
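The harness flow that triggers the CPU error can be sketched without Paddle installed. The `MockTensor`/`MockPredictor` classes below are hypothetical stand-ins for the real analysis predictor, used only to show what `analysis_predict` does: feed each named input, run `repeats` times, and average the wall time:

```python
import time

class MockTensor:
    """Stand-in for an input tensor with copy_from_cpu/set_lod."""
    def __init__(self):
        self.data = None
        self.lod = None

    def copy_from_cpu(self, array):
        self.data = array

    def set_lod(self, lod):
        self.lod = lod

class MockPredictor:
    """Stand-in for the analysis predictor; zero_copy_run is a no-op."""
    def __init__(self, names):
        self.tensors = {name: MockTensor() for name in names}

    def get_input_names(self):
        return list(self.tensors)

    def get_input_tensor(self, name):
        return self.tensors[name]

    def zero_copy_run(self):
        pass  # the real predictor executes the graph here (and raises above)

def timed_predict(predictor, feeds, repeats=1):
    """Feed inputs once, then time `repeats` runs and return the average."""
    for name, data in feeds.items():
        predictor.get_input_tensor(name).copy_from_cpu(data)
    cost_time = []
    for _ in range(repeats):
        t1 = time.time()
        predictor.zero_copy_run()
        cost_time.append(time.time() - t1)
    return sum(cost_time) / len(cost_time)

pred = MockPredictor(["image"])
ave_time = timed_predict(pred, {"image": [[0.0] * 4]}, repeats=2)
print("average time:", ave_time)
```

With the real predictor, the exception is raised inside `zero_copy_run()` on the first repeat, before any timing result is produced.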

1. Operating environment

PaddlePaddle version: develop
OS version: CentOS
Python version: 3.8
GPU: T4
CPU info: Intel(R) Xeon(R) Gold 6271C CPU @ 2.60GHz

2. Steps to reproduce

git clone https://github.com/PaddlePaddle/continuous_integration.git --depth=1

cd ./continuous_integration/inference/inference_api_test/python_api_test/

project_path=`pwd`
export project_path
cd ${project_path}

# download Data

mkdir -p ./Data
cd ./Data

# download models

wget --no-proxy -q https://sys-p0.bj.bcebos.com/inference/python-ocr-infer.tgz --no-check-certificate
tar -xvf python-ocr-infer.tgz

cd -

# requirements
python -m pip install --upgrade pip -i https://mirror.baidu.com/pypi/simple
python -m pip install -r requirements.txt -i https://mirror.baidu.com/pypi/simple

# download paddlepaddle_whl

## error_pr(failed)
#wget -q https://paddle-qa.bj.bcebos.com/paddle-pipeline/Develop-GpuAll-LinuxCentos-Gcc82-Cuda112-Trtoff-Py38-Compile/d310158ecfaa49726af1c903e59ad535aa496808/paddlepaddle_gpu-0.0.0-cp38-cp38-linux_x86_64.whl

## the previous commit of error_pr(passed)
#wget -q https://paddle-qa.bj.bcebos.com/paddle-pipeline/Develop-GpuAll-LinuxCentos-Gcc82-Cuda112-Trtoff-Py38-Compile/2f5efb0287c1b66fb5446eb7fb8e5490dc1fd102/paddlepaddle_gpu-0.0.0-cp38-cp38-linux_x86_64.whl

# install paddlepaddle_whl
python -m pip install paddlepaddle_gpu-0.0.0-cp38-cp38-linux_x86_64.whl

# run case
cd ./tests/gpu
python -m pytest -sv test_rec_r34_vd_tps_bilstm_attn_gpu.py

cd ../cpu
python -m pytest -sv test_rec_r34_vd_tps_bilstm_attn_cpu.py
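Since the report hinges on comparing the failing build (d310158...) against its passing parent (2f5efb0...), it helps to confirm which build is actually installed before running the tests. A small hedged helper (assuming `paddle.version.commit` is how the wheel exposes its build commit; the fallback keeps the snippet runnable without Paddle):

```python
def installed_paddle_commit():
    """Return the installed Paddle build commit, or a fallback string."""
    try:
        import paddle
        return getattr(paddle.version, "commit", "unknown")
    except ImportError:
        return "paddle not installed"

print("paddle build:", installed_paddle_commit())
```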

Additional Supplementary Information

@zyt1024 Can you help me solve this?

zyt1024 (Contributor) commented Jan 18, 2024

OK, I will try to reproduce it and verify whether the error is caused by that pull request. #60629

zyt1024 (Contributor) commented Jan 22, 2024

@EmmonsCurse Hello, the changes in this PR do not touch matmul functionality at all. Could you confirm again that this problem is really caused by the PR "[fix bug] fix bug for program_converter"?

@wj-Mcat wj-Mcat removed their assignment Jan 22, 2024
@paddle-bot paddle-bot bot added status/following-up 跟进中 and removed status/new-issue 新建 labels Jan 22, 2024
EmmonsCurse (Author) commented

> @EmmonsCurse Hello, the changes in this PR do not touch matmul functionality at all. Could you confirm again that this problem is really caused by the PR "[fix bug] fix bug for program_converter"?

@zyt1024 Hello. First, please read the full error output: [operator < matmul > error] is only one part of the CPU error message, and there are other errors besides it. Second, I have attached the reproduction steps above, which clearly show that the problem appeared after your PR was merged, while the build from the previous commit runs normally. If you have doubts, please follow the reproduction steps to verify it yourself. Thank you.

## error_pr(failed)
#wget -q https://paddle-qa.bj.bcebos.com/paddle-pipeline/Develop-GpuAll-LinuxCentos-Gcc82-Cuda112-Trtoff-Py38-Compile/d310158ecfaa49726af1c903e59ad535aa496808/paddlepaddle_gpu-0.0.0-cp38-cp38-linux_x86_64.whl

## the previous commit of error_pr(passed)
#wget -q https://paddle-qa.bj.bcebos.com/paddle-pipeline/Develop-GpuAll-LinuxCentos-Gcc82-Cuda112-Trtoff-Py38-Compile/2f5efb0287c1b66fb5446eb7fb8e5490dc1fd102/paddlepaddle_gpu-0.0.0-cp38-cp38-linux_x86_64.whl

zyt1024 (Contributor) commented Jan 25, 2024

@EmmonsCurse Thank you for verifying. Could you also help check whether any other bugs appear after PR #61051 is merged?

EmmonsCurse (Author) commented

> @EmmonsCurse Thank you for verifying. Could you also help check whether any other bugs appear after PR #61051 is merged?

@zyt1024 OK 👌

@paddle-bot paddle-bot bot added status/close 已关闭 and removed status/following-up 跟进中 labels Feb 4, 2024
3 participants