Error exporting inference model #1963

Closed
lx1054331851 opened this issue Sep 2, 2024 · 8 comments
@lx1054331851

lx1054331851 commented Sep 2, 2024

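Checklist:

  1. Search past related issues for an answer
  2. Read the FAQ summary and Q&A
  3. Confirm that the bug is still unfixed in the latest version
  4. Read the PaddleX deployment documentation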

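Problem description

Reproduction

  1. C++ deployment

    • Have you successfully run the demo we provide by following the documentation tutorial?

    • Did you modify the code on top of the demo? If so, please provide the code you ran

  2. C# deployment

    • Have you successfully run the demo we provide by following the documentation tutorial?

    • Did you modify the code on top of the demo? If so, please provide the code you ran

    • If the C# demo does not run, does the C++ demo run correctly?

  3. Which model and dataset are you using?

  4. Please provide the error message and related logs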

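Environment

  1. If you are using the Python deployment method, please provide your PaddlePaddle, PaddleX, and Python version numbers
paddlex                   3.0.0b0      /home/ctyun/PaddleX
paddlepaddle-gpu          3.0.0b1
Python 3.10.14
  2. If you are using the C++ or C# deployment method, please provide the PaddleX branch and inference engine (e.g. PaddleInference) version you are using

  3. Please provide your operating system information, e.g. Linux/Windows/MacOS

Ubuntu 22.04.2 LTS
  4. Which CUDA/cuDNN version are you using?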
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Fri_Sep__8_19:17:24_PDT_2023
Cuda compilation tools, release 12.3, V12.3.52
Build cuda_12.3.r12.3/compiler.33281558_0

Error message

paddlex: error: unrecognized arguments: --export_inference --save_dir=./inference_model
TingquanGao self-assigned this Sep 2, 2024
@lx1054331851
Author

image

I found that the likely cause is that PaddleX training did not copy the model.pdopt, model.pdstates, and config.yaml files into the best_model folder.

@TingquanGao
Collaborator

What command did you run? We'll try to reproduce it.

@lx1054331851
Author

image

Now that PaddleX is at 3.0, is this document no longer supported? I also haven't found any other documentation on deployment.

@lx1054331851
Author

What command did you run? We'll try to reproduce it.

Training config file

Global:
    model: CLIP_vit_base_patch16_224
    mode: check_dataset # check_dataset/train/evaluate/predict
    dataset_dir: "./paddlex/data/"
    device: gpu:0
    output: "output/2024-09-03"

CheckDataset:
    convert:
        enable: False
        src_dataset_type: null
    split:
        enable: False
        train_percent: null
        val_percent: null

Train:
    num_classes: 64
    epochs_iters: 20
    batch_size: 64
    learning_rate: 0.0003
    pretrain_weight_path: null
    warmup_steps: 5
    resume_path:
    log_interval: 1
    eval_interval: 1
    save_interval: 1

Evaluate:
    weight_path: "output/2024-09-03/best_model.pdparams"
    log_interval: 1

Predict:
    model_dir: "output/2024-09-03/best_model"
    input_path: "https://imgcdnv1.fabricschina.com.cn/eshop/MassimoDutti/2024-08-24/1724433663-10593.jpg?basic=40p"
    kernel_option:
        run_mode: paddle
        batch_size: 1

Training command

 python main.py -c paddlex/configs/image_classification/CLIP_vit_base_patch16_224.yaml -o Global.mode=train -o Glob.dataset_dir=./paddlex/data/
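One detail in the command above: the second override is written as `Glob.dataset_dir`, while the config file shown earlier names its top-level section `Global`. The thread does not show how PaddleX treats an override with an unknown section, so the following is only a minimal sketch of a pre-flight check. The helper `find_unknown_overrides` is hypothetical and not part of PaddleX; the section names are copied from the config above.

```python
# Top-level sections copied from the training config file shown in this thread.
KNOWN_SECTIONS = {"Global", "CheckDataset", "Train", "Evaluate", "Predict"}


def find_unknown_overrides(argv):
    """Return the `-o key=value` overrides whose section is not in KNOWN_SECTIONS.

    Hypothetical helper for illustration; it is not a PaddleX API.
    """
    unknown = []
    # Walk over adjacent argument pairs and keep the values that follow "-o".
    for flag, value in zip(argv, argv[1:]):
        if flag != "-o":
            continue
        key = value.split("=", 1)[0]      # e.g. "Glob.dataset_dir"
        section = key.split(".", 1)[0]    # e.g. "Glob"
        if section not in KNOWN_SECTIONS:
            unknown.append(value)
    return unknown


# Arguments from the training command in this issue.
argv = [
    "-c", "paddlex/configs/image_classification/CLIP_vit_base_patch16_224.yaml",
    "-o", "Global.mode=train",
    "-o", "Glob.dataset_dir=./paddlex/data/",
]
print(find_unknown_overrides(argv))
```

Here training still found the data, presumably because the config file already sets `Global.dataset_dir` to the same path.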

Training runs normally.

Training results

image

The script fails at the end, when it exports the model.

[2024/09/03 10:47:48] ppcls INFO: [Eval][Epoch 20][Iter: 122/124]CELoss: 0.91889, loss: 0.91889, top1: 0.76817, top5: 0.94347, batch_cost: 0.12325s, reader_cost: 0.05999, ips: 519.28297 images/sec
[2024/09/03 10:47:48] ppcls INFO: [Eval][Epoch 20][Iter: 123/124]CELoss: 0.27858, loss: 0.27858, top1: 0.76825, top5: 0.94349, batch_cost: 0.12226s, reader_cost: 0.05948, ips: 24.53802 images/sec
[2024/09/03 10:47:48] ppcls INFO: [Eval][Epoch 20][Avg]CELoss: 0.95563, loss: 0.95563, top1: 0.76825, top5: 0.94349
[2024/09/03 10:47:48] ppcls INFO: [Eval][Epoch 20][best metric: 0.769777774810791]
[2024/09/03 10:47:50] ppcls INFO: Already save model in /home/ctyun/PaddleX/output/2024-09-03/epoch_20
[2024/09/03 10:47:51] ppcls INFO: Already save model in /home/ctyun/PaddleX/output/2024-09-03/latest
['/root/miniconda3/envs/ctyun/bin/python', 'tools/export_model.py', '-c', '/root/.paddlex/tmp10gb6dam/clsmodel_CLIP_vit_base_patch16_224.yml', '-o', 'Global.export_for_fd=True', '-o', 'Global.infer_config_path=/home/ctyun/PaddleX/paddlex/repo_manager/repos/PaddleClas/deploy/configs/inference_cls.yaml']
A new field (export_for_fd) detected!
A new field (infer_config_path) detected!
[2024/09/03 10:47:54] ppcls INFO: 
===========================================================
==        PaddleClas is powered by PaddlePaddle !        ==
===========================================================
==                                                       ==
==   For more info please go to the following website.   ==
==                                                       ==
==       https://github.com/PaddlePaddle/PaddleClas      ==
===========================================================

[2024/09/03 10:47:54] ppcls INFO: Global : 
[2024/09/03 10:47:54] ppcls INFO:     checkpoints : None
[2024/09/03 10:47:54] ppcls INFO:     pretrained_model : /home/ctyun/PaddleX/output/2024-09-03/epoch_20
[2024/09/03 10:47:54] ppcls INFO:     output_dir : /home/ctyun/PaddleX/output/2024-09-03
[2024/09/03 10:47:54] ppcls INFO:     device : cpu
[2024/09/03 10:47:54] ppcls INFO:     save_interval : 1
[2024/09/03 10:47:54] ppcls INFO:     eval_during_train : True
[2024/09/03 10:47:54] ppcls INFO:     eval_interval : 1
[2024/09/03 10:47:54] ppcls INFO:     epochs : 20
[2024/09/03 10:47:54] ppcls INFO:     print_batch_step : 1
[2024/09/03 10:47:54] ppcls INFO:     use_visualdl : True
[2024/09/03 10:47:54] ppcls INFO:     image_shape : [3, 224, 224]
[2024/09/03 10:47:54] ppcls INFO:     save_inference_dir : /home/ctyun/PaddleX/output/2024-09-03/epoch_20
[2024/09/03 10:47:54] ppcls INFO:     export_for_fd : True
[2024/09/03 10:47:54] ppcls INFO:     infer_config_path : /home/ctyun/PaddleX/paddlex/repo_manager/repos/PaddleClas/deploy/configs/inference_cls.yaml
[2024/09/03 10:47:54] ppcls INFO: ------------------------------------------------------------
[2024/09/03 10:47:54] ppcls INFO: AMP : 
[2024/09/03 10:47:54] ppcls INFO:     use_amp : False
[2024/09/03 10:47:54] ppcls INFO:     use_fp16_test : False
[2024/09/03 10:47:54] ppcls INFO:     scale_loss : 128.0
[2024/09/03 10:47:54] ppcls INFO:     use_dynamic_loss_scaling : True
[2024/09/03 10:47:54] ppcls INFO:     use_promote : False
[2024/09/03 10:47:54] ppcls INFO:     level : O1
[2024/09/03 10:47:54] ppcls INFO: ------------------------------------------------------------
[2024/09/03 10:47:54] ppcls INFO: Arch : 
[2024/09/03 10:47:54] ppcls INFO:     name : CLIP_vit_base_patch16_224
[2024/09/03 10:47:54] ppcls INFO:     class_num : 64
[2024/09/03 10:47:54] ppcls INFO:     return_embed : False
[2024/09/03 10:47:54] ppcls INFO:     pretrained : True
[2024/09/03 10:47:54] ppcls INFO: ------------------------------------------------------------
[2024/09/03 10:47:54] ppcls INFO: Loss : 
[2024/09/03 10:47:54] ppcls INFO:     Train : 
[2024/09/03 10:47:54] ppcls INFO:         CELoss : 
[2024/09/03 10:47:54] ppcls INFO:             weight : 1.0
[2024/09/03 10:47:54] ppcls INFO:             epsilon : 0.1
[2024/09/03 10:47:54] ppcls INFO:     Eval : 
[2024/09/03 10:47:54] ppcls INFO:         CELoss : 
[2024/09/03 10:47:54] ppcls INFO:             weight : 1.0
[2024/09/03 10:47:54] ppcls INFO: ------------------------------------------------------------
[2024/09/03 10:47:54] ppcls INFO: Optimizer : 
[2024/09/03 10:47:54] ppcls INFO:     name : AdamWDL
[2024/09/03 10:47:54] ppcls INFO:     beta1 : 0.9
[2024/09/03 10:47:54] ppcls INFO:     beta2 : 0.999
[2024/09/03 10:47:54] ppcls INFO:     epsilon : 1e-08
[2024/09/03 10:47:54] ppcls INFO:     weight_decay : 0.05
[2024/09/03 10:47:54] ppcls INFO:     layerwise_decay : 0.6
[2024/09/03 10:47:54] ppcls INFO:     filter_bias_and_bn : True
[2024/09/03 10:47:54] ppcls INFO:     lr : 
[2024/09/03 10:47:54] ppcls INFO:         name : Cosine
[2024/09/03 10:47:54] ppcls INFO:         learning_rate : 0.0003
[2024/09/03 10:47:54] ppcls INFO:         eta_min : 1e-06
[2024/09/03 10:47:54] ppcls INFO:         warmup_epoch : 5
[2024/09/03 10:47:54] ppcls INFO:         warmup_start_lr : 1e-06
[2024/09/03 10:47:54] ppcls INFO: ------------------------------------------------------------
[2024/09/03 10:47:54] ppcls INFO: DataLoader : 
[2024/09/03 10:47:54] ppcls INFO:     Train : 
[2024/09/03 10:47:54] ppcls INFO:         dataset : 
[2024/09/03 10:47:54] ppcls INFO:             name : ClsDataset
[2024/09/03 10:47:54] ppcls INFO:             image_root : /home/ctyun/PaddleX/paddlex/data
[2024/09/03 10:47:54] ppcls INFO:             cls_label_path : /home/ctyun/PaddleX/paddlex/data/train.txt
[2024/09/03 10:47:54] ppcls INFO:             transform_ops : 
[2024/09/03 10:47:54] ppcls INFO:                 DecodeImage : 
[2024/09/03 10:47:54] ppcls INFO:                     to_rgb : True
[2024/09/03 10:47:54] ppcls INFO:                     channel_first : False
[2024/09/03 10:47:54] ppcls INFO:                 RandCropImage : 
[2024/09/03 10:47:54] ppcls INFO:                     size : 224
[2024/09/03 10:47:54] ppcls INFO:                     interpolation : bicubic
[2024/09/03 10:47:54] ppcls INFO:                     backend : pil
[2024/09/03 10:47:54] ppcls INFO:                 RandFlipImage : 
[2024/09/03 10:47:54] ppcls INFO:                     flip_code : 1
[2024/09/03 10:47:54] ppcls INFO:                 TimmAutoAugment : 
[2024/09/03 10:47:54] ppcls INFO:                     config_str : rand-m9-mstd0.5-inc1
[2024/09/03 10:47:54] ppcls INFO:                     interpolation : bicubic
[2024/09/03 10:47:54] ppcls INFO:                     img_size : 224
[2024/09/03 10:47:54] ppcls INFO:                 NormalizeImage : 
[2024/09/03 10:47:54] ppcls INFO:                     scale : 1.0/255.0
[2024/09/03 10:47:54] ppcls INFO:                     mean : [0.485, 0.456, 0.406]
[2024/09/03 10:47:54] ppcls INFO:                     std : [0.229, 0.224, 0.225]
[2024/09/03 10:47:54] ppcls INFO:                     order : 
[2024/09/03 10:47:54] ppcls INFO:                 RandomErasing : 
[2024/09/03 10:47:54] ppcls INFO:                     EPSILON : 0.25
[2024/09/03 10:47:54] ppcls INFO:                     sl : 0.02
[2024/09/03 10:47:54] ppcls INFO:                     sh : 1.0/3.0
[2024/09/03 10:47:54] ppcls INFO:                     r1 : 0.3
[2024/09/03 10:47:54] ppcls INFO:                     attempt : 10
[2024/09/03 10:47:54] ppcls INFO:                     use_log_aspect : True
[2024/09/03 10:47:54] ppcls INFO:                     mode : pixel
[2024/09/03 10:47:54] ppcls INFO:         sampler : 
[2024/09/03 10:47:54] ppcls INFO:             name : DistributedBatchSampler
[2024/09/03 10:47:54] ppcls INFO:             batch_size : 64
[2024/09/03 10:47:54] ppcls INFO:             drop_last : True
[2024/09/03 10:47:54] ppcls INFO:             shuffle : True
[2024/09/03 10:47:54] ppcls INFO:         loader : 
[2024/09/03 10:47:54] ppcls INFO:             num_workers : 4
[2024/09/03 10:47:54] ppcls INFO:             use_shared_memory : True
[2024/09/03 10:47:54] ppcls INFO:     Eval : 
[2024/09/03 10:47:54] ppcls INFO:         dataset : 
[2024/09/03 10:47:54] ppcls INFO:             name : ClsDataset
[2024/09/03 10:47:54] ppcls INFO:             image_root : /home/ctyun/PaddleX/paddlex/data
[2024/09/03 10:47:54] ppcls INFO:             cls_label_path : /home/ctyun/PaddleX/paddlex/data/val.txt
[2024/09/03 10:47:54] ppcls INFO:             transform_ops : 
[2024/09/03 10:47:54] ppcls INFO:                 DecodeImage : 
[2024/09/03 10:47:54] ppcls INFO:                     to_rgb : True
[2024/09/03 10:47:54] ppcls INFO:                     channel_first : False
[2024/09/03 10:47:54] ppcls INFO:                 ResizeImage : 
[2024/09/03 10:47:54] ppcls INFO:                     resize_short : 224
[2024/09/03 10:47:54] ppcls INFO:                     interpolation : bicubic
[2024/09/03 10:47:54] ppcls INFO:                     backend : pil
[2024/09/03 10:47:54] ppcls INFO:                 CropImage : 
[2024/09/03 10:47:54] ppcls INFO:                     size : 224
[2024/09/03 10:47:54] ppcls INFO:                 NormalizeImage : 
[2024/09/03 10:47:54] ppcls INFO:                     scale : 1.0/255.0
[2024/09/03 10:47:54] ppcls INFO:                     mean : [0.485, 0.456, 0.406]
[2024/09/03 10:47:54] ppcls INFO:                     std : [0.229, 0.224, 0.225]
[2024/09/03 10:47:54] ppcls INFO:                     order : 
[2024/09/03 10:47:54] ppcls INFO:         sampler : 
[2024/09/03 10:47:54] ppcls INFO:             name : DistributedBatchSampler
[2024/09/03 10:47:54] ppcls INFO:             batch_size : 64
[2024/09/03 10:47:54] ppcls INFO:             drop_last : False
[2024/09/03 10:47:54] ppcls INFO:             shuffle : False
[2024/09/03 10:47:54] ppcls INFO:         loader : 
[2024/09/03 10:47:54] ppcls INFO:             num_workers : 4
[2024/09/03 10:47:54] ppcls INFO:             use_shared_memory : True
[2024/09/03 10:47:54] ppcls INFO: ------------------------------------------------------------
[2024/09/03 10:47:54] ppcls INFO: Infer : 
[2024/09/03 10:47:54] ppcls INFO:     infer_imgs : docs/images/inference_deployment/whl_demo.jpg
[2024/09/03 10:47:54] ppcls INFO:     batch_size : 10
[2024/09/03 10:47:54] ppcls INFO:     transforms : 
[2024/09/03 10:47:54] ppcls INFO:         DecodeImage : 
[2024/09/03 10:47:54] ppcls INFO:             to_rgb : True
[2024/09/03 10:47:54] ppcls INFO:             channel_first : False
[2024/09/03 10:47:54] ppcls INFO:         ResizeImage : 
[2024/09/03 10:47:54] ppcls INFO:             resize_short : 256
[2024/09/03 10:47:54] ppcls INFO:         CropImage : 
[2024/09/03 10:47:54] ppcls INFO:             size : 224
[2024/09/03 10:47:54] ppcls INFO:         NormalizeImage : 
[2024/09/03 10:47:54] ppcls INFO:             scale : 1.0/255.0
[2024/09/03 10:47:54] ppcls INFO:             mean : [0.485, 0.456, 0.406]
[2024/09/03 10:47:54] ppcls INFO:             std : [0.229, 0.224, 0.225]
[2024/09/03 10:47:54] ppcls INFO:             order : 
[2024/09/03 10:47:54] ppcls INFO:         ToCHWImage : None
[2024/09/03 10:47:54] ppcls INFO:     PostProcess : 
[2024/09/03 10:47:54] ppcls INFO:         name : Topk
[2024/09/03 10:47:54] ppcls INFO:         topk : 5
[2024/09/03 10:47:54] ppcls INFO:         class_id_map_file : /home/ctyun/PaddleX/paddlex/data/label.txt
[2024/09/03 10:47:54] ppcls INFO: ------------------------------------------------------------
[2024/09/03 10:47:54] ppcls INFO: Metric : 
[2024/09/03 10:47:54] ppcls INFO:     Train : 
[2024/09/03 10:47:54] ppcls INFO:         TopkAcc : 
[2024/09/03 10:47:54] ppcls INFO:             topk : [1, 5]
[2024/09/03 10:47:54] ppcls INFO:     Eval : 
[2024/09/03 10:47:54] ppcls INFO:         TopkAcc : 
[2024/09/03 10:47:54] ppcls INFO:             topk : [1, 5]
[2024/09/03 10:47:54] ppcls INFO: ------------------------------------------------------------
[2024/09/03 10:47:54] ppcls INFO: train with paddle 3.0.0-beta1 and device Place(cpu)
[2024/09/03 10:47:57] ppcls INFO: Found /root/.paddleclas/weights/CLIP_vit_base_patch16_224.pdparams
[2024/09/03 10:47:57] ppcls INFO: Finish load pretrained model from /root/.paddleclas/weights/CLIP_vit_base_patch16_224.pdparams
[2024/09/03 10:47:57] ppcls INFO: Finish load pretrained model from /home/ctyun/PaddleX/output/2024-09-03/epoch_20.pdparams
[2024/09/03 10:47:57] ppcls INFO: Finish load pretrained model from /home/ctyun/PaddleX/output/2024-09-03/epoch_20.pdparams
I0903 10:47:59.722121 1874550 program_interpreter.cc:243] New Executor is Running.
Traceback (most recent call last):
  File "/home/ctyun/PaddleX/paddlex/repo_manager/repos/PaddleClas/tools/export_model.py", line 40, in <module>
    engine.export()
  File "/home/ctyun/PaddleX/paddlex/repo_manager/repos/PaddleClas/ppcls/engine/engine.py", line 528, in export
    dump_infer_config(self.config, dst_path)
  File "/home/ctyun/PaddleX/paddlex/repo_manager/repos/PaddleClas/ppcls/utils/config.py", line 247, in dump_infer_config
    with open(postprocess_dict["class_id_map_file"], 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: '/home/ctyun/PaddleX/paddlex/data/label.txt'
Traceback (most recent call last):
  File "/home/ctyun/PaddleX/paddlex/modules/base/trainer/train_deamon.py", line 33, in wrap
    func(self, *args, **kwargs)
  File "/home/ctyun/PaddleX/paddlex/modules/base/trainer/train_deamon.py", line 195, in update
    self.results[i] = self.update_result(self.results[i],
  File "/home/ctyun/PaddleX/paddlex/modules/base/trainer/train_deamon.py", line 266, in update_result
    self.update_models(result, model, train_output, f"last_{i}",
  File "/home/ctyun/PaddleX/paddlex/modules/base/trainer/train_deamon.py", line 318, in update_models
    self.update_inference_model(model, pdparams,
  File "/home/ctyun/PaddleX/paddlex/modules/base/trainer/train_deamon.py", line 326, in update_inference_model
    export_result = model.export(
  File "/home/ctyun/PaddleX/paddlex/repo_apis/PaddleClas_api/cls/model.py", line 201, in export
    return self.runner.export(config_path, [], None, save_dir)
  File "/home/ctyun/PaddleX/paddlex/repo_apis/PaddleClas_api/cls/runner.py", line 127, in export
    cp = self.run_cmd(cmd, switch_wdir=True, echo=True, silent=False)
  File "/home/ctyun/PaddleX/paddlex/repo_apis/base/runner.py", line 359, in run_cmd
    raise CalledProcessError(
paddlex.utils.errors.others.CalledProcessError: Command ['/root/miniconda3/envs/ctyun/bin/python', 'tools/export_model.py', '-c', '/root/.paddlex/tmp10gb6dam/clsmodel_CLIP_vit_base_patch16_224.yml', '-o', 'Global.export_for_fd=True', '-o', 'Global.infer_config_path=/home/ctyun/PaddleX/paddlex/repo_manager/repos/PaddleClas/deploy/configs/inference_cls.yaml'] returned non-zero exit status 1.

@lx1054331851
Author

from paddlex import PaddleInferenceOption, create_model

model_name = "CLIP_vit_base_patch16_224"

# Instantiate PaddleInferenceOption to configure inference
kernel_option = PaddleInferenceOption()
kernel_option.set_device("gpu:0")

# Call create_model to instantiate the prediction model
model = create_model(model_name=model_name, model_dir="/output/2024-09-03/best_model", kernel_option=kernel_option)

# Call the predict method of the prediction model to run prediction
result = model.predict({'input_path': "https://imgcdnv1.fabricschina.com.cn/eshop/MassimoDutti/2024-08-24/1724433663-10593.jpg?basic=40p"})
/root/miniconda3/envs/ctyun/bin/python /home/ctyun/PaddleX/test/1.py 
The device id has been set to 0.
Traceback (most recent call last):
  File "/home/ctyun/PaddleX/test/1.py", line 10, in <module>
    model = create_model(model_name=model_name, model_dir="/output/2024-09-03/best_model", kernel_option=kernel_option)
  File "/home/ctyun/PaddleX/paddlex/modules/base/predictor/predictor.py", line 207, in create_model
    return BasePredictor.get(model_name)(model_name=model_name,
  File "/home/ctyun/PaddleX/paddlex/modules/base/predictor/utils/node.py", line 43, in _wrapper
    ret = init_func(self, *args, **kwargs)
  File "/home/ctyun/PaddleX/paddlex/modules/base/predictor/predictor.py", line 50, in __init__
    self.other_src = self.load_other_src()
  File "/home/ctyun/PaddleX/paddlex/modules/image_classification/predictor/predictor.py", line 38, in load_other_src
    raise FileNotFoundError(
FileNotFoundError: Cannot find config file: /output/2024-09-03/best_model/inference.yml

None of the prediction scripts in the documentation work.
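The traceback shows that the predictor looks for `inference.yml` inside `model_dir`. Note also that the script passes `/output/2024-09-03/best_model`, an absolute path from the filesystem root, whereas the training log wrote to `/home/ctyun/PaddleX/output/2024-09-03`. A minimal sketch of a check to run before `create_model` could look like this. The helper `check_model_dir` is hypothetical, and the only required file it assumes is the one named in the traceback.

```python
from pathlib import Path


def check_model_dir(model_dir, required=("inference.yml",)):
    """Return the resolved directory and the required files missing from it.

    Hypothetical helper; the file list is taken from the traceback above,
    not from the PaddleX documentation.
    """
    root = Path(model_dir).expanduser().resolve()
    missing = [name for name in required if not (root / name).is_file()]
    return root, missing


# Same model_dir string as in the script above.
root, missing = check_model_dir("/output/2024-09-03/best_model")
if missing:
    print(f"{root} is missing: {', '.join(missing)}")
```

Printing the resolved path makes it easier to see whether the directory being checked is the one that training actually wrote to.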

image

@TingquanGao
Collaborator

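Now that PaddleX is at 3.0, is this document no longer supported? I also haven't found any other documentation on deployment.

This document is indeed outdated. We are reorganizing the current documentation and will update it soon.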

@TingquanGao
Collaborator

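The export error during training occurs because the dataset is missing its label file (label.txt); please check the dataset format. The model export logic is also being refactored and will be updated soon.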
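As a rough illustration of that check, the sketch below verifies that the annotation files referenced in the log above (`train.txt`, `val.txt`, and `label.txt` under the dataset directory) exist before training starts. The helper name `missing_dataset_files` is hypothetical, and the file list is inferred from this thread's log rather than from an official dataset specification.

```python
from pathlib import Path

# File names as they appear in the export log of this issue
# (cls_label_path for train/val and class_id_map_file).
REQUIRED_FILES = ("train.txt", "val.txt", "label.txt")


def missing_dataset_files(dataset_dir, required=REQUIRED_FILES):
    """Return the names of required annotation files absent from dataset_dir."""
    root = Path(dataset_dir)
    return [name for name in required if not (root / name).is_file()]


# dataset_dir value from the training config in this issue.
missing = missing_dataset_files("./paddlex/data/")
if missing:
    print("Dataset is missing:", ", ".join(missing))
else:
    print("All expected annotation files are present.")
```

Running this before a long training job surfaces a missing `label.txt` immediately instead of at the export step after the final epoch.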

@TingquanGao
Collaborator

This issue has had no response for a long time and will be closed. You can reopen it or open a new issue if you still have questions.


From Bot
