PaddleX instance segmentation model reports out-of-memory when training on a custom dataset #2634

Open

yzyMichael opened this issue Dec 12, 2024 · 19 comments

@yzyMichael

yzyMichael commented Dec 12, 2024

Checklist:

Describe the problem

Training a custom dataset with PaddleX instance_segmentation reports OOM. Dataset size: 22 MB, 47 images in total.
Server configuration: 4 × NVIDIA Tesla V100 16 GB
Training the instance_seg_coco_examples dataset on the same server works fine.
check_dataset on the custom dataset passes before training.

Training fails with the following error:
Out of memory error on GPU 0. Cannot allocate 7.392883GB memory on GPU 0, 9.610779GB memory has been allocated and available memory is only 6.154968GB.

Please check whether there is any other process using GPU 0.

Reproduction

python main.py -c paddlex/configs/instance_segmentation/Mask-RT-DETR-L.yaml \
    -o Global.mode=train \
    -o Global.device=gpu:0,1,2,3 \
    -o Global.dataset_dir=../dataset/express_coco_instance_seg

  1. High-performance inference

  2. Serving deployment

    • Did you follow the serving deployment documentation end to end?

    • Did you use the high-performance inference plugin for serving deployment? If so, did you use offline activation or online activation?

    • If the issue concerns calling from another language, please provide a calling example.

  3. Edge deployment

    • Did you follow the edge deployment documentation end to end?

    • Which edge device are you using? What are the corresponding PaddlePaddle and PaddleLite versions?

  4. Which model and dataset are you using?
    Model: instance_segmentation
    Dataset: custom dataset

  5. Please provide the error message and relevant logs
    /root/miniconda3/envs/ocr/lib/python3.10/site-packages/paddle/utils/cpp_extension/extension_utils.py:686: UserWarning: No ccache found. Please be aware that recompiling all source files may be required. You can download and install ccache from: https://github.com/ccache/ccache/blob/master/doc/INSTALL.md
    warnings.warn(warning_message)
    LAUNCH INFO 2024-12-12 09:55:57,548 ----------- Configuration ----------------------
    LAUNCH INFO 2024-12-12 09:55:57,548 auto_cluster_config: 0
    LAUNCH INFO 2024-12-12 09:55:57,548 auto_parallel_config: None
    LAUNCH INFO 2024-12-12 09:55:57,548 auto_tuner_json: None
    LAUNCH INFO 2024-12-12 09:55:57,548 devices: 0,1,2,3
    LAUNCH INFO 2024-12-12 09:55:57,548 elastic_level: -1
    LAUNCH INFO 2024-12-12 09:55:57,548 elastic_timeout: 30
    LAUNCH INFO 2024-12-12 09:55:57,548 enable_gpu_log: True
    LAUNCH INFO 2024-12-12 09:55:57,548 gloo_port: 6767
    LAUNCH INFO 2024-12-12 09:55:57,548 host: None
    LAUNCH INFO 2024-12-12 09:55:57,548 ips: None
    LAUNCH INFO 2024-12-12 09:55:57,548 job_id: default
    LAUNCH INFO 2024-12-12 09:55:57,548 legacy: False
    LAUNCH INFO 2024-12-12 09:55:57,548 log_dir: /root/PaddleX/output/distributed_train_logs
    LAUNCH INFO 2024-12-12 09:55:57,548 log_level: INFO
    LAUNCH INFO 2024-12-12 09:55:57,548 log_overwrite: False
    LAUNCH INFO 2024-12-12 09:55:57,548 master: None
    LAUNCH INFO 2024-12-12 09:55:57,548 max_restart: 3
    LAUNCH INFO 2024-12-12 09:55:57,548 nnodes: 1
    LAUNCH INFO 2024-12-12 09:55:57,549 nproc_per_node: None
    LAUNCH INFO 2024-12-12 09:55:57,549 rank: -1
    LAUNCH INFO 2024-12-12 09:55:57,549 run_mode: collective
    LAUNCH INFO 2024-12-12 09:55:57,549 server_num: None
    LAUNCH INFO 2024-12-12 09:55:57,549 servers:
    LAUNCH INFO 2024-12-12 09:55:57,549 sort_ip: False
    LAUNCH INFO 2024-12-12 09:55:57,549 start_port: 6070
    LAUNCH INFO 2024-12-12 09:55:57,549 trainer_num: None
    LAUNCH INFO 2024-12-12 09:55:57,549 trainers:
    LAUNCH INFO 2024-12-12 09:55:57,549 training_script: tools/train.py
    LAUNCH INFO 2024-12-12 09:55:57,549 training_script_args: ['--eval', '--config', '/root/.paddlex/tmpnzorflxb/instancesegmodel_Mask-RT-DETR-L.yml', '--use_vdl', 'True', '--vdl_log_dir', '/root/PaddleX/output']
    LAUNCH INFO 2024-12-12 09:55:57,549 with_gloo: 1
    LAUNCH INFO 2024-12-12 09:55:57,549 --------------------------------------------------
    LAUNCH INFO 2024-12-12 09:55:57,549 Job: default, mode collective, replicas 1[1:1], elastic False
    LAUNCH INFO 2024-12-12 09:55:57,552 Run Pod: vlsjqt, replicas 4, status ready
    LAUNCH INFO 2024-12-12 09:55:57,651 Watching Pod: vlsjqt, replicas 4, status running
    ======================= Modified FLAGS detected =======================
    FLAGS(name='FLAGS_cudnn_dir', current_value='/root/miniconda3/envs/ocr/lib/python3.10/site-packages/paddle/../nvidia/cudnn/lib', default_value='')
    FLAGS(name='FLAGS_nccl_dir', current_value='/root/miniconda3/envs/ocr/lib/python3.10/site-packages/paddle/../nvidia/nccl/lib', default_value='')
    FLAGS(name='FLAGS_cusparse_dir', current_value='/root/miniconda3/envs/ocr/lib/python3.10/site-packages/paddle/../nvidia/cusparse/lib', default_value='')
    FLAGS(name='FLAGS_cusolver_dir', current_value='/root/miniconda3/envs/ocr/lib/python3.10/site-packages/paddle/../nvidia/cusolver/lib', default_value='')
    FLAGS(name='FLAGS_cublas_dir', current_value='/root/miniconda3/envs/ocr/lib/python3.10/site-packages/paddle/../nvidia/cublas/lib', default_value='')
    FLAGS(name='FLAGS_selected_gpus', current_value='0', default_value='')
    FLAGS(name='FLAGS_enable_pir_api', current_value=False, default_value=True)
    FLAGS(name='FLAGS_curand_dir', current_value='/root/miniconda3/envs/ocr/lib/python3.10/site-packages/paddle/../nvidia/curand/lib', default_value='')
    FLAGS(name='FLAGS_nvidia_package_dir', current_value='/root/miniconda3/envs/ocr/lib/python3.10/site-packages/paddle/../nvidia', default_value='')
    FLAGS(name='FLAGS_cupti_dir', current_value='/root/miniconda3/envs/ocr/lib/python3.10/site-packages/paddle/../nvidia/cuda_cupti/lib', default_value='')
    =======================================================================
    I1212 09:56:01.623795 331699 tcp_utils.cc:181] The server starts to listen on IP_ANY:50581
    I1212 09:56:01.624085 331699 tcp_utils.cc:130] Successfully connected to 10.11.32.133:50581
    I1212 09:56:04.711324 331699 process_group_nccl.cc:150] ProcessGroupNCCL pg_timeout_ 1800000
    I1212 09:56:04.711365 331699 process_group_nccl.cc:151] ProcessGroupNCCL nccl_comm_init_option_ 0
    loading annotations into memory...
    Done (t=0.00s)
    creating index...
    index created!
    [12/12 09:56:05] ppdet.data.source.coco INFO: Load [37 samples valid, 0 samples invalid] in file /root/dataset/express_coco_instance_seg/annotations/instance_train.json.
    W1212 09:56:05.518914 331699 gpu_resources.cc:119] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 12.4, Runtime API Version: 12.3
    W1212 09:56:05.520212 331699 gpu_resources.cc:164] device: 0, cuDNN Version: 9.0.
    [12/12 09:56:07] ppdet.utils.checkpoint INFO: The shape [80, 256] in pretrained weight transformer.denoising_class_embed.weight is unmatched with the shape [2, 256] in model transformer.denoising_class_embed.weight. And the weight transformer.denoising_class_embed.weight will not be loaded
    [12/12 09:56:07] ppdet.utils.checkpoint INFO: The shape [80] in pretrained weight transformer.score_head.bias is unmatched with the shape [2] in model transformer.score_head.bias. And the weight transformer.score_head.bias will not be loaded
    [12/12 09:56:07] ppdet.utils.checkpoint INFO: The shape [256, 80] in pretrained weight transformer.score_head.weight is unmatched with the shape [256, 2] in model transformer.score_head.weight. And the weight transformer.score_head.weight will not be loaded
    [12/12 09:56:07] ppdet.utils.checkpoint INFO: Finish loading model weights: /root/.cache/paddle/weights/Mask-RT-DETR-L_pretrained.pdparams
    W1212 09:56:12.050602 331699 reducer.cc:733] All parameters are involved in the backward pass. It is recommended to set find_unused_parameters to False to improve performance. However, if unused parameters appear in subsequent iterative training, then an error will occur. Please make it clear that in the subsequent training, there will be no parameters that are not used in the backward pass, and then set find_unused_parameters
    [12/12 09:56:12] ppdet.engine.callbacks INFO: Epoch: [0] [ 0/10] learning_rate: 0.000000 loss_class: 0.016525 loss_bbox: 1.174297 loss_giou: 3.149053 loss_mask: 0.405251 loss_dice: 4.808298 loss_class_aux: 1.177263 loss_bbox_aux: 8.399309 loss_giou_aux: 18.635376 loss_mask_aux: 26.235355 loss_dice_aux: 34.775093 loss_class_dn: 7.234075 loss_bbox_dn: 0.185098 loss_giou_dn: 0.602603 loss_mask_dn: 0.201542 loss_dice_dn: 1.712869 loss_class_aux_dn: 41.473408 loss_bbox_aux_dn: 2.603938 loss_giou_aux_dn: 6.139773 loss_mask_aux_dn: 1.318109 loss_dice_aux_dn: 13.829336 loss: 174.076584 eta: 0:06:40 batch_cost: 4.0012 data_cost: 0.1661 ips: 0.2499 images/s, max_mem_reserved: 2028 MB, max_mem_allocated: 1939 MB
    [12/12 09:56:22] ppdet.utils.checkpoint INFO: Save checkpoint: /root/PaddleX/output/0
    loading annotations into memory...
    Done (t=0.00s)
    creating index...
    index created!
    [12/12 09:56:22] ppdet.engine INFO: Export inference config file to /root/PaddleX/output/0/inference/inference.yml
    I1212 09:56:37.835078 331699 program_interpreter.cc:242] New Executor is Running.
    [12/12 09:56:38] ppdet.engine INFO: Export model and saved in /root/PaddleX/output/0/inference
    loading annotations into memory...
    Done (t=0.00s)
    creating index...
    index created!
    [12/12 09:56:39] ppdet.data.source.coco INFO: Load [9 samples valid, 0 samples invalid] in file /root/dataset/express_coco_instance_seg/annotations/instance_val.json.
    loading annotations into memory...
    Done (t=0.00s)
    creating index...
    index created!
    Traceback (most recent call last):
      File "/root/PaddleX/paddlex/repo_manager/repos/PaddleDetection/tools/train.py", line 212, in <module>
        main()
      File "/root/PaddleX/paddlex/repo_manager/repos/PaddleDetection/tools/train.py", line 208, in main
        run(FLAGS, cfg)
      File "/root/PaddleX/paddlex/repo_manager/repos/PaddleDetection/tools/train.py", line 161, in run
        trainer.train(FLAGS.eval)
      File "/root/PaddleX/paddlex/repo_manager/repos/PaddleDetection/ppdet/engine/trainer.py", line 685, in train
        self._eval_with_loader(self._eval_loader)
      File "/root/PaddleX/paddlex/repo_manager/repos/PaddleDetection/ppdet/engine/trainer.py", line 718, in _eval_with_loader
        outs = self.model(data)
      File "/root/miniconda3/envs/ocr/lib/python3.10/site-packages/paddle/nn/layer/layers.py", line 1532, in __call__
        return self.forward(*inputs, **kwargs)
      File "/root/PaddleX/paddlex/repo_manager/repos/PaddleDetection/ppdet/modeling/architectures/meta_arch.py", line 76, in forward
        outs.append(self.get_pred())
      File "/root/PaddleX/paddlex/repo_manager/repos/PaddleDetection/ppdet/modeling/architectures/detr.py", line 118, in get_pred
        return self._forward()
      File "/root/PaddleX/paddlex/repo_manager/repos/PaddleDetection/ppdet/modeling/architectures/detr.py", line 105, in _forward
        bbox, bbox_num, mask = self.post_process(
      File "/root/PaddleX/paddlex/repo_manager/repos/PaddleDetection/ppdet/modeling/post_process.py", line 574, in __call__
        mask_pred, scores = self._mask_postprocess(masks, scores)
      File "/root/PaddleX/paddlex/repo_manager/repos/PaddleDetection/ppdet/modeling/post_process.py", line 481, in _mask_postprocess
        mask_score = F.sigmoid(mask_pred)
      File "/root/miniconda3/envs/ocr/lib/python3.10/site-packages/paddle/tensor/ops.py", line 815, in sigmoid
        return _C_ops.sigmoid(x)
    MemoryError:


C++ Traceback (most recent call last):

0 paddle::pybind::eager_api_sigmoid(_object*, _object*, _object*)
1 sigmoid_ad_func(paddle::Tensor const&)
2 paddle::experimental::sigmoid(paddle::Tensor const&)
3 phi::KernelImpl<void ()(phi::GPUContext const&, phi::DenseTensor const&, phi::DenseTensor), &(void phi::SigmoidKernel<float, phi::GPUContext>(phi::GPUContext const&, phi::DenseTensor const&, phi::DenseTensor*))>::VariadicCompute(phi::DeviceContext const&, phi::DenseTensor const&, phi::DenseTensor*)
4 void phi::ActivationGPUImpl<float, phi::GPUContext, phi::funcs::CudaSigmoidFunctor >(phi::GPUContext const&, phi::DenseTensor const&, phi::DenseTensor*, phi::funcs::CudaSigmoidFunctor const&)
5 float* phi::DeviceContext::Alloc(phi::TensorBase*, unsigned long, bool) const
6 phi::DenseTensor::AllocateFrom(phi::Allocator*, phi::DataType, unsigned long, bool)
7 paddle::memory::allocation::Allocator::Allocate(unsigned long)
8 paddle::memory::allocation::StatAllocator::AllocateImpl(unsigned long)
9 paddle::memory::allocation::Allocator::Allocate(unsigned long)
10 paddle::memory::allocation::Allocator::Allocate(unsigned long)
11 std::string phi::enforce::GetCompleteTraceBackString<std::string >(std::string&&, char const*, int)
12 common::enforce::GetCurrentTraceBackStringabi:cxx11

Error Message Summary:

ResourceExhaustedError:

Out of memory error on GPU 0. Cannot allocate 7.392883GB memory on GPU 0, 9.610779GB memory has been allocated and available memory is only 6.154968GB.

Please check whether there is any other process using GPU 0.

  1. If yes, please stop them, or start PaddlePaddle on another GPU.
  2. If no, please decrease the batch size of your model.
    (at ../paddle/phi/core/memory/allocation/cuda_allocator.cc:71)

I1212 09:56:51.151332 331699 process_group_nccl.cc:155] ProcessGroupNCCL destruct
I1212 09:56:51.404326 331753 tcp_store.cc:290] receive shutdown event and so quit from MasterDaemon run loop
LAUNCH INFO 2024-12-12 09:56:52,708 Pod failed
LAUNCH ERROR 2024-12-12 09:56:52,709 Container failed !!!
Container rank 0 status failed cmd ['/root/miniconda3/envs/ocr/bin/python', '-u', 'tools/train.py', '--eval', '--config', '/root/.paddlex/tmpnzorflxb/instancesegmodel_Mask-RT-DETR-L.yml', '--use_vdl', 'True', '--vdl_log_dir', '/root/PaddleX/output'] code 1 log /root/PaddleX/output/distributed_train_logs/workerlog.0
LAUNCH INFO 2024-12-12 09:56:52,709 ------------------------- ERROR LOG DETAIL -------------------------
LAUNCH INFO 2024-12-12 09:56:54,512 Exit code 1

Traceback (most recent call last):
  File "/root/PaddleX/paddlex/utils/result_saver.py", line 29, in wrap
    result = func(self, *args, **kwargs)
  File "/root/PaddleX/paddlex/engine.py", line 41, in run
    self._model.train()
  File "/root/PaddleX/paddlex/model.py", line 94, in train
    trainer.train()
  File "/root/PaddleX/paddlex/modules/base/trainer.py", line 71, in train
    train_result = self.pdx_model.train(**train_args)
  File "/root/PaddleX/paddlex/repo_apis/PaddleDetection_api/instance_seg/model.py", line 137, in train
    return self.runner.train(
  File "/root/PaddleX/paddlex/repo_apis/PaddleDetection_api/instance_seg/runner.py", line 55, in train
    return self.run_cmd(
  File "/root/PaddleX/paddlex/repo_apis/base/runner.py", line 355, in run_cmd
    raise CalledProcessError(
paddlex.utils.errors.others.CalledProcessError: Command ['/root/miniconda3/envs/ocr/bin/python', '-m', 'paddle.distributed.launch', '--devices', '0,1,2,3', '--log_dir', '/root/PaddleX/output/distributed_train_logs', 'tools/train.py', '--eval', '--config', '/root/.paddlex/tmpnzorflxb/instancesegmodel_Mask-RT-DETR-L.yml', '--use_vdl', 'True', '--vdl_log_dir', '/root/PaddleX/output'] returned non-zero exit status 1.

Environment

  1. Please provide the PaddlePaddle version, PaddleX version, and Python version you are using
    PaddlePaddle and PaddleX version: 3.0.0b2; Python version: 3.10

  2. Please provide your operating system (Linux/Windows/macOS)
    Linux, Ubuntu 22.04

  3. Which CUDA/cuDNN versions are you using?
    CUDA 12.3

@188080501

188080501 commented Dec 12, 2024

Try changing batch_size in the yaml.

The error message does say:
Please check whether there is any other process using GPU 0.

If yes, please stop them, or start PaddlePaddle on another GPU.
If no, please decrease the batch size of your model.

@yzyMichael
Author

I changed epochs_iters from 40 to 10 and batch_size from 2 to 1; it still runs out of memory either way.

@zhang-prog
Collaborator

Is there another job running on GPU 0? Could you try an idle card?

@yzyMichael
Author

No, I checked. The first card's memory usage reaches about 10 GB and then it reports out of memory.

@yzyMichael
Author

(base) root@iZuf6ivw3n7qat4vq7x0mdZ:~# nvidia-smi
Thu Dec 12 13:13:49 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.120 Driver Version: 550.120 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 Tesla V100-SXM2-16GB Off | 00000000:00:07.0 Off | 0 |
| N/A 39C P0 66W / 300W | 9842MiB / 16384MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 1 Tesla V100-SXM2-16GB Off | 00000000:00:08.0 Off | 0 |
| N/A 35C P0 68W / 300W | 5728MiB / 16384MiB | 100% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 2 Tesla V100-SXM2-16GB Off | 00000000:00:09.0 Off | 0 |
| N/A 38C P0 69W / 300W | 4998MiB / 16384MiB | 100% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 3 Tesla V100-SXM2-16GB Off | 00000000:00:0A.0 Off | 0 |
| N/A 38C P0 68W / 300W | 5872MiB / 16384MiB | 100% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| 0 N/A N/A 333417 C /root/miniconda3/envs/ocr/bin/python 9838MiB |
| 1 N/A N/A 333419 C /root/miniconda3/envs/ocr/bin/python 5724MiB |
| 2 N/A N/A 333422 C /root/miniconda3/envs/ocr/bin/python 4994MiB |
| 3 N/A N/A 333424 C /root/miniconda3/envs/ocr/bin/python 5868MiB |
+-----------------------------------------------------------------------------------------+
(base) root@iZuf6ivw3n7qat4vq7x0mdZ:~# nvidia-smi
Thu Dec 12 13:13:51 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.120 Driver Version: 550.120 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 Tesla V100-SXM2-16GB Off | 00000000:00:07.0 Off | 0 |
| N/A 40C P0 71W / 300W | 192MiB / 16384MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 1 Tesla V100-SXM2-16GB Off | 00000000:00:08.0 Off | 0 |
| N/A 35C P0 68W / 300W | 5728MiB / 16384MiB | 100% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 2 Tesla V100-SXM2-16GB Off | 00000000:00:09.0 Off | 0 |
| N/A 38C P0 69W / 300W | 4998MiB / 16384MiB | 100% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 3 Tesla V100-SXM2-16GB Off | 00000000:00:0A.0 Off | 0 |
| N/A 38C P0 68W / 300W | 5872MiB / 16384MiB | 100% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| 1 N/A N/A 333419 C /root/miniconda3/envs/ocr/bin/python 5724MiB |
| 2 N/A N/A 333422 C /root/miniconda3/envs/ocr/bin/python 4994MiB |
| 3 N/A N/A 333424 C /root/miniconda3/envs/ocr/bin/python 5868MiB |
+-----------------------------------------------------------------------------------------+
(base) root@iZuf6ivw3n7qat4vq7x0mdZ:~# nvidia-smi
Thu Dec 12 13:13:53 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.120 Driver Version: 550.120 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 Tesla V100-SXM2-16GB Off | 00000000:00:07.0 Off | 0 |
| N/A 40C P0 72W / 300W | 76MiB / 16384MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 1 Tesla V100-SXM2-16GB Off | 00000000:00:08.0 Off | 0 |
| N/A 35C P0 67W / 300W | 56MiB / 16384MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 2 Tesla V100-SXM2-16GB Off | 00000000:00:09.0 Off | 0 |
| N/A 38C P0 69W / 300W | 72MiB / 16384MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 3 Tesla V100-SXM2-16GB Off | 00000000:00:0A.0 Off | 0 |
| N/A 38C P0 68W / 300W | 5872MiB / 16384MiB | 100% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
+-----------------------------------------------------------------------------------------+

@yzyMichael
Author

[screenshot]

@yzyMichael
Author

[screenshots]

@cuicheng01
Collaborator

Which Paddle version are you using?

@yzyMichael
Author

[screenshot]

@zhang-prog
Collaborator

I just tried training with PaddleX and Paddle 3.0b2 on the example dataset; it trains normally and uses about 10 GB of GPU memory:

[screenshots]

When you have time, please follow the training steps in the instance segmentation module tutorial, download the example dataset, and train on it to confirm that works normally.

@yzyMichael
Author

I followed the instance segmentation module tutorial, downloaded the example dataset, and trained on it; the whole flow works fine. The problem only appears when I use my custom dataset.

@yzyMichael
Author

yzyMichael commented Dec 16, 2024

Characteristics of the custom dataset: each image's resolution is between 4024–5440 × 3036–3648, each image contains 1–5 instances, and the dataset has 47 images in total.

@zhang-prog
Collaborator

The input images may be too large. You could resize the custom dataset, or try modifying BatchRandomResize and removing the target sizes larger than 640.

[screenshot]
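
For reference, a minimal sketch of that edit, assuming the training reader config keeps the transform under TrainReader -> batch_transforms as a BatchRandomResize entry with a target_size list (the usual layout in PaddleDetection RT-DETR reader configs); the config path is a placeholder and should point at the yml your run actually uses:

# Hedged sketch: drop BatchRandomResize target sizes above 640 in the training
# reader config. The key layout below is an assumption; inspect the generated
# yml first and adjust accordingly.
import yaml

CONFIG = "path/to/reader_config.yml"  # placeholder, point at the actual yml

with open(CONFIG, "r", encoding="utf-8") as f:
    cfg = yaml.safe_load(f)

for transform in cfg.get("TrainReader", {}).get("batch_transforms", []):
    if "BatchRandomResize" in transform:
        sizes = transform["BatchRandomResize"].get("target_size", [])
        transform["BatchRandomResize"]["target_size"] = [s for s in sizes if s <= 640]

with open(CONFIG, "w", encoding="utf-8") as f:
    yaml.safe_dump(cfg, f, sort_keys=False)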

@yzyMichael
Author

OK, I will try resizing. One more question: after resizing, will the positions of the annotated instances be resized accordingly?

@zhang-prog
Collaborator

Try modifying BatchRandomResize first and see whether that works; resizing the dataset yourself also means resizing the annotations, which is a bit of a hassle.

@yzyMichael
Author

The original images in my custom dataset are all very large; none are below 1080p, so resizing is the best option. Is there any documentation on modifying the resize settings?

@zhang-prog
Collaborator

There is no documentation for that.
Give me a moment; I will first check whether BatchRandomResize is failing to take effect so that the full-size images are being fed into training, since GPU memory usage now seems to scale with the input image size.

@zhang-prog
Collaborator

This may take a while; I will post the conclusion here once I have one.

@zhang-prog
Collaborator

BatchRandomResize is working correctly, and training itself is fine.
The problem occurs in post-processing when the masks are produced, which uses the original image resolution. Because your images are so large, this leads to the OOM.

[screenshot]
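
A rough back-of-envelope check shows the scale involved (the 5440 × 3648 resolution comes from the dataset description above; the number of kept masks and the float32 dtype are assumptions):

# Rough estimate only: assumes ~100 predicted masks are upsampled to the
# original image size and held in float32 when sigmoid is applied.
h, w = 3648, 5440            # largest image size reported for the custom dataset
num_masks = 100              # hypothetical number of predictions kept per image
bytes_per_float32 = 4
print(num_masks * h * w * bytes_per_float32 / 1024**3)  # ~7.39 GiB for one tensor

That lines up with the 7.392883GB allocation reported in the log, which is consistent with the failure being driven by the original image resolution rather than by the training batch size.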

Also, training directly on such ultra-high-resolution images is not very reasonable, so here are a few options for reference:

  1. Resize the dataset together with its annotations; you need to handle this yourself (a rough sketch is given after this list).
  2. Use a smaller (lower-resolution) dataset.
  3. Use a GPU with more memory.
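
For option 1, a minimal sketch of downscaling the images and the COCO annotations together, assuming polygon-style segmentations and the standard COCO keys; all paths, the MAX_SIDE value, and the interpolation choice are placeholders to adapt:

# Hedged sketch: downscale a COCO instance-segmentation dataset together with
# its annotations. Assumes polygon segmentations (not RLE) and standard COCO keys.
import json
from pathlib import Path

from PIL import Image

SRC_IMG_DIR = Path("images")             # original images (placeholder)
DST_IMG_DIR = Path("images_resized")     # resized images (placeholder)
ANN_IN = "annotations/instance_train.json"
ANN_OUT = "annotations/instance_train_resized.json"
MAX_SIDE = 1333                          # longest side after resizing

DST_IMG_DIR.mkdir(parents=True, exist_ok=True)
with open(ANN_IN, "r", encoding="utf-8") as f:
    coco = json.load(f)

# Resize the images and record the per-image scale factor.
scale_by_image = {}
for img in coco["images"]:
    scale = min(1.0, MAX_SIDE / max(img["width"], img["height"]))
    scale_by_image[img["id"]] = scale
    new_w, new_h = round(img["width"] * scale), round(img["height"] * scale)
    Image.open(SRC_IMG_DIR / img["file_name"]).resize(
        (new_w, new_h), Image.BILINEAR
    ).save(DST_IMG_DIR / img["file_name"])
    img["width"], img["height"] = new_w, new_h

# Scale the annotations by the same factor (uniform scaling keeps x and y consistent).
for ann in coco["annotations"]:
    s = scale_by_image[ann["image_id"]]
    ann["bbox"] = [v * s for v in ann["bbox"]]   # [x, y, w, h]
    ann["area"] = ann["area"] * s * s
    ann["segmentation"] = [[v * s for v in poly] for poly in ann["segmentation"]]

with open(ANN_OUT, "w", encoding="utf-8") as f:
    json.dump(coco, f)

After resizing, point Global.dataset_dir at the resized copy and rerun check_dataset before training.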
