Platform: Windows 10. The cmd output during training was as follows:
LAUNCH INFO 2024-12-23 13:55:59,670 ----------- Configuration ---------------------- LAUNCH INFO 2024-12-23 13:55:59,671 auto_parallel_config: None LAUNCH INFO 2024-12-23 13:55:59,671 auto_tuner_json: None LAUNCH INFO 2024-12-23 13:55:59,671 devices: 0,1 LAUNCH INFO 2024-12-23 13:55:59,671 elastic_level: -1 LAUNCH INFO 2024-12-23 13:55:59,671 elastic_timeout: 30 LAUNCH INFO 2024-12-23 13:55:59,671 enable_gpu_log: True LAUNCH INFO 2024-12-23 13:55:59,671 gloo_port: 6767 LAUNCH INFO 2024-12-23 13:55:59,671 host: None LAUNCH INFO 2024-12-23 13:55:59,671 ips: None LAUNCH INFO 2024-12-23 13:55:59,671 job_id: default LAUNCH INFO 2024-12-23 13:55:59,671 legacy: False LAUNCH INFO 2024-12-23 13:55:59,671 log_dir: D:\model\ccd2-1\distributed_train_logs LAUNCH INFO 2024-12-23 13:55:59,671 log_level: INFO LAUNCH INFO 2024-12-23 13:55:59,671 log_overwrite: False LAUNCH INFO 2024-12-23 13:55:59,671 master: None LAUNCH INFO 2024-12-23 13:55:59,671 max_restart: 3 LAUNCH INFO 2024-12-23 13:55:59,671 nnodes: 1 LAUNCH INFO 2024-12-23 13:55:59,671 nproc_per_node: None LAUNCH INFO 2024-12-23 13:55:59,671 rank: -1 LAUNCH INFO 2024-12-23 13:55:59,671 run_mode: collective LAUNCH INFO 2024-12-23 13:55:59,671 server_num: None LAUNCH INFO 2024-12-23 13:55:59,671 servers: LAUNCH INFO 2024-12-23 13:55:59,671 sort_ip: False LAUNCH INFO 2024-12-23 13:55:59,671 start_port: 6070 LAUNCH INFO 2024-12-23 13:55:59,671 trainer_num: None LAUNCH INFO 2024-12-23 13:55:59,671 trainers: LAUNCH INFO 2024-12-23 13:55:59,671 training_script: tools/train.py LAUNCH INFO 2024-12-23 13:55:59,671 training_script_args: ['--do_eval', '--config', 'C:\\Users\\user\\.paddlex\\tmp2_33b798\\segmodel_Deeplabv3_Plus-R50.yml', '--batch_size', '2', '--learning_rate', '0.001', '--iters', '88000', '--device', 'gpu', '--use_vdl', '--save_dir', 'D:\\model\\ccd2-1', '--save_interval', '1100', '--log_iters', '10'] LAUNCH INFO 2024-12-23 13:55:59,671 with_gloo: 1 LAUNCH INFO 2024-12-23 13:55:59,671 
-------------------------------------------------- LAUNCH INFO 2024-12-23 13:55:59,672 Job: default, mode collective, replicas 1[1:1], elastic False LAUNCH INFO 2024-12-23 13:55:59,673 Run Pod: fqfbaa, replicas 2, status ready LAUNCH INFO 2024-12-23 13:55:59,679 Watching Pod: fqfbaa, replicas 2, status running LAUNCH WARNING 2024-12-23 13:55:59,779 save gpu info failed LAUNCH INFO 2024-12-23 13:56:02,684 Pod failed LAUNCH ERROR 2024-12-23 13:56:02,684 Container failed !!! Container rank 0 status failed cmd ['C:\\ProgramData\\anaconda3\\envs\\paddlex_det\\python.exe', '-u', 'tools/train.py', '--do_eval', '--config', 'C:\\Users\\user\\.paddlex\\tmp2_33b798\\segmodel_Deeplabv3_Plus-R50.yml', '--batch_size', '2', '--learning_rate', '0.001', '--iters', '88000', '--device', 'gpu', '--use_vdl', '--save_dir', 'D:\\model\\ccd2-1', '--save_interval', '1100', '--log_iters', '10'] code 1 log D:\model\ccd2-1\distributed_train_logs\workerlog.0 env {'ALLUSERSPROFILE': 'C:\\ProgramData', 'APPDATA': 'C:\\Users\\user\\AppData\\Roaming', 'COMMONPROGRAMFILES': 'C:\\Program Files\\Common Files', 'COMMONPROGRAMFILES(X86)': 'C:\\Program Files (x86)\\Common Files', 'COMMONPROGRAMW6432': 'C:\\Program Files\\Common Files', 'COMPUTERNAME': 'AI2', 'COMSPEC': 'C:\\Windows\\system32\\cmd.exe', 'CONDA_DEFAULT_ENV': 'paddlex_det', 'CONDA_EXE': 'C:\\ProgramData\\anaconda3\\Scripts\\conda.exe', 'CONDA_PREFIX': 'C:\\ProgramData\\anaconda3\\envs\\paddlex_det', 'CONDA_PREFIX_1': 'C:\\ProgramData\\anaconda3', 'CONDA_PROMPT_MODIFIER': '(paddlex_det) ', 'CONDA_PYTHON_EXE': 'C:\\ProgramData\\anaconda3\\python.exe', 'CONDA_SHLVL': '2', 'CUDA_PATH': 'C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v11.8', 'CUDA_PATH_V11_8': 'C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v11.8', 'DRIVERDATA': 'C:\\Windows\\System32\\Drivers\\DriverData', 'FLAGS_ENABLE_PIR_API': '0', 'FLAGS_JSON_FORMAT_MODEL': '0', 'FPS_BROWSER_APP_PROFILE_STRING': 'Internet Explorer', 'FPS_BROWSER_USER_PROFILE_STRING': 
'Default', 'GENICAM_CACHE_V2_4': 'C:\\Program Files\\Cognex\\Common\\genicam\\cache', 'GENICAM_GENTL32_PATH': 'C:\\Program Files (x86)\\Common Files\\MVS\\Runtime\\Win32_i86', 'GENICAM_GENTL64_PATH': 'C:\\Program Files (x86)\\Common Files\\MVS\\Runtime\\Win64_x64', 'GENICAM_ROOT_V2_4': 'C:\\Program Files\\Cognex\\Common\\genicam', 'HOMEDRIVE': 'C:', 'HOMEPATH': '\\Users\\user', 'IGCCSVC_DB': 'AQAAANCMnd8BFdERjHoAwE/Cl+sBAAAAesNkBH7uNkqtZYc2tEuS3QQAAAACAAAAAAAQZgAAAAEAACAAAABo85rIEbFnFvjA5JNLZ0BMuiP6JFD6HB5/d5wa6rBDGAAAAAAOgAAAAAIAACAAAADDIHQo9IH+cKwwt9BzLQO+g2/PZFgmDYlb5ros7gqIW2AAAACtZzl3taFQ0VaWDnwYAIoK0OB4qQqRLygJjYoOnAkaVQAdMaba7tSy/UVM7Y+oXrxw4QY5EJiboqFLxn1hSVr7kf6eEt1KKPg/2dGzxPKSj8NxEZhkIQuhfsDr8yCKAY5AAAAA3Ep71lUUHDlozpmxFD+49X0eDX4eXF35ADoX93nccTpy3hWWXVZEredANb55n3iVV9SH+DuEem6+JJUbA43png==', 'KMP_DUPLICATE_LIB_OK': 'True', 'KMP_INIT_AT_FORK': 'FALSE', 'LOCALAPPDATA': 'C:\\Users\\user\\AppData\\Local', 'LOGONSERVER': '\\\\AI2', 'MVCAM_COMMON_RUNENV': 'C:\\Program Files (x86)\\MVS\\Development', 'MVCAM_GENICAM_CLPROTOCOL': 'C:\\Program Files (x86)\\Common Files\\MVS\\Runtime\\CLProtocol', 'MVCAM_GIGE_DEBUG_HEARTBEAT': '60000', 'NIEXTCCOMPILERSUPP': 'C:\\Program Files (x86)\\National Instruments\\Shared\\ExternalCompilerSupport\\C\\', 'NI_MO_INSTALL_PATH': 'C:\\Users\\Public\\Documents\\National Instruments\\model_optimizer\\', 'NUMBER_OF_PROCESSORS': '32', 'NVTOOLSEXT_PATH': 'C:\\Program Files\\NVIDIA Corporation\\NvToolsExt\\', 'OMP_NUM_THREADS': '1', 'ONEDRIVE': 'C:\\Users\\user\\OneDrive', 'OS': 'Windows_NT', 'OV_MO_INSTALL_PATH': 'C:\\Users\\Public\\Documents\\National Instruments\\intel_model_optimizer\\', 'OV_NI_PLUGIN_DIR': 'C:\\Program Files\\National Instruments\\Shared\\OpenVINO\\', 'PADDLE_PDX_PADDLECLAS_PATH': 'C:\\Paddle\\PaddleX-release-3.0-beta2\\paddlex\\repo_manager\\repos\\PaddleClas', 'PADDLE_PDX_PADDLEDETECTION_PATH': 'C:\\Paddle\\PaddleX-release-3.0-beta2\\paddlex\\repo_manager\\repos\\PaddleDetection', 'PADDLE_PDX_PADDLENLP_PATH': 
'C:\\Paddle\\PaddleX-release-3.0-beta2\\paddlex\\repo_manager\\repos\\PaddleNLP', 'PADDLE_PDX_PADDLEOCR_PATH': 'C:\\Paddle\\PaddleX-release-3.0-beta2\\paddlex\\repo_manager\\repos\\PaddleOCR', 'PADDLE_PDX_PADDLESEG_PATH': 'C:\\Paddle\\PaddleX-release-3.0-beta2\\paddlex\\repo_manager\\repos\\PaddleSeg', 'PADDLE_PDX_PADDLETS_PATH': 'C:\\Paddle\\PaddleX-release-3.0-beta2\\paddlex\\repo_manager\\repos\\PaddleTS', 'PATH': 'C:\\ProgramData\\anaconda3\\envs\\paddlex_det\\Lib\\site-packages\\cv2\\../../x64/vc14/bin;C:\\ProgramData\\anaconda3\\envs\\paddlex_det\\lib\\site-packages\\paddle\\base;C:\\ProgramData\\anaconda3\\envs\\paddlex_det\\lib\\site-packages\\paddle\\base\\..\\libs;C:\\ProgramData\\anaconda3\\envs\\paddlex_det\\lib\\site-packages\\paddle\\base;C:\\ProgramData\\anaconda3\\envs\\paddlex_det\\lib\\site-packages\\paddle\\base\\..\\libs;C:\\ProgramData\\anaconda3\\envs\\paddlex_det\\Lib\\site-packages\\cv2\\../../x64/vc14/bin;C:\\ProgramData\\anaconda3\\envs\\paddlex_det;C:\\ProgramData\\anaconda3\\envs\\paddlex_det\\Library\\mingw-w64\\bin;C:\\ProgramData\\anaconda3\\envs\\paddlex_det\\Library\\usr\\bin;C:\\ProgramData\\anaconda3\\envs\\paddlex_det\\Library\\bin;C:\\ProgramData\\anaconda3\\envs\\paddlex_det\\Scripts;C:\\ProgramData\\anaconda3\\envs\\paddlex_det\\bin;C:\\ProgramData\\anaconda3\\condabin;C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v11.8\\bin;C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v11.8\\libnvvp;.;C:\\Program Files\\National Instruments\\Shared\\OpenVINO;C:\\Program Files (x86)\\Common Files\\MVS\\Runtime\\Win32_i86;C:\\Program Files (x86)\\Common Files\\MVS\\Runtime\\Win64_x64;C:\\Windows\\system32;C:\\Windows;C:\\Windows\\System32\\Wbem;C:\\Windows\\System32\\WindowsPowerShell\\v1.0;C:\\Windows\\System32\\OpenSSH;C:\\Program Files\\dotnet;C:\\Program Files (x86)\\National Instruments\\Shared\\LabVIEW CLI;C:\\Program Files\\Cognex\\VisionPro\\bin;C:\\Program Files\\Common Files\\Pleora\\eBUS SDK;C:\\Program 
Files\\NVIDIA Corporation\\Nsight Compute 2022.3.0;C:\\Program Files (x86)\\NVIDIA Corporation\\PhysX\\Common;C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v11.8\\include;C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v11.8\\TensorRT-8.5.1.7\\lib;C:\\Program Files\\Microsoft SQL Server\\150\\Tools\\Binn;C:\\Program Files\\Microsoft SQL Server\\Client SDK\\ODBC\\170\\Tools\\Binn;C:\\Program Files (x86)\\Windows Kits\\10\\Windows Performance Toolkit;C:\\Program Files\\NVIDIA Corporation\\NVIDIA NvDLISR;C:\\Users\\user\\AppData\\Local\\Microsoft\\WindowsApps;C:\\Users\\user\\.dotnet\\tools', 'PATHEXT': '.COM;.EXE;.BAT;.CMD;.VBS;.VBE;.JS;.JSE;.WSF;.WSH;.MSC', 'PROCESSOR_ARCHITECTURE': 'AMD64', 'PROCESSOR_IDENTIFIER': 'Intel64 Family 6 Model 183 Stepping 1, GenuineIntel', 'PROCESSOR_LEVEL': '6', 'PROCESSOR_REVISION': 'b701', 'PROGRAMDATA': 'C:\\ProgramData', 'PROGRAMFILES': 'C:\\Program Files', 'PROGRAMFILES(X86)': 'C:\\Program Files (x86)', 'PROGRAMW6432': 'C:\\Program Files', 'PROMPT': '(paddlex_det) $P$G', 'PSMODULEPATH': 'C:\\Program Files\\WindowsPowerShell\\Modules;C:\\Windows\\system32\\WindowsPowerShell\\v1.0\\Modules', 'PUBLIC': 'C:\\Users\\Public', 'SESSIONNAME': 'Console', 'SSL_CERT_DIR': 'C:\\ProgramData\\anaconda3\\envs\\paddlex_det\\Library\\ssl\\certs', 'SSL_CERT_FILE': 'C:\\ProgramData\\anaconda3\\Library\\ssl\\cacert.pem', 'SYSTEMDRIVE': 'C:', 'SYSTEMROOT': 'C:\\Windows', 'TEMP': 'C:\\Users\\user\\AppData\\Local\\Temp', 'TMP': 'C:\\Users\\user\\AppData\\Local\\Temp', 'USERDOMAIN': 'AI2', 'USERDOMAIN_ROAMINGPROFILE': 'AI2', 'USERNAME': 'user', 'USERPROFILE': 'C:\\Users\\user', 'VPRO32_ROOT': 'C:\\Program Files (x86)\\Cognex\\VisionPro', 'VPRO_ROOT': 'C:\\Program Files\\Cognex\\VisionPro', 'WINDIR': 'C:\\Windows', 'ZES_ENABLE_SYSMAN': '1', '__CONDA_OPENSLL_CERT_FILE_SET': '"1"', '__CONDA_OPENSSL_CERT_DIR_SET': '"1"', 'CUSTOM_DEVICE_ROOT': '', 'POD_NAME': 'fqfbaa', 'PADDLE_MASTER': '127.0.0.1:53817', 'PADDLE_GLOBAL_SIZE': '2', 
'PADDLE_LOCAL_SIZE': '2', 'PADDLE_GLOBAL_RANK': '0', 'PADDLE_LOCAL_RANK': '0', 'PADDLE_NNODES': '1', 'PADDLE_CURRENT_ENDPOINT': '127.0.0.1:53818', 'PADDLE_TRAINER_ID': '0', 'PADDLE_TRAINERS_NUM': '2', 'PADDLE_RANK_IN_NODE': '0', 'PADDLE_TRAINER_ENDPOINTS': '127.0.0.1:53818,127.0.0.1:53819', 'FLAGS_selected_gpus': '0', 'PADDLE_LOG_DIR': 'D:\\model\\ccd2-1\\distributed_train_logs'} LAUNCH INFO 2024-12-23 13:56:02,684 ------------------------- ERROR LOG DETAIL ------------------------- LAUNCH INFO 2024-12-23 13:56:02,684 Exit code 1 [2024/12/23 13:56:01] INFO: ------------Environment Information------------- platform: Windows-10-10.0.19045-SP0 Python: 3.9.20 | packaged by conda-forge | (main, Sep 30 2024, 17:43:23) [MSC v.1929 64 bit (AMD64)] Paddle compiled with cuda: True NVCC: Build cuda_11.8.r11.8/compiler.31833905_0 cudnn: 8.9 GPUs used: 2 CUDA_VISIBLE_DEVICES: None GPU: ['GPU 0: NVIDIA GeForce', 'GPU 1: NVIDIA GeForce'] PaddleSeg: 0.0.0.dev0 PaddlePaddle: 3.0.0-beta1 OpenCV: 4.5.5 ------------------------------------------------ [2024/12/23 13:56:01] INFO: ---------------Config Information--------------- batch_size: 2 iters: 88000 train_dataset: dataset_root: D:\VB\CCD2 mode: train num_classes: 15 train_path: D:\VB\CCD2\train.txt transforms: - max_scale_factor: 1 min_scale_factor: 1 scale_step_size: 0.25 type: ResizeStepScaling - crop_size: - 1600 - 400 type: RandomPaddingCrop - type: RandomHorizontalFlip - type: Normalize type: SegDataset val_dataset: dataset_root: D:\VB\CCD2 mode: val num_classes: 15 transforms: - type: Normalize type: SegDataset val_path: D:\VB\CCD2\val.txt optimizer: momentum: 0.9 type: SGD weight_decay: 4.0e-05 lr_scheduler: end_lr: 0 learning_rate: 0.001 power: 0.9 type: PolynomialDecay loss: coef: - 1 types: - type: CrossEntropyLoss model: align_corners: false aspp_out_channels: 256 aspp_ratios: - 1 - 12 - 24 - 36 backbone: multi_grid: - 1 - 2 - 4 output_stride: 8 pretrained: 
https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/Deeplabv3_Plus-R50_backbone_imagenet_pretrained.pdparams type: ResNet50_vd backbone_indices: - 0 - 3 num_classes: 15 pretrained: null type: DeepLabV3P pdx_model_name: Deeplabv3_Plus-R50 uniform_output_enabled: true ------------------------------------------------ [2024/12/23 13:56:01] INFO: Set device: gpu [2024/12/23 13:56:01] INFO: Use the following config to build model model: align_corners: false aspp_out_channels: 256 aspp_ratios: - 1 - 12 - 24 - 36 backbone: multi_grid: - 1 - 2 - 4 output_stride: 8 pretrained: https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/Deeplabv3_Plus-R50_backbone_imagenet_pretrained.pdparams type: ResNet50_vd backbone_indices: - 0 - 3 num_classes: 15 pretrained: null type: DeepLabV3P W1223 13:56:01.486145 5624 gpu_resources.cc:119] Please NOTE: device: 0, GPU Compute Capability: 8.6, Driver API Version: 12.7, Runtime API Version: 11.8 W1223 13:56:01.486145 5624 gpu_resources.cc:164] device: 0, cuDNN Version: 8.9. [2024/12/23 13:56:01] INFO: Loading pretrained model from https://paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/Deeplabv3_Plus-R50_backbone_imagenet_pretrained.pdparams [2024/12/23 13:56:02] INFO: There are 275/275 variables loaded into ResNet_vd. 
[2024/12/23 13:56:02] INFO: Convert bn to sync_bn [2024/12/23 13:56:02] INFO: Use the following config to build train_dataset train_dataset: dataset_root: D:\VB\CCD2 mode: train num_classes: 15 train_path: D:\VB\CCD2\train.txt transforms: - max_scale_factor: 1 min_scale_factor: 1 scale_step_size: 0.25 type: ResizeStepScaling - crop_size: - 1600 - 400 type: RandomPaddingCrop - type: RandomHorizontalFlip - type: Normalize type: SegDataset [2024/12/23 13:56:02] INFO: Use the following config to build val_dataset val_dataset: dataset_root: D:\VB\CCD2 mode: val num_classes: 15 transforms: - type: Normalize type: SegDataset val_path: D:\VB\CCD2\val.txt [2024/12/23 13:56:02] INFO: If the type is SGD and momentum in optimizer config, the type is changed to Momentum. [2024/12/23 13:56:02] INFO: Use the following config to build optimizer optimizer: momentum: 0.9 type: Momentum weight_decay: 4.0e-05 [2024/12/23 13:56:02] INFO: Use the following config to build loss loss: coef: - 1 types: - type: CrossEntropyLoss [2024-12-23 13:56:02,304] [ INFO] distributed_strategy.py:214 - distributed strategy initialized ======================= Modified FLAGS detected ======================= FLAGS(name='FLAGS_selected_gpus', current_value='0', default_value='') FLAGS(name='FLAGS_win_cuda_bin_dir', current_value='C:\\ProgramData\\anaconda3\\envs\\paddlex_det\\lib\\site-packages\\paddle\\..\\nvidia', default_value='') ======================================================================= I1223 13:56:02.305222 5624 tcp_utils.cc:181] The server starts to listen on IP_ANY:53817 I1223 13:56:02.305222 5624 tcp_utils.cc:130] Successfully connected to 127.0.0.1:53817 Traceback (most recent call last): File "C:\Paddle\PaddleX-release-3.0-beta2\paddlex\repo_manager\repos\PaddleSeg\tools\train.py", line 252, in <module> main(args) File "C:\Paddle\PaddleX-release-3.0-beta2\paddlex\repo_manager\repos\PaddleSeg\tools\train.py", line 222, in main train(model, File 
"C:\ProgramData\anaconda3\envs\paddlex_det\lib\site-packages\paddleseg\core\train.py", line 155, in train paddle.distributed.fleet.init(is_collective=True) File "C:\ProgramData\anaconda3\envs\paddlex_det\lib\site-packages\paddle\distributed\fleet\fleet.py", line 283, in init paddle.distributed.init_parallel_env() File "C:\ProgramData\anaconda3\envs\paddlex_det\lib\site-packages\paddle\distributed\parallel.py", line 1103, in init_parallel_env pg = _new_process_group_impl( File "C:\ProgramData\anaconda3\envs\paddlex_det\lib\site-packages\paddle\distributed\collective.py", line 158, in _new_process_group_impl pg = core.ProcessGroupNCCL.create( AttributeError: module 'paddle.base.libpaddle' has no attribute 'ProcessGroupNCCL' labv3_Plus-R50_backbone_imagenet_pretrained.pdparams [2024/12/23 13:56:02] INFO: There are 275/275 variables loaded into ResNet_vd. [2024/12/23 13:56:02] INFO: Convert bn to sync_bn [2024/12/23 13:56:02] INFO: Use the following config to build train_dataset train_dataset: dataset_root: D:\VB\CCD2 mode: train num_classes: 15 train_path: D:\VB\CCD2\train.txt transforms: - max_scale_factor: 1 min_scale_factor: 1 scale_step_size: 0.25 type: ResizeStepScaling - crop_size: - 1600 - 400 type: RandomPaddingCrop - type: RandomHorizontalFlip - type: Normalize type: SegDataset [2024/12/23 13:56:02] INFO: Use the following config to build val_dataset val_dataset: dataset_root: D:\VB\CCD2 mode: val num_classes: 15 transforms: - type: Normalize type: SegDataset val_path: D:\VB\CCD2\val.txt [2024/12/23 13:56:02] INFO: If the type is SGD and momentum in optimizer config, the type is changed to Momentum. 
[2024/12/23 13:56:02] INFO: Use the following config to build optimizer optimizer: momentum: 0.9 type: Momentum weight_decay: 4.0e-05 [2024/12/23 13:56:02] INFO: Use the following config to build loss loss: coef: - 1 types: - type: CrossEntropyLoss [2024-12-23 13:56:02,304] [ INFO] distributed_strategy.py:214 - distributed strategy initialized ======================= Modified FLAGS detected ======================= FLAGS(name='FLAGS_selected_gpus', current_value='0', default_value='') FLAGS(name='FLAGS_win_cuda_bin_dir', current_value='C:\\ProgramData\\anaconda3\\envs\\paddlex_det\\lib\\site-packages\\paddle\\..\\nvidia', default_value='') ======================================================================= I1223 13:56:02.305222 5624 tcp_utils.cc:181] The server starts to listen on IP_ANY:53817 I1223 13:56:02.305222 5624 tcp_utils.cc:130] Successfully connected to 127.0.0.1:53817 Traceback (most recent call last): File "C:\Paddle\PaddleX-release-3.0-beta2\paddlex\repo_manager\repos\PaddleSeg\tools\train.py", line 252, in <module> main(args) File "C:\Paddle\PaddleX-release-3.0-beta2\paddlex\repo_manager\repos\PaddleSeg\tools\train.py", line 222, in main train(model, File "C:\ProgramData\anaconda3\envs\paddlex_det\lib\site-packages\paddleseg\core\train.py", line 155, in train paddle.distributed.fleet.init(is_collective=True) File "C:\ProgramData\anaconda3\envs\paddlex_det\lib\site-packages\paddle\distributed\fleet\fleet.py", line 283, in init paddle.distributed.init_parallel_env() File "C:\ProgramData\anaconda3\envs\paddlex_det\lib\site-packages\paddle\distributed\parallel.py", line 1103, in init_parallel_env pg = _new_process_group_impl( File "C:\ProgramData\anaconda3\envs\paddlex_det\lib\site-packages\paddle\distributed\collective.py", line 158, in _new_process_group_impl pg = core.ProcessGroupNCCL.create( AttributeError: module 'paddle.base.libpaddle' has no attribute 'ProcessGroupNCCL' Traceback (most recent call last): File 
"C:\Paddle\PaddleX-release-3.0-beta2\paddlex\utils\result_saver.py", line 29, in wrap result = func(self, *args, **kwargs) File "C:\Paddle\PaddleX-release-3.0-beta2\paddlex\engine.py", line 41, in run self._model.train() File "C:\Paddle\PaddleX-release-3.0-beta2\paddlex\model.py", line 94, in train trainer.train() File "C:\Paddle\PaddleX-release-3.0-beta2\paddlex\modules\base\trainer.py", line 71, in train train_result = self.pdx_model.train(**train_args) File "C:\Paddle\PaddleX-release-3.0-beta2\paddlex\repo_apis\PaddleSeg_api\seg\model.py", line 178, in train return self.runner.train( File "C:\Paddle\PaddleX-release-3.0-beta2\paddlex\repo_apis\PaddleSeg_api\seg\runner.py", line 55, in train return self.run_cmd( File "C:\Paddle\PaddleX-release-3.0-beta2\paddlex\repo_apis\base\runner.py", line 355, in run_cmd raise CalledProcessError( paddlex.utils.errors.others.CalledProcessError: Command ['C:\\ProgramData\\anaconda3\\envs\\paddlex_det\\python.exe', '-m', 'paddle.distributed.launch', '--devices', '0,1', '--log_dir', 'D:\\model\\ccd2-1\\distributed_train_logs', 'tools/train.py', '--do_eval', '--config', 'C:\\Users\\user\\.paddlex\\tmp2_33b798\\segmodel_Deeplabv3_Plus-R50.yml', '--batch_size', '2', '--learning_rate', '0.001', '--iters', '88000', '--device', 'gpu', '--use_vdl', '--save_dir', 'D:\\model\\ccd2-1', '--save_interval', '1100', '--log_iters', '10'] returned non-zero exit status 1.
The training configuration file is as follows:
A further note: PaddlePaddle was installed with conda, using the following command:
conda install paddlepaddle-gpu==3.0.0b2 paddlepaddle-cuda=11.8 -c paddle -c nvidia
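The traceback above ends in `core.ProcessGroupNCCL.create(` raising `AttributeError: module 'paddle.base.libpaddle' has no attribute 'ProcessGroupNCCL'`, and the launcher was started with `--devices 0,1`. One plausible reading, which is an assumption and not something confirmed in this thread, is that this Windows build of Paddle does not ship the NCCL process group, so collective multi-GPU training cannot initialize. The sketch below checks for that attribute and falls back to the first requested device. The helper names `nccl_available` and `pick_devices` are hypothetical and are not part of Paddle or PaddleX.

```python
import importlib


def nccl_available(core_module) -> bool:
    """Return True if the given module exposes ProcessGroupNCCL.

    `core_module` is expected to be `paddle.base.libpaddle`, the module
    named in the AttributeError from the traceback.
    """
    return hasattr(core_module, "ProcessGroupNCCL")


def pick_devices(requested: str, core_module) -> str:
    """Hypothetical helper: choose the value to pass as the device list.

    If more than one device is requested but the NCCL process group is
    missing, fall back to the first requested device only.
    """
    ids = [d.strip() for d in requested.split(",") if d.strip()]
    if len(ids) > 1 and not nccl_available(core_module):
        return ids[0]
    return ",".join(ids)


if __name__ == "__main__":
    try:
        # Module path taken from the error message in the log.
        core = importlib.import_module("paddle.base.libpaddle")
    except ImportError:
        print("Paddle is not importable in this environment.")
    else:
        print("ProcessGroupNCCL present:", nccl_available(core))
        print("Devices to use:", pick_devices("0,1", core))
```

If the check reports that `ProcessGroupNCCL` is absent, training on a single GPU (for example device `0` only) avoids the code path that failed here. Whether that matches the maintainers' recommended fix is not stated in this issue.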
leo-q8