This repository has been archived by the owner on Sep 18, 2024. It is now read-only.

ERROR: Strategy failed to execute. #5774

Open
ktunlab opened this issue Apr 26, 2024 · 0 comments


ktunlab commented Apr 26, 2024

Describe the issue:
I'm trying to learn how to implement NAS using NNI. However, when I run my script, the strategy fails with two chained errors, shown in the log below: "ImportError: Cannot use a path to identify something from __main__." followed by "TypeError: cannot pickle 'CudnnModule' object".

My code: https://github.com/ktunlab/nas-resnet-demo
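The first error message suggests that the serializer tries to refer to a class or function by its import path and refuses when that object is defined in the script being run, i.e. in `__main__`. The following is a minimal sketch of that idea using only the standard library. It is not NNI's code: `import_path_or_error`, `make_function`, and the module name `my_evaluator` are hypothetical names introduced purely for illustration.

```python
def import_path_or_error(obj):
    """Hypothetical stand-in for an import-path lookup.

    Returns 'module.qualname' for an importable object and raises
    ImportError for anything that lives in __main__.
    """
    module_name = obj.__module__
    if module_name == "__main__":
        raise ImportError("Cannot use a path to identify something from __main__.")
    return f"{module_name}.{obj.__qualname__}"


def make_function(module_name):
    """Build a dummy function that claims to be defined in `module_name`."""
    def evaluate_model(model):
        return 0.0
    # Overwrite the metadata so both cases can be simulated in one file.
    evaluate_model.__module__ = module_name
    evaluate_model.__qualname__ = "evaluate_model"
    return evaluate_model


# Case 1: the function is defined in the script that is run directly.
from_script = make_function("__main__")
# Case 2: the function lives in a separate, importable file (assumed name).
from_module = make_function("my_evaluator")

print(import_path_or_error(from_module))  # my_evaluator.evaluate_model

try:
    import_path_or_error(from_script)
except ImportError as exc:
    print("ImportError:", exc)
```

If this reading is right, moving the model space and the evaluation function out of train.py into a separate module and importing them from there might avoid the first error. That is an assumption drawn from the message text, not a confirmed fix.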

Environment:

  • NNI version: 3.0
  • Training service (local|remote|pai|aml|etc): local
  • Client OS: Windows 10
  • Server OS (for remote mode only):
  • Python version: 3.8
  • PyTorch/TensorFlow version: PyTorch 2.3.0
  • Is conda/virtualenv/venv used?: Yes
  • Is running in Docker?: No

Configuration:

Log message:

  • nnictl stdout and stderr:

log:
[2024-04-26 15:45:28] Config is not provided. Will try to infer.
[2024-04-26 15:45:28] Using execution engine based on training service. Trial concurrency is set to 1.
[2024-04-26 15:45:28] Using simplified model format.
[2024-04-26 15:45:28] Using local training service.
[2024-04-26 15:45:28] WARNING: GPU found but will not be used. Please set `experiment.config.trial_gpu_number` to the number of GPUs you want to use for each trial.
[2024-04-26 15:45:30] Creating experiment, Experiment ID: lyjc7okv
[2024-04-26 15:45:30] Starting web server...
[2024-04-26 15:45:30] Setting up...
[2024-04-26 15:45:30] Web portal URLs: http://172.22.9.46:8081 http://127.0.0.1:8081
[2024-04-26 15:45:30] Successfully update searchSpace.
[2024-04-26 15:45:30] Checkpoint saved to C:\Users\Lab-d\nni-experiments\lyjc7okv\checkpoint.
[2024-04-26 15:45:30] Experiment initialized successfully. Starting exploration strategy...
[2024-04-26 15:45:30] ERROR: Strategy failed to execute.
Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\envs\proje\lib\site-packages\nni\common\serializer.py", line 831, in get_hybrid_cls_or_func_name
    name = _get_cls_or_func_name(cls_or_func)
  File "C:\ProgramData\Anaconda3\envs\proje\lib\site-packages\nni\common\serializer.py", line 810, in _get_cls_or_func_name
    raise ImportError('Cannot use a path to identify something from __main__.')
ImportError: Cannot use a path to identify something from __main__.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "train.py", line 103, in <module>
    exp.run(port=8081)
  File "C:\ProgramData\Anaconda3\envs\proje\lib\site-packages\nni\experiment\experiment.py", line 236, in run
    return self._run_impl(port, wait_completion, debug)
  File "C:\ProgramData\Anaconda3\envs\proje\lib\site-packages\nni\experiment\experiment.py", line 205, in _run_impl
    self.start(port, debug)
  File "C:\ProgramData\Anaconda3\envs\proje\lib\site-packages\nni\nas\experiment\experiment.py", line 270, in start
    self._start_engine_and_strategy()
  File "C:\ProgramData\Anaconda3\envs\proje\lib\site-packages\nni\nas\experiment\experiment.py", line 230, in _start_engine_and_strategy
    self.strategy.run()
  File "C:\ProgramData\Anaconda3\envs\proje\lib\site-packages\nni\nas\strategy\base.py", line 170, in run
    self._run()
  File "C:\ProgramData\Anaconda3\envs\proje\lib\site-packages\nni\nas\strategy\bruteforce.py", line 223, in _run
    self.engine.submit_models(model)
  File "C:\ProgramData\Anaconda3\envs\proje\lib\site-packages\nni\nas\execution\training_service.py", line 172, in submit_models
    self._channel.send_trial(
  File "C:\ProgramData\Anaconda3\envs\proje\lib\site-packages\nni\runtime\tuner_command_channel\channel.py", line 144, in send_trial
    send_payload = dump(trial_dict, pickle_size_limit=int(os.getenv('PICKLE_SIZE_LIMIT', 64 * 1024)))
  File "C:\ProgramData\Anaconda3\envs\proje\lib\site-packages\nni\common\serializer.py", line 372, in dump
    result = _dump(
  File "C:\ProgramData\Anaconda3\envs\proje\lib\site-packages\nni\common\serializer.py", line 424, in _dump
    return json_tricks.dumps(obj, obj_encoders=encoders, **json_tricks_kwargs)
  File "C:\ProgramData\Anaconda3\envs\proje\lib\site-packages\json_tricks\nonp.py", line 125, in dumps
    txt = combined_encoder.encode(obj)
  File "C:\ProgramData\Anaconda3\envs\proje\lib\json\encoder.py", line 199, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "C:\ProgramData\Anaconda3\envs\proje\lib\json\encoder.py", line 257, in iterencode
    return _iterencode(o, 0)
  File "C:\ProgramData\Anaconda3\envs\proje\lib\site-packages\json_tricks\encoders.py", line 76, in default
    obj = encoder(obj, primitives=self.primitives, is_changed=id(obj) != prev_id, properties=self.properties)
  File "C:\ProgramData\Anaconda3\envs\proje\lib\site-packages\json_tricks\utils.py", line 66, in wrapper
    return encoder(*args, **{k: v for k, v in kwargs.items() if k in names})
  File "C:\ProgramData\Anaconda3\envs\proje\lib\site-packages\nni\common\serializer.py", line 858, in _json_tricks_func_or_cls_encode
    'nni_type': get_hybrid_cls_or_func_name(cls_or_func, pickle_size_limit)
  File "C:\ProgramData\Anaconda3\envs\proje\lib\site-packages\nni\common\serializer.py", line 835, in get_hybrid_cls_or_func_name
    b = cloudpickle.dumps(cls_or_func)
  File "C:\ProgramData\Anaconda3\envs\proje\lib\site-packages\cloudpickle\cloudpickle.py", line 1479, in dumps
    cp.dump(obj)
  File "C:\ProgramData\Anaconda3\envs\proje\lib\site-packages\cloudpickle\cloudpickle.py", line 1245, in dump
    return super().dump(obj)
TypeError: cannot pickle 'CudnnModule' object
[2024-04-26 15:45:30] Stopping experiment, please wait...
[2024-04-26 15:45:30] Checkpoint saved to C:\Users\Lab-d\nni-experiments\lyjc7okv\checkpoint.
[2024-04-26 15:45:30] Experiment stopped

How to reproduce it?: python train.py
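The second exception in the log is raised inside `cloudpickle.dumps`, which the traceback shows being called after the import-path lookup fails. One plausible reading is that the object being pickled by value references a module-like object (the name `CudnnModule` suggests a subclass of `types.ModuleType`), and such objects cannot be pickled. The sketch below reproduces that kind of failure with the standard library `pickle` only. `FakeCudnnModule` and `pickle_error` are hypothetical names, and the code does not use PyTorch, NNI, or cloudpickle.

```python
import pickle
import types


class FakeCudnnModule(types.ModuleType):
    """Hypothetical stand-in for a module-like object (a ModuleType subclass)."""


# An instance behaves like a module object, which pickle cannot serialize.
fake_cudnn = FakeCudnnModule("fake_cudnn")


def pickle_error(obj):
    """Return the pickling error message, or None if pickling succeeds."""
    try:
        pickle.dumps(obj)
    except (TypeError, pickle.PicklingError) as exc:
        return str(exc)
    return None


# A module-like object fails to pickle ...
print(pickle_error(fake_cudnn))
# ... while plain data such as a hyperparameter dict pickles fine.
print(pickle_error({"lr": 0.1}))  # None
```

Under this assumption, anything defined in train.py that touches a module-level name such as a cudnn module would drag that module into the pickle. Keeping such code in a separate importable file, so that it can be referenced by path rather than pickled by value, is one direction to try.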
