Unable to calibrate settings for a deep learning model #898

Open
theredarmy87 opened this issue Jan 3, 2025 · 0 comments
Labels
bug Something isn't working


Describe the bug

I have a large convolutional neural network (CNN) with ~5 million parameters, and I have not been able to get the calibrate_settings step to complete. I also tried passing different max_logrows and scales values, but every attempt fails with:

RuntimeError: Failed to calibrate settings: [Uncategorized] calibration failed, could not find any suitable parameters given the calibration dataset

Expected behaviors

calibrate_settings should run successfully.

Steps to reproduce the bug

All the relevant files to reproduce the error are available in the following public Dropbox folder:
https://www.dropbox.com/scl/fo/hc861nn2fpfzmd3boanq6/AGIX_me24vLt-53LsRlshpc?rlkey=ae5yazlcphiu4j8nrybefibj6&dl=0

Please note that cal_data.json contains a single sample, whereas calibration_data.json contains a batch of 64 samples. Either one can be used for calibration.
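Since the two calibration files differ in how many samples they hold, one quick sanity check is to compare the flattened length of each input against the per-sample size the model expects, and to look at the value range. The sketch below assumes the files use a top-level `input_data` list with one (possibly nested) list per model input; the helper names and the `per_sample_size` argument are hypothetical and not part of ezkl.

```python
import json


def flatten(values):
    """Recursively flatten nested lists of numbers into one flat list."""
    flat = []
    for v in values:
        if isinstance(v, list):
            flat.extend(flatten(v))
        else:
            flat.append(float(v))
    return flat


def summarize_calibration_file(path, per_sample_size):
    """Summarize each entry under `input_data` (assumed key name)."""
    with open(path) as f:
        data = json.load(f)
    summary = []
    for entry in data["input_data"]:
        flat = flatten(entry)
        summary.append({
            "length": len(flat),
            # How many samples this entry would hold at the given size.
            "samples": len(flat) / per_sample_size,
            # False means the data does not divide evenly into samples.
            "whole_batches": len(flat) % per_sample_size == 0,
            "min": min(flat),
            "max": max(flat),
        })
    return summary
```

Running this on both cal_data.json and calibration_data.json, with `per_sample_size` set to the number of values in one model input, would show whether the 64-sample file divides evenly and whether the inputs span an unusually wide range.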

Also, best_model.pth is my trained PyTorch model, but you probably don't need that since I have also included network.onnx which is the same model exported to ONNX format.

input.json is another data sample, and settings.json is the settings file produced by the gen_settings code below.

import os
import ezkl

# Resolve all files relative to the current working directory.
base_path = os.getcwd()
model_path = os.path.join(base_path, 'network.onnx')
input_data_path = os.path.join(base_path, 'input.json')
settings_path = os.path.join(base_path, 'settings.json')
cal_data_path = os.path.join(base_path, 'cal_data.json')
calibration_data_path = os.path.join(base_path, 'calibration_data.json')

# Hash the inputs and parameters; keep the output public.
run_args = ezkl.PyRunArgs()
run_args.input_visibility = "hashed"
run_args.param_visibility = "hashed"
run_args.output_visibility = "public"
run_args.variables = [("batch_size", 1)]

# Generate settings.json from the ONNX model.
res = ezkl.gen_settings(model=model_path, output=settings_path, py_run_args=run_args)
assert res == True
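Before calibrating, it can help to check what gen_settings actually wrote. The following sketch reads settings.json and pulls out a few fields. The key names (`run_args`, `input_scale`, `param_scale`, `logrows`) are assumptions about the file layout and may differ between ezkl versions, so missing keys come back as None instead of raising. The helper name `read_run_args` is hypothetical.

```python
import json


def read_run_args(settings_path, keys=("input_scale", "param_scale", "logrows")):
    """Return selected `run_args` fields from a settings file (None if absent)."""
    with open(settings_path) as f:
        settings = json.load(f)
    # `run_args` is an assumed top-level key; fall back to an empty dict.
    run_args = settings.get("run_args", {})
    return {k: run_args.get(k) for k in keys}
```

Calling `read_run_args(settings_path)` after gen_settings gives a baseline to compare against when trying other scales or max_logrows values.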

The next step, calibrating the settings, fails:

res = await ezkl.calibrate_settings(calibration_data_path, model_path, settings_path, target="resources")
assert res == True
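The manual attempts with different scales and max_logrows values could be organized as a small sweep that records every failure message. This is only a sketch: `sweep_calibration` is a hypothetical helper, and it takes the calibration call as an injected async callable so that it makes no assumption about the ezkl signature itself.

```python
import itertools


async def sweep_calibration(calibrate, scales_grid, max_logrows_grid):
    """Try each (scales, max_logrows) pair until one calibration succeeds.

    `calibrate` is any async callable that accepts `scales` and
    `max_logrows` keyword arguments. Returns the first working pair
    (or None) together with a list of recorded failures.
    """
    failures = []
    for scales, max_logrows in itertools.product(scales_grid, max_logrows_grid):
        try:
            ok = await calibrate(scales=scales, max_logrows=max_logrows)
        except RuntimeError as err:
            # Keep the message so the failing combinations can be compared.
            failures.append((scales, max_logrows, str(err)))
            continue
        if ok:
            return (scales, max_logrows), failures
        failures.append((scales, max_logrows, "returned a falsy result"))
    return None, failures


# Hypothetical wiring, assuming the installed ezkl.calibrate_settings
# accepts `scales` and `max_logrows` keyword arguments:
#
# async def calibrate(scales, max_logrows):
#     return await ezkl.calibrate_settings(
#         calibration_data_path, model_path, settings_path,
#         target="resources", scales=scales, max_logrows=max_logrows)
```

In a notebook, `best, failures = await sweep_calibration(calibrate, [[2], [7]], [20, 24])` would then show which combinations were tried and whether every one hit the same error; the grid values here are placeholders, not recommendations.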

Screenshot 1:

error1

Screenshot 2:

error2

Device and Operating System

!nvidia-smi

nvda

Additional Information

theredarmy87 added the bug label on Jan 3, 2025