
compile_pytorch_model.py compile failures (model_path/constants.pkl not found) #22

Open
ljkeller opened this issue Jun 4, 2024 · 7 comments


ljkeller commented Jun 4, 2024

Hello,

I'm having compile failures with compile_pytorch_model.py. Here's my failure:

/drp-ai_tvm/tutorials# python3 compile_pytorch_model.py /home/models/spark_torch.pt -o spark_torch -s 1,3,28,28
[Check arguments]
  Input AI model         :  /home/models/spark_torch.pt
  SDK path               :  /opt/poky/3.1.21
  DRP-AI Translator path :  /opt/drp-ai_translator_release
  Output dir             :  spark_torch
  Input shape            :  (1, 3, 28, 28)
Traceback (most recent call last):
  File "compile_pytorch_model.py", line 69, in <module>
    model = torch.jit.load(model_file)
  File "/usr/local/lib/python3.8/dist-packages/torch/jit/_serialization.py", line 161, in load
    cpp_module = torch._C.import_ir_module(cu, str(f), map_location, _extra_files)
RuntimeError: [enforce fail at inline_container.cc:222] . file not found: v1.0.1_9_epochs_no_norm_97.27/constants.pkl

Interestingly, I trained this model and exported it to both torch and ONNX formats. The ONNX export works: `python3 compile_onnx_model.py /home/models/spark.onnx -o spark -s 1,3,28,28 -i input`.

I'm guessing there is a version incompatibility between the torch I trained/exported with and the torch used here for the conversion? I don't see any documentation about expected torch training versions. I don't have my model-training PC with me right now, or I'd report the torch version.

Here are the models I've tried: spark.zip

Environment

As far as I know, I'm running out of a Docker container I built ~6 months ago with `docker build -t rzv2l_ai_sdk_image --build-arg SDK="/opt/poky/3.1.21" --build-arg PRODUCT="V2L"`.

@matinlotfali

@ljkeller I have the exact same issue. Were you able to fix it?


ljkeller commented Oct 17, 2024

@ljkeller I have the exact same issue. Were you able to fix it?

I remember having multiple issues that day.

I know I opened a PR; I don't think it's related, but #23 is worth looking at.

Otherwise, IIRC, this was a torch or Python versioning issue. Unfortunately, the best advice I have is to binary-search through Python/torch versions; I think one of them changed model exporting. I think a shortcut is to check the model file for constants.pkl, but I don't remember very well.
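A sketch of that constants.pkl shortcut (the helper name is mine, not from the repo; it relies on both save formats being zip archives, which holds for torch.jit.save and for zip-format torch.save checkpoints):

```python
import zipfile

def is_torchscript_archive(path):
    """Return True if `path` looks like a TorchScript archive.

    TorchScript models written by torch.jit.save contain a `constants.pkl`
    entry in their zip archive; plain torch.save checkpoints (zip format)
    carry `data.pkl` instead, which is why torch.jit.load fails with
    "file not found: .../constants.pkl" on them.
    """
    if not zipfile.is_zipfile(path):
        return False  # old-style pickle checkpoint, definitely not TorchScript
    with zipfile.ZipFile(path) as zf:
        return any(name.endswith("constants.pkl") for name in zf.namelist())
```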

I've found ONNX to be much friendlier to use, but even that has an implicit versioning requirement.

@matinlotfali please let me know if you get around the issue.

@matinlotfali

I just learned that the PyTorch model file should be converted to a TorchScripted model via torch.jit.trace to work.
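For reference, a minimal sketch of that conversion (TinyNet is a stand-in module with this issue's 1,3,28,28 input shape, not the actual model): torch.jit.trace records the forward pass on an example input, and torch.jit.save writes the TorchScript archive that torch.jit.load expects.

```python
import torch
import torch.nn as nn

class TinyNet(nn.Module):
    """Stand-in model with the 1,3,28,28 input shape from this issue."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, kernel_size=3, padding=1)
        self.fc = nn.Linear(8 * 28 * 28, 10)

    def forward(self, x):
        x = torch.relu(self.conv(x))
        return self.fc(x.flatten(1))

model = TinyNet().eval()
example = torch.randn(1, 3, 28, 28)

# torch.save(model.state_dict(), ...) would produce the checkpoint that
# torch.jit.load rejects; trace + jit.save produces a TorchScript archive.
traced = torch.jit.trace(model, example)
torch.jit.save(traced, "spark_torch_scripted.pt")

# Round-trip: torch.jit.load now succeeds and matches the eager model.
reloaded = torch.jit.load("spark_torch_scripted.pt")
assert torch.allclose(model(example), reloaded(example))
```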


ljkeller commented Oct 23, 2024

I just learned that the PyTorch model file should be converted to a TorchScripted model via torch.jit.trace to work.

I've definitely compiled torch models without doing this explicitly. Do you have a link so I can read up on this? That's frustrating.

@matinlotfali

I think this is a nice reading material: https://www.geeksforgeeks.org/what-are-torch-scripts-in-pytorch/


ljkeller commented Oct 24, 2024

Yes, but I was looking for an explicit callout of the necessity of the jit. Even the TVM docs don't appear to say much from what I've seen.

We grab the TorchScripted model via tracing

is all their torch compile guide says.

I think this is a recipe for wasted developer time. I could add a failure-log warning on the jit.load in the tutorials/compile_pytorch_model*.py files, if anyone thinks that would be useful? That, or some documentation could be updated; I'm not sure where.
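Something like this is what I have in mind (function name and message wording are just a sketch, not what's in the repo): catch the opaque inline_container error and turn it into an actionable hint.

```python
import torch

def load_torchscript_model(model_file):
    """Load a TorchScript model, with a clearer error for eager checkpoints.

    Sketch of the proposed warning; the message text is illustrative.
    """
    try:
        return torch.jit.load(model_file)
    except RuntimeError as err:
        if "constants.pkl" in str(err):
            raise RuntimeError(
                f"{model_file} does not look like a TorchScript archive. "
                "Export it with torch.jit.trace(model, example_input) and "
                "torch.jit.save(...) before compiling."
            ) from err
        raise
```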

@hiroyuki-sakamoto seems to be listening. What do you think?

@hiroyuki-sakamoto
Collaborator

Sorry for the delay in responding, due to the confusion that accompanied the v2.4.0 release.
I have been looking at this issue (#22) and feel it is necessary to add a note that a TorchScript model is required. Also, for various reasons, we are not accustomed to receiving contributions and incorporating them into the Quick, but we appreciate your offer. The next update is scheduled for the end of this year, when we hope to include documentation improvements, etc.
