[QUESTION] Modules Recompiling Every Run Despite No Code Changes #287

Closed
ricetwice opened this issue Aug 7, 2024 · 4 comments
@ricetwice

I am encountering an issue with the Warp framework where certain modules are recompiled every time I run my program, even though the code has not changed. These modules show a different hash value in the loading information on each run, while other modules load from the cache as expected.

This recompilation process is quite time-consuming and significantly impacts the startup time of my program. I am seeking clarification on the following points:

  1. How does Warp calculate the hash value for a module?
  2. What could be causing certain modules to recompile every time, even without any modification to their code?

Any guidance or insights you could provide would be greatly appreciated.

@ricetwice ricetwice added the question The issue author requires information label Aug 7, 2024
@shi-eric
Contributor

shi-eric commented Aug 7, 2024

Can you provide an example of the code output from two runs, one showing a cold start (Warp cache empty) and one from a subsequent run that demonstrates the issue? Ideally, we would like an example script we could run that illustrates the issue.

This function gives an overview of how the module hash is calculated: https://github.com/NVIDIA/warp/blob/main/warp/context.py#L1554-L1683

Structs, kernels, functions, and wp.constant values all go into computing a module hash, as do the hashes of any modules that the current module references.
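To make the ingredients above concrete, here is a minimal sketch of a hash built from the same kinds of inputs. It is a toy model and not Warp's implementation: `module_hash` and its parameters are hypothetical names, and the "sources" are plain strings rather than real kernels.

```python
import hashlib

def module_hash(sources, constants, referenced_hashes=()):
    """Toy module hash (hypothetical helper, not Warp's actual code)."""
    h = hashlib.sha256()
    for src in sources:  # source text of structs, kernels and functions
        h.update(src.encode("utf-8"))
    for name, value in sorted(constants.items()):  # constant names and values
        h.update(f"{name}={value!r}".encode("utf-8"))
    for ref in referenced_hashes:  # hashes of modules this module references
        h.update(ref)
    return h.digest()

# Two versions of a base module that differ only in one constant value.
base_a = module_hash(["def helper(x): return x * SCALE"], {"SCALE": 2.0})
base_b = module_hash(["def helper(x): return x * SCALE"], {"SCALE": 3.0})

# A dependent module with identical source that references the base module.
dependent_a = module_hash(["def kernel(): helper(1.0)"], {}, [base_a])
dependent_b = module_hash(["def kernel(): helper(1.0)"], {}, [base_b])

print(base_a != base_b, dependent_a != dependent_b)
```

In this sketch, changing one constant in the base module changes the dependent module's hash too, even though the dependent module's own source is identical.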

For 2, a simple example I can think of is if you define a wp.constant using a random number generator that changes value on every run (e.g. no fixed seed). Since wp.constant values get added to the module hash, the hash would change every time you run, forcing the recompilation of both this module and any modules that reference it. This problem used to be a lot worse in older versions of Warp, in which every module loaded at runtime would have its hash affected by the wp.constant variables declared in the program. That resulted in a lot of unnecessary recompilation if the set of wp.constant variables changed between runs.
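The random-constant scenario can be illustrated without Warp at all. In the sketch below, each call to `simulated_run` stands in for one launch of the program, and `constant_hash` is a hypothetical stand-in for the constant's contribution to the module hash. Both names are assumptions made for illustration.

```python
import hashlib
import random

def constant_hash(value):
    # Hypothetical stand-in for how a constant's value feeds a module hash.
    return hashlib.sha256(repr(value).encode("utf-8")).hexdigest()

def simulated_run(seed=None):
    # Each call models one program launch.
    rng = random.Random(seed)    # seed=None draws fresh entropy per "run"
    noise_scale = rng.random()   # the value that would be passed to wp.constant
    return constant_hash(noise_scale)

unseeded = {simulated_run() for _ in range(3)}
seeded = {simulated_run(seed=42) for _ in range(3)}
print(len(unseeded), len(seeded))
```

With a fixed seed every simulated run produces the same hash, so the set collapses to a single entry. Without a seed, each run is expected to produce a different hash, which in a real program would mean a cache miss on every launch.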

Another thing that used to affect 2 was the declaration of additional kernels at runtime or inside functions. However, this issue was also addressed a few releases ago by maintaining multiple cache directories for the same module name. Previously, we would only keep a single set of files for a module in the cache directory, so if the module hash changed, we would delete the files associated with the old hash and regenerate the files for the new hash.
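As a rough sketch of why keeping one cache directory per hash helps, the example below keys each directory on both the module name and a hash prefix. The directory layout, the file name `module.bin`, and the helper names are hypothetical and are not taken from Warp's cache format.

```python
import tempfile
from pathlib import Path

def cache_dir_for(root, module_name, module_hash):
    # Hypothetical layout: one directory per (module name, hash) pair.
    path = Path(root) / f"{module_name}_{module_hash[:7]}"
    path.mkdir(parents=True, exist_ok=True)
    return path

def load_or_compile(root, module_name, module_hash):
    binary = cache_dir_for(root, module_name, module_hash) / "module.bin"
    if binary.exists():
        return "cache hit"
    binary.write_bytes(b"compiled")  # placeholder for the real compilation step
    return "compiled"

root = tempfile.mkdtemp()
results = [
    load_or_compile(root, "sim", "aaaaaaa1"),  # first variant of the module
    load_or_compile(root, "sim", "bbbbbbb2"),  # second variant, different hash
    load_or_compile(root, "sim", "aaaaaaa1"),  # first variant again
]
print(results)
```

Because the second variant does not overwrite the first, returning to the first hash finds its files still in place. With a single directory per module name, the third call would have had to compile again.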

@shi-eric shi-eric self-assigned this Aug 7, 2024
@ricetwice
Author

Hi @shi-eric,

Thanks for your reply! After stepping through the hash calculation for a module in a debugger, I found a problem in the part that updates the hash from each kernel's arguments, specifically in the following lines of code:

for arg, arg_type in kernel.adj.arg_types.items():
    s = f"{arg}: {get_type_name(arg_type)}"
    ch.update(bytes(s, "utf-8"))

If one of the arguments in the kernel is an array of user-defined Structs, the string s representing the argument takes the form:

'arg: array<warp.codegen.Struct object at 0xXXXXXXX>'

The address at the end of this string changes with each run, even when the code remains unmodified. Consequently, the hash of any module containing a kernel with an argument of this type changes on every run, which triggers unnecessary recompilation.
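A minimal sketch of this failure mode, using a stand-in class rather than Warp's own types. The `Struct` class, its `key` attribute, and `arg_hash` are hypothetical. Hashing a stable name is shown only as one possible remedy and is not necessarily what the eventual fix does.

```python
import hashlib

class Struct:
    # Stand-in for a struct wrapper object. It defines no __repr__, so
    # Python falls back to the default "<... Struct object at 0x...>" form.
    def __init__(self, key):
        self.key = key

def arg_hash(arg, type_name):
    # Mirrors the shape of the snippet above: hash "name: type" as UTF-8.
    s = f"{arg}: {type_name}"
    return hashlib.sha256(bytes(s, "utf-8")).hexdigest()

# Two objects for the same struct definition, standing in for two runs.
run1, run2 = Struct("Particle"), Struct("Particle")

# Formatting the object itself embeds its memory address in the string.
unstable = (arg_hash("arg", f"array<{run1}>"), arg_hash("arg", f"array<{run2}>"))

# Formatting a stable identifier gives the same string for both objects.
stable = (arg_hash("arg", f"array<{run1.key}>"), arg_hash("arg", f"array<{run2.key}>"))

print(unstable[0] == unstable[1], stable[0] == stable[1])
```

The two objects live at different addresses, so the first pair of hashes differs while the second pair matches.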

I believe this is a bug and should be addressed.

@shi-eric
Contributor

shi-eric commented Aug 8, 2024

Thank you @ricetwice for isolating the problem! I'll update you when we have a fix.

@shi-eric shi-eric added bug Something isn't working and removed question The issue author requires information labels Aug 8, 2024
shi-eric pushed a commit that referenced this issue Aug 9, 2024
Module hashing fix for array of user struct type

Closes GH-287

See merge request omniverse/warp!665
@shi-eric
Contributor

shi-eric commented Aug 9, 2024

Hey @ricetwice, a fix has been pushed to the main branch for this issue. Thanks again for reporting it!

@shi-eric shi-eric closed this as completed Aug 9, 2024