[BUG] RMM log file name contains .dev0 extension when GPU device used is not 0 #721
Comments
Setting `CUDA_VISIBLE_DEVICES` remaps device enumeration within the process: it makes what would otherwise be device 1 appear as device 0, so what you've described is expected behavior.
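For anyone following along, here's a minimal sketch of the remapping being described (it assumes Numba is installed; any CUDA runtime binding would show the same thing):

```python
# Run as: CUDA_VISIBLE_DEVICES=1 python check_enumeration.py
# The process sees exactly one GPU, and it is enumerated as device 0,
# even though it is physical GPU 1 on the host.
from numba import cuda

print(len(cuda.gpus))                # 1
print(cuda.get_current_device().id)  # 0
```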
Ah, okay, thanks @jrhemstad. That explains what I was about to add to the issue: the contents of the log file are correct even though the log file extension seemed wrong. Using my example above, is it possible to have RMM create the extension as `dev1` instead?
I honestly don't know where the `.dev0` suffix gets added. Normally you could override the log file by setting an environment variable.
This line seems relevant: `rmm/python/rmm/_lib/memory_resource.pyx`, line 373 (commit 3b4a555).
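For reference, a rough sketch of the kind of suffixing that line appears to perform (the helper name below is hypothetical, not RMM's actual code): the current device ID is spliced into the user-supplied file name just before the extension.

```python
import os


def _append_device_suffix(log_file_name: str, device_id: int) -> str:
    # e.g. "/tmp/rmmlog.csv" with device 0 becomes "/tmp/rmmlog.dev0.csv"
    stem, ext = os.path.splitext(log_file_name)
    return f"{stem}.dev{device_id}{ext}"


print(_append_device_suffix("/tmp/rmmlog.csv", 0))  # /tmp/rmmlog.dev0.csv
```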
@jrhemstad, one question re: your comment: so if I set `CUDA_VISIBLE_DEVICES` per process, the behavior you described applies and the GPU I pick is always enumerated as device 0?
Yes.
Yeah, Dask-CUDA uses this same trick, actually.
I wonder if users could specify where this info is injected. IOW, if a user provides a filename with a device-id placeholder in it, RMM could substitute the device ID at that spot.
@jakirkham - I like that suggestion; what should we do if they don't specify that, though? E.g., if they specify a filename without any placeholder in it?
Raise an error.
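As a sketch of that idea (the `%i` placeholder syntax and the error type are assumptions for illustration, not an agreed-upon API):

```python
def resolve_log_filename(template: str, device_id: int) -> str:
    # Substitute the device ID where the user asked for it, and complain
    # if the template contains no placeholder at all.
    if "%i" not in template:
        raise ValueError(
            f"log file name {template!r} has no '%i' placeholder for the device ID"
        )
    return template.replace("%i", str(device_id))


print(resolve_log_filename("/tmp/rmm_log.%i.csv", 1))  # /tmp/rmm_log.1.csv
```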
I ultimately need to know the final filename to look for. I know the GPU ID because the user told my application about it, so is there a way to map the GPU ID the user is aware of and using (i.e. the one they may have set in `CUDA_VISIBLE_DEVICES`) to the file name RMM will actually write?
FYI, I personally don't need that level of customization, and I might just prefer a documented naming behavior such as what's in place now. If this is better for others, though, then I'm in favor.
Another option that would work for our application would be to just have RMM use the GPU's UUID in the log file name instead of the device ID.
On second thoughts, maybe using a UUID just makes things less user-friendly? From an end-user standpoint, a device ID is easier to recognize than a UUID.
I think that depends. Some users who set `CUDA_VISIBLE_DEVICES` won't necessarily find a bare device ID unambiguous either. Perhaps we need to log information about the GPU into the header of the log file so that the reader can unambiguously interpret it.
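A sketch of the kind of header line that could be written, using pynvml to look up the GPU (illustrative only, not current RMM behavior; also note that NVML indices are physical and ignore `CUDA_VISIBLE_DEVICES`):

```python
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # physical index, per NVML
name = pynvml.nvmlDeviceGetName(handle)
uuid = pynvml.nvmlDeviceGetUUID(handle)
# A comment line like this at the top of the CSV would make the log
# self-describing regardless of how devices were enumerated in the process.
print(f"# gpu_name={name} gpu_uuid={uuid}")
pynvml.nvmlShutdown()
```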
I see your point. But environment variables like `CUDA_VISIBLE_DEVICES` aren't recorded anywhere, so when you come back to a log file later you can't tell which physical GPU it came from.
+1
cc @charlesbluca (in case you have thoughts here after having implemented RMM logging support in Dask-CUDA 🙂)
Looking at that file a year later, you aren't likely to have access to the same machine or configuration anyway. I think you really have to log the configuration with the log files if you want to be able to reconstruct it.
OK, maybe 1 year was an exaggeration - apologies :) But even a week from now it can be difficult -- especially if you're in some sort of shared computing environment where the GPU assignment can change from run to run. Anyway - after discussing more with @rlratzel, we decided to have RMM expose the per-device log file names as a mapping.
A mapping is returned instead of a list because (1) it gives us more flexibility as to how to name the output files, and (2) when initializing RMM, the user can specify devices in any arbitrary order (e.g., `devices=[1, 0]`), so keying the result by device ID avoids any ambiguity about which file belongs to which device.
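A usage sketch of that mapping (assuming `rmm.reinitialize()` and a `rmm.get_log_filenames()`-style accessor that returns the per-device mapping; paths are illustrative):

```python
import rmm

# Initialize RMM with logging enabled on the selected devices; the order in
# which devices are listed does not matter because the result is keyed by ID.
rmm.reinitialize(devices=[0], logging=True, log_file_name="/tmp/rmmlog.csv")

# Mapping of device ID -> the log file RMM actually writes for that device,
# e.g. {0: '/tmp/rmmlog.dev0.csv'}
print(rmm.get_log_filenames())
```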
Not a bug; documentation was updated in #722 to address this.
When `rmm.enable_logging()` is passed a `log_file_name`, a `dev0` extension is always used even when the device in use is not device `0`.

This reproducer demonstrates the issue:
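(The sketch below is a minimal stand-in for the reproducer, not the original script; it assumes a multi-GPU host and uses a `DeviceBuffer` allocation just to produce log entries.)

```python
# Select physical GPU 1 before importing RMM (equivalently, set
# CUDA_VISIBLE_DEVICES=1 in the shell before launching the script).
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

import rmm

rmm.enable_logging(log_file_name="/tmp/rmmlog.csv")
buf = rmm.DeviceBuffer(size=1024)  # make an allocation so something gets logged
del buf
rmm.disable_logging()

# Observed: the log is written to /tmp/rmmlog.dev0.csv even though the work
# ran on physical GPU 1 (confirmed with `nvidia-smi` on the host).
```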
output:

`nvidia-smi` output: the process in question here is 17629, using GPU 1.
The expected behavior is to create a log file with an extension matching the GPU in use, which in the example above would be:

`/tmp/rmmlog.dev1.csv`
NOTE: Just in case this is related, this demo was run in a container (hence the need to check `nvidia-smi` on the host) with multiple GPUs exposed from the host machine. Setting `CUDA_VISIBLE_DEVICES=0` in the container shows the process running on GPU 0 on the host, as expected.