-
Notifications
You must be signed in to change notification settings - Fork 5.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
nnet1: improving the GPU diagnostics, #1532
nnet1: improving the GPU diagnostics, #1532
Conversation
KarelVesely84
commented
Apr 4, 2017
- we auto-detect the 'compute capability' problems (these appear as the 'invalid device function'),
- we also provide guidelines what to try before posting to forum, and which info to send to us,
I successfuly tested this in 3 situations:
All 3 situations were detected and handled correctly. |
src/nnetbin/cuda-gpu-available.cc
Outdated
@@ -41,6 +53,34 @@ int main(int argc, char *argv[]) try { | |||
CuDevice::Instantiate().SelectGpuId("yes"); | |||
std::cerr | |||
<< "### HURRAY, WE GOT A CUDA GPU FOR COMPUTATION!!! ###" | |||
<< std::endl << std::endl; | |||
std::cerr |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a lot of std::cerr here.
Can you please use KALDI_LOG or KALDI_WARN or KALDI_ERR at least some of the time, so that
the version-number info and the name of the binary will be printed?
It's fine with multi-line error messages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, okay, I will add there a KALDI_LOG at the very beginning.
(Otherwise I am consistently using 'std::cerr', which is necessary in the except() { ... }
part. I remember from lessons, that it is better not to mix the 'fprintf' with the output to std::cout std::cerr and 'fprintf' is used in KALDI_LOG, KALDI_WARN and KALDI_ERR)
Generally speaking the policy is never to use std::cerr at all, and always
use one of the 3 macros. They do work with multi-line output.
…On Tue, Apr 4, 2017 at 11:31 AM, Karel Vesely ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In src/nnetbin/cuda-gpu-available.cc
<#1532 (comment)>:
> @@ -41,6 +53,34 @@ int main(int argc, char *argv[]) try {
CuDevice::Instantiate().SelectGpuId("yes");
std::cerr
<< "### HURRAY, WE GOT A CUDA GPU FOR COMPUTATION!!! ###"
+ << std::endl << std::endl;
+ std::cerr
Hi, okay, I will add there a KALDI_LOG at the very beginning.
(Otherwise I am consistently using 'std::cerr', which is necessary in the except()
{ ... } part. I remember from lessons, that it is better not to mix the
'fprintf' with the output to std::cout std::cerr and 'fprintf' is used in
KALDI_LOG, KALDI_WARN and KALDI_ERR)
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#1532 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/ADJVu6AwAiNHI9_nVEcpxESB2eSuEvnkks5rsoxvgaJpZM4MzMX2>
.
|
c2667d7
to
3130b83
Compare
Yes, I know, the 3 kaldi macros are used usually. Here it is really an exceptional case. We need to be able to print the guideline messages in the Similarly, at the end of every binary we print to
I know it does not follow the usual policy, but in some sense it is consistent for this particular binary to print to |
OK, well I'm not going to insist, but I want to point out that it's only
KALDI_ERR that throws an exception, the others don't.
…On Tue, Apr 4, 2017 at 12:00 PM, Karel Vesely ***@***.***> wrote:
Yes, I know, the 3 kaldi macros are used usually.
Here it is really an exceptional case. We need to be able to print the *guideline
messages* in the except (...) { ... } parts, in which we cannot use the
standard kaldi macros (that would throw an exception from an exception
handler).
Similarly, at the end of every binary we print to std::cerr from the
exception handler:
/src$ tail -n7 gmmbin/gmm-est.cc
} catch(const std::exception &e) {
std::cerr << e.what() << '\n';
return -1;
}
}
I know it does not follow the usual policy, but in some sense it is
consistent for this particular binary to print to std::cerr, if we need
to print from the except (...) { ... } parts. This is really only *'the
single case'* where it is like that...
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#1532 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/ADJVu9mSrLblXOKW0cUEpW76TE12ixxFks5rspNogaJpZM4MzMX2>
.
|
That is true, only KALDI_ERR throws exception. Actually, now I have notice that the Well, I could rewrite it a bit... But, on the other hand in this single case I like the messages uncluttered with the macro strings, because they will always appear directly in the main training log. You can compare the readability of the two variants of log-prints: Current:
With kaldi macros:
Is the 2nd really more readable? If you think so, I will change it... |
It's not really about readability, it's about having enough information
that when someone pastes the error message into kaldi-help, we have enough
information to diagnose the problem without having to go back and forth
with them asking questions.
But it's not that big of a deal, I'd merge it anyway.
…On Tue, Apr 4, 2017 at 12:18 PM, Karel Vesely ***@***.***> wrote:
Reopened #1532 <#1532>.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#1532 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/ADJVu1OKDK5fmuIJAVdiZ5Q1vYefC_IGks5rspecgaJpZM4MzMX2>
.
|
Okay, I'll do one more update :) |
3130b83
to
6196df7
Compare
Hm, it seems that Travis failed on 'logistic-regression-test'.
For me, locally, it passed... Any idea what might be the problem? |
- we auto-detect the 'compute capability' problems (these appear as the 'invalid device function'), - we also provide guidelines what to try before posting to forum, and which info to send to us,
6196df7
to
39b96be
Compare
Good! Now the Travis test passed! |
- we auto-detect the 'compute capability' problems (these appear as the 'invalid device function'), - we also provide guidelines what to try before posting to forum, and which info to send to us,
- we auto-detect the 'compute capability' problems (these appear as the 'invalid device function'), - we also provide guidelines what to try before posting to forum, and which info to send to us,