-
Notifications
You must be signed in to change notification settings - Fork 771
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[entropy_src,verilator] Reduce entropy tests requirements on Verilator to fix failing test targets #24528
base: master
Are you sure you want to change the base?
Conversation
Reduces the number of contiguous entropy samples observed when running on Verilator to just 8 (down from 1024). This is because observing the full 1024 would require increasing the timeout greatly and would mean that it would take around 13-15 hours to run the test. Without increasing the timeout, the test currently fails due to not meeting the required rate. Signed-off-by: Alex Jones <alex.jones@lowrisc.org>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would say that the main purpose of this test is to run on the FPGA or a real device. Running this test in verilator only makes sense for debugging, in which case it does not necessarily matter if it takes many hours to run.
Therefore it's not clear to me if this tests should be just marked as manual
for verilator instead. I think @vogelpi should have a look at this since he designed the testplan for that.
That makes sense, thanks. I agree that we should wait for @vogelpi's opinion on this in that case.
One small correction - in the case where we do not want to make these changes, then marking the tests as |
I think we should apply this change and mark manual. @pamaury is right that it’s not useful in Verilator, but if you really really have to use it to debug something then it should at least work. |
I agree, that sounds sensible. Regardless of whether this change is applied or not, it makes sense to mark the Verilator target as manual if the environment stays around because, based on what has been described so far, the Verilator test target only really makes sense for debugging purposes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for looping me in @AlexJones0 and @pamaury !
I've left some comments on how to change the fw_override_test in a way to not reduce the scope of the test too much but still speed it up for Verilator. Would you mind giving this a try please?
You are right: this test is mainly useful for SiVal purposes (non-Simulation platforms) but having this running in Verilator is very useful for debugging. So re-enabling this is valuable and the approach taken here is valid for the Verilator simlation environment. Thanks @AlexJones0 for doing this work!
kRandomNumberTimeoutUsec = 100 * 1000, | ||
/** | ||
* Timeout to generate a random number in micro seconds when running in a | ||
* Verilator simulation environment. Verilator observes entropy more slowly | ||
* than other environments, and so is given a longer timeout. | ||
*/ | ||
kVerilatorRandomNumberTimeoutUsec = 3 * 1000 * 1000, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a factor of 30x difference here and this strongly suggests that we probably never cared about tuning the rate of the raw noise source (inside AST) for the Verilator simulation. For context, to speed up the Verilator simulation, we use different clock frequences and ratios as well as baud rates for Verilator. See e.g. sw/device/lib/arch/device_sim_verilator.c.
If you compare these values with the values for FPGA (for which this test was initially designed), you'll note an factor of 48x or similar difference. The RTL model used on FPGA and in Verilator is the same though. I don't remember exactly at which clock the AST model is running. I think what we should really do is tuning the AST model for the Verilator simulation. Would you mind creating a GitHub issue for this please?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense, thanks - I've made that issue here: #24585
Please do mention if there's anything that I've missed capturing in that issue. I had a quick look at the RTL of the ADC as well as the device clock/ratio definitions for the Verilator and FPGA environments and I suspect that as you say it is this 48x difference in clock speeds that necessitates this increase in timeout.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for creating the issue!
if (kDeviceType == kDeviceSimVerilator) { | ||
// If running on Verilator then entropy is generated much more slowly. | ||
// It would take an impractical amount of time to generate the thousands | ||
// of words of entropy that are required by all entropy consumers, and so | ||
// since we are not concerned with the rate of entropy on Verilator, we | ||
// stop at just generating some random numbers only. | ||
return OK_STATUS(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is non-ideal because it means we skip quite a big portion of the main test function in Verilator. So the Verilator simulation becomes less useful for debugging as a big part of the design is not exercised anymore.
Another way to speed up the Verilator simulation would be to reduce the number of modes tested. E.g. for Verilator only we could change:
// Test all modes.
static dif_entropy_src_single_bit_mode_t kModes[] = {
kDifEntropySrcSingleBitModeDisabled, kDifEntropySrcSingleBitMode0,
kDifEntropySrcSingleBitMode1, kDifEntropySrcSingleBitMode2,
kDifEntropySrcSingleBitMode3,
};
to
// Test all modes.
static dif_entropy_src_single_bit_mode_t kModes[] = {
kDifEntropySrcSingleBitModeDisabled, kDifEntropySrcSingleBitMode0,
};
This would just check 2 instead of 5 modes (..Mode0 / .. ..Mode3 are almost identical and all of them are 4x slower compared to ...ModeDisabled).
Also, the final phase:
// Rerun the test with single bit mode disabled,
// this time with an output delay.
fw_ov_insert_wait_enabled = true;
EXECUTE_TEST(test_result, firmware_override_extract_insert,
kDifEntropySrcSingleBitModeDisabled, false, false);
EXECUTE_TEST(test_result, firmware_override_extract_insert,
kDifEntropySrcSingleBitModeDisabled, true, false);
Could be skipped for Verilator. We don't learn a lot from this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I agree, this was one thing that I was not sure on.
Making your suggested changes (just testing the first two modes and skipping the final phase on Verilator), alongside the same timeout changes as before, causes the test to pass in 6656.2 seconds (around 1 hour 50 minutes) in an environment similar to CI. It is not clear to me whether properly tuning the AST model as suggested would change this.
This seems fine to me for debug purposes - it is far better than the 15+ hours it was taking before. Ideally we would like it to run in under an hour, but given the purpose of this test I think it makes more sense to tag the Verilator environment as manual for entropy_src_fw_override_test
with a comment explaining the runtime, alongside these changes. Does this sound reasonable to you @vogelpi?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for trying this out and reporting back. Tuning the AST model would help on top of this but we don't have to do this right now.
This seems fine to me for debug purposes - it is far better than the 15+ hours it was taking before. Ideally we would like it to run in under an hour, but given the purpose of this test I think it makes more sense to tag the Verilator environment as manual for entropy_src_fw_override_test with a comment explaining the runtime, alongside these changes. Does this sound reasonable to you @vogelpi?
This would be perfect. Can you implement these changes directly in this PR?
858d359
to
fb78785
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks @AlexJones0 !
This commit modifies the `entropy_src_fw_override_test` such that when running on Verilator, the random number consumer and the AES test utils are both given 48x larger timeouts, to account for the fact that the AST's noise source has not been tuned for a Verilator execution environment. This stops these specific tests failing on Verilator. Additionally, the requirements of the test have been conditionally reduced when executing on a Verilator simulation environment. The final phase of testing is skipped (single bit mode disabled, without an output delay) as this provides negligible useful information when running on Verilator, and only the `kDifEntropySingleBitModeDisabled` & `kDifEntropySingleBitMode0` modes are used to run the tests, as modes 1 through 3 are almost identical and all are much slower than ModeDisabled. This provides the speedup necessary for the test to succeed (given reasonable timeout) in the Verilator sim environment. Signed-off-by: Alex Jones <alex.jones@lowrisc.org>
fb78785
to
2719ea6
Compare
I forgot to mark the Verilator environment for the |
This PR partially addresses #24184 (specifically, it addresses the two tests found to fail if given long enough to run in Verilator).
Entropy is generated much more slowly in Verilator, and as a result, two tests were timing-out (and if given long enough, failing) when run in Verilator. These are
sw/device/tests:entropy_src_fw_observe_many_contiguous_test_sim_verilator
, andsw/device/tests:entropy_src_fw_override_test_sim_verilator
. This PR conditionally lowers the test requirements for the Verilator simulation environment to allow these tests to pass on Verilator.For
fw_observe_many_contiguous_test
, the number of entropy samples to be observed is reduced from 1024 to just 8, as the full 1024 would take around 13-15 hours. Forfw_override_test
, the entropy consumers have been limited to just random number generation, and the test is terminated after this on Verilator. The timeout given to observe this Entropy is also increased 30x on Verilator.Note that these code changes do not modify the operation of any execution environments other than Verilator; only Verilator has had conditional changes. If this is deemed not appropriate due to requirements of these tests, then instead the Verilator environment should be removed from these two tests.
This has been tested by running the following command:
./bazelisk.sh test -t- --test_output=streamed --test_timeout=3600,3600,3600,3600 \ //sw/device/tests:entropy_src_fw_observe_many_contiguous_test_sim_verilator //sw/device/tests:entropy_src_fw_override_test_sim_verilator
which now passes as expected: