
[entropy_src,verilator] Reduce entropy tests requirements on Verilator to fix failing test targets #24528

Open: wants to merge 2 commits into master

Conversation

AlexJones0 (Contributor):

This PR partially addresses #24184; specifically, it addresses the two tests that were found to fail if given long enough to run in Verilator.

Entropy is generated much more slowly in Verilator, and as a result two tests were timing out (and, if given long enough, failing) when run in Verilator: sw/device/tests:entropy_src_fw_observe_many_contiguous_test_sim_verilator and sw/device/tests:entropy_src_fw_override_test_sim_verilator. This PR conditionally lowers the test requirements for the Verilator simulation environment so that these tests pass on Verilator.

For fw_observe_many_contiguous_test, the number of entropy samples to be observed is reduced from 1024 to just 8, as observing the full 1024 would take around 13-15 hours. For fw_override_test, the entropy consumers are limited to random number generation only, and the test terminates after that point on Verilator. The timeout given to observe this entropy is also increased 30x on Verilator.
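
A rough sketch of the shape of this conditional is shown below; the constant and helper names are illustrative rather than the PR's actual identifiers, while kDeviceType and kDeviceSimVerilator are the existing symbols from sw/device/lib/arch/device.h that also appear in the diff further down.

  #include <stddef.h>

  #include "sw/device/lib/arch/device.h"

  enum {
    // Illustrative values matching the description above.
    kContiguousSamplesDefault = 1024,
    kContiguousSamplesVerilator = 8,
  };

  // Hypothetical helper: entropy is produced far more slowly in Verilator,
  // so observe only a handful of contiguous samples there.
  static size_t contiguous_sample_count(void) {
    return (kDeviceType == kDeviceSimVerilator) ? kContiguousSamplesVerilator
                                                : kContiguousSamplesDefault;
  }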

Note that these code changes do not modify the operation of any execution environment other than Verilator; only Verilator has had conditional changes. If this is deemed inappropriate given the requirements of these tests, then the Verilator environment should instead be removed from these two tests.

This has been tested by running the following command:

./bazelisk.sh test -t- --test_output=streamed --test_timeout=3600,3600,3600,3600 \
    //sw/device/tests:entropy_src_fw_observe_many_contiguous_test_sim_verilator \
    //sw/device/tests:entropy_src_fw_override_test_sim_verilator

which now passes as expected:

//sw/device/tests:entropy_src_fw_observe_many_contiguous_test_sim_verilator PASSED in 2807.4s
//sw/device/tests:entropy_src_fw_override_test_sim_verilator             PASSED in 3358.1s

Reduces the number of contiguous entropy samples observed when running
on Verilator to just 8 (down from 1024). Observing the full 1024 would
require greatly increasing the timeout and would make the test take
around 13-15 hours to run. Without increasing the timeout, the test
currently fails because the required entropy rate is not met.

Signed-off-by: Alex Jones <alex.jones@lowrisc.org>
@AlexJones0 AlexJones0 requested a review from a team as a code owner September 5, 2024 17:20
@AlexJones0 AlexJones0 requested review from HU90m, pamaury and jwnrt and removed request for a team and HU90m September 5, 2024 17:20
@pamaury (Contributor) left a comment:

I would say that the main purpose of this test is to run on the FPGA or a real device. Running this test in Verilator only makes sense for debugging, in which case it does not necessarily matter if it takes many hours to run.
Therefore it's not clear to me if this test should just be marked as manual for Verilator instead. I think @vogelpi should have a look at this since he designed the testplan for that.

@AlexJones0 (Contributor, Author):

That makes sense, thanks. I agree that we should wait for @vogelpi's opinion on this in that case.

Therefore it's not clear to me if this test should just be marked as manual for Verilator instead.

One small correction: if we do not want to make these changes, then marking the tests as manual for Verilator will not be sufficient, because when manually executed these tests will fail with either Internal:["ENT",176] in the case of entropy_src_fw_observe_many_contiguous_test, or DeadlineExceeded:["ENT",367] in the case of entropy_src_fw_override_test. To retain e.g. 1024 contiguous samples, we would still need to modify the test to conditionally increase the timeout on Verilator, or we could just remove the Verilator execution environments entirely. It otherwise doesn't make sense to me to have a test that is marked as supporting Verilator which you can run manually, but which is actually guaranteed to fail.

@jwnrt (Contributor) commented Sep 6, 2024:

I think we should apply this change and mark manual. @pamaury is right that it’s not useful in Verilator, but if you really really have to use it to debug something then it should at least work.

@AlexJones0 (Contributor, Author):

I think we should apply this change and mark manual.

I agree, that sounds sensible. Regardless of whether this change is applied, it makes sense to mark the Verilator target as manual if the environment stays around, because, based on what has been described so far, the Verilator test target only really makes sense for debugging purposes.

@vogelpi (Contributor) left a comment:

Thanks for looping me in @AlexJones0 and @pamaury!

I've left some comments on how to change the fw_override_test in a way that does not reduce the scope of the test too much but still speeds it up for Verilator. Would you mind giving this a try please?

You are right: this test is mainly useful for SiVal purposes (non-simulation platforms), but having it run in Verilator is very useful for debugging. So re-enabling this is valuable, and the approach taken here is valid for the Verilator simulation environment. Thanks @AlexJones0 for doing this work!

Comment on lines 76 to 82
kRandomNumberTimeoutUsec = 100 * 1000,
/**
* Timeout to generate a random number in micro seconds when running in a
* Verilator simulation environment. Verilator observes entropy more slowly
* than other environments, and so is given a longer timeout.
*/
kVerilatorRandomNumberTimeoutUsec = 3 * 1000 * 1000,
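
A minimal sketch of how these two constants might be selected at runtime, assuming the test keys off kDeviceType from sw/device/lib/arch/device.h; the helper name below is hypothetical rather than the PR's actual code.

  #include <stdint.h>

  #include "sw/device/lib/arch/device.h"

  enum {
    // Values copied from the snippet above.
    kRandomNumberTimeoutUsec = 100 * 1000,
    kVerilatorRandomNumberTimeoutUsec = 3 * 1000 * 1000,
  };

  // Hypothetical helper: use the longer timeout only under Verilator, where
  // the untuned AST noise source delivers entropy much more slowly.
  static uint32_t random_number_timeout_usec(void) {
    return (kDeviceType == kDeviceSimVerilator)
               ? kVerilatorRandomNumberTimeoutUsec
               : kRandomNumberTimeoutUsec;
  }
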
Contributor:

There is a factor of 30x difference here, which strongly suggests that we probably never cared about tuning the rate of the raw noise source (inside AST) for the Verilator simulation. For context, to speed up the Verilator simulation, we use different clock frequencies and ratios as well as baud rates for Verilator. See e.g. sw/device/lib/arch/device_sim_verilator.c.

If you compare these values with the values for FPGA (for which this test was initially designed), you'll note a factor of 48x or similar difference. The RTL model used on FPGA and in Verilator is the same though. I don't remember exactly at which clock the AST model is running. I think what we should really do is tune the AST model for the Verilator simulation. Would you mind creating a GitHub issue for this please?

@AlexJones0 (Contributor, Author):

That makes sense, thanks - I've made that issue here: #24585

Please do mention if there's anything that I've missed capturing in that issue. I had a quick look at the RTL of the ADC as well as the device clock/ratio definitions for the Verilator and FPGA environments, and I suspect that, as you say, it is this 48x difference in clock speeds that necessitates the increase in timeout.

Contributor:

Thanks for creating the issue!

Comment on lines 395 to 402
if (kDeviceType == kDeviceSimVerilator) {
// If running on Verilator then entropy is generated much more slowly.
// It would take an impractical amount of time to generate the thousands
// of words of entropy that are required by all entropy consumers, and so
// since we are not concerned with the rate of entropy on Verilator, we
// stop at just generating some random numbers only.
return OK_STATUS();
}
Contributor:

I think this is non-ideal because it means we skip quite a big portion of the main test function in Verilator. So the Verilator simulation becomes less useful for debugging as a big part of the design is not exercised anymore.

Another way to speed up the Verilator simulation would be to reduce the number of modes tested. E.g. for Verilator only we could change:

  // Test all modes.
  static dif_entropy_src_single_bit_mode_t kModes[] = {
      kDifEntropySrcSingleBitModeDisabled, kDifEntropySrcSingleBitMode0,
      kDifEntropySrcSingleBitMode1,        kDifEntropySrcSingleBitMode2,
      kDifEntropySrcSingleBitMode3,
  };

to

  // Test all modes.
  static dif_entropy_src_single_bit_mode_t kModes[] = {
      kDifEntropySrcSingleBitModeDisabled, kDifEntropySrcSingleBitMode0,
  };

This would just check 2 instead of 5 modes (...Mode0 to ...Mode3 are almost identical, and all of them are 4x slower compared to ...ModeDisabled).

Also, the final phase:

  // Rerun the test with single bit mode disabled,
  // this time with an output delay.
  fw_ov_insert_wait_enabled = true;
  EXECUTE_TEST(test_result, firmware_override_extract_insert,
               kDifEntropySrcSingleBitModeDisabled, false, false);
  EXECUTE_TEST(test_result, firmware_override_extract_insert,
               kDifEntropySrcSingleBitModeDisabled, true, false);

Could be skipped for Verilator. We don't learn a lot from this.
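
One way such a Verilator-only reduction could be expressed is sketched below, under the assumption that the test iterates over the first N entries of kModes; the helper name is hypothetical, and kDeviceType / kDeviceSimVerilator are the existing device-library symbols.

  #include <stddef.h>

  #include "sw/device/lib/arch/device.h"
  #include "sw/device/lib/dif/dif_entropy_src.h"

  static dif_entropy_src_single_bit_mode_t kModes[] = {
      kDifEntropySrcSingleBitModeDisabled, kDifEntropySrcSingleBitMode0,
      kDifEntropySrcSingleBitMode1,        kDifEntropySrcSingleBitMode2,
      kDifEntropySrcSingleBitMode3,
  };

  // Hypothetical helper: exercise only the two representative modes under
  // Verilator, and all five everywhere else.
  static size_t num_modes_to_test(void) {
    return (kDeviceType == kDeviceSimVerilator)
               ? 2
               : sizeof(kModes) / sizeof(kModes[0]);
  }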

@AlexJones0 (Contributor, Author):

Yes, I agree; this was one thing that I was not sure about.

Making your suggested changes (just testing the first two modes and skipping the final phase on Verilator), alongside the same timeout changes as before, causes the test to pass in 6656.2 seconds (around 1 hour 50 minutes) in an environment similar to CI. It is not clear to me whether properly tuning the AST model as suggested would change this.

This seems fine to me for debug purposes - it is far better than the 15+ hours it was taking before. Ideally we would like it to run in under an hour, but given the purpose of this test I think it makes more sense to tag the Verilator environment as manual for entropy_src_fw_override_test with a comment explaining the runtime, alongside these changes. Does this sound reasonable to you @vogelpi?

Contributor:

Thanks for trying this out and reporting back. Tuning the AST model would help on top of this but we don't have to do this right now.

This seems fine to me for debug purposes - it is far better than the 15+ hours it was taking before. Ideally we would like it to run in under an hour, but given the purpose of this test I think it makes more sense to tag the Verilator environment as manual for entropy_src_fw_override_test with a comment explaining the runtime, alongside these changes. Does this sound reasonable to you @vogelpi?

This would be perfect. Can you implement these changes directly in this PR?

@vogelpi (Contributor) left a comment:

LGTM, thanks @AlexJones0!

This commit modifies the `entropy_src_fw_override_test` such that when
running on Verilator, the random number consumer and the AES test utils
are both given 48x larger timeouts, to account for the fact that the
AST's noise source has not been tuned for a Verilator execution
environment. This stops these specific tests from failing on Verilator.

Additionally, the requirements of the test have been conditionally
reduced when executing in a Verilator simulation environment. The final
phase of testing is skipped (single bit mode disabled, with an output
delay) as it provides negligible useful information when running on
Verilator, and only the `kDifEntropySrcSingleBitModeDisabled` and
`kDifEntropySrcSingleBitMode0` modes are used to run the tests, as
modes 1 through 3 are almost identical and all are much slower than
ModeDisabled. This provides the speedup necessary for the test to
succeed (given a reasonable timeout) in the Verilator sim environment.

Signed-off-by: Alex Jones <alex.jones@lowrisc.org>
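
For illustration only (the commit's actual diff may differ), the conditional skip of the final phase could look roughly like the fragment below, sitting inside the test's main function and reusing the names from the snippet quoted in the review above.

  // Sketch, not the verbatim diff: rerun with an output delay only on
  // platforms other than Verilator, where this extra phase adds little value.
  if (kDeviceType != kDeviceSimVerilator) {
    // Rerun the test with single bit mode disabled, this time with an
    // output delay.
    fw_ov_insert_wait_enabled = true;
    EXECUTE_TEST(test_result, firmware_override_extract_insert,
                 kDifEntropySrcSingleBitModeDisabled, false, false);
    EXECUTE_TEST(test_result, firmware_override_extract_insert,
                 kDifEntropySrcSingleBitModeDisabled, true, false);
  }
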
@AlexJones0 (Contributor, Author):

I forgot to mark the Verilator environment for the fw_override test as manual in that push so I've just added that now - apologies!
