Seems this found a bug under For all the tests, the following pattern asserts:
However, the following does not:
Assertion is:
JitDump:
For reference: The above only happens for However, I have
Looks like it might be a combination of inlining along with us returning a
{
    var test = new Test();

    if (test.IsSupported)
Splitting the tests into different calling patterns is very good; however, it could be more convenient to use Func<T> or similar to pass tests into Main. This would allow for greater flexibility, since individual tests would not have to implement all of the functions but only pass those that are relevant to them, and could even implement test functions that are not predefined.
The current functions represent the functionality that all tests should probably implement, in the most generic sense.
They ensure that the tests are functionally correct (for the various modes required) and that any new codegen works as intended.
There may be some tests which differ slightly, which is the purpose of ValidateOther. For tests that differ more significantly, a more custom Main method is fine and would be expected.
The current functions represent the functionality that all tests should probably implement

Yes, that is most probably correct, but my intention was to suggest using a more flexible pattern for discovering and running tests, which would lift limitations on test implementation and decouple Program from Test. For instance, every test could implement a Func<bool>[] GetTests() method which returns the set of tests to be run by the Main function, and some of the tests could be mandatory.
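For illustration, a minimal sketch of that suggested pattern (the ITest interface, SampleTest class, and scenario names here are hypothetical, invented for this sketch rather than taken from the PR):

```csharp
using System;

// Hypothetical contract: only IsSupported is mandatory; each test
// publishes whichever checks are relevant to it via GetTests().
public interface ITest
{
    bool IsSupported { get; }
    Func<bool>[] GetTests();
}

public sealed class SampleTest : ITest
{
    // A real test would query the intrinsic's IsSupported property here.
    public bool IsSupported => true;

    public Func<bool>[] GetTests() => new Func<bool>[]
    {
        RunBasicScenario,
        RunCustomScenario // a test function that is not predefined
    };

    private bool RunBasicScenario() => true;  // placeholder validation
    private bool RunCustomScenario() => true; // placeholder validation
}

public static class Program
{
    public static int Main()
    {
        ITest test = new SampleTest();
        bool passed = true;

        if (test.IsSupported)
        {
            foreach (Func<bool> t in test.GetTests())
            {
                passed &= t(); // run everything; report overall status at the end
            }
        }

        return passed ? 100 : 0; // CoreCLR tests return 100 on success
    }
}
```

With this shape, Main depends only on the ITest contract, so a test with a different set of checks requires no change to the driver.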
And only IsSupported should be implemented directly by every test.
For instance, every test could implement a Func<bool>[] GetTests() method

Fair enough. We should probably wait and see whether moving to a proper unit testing framework is viable, however (there may be some actual reason why not for CoreCLR).
namespace HardwareIntrinsicTest
{
    public unsafe struct TestTable<T> : IDisposable where T : struct
TestTable in its current form is very deficient for testing more complex operations. As requirements I would propose:
- Support all input data formats for all HW intrinsic operations - Vector128, Vector256, numerical types, and even strings for SSE42.
- Store the output of the intrinsic under test, as well as the output of the verification function, alongside the test data.
- Allow printing all collected test data to output for detailed verification.

Since we may need to have TestTable with 4 or even 5 data arrays, it could be necessary to create a set of structs, or just switch to a class and extend it via inheritance.
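For illustration, a minimal sketch of what such an extended table might look like (ExtendedTestTable and its layout are assumptions invented here, not code from the PR):

```csharp
using System;

// Hypothetical extended test table: two inputs, the intrinsic's output,
// and the verification function's expected output, kept side by side so
// everything can be printed together when a test fails.
public class ExtendedTestTable<T> where T : struct
{
    public T[] InArray1;
    public T[] InArray2;
    public T[] OutArray;      // result produced by the intrinsic under test
    public T[] ExpectedArray; // result produced by the managed verification function

    public ExtendedTestTable(int length)
    {
        InArray1 = new T[length];
        InArray2 = new T[length];
        OutArray = new T[length];
        ExpectedArray = new T[length];
    }

    // Dump all collected data for diagnostics.
    public void Print()
    {
        Console.WriteLine($"in1:      {string.Join(", ", InArray1)}");
        Console.WriteLine($"in2:      {string.Join(", ", InArray2)}");
        Console.WriteLine($"out:      {string.Join(", ", OutArray)}");
        Console.WriteLine($"expected: {string.Join(", ", ExpectedArray)}");
    }
}
```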
I have some local work here that I haven't pushed yet.
This was basically renamed to SimpleTestTable and is meant for tests which expect the same operation to be performed for each input. There are tests in other categories that would require their own new TestTables.
For example, some tests only operate on a single input, some operate on strings or scalar values, and some (like Shuffle or Unpack) require checking inArray1, inArray2, and outArray as a whole, rather than an element at a time.
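To make the distinction concrete, a hedged sketch of the two validation styles (the method names are illustrative only; the UnpackLow check assumes 4-lane single-precision semantics):

```csharp
static class ValidationSketches
{
    // Element-wise check: suitable for SimpleTestTable-style tests, where the
    // same operation is applied independently to every lane.
    public static bool ValidateElementWise(float[] left, float[] right, float[] result)
    {
        for (int i = 0; i < result.Length; i++)
        {
            if (result[i] != left[i] + right[i]) // e.g. validating an Add intrinsic
            {
                return false;
            }
        }
        return true;
    }

    // Whole-array check: needed for operations like Shuffle or Unpack, where an
    // output lane depends on lanes at other positions in the inputs.
    public static bool ValidateUnpackLow(float[] left, float[] right, float[] result)
    {
        // UnpackLow on a 4-lane vector interleaves the low halves:
        // result = { left[0], right[0], left[1], right[1] }
        return result[0] == left[0] && result[1] == right[0]
            && result[2] == left[1] && result[3] == right[1];
    }
}
```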
I have some local work as well - I hope I can push it soon. It has all these difficult operations tested with several input-output combinations.
private TestTable<ulong> _testTable;

public Test()
What I find very problematic here is the very limited test scope. Testing a single operation on a single Vector of random elements just scratches the surface. In general, Vector operations are fast, and it is feasible to run tens, hundreds, or even thousands of them without any significant time penalty.
Furthermore, we should test all known problematic data points for a given data format; i.e., for floats we should have predefined problematic data comprising +0.0, -0.0, data close to the minimum normal value, subnormal values, 1.0, -1.0, NaN, +infinity and -infinity, plus sets of data around 0.0 and around 1.0, combined into operations with other large randomly generated data sets.
Another requirement related to the test function would be saving all input data, the output from the function under test, and the output from the verification function, for later printout when a test fails, to provide good diagnostic data.
All tests should run to the end, and only after that should errors be reported for all faulting operations.
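As a sketch, such a predefined problematic data set for float might look like the following (illustrative only, not from the PR):

```csharp
static class ProblematicData
{
    // Hypothetical set of known-problematic single-precision values that could
    // be mixed into randomly generated test data.
    public static readonly float[] Floats =
    {
        +0.0f,
        -0.0f,
        float.Epsilon,       // smallest positive subnormal value
        1.17549435e-38f,     // near the minimum positive normal value
        1.0f,
        -1.0f,
        float.NaN,
        float.PositiveInfinity,
        float.NegativeInfinity,
    };
}
```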
Testing a single operation on a single Vector of random elements just scratches the surface.
These are meant to be unit tests, not integration or end-to-end tests. We should be writing other test types (integration, end-to-end, perf, etc) separately and only ensuring that the basic functionality is covered by these tests in particular.
Furthermore, we should test all known problematic data points for a given data format; i.e., for floats we should have predefined problematic data comprising
I don't agree.
We are not coding complex algorithms; instead we are taking a method and (in 95% of cases) generating a single hardware instruction. The vast majority of the complexity comes in the form of containment analysis and ensuring we do the right thing when hardware isn't supported or we are called indirectly.
Provided we are emitting the correct instruction and doing nothing else with the data (which is validated by the basic scenario) we can safely rely on the fact that Intel, AMD, and other hardware manufacturers have likely already done more extensive testing to ensure the underlying hardware instruction is functionally correct.
That being said, there are a few exceptions. These show up in the form of helper functions (Set, SetAll, etc) and for methods which require more extensive codegen (such as CompareEqualUnorderedScalar, which does a comparison, a couple of flag checks, and returns a bool).
Another requirement related to the test function would be saving all input data, the output from the function under test, and the output from the verification function, for later printout when a test fails, to provide good diagnostic data.
The way the current code is written, each test will print its failure information (which includes inputs and outputs) and all other tests will continue executing (short of an Exception being thrown or a JIT assert occurring).
we can safely rely on the fact that Intel, AMD, and other hardware manufacturers have likely already done more extensive testing to ensure the underlying hardware instruction is functionally correct
It is not my intention to test functional correctness; however, trusting that HW manufacturers are doing their job well is OK, but checking it is even better. My intention is to execute all corner cases, which may or may not raise exceptions, and test how we handle them. I am certain there could be some surprises ahead in there.
have likely already done more extensive testing
This is not a safe assumption to make. You should assume that all testing that was done is using the tests committed in the repo.
You should assume that all testing that was done is using the tests committed in the repo.
My point was that we are providing a contract to the consumer of our API. That contract (for the majority of the hardware intrinsics) is the following:
- We will emit a particular instruction
- We will pass the inputs, as given
- We will return the output, as given
Our contract doesn't (and shouldn't) give any guarantees about the correctness of the underlying instruction.
We should be validating our contract (which is what the current tests are doing). Validating special inputs for which the underlying instruction may behave differently is not something that our contract covers, as that is an implementation detail of the instruction we said we were going to emit.
namespace HardwareIntrinsicTest
{
    public sealed unsafe class Test
I would prefer not to split operation tests into a separate test for each operation-data type combination. This would really explode the number of projects used to build and run the tests, multiplying project files without significant gain in test content. From my checks, building tests in coreclr consists of a tiny actual code-file build time and a huge MSBuild overhead. It is even more pronounced during a test run, where the core test logic is a minuscule portion of the whole test running time.
Splitting larger tests into tiny ones will only exacerbate these problems.
I would prefer not to split operation tests into a separate test for each operation-data type combination. This would really explode the number of projects used to build and run the tests, multiplying project files without significant gain in test content. From my checks, building tests in coreclr consists of a tiny actual code-file build time and a huge MSBuild overhead. It is even more pronounced during a test run, where the core test logic is a minuscule portion of the whole test running time.
Ideally we would be using an actual unit test framework (such as xUnit or NUnit); there would be a project per ISA (rather than a project per method or per overload), a class per method or overload, and a Fact per scenario.
However, I've not seen many other tests in mscorlib (outside of perf tests) that support this.
It might be worth changing this however. @jkotas, thoughts?
Unit test frameworks work great for testing framework APIs where it does not really matter how you call the API and it is fine to abstract it away using XUnit theories, etc.
They do not work well for testing codegen, where you need the code calling the API to look a particular way. (Also, codegen/runtime bugs are a pain to debug with a unit testing framework loaded in the process, since it introduces a lot of noise.)
I think you may want to look at autogenerating the test sources for testing the intrinsics.
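As a rough illustration of that suggestion (a hand-wavy sketch, not the GenerateTests.csx script this PR later adds), a generator can expand a per-intrinsic input table into test sources from a shared template:

```csharp
using System.IO;

static class TestGenerator
{
    // Hypothetical input table: one entry per (ISA, method, element type) tuple.
    static readonly (string Isa, string Method, string Type)[] Inputs =
    {
        ("Sse2", "Add", "Double"),
        ("Sse2", "Subtract", "Double"),
        ("Avx", "Add", "Single"),
    };

    static void Main()
    {
        // Shared template; a real template would contain the full test body.
        const string template =
@"// This file is auto-generated. Do not edit it manually.
// Test for {ISA}.{METHOD} over {TYPE} elements.
";

        foreach (var (isa, method, type) in Inputs)
        {
            string source = template
                .Replace("{ISA}", isa)
                .Replace("{METHOD}", method)
                .Replace("{TYPE}", type);

            File.WriteAllText($"{method}.{type}.cs", source); // one source file per entry
        }
    }
}
```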
Cherry-picked the fix for the inliner assertion here: #15772
src/jit/hwintrinsicxarch.cpp
// (gtIsRecursiveCall == false) as the gtNewMustThrowException node can mess with the
// inliner in unexpected ways.

if (gtIsRecursiveCall(method))
Getting back to my comment on your last PR, now that we have a use case -- I'd prefer it if this "recursive call ==> must expand" idiom was confined to the importer; you should pass mustExpand down in here to figure out whether bailing out is acceptable behavior.
Will update.
Fixed. Also see #15772, which has the changes standalone.
Updated to use a T4 text template for generating the tests. Currently just a We can add other templates based on test needs. Working on a change to add JitLog checking to validate the calls are expanded as intrinsics where appropriate, but that can always go in another PR as well.
test Windows_NT x64 Checked jitincompletehwintrinsic
The two job failures are unrelated to this change (which only touches HWIntrinsic test code). I've requeued the jobs. Failure was: baseservices_threading/mutex_misc_waitone2_waitone2
Question. Shall we extract
I think we can, now that I have pulled the validation logic out of it (now it just represents three pinned arrays and should be reusable almost everywhere). The place where it still isn't usable is the scalar tests, which read/write a single value from memory.
FYI. @CarolEidt as well.
Just dug out an important issue which got folded by GitHub
Another paleontological discovery hidden by GitHub:
My response to this is still the same. The hardware intrinsic contract (for all non-helper intrinsics) is that we will:
Any particulars about how the underlying hardware operates on any given data are an implementation detail of the hardware instruction and are not something we need to, or should, be testing. Provided we are passing in the data unmodified, it doesn't matter whether it is QNaN, SNaN, 0, -0, +Inf, -Inf, denormal, rational, irrational, positive, negative, etc.
It would be an entirely different story if we were providing a software fallback. But we are not (and even in the case of the compiler fallback, we are still meeting the original contract requirements stated).
This test pattern looks good to me. Is there anybody who can approve and merge this PR?
Do we care about test efficiency? If yes, please comment on my previous question.
I did not see any problem with the "test efficiency" yet.
@fiigii, the "issue" @4creators is referring to is the large amount of time it takes to compile all the tests in the framework. I currently have the tests split out into one project per method instance (so However, I don't know if this is a real concern. Once tests have been compiled, you can skip building tests again, and when modifying or adding new tests, it is very easy to build just the modified/new projects. Additionally, I believe this split helps break the tests out into logical, readable, and debuggable components (and makes it much simpler to use with the T4 text templates).
It might be possible to modify the template to be able to produce multiple outputs, in which case we would be back to a single project per method (i.e. just However, I'm not sure it's worth the additional effort (and if it is, I think it can be on the backlog of "nice to have").
Ok, new changes are up. Overall the tests converted are the same, but now it uses the
test Windows_NT x64 Checked jitincompletehwintrinsic
test Windows_NT x86 Checked jitincompletehwintrinsic
Thanks for the work. So, is this script executed by the coreclr test build to automatically generate test cases, or do we need to run it manually? Can we push this PR to get merged as soon as possible? There are more and more intrinsic PRs coming, so we should have a criterion for HW intrinsic tests to avoid refactoring tests again and again...
At present, it needs to be manually run after you update it with any new input data.
Good to know. The current test template looks pretty good, and I think we can merge this PR first; extending the template later to support more tests (e.g., JIT-log) is easy. cc @CarolEidt
// Licensed to the .NET Foundation under one or more agreements.
// The .NET Foundation licenses this file to you under the MIT license.
// See the LICENSE file in the project root for more information.
All generated tests (and therefore the template) should contain a comment noting that the test is generated and should not be manually edited. It should also contain instructions on how to regenerate.
@jashook - could you have a look at this? I believe that the use of a single project file per ISA avoids the excessive test build time that I was concerned about, but I'd like your input on this. This looks good to me overall, but I think there is a need to add some minimal documentation. This is all pretty self-explanatory, I think, but there needs to be a way to guide a JIT/coreclr developer to find the right files. Probably a brief document under https://github.com/dotnet/coreclr/tree/master/Documentation/project-docs?
…ic documentation.
Updated the generated files to indicate that they are automatically generated from the script and updated the script with basic instructions for how to run or add new tests.
LGTM - thanks!
@@ -2,6 +2,13 @@
// The .NET Foundation licenses this file to you under the MIT license.
// See the LICENSE file in the project root for more information.

/******************************************************************************
* This file is auto-generated from a template file by the GenerateTests.csx *
* script in tests\src\JIT\HardwareIntrinsics\X86\Shared. In order to make *
Excellent - great place to put the pointer to the script!
LGTM
}}
else
{{
    for (var i = 1; i < left.Length; i++)
I think we can remove this for loop to enable the template for more intrinsics (e.g., HorizontalAdd).
You mean just check each element individually, rather than looping?
Yes, that would make the template more general.
I'm going to log an issue to track this (as part of porting the remaining tests to use the template as well).
We probably want to refactor it such that we don't have to duplicate a bunch of table data for the regular intrinsics (where each check will be the same).
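For context, checking each element individually would allow validating lane-crossing operations that a uniform per-index loop cannot express. A sketch of such a check for a HorizontalAdd-style operation (illustrative only, assuming 4-lane single-precision semantics):

```csharp
static class HorizontalAddValidation
{
    // Element-by-element validation for a 4-lane single-precision HorizontalAdd:
    // result = { l0+l1, l2+l3, r0+r1, r2+r3 }, which a loop comparing
    // result[i] against f(left[i], right[i]) cannot express.
    public static bool ValidateHorizontalAdd(float[] left, float[] right, float[] result)
    {
        return result[0] == left[0] + left[1]
            && result[1] == left[2] + left[3]
            && result[2] == right[0] + right[1]
            && result[3] == right[2] + right[3];
    }
}
```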
Windows_NT x86 Release Innerloop Build and Test passed, but hasn't updated the icon.
ping @jashook - do you think you'll find some time to review this soon?
@CarolEidt sorry, missed this. Taking a look now.
LGTM. It is worth noting the new tests will have to be added to our lstFile to run them in automation. My guess is that file is quite out of date already.
So, is it the case that newly-added tests are never run in automation? Does that mean CI, or ...?
I'm fairly certain they are being run in CI; it is certainly failing the builds in a number of cases 😄
That is the case only for arm/arm64.
Thanks @jashook!
I've updated the x86 hardware intrinsics to share some base code and to follow a template that ensures we get fairly consistent coverage.
Notably, the TestTable and Program.cs are shared. All tests will be expected to have the following tests:
At the very least, the new pattern is very "templatable" and is easy to copy and then do a search/replace on to get working for new tests.
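The list of expected tests itself was folded by GitHub above; as a hypothetical reconstruction of the overall shape such a templated test takes (the scenario names below are illustrative, not necessarily the PR's exact ones):

```csharp
using System;

// Stub standing in for the per-intrinsic Test class the template generates.
sealed class Test
{
    public bool IsSupported => false; // would query e.g. Sse2.IsSupported

    public bool RunBasicScenario() => true;       // direct call to the intrinsic
    public bool RunReflectionScenario() => true;  // indirect call via reflection
    public bool ValidateOther() => true;          // test-specific extra validation
    public bool RunUnsupportedScenario() => true; // PlatformNotSupportedException path
}

class Program
{
    static int Main()
    {
        var test = new Test();
        bool passed = true;

        if (test.IsSupported)
        {
            passed &= test.RunBasicScenario();
            passed &= test.RunReflectionScenario();
            passed &= test.ValidateOther();
        }
        else
        {
            passed &= test.RunUnsupportedScenario();
        }

        return passed ? 100 : 0; // CoreCLR tests return 100 on success
    }
}
```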