Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP][mono] Basic SIMD support for System.Numerics.Vector2 on arm64 #91455

Closed
wants to merge 92 commits into from

Conversation

matouskozak
Copy link
Member

@matouskozak matouskozak commented Sep 1, 2023

This PR adds basic SIMD support for System.Numerics.Vector2 on arm64. Equaling the current support present for System.Numerics.Vector4.

Current SIMD support for Vector2 with mini/llvm:

  • SN_ctor
  • SN_Abs
  • SN_Add
  • SN_Clamp
  • SN_Divide (currently disabled Vector2 / float scenario, will enable in the next PR)
  • SN_Dot
  • SN_Max
  • SN_Min
  • SN_Multiply
  • SN_Negate
  • SN_SquareRoot
  • SN_Subtract
  • SN_get_Item
  • SN_get_One
  • SN_get_UnitX
  • SN_get_UnitY
  • SN_get_Zero
  • SN_op_Addition
  • SN_op_Division
  • SN_op_Equality
  • SN_op_Inequality
  • SN_op_Multiply
  • SN_op_Subtraction
  • SN_op_UnaryNegation
  • SN_set_Item

Future work on the missing intrinsic is tracked here #91394.
Contributes to: #73462


p.s. These getters currently use 128-bit code paths for emitting const values (emit_xconst_v128) even for Vector2 (64-bit vector):

  • SN_get_UnitX
  • SN_get_UnitY
  • SN_get_One

While it looks like the generated code is okay locally, it could potentially cause problems in mono_arch_emit_exceptions or in subsequent exception handling

} else if (ji->type == MONO_PATCH_INFO_X128) {
size += 16 + 15; /* sizeof (Vector128<T>) + alignment */
}

@fanyang-mono @jandupej Any thoughts on that? Is dedicated 64-bit const value emit worth implementing?

@matouskozak
Copy link
Member Author

matouskozak commented Sep 1, 2023

More paths where 128-bit vector specific code is used detected (e.g. OP_XZERO). Changing to WIP and I'll investigate further.

@matouskozak matouskozak changed the title [mono] Basic SIMD support for System.Numerics.Vector2 on arm64 [WIP][mono] Basic SIMD support for System.Numerics.Vector2 on arm64 Sep 1, 2023
@jandupej
Copy link
Contributor

jandupej commented Sep 1, 2023

These getters currently use 128-bit code paths for emitting const values...

You can use a fmov to flood the lower two floats with 1.0f. This gives you the fastest SN_get_One possible (there is a 64-bit variant of this, with q=0). To make SN_get_UnitX/Y you can shift the vector left or right as doubles by 32. Zeros are shifted in, so this will give you a (0.0f, 1.0f) or reverse. This will destroy the upper 64 bits of the register, but it shouldn't be a problem as only the lower 64 bits are of importance.

ViktorHofer and others added 16 commits September 1, 2023 11:23
The ILLink ProjectReference output shouldn't be copied to consuming projects.

From the docs:
> Private	Optional boolean. Specifies whether the reference should be copied to the output folder. This attribute matches the Copy Local property of the reference that's in the Visual Studio IDE.
* [ARM64] Add g_GCShadowEnd to JIT_WriteBarrier_Table

This change moves address of g_GCShadowEnd to JIT_WriteBarrier_Table like
others variables used in Write Barrier.

This fix simmilar to RISC-V one dotnet#90036

* [ARM64] Move GCShadow vars to the end of the wbs block

* Update src/coreclr/vm/arm64/asmhelpers.asm

---------

Co-authored-by: Jan Kotas <jkotas@microsoft.com>
Co-authored-by: pavelsavara <pavel.savara@gmail.com>
* Remove winternl.h

* Remove winver.h

* Remove verrsrc.h

* Remove dbghelp.h

* Remove conio.h

* Remove io.h

* FILECFAbsoluteTimeToFileTime

* Delete some unused header files

* More unused structs and macros

* Update fxver.h
Adds a default configuration that comes with all the dependencies preinstalled, but not prebuilt. This reduces confusion if you don't use the "New with options..." dropdown when creating the Codespace since you'd just get a standard Ubuntu container then.

Also install a specific version of dotnet-serve.
* Introduce a switch to use FullOpts for cctors

* Remove the obsolete policy around cctors

* Revert changes around the new switch
…quals` (dotnet#91461)

* Add missing comparisons

* Add unit tests
* workloads: Skip updating the targeting pack

This is required because the runtime pack is still 8.0, but updating the
targeting pack from the local build would make it 9.0, thus causing this
error for non-wasm apps:

```
Unhandled exception. System.IO.FileNotFoundException: Could not load file or assembly 'System.Runtime, Version=9.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a'. The system cannot find the file specified.

File name: 'System.Runtime, Version=9.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a'
```

* [wasm] makefile cleanup

* [wasm] Enable non-wasm build tests
Uses the code sharing infra added in
24e5bce. This shares most of the
logic that was easily shareable in the `TestCasesRunner`
directory.
jkotas and others added 27 commits September 5, 2023 11:10
…et#91300)

Fix a long standing TODO. Formatting of assertion failures and fail-fasts in native AOT matches regular CoreCLR with this change.

```csharp
Debug.Assert(false, "Something is not right.");
```

Before:
```
Unhandled Exception: System.Diagnostics.DebugProvider+DebugAssertException: Something is not right.
   at System.Diagnostics.DebugProvider.Fail(String, String) + 0x49
   at System.Diagnostics.Debug.Fail(String, String) + 0x5a
   at System.Diagnostics.Debug.Assert(Boolean, String, String) + 0x2d
   at System.Diagnostics.Debug.Assert(Boolean, String) + 0x28
   at Program.<Main>$(String[] args) + 0x1d
   at test!<BaseAddress>+0x1d3a67
   at test!<BaseAddress>+0x1d3af5
```

After:
```
Process terminated. Assertion failed.
Something is not right.
   at System.Diagnostics.DebugProvider.Fail(String, String) + 0x49
   at System.Diagnostics.Debug.Fail(String, String) + 0x5a
   at System.Diagnostics.Debug.Assert(Boolean, String, String) + 0x2d
   at System.Diagnostics.Debug.Assert(Boolean, String) + 0x28
   at Program.<Main>$(String[] args) + 0x1d
   at repro!<BaseAddress>+0x1d51c7
   at repro!<BaseAddress>+0x1d5255
```

```csharp
Environment.FailFast("Fatal error!");
```

Before:
```
Process terminated. Fatal error!
```

After:
```
Process terminated. Fatal error!
   at System.RuntimeExceptionHelpers.FailFast(String, Exception, String, RhFailFastReason, IntPtr, IntPtr) + 0x335
   at System.Environment.FailFast(String) + 0x35
   at Program.<Main>$(String[] args) + 0x1b
   at repro!<BaseAddress>+0x1d40c7
   at repro!<BaseAddress>+0x1d4155
```
* [RISC-V] Generate profiling function callbacks

Initial implementation based on ARM64 code.

* [RISC-V] Fix asm stub for calling profiler callbacks

* Fix argument registers according to RISC-V calling convention
* Fix field offsets for PROFILE_PLATFORM_SPECIFIC_DATA
* Make sure field offsets for PROFILE_PLATFORM_SPECIFIC_DATA stay fixed by static asserting the offsets in asmconstants.h

* [RISC-V] Pass arguments for Profile(Enter|Leave|Tailcall)Naked stubs in t0 and t1 because t2 is used to store the call address of the stub

* [RISC-V] Remove unimplemented definition of EmitRet

* [RISC-V] Copy struct from registers into a buffer so that profiler can see whole struct arguments laid out in memory

* [RISC-V] Implement ProfileArgIterator::GetReturnBufferAddr()

Since the RISC-V ABI says values are returned like the first named argument, re-use the struct copying routine from argument parsing as much as possible.

* Factor out duplicated test results checking routine in SlowPathELTProfiler::Shutdown()

* [RISC-V] Clean up PROFILE_PLATFORM_SPECIFIC_DATA

* Remove unused t0 field
* Remove 'unused' field and widen 'flags' to 64 bits to maintain alignment and shave off one sw instruction

* [RISC-V] Remove commented out code

* [RISC-V] Fix formatting

* [RISC-V] Apply format patch from failed check

* [RISC-V] Post-review fixes
Fixes dotnet#91249

We originally fixed the warnings with dotnet#69236 but due to `add_compile_options()` calls in zlib.cmake _all_ of the mono native libs - not just the zlib components - were being built with C4244 disabled and a few couple more leaked in.

This fixes the occurrences and reworks the .cmake files a bit to hopefully prevent this in the future.
…ject that is built outside of test runs (dotnet#91477)

- Make a installer/tests/Assets/Projects directory that configures projects to build using the live targeting packs
  - Move `AppWithSubDirs` under this directory
- Put test output under `artifacts/tests/host/<os>.<arch>.<config>` instead of `artifacts/tests/<config>` (better aligned with coreclr test output structure)
- Add a `host.pretest` subset for test assets needed for host tests
  - Currently it just builds the one test asset project, but the intention is that more projects can be moved such that they are built outside of test runs. Setting up the shared framework directory could also be part of this.
- Add `SingleFileTestApp` helper that uses output from built test asset projects:
  - Lays out a directory for the non-bundled app
  - Creates bundles based on that directory and the (live) shared framework
    - This should better model the product construction than existing tests, which bundle the entire self-contained output directory (so it has all the native host/runtime files that wouldn't be part of single-file)
  - I had played with merging this with `TestApp` or reusing `TestApp.Populate*`, but this was special enough (files not on disk, only assemblies) that it seemed more confusing to try to share a code path.
- Update `BundledAppWithSubDirs` to use this new mechanism for creating a single-file test app instead of copying/publishing projects
  - On my Windows machine running on Release everything, these go from a bit over a minute to 12 seconds
On some failure paths, it was boxing a char or an int that would have been used as part of creating an exception on a Parse path but that's ignored on a TryParse path. This avoids that boxing by storing those values into a dedicated int field on the DateTimeResult. While doing this, I also consolidated several other fields, such that this not only avoids the boxing, it also results in a net reduction of the size of DateTimeResult (it's passed around by ref, so this only slightly reduces a zeroing burden).
Fixes dotnet#91309.

NativeAOT sources stack trace data from two places: reflection metadata (if the method was reflection-visible), or specialized stack trace metadata. Finding out if a method should be hidden from stack traces in the former case is easy: just look for the attribute. The latter case requires compiler work.

In this PR, I'm extending the specialized stack trace metadata format to contain a bit that says if the frame should be hidden or not. I'm doing it in a way that we can also do this for compiler-generated methods (that otherwise don't show up in stack trace metadata).

The runtime behavior is similar to CoreCLR - the methods show up in individual stack frames, but if we stringify the stack trace, they'll be skipped. I'm adding a specialized test that tests the two sources of metadata.

I'm also marking the methods involved in the startup path as stack trace hidden.

There is one thing I skipped - CoreCLR has logic to force generate the stack trace metadata for the first frame. If we do that, the compiler-generated startup code starts showing up in stack traces again. Marking `Main` as stack trace hidden might lead to an empty stack trace.
 cpufeatures.h/.c uses TARGET_ ifdefs, but it should use HOST_ ifdefs
@matouskozak matouskozak closed this Sep 6, 2023
@ghost ghost locked as resolved and limited conversation to collaborators Oct 6, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.