This repository was archived by the owner on Jan 23, 2023. It is now read-only.
Fixing Buffer::BlockCopy, JIT_MemCpy, and JIT_MemSet to just call the appropriate CRT functions for x64 Windows, as is already done for all other platforms/targets
#25763
This resolves #25505 for release/3.0 and is a backport of #25750.
Based on the original issue where JIT_MemCpy was changed to use rep movsb (see #7198), the results were:
- minor improvement (~5%) for arrays of length 0 to 120
- good improvement (~40%) for arrays of length 130 to 510
  - ~47% for arrays of length 130 to 310
  - ~39% for arrays of length 320 to 440
  - ~27% for arrays of length 450 to 510
- little improvement (~1%) for arrays above 510 in length
  - This was only tested for 520 and 1000 bytes
However, on AMD processors there are additional limitations on when rep movsb is beneficial. The conditions under which JIT_MemCpy commonly uses it today cause a 2x perf decrease for arrays larger than 512 bytes.
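As a rough way to check copy cost on either side of the 512-byte mark on a given machine, a minimal timing sketch in C follows. The helper name `time_memcpy_ns` and the choice of `clock()` as the timer are assumptions made for illustration; this is not the benchmark used in #7198 or in this PR, and `clock()` is coarse, so a large iteration count is needed for meaningful numbers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper (not part of coreclr): returns the average time in
 * nanoseconds for one CRT memcpy of `size` bytes, measured over
 * `iterations` calls. Returns -1.0 on invalid arguments or if allocation
 * fails. */
double time_memcpy_ns(size_t size, size_t iterations)
{
    if (size == 0 || iterations == 0) {
        return -1.0;
    }

    unsigned char *src = malloc(size);
    unsigned char *dst = malloc(size);
    if (src == NULL || dst == NULL) {
        free(src);
        free(dst);
        return -1.0;
    }
    memset(src, 0xA5, size);

    /* Reading a byte back through a volatile sink makes it harder for the
     * compiler to discard the copies. */
    volatile unsigned char sink = 0;

    clock_t start = clock();
    for (size_t i = 0; i < iterations; i++) {
        memcpy(dst, src, size);
        sink ^= dst[i % size];
    }
    clock_t end = clock();
    (void)sink;

    /* Sanity check: the destination must match the source after the loop. */
    assert(memcmp(dst, src, size) == 0);

    free(src);
    free(dst);

    double total_ns = (double)(end - start) / (double)CLOCKS_PER_SEC * 1e9;
    return total_ns / (double)iterations;
}
```

Calling `time_memcpy_ns` with sizes such as 256, 512, 520, and 1000 on both Intel and AMD hardware would show whether a given copy routine has a cliff near 512 bytes; the sizes here simply mirror the ones mentioned above.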
Having a custom memcpy routine adds maintenance burden, can be error-prone, is generally not as widely tested, and does not get many of the optimizations that the CRT implementations receive. This, coupled with the overall minor improvements for small arrays on Intel processors and the 2x regression for arrays over 512 bytes on AMD processors, is why the custom memcpy routine is being removed.
It would be beneficial for any future improvements to memcpy to be made directly against glibc and the CRT instead.
@MeiChin-Tsai this one is your call 😄 I would have a preference to take it, if the risk is low enough, because it's a large (2x) regression on a significant proportion of the CPUs out there, in a fairly commonly used API, that we missed due to insufficient coverage of this hardware. However, I'd maybe not take it in ask mode.
@jkotas how do you feel about the risk? @tannergooding is this close to a git revert (which would be lower risk)?
> @tannergooding is this close to a git revert (which would be lower risk)?
No, it isn't a revert. A revert would have kept us with a custom JIT_MemCpy and JIT_MemSet implementation for x64 Windows. Instead, this just makes x64 Windows consistent with the other platform/architecture combinations and just forwards to the CRT implementation.
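To illustrate what "just forwards to the CRT implementation" means, here is a minimal sketch in C. The names `jit_memcpy_sketch`, `jit_memset_sketch`, and `block_copy_sketch` are hypothetical stand-ins, not the actual coreclr declarations, and the real helpers may use different signatures and calling conventions. The use of `memmove` for the block copy is an assumption that overlapping source and destination ranges need to be handled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a JIT memcpy helper: no custom copy loop,
 * the call is handed straight to the CRT. */
void *jit_memcpy_sketch(void *dest, const void *src, size_t count)
{
    assert((dest != NULL && src != NULL) || count == 0);
    if (count == 0) {
        return dest; /* avoid passing NULL to memcpy */
    }
    return memcpy(dest, src, count);
}

/* Hypothetical stand-in for a JIT memset helper, forwarding the same way. */
void *jit_memset_sketch(void *dest, int value, size_t count)
{
    assert(dest != NULL || count == 0);
    if (count == 0) {
        return dest; /* avoid passing NULL to memset */
    }
    return memset(dest, value, count);
}

/* Hypothetical block copy between two byte buffers: validate the ranges,
 * then forward to memmove so overlapping ranges are copied correctly
 * (assumption: overlap must be supported). Returns 0 on success and -1
 * if either range falls outside its buffer. */
int block_copy_sketch(const unsigned char *src, size_t src_len, size_t src_offset,
                      unsigned char *dst, size_t dst_len, size_t dst_offset,
                      size_t count)
{
    if (src_offset > src_len || count > src_len - src_offset) {
        return -1;
    }
    if (dst_offset > dst_len || count > dst_len - dst_offset) {
        return -1;
    }
    if (count > 0) {
        memmove(dst + dst_offset, src + src_offset, count);
    }
    return 0;
}
```

With this shape, any tuning for specific CPUs lives in the CRT's `memcpy`, `memset`, and `memmove` rather than in a routine the runtime has to maintain itself.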