-
Notifications
You must be signed in to change notification settings - Fork 862
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize GetExtension execution time on NET8 #3359
Changes from all commits
dc997f8
3027a62
141f6fc
ace9e16
8894688
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -195,8 +195,31 @@ private static string DetermineValidPathCharacters() | |
/// <returns></returns> | ||
public static string GetExtension(string path) | ||
{ | ||
if (path == null) | ||
if (path is null) | ||
return null; | ||
|
||
#if NET8_0_OR_GREATER | ||
// LastIndexOf and LastIndexOfAny is vectorized on .NET8+ and is | ||
// significantly faster for cases where 'path' does not end with a short file | ||
// extension, such as GUIDs | ||
ReadOnlySpan<char> pathSpan = path.AsSpan(); | ||
int extensionIndex = pathSpan.LastIndexOf('.'); | ||
if (extensionIndex == -1) | ||
{ | ||
return string.Empty; | ||
} | ||
|
||
int directoryIndex = pathSpan.LastIndexOfAny('/', '\\', ':'); | ||
|
||
// extension separator is found and exists before path separator or path separator doesn't exist | ||
// AND it's not the last one in the string | ||
if (directoryIndex < extensionIndex && extensionIndex < pathSpan.Length - 1) | ||
{ | ||
return pathSpan.Slice(extensionIndex).ToString(); | ||
} | ||
|
||
return string.Empty; | ||
#else | ||
int length = path.Length; | ||
int index = length; | ||
|
||
|
@@ -213,15 +236,16 @@ public static string GetExtension(string path) | |
else if (IsPathSeparator(ch)) | ||
break; | ||
} | ||
|
||
return string.Empty; | ||
} | ||
|
||
// Checks if the character is one \ / : | ||
private static bool IsPathSeparator(char ch) | ||
{ | ||
return (ch == '\\' || | ||
ch == '/' || | ||
ch == ':'); | ||
bool IsPathSeparator(char ch) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Should this still be static? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Good catch. Unfortunately, at the time I had looked at it this was limited by the |
||
{ | ||
return (ch == '\\' || | ||
ch == '/' || | ||
ch == ':'); | ||
} | ||
#endif | ||
} | ||
|
||
/* | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not simply replace this with https://learn.microsoft.com/en-us/dotnet/api/system.io.path.getextension?view=net-8.0?
It seems it covers all the edge cases already and also has the directory and alt directory separator char. That version is already heavily optimized, and then the complexity would go away of having to manage TFM specific code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@danielmarbach I actually gave that a try :)
Benchmarking yielded numbers similar to the existing AWSSDK code, which as it turns out use very similar algorithms. It has indeed been heavily optimized for the filesystem "happy path" where a short file extension exists.
There is one notable difference though in that the current dotnet runtime is more permissive than the current AWSSDK implementation. The AWSSDK will currently return an empty string if any of
/
,\
or:
are present, regardless of underlying runtime. The current built-in Windows dotnet runtime, however, will no longer detect volume separator while the Unix runtime will not detect a volume separator or a backslash - only forward slash. I believe the the opposite may've been true for the full framework side, where the built-in one was too restrictive in that it may throw for various invalid filesystem characters like*
,|
etc. For the NET8 side, I wasn't sure if the change in behavior for keys likefoo.{tenantId}:{subId}
was worth it.@normj Reaching out as I see you've been requested for review here. This does introduce added complexity and I recognize that optimizing for GUID (or similar) keys may be a less-common use case than I may imagine. I'm happy to go in any direction you like here. Note that if optimizing for GUIDs were of interest then this could be taken a bit further by adding this after the first
LastIndexOf()
call effectively bring the GUID case down to around 2ns on my machine at the expense of another 0.5ns for the cases where a file extension does exist.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you saying using the extension method that @danielmarbach shows no significant performance boost over what is shipped in the SDK today?
The code change you have for .NET 8+ is pretty simple and straight forward so I'm good with taking the change assuming we are saying this approach is still faster then the extension method coming from the framework. I would ask that you add a comment in the
#if
explaining how newer versions of .NET have an optimized version ofLastIndexOf
. Basically give a hint for feature reads of the code wondering why we have 2 separate implementations.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apologies, my last comment was a little verbose as I'd been rushing to send it before the bottom of the hour here.
@normj
This is correct. I've re-benchmarked to include the built-in .NET version as well as including the "early return" in this version. I've updated the original PR description to include the benchmarking code and latest results. I'll push the changes shortly, after resolving some conflicts.