-
Notifications
You must be signed in to change notification settings - Fork 863
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize GetExtension execution time on NET8 #3359
Conversation
@@ -199,8 +199,22 @@ private static string DetermineValidPathCharacters() | |||
/// <returns></returns> | |||
public static string GetExtension(string path) | |||
{ | |||
if (path == null) | |||
if (path is null) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not simply replace this with https://learn.microsoft.com/en-us/dotnet/api/system.io.path.getextension?view=net-8.0?
It seems it covers all the edge cases already and also has the directory and alt directory separator char. That version is already heavily optimized, and then the complexity would go away of having to manage TFM specific code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@danielmarbach I actually gave that a try :)
Benchmarking yielded numbers similar to the existing AWSSDK code, which as it turns out use very similar algorithms. It has indeed been heavily optimized for the filesystem "happy path" where a short file extension exists.
There is one notable difference though in that the current dotnet runtime is more permissive than the current AWSSDK implementation. The AWSSDK will currently return an empty string if any of /
, \
or :
are present, regardless of underlying runtime. The current built-in Windows dotnet runtime, however, will no longer detect volume separator while the Unix runtime will not detect a volume separator or a backslash - only forward slash. I believe the the opposite may've been true for the full framework side, where the built-in one was too restrictive in that it may throw for various invalid filesystem characters like *
, |
etc. For the NET8 side, I wasn't sure if the change in behavior for keys like foo.{tenantId}:{subId}
was worth it.
@normj Reaching out as I see you've been requested for review here. This does introduce added complexity and I recognize that optimizing for GUID (or similar) keys may be a less-common use case than I may imagine. I'm happy to go in any direction you like here. Note that if optimizing for GUIDs were of interest then this could be taken a bit further by adding this after the first LastIndexOf()
call effectively bring the GUID case down to around 2ns on my machine at the expense of another 0.5ns for the cases where a file extension does exist.
int extensionIndex = path.AsSpan().LastIndexOf('.');
+if (extensionIndex == -1) {
+ return string.Empty;
+}
int directoryIndex = path.AsSpan().LastIndexOfAny('/', '\\', ':');
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you saying using the extension method that @danielmarbach shows no significant performance boost over what is shipped in the SDK today?
The code change you have for .NET 8+ is pretty simple and straight forward so I'm good with taking the change assuming we are saying this approach is still faster then the extension method coming from the framework. I would ask that you add a comment in the #if
explaining how newer versions of .NET have an optimized version of LastIndexOf
. Basically give a hint for feature reads of the code wondering why we have 2 separate implementations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apologies, my last comment was a little verbose as I'd been rushing to send it before the bottom of the hour here.
Are you saying using the extension method that @danielmarbach shows no significant performance boost over what is shipped in the SDK today?
@normj
This is correct. I've re-benchmarked to include the built-in .NET version as well as including the "early return" in this version. I've updated the original PR description to include the benchmarking code and latest results. I'll push the changes shortly, after resolving some conflicts.
18a3b3f
to
cbeb741
Compare
// LastIndexOf and LastIndexOfAny is vectorized on .NET8+ and is | ||
// signifigantly faster for cases where 'path' does not end with a short file | ||
// extension, such as GUIDs | ||
int extensionIndex = path.AsSpan().LastIndexOf('.'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be cleaner to assign the span to a local and then use it through the method?
// AND it's not the last one in the string | ||
if (directoryIndex < extensionIndex && extensionIndex < path.Length - 1) | ||
{ | ||
return path.Substring(extensionIndex); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we had the span assigned to a local like suggested above we could slice the span here directly
return (ch == '\\' || | ||
ch == '/' || | ||
ch == ':'); | ||
bool IsPathSeparator(char ch) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this still be static?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. Unfortunately, at the time I had looked at it this was limited by the <LangVersion>
on .NET Framework (#3316). Static local functions required version 9.
@stevenaw I haven't looked in details at the PR but I wonder why the conditional compile is required. I was able to make similar optimizations in #3365 without conditionals MemoryExtensions etc. should be available. You might not get the same speed as on NET8 but that doesn't really matter especially because you could then also have just one code path and one test case for both TFMs |
Thanks for the feedback @normj I've left recently on vacation but can take a look at all these when I return middle next week. |
7d220da
to
8894688
Compare
Thanks for your patience on this one. I think I've incorporated or replied to all the feedback here and it is ready for another review |
This is a good point. The main drawback would be how long it takes when processing a file path with an extension (ex: @normj I'm happy to push another commit with this change if you'd prefer it |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine leaving the old targets using the same code path not giving users any surprises and focus on the performance improvements being in .NET 8+.
Description
Newer versions of .NET have improved the performance of the IndexOf() methods such that it's often quicker for many cases to do two
LastIndexOf[Any]()
calls rather than one manual loop. This PR uses this to provide a 4x speedup (on my hardware) over the "worst-case" time of the old algorithm where no delimeters can be found, such as when processing a GUID in string format. The "best case" for the original implementation where a value ends in a short 3-letter extension remains comparable, +/- 0.5ns.Despite the APIs being available as far back as netcore2.1, I have chosen to target this hotpath against v4 and NET8+ as the BCL speed-ups to make this worthwhile are only available on later runtimes.
Motivation and Context
This function appears used by S3 client. Optimizing handling of non-file names such as GUIDS as keys will help when used against S3 buckets using GUIDs or similar keys.
Testing
The changes are essentially .NET8-only though I added equivalent unit tests for both the NetStandard and NetFramework projects to demonstrate matching functionality.
Screenshots (if appropriate)
Benchmarks
Benchmark Code
Types of changes
Checklist
License