Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize GetExtension execution time on NET8 #3359

Merged
merged 5 commits into from
Jul 26, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
40 changes: 32 additions & 8 deletions sdk/src/Core/Amazon.Util/AWSSDKUtils.cs
Original file line number Diff line number Diff line change
Expand Up @@ -195,8 +195,31 @@ private static string DetermineValidPathCharacters()
/// <returns></returns>
public static string GetExtension(string path)
{
if (path == null)
if (path is null)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not simply replace this with https://learn.microsoft.com/en-us/dotnet/api/system.io.path.getextension?view=net-8.0?

It seems it covers all the edge cases already and also has the directory and alt directory separator char. That version is already heavily optimized, and then the complexity would go away of having to manage TFM specific code.

Copy link
Contributor Author

@stevenaw stevenaw Jun 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@danielmarbach I actually gave that a try :)

Benchmarking yielded numbers similar to the existing AWSSDK code, which as it turns out use very similar algorithms. It has indeed been heavily optimized for the filesystem "happy path" where a short file extension exists.

There is one notable difference though in that the current dotnet runtime is more permissive than the current AWSSDK implementation. The AWSSDK will currently return an empty string if any of /, \ or : are present, regardless of underlying runtime. The current built-in Windows dotnet runtime, however, will no longer detect volume separator while the Unix runtime will not detect a volume separator or a backslash - only forward slash. I believe the the opposite may've been true for the full framework side, where the built-in one was too restrictive in that it may throw for various invalid filesystem characters like *, | etc. For the NET8 side, I wasn't sure if the change in behavior for keys like foo.{tenantId}:{subId} was worth it.

@normj Reaching out as I see you've been requested for review here. This does introduce added complexity and I recognize that optimizing for GUID (or similar) keys may be a less-common use case than I may imagine. I'm happy to go in any direction you like here. Note that if optimizing for GUIDs were of interest then this could be taken a bit further by adding this after the first LastIndexOf() call effectively bring the GUID case down to around 2ns on my machine at the expense of another 0.5ns for the cases where a file extension does exist.

int extensionIndex = path.AsSpan().LastIndexOf('.');
+if (extensionIndex == -1) {
+  return string.Empty;
+}

int directoryIndex = path.AsSpan().LastIndexOfAny('/', '\\', ':');

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you saying using the extension method that @danielmarbach shows no significant performance boost over what is shipped in the SDK today?

The code change you have for .NET 8+ is pretty simple and straight forward so I'm good with taking the change assuming we are saying this approach is still faster then the extension method coming from the framework. I would ask that you add a comment in the #if explaining how newer versions of .NET have an optimized version of LastIndexOf. Basically give a hint for feature reads of the code wondering why we have 2 separate implementations.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies, my last comment was a little verbose as I'd been rushing to send it before the bottom of the hour here.

Are you saying using the extension method that @danielmarbach shows no significant performance boost over what is shipped in the SDK today?

@normj
This is correct. I've re-benchmarked to include the built-in .NET version as well as including the "early return" in this version. I've updated the original PR description to include the benchmarking code and latest results. I'll push the changes shortly, after resolving some conflicts.

return null;

#if NET8_0_OR_GREATER
// LastIndexOf and LastIndexOfAny is vectorized on .NET8+ and is
// significantly faster for cases where 'path' does not end with a short file
// extension, such as GUIDs
ReadOnlySpan<char> pathSpan = path.AsSpan();
int extensionIndex = pathSpan.LastIndexOf('.');
if (extensionIndex == -1)
{
return string.Empty;
}

int directoryIndex = pathSpan.LastIndexOfAny('/', '\\', ':');

// extension separator is found and exists before path separator or path separator doesn't exist
// AND it's not the last one in the string
if (directoryIndex < extensionIndex && extensionIndex < pathSpan.Length - 1)
{
return pathSpan.Slice(extensionIndex).ToString();
}

return string.Empty;
#else
int length = path.Length;
int index = length;

Expand All @@ -213,15 +236,16 @@ public static string GetExtension(string path)
else if (IsPathSeparator(ch))
break;
}

return string.Empty;
}

// Checks if the character is one \ / :
private static bool IsPathSeparator(char ch)
{
return (ch == '\\' ||
ch == '/' ||
ch == ':');
bool IsPathSeparator(char ch)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this still be static?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. Unfortunately, at the time I had looked at it this was limited by the <LangVersion> on .NET Framework (#3316). Static local functions required version 9.

{
return (ch == '\\' ||
ch == '/' ||
ch == ':');
}
#endif
}

/*
Expand Down
17 changes: 17 additions & 0 deletions sdk/test/NetStandard/UnitTests/Core/AWSSDKUtilsTests.cs
Original file line number Diff line number Diff line change
Expand Up @@ -84,5 +84,22 @@ public void UrlEncodeWithPath(string input, string expected)

Assert.Equal(expected, encoded);
}

[Theory]
[InlineData(null, null)]
[InlineData("no-delimiters-at-all", "")]
[InlineData("delimiter-end-of-string.", "")]
[InlineData("relative-path/no-file-extension", "")]
[InlineData("relative-path\\no-file-extension", "")]
[InlineData("relative-path:no-file-extension", "")]
[InlineData("simple-file.pdf", ".pdf")]
[InlineData("relative-path/with-file-extension.pdf", ".pdf")]
[InlineData("relative-path.with-dot/with-file-extension.pdf", ".pdf")]
public void GetExtension(string input, string expected)
{
var actual = AWSSDKUtils.GetExtension(input);

Assert.Equal(expected, actual);
}
}
}
23 changes: 20 additions & 3 deletions sdk/test/UnitTests/Custom/Util/AWSSDKUtilsTests.cs
Original file line number Diff line number Diff line change
Expand Up @@ -12,13 +12,11 @@
* express or implied. See the License for the specific language governing
* permissions and limitations under the License.
*/
using Amazon.Util;
using Microsoft.VisualStudio.TestTools.UnitTesting;
using System;
using System.IO;
using Amazon.Util;
using System.Reflection;
using Moq;
using Amazon.Util.Internal;
using System.Text;

namespace AWSSDK.UnitTests
Expand Down Expand Up @@ -178,5 +176,24 @@ public void ToHex(string input, bool lowercase, string expectedResult)

Assert.AreEqual(expectedResult, hexString);
}

[TestCategory("UnitTest")]
[TestCategory("Util")]
[DataTestMethod]
[DataRow(null, null)]
[DataRow("no-delimiters-at-all", "")]
[DataRow("delimiter-end-of-string.", "")]
[DataRow("relative-path/no-file-extension", "")]
[DataRow("relative-path\\no-file-extension", "")]
[DataRow("relative-path:no-file-extension", "")]
[DataRow("simple-file.pdf", ".pdf")]
[DataRow("relative-path/with-file-extension.pdf", ".pdf")]
[DataRow("relative-path.with-dot/with-file-extension.pdf", ".pdf")]
public void GetExtension(string input, string expected)
{
var actual = AWSSDKUtils.GetExtension(input);

Assert.AreEqual(expected, actual);
}
}
}