[API Proposal]: CRC-32C non-cryptographic hashing API #85222

Open
brantburnett opened this issue Apr 23, 2023 · 11 comments
Labels
api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.IO.Hashing
@brantburnett
Contributor

Background and motivation

#24328 and subsequent changes have added support for CRC-32, CRC-64, XxHash32, XxHash64, and XxHash128 non-cryptographic hash algorithms to the System.IO.Hashing library. The CRC-32 implementation follows the ITU-T V.42 and IEEE 802.3 specifications.

However, another common CRC-32 implementation is CRC-32C (Castagnoli), which uses a different polynomial. This algorithm is used by iSCSI, SCTP, G.hn payload, Btrfs, ext4, Ceph, and Snappy. It is also supported by hardware intrinsics on both ARM and Intel processors (with support recently added to BitOperations to simplify use of the intrinsics, #61558).
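
For reference, the difference from CRC-32 can be sketched with a plain table-driven software version. This is only an illustration, not the proposed implementation: the `Crc32CReference` class name is hypothetical, and the code makes no attempt at vectorization or intrinsics. It uses the reflected form of the Castagnoli polynomial 0x1EDC6F41.

```csharp
using System;
using System.Text;

// Illustrative software CRC-32C; the class name is hypothetical and this is
// not the proposed System.IO.Hashing implementation.
public static class Crc32CReference
{
    // Bit-reflected form of the Castagnoli polynomial 0x1EDC6F41.
    private const uint ReflectedPolynomial = 0x82F63B78;

    private static readonly uint[] Table = BuildTable();

    // Precompute the CRC of every possible byte value.
    private static uint[] BuildTable()
    {
        var table = new uint[256];
        for (uint i = 0; i < 256; i++)
        {
            uint value = i;
            for (int bit = 0; bit < 8; bit++)
            {
                value = (value & 1) != 0 ? (value >> 1) ^ ReflectedPolynomial : value >> 1;
            }
            table[i] = value;
        }
        return table;
    }

    public static uint Compute(ReadOnlySpan<byte> source)
    {
        uint crc = 0xFFFFFFFF; // initial value
        foreach (byte b in source)
        {
            crc = Table[(crc ^ b) & 0xFF] ^ (crc >> 8);
        }
        return ~crc; // final XOR with 0xFFFFFFFF
    }
}
```

With the standard check input, `Crc32CReference.Compute(Encoding.ASCII.GetBytes("123456789"))` yields 0xE3069283, whereas CRC-32 (IEEE 802.3) yields 0xCBF43926 for the same bytes.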

Adding high-performance built-in support for this variant could be beneficial to library and application authors. For example, this case in the Snappier implementation of Snappy compression could benefit from a higher performance implementation being generally available.

The implementation of the algorithm should make use of vectorization and intrinsics (when available) to gain the best possible performance.

API Proposal

namespace System.IO.Hashing;

public sealed class Crc32C : NonCryptographicHashAlgorithm
{
    public Crc32C() : base(32) { }
    public override void Append(ReadOnlySpan<byte> source) { }
    public override void Reset() { }
    protected override void GetCurrentHashCore(Span<byte> destination) { }
    protected override void GetHashAndResetCore(Span<byte> destination) { }
    [CLSCompliant(false)] public uint GetCurrentHashAsUInt32() => throw null;

    public static byte[] Hash(byte[] source) => throw null;
    public static byte[] Hash(ReadOnlySpan<byte> source) => throw null;
    public static bool TryHash(ReadOnlySpan<byte> source, Span<byte> destination, out int bytesWritten) => throw null;
    public static int Hash(ReadOnlySpan<byte> source, Span<byte> destination) => throw null;
}

API Usage

// Compute a CRC-32C
var crc32c = new Crc32C();
crc32c.Append(sourceBytes);
Console.WriteLine(crc32c.GetCurrentHashAsUInt32());

// Compute and output to span
Span<byte> dest = stackalloc byte[4];
var bytesWritten = Crc32C.Hash(sourceBytes, dest);

Alternative Designs

An alternative design is to unseal Crc32 and inherit from it to gain some code reuse. However, this could reduce the performance of Crc32 and was previously dismissed in another discussion: #24328 (comment)

Risks

The only risk I see is adding more code to be maintained.

@brantburnett brantburnett added the api-suggestion Early API idea and discussion, it is NOT ready for implementation label Apr 23, 2023
@ghost ghost added the untriaged New issue has not been triaged by the area owner label Apr 23, 2023
@ghost

ghost commented Apr 23, 2023

Tagging subscribers to this area: @dotnet/area-system-security, @bartonjs, @vcsjones
See info in area-owners.md if you want to be subscribed.

@ghost

ghost commented Apr 23, 2023

Tagging subscribers to this area: @dotnet/area-system-io-hashing, @bartonjs, @vcsjones
See info in area-owners.md if you want to be subscribed.

@adamsitnik
Member

@stephentoub @bartonjs @vcsjones what is your opinion on this? It LGTM, but I don't know what our strategy is for adding new hash algorithm APIs.

@vcsjones
Member

A possible alternative solution is to consider a more general solution of allowing custom polynomials to be used. This is likely harder to use, but it might be worth considering some variation of that instead of introducing a new type every time there is a good case for a new CRC polynomial. #78063 has proposed such an idea.

@MichalPetryka
Contributor

A possible alternative solution is to consider a more general solution of allowing custom polynomials to be used.

CRC32C is special cased and hardware accelerated on both XArch and Arm64.

@brantburnett
Contributor Author

A possible alternative solution is to consider a more general solution of allowing custom polynomials to be used. This is likely harder to use, but it might be worth considering some variation of that instead of introducing a new type every time there is a good case for a new CRC polynomial. #78063 has proposed such an idea.

Truly implementing custom polynomials is also much more complex than just a different polynomial value. Different CRC algorithms also involve other details such as input bit reflection, output bit reflection, initial values, and XOR values for the final output, even for the same size. You can see a lot of the different variants here: http://www.sunshine2k.de/coding/javascript/crc/crc_js.html
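
To make those extra details concrete, here is a rough bit-at-a-time sketch of a parameterized 32-bit CRC. The `Crc32Parameters` and `Crc32Model` names are hypothetical and exist only for illustration; they are not an existing or proposed .NET API, and the code favors clarity over speed. The four parameter sets are well-known 32-bit variants.

```csharp
using System;

// Hypothetical parameter set for illustration; not an existing or proposed .NET type.
public readonly record struct Crc32Parameters(
    uint Polynomial, uint InitialValue, bool ReflectInput, bool ReflectOutput, uint FinalXor);

public static class Crc32Model
{
    // CRC-32 and CRC-32C differ only in the polynomial.
    public static readonly Crc32Parameters Crc32 = new(0x04C11DB7, 0xFFFFFFFF, true, true, 0xFFFFFFFF);
    public static readonly Crc32Parameters Crc32C = new(0x1EDC6F41, 0xFFFFFFFF, true, true, 0xFFFFFFFF);

    // Same polynomial as CRC-32, but different reflection and final XOR settings.
    public static readonly Crc32Parameters Crc32Bzip2 = new(0x04C11DB7, 0xFFFFFFFF, false, false, 0xFFFFFFFF);
    public static readonly Crc32Parameters Crc32Mpeg2 = new(0x04C11DB7, 0xFFFFFFFF, false, false, 0x00000000);

    public static uint Compute(Crc32Parameters p, ReadOnlySpan<byte> source)
    {
        uint crc = p.InitialValue;
        foreach (byte b in source)
        {
            // Optionally reverse the bit order of each input byte.
            uint input = p.ReflectInput ? Reflect(b, 8) : b;
            crc ^= input << 24;
            for (int bit = 0; bit < 8; bit++)
            {
                // Most-significant-bit-first polynomial division.
                crc = (crc & 0x80000000) != 0 ? (crc << 1) ^ p.Polynomial : crc << 1;
            }
        }
        // Optionally reverse the bit order of the whole register.
        if (p.ReflectOutput) crc = Reflect(crc, 32);
        return crc ^ p.FinalXor;
    }

    // Reverses the lowest `width` bits of `value`.
    private static uint Reflect(uint value, int width)
    {
        uint result = 0;
        for (int i = 0; i < width; i++)
        {
            result = (result << 1) | (value & 1);
            value >>= 1;
        }
        return result;
    }
}
```

For the ASCII bytes of "123456789", this gives 0xCBF43926 (CRC-32), 0xE3069283 (CRC-32C), 0xFC891918 (CRC-32/BZIP2), and 0x0376E6E7 (CRC-32/MPEG-2). A general API would have to expose all five knobs, and a fast implementation would need to specialize on them, which is part of why a dedicated type for the hardware-accelerated CRC-32C case seems simpler.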

@brantburnett
Contributor Author

@adamsitnik Any further thoughts on this? I'd love to get it in before .NET 8 gets frozen for release candidates.

@jozkee jozkee added this to the 9.0.0 milestone Jul 28, 2023
@ghost ghost removed the untriaged New issue has not been triaged by the area owner label Jul 28, 2023
@Lanayx

Lanayx commented Sep 20, 2023

Apache Pulsar also uses CRC32C, so this will be beneficial for its client.

@EgorBo
Member

EgorBo commented Sep 27, 2023

@stephentoub
Member

stephentoub commented Sep 28, 2023

SIMDified version in Chromium https://chromium.googlesource.com/chromium/src/+/HEAD/third_party/zlib/crc32_simd.c

.NET 8's implementation of crc32 (this issue is about crc32c) is also vectorized:
https://github.com/dotnet/runtime/blob/main/src/libraries/System.IO.Hashing/src/System/IO/Hashing/Crc32.Vectorized.cs

.NET 8 also includes BitOperations.Crc32C, which uses SSE4.2's _mm_crc32_u8/16/32/64 and Arm's __crc32cb/h/w/d when available.
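
As a rough sketch of how a caller can already build a full CRC-32C on top of that primitive (assuming .NET 8 or later), the loop below feeds 8 bytes at a time and then finishes byte by byte. The `Crc32CIntrinsicSketch` name is hypothetical, and this simple loop is not the multi-stream or vectorized approach a tuned implementation would likely use.

```csharp
using System;
using System.Buffers.Binary;
using System.Numerics;

// Sketch only; the class name is hypothetical. Requires .NET 8+ for BitOperations.Crc32C.
public static class Crc32CIntrinsicSketch
{
    public static uint Compute(ReadOnlySpan<byte> source)
    {
        uint crc = 0xFFFFFFFF; // CRC-32C initial value

        // Consume 8 bytes per step; the value is read little-endian so bytes
        // are folded into the CRC in their original order.
        while (source.Length >= sizeof(ulong))
        {
            crc = BitOperations.Crc32C(crc, BinaryPrimitives.ReadUInt64LittleEndian(source));
            source = source.Slice(sizeof(ulong));
        }

        // Handle the remaining 0 to 7 bytes one at a time.
        foreach (byte b in source)
        {
            crc = BitOperations.Crc32C(crc, b);
        }

        return ~crc; // final XOR with 0xFFFFFFFF
    }
}
```

Note that BitOperations.Crc32C only accumulates; the initial value and final inversion are the caller's responsibility, which is one more detail a built-in Crc32C type would take care of.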

@PaulusParssinen
Contributor

We can go even faster: https://github.com/corsix/fast-crc32
