Expose general purpose Crc32 APIs #2036
Comments
Is this using a specific polynomial? If so, which one? And can there be a way for the user to choose? |
Not if we want these accelerated by the underlying hardware.
|
So if someone is dealing with zip or something using the old polynomial they'll still need to make their own. That's a bit of a shame. Given that at least half of the time you want to calculate a crc you will be validating incoming data it might make sense to have a |
If we were to expose a CRC32 API over a buffer rather than an integral type I'd probably push for it not to be on the |
If |
Right, this is namely meant to be a "cross-platform" version of the hardware intrinsics (much like the other functions on
I agree that having a more general purpose class that allows polynomial customization and trivial operation over spans of data would be useful, but that isn't the immediate goal of this API. |
If I understand right, given the caller can't choose the polynomial, and we will use the fixed CPU specific algorithm if available, this API would only be useful for callers that are (a) not interoperating with some other format that has a chosen polynomial (such as zlib), and (b) do not need to exchange the value across architectures. Is that right? |
We likely could provide both and we might even be able to expose the API in such a way that the JIT specially handles the method if the polynomial given is constant. But as proposed, it would be limited to the given polynomials. |
I guess I'm wondering whether there's examples when this API with the "undefined" polynomial would be useful. But maybe that's not that different to asking when the intrinsics we already have would be useful. It would have to be when you're producing/consuming on the same machine. (Unless I misunderstand) |
namespace System.Numerics
{
    public static class BitOperations
    {
        public static uint Crc32(uint crc, byte data);
        public static uint Crc32(uint crc, ushort data);
        public static uint Crc32(uint crc, uint data);
        public static uint Crc32(uint crc, ulong data);
    }
} |
I watched the video and I still need clarification on my questions I think 😄
Also, it would be good to have an example of usage, especially since this API is expected to be used to checksum an array or stream. How does that look? |
The polynomial is The polynomial being used is for CRC32-Castagnoli, which is fairly popular/prevalent and adopted in several standards, but which is not universally used. |
In particular, wikipedia lists |
I see. That also seems to be the one used by the most popular package (CRC32C.NET) |
Any plans to add Crc8 and Crc16 support, as well? Used in protocol messages (https://sourceforge.net/p/bacnet/mailman/message/1259086/). |
Not as part of this work because I don't believe there is a hardware intrinsic that provides either of those. |
Could I give this a try? |
That optimization only kicks in for byte and sbyte arrays. You would need to store the table as a byte array. I do not think the precomputed table makes sense for this. Computing the table on the fly for the fallback case should be just fine. |
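As a rough illustration of computing the table on the fly, here is a minimal sketch. The class name `Crc32CTableFallback` and its members are hypothetical and are not taken from the runtime's implementation; the constant is the bit-reflected form of the CRC-32C (Castagnoli) polynomial.

```csharp
using System;

// Hypothetical fallback helper; NOT the runtime's actual implementation.
public static class Crc32CTableFallback
{
    // Bit-reflected form of the CRC-32C (Castagnoli) polynomial 0x1EDC6F41.
    private const uint ReflectedPolynomial = 0x82F63B78u;

    // Built once, when the type is first used, instead of being stored as precomputed data.
    private static readonly uint[] s_table = BuildTable();

    private static uint[] BuildTable()
    {
        var table = new uint[256];
        for (uint i = 0; i < 256; i++)
        {
            uint value = i;
            for (int bit = 0; bit < 8; bit++)
            {
                value = (value & 1) != 0 ? (value >> 1) ^ ReflectedPolynomial : value >> 1;
            }
            table[i] = value;
        }
        return table;
    }

    // One accumulation step: folds a single byte into the running value.
    public static uint Crc32C(uint crc, byte data)
        => s_table[(byte)(crc ^ data)] ^ (crc >> 8);

    // Conventional CRC-32C framing: all-ones seed and a final bitwise complement.
    public static uint Compute(ReadOnlySpan<byte> source)
    {
        uint crc = 0xFFFFFFFFu;
        foreach (byte b in source)
        {
            crc = Crc32C(crc, b);
        }
        return ~crc;
    }
}
```

The 256-entry table costs 2,048 inner-loop iterations to build, which is paid once per process in this sketch.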
The description of that PR says |
I've changed the table to be stored in a
BenchmarkDotNet=v0.12.1, OS=Windows 8.1 (6.3.9600.0)
Intel Core i3-4130 CPU 3.40GHz (Haswell), 1 CPU, 4 logical and 2 physical cores
Frequency=3319351 Hz, Resolution=301.2637 ns, Timer=TSC
.NET Core SDK=5.0.102
[Host] : .NET Core 5.0 (CoreCLR 5.0.220.61120, CoreFX 5.0.220.61120), X64 RyuJIT
.NET Core 5.0 : .NET Core 5.0.2 (CoreCLR 5.0.220.61120, CoreFX 5.0.220.61120), X64 RyuJIT
Job=.NET Core 5.0 Runtime=.NET Core 5.0
|
The You need to do |
Yeah, I noticed after reading the comments on that PR; it was my fault for not checking them. |
This came up as part of #24328, and we probably want to use
namespace System.Numerics
{
    public static class BitOperations
    {
        public static uint Crc32C(uint crc, byte data);
        public static uint Crc32C(uint crc, ushort data);
        public static uint Crc32C(uint crc, uint data);
        public static uint Crc32C(uint crc, ulong data);
    }
}
CC. @bartonjs, @terrajobst, @GrabYourPitchforks could we confirm this is desired and that the above is the desired implementation shape? |
Makes sense to us (@dotnet/fxdc) |
Could you also expose a method for Crc32 computation on buffers? I don't think .NET has a public one. For example, simply making System.IO.Compression.Crc32Helper public would be nice. |
@wiz0u have you seen #24328? This includes general purpose Crc32 APIs and will be in .NET 6 (under System.IO.Hashing: https://source.dot.net/#System.IO.Hashing/System/IO/Hashing/Crc32.cs,6471de2466bb4aa7). |
ah great! thanks! |
I would suggest adding another method to the proposal with the nuint parameter type. |
I wouldn't find that necessarily useful, the point of the overloads is to hash multiple bytes at once so it usually involves reading byte array values as different types, which gets a bit unclean when the type we read isn't constant size. |
The type is a constant size for a given run (32-bits on 32-bit machines and 64-bits on 64-bit machines), as is the result (32-bits). It only makes a difference on how efficiently you can process the bits for large inputs ( |
@tannergooding I can't tell if you're arguing for, against, or just providing data 😄. public static uint Crc32C(uint crc, nuint data); seems like a reasonable addition on the surface of things, but won't it produce different answers on 32-bit and 64-bit executions for the same 32-bit compatible value, since a 64-bit run sees 4 extra bytes in the payload? I'm probably leaning toward not having it and making the caller be clear if they always want 64-bit semantics or if they wanted platform native, on the assumption that |
It will produce different values. |
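To make the difference concrete, here is a small sketch using software stand-ins for the 32-bit and 64-bit overloads. The class `NativeWidthDemo` is hypothetical, and the stand-ins assume the data is consumed lowest byte first, as the x86 `crc32` instruction does. A platform-native `nuint` overload would behave like the `uint` version in a 32-bit process and like the `ulong` version in a 64-bit one.

```csharp
using System;

// Hypothetical software stand-ins for the proposed overloads (assumption:
// bytes are folded lowest byte first, matching the hardware instruction).
public static class NativeWidthDemo
{
    // Bit-reflected CRC-32C (Castagnoli) polynomial.
    private const uint ReflectedPolynomial = 0x82F63B78u;

    public static uint Crc32C(uint crc, byte data)
    {
        crc ^= data;
        for (int bit = 0; bit < 8; bit++)
        {
            crc = (crc & 1) != 0 ? (crc >> 1) ^ ReflectedPolynomial : crc >> 1;
        }
        return crc;
    }

    // Folds 4 bytes: what a nuint overload would do in a 32-bit process.
    public static uint Crc32C(uint crc, uint data)
    {
        for (int i = 0; i < 4; i++)
        {
            crc = Crc32C(crc, (byte)(data >> (8 * i)));
        }
        return crc;
    }

    // Folds 8 bytes: what a nuint overload would do in a 64-bit process.
    public static uint Crc32C(uint crc, ulong data)
    {
        for (int i = 0; i < 8; i++)
        {
            crc = Crc32C(crc, (byte)(data >> (8 * i)));
        }
        return crc;
    }
}
```

With these stand-ins, `NativeWidthDemo.Crc32C(0u, 1u)` and `NativeWidthDemo.Crc32C(0u, 1ul)` return different values even though the payload `1` fits in 32 bits, because the 64-bit call also folds in four trailing zero bytes. The 64-bit result equals the 32-bit result fed one more `uint` of zero.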
@bartonjs, these are the primitive building blocks and so it's up to the author of a more general Crc32 loop to correctly handle any trailing bytes. I would expect, for example, if you have 11 bytes that you'd do: 1 or 2 iterations using The only difference between all of these is how many bytes get operated on "per iteration". |
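A general loop built on these primitives might look like the sketch below. It is only one possible decomposition: for 11 bytes it takes one 8-byte step, then one 2-byte step, then one 1-byte step. The class `Crc32CChunked`, the `Fold` helper, and the `Accumulate` methods are hypothetical names, and the four `Crc32C` overloads are software stand-ins for the proposed API, assumed to consume bytes lowest byte first.

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical software stand-ins for the proposed primitives; a real
// implementation would map each overload to a hardware instruction.
public static class Crc32CChunked
{
    // Bit-reflected CRC-32C (Castagnoli) polynomial.
    private const uint ReflectedPolynomial = 0x82F63B78u;

    // Folds the low `byteCount` bytes of `data` into `crc`, lowest byte first.
    private static uint Fold(uint crc, ulong data, int byteCount)
    {
        for (int i = 0; i < byteCount; i++)
        {
            crc ^= (byte)(data >> (8 * i));
            for (int bit = 0; bit < 8; bit++)
            {
                crc = (crc & 1) != 0 ? (crc >> 1) ^ ReflectedPolynomial : crc >> 1;
            }
        }
        return crc;
    }

    public static uint Crc32C(uint crc, byte data) => Fold(crc, data, 1);
    public static uint Crc32C(uint crc, ushort data) => Fold(crc, data, 2);
    public static uint Crc32C(uint crc, uint data) => Fold(crc, data, 4);
    public static uint Crc32C(uint crc, ulong data) => Fold(crc, data, 8);

    // Widest steps first, then progressively narrower ones for the tail.
    public static uint Accumulate(uint crc, ReadOnlySpan<byte> source)
    {
        while (source.Length >= 8)
        {
            crc = Crc32C(crc, BinaryPrimitives.ReadUInt64LittleEndian(source));
            source = source.Slice(8);
        }
        if (source.Length >= 4)
        {
            crc = Crc32C(crc, BinaryPrimitives.ReadUInt32LittleEndian(source));
            source = source.Slice(4);
        }
        if (source.Length >= 2)
        {
            crc = Crc32C(crc, BinaryPrimitives.ReadUInt16LittleEndian(source));
            source = source.Slice(2);
        }
        if (source.Length >= 1)
        {
            crc = Crc32C(crc, source[0]);
        }
        return crc;
    }

    // Reference path: one byte per iteration.
    public static uint AccumulateByteWise(uint crc, ReadOnlySpan<byte> source)
    {
        foreach (byte b in source)
        {
            crc = Crc32C(crc, b);
        }
        return crc;
    }
}
```

Both paths return the same value for the same seed and input; only the number of bytes handled per iteration changes. Seeding with `0xFFFFFFFF` and complementing the result gives the conventional CRC-32C checksum.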
inconsistency between |
I don't think that's required. System.IO.Hashing is a one-shot API, not an additive one: you pass it the total data and it computes everything, including the final mask, which is normally used to tell the difference between a hash of 0 entries and an empty hash. The intrinsic is just the underlying platform instruction being exposed for use. |
In addition to the things @Wraith2 already pointed out, the System.IO.Hashing.Crc32 type is computing a different 32-bit CRC than this primitive. System.IO.Hashing.Crc32 computes the same CRC-32 used for Ethernet; this one is used for "CRC-32C", which is used in ext4. https://stackoverflow.com/questions/26429360/crc32-vs-crc32c CRC-32: https://reveng.sourceforge.io/crc-catalogue/17plus.htm#crc.cat.crc-32-iso-hdlc |
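The two CRCs differ only in their polynomial, which a short sketch can show. The class `CrcVariants` and its member names are hypothetical; the constants are the bit-reflected forms of the CRC-32 (ISO-HDLC) and CRC-32C (Castagnoli) polynomials, and both variants use an all-ones seed with a final complement.

```csharp
using System;

// Hypothetical illustration; not the System.IO.Hashing implementation.
public static class CrcVariants
{
    // Bit-reflected CRC-32 (ISO-HDLC, used by Ethernet and zip) polynomial.
    public const uint Crc32Reflected = 0xEDB88320u;

    // Bit-reflected CRC-32C (Castagnoli) polynomial.
    public const uint Crc32CReflected = 0x82F63B78u;

    // Bitwise CRC with an all-ones seed and a final complement.
    public static uint Compute(uint reflectedPolynomial, ReadOnlySpan<byte> source)
    {
        uint crc = 0xFFFFFFFFu;
        foreach (byte b in source)
        {
            crc ^= b;
            for (int bit = 0; bit < 8; bit++)
            {
                crc = (crc & 1) != 0 ? (crc >> 1) ^ reflectedPolynomial : crc >> 1;
            }
        }
        return ~crc;
    }
}
```

For the ASCII bytes of "123456789", `Compute(CrcVariants.Crc32Reflected, ...)` yields 0xCBF43926 and `Compute(CrcVariants.Crc32CReflected, ...)` yields 0xE3069283, which are the check values listed for the two algorithms in the CRC catalogue linked above.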
This issue is blocked by #62692. @tannergooding can you please remove the |
Rationale
Computing a Cyclic Redundancy Check (CRC) is a common algorithm used for error detection in things like networks or storage. Additionally, modern hardware provides instruction level support for computing these values. It would be beneficial if we exposed a set of general-purpose methods which allow iterative CRC32 computation.
Proposed API
Open Questions
The output is generally treated as unsigned; however, .NET considers unsigned integers (other than byte) as non-CLS compliant. It would be possible to expose both versions (those that take/return int and those that take/return uint). It would likewise be good on the input data to determine if they should be signed, unsigned, or both.