General purpose non-cryptographic hashing API for .NET #24328
TBH, I cloned it, tried to publish, got MSB4062, said to myself "I'm probably building with the wrong version of dotnet and/or msbuild, I'll figure it out this weekend" and then forgot all about it :( In terms of an API, the feedback I've gotten on SpookilySharp has mostly been people who liked it but didn't like the license (I changed it to MIT this weekend, but it had been EUPL for a long time). It's either too good or too bad to get contributors. From my experience with it and from the little feedback I got I would say:
|
A first few questions, not thought through in detail yet: What kinds of APIs are the focus here?
A few things that I assume:
Some stuff that I'm not totally sure about yet (back from when my time ran out on the Haschisch playground):
Also, what's next? |
I propose enabling an API for several non-cryptographic hash algorithms as well as being extensible to future/user-provided algorithms. "Non-cryptographic" includes Spooky or Marvin32, but not SHA256 (we shouldn't have any overlap between System.Numerics.Hashing and System.Security.Cryptography to avoid confusion). The first set of algorithms we build can be determined after deciding on an API (using data gathered in #19621). Reasons to use them include: Enabling multiple algorithms is important because they differ by size, speed, entropy, DoS resistance and general usage. Unlike HashCode, this would be for users who want to pick a particular algorithm for a particular purpose. Some potentially interesting characteristics could include:
Based on all of that, my straw-man API idea, based on IncrementalHash, is below:
namespace System.Numerics.Hashing
{
public abstract class HashBase
{
/// <summary>
/// Number of bits produced by the hash
/// </summary>
public abstract int HashSize { get; }
/// <summary>
/// Core hashing function filled in by implementers. Adds data to the internal
/// state of the hash.
/// </summary>
public abstract void AddIncrementalData(ReadOnlySpan<byte> data);
/// <summary>
/// Both returns the result of the hash of incrementally added data and resets
/// the internal state to be ready to hash again.
/// </summary>
public abstract bool TryGetHashAndReset(Span<byte> hash);
public byte[] GetHashAndReset();
/// <summary>
/// Used to indicate that the hash state is currently empty and it can start a new hash.
/// </summary>
protected abstract bool StateIsEmpty { get; }
// Various overloads for hashing a single chunk of data and returning the result.
// These are intended to cover most usages. These aren't compatible with incremental hashes,
// and would throw if !StateIsEmpty (and they'll reset when they're done)
public byte[] ComputeHash(ReadOnlySpan<byte> data);
public bool TryComputeHash(ReadOnlySpan<byte> data, Span<byte> hash);
public byte[] ComputeHash(Stream data);
public bool TryComputeHash(Stream data, Span<byte> hash);
public Task<byte[]> ComputeHashAsync(Stream data); // No Span overload for this because Span can't be used in async methods
}
public abstract class IntSizedHash : HashBase
{
int Get32BitHashAndReset();
int Compute32BitHash(ReadOnlySpan<byte> data);
int Compute32BitHash(Stream data);
Task<int> Compute32BitHashAsync(Stream data);
}
public abstract class LongSizedHash : HashBase
{
long Get64BitHashAndReset();
long Compute64BitHash(ReadOnlySpan<byte> data);
long Compute64BitHash(Stream data);
Task<long> Compute64BitHashAsync(Stream data);
}
}
An example derived class might look like:
namespace System.Numerics.Hashing
{
public sealed class Marvin32 : IntSizedHash
{
private long _seed;
private bool _isEmpty = true;
private long _state;
public Marvin32(long seed) { _seed = seed; }
public override int HashSize => 32;
protected override bool StateIsEmpty => _isEmpty;
public override void AddIncrementalData(ReadOnlySpan<byte> data)
{
_isEmpty = false;
// Incorporate data into _state
}
public override bool TryGetHashAndReset(Span<byte> hash)
{
// Finalize the hash based on _state and fill it into the span
// Reset to be reused
_state = _seed;
_isEmpty = true;
return true;
}
}
} |
@morganbr, I like your proposal, though I'd offer some suggestions. First, I don't see the value of the Second, I don't know if the int / long returning methods are terribly useful. The only real use case I see for them is to make something that can be used for Finally, should |
Maybe on the native side, but the managed side doesn't allow state cloning. And "getting the hash" requires a HashFinal-type operation, such as writing the number of bytes written; so once a hash has been "gotten" it has been tainted. That's why crypto
I agree. Byte -> (U)Int{32|64} requires a choice about endianness. "Machine" is wrong, because then the same hash algorithm produces a different value between x86 and arm32. Little is fine, except for when it isn't. So I'd leave the API as bytes; and if some algorithm says it produces an integer it can add the method itself. Other feedback I have, disjoint from responding to feedback feedback.
public abstract partial class HashBase
{
protected static bool ReferenceTryComputeHash<T>(ReadOnlySpan<byte> data, Span<byte> destination) where T : HashBase, new()
{
T hasher = new T();
if (destination.Length < hasher.HashSizeInBytes)
{
return false;
}
hasher.AppendData(data);
hasher.TryGetHashAndReset(destination);
// Should that be asserted? (Doesn't help 3rd party assemblies). A throw? Seems more expensive than justified.
return true;
}
// and/or.
protected static bool ReferenceTryComputeHash(HashBase hasher, ReadOnlySpan<byte> data, Span<byte> destination)
{
if (hasher == null) throw new ArgumentNullException(nameof(hasher));
if (!hasher.StateIsEmpty) throw new InvalidOperationException(SR.NoTaintedHashesPlease);
if (destination.Length < hasher.HashSizeInBytes)
{
return false;
}
hasher.AppendData(data);
hasher.TryGetHashAndReset(destination);
// Should that be asserted? (Doesn't help 3rd party assemblies). A throw? Seems more expensive than justified.
return true;
}
}
public partial class Marvin32 : HashBase
{
public static bool TryComputeHash(ReadOnlySpan<byte> data, Span<byte> destination)
{
return ReferenceTryComputeHash<Marvin32>(data, destination);
}
}
public partial class Crc32 : HashBase
{
public static bool TryComputeHash(ReadOnlySpan<byte> data, Span<byte> destination)
{
if (destination.Length < sizeof(int))
{
return false;
}
// implementation of CRC-32 over closed data, because our perf/analysis data says this is
// a popular routine, and that eliminating the Gen0 temporary HashBase-derived object is worthwhile.
return true;
}
}
And, finally, the name |
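To make the endianness point from this comment concrete, here is a minimal sketch of why a bytes-first API leaves the integer interpretation to the caller. The class and method names are hypothetical and not part of any proposal in this thread; only `BinaryPrimitives` is an existing .NET API.

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical helper: converts the first four bytes of a hash into an integer.
// The caller has to pick a byte order explicitly; neither is "the" right answer.
public static class HashBytesAsInteger
{
    // Same result on x86 and arm32 because the byte order is fixed, not machine-dependent.
    public static uint ReadLittleEndian(ReadOnlySpan<byte> hash) =>
        BinaryPrimitives.ReadUInt32LittleEndian(hash);

    public static uint ReadBigEndian(ReadOnlySpan<byte> hash) =>
        BinaryPrimitives.ReadUInt32BigEndian(hash);
}
```

For the byte sequence `{ 0x1C, 0xDF, 0x44, 0x21 }`, the little-endian reading is `0x2144DF1C` and the big-endian reading is `0x1CDF4421`, so two consumers that disagree on byte order would see different "hash values" for identical bytes.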
This actually originates from the discussions to make (Let me know if I'm missing anything) "Mixer API" uses a stateful API, therefore the proposed design here by @morganbr also provides a stateful API. I feel like it isn't as useful as it can be, because:
An alternative I propose is a single-shot, in-memory hashing API:
public interface IInMemoryHashCode
{
int GetHashCode(ReadOnlySpan<T> input);
}
and the "mixer API" implementation would be responsible for accumulating the input in a buffer and passing it along to the hash function. This would make implementations easier and faster for the given purpose. Obviously, growing a buffer is also very expensive, maybe even more expensive than what we gain from hashing in one shot, but it can be tuned to work like a My proposal:
Endianness isn't an issue because I see no reason to mix in 64-bit or arbitrary length return values, because this is only for GetHashCode(). I see no reason to take this beyond the intended goal, or just use HashAlgorithm. Let me know what you think, thanks. |
Oh, another 4-year-old issue, huh… I've recently been trying to implement a Bloom filter, and I want to use a non-cryptographic hash algorithm called MurmurHash3. I find that it has already become part of Scala's standard library, but doesn't exist in the .NET BCL. Although our community provides different implementations of those hash algorithms, some of them are confusing and unreliable. I understand all current efforts are put into the release of .NET 5. I just hope that, after that, someone will consider this issue and make some progress... |
@GrabYourPitchforks, should this issue be converted to a |
@tannergooding Probably. Just scrolled back through the earlier discussion, and I don't know if we're close to settling on an API shape. |
Based on things we've learned from the cryptographic algorithms, I have two strawmen. The first is a common pattern, no inheritance. It could probably be upgraded to inheritance-based later without breaking, because the CLR allows for methods to move to base types and they still get bound at runtime. The second is the inheritance based one, which has the unfortunate problem of needing to name the base class. They both follow the rules:
(No inheritance, simplified (span-explosion of members not shown)):
namespace TBD
{
public class XxHash32
{
public XxHash32() { }
public void Append(ReadOnlySpan<byte> source) { }
public void Reset() { }
public byte[] GetCurrentHash() => throw null;
public byte[] GetHashAndReset() => throw null;
public static byte[] Hash(ReadOnlySpan<byte> source) => throw null;
}
public class XxHash64
{
public XxHash64() { }
public void Append(ReadOnlySpan<byte> source) { }
public void Reset() { }
public byte[] GetCurrentHash() => throw null;
public byte[] GetHashAndReset() => throw null;
public static byte[] Hash(ReadOnlySpan<byte> source) => throw null;
}
public class Crc32
{
public Crc32() { }
public void Append(ReadOnlySpan<byte> source) { }
public void Reset() { }
public byte[] GetCurrentHash() => throw null;
public byte[] GetHashAndReset() => throw null;
public static byte[] Hash(ReadOnlySpan<byte> source) => throw null;
}
public class Crc64
{
public Crc64() { }
public void Append(ReadOnlySpan<byte> source) { }
public void Reset() { }
public byte[] GetCurrentHash() => throw null;
public byte[] GetHashAndReset() => throw null;
public static byte[] Hash(ReadOnlySpan<byte> source) => throw null;
}
// I see a pattern here...
}
The second looks an awful lot like the first, but the verbosity of the proposal is increased to show what is/isn't virtual. The int-returning/Span-writing methods use the template method pattern so that destination-too-small gets normalized exceptions, but Append and Reset don't need to since there's no validation to be done. (Unless someone wants
namespace TBD
{
public abstract class HashBase
{
public int Length { get; }
protected HashBase(int length) { }
public void Append(byte[] source) { }
public abstract void Append(ReadOnlySpan<byte> source);
public abstract void Reset();
public byte[] GetCurrentHash() => throw null;
public bool TryGetCurrentHash(Span<byte> destination, out int bytesWritten) => throw null;
public int GetCurrentHash(Span<byte> destination) => throw null;
protected abstract int GetCurrentHashCore(Span<byte> destination);
public byte[] GetHashAndReset() => throw null;
public bool TryGetHashAndReset(Span<byte> destination, out int bytesWritten) => throw null;
public int GetHashAndReset(Span<byte> destination) => throw null;
// Virtual in case there's a more efficient way the algorithm can close itself.
protected virtual int GetHashAndResetCore(Span<byte> destination)
{
int ret = GetCurrentHashCore(destination);
Reset();
return ret;
}
}
public class XxHash32 : HashBase
{
public XxHash32() : base(32) { }
public override void Append(ReadOnlySpan<byte> source) { }
public override void Reset() { }
protected override int GetCurrentHashCore(Span<byte> destination) => throw null;
public static byte[] Hash(byte[] source) => throw null;
public static byte[] Hash(ReadOnlySpan<byte> source) => throw null;
public static bool TryHash(ReadOnlySpan<byte> source, Span<byte> destination, out int bytesWritten) => throw null;
public static int Hash(ReadOnlySpan<byte> source, Span<byte> destination) => throw null;
}
public class XxHash64 : HashBase
{
public XxHash64() : base(64) { }
public override void Append(ReadOnlySpan<byte> source) { }
public override void Reset() { }
protected override int GetCurrentHashCore(Span<byte> destination) => throw null;
public static byte[] Hash(byte[] source) => throw null;
public static byte[] Hash(ReadOnlySpan<byte> source) => throw null;
public static bool TryHash(ReadOnlySpan<byte> source, Span<byte> destination, out int bytesWritten) => throw null;
public static int Hash(ReadOnlySpan<byte> source, Span<byte> destination) => throw null;
}
public class Crc32 : HashBase
{
public Crc32() : base(32) { }
public override void Append(ReadOnlySpan<byte> source) { }
public override void Reset() { }
protected override int GetCurrentHashCore(Span<byte> destination) => throw null;
public static byte[] Hash(byte[] source) => throw null;
public static byte[] Hash(ReadOnlySpan<byte> source) => throw null;
public static bool TryHash(ReadOnlySpan<byte> source, Span<byte> destination, out int bytesWritten) => throw null;
public static int Hash(ReadOnlySpan<byte> source, Span<byte> destination) => throw null;
}
public class Crc64 : HashBase
{
public Crc64() : base(64) { }
public override void Append(ReadOnlySpan<byte> source) { }
public override void Reset() { }
protected override int GetCurrentHashCore(Span<byte> destination) => throw null;
public static byte[] Hash(byte[] source) => throw null;
public static byte[] Hash(ReadOnlySpan<byte> source) => throw null;
public static bool TryHash(ReadOnlySpan<byte> source, Span<byte> destination, out int bytesWritten) => throw null;
public static int Hash(ReadOnlySpan<byte> source, Span<byte> destination) => throw null;
}
}
If we /really/ wanted to we could add some "AsInt32" and "AsInt64" variants on the 32 and 64 bit versions. For either proposal. For the inheritance model it's probably worth adding a HashBase32 and HashBase64 extra layer so that those methods ride for free. e.g.
namespace TBD
{
public abstract class HashBase64 : HashBase
{
protected HashBase64() : base(64) { }
public long GetCurrentHashAsInt64() => throw null;
public long GetHashAsInt64AndReset() => throw null;
}
public class Crc64 : HashBase64
{
// just as it was before, except an easier ctor.
// it still writes bytes, it can't influence the new methods.
}
}
These shapes enable adding bigger hashes, like MurmurHash128, without "some are bytes, some are ints, some are longs...". The naming pattern is a problem for algorithms that end in a number, like FNV1, but I hear FNV1a is almost always better, so If any of the algorithms require knowing the length up front, they could either have special ctors and/or Reset methods, or just buffer from the accumulator. (Or don't do inheritance and only implement the static Hash members) Personally, I like the simplified implementations and uniformity that the inheritance model brings. But I can't come up with a good name for it.
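The template method pattern mentioned for the inheritance-based strawman can be sketched as follows. This is a reduced illustration, not the proposed implementation: `HashBaseSketch` and `ByteSumHash` are hypothetical names, `Length` is assumed to be measured in bytes, and the "algorithm" is a toy byte sum chosen only so the example stays short.

```csharp
using System;

// Hypothetical miniature of the inheritance-based shape.
// Public members validate once; derived types only implement the *Core method.
public abstract class HashBaseSketch
{
    public int Length { get; } // digest size in bytes (assumption for this sketch)

    protected HashBaseSketch(int length) { Length = length; }

    public abstract void Append(ReadOnlySpan<byte> source);
    public abstract void Reset();
    protected abstract int GetCurrentHashCore(Span<byte> destination);

    public bool TryGetCurrentHash(Span<byte> destination, out int bytesWritten)
    {
        if (destination.Length < Length)
        {
            bytesWritten = 0;
            return false;
        }
        bytesWritten = GetCurrentHashCore(destination.Slice(0, Length));
        return true;
    }

    public int GetCurrentHash(Span<byte> destination)
    {
        // Every algorithm gets the same exception for a too-small destination.
        if (!TryGetCurrentHash(destination, out int written))
        {
            throw new ArgumentException("Destination is too short.", nameof(destination));
        }
        return written;
    }

    public byte[] GetCurrentHash()
    {
        byte[] result = new byte[Length];
        GetCurrentHashCore(result);
        return result;
    }
}

// Toy algorithm (sum of all bytes modulo 256), used only to exercise the base class.
public sealed class ByteSumHash : HashBaseSketch
{
    private byte _sum;

    public ByteSumHash() : base(1) { }

    public override void Append(ReadOnlySpan<byte> source)
    {
        foreach (byte b in source)
        {
            _sum = unchecked((byte)(_sum + b));
        }
    }

    public override void Reset() => _sum = 0;

    protected override int GetCurrentHashCore(Span<byte> destination)
    {
        destination[0] = _sum;
        return 1;
    }
}
```

The derived type never checks the destination length itself, which is the uniformity argument made for the inheritance model.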
|
|
I agree that if we create a common base class we should address only the common subset of functionality: an algorithm which operates on a byte stream of arbitrary length, a digest length which is known upfront, an instance which has reset capabilities. There should be no need to implement There are some scenarios this doesn't address. This doesn't address a theoretical hash algorithm which operates on WORD or DWORD streams instead of byte streams, or which takes ancillary data alongside the to-be-hashed data. (I'm reminded of AEAD encryption algorithms and how the existing If we say that these are specialty scenarios, then it'd make sense to put this functionality (as appropriate) on the derived types rather than trying to slam it on to the base type. And if devs really need something truly custom, nobody's prevented from making a Contoso.HashAlgorithms power toys package. :) |
Namespace wise, what about these:
I'm a fan of (1) because I'm not sure this will be a mainline API. It's for people processing binary data, which is what the buffers namespace is really about. If we believe that's true then I think |
You can hash strings. You can hash files. I don't think people think of those in terms of buffers to be honest, so that doesn't sit well in my mind. System.IO.Hashing seems better to me, and as much as I'd like System.Hashing I'm not sure it deserves being that high up. |
We can add stringy overloads if needed. (Thought experiment: What happens if / when we introduce Utf8String? Should hashing a String and the equivalent Utf8String produce the same hash code? Now we're adding policy on top of something that's supposed to be a barebones reference hashing API.) Not sure if we need proper File-based overloads. I don't see realistic scenarios where people are running MurmurHash over files directly. |
The one-shots could throw for inappropriately-sized data (and/or have overloads), and the accumulator could buffer
I'm not sure that really makes sense here. AEAD is (reductio ad absurdum) "I'm hashing some stuff" and "I'm encrypting some stuff", and the ancillary data is "oh, but there's a thing I want to hash but not encrypt". If there was something like "ZipperHash" which was defined as working on interleaved segments of data that could decide it fits the shape by accepting the input as alternating data units... but really that hash would probably be the best AEAD-problem-equivalent of "I don't look like this hierarchy, so I'll use similar names but not share a base type"
Well, I left that in with
The good part of the inheritance model is we can solve that for the accumulators. public void Append(string text, Encoding encoding = null) I think we're better off with less-is-more here, though, and leave it off until someone cries for it. This also solves String/Utf8String parity questions.
Yeah, probably do want to do something for streams.
namespace TBD
{
partial class NonCryptographicHashAlgorithm
{
public void Append(Stream stream) { }
public Task AppendAsync(Stream stream, CancellationToken cancellationToken = default) { }
}
}
and if we also want to pattern-support them on the statics
namespace TBD
{
partial class Crc64
{
public static byte[] Hash(Stream stream) => throw null;
public static Task<byte[]> HashAsync(Stream stream, CancellationToken cancellationToken = default) => throw null;
public static int Hash(Stream stream, Span<byte> destination) => throw null;
public static Task<int> HashAsync(Stream stream, Memory<byte> destination, CancellationToken cancellationToken = default) => throw null;
// Do, or do not. There is no Async Try.
}
} |
It's awesome to see this making progress. I generally think a base class or interface would be useful for modularity/optimization. For example, some algorithms are optimized for x64 and would run slowly in x86, so it would be nice to be able to abstract the choice in a program that isn't too picky. A few other random thoughts:
|
"some algorithms are optimized for x64 and would run slowly in x86"
And some even give different results, which isn't a great situation, for example Murmur in its 128-bit form. Optimization is not a great idea in that sort of case; I'd rather people ended up being quite specific in their choices. |
GetCurrentHash is a partial clone. We added it on IncrementalHash to support file transfer-type protocols where they'll do things like transmit a hash every 100MB or so, which is the cumulative hash up to that point. While I'd just design the protocol to be the hash of every different chunk, they exist, and people asked for them. I figure if we design it in from the beginning here it's easier than trying to add it later. Arguably a full Clone in the abstraction is more flexible, but we haven't seen requests for "these two files have the same 200MB, then differ at the end, can I seed it with their commonality and then just give the two variants"?
I tried ignoring the crypto types that fed into this when naming this, and looked at it from the array and span perspective: it will produce an amount of data, that data is measured by {arr}.Length or {span}.Length. So instead of HashLengthInBytes, or whatever, how about "Length"? But I expect this'll get a nontrivial amount of discussion in review.
That's a good point. Browsable-never, throw, obsolete, and sealed sounds right to me. |
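The cumulative-hash scenario described for GetCurrentHash (emit the hash so far, then keep appending) can be sketched like this. Everything here is hypothetical: `Fnv1a32Sketch` and `CheckpointDemo` are illustrative names, FNV-1a is used only because it fits in a few lines, and the big-endian output order is an arbitrary choice for the sketch.

```csharp
using System;
using System.Buffers.Binary;
using System.Collections.Generic;

// Hypothetical accumulator with a non-destructive GetCurrentHash.
public sealed class Fnv1a32Sketch
{
    private const uint OffsetBasis = 2166136261;
    private const uint Prime = 16777619;
    private uint _state = OffsetBasis;

    public void Append(ReadOnlySpan<byte> source)
    {
        foreach (byte b in source)
        {
            _state = unchecked((_state ^ b) * Prime);
        }
    }

    public void Reset() => _state = OffsetBasis;

    // Reads the state without finalizing it, so appending can continue afterwards.
    public byte[] GetCurrentHash()
    {
        byte[] hash = new byte[4];
        BinaryPrimitives.WriteUInt32BigEndian(hash, _state); // byte order is an assumption
        return hash;
    }

    public byte[] GetHashAndReset()
    {
        byte[] hash = GetCurrentHash();
        Reset();
        return hash;
    }
}

public static class CheckpointDemo
{
    // Records the cumulative hash after each chunk, as a transfer protocol might.
    public static List<byte[]> HashWithCheckpoints(IEnumerable<byte[]> chunks)
    {
        var hasher = new Fnv1a32Sketch();
        var checkpoints = new List<byte[]>();
        foreach (byte[] chunk in chunks)
        {
            hasher.Append(chunk);
            checkpoints.Add(hasher.GetCurrentHash());
        }
        return checkpoints;
    }
}
```

The last checkpoint equals the hash of the concatenated input, which is the property such protocols rely on. An algorithm whose finalization mutates state would need to finalize a copy of that state instead, which is why the thread calls this a "partial clone".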
I like that. Could it be confused with “input buffer length”? |
It'd be better for such a type to derive from the abstract base, not Crc32, no? They produce fundamentally different results. |
It depends on your approach as you can use the same algorithm, with a different constant, so inheritance could be a good solution. The logic is identical otherwise. (Here is a library that does so.) Obviously, the most optimal version of crc32c would use the sse4 CRC32 intrinsic. |
From an implementation point of view, but not for someone consuming an API named one thing that then does something different. Also, the API as defined doesn't support such a change: it would require additional surface area, at which point we could choose to unseal if actually designing for such scenarios. And as you point out, that's not the ideal implementation, anyway, so.... |
We talked about CRC32c in the meeting. It would be a different class if we added it. We can certainly seal the types to avoid such confusion. |
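As a rough illustration of the "same algorithm, different constant" observation, the bitwise sketch below computes both CRC-32 and CRC-32C from one routine parameterized by the reflected polynomial. `CrcSketch` and its members are hypothetical names; this is a slow reference-style loop, not how a production implementation (table-driven or intrinsic-based) would be written.

```csharp
using System;

// Hypothetical reference-style CRC: one loop, polynomial passed in.
public static class CrcSketch
{
    public const uint Crc32Polynomial = 0xEDB88320;  // reflected form, IEEE 802.3
    public const uint Crc32CPolynomial = 0x82F63B78; // reflected form, Castagnoli

    public static uint Compute(ReadOnlySpan<byte> data, uint reflectedPolynomial)
    {
        uint crc = 0xFFFFFFFF; // initial value shared by both variants
        foreach (byte b in data)
        {
            crc ^= b;
            for (int bit = 0; bit < 8; bit++)
            {
                crc = (crc & 1) != 0 ? (crc >> 1) ^ reflectedPolynomial : crc >> 1;
            }
        }
        return ~crc; // final XOR with 0xFFFFFFFF
    }
}
```

For the ASCII bytes of `123456789`, the two polynomials give the standard check values `0xCBF43926` (CRC-32) and `0xE3069283` (CRC-32C). The logic is shared, but the outputs are unrelated, which supports keeping them as separate, sealed types for consumers.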
This specific issue aside, I think this should be our default stance for all new types going forward unless we have explicitly brought to API review a proposal for an extensibility model. |
I 100% agree. I think this should be the default guidance for all of the .NET ecosystem, but I lost that battle ;-) |
API Amendment: Because there was both the property When implementing XxHash32 and XxHash64 I was reminded that they support 32- and 64-bit (respectively) seeds. Rather than limiting the System.IO.Hashing version to the zero seed, I propose overloading the ctor and adding optional parameters for the seed on the static versions.
namespace System.IO.Hashing
{
public abstract class NonCryptographicHashAlgorithm
{
- protected abstract int GetCurrentHashCore(Span<byte> destination);
+ protected abstract void GetCurrentHashCore(Span<byte> destination);
- protected virtual int GetHashAndResetCore(Span<byte> destination);
+ protected virtual void GetHashAndResetCore(Span<byte> destination);
}
public sealed class XxHash32 : NonCryptographicHashAlgorithm
{
public XxHash32();
+ public XxHash32(int seed);
- public static byte[] Hash(byte[] source);
+ public static byte[] Hash(byte[] source, int seed=0);
- public static byte[] Hash(ReadOnlySpan<byte> source);
+ public static byte[] Hash(ReadOnlySpan<byte> source, int seed=0);
- public static bool TryHash(ReadOnlySpan<byte> source, Span<byte> destination, out int bytesWritten);
+ public static bool TryHash(ReadOnlySpan<byte> source, Span<byte> destination, out int bytesWritten, int seed=0);
- public static int Hash(ReadOnlySpan<byte> source, Span<byte> destination);
+ public static int Hash(ReadOnlySpan<byte> source, Span<byte> destination, int seed=0);
}
public sealed class XxHash64 : NonCryptographicHashAlgorithm
{
public XxHash64();
+ public XxHash64(long seed);
- public static byte[] Hash(byte[] source);
+ public static byte[] Hash(byte[] source, long seed = 0);
- public static byte[] Hash(ReadOnlySpan<byte> source);
+ public static byte[] Hash(ReadOnlySpan<byte> source, long seed = 0);
- public static bool TryHash(ReadOnlySpan<byte> source, Span<byte> destination, out int bytesWritten);
+ public static bool TryHash(ReadOnlySpan<byte> source, Span<byte> destination, out int bytesWritten, long seed = 0);
- public static int Hash(ReadOnlySpan<byte> source, Span<byte> destination);
+ public static int Hash(ReadOnlySpan<byte> source, Span<byte> destination, long seed = 0);
}
}
These changes would result in this as the full, updated proposal:
namespace System.IO.Hashing
{
public abstract class NonCryptographicHashAlgorithm
{
public int HashLengthInBytes { get; }
protected NonCryptographicHashAlgorithm(int hashLengthInBytes);
public abstract void Append(ReadOnlySpan<byte> source);
public abstract void Reset();
protected abstract void GetCurrentHashCore(Span<byte> destination);
public void Append(byte[] source);
public void Append(Stream stream);
public Task AppendAsync(Stream stream, CancellationToken cancellationToken = default);
public byte[] GetCurrentHash();
public bool TryGetCurrentHash(Span<byte> destination, out int bytesWritten);
public int GetCurrentHash(Span<byte> destination);
public byte[] GetHashAndReset();
public bool TryGetHashAndReset(Span<byte> destination, out int bytesWritten);
public int GetHashAndReset(Span<byte> destination);
protected virtual void GetHashAndResetCore(Span<byte> destination);
[EditorBrowsable(EditorBrowsableState.Never)]
[Obsolete("Use GetCurrentHash() to retrieve the computed hash code.", true)]
public int GetHashCode();
}
public sealed class XxHash32 : NonCryptographicHashAlgorithm
{
public XxHash32();
public XxHash32(int seed);
public static byte[] Hash(byte[] source, int seed = 0);
public static byte[] Hash(ReadOnlySpan<byte> source, int seed = 0);
public static bool TryHash(ReadOnlySpan<byte> source, Span<byte> destination, out int bytesWritten, int seed = 0);
public static int Hash(ReadOnlySpan<byte> source, Span<byte> destination, int seed = 0);
}
public sealed class XxHash64 : NonCryptographicHashAlgorithm
{
public XxHash64();
public XxHash64(long seed);
public static byte[] Hash(byte[] source, long seed = 0);
public static byte[] Hash(ReadOnlySpan<byte> source, long seed = 0);
public static bool TryHash(ReadOnlySpan<byte> source, Span<byte> destination, out int bytesWritten, long seed = 0);
public static int Hash(ReadOnlySpan<byte> source, Span<byte> destination, long seed = 0);
}
public sealed class Crc32 : NonCryptographicHashAlgorithm
{
public Crc32();
public static byte[] Hash(byte[] source);
public static byte[] Hash(ReadOnlySpan<byte> source);
public static bool TryHash(ReadOnlySpan<byte> source, Span<byte> destination, out int bytesWritten);
public static int Hash(ReadOnlySpan<byte> source, Span<byte> destination);
}
public sealed class Crc64 : NonCryptographicHashAlgorithm
{
public Crc64();
public static byte[] Hash(byte[] source);
public static byte[] Hash(ReadOnlySpan<byte> source);
public static bool TryHash(ReadOnlySpan<byte> source, Span<byte> destination, out int bytesWritten);
public static int Hash(ReadOnlySpan<byte> source, Span<byte> destination);
}
} |
namespace System.IO.Hashing
{
public abstract class NonCryptographicHashAlgorithm
{
public int HashLengthInBytes { get; }
protected NonCryptographicHashAlgorithm(int hashLengthInBytes);
public abstract void Append(ReadOnlySpan<byte> source);
public abstract void Reset();
protected abstract void GetCurrentHashCore(Span<byte> destination);
public void Append(byte[] source);
public void Append(Stream stream);
public Task AppendAsync(Stream stream, CancellationToken cancellationToken = default);
public byte[] GetCurrentHash();
public bool TryGetCurrentHash(Span<byte> destination, out int bytesWritten);
public int GetCurrentHash(Span<byte> destination);
public byte[] GetHashAndReset();
public bool TryGetHashAndReset(Span<byte> destination, out int bytesWritten);
public int GetHashAndReset(Span<byte> destination);
protected virtual void GetHashAndResetCore(Span<byte> destination);
[EditorBrowsable(EditorBrowsableState.Never)]
[Obsolete("Use GetCurrentHash() to retrieve the computed hash code.", true)]
public int GetHashCode();
}
public sealed class XxHash32 : NonCryptographicHashAlgorithm
{
public XxHash32();
public XxHash32(int seed);
public static byte[] Hash(byte[] source);
public static byte[] Hash(byte[] source, int seed = 0);
public static byte[] Hash(ReadOnlySpan<byte> source, int seed = 0);
public static bool TryHash(ReadOnlySpan<byte> source, Span<byte> destination, out int bytesWritten, int seed = 0);
public static int Hash(ReadOnlySpan<byte> source, Span<byte> destination, int seed = 0);
}
public sealed class XxHash64 : NonCryptographicHashAlgorithm
{
public XxHash64();
public XxHash64(long seed);
public static byte[] Hash(byte[] source);
public static byte[] Hash(byte[] source, long seed);
public static byte[] Hash(ReadOnlySpan<byte> source, long seed = 0);
public static bool TryHash(ReadOnlySpan<byte> source, Span<byte> destination, out int bytesWritten, long seed = 0);
public static int Hash(ReadOnlySpan<byte> source, Span<byte> destination, long seed = 0);
}
public sealed class Crc32 : NonCryptographicHashAlgorithm
{
public Crc32();
public static byte[] Hash(byte[] source);
public static byte[] Hash(ReadOnlySpan<byte> source);
public static bool TryHash(ReadOnlySpan<byte> source, Span<byte> destination, out int bytesWritten);
public static int Hash(ReadOnlySpan<byte> source, Span<byte> destination);
}
public sealed class Crc64 : NonCryptographicHashAlgorithm
{
public Crc64();
public static byte[] Hash(byte[] source);
public static byte[] Hash(ReadOnlySpan<byte> source);
public static bool TryHash(ReadOnlySpan<byte> source, Span<byte> destination, out int bytesWritten);
public static int Hash(ReadOnlySpan<byte> source, Span<byte> destination);
}
} |
As discussed above, there are several possible choices of CRC polynomial, and the choice matters, e.g., if you are adhering to a specification. Presumably we will choose one for our Crc32/Crc64 classes here (based on what's fastest and/or most useful), but how will we offer other polynomials now or in the future? Also, would it be helpful for the API names to indicate which polynomial they use? (This may relate to the other question.) |
"CRC-32", as far as I can tell, is the version used by Ethernet. (
/// <summary>
/// Provides an implementation of the CRC-32 algorithm, as used in
/// ITU-T V.42 and IEEE 802.3.
/// </summary>
/// <remarks>
/// <para>
/// This implementation emits the answer in the Little Endian byte order so that
/// the CRC residue relationship (CRC(message concat CRC(message)) is a fixed value) holds.
/// For CRC-32 this stable output is the byte sequence <c>{ 0x1C, 0xDF, 0x44, 0x21 }</c>,
/// the Little Endian representation of <c>0x2144DF1C</c>.
/// </para>
/// <para>
/// There are multiple, incompatible, definitions of a 32-bit cyclic redundancy
/// check (CRC) algorithm. When interoperating with another system, ensure that you
/// are using the same definition. The definition used by this implementation is not
/// compatible with the cyclic redundancy check described in ITU-T I.363.5.
/// </para>
/// </remarks>
public sealed partial class Crc32 : NonCryptographicHashAlgorithm { ... }
/// <summary>
/// Provides an implementation of the CRC-64 algorithm as described in ECMA-182, Annex B.
/// </summary>
/// <remarks>
/// <para>
/// This implementation emits the answer in the Big Endian byte order so that
/// the CRC residue relationship (CRC(message concat CRC(message)) is a fixed value) holds.
/// For CRC-64 this stable output is the byte sequence
/// <c>{ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 }</c>.
/// </para>
/// <para>
/// There are multiple, incompatible, definitions of a 64-bit cyclic redundancy
/// check (CRC) algorithm. When interoperating with another system, ensure that you
/// are using the same definition. The definition used by this implementation is not
/// compatible with the cyclic redundancy check described in ISO 3309.
/// </para>
/// </remarks>
public sealed partial class Crc64 : NonCryptographicHashAlgorithm { ... } |
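The residue relationship described in the Crc32 remarks can be checked with a small sketch. The class below is hypothetical and uses a slow bitwise CRC-32 (reflected polynomial `0xEDB88320`, initial value and final XOR of `0xFFFFFFFF`) in place of the proposed type, so that the example does not depend on any package.

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical demo: appending the little-endian CRC-32 of a message to the message
// and hashing again yields the same value for every message.
public static class Crc32ResidueDemo
{
    public static uint Crc32(ReadOnlySpan<byte> data)
    {
        uint crc = 0xFFFFFFFF;
        foreach (byte b in data)
        {
            crc ^= b;
            for (int bit = 0; bit < 8; bit++)
            {
                crc = (crc & 1) != 0 ? (crc >> 1) ^ 0xEDB88320 : crc >> 1;
            }
        }
        return ~crc;
    }

    public static uint ResidueFor(byte[] message)
    {
        byte[] combined = new byte[message.Length + 4];
        message.CopyTo(combined, 0);
        // Little-endian output order is what makes the relationship hold for CRC-32.
        BinaryPrimitives.WriteUInt32LittleEndian(combined.AsSpan(message.Length), Crc32(message));
        return Crc32(combined);
    }
}
```

`ResidueFor` returns `0x2144DF1C` regardless of the message, matching the byte sequence `{ 0x1C, 0xDF, 0x44, 0x21 }` quoted in the remarks when that value is written little-endian.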
Ah, you're right, I had overlooked that the other polynomials have different suffixes. |
Context from https://github.com/dotnet/corefx/issues/14354#issuecomment-348003785:
/cc @gimpf, @morganbr, @JonHanna
API Proposal
Append(Stream)
. It's not generally expected that someone would have an API that takes a parameter of the base class type.
There are a couple of items that are undecided:
Low-concept version
Full proposal
Optional addendum, static members to hash Streams.
These members should validate that there's sufficient write-space before draining the stream, otherwise data has been lost (for non-seekable streams, at least).
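The validate-before-draining rule for the static Stream members might look like the sketch below. `StreamHashSketch` is a hypothetical stand-in with a toy one-byte digest (sum of all bytes modulo 256); only the ordering of the checks relative to the first read is the point.

```csharp
using System;
using System.IO;

// Hypothetical static helper showing the order of operations only.
public static class StreamHashSketch
{
    private const int HashLengthInBytes = 1; // toy digest size for this sketch

    public static int Hash(Stream stream, Span<byte> destination)
    {
        if (stream is null) throw new ArgumentNullException(nameof(stream));

        // Check the destination before reading anything: failing after the stream
        // has been consumed would lose the data for non-seekable streams.
        if (destination.Length < HashLengthInBytes)
        {
            throw new ArgumentException("Destination is too short.", nameof(destination));
        }

        byte sum = 0;
        byte[] buffer = new byte[4096];
        int read;
        while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
        {
            for (int i = 0; i < read; i++)
            {
                sum = unchecked((byte)(sum + buffer[i]));
            }
        }

        destination[0] = sum;
        return HashLengthInBytes;
    }
}
```

With this ordering, a caller who passes a too-small destination gets the exception while the stream position is still unchanged and can retry with a larger buffer.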