Proposal: Expose Bit Manipulation functions #27382

grant-d · 2018-09-13T03:21:37Z

Bit manipulation routines are common enough that we should expose a subset as platform primitives.
While some of them may be simple to write, it's much harder to achieve the requisite performance desired, especially since they tend to be used within tight loops.

The aim of this proposal is to scope & design a minimal set of functions, to be implemented with a bias towards performance. Per @tannergooding

The point of these APIs is to provide a general-purpose API that works on all platforms (which means providing a software fallback) and is generally-usable. Hardware Intrinsics are for performance oriented scenarios where you require hardware acceleration and need more direct control of the code that is emitted.

Note that even though some of the formula may be simple, relevant callsites are more self-documenting when using the intrinsics (should the dev choose to use them).

Scope

Consolidate existing callsites in CoreCLR (Consolidate implementation of Rotate and PopCount coreclr#22584)
Units pass in CoreFX(Units for BitOps.TrailingZeroCount corefx#35193)
Expose System.Numerics.BitOperations as a public ref assembly in CoreFX (https://github.com/dotnet/corefx/issues/35419)
Consolidate existing callsites in CoreFX ([NO MERGE] BitOps analysis CoreFX (WIP) corefx#34917)
Implement proposed methods (this issue)

Rationale and Usage

The proposed functions are already implemented throughout the stack, often with different algorithms, performance characteristics and test coverage.
Existing callsites below: https://github.com/dotnet/corefx/issues/32269#issuecomment-457689128
(There is likely to be more; the initial search was timeboxed to ~1 hour)

Some of the implementation have suboptimal performance or bugs. Something like ExtractBit is trivial to implement, but PopCount is more complex and thus prone to logic and performance issues. Hiding these complex formulae behind friendly signatures makes using them more approachable.

Here's an example of a function (BTC) whose signature is simple but the algebra is easy to get wrong.

[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static bool ComplementBit(ref uint value, int bitOffset)
{
    uint mask = 1u << bitOffset;
    bool btc = (value & mask) != 0;

    value = ~(~mask ^ value);

    return btc;
}

However making a call to it meets our goal of abstraction and performance:

uint value = 123;
bool previouslyTrue = BitOperations.ComplementBit(ref value, 6);

Proposed API

The proposed API is purposefully kept lean. We can add more methods in later design iterations. We should view this as an opportunity to get simple, base functionality out the door and not stray into the dangerous territory of adding every bit twiddling hack that exists.

Assume all methods are decorated with [MethodImpl(MethodImplOptions.AggressiveInlining)]

public static class BitOperations
{
    // BT
    bool ExtractBit(byte value, int bitOffset); // Could name this BitTest or TestBit
    bool ExtractBit(uint value, int bitOffset);
    bool ExtractBit(int value, int bitOffset);

    // BTS (scalar)
    byte InsertBit(byte value, int bitOffset); // BitSet or SetBit
    uint InsertBit(uint value, int bitOffset);
    int InsertBit(int value, int bitOffset);

    // True BTS (returns original value)
    bool InsertBit(ref byte value, int bitOffset);
    bool InsertBit(ref uint value, int bitOffset);
    bool InsertBit(ref int value, int bitOffset);

    // BTR
    byte ClearBit(byte value, int bitOffset); // BitReset or ResetBit
    uint ClearBit(uint value, int bitOffset);
    int ClearBit(int value, int bitOffset);

    bool ClearBit(ref byte value, int bitOffset);
    bool ClearBit(ref uint value, int bitOffset);
    bool ClearBit(ref int value, int bitOffset);

    // BTC
    byte ComplementBit(byte value, int bitOffset);
    uint ComplementBit(uint value, int bitOffset);
    int ComplementBit(int value, int bitOffset);

    bool ComplementBit(ref byte value, int bitOffset);
    bool ComplementBit(ref uint value, int bitOffset);
    bool ComplementBit(ref int value, int bitOffset);

    // on ? BTS : BTR
    byte WriteBit(byte value, int bitOffset, bool on);
    uint WriteBit(uint value, int bitOffset, bool on);
    int WriteBit(int value, int bitOffset, bool on);

    bool WriteBit(ref byte value, int bitOffset, bool on);
    bool WriteBit(ref uint value, int bitOffset, bool on);
    bool WriteBit(ref int value, int bitOffset, bool on);
}

Details

The focus will be on performance.
Ultimately, most of the code should be branchless and leverage intrinsics where possible.
Somewhat of an overlap in functionality between InsertBit/ClearBit and WriteBit where the latter conditionally executes the equivalent of either former (using twiddling to do so without branching). But there's enough twiddling in Write to avoid any branching that it's maybe worth keeping both variants.
Since these functions are used in performance-sensitive applications, we avoid input checking and exceptions. The current design tries to dispense with the requirement by sharpening input types, contractual assumptions (eg out-of-bounds offset will use mod n in some functions or result in a no-op in others) and specific design choices.

Questions

What additional sizes & signs of integers do we support, and in which methods? It looks like int, uint and byte are commonly used. Anything else?

Decisions

What namespace this should be in. Decision: namespace System.Numerics.
The class name of BitOps is self-documenting, terse (it might be specified frequently in a callsite, if not aliased with using) and both terms are well-known names or abbreviations. Decision: BitOperations
Decision: We favor well-known names over the alternatives. For example, PopCount could be called CountSetBits but that's not the common lingo used by twiddlers. Furthermore, intrinsics already expose the well-known names, eg Popcnt.PopCount.
Likewise, method names should also be as concise as possible, for example, TrailingZeroCount may be more concisely described as TrailingZeros. Decision: TrailingZeroCount already chosen by a previous PR.
Decision: Count-oriented methods such as PopCount return (idiomatic) int, not uint
Should offset or position parameters be int or uint. Latter is preferred since negatives not permitted regardless. Decision: int is an idiomatic input/output type in C#.
Log(0) is mathematically undefined. Should it return 0 or -1? Decision: Returns 0
Do we care about endianness (ie do we need BE and LE variants of relevant methods). The current proposal is only LE, and is shaped in such a way that we are not future-proof wrt supporting BE. Discussion proposes an alternative. For example, LeadingZeros might need a different algorithm on BE/LE platforms. Decision: Endianess only matters if we are reinterpreting integers.

Sample call sites

The following samples are taken from the linked units, from the method BitOps_Samples.
The code chooses values that are easy to eyeball for correctness. The real units cover many more boundaries & conditions.

// ExtractBit: Reads whether the specified bit in a mask is set.
Assert.True(BitOps.ExtractBit((byte)0b0001_0000, 4));
Assert.False(BitOps.ExtractBit((byte)0b0001_0000, 7));

// InsertBit: Sets the specified bit in a mask and returns the new value.
byte dest = 0b0000_1001;
Assert.Equal(0b0010_1001, BitOps.InsertBit(dest, 5));

// InsertBit(ref): Sets the specified bit in a mask and returns whether it was originally set.
Assert.False(BitOps.InsertBit(ref dest, 5));
Assert.Equal(0b0010_1001, dest);

// ClearBit: Clears the specified bit in a mask and returns the new value.
dest = 0b0000_1001;
Assert.Equal(0b0000_0001, BitOps.ClearBit(dest, 3));
// ClearBit(ref): Clears the specified bit in a mask and returns whether it was originally set.
Assert.True(BitOps.ClearBit(ref dest, 3)); 
Assert.Equal(0b0000_0001, dest);

// ComplementBit: Complements the specified bit in a mask and returns the new value.
dest = 0b0000_1001;
Assert.Equal(0b0000_0001, BitOps.ComplementBit(dest, 3));
// ComplementBit(ref): Complements the specified bit in a mask and returns whether it was originally set.
Assert.True(BitOps.ComplementBit(ref dest, 3));
Assert.Equal(0b0000_0001, dest);

// WriteBit: Writes the specified bit in a mask and returns the new value. Does not branch.
dest = 0b0000_1001;
Assert.Equal(0b0000_0001, BitOps.WriteBit(dest, 3, on: false));
// WriteBit(ref): Writes the specified bit in a mask and returns whether it was originally set. Does not branch.
Assert.True(BitOps.WriteBit(ref dest, 3, on: false));
Assert.Equal(0b0000_0001, dest);

Updates

Original issue authored by @mburbea here: Proposal: Add a BitManipulation class
Initial proposal submitted
Methods such as InsertBit accepted a bool that determined whether it set or cleared the bit in question. Such methods have now been refactored in two, InsertBit and ClearBit.
Some name changes based on suggestions (eg FlipBit became ComplementBit).
TestAndSet methods have been refactored into simple scalar functions, for performance.
The offset parameter is int instead of byte. All POC units pass.
Added WriteBit(value, offset, bool on) overloads that conditionally set/clear the specified bit.
Added ExtractByte and friends
Added Evaluate(bool) (used internally, so may as well expose it)
Removed all Span<T> overloads per @tannergooding advice. Will maybe submit in separate proposal
Moved TrailingOnes and LeadingOnes into to do later proposal in comments
Added Log2
Removed redundant overloads
Removed doc-comments so spec is easier to read
Added byte overloads to all crud methods based on code analysis of existing callsites. eg eg bool ExtractBit(byte value, int bitOffset)
Added Scope section at top of spec
Separated calls into existing and proposed sections
Added task list, worded spec in a more concise manner
Moved RotateByte, etc into to do later proposal in comments

The text was updated successfully, but these errors were encountered:

grant-d · 2018-09-13T04:02:51Z

cc @mburbea, @tannergooding , @CarolEidt

tannergooding · 2018-09-13T04:50:01Z

Please drop the in modifier from the parameters. These are all operating on primitives and passing them around by reference is going to be generally less efficient than just passing them by-value (which should happen in register).

tannergooding · 2018-09-13T05:10:16Z

Additional Comments/Thoughts:

Whether or not these methods take a ref and return the original value of the bit or just return the modified value will likely need more discussion (possibly just in the API design review).
I'm not sure I like the methods taking a bool.
- InsertBit(value, offset, on) might be better as InsertBit(value, offset) and ResetBit(value, offset) (or ClearBit, ZeroBit, etc).
- Likewise, LeadingCount and TrailingCount are likely better split into LeadingZeroCount, TrailingZeroCount, LeadingOneCount, and TrailingOneCount
I think NegateBit may be more "better" than FlipBit (or ComplementBit)
System.BitOps seems appropriate as a namespace (and lets this live next to BitConverter). Especially since several of these may get used in the Runtime/Framework itself
I imagine we will want to just provide overloads for the signed types

grant-d · 2018-09-13T05:55:35Z

I'm not sure I like the methods taking a bool.

I went back and forth on that one. The bool makes the api tighter. But it means that ultimately the implementation has branching instructions, which means perf suffers. So I am happy to refactor.

Whether or not these methods take a ref and return the original value of the bit or just return the modified value will likely need more discussion (possibly just in the API design review).

I designed this based on a previous suggestion of yours, but I am happy to revert. Perhaps you meant ExchangeBit (BTS/BTR,BTC) to be a separate additional methods? Otherwise we could return a tuple but that has more overhead.

I think NegateBit may be more "better" than FlipBit (or ComplementBit)

I will make that change

omariom · 2018-09-13T10:40:35Z

@grant-d
public static bool InsertBit(ref uint value, byte offset);

Will JIT be able to optimize it properly if the value is an enregistered local?
Like in this code:

int M(uint value)
{
    return InsertBit(in value, 3) ? 5 :6; 
}

grant-d · 2018-09-13T15:25:14Z

It needs to be a ref, ie:

private static int M(uint value) => InsertBit(ref value, 3, true) ? 5 : 6;

I don't have enough knowledge of what the optimizer would do, someone else may be able to advise us.

Clockwork-Muse · 2018-09-13T16:08:40Z

Um, ExtractBit takes byte and ushort, but neither of those are present for InsertBit/ClearBit (instead there's int and long)?

CarolEidt · 2018-09-13T16:56:37Z

These need to be passed by value. Passing things around by ref or in will make it more difficult to optimize in the long run. If you need a variant that both modifies the value and returns whether a bit was set, it should be a different variant, and the "normal" form of InsertBit should just return the modified value.

grant-d · 2018-09-13T18:54:20Z

Sounds good, I have made the change. @tannergooding, let me know if you think the BTS/BTR/BTC variants need to be in the first deliverable.

grant-d · 2018-09-13T22:44:16Z

Will JIT be able to optimize it properly

It will now - I have changed all ref type methods to be scalar functions; see notes in the spec.

danmoseley · 2018-09-14T05:17:57Z

API review probably won't like the contraction BitOps. BitOperations is not so bad or possibly Bits even...

grant-d · 2018-09-14T19:11:45Z

I like Bits, I have added all of these to the naming suggestions.

grant-d · 2018-09-14T19:27:58Z

If you need a variant that both modifies the value and returns whether a bit was set, it should be a different variant.

Done - both variants exist.

grant-d · 2018-09-14T20:56:37Z

Note to self:

Optimization of Span overloads: https://github.com/dotnet/corefx/issues/32184
SSE2 Intrinsics (still experimental)

using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

grant-d · 2018-09-17T18:09:07Z

Related: https://github.com/dotnet/corefx/issues/1010

jhudsoncedaron · 2018-09-17T21:40:46Z

This is pointless unless you also the attribute for AggressiveInlining as the function call overhead will eat your gains otherwise. Therefore, AggressiveInlining should be contractual on these.

grant-d · 2018-09-17T21:46:08Z

AggressiveInlining should be contractual

The POC already does so. I will add a note to the spec

tannergooding · 2018-09-17T21:48:12Z

This is pointless unless you also the attribute for AggressiveInlining as the function call overhead will eat your gains otherwise.

Many of these functions will be treated as intrinsic and will be specially handled by the JIT to emit a single instruction, regardless of what the actual method body contains for the software fallback.

The JIT is also very good at inlining small functions, regardless of whether or not the AggressiveInlining attribute exists.

GSPP · 2018-09-18T12:54:37Z

InsertBit(value, offset, on) is needed because sometimes the on value is dynamic. There should be separate SetBit and ClearBit methods.

grant-d · 2018-09-18T13:50:41Z

@GSPP can this not be handled from the callsite.
if (on) SetBit(i) else ClearBit(i)
I am trying to keep the API surface as concise as possible, let me know if you have a compelling reason

jhudsoncedaron · 2018-09-18T15:03:00Z

@grant-d: InsertBit can be implemented branchlessly on any reasonably CPU.

grant-d · 2018-09-18T15:04:48Z

InsertBit is already implemented without branching, see POC code.

grant-d · 2018-09-18T15:07:59Z

The question was whether there could be a mutator that accepts a bool, then conditionally inserts or extracts. My preference is to externalize branching (at the callsite).

jhudsoncedaron · 2018-09-18T15:10:59Z

And I'm saying the dynamic set-clear version can be implemented w/o branching.

ulong mask = 1 << i;
ulong bit = (ulong)on << i;
return (value & ~mask) | bit;

grant-d · 2018-09-18T15:22:02Z

Got you. Thanks for the gist. I still am curious whether the 'dynamic' use case is compelling enough to have a somewhat overlapping API?

tannergooding · 2018-09-18T15:22:44Z

@jhudsoncedaron, just because you can implement that in software, doesn't necessarily mean it is (necessarily) a good API to expose here or that it will be efficient.

Many of these functions will be treated as intrinsic, which means that they will will compile down to a single CPU instruction (rather than 5+ instructions). As an example, we can use the BSF (Bit Scan Forward), BSR (Bit Scan Reverse), BT (Bit Test), BTC (Bit Test and Complement), BTR (Bit Test and Reset), BTS (Bit Test and Set), LZCNT (Count the Number of Leading Zero Bits), and POPCNT (Return the Count of Number of Bits Set to 1) instructions on x86/x64 to make these super efficient.

Also, I don't think you can convert a bool to ulong. You will get error CS0030: Cannot convert type 'bool' to 'ulong'

jhudsoncedaron · 2018-09-18T15:28:53Z

I only ever use the dynamic version of InsertBit. That might not be valid C# but the algorithm is correct anyway. bools are stored as either a 1 or a 0 so it's just a matter of convincing the compiler to emit the right code. The harder this is to do, the stronger the argument for putting it in the library.

CarolEidt · 2018-09-18T15:38:47Z

I tend to agree with @jhudsoncedaron that the dynamic version is useful. For this particular case if we really wanted to minimize API surface area I might vote for just the dynamic case, since I would expect the value to be either clearly a constant or not. And it would be easier for the jit to make that optimization than to recognize a sequence with branches.

grant-d · 2018-09-18T15:57:00Z

[Edited]

I assumed the gist was pseudocode; in C# we can do one of the following.
Micro benchmarking shows better perf (0.6x) than the idiomatic expression.

return condition ? 37 : 0; // 1.0x (idiomatic)

return (*(byte*)&condition) * 37; // 0.95x (cannot be inlined)
return Unsafe.As<bool, byte>(condition) * 37; // 0.60x
return new UnionStruct { Bool = condition }.Value * 37; // 0.76x (ExplicitLayout, etc)

This relies on internal knowledge of bool but I guess that's no different to the rest of twiddling where we rely on internal knowledge of bit layouts? I recall (perhaps wrongly) something about VB (maybe v6, maybe .Net) using -1 for false. But since this is ultimately CLI, I guess that's not a concern.

That being said. I started the spec out with the dynamic signature, and followed a suggestion to move to explicit. I don't mind one or the other (or both) but we need a decision here. Maybe I should just add both and we can see which one gets downvoted?

jkotas · 2019-02-19T18:34:07Z

@grant-d I would scope this issue down to just what we have in CoreLib and that is clearly useful. It will make the API review on this easy and it will allow us to make the most useful APIs public in .NET Core 3.0.

I would split the bit manipulation APIs into a separate issue that will be looked at and discussed separately, on its own schedule.

grant-d · 2019-02-19T19:35:24Z

That's a great idea Jan. I had already split this into 2 asks in the summary, but I will create two separate issues instead

grant-d · 2019-02-19T19:41:19Z

Done - see https://github.com/dotnet/corefx/issues/35419

jkotas · 2019-02-19T21:42:31Z

Lets get the subset from dotnet/corefx#35419 through first.

ghost · 2020-06-23T21:48:12Z

Tagging subscribers to this area: @tannergooding
Notify danmosemsft if you want to be subscribed.

tannergooding · 2020-09-25T16:31:19Z

I'd like to move forward and mark this as ready-for-review. However, at the very least BitOps needs to become BitOperations, as that is the name of the already approved class. CC @grant-d

It's worth noting that of the variants being exposed, byte doesn't have hardware support. The x86 hardware only accelerates 16, 32, and 64-bit variants. The proposal, as given, isn't exposing short, ushort, long, or ulong (nor sbyte), which I think is based on @jkotas' request to just trim this down to what S.P.Corelib is using, correct?

The native intrinsics return unsigned char (effectively bool) and take the first parameter by pointer, so I think taking by ref as a variant makes sense and fits in with how these are expected to be used (which is likely in a if (InsertBit(ref _value, 4)) like scenario). @CarolEidt, you raised some concerns around taking ref above, is that still the case given the non mutating variants?

If it is, what is the proposed mechanism to handle that _value is mutated and the original value needs to be returned, is it a value tuple like (int newValue, bool originalBitValue)?

tannergooding · 2020-09-25T16:32:00Z

Meant to link to the native intrinsics above: https://docs.microsoft.com/en-us/cpp/intrinsics/bittest-bittest64?view=vs-2019

jkotas · 2020-09-25T17:08:53Z

which I think is based on @jkotas' request to just trim this down to what S.P.Corelib is using, correct?

The original proposal had methods like PopCount that were already approved and that we had no intrinsic patterns for.

I do not see value in having method for bit tests, etc. I believe that the existing pattern for bit tests like x & (1 << 5) are very concise and easy to read for anybody deals with bits.

The fact that MSVC C++ introduced an intrinsic for bit tests sounds like over-engineering to me. They could have picked that up as pattern match just fine.

tannergooding · 2020-09-25T17:45:29Z

They could have picked that up as pattern match just fine.

And I believe they do, but its the same fundamental problem as with RotateLeft and RotateRight, which is that there are a plethora of possible patterns and those may not be obvious, especially to novice/beginner programmers.

On the other hand, a method provides an explicit way to opt into the functionality, know that it will work (there probably isn't some edge case they are forgetting around bit manipulation), and that it will most likely be efficient for the given platform.
I believe this is why we have 13 functions across dotnet (which grant-d found, at least) covering these "concise and easy to read" patterns: #27382 (comment). That is, they exist because the API is that much more concise, easy to read, and allows logic updates without finding everywhere you wrote the pattern.

It also solves the problem of matching a more complex pattern where you want to simultaneously branch based off the existing value of a bit while simultaneously setting it to a new value.

jkotas · 2020-09-25T18:16:37Z

RotateLeft and RotateRight are fine with me. The pattern to write that using language operators is not concise, and there are multiple ways to write that.

I believe this is why we have 13 functions across dotnet (which grant-d found, at least) covering these "concise and easy to read" patterns

The most common ones like RotateLeft / RotateRight, PopCount, etc. are addressed already.

The remaining ones are the bit tests: InsertBit, ClearBit, etc. They show a lot less occurences. I have done a few searches for patterns like & (1 <<) or | (1 <<. Based on these searches, I believe that it is about 100x more common for people to use the these direct patterns than to wrap it in the helper method.

grant-d · 2020-09-25T20:07:21Z

I'd like to move forward and mark this as ready-for-review. However, at the very least BitOps needs to become BitOperations, as that is the name of the already approved class. CC @grant-d

Thanks for the ping Tanner, updated the spec's name to BitOperations

grant-d · 2020-09-25T20:14:32Z

FYI, here's my previous research on duplicate functions for each operation.
A year+ old so may be more (or less) now:
https://github.com/dotnet/corefx/issues/32269#issuecomment-457689128

eg, in ImmutableHashMap:

            [Pure]
            private static uint InsertBit(int position, uint bits)
            {
                Debug.Assert(0 == (bits & (1u << position)));
                return bits | (1u << position);
            }

jkotas · 2020-09-25T20:18:29Z

This is a good example: It asserts and sets the bit. We would not be able to replace it with the proposed method, without losing the debug validation.

BreyerW · 2020-09-30T23:02:32Z

A question: could be possible to include optimized array/span based versions? Use case is bitmask of arbitrary length backed by array of uints (likely, like BigInteger but this class isnt specially optimized for bitmasking and doesnt expose setting bits by index/position).

If theres no extra gains (like even more vectorization) then forget my question, it will be better to simply implement BitMask on top of current API proposal.

CarolEidt · 2020-09-30T23:31:35Z

@CarolEidt, you raised some concerns around taking ref above, is that still the case given the non mutating variants? If it is, what is the proposed mechanism to handle that _value is mutated and the original value needs to be returned, is it a value tuple like (int newValue, bool originalBitValue)?

IMO the best API for these cases is a value tuple. However, admittedly I haven't yet had the time to get my PR #37928 working, which will enable the best codegen for these cases.

If the ref API is preferred, the JIT can still take one of two avenues to achieve the same result:

Implement an internal intrinsic that returns a value tuple, and have the ref version call that.
Implement the ref API by effectively creating a value tuple in the JIT. However, this approach requires either creating a new struct type in the JIT, which is likely to be significantly more complicated.

could be possible to include optimized array/span based versions?

I would think that would be doable fully in C#, if it was considered desirable.

tannergooding · 2021-03-27T18:05:21Z

Lack of bit test and reset is what I was able to root cause some of the last missing perf gains down to in: davepl/Primes#6

Effectively, for a ClearBit function:

      void ClearBit(unsigned int index)
      {
          if (index % 2 == 0)
          {
              printf("You're setting even bits, which is sub-optimal.\n");
              return;
          }
          index = index / 2;
          rawbits[index / 8] &= ~(1 << (index % 8));
      }

vs C#:

private static void ClearBit(ref byte rawbits, uint index)
    {
        if ((index % 2) == 0)
        {
            Console.WriteLine("You are setting even bits, which is sub-optimal");
        }
        else
        {
            index /= 2;
            Unsafe.Add(ref rawbits, (nint)(index / 8)) &= (byte)~(1u << (int)(index % 8));
        }
    }

You get the following codegen for cpp:

this$ = 8
index$ = 16
void prime_sieve::ClearBit(unsigned int) PROC                    ; prime_sieve::ClearBit, COMDAT
        mov     r8d, edx
        test    dl, 1
        jne     SHORT $LN2@ClearBit
        lea     rcx, OFFSET FLAT:`string'
        jmp     printf
$LN2@ClearBit:
        shr     r8d, 1
        mov     edx, r8d
        and     r8d, 7
        shr     rdx, 3
        add     rdx, QWORD PTR [rcx+8]
        movzx   eax, BYTE PTR [rdx]
        btr     eax, r8d
        mov     BYTE PTR [rdx], al
        ret     0

as compared to C#:

    L0000: test dl, 1
    L0003: jne short L0017
    L0005: mov rcx, 0x2489d6296b8
    L000f: mov rcx, [rcx]
    L0012: jmp System.Console.WriteLine(System.String)
    L0017: shr edx, 1
    L0019: mov eax, edx
    L001b: shr eax, 3
    L001e: add rax, rcx
    L0021: mov ecx, edx
    L0023: and ecx, 7
    L0026: mov edx, 1
    L002b: shl edx, cl
    L002d: not edx
    L002f: movzx edx, dl
    L0032: and [rax], dl
    L0034: ret

This results in a much bigger inner loop and in worse overall codegen.

jkotas · 2021-03-28T02:35:51Z

Lack of bit test and reset

This looks more like a missing optimization in the JIT, similar to #6815

xtqqczze · 2023-05-24T03:31:19Z

Merged in #83400:

runtime/src/libraries/System.Private.CoreLib/src/System/Numerics/BitOperations.cs

Lines 947 to 959 in c96c15c

    
           /// <summary> 
        
           /// Flip the bit at a specific position in a given value. 
        
           /// Similar in behavior to the x86 instruction BTC (Bit Test and Complement). 
        
           /// </summary> 
        
           /// <param name="value">The value.</param> 
        
           /// <param name="index">The zero-based index of the bit to flip. 
        
           /// Any value outside the range [0..31] is treated as congruent mod 32.</param> 
        
           /// <returns>The new value.</returns> 
        
           [MethodImpl(MethodImplOptions.AggressiveInlining)] 
        
           internal static uint FlipBit(uint value, int index) 
        
           { 
        
               return value ^ (1u << index); 
        
           }

grant-d changed the title ~~Expose Bit Manipulation functions~~ Proposal: Expose Bit Manipulation functions Sep 13, 2018

msftgits transferred this issue from dotnet/corefx Jan 31, 2020

msftgits added this to the 5.0 milestone Jan 31, 2020

maryamariyan added the untriaged New issue has not been triaged by the area owner label Feb 23, 2020

shaggygi mentioned this issue Apr 11, 2020

Proposal: Bitwise primary operators dotnet/csharplang#3352

Closed

4 tasks

tannergooding added area-System.Numerics and removed area-System.Runtime untriaged New issue has not been triaged by the area owner labels Jun 23, 2020

tannergooding modified the milestones: 5.0.0, Future Jun 23, 2020

tannergooding mentioned this issue Mar 27, 2021

Adding a faithful implementation of the C# solution PlummersSoftwareLLC/Primes#6

Merged

huoyaoyuan mentioned this issue Jul 28, 2022

JIT: Recognize 'bt' bit test idiom #72986

Closed

TIHan mentioned this issue Feb 2, 2023

[JIT] X64 - Recognize patterns to emit bts, btc, btr instructions #81512

Open

Proposal: Expose Bit Manipulation functions #27382

Proposal: Expose Bit Manipulation functions #27382

Comments

grant-d commented Sep 13, 2018 • edited Loading

Scope

Rationale and Usage

Proposed API

Details

Questions

Decisions

Sample call sites

Updates

grant-d commented Sep 13, 2018

tannergooding commented Sep 13, 2018

tannergooding commented Sep 13, 2018 • edited Loading

grant-d commented Sep 13, 2018 • edited Loading

omariom commented Sep 13, 2018 • edited Loading

grant-d commented Sep 13, 2018

Clockwork-Muse commented Sep 13, 2018

CarolEidt commented Sep 13, 2018

grant-d commented Sep 13, 2018 • edited Loading

grant-d commented Sep 13, 2018

danmoseley commented Sep 14, 2018

grant-d commented Sep 14, 2018

grant-d commented Sep 14, 2018

grant-d commented Sep 14, 2018 • edited Loading

grant-d commented Sep 17, 2018

jhudsoncedaron commented Sep 17, 2018

grant-d commented Sep 17, 2018 • edited Loading

tannergooding commented Sep 17, 2018

GSPP commented Sep 18, 2018

grant-d commented Sep 18, 2018 • edited Loading

jhudsoncedaron commented Sep 18, 2018

grant-d commented Sep 18, 2018

grant-d commented Sep 18, 2018

jhudsoncedaron commented Sep 18, 2018 • edited Loading

grant-d commented Sep 18, 2018

tannergooding commented Sep 18, 2018

jhudsoncedaron commented Sep 18, 2018

CarolEidt commented Sep 18, 2018

grant-d commented Sep 18, 2018 • edited Loading

jkotas commented Feb 19, 2019

grant-d commented Feb 19, 2019

grant-d commented Feb 19, 2019

jkotas commented Feb 19, 2019

ghost commented Jun 23, 2020

tannergooding commented Sep 25, 2020 • edited Loading

tannergooding commented Sep 25, 2020

jkotas commented Sep 25, 2020

tannergooding commented Sep 25, 2020

jkotas commented Sep 25, 2020

grant-d commented Sep 25, 2020 • edited Loading

grant-d commented Sep 25, 2020 • edited Loading

jkotas commented Sep 25, 2020

BreyerW commented Sep 30, 2020 • edited Loading

CarolEidt commented Sep 30, 2020

tannergooding commented Mar 27, 2021 • edited Loading

jkotas commented Mar 28, 2021

xtqqczze commented May 24, 2023 • edited Loading

grant-d commented Sep 13, 2018 •

edited

Loading

tannergooding commented Sep 13, 2018 •

edited

Loading

grant-d commented Sep 13, 2018 •

edited

Loading

omariom commented Sep 13, 2018 •

edited

Loading

grant-d commented Sep 13, 2018 •

edited

Loading

grant-d commented Sep 14, 2018 •

edited

Loading

grant-d commented Sep 17, 2018 •

edited

Loading

grant-d commented Sep 18, 2018 •

edited

Loading

jhudsoncedaron commented Sep 18, 2018 •

edited

Loading

grant-d commented Sep 18, 2018 •

edited

Loading

tannergooding commented Sep 25, 2020 •

edited

Loading

grant-d commented Sep 25, 2020 •

edited

Loading

grant-d commented Sep 25, 2020 •

edited

Loading

BreyerW commented Sep 30, 2020 •

edited

Loading

tannergooding commented Mar 27, 2021 •

edited

Loading

xtqqczze commented May 24, 2023 •

edited

Loading