Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: Expose Bit Manipulation functions #27382

Open
3 of 5 tasks
grant-d opened this issue Sep 13, 2018 · 105 comments
Open
3 of 5 tasks

Proposal: Expose Bit Manipulation functions #27382

grant-d opened this issue Sep 13, 2018 · 105 comments
Labels
api-needs-work API needs work before it is approved, it is NOT ready for implementation area-System.Numerics
Milestone

Comments

@grant-d
Copy link
Contributor

grant-d commented Sep 13, 2018

Bit manipulation routines are common enough that we should expose a subset as platform primitives.
While some of them may be simple to write, it's much harder to achieve the requisite performance desired, especially since they tend to be used within tight loops.

The aim of this proposal is to scope & design a minimal set of functions, to be implemented with a bias towards performance. Per @tannergooding

The point of these APIs is to provide a general-purpose API that works on all platforms (which means providing a software fallback) and is generally-usable. Hardware Intrinsics are for performance oriented scenarios where you require hardware acceleration and need more direct control of the code that is emitted.

Note that even though some of the formula may be simple, relevant callsites are more self-documenting when using the intrinsics (should the dev choose to use them).

Scope

Rationale and Usage

The proposed functions are already implemented throughout the stack, often with different algorithms, performance characteristics and test coverage.
Existing callsites below: https://github.com/dotnet/corefx/issues/32269#issuecomment-457689128
(There is likely to be more; the initial search was timeboxed to ~1 hour)

Some of the implementation have suboptimal performance or bugs. Something like ExtractBit is trivial to implement, but PopCount is more complex and thus prone to logic and performance issues. Hiding these complex formulae behind friendly signatures makes using them more approachable.

Here's an example of a function (BTC) whose signature is simple but the algebra is easy to get wrong.

[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static bool ComplementBit(ref uint value, int bitOffset)
{
    uint mask = 1u << bitOffset;
    bool btc = (value & mask) != 0;

    value = ~(~mask ^ value);

    return btc;
}

However making a call to it meets our goal of abstraction and performance:

uint value = 123;
bool previouslyTrue = BitOperations.ComplementBit(ref value, 6);

Proposed API

The proposed API is purposefully kept lean. We can add more methods in later design iterations. We should view this as an opportunity to get simple, base functionality out the door and not stray into the dangerous territory of adding every bit twiddling hack that exists.

Assume all methods are decorated with [MethodImpl(MethodImplOptions.AggressiveInlining)]

public static class BitOperations
{
    // BT
    bool ExtractBit(byte value, int bitOffset); // Could name this BitTest or TestBit
    bool ExtractBit(uint value, int bitOffset);
    bool ExtractBit(int value, int bitOffset);

    // BTS (scalar)
    byte InsertBit(byte value, int bitOffset); // BitSet or SetBit
    uint InsertBit(uint value, int bitOffset);
    int InsertBit(int value, int bitOffset);

    // True BTS (returns original value)
    bool InsertBit(ref byte value, int bitOffset);
    bool InsertBit(ref uint value, int bitOffset);
    bool InsertBit(ref int value, int bitOffset);

    // BTR
    byte ClearBit(byte value, int bitOffset); // BitReset or ResetBit
    uint ClearBit(uint value, int bitOffset);
    int ClearBit(int value, int bitOffset);

    bool ClearBit(ref byte value, int bitOffset);
    bool ClearBit(ref uint value, int bitOffset);
    bool ClearBit(ref int value, int bitOffset);

    // BTC
    byte ComplementBit(byte value, int bitOffset);
    uint ComplementBit(uint value, int bitOffset);
    int ComplementBit(int value, int bitOffset);

    bool ComplementBit(ref byte value, int bitOffset);
    bool ComplementBit(ref uint value, int bitOffset);
    bool ComplementBit(ref int value, int bitOffset);

    // on ? BTS : BTR
    byte WriteBit(byte value, int bitOffset, bool on);
    uint WriteBit(uint value, int bitOffset, bool on);
    int WriteBit(int value, int bitOffset, bool on);

    bool WriteBit(ref byte value, int bitOffset, bool on);
    bool WriteBit(ref uint value, int bitOffset, bool on);
    bool WriteBit(ref int value, int bitOffset, bool on);
}

Details

  • The focus will be on performance.
  • Ultimately, most of the code should be branchless and leverage intrinsics where possible.
  • Somewhat of an overlap in functionality between InsertBit/ClearBit and WriteBit where the latter conditionally executes the equivalent of either former (using twiddling to do so without branching). But there's enough twiddling in Write to avoid any branching that it's maybe worth keeping both variants.
  • Since these functions are used in performance-sensitive applications, we avoid input checking and exceptions. The current design tries to dispense with the requirement by sharpening input types, contractual assumptions (eg out-of-bounds offset will use mod n in some functions or result in a no-op in others) and specific design choices.

Questions

  • What additional sizes & signs of integers do we support, and in which methods? It looks like int, uint and byte are commonly used. Anything else?

Decisions

  • What namespace this should be in. Decision: namespace System.Numerics.
  • The class name of BitOps is self-documenting, terse (it might be specified frequently in a callsite, if not aliased with using) and both terms are well-known names or abbreviations. Decision: BitOperations
  • Decision: We favor well-known names over the alternatives. For example, PopCount could be called CountSetBits but that's not the common lingo used by twiddlers. Furthermore, intrinsics already expose the well-known names, eg Popcnt.PopCount.
  • Likewise, method names should also be as concise as possible, for example, TrailingZeroCount may be more concisely described as TrailingZeros. Decision: TrailingZeroCount already chosen by a previous PR.
  • Decision: Count-oriented methods such as PopCount return (idiomatic) int, not uint
  • Should offset or position parameters be int or uint. Latter is preferred since negatives not permitted regardless. Decision: int is an idiomatic input/output type in C#.
  • Log(0) is mathematically undefined. Should it return 0 or -1? Decision: Returns 0
  • Do we care about endianness (ie do we need BE and LE variants of relevant methods). The current proposal is only LE, and is shaped in such a way that we are not future-proof wrt supporting BE. Discussion proposes an alternative. For example, LeadingZeros might need a different algorithm on BE/LE platforms. Decision: Endianess only matters if we are reinterpreting integers.

Sample call sites

The following samples are taken from the linked units, from the method BitOps_Samples.
The code chooses values that are easy to eyeball for correctness. The real units cover many more boundaries & conditions.

// ExtractBit: Reads whether the specified bit in a mask is set.
Assert.True(BitOps.ExtractBit((byte)0b0001_0000, 4));
Assert.False(BitOps.ExtractBit((byte)0b0001_0000, 7));

// InsertBit: Sets the specified bit in a mask and returns the new value.
byte dest = 0b0000_1001;
Assert.Equal(0b0010_1001, BitOps.InsertBit(dest, 5));

// InsertBit(ref): Sets the specified bit in a mask and returns whether it was originally set.
Assert.False(BitOps.InsertBit(ref dest, 5));
Assert.Equal(0b0010_1001, dest);

// ClearBit: Clears the specified bit in a mask and returns the new value.
dest = 0b0000_1001;
Assert.Equal(0b0000_0001, BitOps.ClearBit(dest, 3));
// ClearBit(ref): Clears the specified bit in a mask and returns whether it was originally set.
Assert.True(BitOps.ClearBit(ref dest, 3)); 
Assert.Equal(0b0000_0001, dest);

// ComplementBit: Complements the specified bit in a mask and returns the new value.
dest = 0b0000_1001;
Assert.Equal(0b0000_0001, BitOps.ComplementBit(dest, 3));
// ComplementBit(ref): Complements the specified bit in a mask and returns whether it was originally set.
Assert.True(BitOps.ComplementBit(ref dest, 3));
Assert.Equal(0b0000_0001, dest);

// WriteBit: Writes the specified bit in a mask and returns the new value. Does not branch.
dest = 0b0000_1001;
Assert.Equal(0b0000_0001, BitOps.WriteBit(dest, 3, on: false));
// WriteBit(ref): Writes the specified bit in a mask and returns whether it was originally set. Does not branch.
Assert.True(BitOps.WriteBit(ref dest, 3, on: false));
Assert.Equal(0b0000_0001, dest);

Updates

  • Original issue authored by @mburbea here: Proposal: Add a BitManipulation class
  • Initial proposal submitted
  • Methods such as InsertBit accepted a bool that determined whether it set or cleared the bit in question. Such methods have now been refactored in two, InsertBit and ClearBit.
  • Some name changes based on suggestions (eg FlipBit became ComplementBit).
  • TestAndSet methods have been refactored into simple scalar functions, for performance.
  • The offset parameter is int instead of byte. All POC units pass.
  • Added WriteBit(value, offset, bool on) overloads that conditionally set/clear the specified bit.
  • Added ExtractByte and friends
  • Added Evaluate(bool) (used internally, so may as well expose it)
  • Removed all Span<T> overloads per @tannergooding advice. Will maybe submit in separate proposal
  • Moved TrailingOnes and LeadingOnes into to do later proposal in comments
  • Added Log2
  • Removed redundant overloads
  • Removed doc-comments so spec is easier to read
  • Added byte overloads to all crud methods based on code analysis of existing callsites. eg eg bool ExtractBit(byte value, int bitOffset)
  • Added Scope section at top of spec
  • Separated calls into existing and proposed sections
  • Added task list, worded spec in a more concise manner
  • Moved RotateByte, etc into to do later proposal in comments
@grant-d
Copy link
Contributor Author

grant-d commented Sep 13, 2018

cc @mburbea, @tannergooding , @CarolEidt

@grant-d grant-d changed the title Expose Bit Manipulation functions Proposal: Expose Bit Manipulation functions Sep 13, 2018
@tannergooding
Copy link
Member

Please drop the in modifier from the parameters. These are all operating on primitives and passing them around by reference is going to be generally less efficient than just passing them by-value (which should happen in register).

@tannergooding
Copy link
Member

tannergooding commented Sep 13, 2018

Additional Comments/Thoughts:

  • Whether or not these methods take a ref and return the original value of the bit or just return the modified value will likely need more discussion (possibly just in the API design review).
  • I'm not sure I like the methods taking a bool.
    • InsertBit(value, offset, on) might be better as InsertBit(value, offset) and ResetBit(value, offset) (or ClearBit, ZeroBit, etc).
    • Likewise, LeadingCount and TrailingCount are likely better split into LeadingZeroCount, TrailingZeroCount, LeadingOneCount, and TrailingOneCount
  • I think NegateBit may be more "better" than FlipBit (or ComplementBit)
  • System.BitOps seems appropriate as a namespace (and lets this live next to BitConverter). Especially since several of these may get used in the Runtime/Framework itself
  • I imagine we will want to just provide overloads for the signed types

@grant-d
Copy link
Contributor Author

grant-d commented Sep 13, 2018

I'm not sure I like the methods taking a bool.

I went back and forth on that one. The bool makes the api tighter. But it means that ultimately the implementation has branching instructions, which means perf suffers. So I am happy to refactor.

Whether or not these methods take a ref and return the original value of the bit or just return the modified value will likely need more discussion (possibly just in the API design review).

I designed this based on a previous suggestion of yours, but I am happy to revert. Perhaps you meant ExchangeBit (BTS/BTR,BTC) to be a separate additional methods? Otherwise we could return a tuple but that has more overhead.

I think NegateBit may be more "better" than FlipBit (or ComplementBit)

I will make that change

@omariom
Copy link
Contributor

omariom commented Sep 13, 2018

@grant-d
public static bool InsertBit(ref uint value, byte offset);

Will JIT be able to optimize it properly if the value is an enregistered local?
Like in this code:

int M(uint value)
{
    return InsertBit(in value, 3) ? 5 :6; 
}

@grant-d
Copy link
Contributor Author

grant-d commented Sep 13, 2018

It needs to be a ref, ie:

private static int M(uint value) => InsertBit(ref value, 3, true) ? 5 : 6; 

I don't have enough knowledge of what the optimizer would do, someone else may be able to advise us.

@Clockwork-Muse
Copy link
Contributor

Um, ExtractBit takes byte and ushort, but neither of those are present for InsertBit/ClearBit (instead there's int and long)?

@CarolEidt
Copy link
Contributor

These need to be passed by value. Passing things around by ref or in will make it more difficult to optimize in the long run. If you need a variant that both modifies the value and returns whether a bit was set, it should be a different variant, and the "normal" form of InsertBit should just return the modified value.

@grant-d
Copy link
Contributor Author

grant-d commented Sep 13, 2018

Sounds good, I have made the change. @tannergooding, let me know if you think the BTS/BTR/BTC variants need to be in the first deliverable.

@grant-d
Copy link
Contributor Author

grant-d commented Sep 13, 2018

Will JIT be able to optimize it properly

It will now - I have changed all ref type methods to be scalar functions; see notes in the spec.

@danmoseley
Copy link
Member

API review probably won't like the contraction BitOps. BitOperations is not so bad or possibly Bits even...

@grant-d
Copy link
Contributor Author

grant-d commented Sep 14, 2018

I like Bits, I have added all of these to the naming suggestions.

@grant-d
Copy link
Contributor Author

grant-d commented Sep 14, 2018

If you need a variant that both modifies the value and returns whether a bit was set, it should be a different variant.

Done - both variants exist.

@grant-d
Copy link
Contributor Author

grant-d commented Sep 14, 2018

Note to self:

using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

@grant-d
Copy link
Contributor Author

grant-d commented Sep 17, 2018

@jhudsoncedaron
Copy link

This is pointless unless you also the attribute for AggressiveInlining as the function call overhead will eat your gains otherwise. Therefore, AggressiveInlining should be contractual on these.

@grant-d
Copy link
Contributor Author

grant-d commented Sep 17, 2018

AggressiveInlining should be contractual

The POC already does so. I will add a note to the spec

@tannergooding
Copy link
Member

This is pointless unless you also the attribute for AggressiveInlining as the function call overhead will eat your gains otherwise.

Many of these functions will be treated as intrinsic and will be specially handled by the JIT to emit a single instruction, regardless of what the actual method body contains for the software fallback.

The JIT is also very good at inlining small functions, regardless of whether or not the AggressiveInlining attribute exists.

@GSPP
Copy link

GSPP commented Sep 18, 2018

InsertBit(value, offset, on) is needed because sometimes the on value is dynamic. There should be separate SetBit and ClearBit methods.

@grant-d
Copy link
Contributor Author

grant-d commented Sep 18, 2018

@GSPP can this not be handled from the callsite.
if (on) SetBit(i) else ClearBit(i)
I am trying to keep the API surface as concise as possible, let me know if you have a compelling reason

@jhudsoncedaron
Copy link

@grant-d: InsertBit can be implemented branchlessly on any reasonably CPU.

@grant-d
Copy link
Contributor Author

grant-d commented Sep 18, 2018

InsertBit is already implemented without branching, see POC code.

@grant-d
Copy link
Contributor Author

grant-d commented Sep 18, 2018

The question was whether there could be a mutator that accepts a bool, then conditionally inserts or extracts. My preference is to externalize branching (at the callsite).

@jhudsoncedaron
Copy link

jhudsoncedaron commented Sep 18, 2018

And I'm saying the dynamic set-clear version can be implemented w/o branching.

ulong mask = 1 << i;
ulong bit = (ulong)on << i;
return (value & ~mask) | bit;

@grant-d
Copy link
Contributor Author

grant-d commented Sep 18, 2018

Got you. Thanks for the gist. I still am curious whether the 'dynamic' use case is compelling enough to have a somewhat overlapping API?

@tannergooding
Copy link
Member

@jhudsoncedaron, just because you can implement that in software, doesn't necessarily mean it is (necessarily) a good API to expose here or that it will be efficient.

Many of these functions will be treated as intrinsic, which means that they will will compile down to a single CPU instruction (rather than 5+ instructions). As an example, we can use the BSF (Bit Scan Forward), BSR (Bit Scan Reverse), BT (Bit Test), BTC (Bit Test and Complement), BTR (Bit Test and Reset), BTS (Bit Test and Set), LZCNT (Count the Number of Leading Zero Bits), and POPCNT (Return the Count of Number of Bits Set to 1) instructions on x86/x64 to make these super efficient.

Also, I don't think you can convert a bool to ulong. You will get error CS0030: Cannot convert type 'bool' to 'ulong'

@jhudsoncedaron
Copy link

I only ever use the dynamic version of InsertBit. That might not be valid C# but the algorithm is correct anyway. bools are stored as either a 1 or a 0 so it's just a matter of convincing the compiler to emit the right code. The harder this is to do, the stronger the argument for putting it in the library.

@CarolEidt
Copy link
Contributor

I tend to agree with @jhudsoncedaron that the dynamic version is useful. For this particular case if we really wanted to minimize API surface area I might vote for just the dynamic case, since I would expect the value to be either clearly a constant or not. And it would be easier for the jit to make that optimization than to recognize a sequence with branches.

@grant-d
Copy link
Contributor Author

grant-d commented Sep 18, 2018

[Edited]

I assumed the gist was pseudocode; in C# we can do one of the following.
Micro benchmarking shows better perf (0.6x) than the idiomatic expression.

return condition ? 37 : 0; // 1.0x (idiomatic)

return (*(byte*)&condition) * 37; // 0.95x (cannot be inlined)
return Unsafe.As<bool, byte>(condition) * 37; // 0.60x
return new UnionStruct { Bool = condition }.Value * 37; // 0.76x (ExplicitLayout, etc)

This relies on internal knowledge of bool but I guess that's no different to the rest of twiddling where we rely on internal knowledge of bit layouts? I recall (perhaps wrongly) something about VB (maybe v6, maybe .Net) using -1 for false. But since this is ultimately CLI, I guess that's not a concern.

That being said. I started the spec out with the dynamic signature, and followed a suggestion to move to explicit. I don't mind one or the other (or both) but we need a decision here. Maybe I should just add both and we can see which one gets downvoted?

@jkotas
Copy link
Member

jkotas commented Feb 19, 2019

@grant-d I would scope this issue down to just what we have in CoreLib and that is clearly useful. It will make the API review on this easy and it will allow us to make the most useful APIs public in .NET Core 3.0.

I would split the bit manipulation APIs into a separate issue that will be looked at and discussed separately, on its own schedule.

@grant-d
Copy link
Contributor Author

grant-d commented Feb 19, 2019

That's a great idea Jan. I had already split this into 2 asks in the summary, but I will create two separate issues instead

@grant-d
Copy link
Contributor Author

grant-d commented Feb 19, 2019

@jkotas
Copy link
Member

jkotas commented Feb 19, 2019

Lets get the subset from dotnet/corefx#35419 through first.

@msftgits msftgits transferred this issue from dotnet/corefx Jan 31, 2020
@msftgits msftgits added this to the 5.0 milestone Jan 31, 2020
@maryamariyan maryamariyan added the untriaged New issue has not been triaged by the area owner label Feb 23, 2020
@tannergooding tannergooding added area-System.Numerics and removed area-System.Runtime untriaged New issue has not been triaged by the area owner labels Jun 23, 2020
@ghost
Copy link

ghost commented Jun 23, 2020

Tagging subscribers to this area: @tannergooding
Notify danmosemsft if you want to be subscribed.

@tannergooding tannergooding modified the milestones: 5.0.0, Future Jun 23, 2020
@tannergooding
Copy link
Member

tannergooding commented Sep 25, 2020

I'd like to move forward and mark this as ready-for-review. However, at the very least BitOps needs to become BitOperations, as that is the name of the already approved class. CC @grant-d

It's worth noting that of the variants being exposed, byte doesn't have hardware support. The x86 hardware only accelerates 16, 32, and 64-bit variants. The proposal, as given, isn't exposing short, ushort, long, or ulong (nor sbyte), which I think is based on @jkotas' request to just trim this down to what S.P.Corelib is using, correct?

The native intrinsics return unsigned char (effectively bool) and take the first parameter by pointer, so I think taking by ref as a variant makes sense and fits in with how these are expected to be used (which is likely in a if (InsertBit(ref _value, 4)) like scenario). @CarolEidt, you raised some concerns around taking ref above, is that still the case given the non mutating variants?

  • If it is, what is the proposed mechanism to handle that _value is mutated and the original value needs to be returned, is it a value tuple like (int newValue, bool originalBitValue)?

@tannergooding
Copy link
Member

Meant to link to the native intrinsics above: https://docs.microsoft.com/en-us/cpp/intrinsics/bittest-bittest64?view=vs-2019

@jkotas
Copy link
Member

jkotas commented Sep 25, 2020

which I think is based on @jkotas' request to just trim this down to what S.P.Corelib is using, correct?

The original proposal had methods like PopCount that were already approved and that we had no intrinsic patterns for.

I do not see value in having method for bit tests, etc. I believe that the existing pattern for bit tests like x & (1 << 5) are very concise and easy to read for anybody deals with bits.

The fact that MSVC C++ introduced an intrinsic for bit tests sounds like over-engineering to me. They could have picked that up as pattern match just fine.

@tannergooding
Copy link
Member

They could have picked that up as pattern match just fine.

And I believe they do, but its the same fundamental problem as with RotateLeft and RotateRight, which is that there are a plethora of possible patterns and those may not be obvious, especially to novice/beginner programmers.

On the other hand, a method provides an explicit way to opt into the functionality, know that it will work (there probably isn't some edge case they are forgetting around bit manipulation), and that it will most likely be efficient for the given platform.
I believe this is why we have 13 functions across dotnet (which grant-d found, at least) covering these "concise and easy to read" patterns: #27382 (comment). That is, they exist because the API is that much more concise, easy to read, and allows logic updates without finding everywhere you wrote the pattern.

It also solves the problem of matching a more complex pattern where you want to simultaneously branch based off the existing value of a bit while simultaneously setting it to a new value.

@jkotas
Copy link
Member

jkotas commented Sep 25, 2020

RotateLeft and RotateRight are fine with me. The pattern to write that using language operators is not concise, and there are multiple ways to write that.

I believe this is why we have 13 functions across dotnet (which grant-d found, at least) covering these "concise and easy to read" patterns

The most common ones like RotateLeft / RotateRight, PopCount, etc. are addressed already.

The remaining ones are the bit tests: InsertBit, ClearBit, etc. They show a lot less occurences. I have done a few searches for patterns like & (1 <<) or | (1 <<. Based on these searches, I believe that it is about 100x more common for people to use the these direct patterns than to wrap it in the helper method.

@grant-d
Copy link
Contributor Author

grant-d commented Sep 25, 2020

I'd like to move forward and mark this as ready-for-review. However, at the very least BitOps needs to become BitOperations, as that is the name of the already approved class. CC @grant-d

Thanks for the ping Tanner, updated the spec's name to BitOperations

@grant-d
Copy link
Contributor Author

grant-d commented Sep 25, 2020

FYI, here's my previous research on duplicate functions for each operation.
A year+ old so may be more (or less) now:
https://github.com/dotnet/corefx/issues/32269#issuecomment-457689128

eg, in ImmutableHashMap:

            [Pure]
            private static uint InsertBit(int position, uint bits)
            {
                Debug.Assert(0 == (bits & (1u << position)));
                return bits | (1u << position);
            }

@jkotas
Copy link
Member

jkotas commented Sep 25, 2020

This is a good example: It asserts and sets the bit. We would not be able to replace it with the proposed method, without losing the debug validation.

@BreyerW
Copy link

BreyerW commented Sep 30, 2020

A question: could be possible to include optimized array/span based versions? Use case is bitmask of arbitrary length backed by array of uints (likely, like BigInteger but this class isnt specially optimized for bitmasking and doesnt expose setting bits by index/position).

If theres no extra gains (like even more vectorization) then forget my question, it will be better to simply implement BitMask on top of current API proposal.

@CarolEidt
Copy link
Contributor

@CarolEidt, you raised some concerns around taking ref above, is that still the case given the non mutating variants? If it is, what is the proposed mechanism to handle that _value is mutated and the original value needs to be returned, is it a value tuple like (int newValue, bool originalBitValue)?

IMO the best API for these cases is a value tuple. However, admittedly I haven't yet had the time to get my PR #37928 working, which will enable the best codegen for these cases.

If the ref API is preferred, the JIT can still take one of two avenues to achieve the same result:

  • Implement an internal intrinsic that returns a value tuple, and have the ref version call that.
  • Implement the ref API by effectively creating a value tuple in the JIT. However, this approach requires either creating a new struct type in the JIT, which is likely to be significantly more complicated.

could be possible to include optimized array/span based versions?

I would think that would be doable fully in C#, if it was considered desirable.

@tannergooding
Copy link
Member

tannergooding commented Mar 27, 2021

Lack of bit test and reset is what I was able to root cause some of the last missing perf gains down to in: davepl/Primes#6

Effectively, for a ClearBit function:

      void ClearBit(unsigned int index)
      {
          if (index % 2 == 0)
          {
              printf("You're setting even bits, which is sub-optimal.\n");
              return;
          }
          index = index / 2;
          rawbits[index / 8] &= ~(1 << (index % 8));
      }

vs C#:

private static void ClearBit(ref byte rawbits, uint index)
    {
        if ((index % 2) == 0)
        {
            Console.WriteLine("You are setting even bits, which is sub-optimal");
        }
        else
        {
            index /= 2;
            Unsafe.Add(ref rawbits, (nint)(index / 8)) &= (byte)~(1u << (int)(index % 8));
        }
    }

You get the following codegen for cpp:

this$ = 8
index$ = 16
void prime_sieve::ClearBit(unsigned int) PROC                    ; prime_sieve::ClearBit, COMDAT
        mov     r8d, edx
        test    dl, 1
        jne     SHORT $LN2@ClearBit
        lea     rcx, OFFSET FLAT:`string'
        jmp     printf
$LN2@ClearBit:
        shr     r8d, 1
        mov     edx, r8d
        and     r8d, 7
        shr     rdx, 3
        add     rdx, QWORD PTR [rcx+8]
        movzx   eax, BYTE PTR [rdx]
        btr     eax, r8d
        mov     BYTE PTR [rdx], al
        ret     0

as compared to C#:

    L0000: test dl, 1
    L0003: jne short L0017
    L0005: mov rcx, 0x2489d6296b8
    L000f: mov rcx, [rcx]
    L0012: jmp System.Console.WriteLine(System.String)
    L0017: shr edx, 1
    L0019: mov eax, edx
    L001b: shr eax, 3
    L001e: add rax, rcx
    L0021: mov ecx, edx
    L0023: and ecx, 7
    L0026: mov edx, 1
    L002b: shl edx, cl
    L002d: not edx
    L002f: movzx edx, dl
    L0032: and [rax], dl
    L0034: ret

This results in a much bigger inner loop and in worse overall codegen.

@jkotas
Copy link
Member

jkotas commented Mar 28, 2021

Lack of bit test and reset

This looks more like a missing optimization in the JIT, similar to #6815

@xtqqczze
Copy link
Contributor

xtqqczze commented May 24, 2023

Merged in #83400:

/// <summary>
/// Flip the bit at a specific position in a given value.
/// Similar in behavior to the x86 instruction BTC (Bit Test and Complement).
/// </summary>
/// <param name="value">The value.</param>
/// <param name="index">The zero-based index of the bit to flip.
/// Any value outside the range [0..31] is treated as congruent mod 32.</param>
/// <returns>The new value.</returns>
[MethodImpl(MethodImplOptions.AggressiveInlining)]
internal static uint FlipBit(uint value, int index)
{
return value ^ (1u << index);
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-needs-work API needs work before it is approved, it is NOT ready for implementation area-System.Numerics
Projects
None yet
Development

No branches or pull requests