Bit Fields and member order. #1390

sniggyfigbat · 2025-05-01T00:56:58Z

A few days ago, I watched this CppCon24 lightning talk from Miodrag Djukic, in which he just lists all the syntactic uses of . and : in C++, and which I found quite unreasonably interesting. In particular, it educated me as to a language feature I'd somehow never come across in 9 years of C++ coding, which is bit fields.

I'm pretty taken by bit fields, as they solve a problem I was vaguely aware existed but didn't have a fix for. I am, of course, also aware that there are good reasons not to use them most of the time, same as packing multiple bools into a char, as it's slower for the CPU to unpack and process than using the default alignments (N.B.). But still, for some data-density-critical purposes that's a worthwhile trade-off (e.g. network packets).

All this got me thinking, how does the cpp2 syntax represent this feature? I had a look through the wiki and the existing discussions, and it doesn't seem like anyone else has covered it.

Mandatory performative humility disclaimer: I am but a humble junior developer who only discovered that bit-fields existed last week. My opinions given here are intended as an exploration of the topic and a starting point for discussion, and there's a good chance I'm wrong about literally everything, etc etc etc.

With that disclaimer given, I shall state what seems to me to be the obvious: bit fields are great functionality wrapped in a truly heinous syntax. As best I can tell they're inherited from C, and this makes sense because they seem diametrically opposed to (as I interpret them) the design sensibilities of today's EWG. They overload an existing symbol in a way which isn't intuitively similar to previous uses, and add an entire exceptional syntax quirk, just to do one rather niche thing.

So, how would we implement this functionality today? Maybe I've completely misunderstood the philosophy here, but I would presume an attribute:

class A {
    unsigned int m_currentCpp1Syntax : 3;          // today's syntax: a 3-bit field
    unsigned int m_betterCpp1Syntax [[width(3)]];  // hypothetical attribute, no such standard attribute exists
};
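For readers who, like me a week ago, have not met bit fields before, here is a minimal sketch of the existing C++ feature in use. The struct and function names are made up for illustration. How the fields are actually laid out in memory is implementation-defined, so the size comparison is what mainstream compilers typically give, not a guarantee.

```cpp
#include <cassert>

// Standard C++ bit-fields: three values that need 3 + 5 + 1 = 9 bits in total.
struct PackedFlags {
    unsigned int level   : 3;  // can hold 0..7
    unsigned int channel : 5;  // can hold 0..31
    unsigned int enabled : 1;  // can hold 0..1
};

// The same members without bit-fields, for size comparison.
struct PlainFlags {
    unsigned int level;
    unsigned int channel;
    unsigned int enabled;
};

// Stores a value in the 3-bit field and returns what is read back.
// Assumption: for out-of-range values, mainstream compilers keep the low 3 bits.
inline unsigned int store_level(unsigned int value) {
    PackedFlags f{};
    f.level = value;
    return f.level;
}
```

On a typical desktop compiler `sizeof(PackedFlags)` is one `unsigned int` while `sizeof(PlainFlags)` is three, which is the density win being discussed.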

This is already an improvement to my mind, but adding attributes is still more like a language feature than a library feature. An alternative worth considering is that cpp2 could tackle it with the same approach that Herb showed for property back in his 2017 talk.

B : @densely_packed_container type = {
    m_possibleCpp2Solution : densely_packed_member<unsigned int, 3>;
}

@densely_packed_container could also automatically turn all bool members into 1-bit uints, too, potentially.

However, I'm not sure if this is a solid approach. I liked it initially as it feels in tune with how cpp2 expresses things, but upon consideration it may not be viable. As I understand it, metafunctions are fundamentally about translating cpp2 code into more complex cpp2 code, but this actually requires expressing something in cpp1 syntax. Presumably that means it'd need to be built in at the level of cppfront itself, and I don't know either way if that's the right path forward.

Thinking about this language feature also made me start wondering about the overall problem it's addressing. Bit fields are useful as part of the solution for "let's pack this data into the smallest possible memory footprint", but they're not the whole story. The much bigger issue is member variable ordering, and that (as I understand it) declaring members in a logical-for-human-reading manner can be suboptimal. A type definition with members running int, bool, int, bool might make sense contextually to the programmer, but it will cause the compiler to create a type which takes up 16 bytes instead of the optimal 12.
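The padding effect described above can be checked with a small sketch. The struct names are hypothetical, and the 16 versus 12 byte figures assume a typical platform where `int` is 4 bytes with 4-byte alignment and `bool` is 1 byte; the standard does not guarantee those numbers.

```cpp
#include <cstddef>

// Members declared in "human" order: each bool is followed by padding
// so that the next int (or the next array element) stays aligned.
struct Interleaved {
    int  a;
    bool b;
    int  c;
    bool d;
};

// Same members, grouped so the two bools share one padded slot.
struct Grouped {
    int  a;
    int  c;
    bool b;
    bool d;
};

constexpr std::size_t interleaved_size = sizeof(Interleaved);
constexpr std::size_t grouped_size     = sizeof(Grouped);

// Holds on the usual ABIs; treat it as an expectation, not a language rule.
static_assert(grouped_size <= interleaved_size,
              "grouping members by alignment should not make the type bigger");
```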

It also seems like a strange semantic overlap to me that what order one declares members sets the default order for both memory layout and member initialisation (which is overridable in the initialiser list, but then some IDEs complain about it), despite those being fundamentally different concepts.

In any case, reordering variables from smallest to largest is a problem metafunctions could fix very easily. Except, here we're at odds with another core pillar of cpp2, which is that of sensible defaults. Repacking types to be laid out without unnecessary padding seems like a no-brainer to me; sure, the cpp2 library could include a @reorder_members_for_optimal_packing metafunction, but then the guidance would be "put this metafunction on basically everything except in very rare cases".

A better solution, as I see it, is to make optimal-member-variable-reordering a cppfront-level feature, and include a [[explicit_member_order]] attribute to disable that behaviour. That disambiguates layout from initialisation order. The programmer can then declare members in the order which is easiest for humans to decipher, use the constructor's initialiser list to declare non-standard member initialisation order in the rare cases that's needed, and rely on cppfront to handle memory layout.

Anyway, that's as far as my thinking had got on the topic. I'm not at all sure I'm right in my assumptions or reasoning, let alone my conclusions, but I do think there's some potentially valuable questions in this area. I'd be interested to hear if others have a different take.

gregmarr · 2025-05-01T17:14:15Z

"what order one declares members sets the default order for member initialisation (which is overridable in the initialiser list, but then some IDEs complain about it)"

FYI, the IDEs and compilers complain about it precisely because it ISN'T overridable in the constructor initializer list. Members are still initialized in the order specified in the class definition, no matter which order you list them in the initializer list. So it may look like you are using one already-initialized variable in the initialization of another because of the order in the constructor initializer list, but you actually aren't, because they're processed in the order in the class definition. It is directly analogous to local variables being constructed in the order they appear in the source code and destructed in the reverse order.
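A small sketch that makes the rule above observable. `init_log`, `traced` and `Widget` are hypothetical names for illustration; the pragmas only silence the GCC/Clang reorder warning that this code triggers on purpose.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Records the order in which members are actually initialised.
inline std::vector<std::string>& init_log() {
    static std::vector<std::string> log;
    return log;
}

// Logs the member name, then passes the value through.
inline int traced(const char* name, int value) {
    init_log().push_back(name);
    return value;
}

#if defined(__GNUC__)
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wreorder"
#endif
struct Widget {
    int first;
    int second;

    // The initializer list names `second` before `first`,
    // but the declaration order above decides what runs first.
    Widget() : second(traced("second", 2)), first(traced("first", 1)) {}
};
#if defined(__GNUC__)
#pragma GCC diagnostic pop
#endif
```

Constructing a `Widget` appends "first" and then "second" to the log, despite the order written in the constructor.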

1 reply

sniggyfigbat May 1, 2025
Author

Huh, I had no idea! That... seems like another non-ideal behaviour, where we can't express an intent clearly to the compiler of desiring different layout and init orders.

Upon another day's reflection, I'm wondering if perhaps there are deep CPU-magic speed reasons why initialisation should be done strictly in the same order as layout – I guess it's more cache optimal for large objects?

DyXel · 2025-05-02T10:51:21Z

I think that bitfields could be implemented in an elegant manner in cppfront/cpp2. Let's take this (a real use case) as an example: IP header's layout in the Linux kernel, also Wikipedia's article about it. Ideally, I would like to write something like this:

ip_header: @bitfield type =
{
	version  : 4  ;
	ihl      : 4  ;
	tos      : 8  ;
	tot_len  : 16 ;
	id       : 16 ;
	flags    : 3  ;
	frag_off : 13 ;
	ttl      : 8  ;
	protocol : 8  ;
	checksum : 16 ;
	src_addr : 32 ;
	dst_addr : 32 ;
	_        : 1  ; // Padding
}

For which the appropriate setters and getters would be generated, including checking (in debug mode / with contracts perhaps) that the values will fit the bitfields, as well as the appropriate array of values which actually represents the data in memory, similar to how @union is implemented.
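To make the idea of generated getters and setters over a backing array more concrete, here is a rough C++ sketch of what such output could look like for the first few fields. This is not what cppfront generates, since no @bitfield metafunction exists; `bit_storage` and `ip_header_sketch` are hypothetical names, and the most-significant-bit-first numbering is an assumption chosen to match network byte order.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: a byte array addressed by bit offset, MSB first.
template <std::size_t TotalBits>
class bit_storage {
    std::array<std::uint8_t, (TotalBits + 7) / 8> bytes_{};

public:
    // Reads `width` bits (at most 32) starting at bit `offset`.
    std::uint32_t get(std::size_t offset, std::size_t width) const {
        std::uint32_t value = 0;
        for (std::size_t i = 0; i < width; ++i) {
            const std::size_t bit = offset + i;
            const std::uint32_t b = (bytes_[bit / 8] >> (7 - bit % 8)) & 1u;
            value = (value << 1) | b;
        }
        return value;
    }

    // Writes `width` bits; the assert is the debug-mode "does it fit" check.
    void set(std::size_t offset, std::size_t width, std::uint32_t value) {
        assert(width >= 32 || value < (std::uint32_t{1} << width));
        for (std::size_t i = 0; i < width; ++i) {
            const std::size_t bit = offset + i;
            const bool on = ((value >> (width - 1 - i)) & 1u) != 0;
            const auto mask = static_cast<std::uint8_t>(1u << (7 - bit % 8));
            std::uint8_t& byte = bytes_[bit / 8];
            byte = static_cast<std::uint8_t>(on ? (byte | mask) : (byte & ~mask));
        }
    }

    const std::array<std::uint8_t, (TotalBits + 7) / 8>& bytes() const {
        return bytes_;
    }
};

// Hand-written stand-in for generated accessors (only some fields shown).
class ip_header_sketch {
    bit_storage<160> bits_;  // 160 bits: the fields listed above, minus the padding entry

public:
    std::uint32_t version() const { return bits_.get(0, 4); }
    void set_version(std::uint32_t v) { bits_.set(0, 4, v); }

    std::uint32_t ihl() const { return bits_.get(4, 4); }
    void set_ihl(std::uint32_t v) { bits_.set(4, 4, v); }

    std::uint32_t tot_len() const { return bits_.get(16, 16); }
    void set_tot_len(std::uint32_t v) { bits_.set(16, 16, v); }

    const std::array<std::uint8_t, 20>& bytes() const { return bits_.bytes(); }
};
```

With this layout, setting version to 4 and ihl to 5 yields a first byte of 0x45, the familiar start of an IPv4 packet. A real implementation would likely use masks and shifts per field rather than a bit-by-bit loop; the loop is only here to keep the sketch short.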

Today it is not possible to write the @bitfield metafunction as used above, because when defining a member of a type, the "type" of that member must be a valid identifier, and a number is not a valid identifier. A workaround might be to tag each member, e.g. version : i<4>; or just version : i4;, but IMO that adds clutter. It does allow you to specify whether something should be signed, unsigned, etc., but it's not as concise.

2 replies

jcanizales May 5, 2025

Maybe the metafunction can bring into the scope of the type a bunch of i1, i2, i3, ... , u1, u2, u3, ... aliases?

DyXel May 5, 2025

Yeah, you can parse those, since (at least today) what you get is a string/identifier of the type. But again, not as concise (and familiar) as just declaring the number of bits. But actually, being explicit about what the bits represent might be better, who knows 😛

DyXel · 2025-05-02T11:01:36Z

As for packing and/or ordering members: if cppfront had full knowledge of how things will be laid out in memory, writing a metafunction that automatically reorders and packs a type's members would be possible, but that's not the case today. I would also add that just reordering members might not accomplish anything, or might in fact be harmful for performance (always measure!). Ideally, there would be a set of metafunctions specifically to design and profile the optimal layout you can have, but you would still have to make tradeoffs between ABI stability and performance. Also, I don't see a way to implement the fabled Struct of Arrays metafunction today; it seems it would also require changing your logic to accommodate it, as opposed to being transparent to the user.
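As an illustration of why that kind of layout change is hard to make transparent, here is a hand-written sketch of the same data stored as an array of structs and as a struct of arrays. All names are hypothetical and nothing here is generated by cppfront; the point is only that the calling code differs between the two.

```cpp
#include <cstddef>
#include <vector>

// Array-of-structs: the natural way to write it.
struct Particle {
    float x;
    float y;
    bool  alive;
};
using ParticlesAoS = std::vector<Particle>;

// Struct-of-arrays: one contiguous array per member.
struct ParticlesSoA {
    std::vector<float> x;
    std::vector<float> y;
    std::vector<char>  alive;  // char instead of bool to avoid std::vector<bool>

    void push_back(const Particle& p) {
        x.push_back(p.x);
        y.push_back(p.y);
        alive.push_back(p.alive ? 1 : 0);
    }

    std::size_t size() const { return x.size(); }
};

// AoS version: iterates over whole objects.
inline float sum_x(const ParticlesAoS& ps) {
    float total = 0.0f;
    for (const Particle& p : ps) total += p.x;
    return total;
}

// SoA version: touches only the x array, but the loop is written differently.
inline float sum_x(const ParticlesSoA& ps) {
    float total = 0.0f;
    for (float v : ps.x) total += v;
    return total;
}
```

Code that takes a `Particle&` to a single element has no direct equivalent in the second form, which is the sort of logic change mentioned above.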

0 replies
