Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

wrong order of bits in bit fields #307

Closed
AndreaOrru opened this issue Apr 5, 2017 · 20 comments
Closed

wrong order of bits in bit fields #307

AndreaOrru opened this issue Apr 5, 2017 · 20 comments
Labels
breaking Implementing this issue could cause existing code to no longer compile or have different behavior. bug Observed behavior contradicts documented or intended behavior
Milestone

Comments

@AndreaOrru
Copy link
Contributor

AndreaOrru commented Apr 5, 2017

const VGAEntry = packed struct {
    c:  u8,
    fg: u4,
    bg: u4,
};

Expected bit layout as implemented by C compilers (c = char bits, b = bg bits, f = fg bits):
cccccccc bbbbffff

Zig produces this instead:
cccccccc ffffbbbb

Is this expected behavior?

@thejoshwolfe
Copy link
Contributor

here's some relevant discussion: http://stackoverflow.com/questions/19376426/order-of-fields-when-using-a-bit-field-in-c

i didn't cross check the claims there, but i expect they're right that the C spec doesn't specify the order at all. so "as implemented by C compilers" means "the precedent established by popular C compilers", not "as defined by the C spec".

In practice though, that might be irrelevant. If code is already written in C, it might be nice to do simple syntactic updates to translate it to Zig rather than carefully moving bitfields around.

This brings up a serious question: what happens when two different drivers require two different implementations of how to layout sub-byte bits? for example, a {u12,u20} has one byte that's shared between the two fields. does the u12 get the top half or the bottom half of that byte? You can't just swap the order of the fields to resolve this, because the other bytes in the fields would move too. Are all drivers designed to expect the "precedent established by popular C compilers", or do some of them require the opposite decision?

This issue looks exactly analogous to endianness, except at the bit level. are bits specified from lowest to highest or highest to lowest? I would have expected highest to lowest, like big endian, because that's how humans think about it (as shown in your diagram). Does the behavior of popular C compilers actually depend on byte endianness, or is ot always "little endian" for bit order?

Proposal: allow specifying the endianness of a packed struct, and allow specifying the byte order and bit order separately.

@andrewrk
Copy link
Member

andrewrk commented Apr 5, 2017

Is this expected behavior?

Yes. The idea is that you are specifying the exact order of fields in the struct. The order you define in the struct is c, fg, bg, and that matches the bits.

A couple other things to note:

  • If a field greater than 8 bits is byte-aligned, it is represented in memory with the endianness of the host.
  • If the field is not byte-aligned, it is represented in memory in big-endian.

for example, a {u12,u20} has one byte that's shared between the two fields.

Are you talking about a union?

@AndreaOrru
Copy link
Contributor Author

AndreaOrru commented Apr 5, 2017

As long as this is expected and consistent, the current behavior is OK for me.

+1 for customizable endianness at bit and byte level though.

@andrewrk andrewrk changed the title Potentially wrong order of nibbles in packed structs proposal: ability to set endianness at byte and bit level of bitfields Apr 6, 2017
@andrewrk andrewrk added the enhancement Solving this issue will likely involve adding new logic or components to the codebase. label Apr 6, 2017
@andrewrk
Copy link
Member

Turns out in addition to byte-level endianness systems also have bit-level endianness. Current implementation of bit fields assumes big endian bits. Instead, we should have the same bit endianness as the byte endianness of the host. Then byte-alignment no longer makes a difference, and everything is as expected.

@andrewrk andrewrk changed the title proposal: ability to set endianness at byte and bit level of bitfields wrong order of bits in bit fields Apr 11, 2017
@andrewrk andrewrk added bug Observed behavior contradicts documented or intended behavior and removed enhancement Solving this issue will likely involve adding new logic or components to the codebase. labels Apr 11, 2017
@andrewrk andrewrk added this to the 0.1.0 milestone Apr 11, 2017
@AndreaOrru
Copy link
Contributor Author

AndreaOrru commented Apr 13, 2017

Here is a test case for x86(-64):

const io = @import("std").io;

const u4 = @intType(false, 4);

const Bitfields = packed struct {
    f1: u16,
    f2: u16,
    f3: u8,
    f4: u8,
    f5: u4,
    f6: u4,
    f7: u8,
};

pub fn main() -> %void {
    var all: u64 = 0x7765443322221111;
    var bitfields = *@ptrcast(&Bitfields, &all);

    %%io.stdout.printf("f1: {X}\n", u64(bitfields.f1));
    %%io.stdout.printf("f2: {X}\n", u64(bitfields.f2));
    %%io.stdout.printf("f3: {X}\n", u64(bitfields.f3));
    %%io.stdout.printf("f4: {X}\n", u64(bitfields.f4));
    %%io.stdout.printf("f5: {X}\n", u64(bitfields.f5));
    %%io.stdout.printf("f6: {X}\n", u64(bitfields.f6));
    %%io.stdout.printf("f7: {X}\n", u64(bitfields.f7));
}

Output:

f1: 1111
f2: 2222
f3: 33
f4: 44
f5: 6     // Wrong
f6: 5     // Wrong
f7: 77

@thejoshwolfe
Copy link
Contributor

we should have the same bit endianness as the byte endianness of the host.

wikipedia disagrees. it looks any of the 4 combinations are possible, except maybe little byte big bit. I think we need two separate endiannesses.

the proposal for configuring the endianness(es) of a packed struct can be a separate issue. we still need to get the default native behavior correct regardless.

@andrewrk andrewrk added the breaking Implementing this issue could cause existing code to no longer compile or have different behavior. label May 2, 2017
@andrewrk
Copy link
Member

The whole point of bit fields is to have a well defined in-memory layout. So we're going to have a way to specify byte endianness as well as bit-endianness. Here's the proposed syntax:

const Endian = @import("builtin").Endian;
const Bitfields = packed(Endian.Little, Endian.Big) struct {
    f1: u16,
    f2: u16,
    f3: u8,
    f4: u8,
    f5: u4,
    f6: u4,
    f7: u8,
}

So the packed keyword takes 2 arguments which is the byte endianness and the bit endianness, respectively.

You can leave off the bit endianness if it doesn't matter, e.g. if you don't have any sub-8-bit fields. You get a compile error if bit endianness is unspecified and it does matter.

You can leave off both the byte endianness and the bit endianness if both don't matter, e.g. if you don't have any sub-8-bit fields and you also don't have any fields greater than 1 byte. You get a compile error if the byte endianness is unspecified and it does matter.

Now consider 2 cases for bit fields:

  • Serialization / deserialization, e.g. with a network stream or a file on disk
  • OS dev, where bit/byte endianness depends on the architecture

In the first case you would hard code the parameters to the packed keyword, probably Endian.Big for both.

In the second case you would do something like this:

const builtin = @import("builtin");
const Bitfields = packed(builtin.byte_endian, builtin.bit_endian) struct {
    // ...
}

Pointers to fields with foreign endianness are not normal pointers, and cannot be used where normal pointers are expected.

@AndreaOrru
Copy link
Contributor Author

packed(builtin.byte_endian, builtin.bit_endian) wouldn't this be the default?

@andrewrk
Copy link
Member

andrewrk commented May 20, 2017

No, because this leads to accidentally non-portable code. If the endianness matters then you have to specify.

I can see this being annoyingly verbose in OS Dev, however, so here's another syntax proposal. This would be included in the builtin import:

const module = this;
pub const Struct = enum {
    Normal,
    Packed: Endian,

    pub const Endian = struct {
        byte: module.Endian,
        bit: module.Endian,
    };
};
pub const Endian = enum {
    Big,
    Little,
};
pub const byte_endian = Endian.Little; // set by target
pub const bit_endian = Endian.Big; // set by target
pub const NativePacked = Struct.Packed { Struct.Endian { .byte = byte_endian, .bit = bit_endian, } };

And then just like the proposal for setting the tag type of enums (See #305), you could pass a Struct argument to struct, like this:

const NativePacked = @import("builtin").NativePacked;
const Bitfields = struct(NativePacked) {
    f1: u16,
    f2: u16,
    f3: u8,
    f4: u8,
    f5: u4,
    f6: u4,
    f7: u8,
}

And then get rid of the packed keyword. Maybe this is a little more palatable.

@AndreaOrru
Copy link
Contributor Author

I like that.

@andrewrk
Copy link
Member

andrewrk commented Aug 27, 2017

It's not clear to me how to proceed on this yet. How do we know what is the bit endianness of a system? Does this concept even exist at the machine level?

I think we can find the answers to these questions but it's tricky. I'm moving it to 0.2.0.

Reminder to self:

  • update docs about structs to include bit fields
  • update docs about pointers to talk about non-byte-aligned pointers

@andrewrk
Copy link
Member

I think what we want here is just the ability to specify endianness of packed structs. And the default is the same endianness as the byte endianness, instead of big endian.

In short, non-byte-aligned fields are part of a larger byte-aligned field. Currently we always use big endian to put the sub-byte fields into the larger field. The proposal is to consider little or big endian when shifting bits into the larger field. I believe this would solve the problem as originally presented.

struct { }
struct(builtin.PackedLittleEndian) { }
struct(builtin.PackedBigEndian) { }
struct(builtin.PackedNativeEndian) { }

Given this, it's clear to me how to proceed.

@andrewrk andrewrk modified the milestones: 0.1.0, 0.2.0 Aug 28, 2017
@kyle-github
Copy link

Some protocols have fields that are split in two parts. Those parts are NOT contiguous in the protocol if it is viewed as a bitstream. Some protocols (and CPUs apparently) do not have big endian or little endian data. It is mixed depending on the size of the entity. Look at things like hardware page table entries for some "interesting" examples. ModBus-based PLCs are probably the worst.

As I found out with C, packed, byte order and bit order are all separate concepts but not entirely orthogonal from each other.

Using Zig's nifty(tm) compile time features, I think you can implement something very close to Perl's pack/unpack and get semi-automated marshalling/unmarshalling.

Reading through this and the questions I posted on the forum have made me rethink whether a struct is the right abstraction for this. There is so much platform/compiler dependent behavior here...

If you want to make a syntactical representation of a foreign bit/byte format in a Zig struct, is it really a struct? Perhaps this is a different entity (new) since the layout is so important to the meaning of the data?

@kyle-github
Copy link

Note that if you allow fields to be reordered by the compiler, you will not be able to support some kinds of polymorphism where you cast pointers to a base type in form of OO. This is somewhat common in C code. See issue #130.

I suspect that the compile time generic functions will take care of many of those cases, but not all.

@andrewrk
Copy link
Member

This issue is about packed structs. The layout of packed structs is completely defined by the programmer. Field re-ordering cannot happen for packed or extern structs.

@kyle-github
Copy link

As I mentioned in #488, packing is potentially orthogonal from reordering (as concepts).

@andrewrk andrewrk removed this from the 0.1.0 milestone Sep 18, 2017
@andrewrk
Copy link
Member

cc @skyfex this is fixed now.

@andrewrk
Copy link
Member

Here's an explanation for why this fix works and why we don't have to care about parent integer size:

consider:

a: u4
b: u4
c: u4
d: u4

parent integer could be 2 u8's or a u16

parent integer = u8
 * Little Endian

parent[0] = ba
parent[1] = dc

 * Big Endian

parent[0] = ab
parent[1] = cd

parent integer = u16
 * Little Endian

parent[0] = dcba

 * Big Endian

parent[0] = abcd


On LE for u8s you look at parent[1] then parent[0].
For u16 there is only parent[0]. so:
 u8: dcba
u16: dcba

On BE for u8s you look at parent[0] then parent[1].
For u16 there is only parent[0]. so:
 u8: abcd
u16: abcd

@alinebee
Copy link

alinebee commented Nov 25, 2020

Sorry for the issue necromancy, but wanted to note that sub-byte fields in packed structs are not easily portable under the current behaviour. Consider the following code to parse a 16-bit big-endian integer into 4-bit nibbles:

const std = @import("std");

test "endianness and packed structs" {
    const Nibbles = packed struct {
        nibble1: u4,
        nibble2: u4,
        nibble3: u4,
        nibble4: u4,
    };
    
    const big_endian: u16 = 0x1234;

    // ⚠️ This will be 0x3412 on little-endian, not 0x4321
    const native_endian: u16 = std.mem.bigToNative(u16, big_endian);
    
    const nibbles = @bitCast(Nibbles, native_endian);
    
    // ⚠️ These tests will succeed on big-endian but fail on little-endian,
    // because nibbles will be .{ 0x2, 0x1, 0x4, 0x3 }
    std.testing.expectEqual(@as(u4, 0x1), nibbles.nibble1);
    std.testing.expectEqual(@as(u4, 0x2), nibbles.nibble2);
    std.testing.expectEqual(@as(u4, 0x3), nibbles.nibble3);
    std.testing.expectEqual(@as(u4, 0x4), nibbles.nibble4);
}

This out-of-the-box behaviour feels pretty subtle and surprising, especially since it doesn't play nice with the behaviour of @byteSwap and std.mem.bigToNative/littleToNative.

As a workaround, one could conditionally define different versions of the struct with different field orders depending on endianness, but doing so is a lot of ceremony - especially if that struct namespaces any functions or globals.

@SpexGuy
Copy link
Contributor

SpexGuy commented Nov 25, 2020

This is one area where packed structs are currently not well defined. I don't think #3133 actually addresses this problem. #5049 is one attempt to address this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking Implementing this issue could cause existing code to no longer compile or have different behavior. bug Observed behavior contradicts documented or intended behavior
Projects
None yet
Development

No branches or pull requests

6 participants