Feature request: sub-byte-level editing #155

Closed
scgtrp opened this issue Mar 29, 2022 · 6 comments
Labels
feature A new feature
Comments

@scgtrp
Contributor

scgtrp commented Mar 29, 2022

Some file formats are specified in terms of non-byte-aligned bitfields instead of bytes. Sometimes they even cross byte boundaries. It would be neat if one could poke at the individual fields in these files.

Examples include instruction encodings (x86 has a lot of 3-bit fields), compression formats (gzip uses 3-bit headers on blocks and it gets weirder from there), and some image formats (webp starts off with reasonable byte-aligned headers and then suddenly 14-bit image width/height fields).
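
To make the "crosses byte boundaries" case concrete, here is a minimal sketch of reading a field at an arbitrary bit offset. It assumes LSB-first packing within each byte (the convention normally used when reading Deflate block headers); `read_bits_lsb` is a hypothetical helper name, not part of any existing tool.

```python
def read_bits_lsb(data: bytes, bit_offset: int, width: int) -> int:
    """Read `width` bits starting at `bit_offset`, least-significant bit first."""
    value = 0
    for i in range(width):
        pos = bit_offset + i
        # Bit 0 of a byte is its least-significant bit under this convention.
        bit = (data[pos // 8] >> (pos % 8)) & 1
        value |= bit << i
    return value


# Illustrative bytes only: 0xCB is 0b11001011, 0x48 is 0b01001000.
stream = bytes([0xCB, 0x48])

bfinal = read_bits_lsb(stream, 0, 1)    # 1-bit "final block" flag
btype = read_bits_lsb(stream, 1, 2)     # 2-bit block type
crossing = read_bits_lsb(stream, 6, 4)  # 4 bits straddling bytes 0 and 1
```

The third read is the interesting one for an editor: the field lives in the top two bits of one byte and the bottom two bits of the next, so no single hex digit corresponds to it.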

I have no idea what the UI for this would look like. I think the only reasonable approach is to default to normal bytes-displayed-in-hex view, but then allow the user to break up and reassemble the bits as needed (either manually or script-assisted).

@scgtrp
Contributor Author

scgtrp commented Mar 29, 2022

also I just saw #152 which seems like a similar goal accomplished slightly differently?

solemnwarning added the "feature" (A new feature) label Mar 29, 2022
@solemnwarning
Owner

So, I was only really thinking about transforming the underlying bytes in #152 (e.g. inverting all bits or something).

This would probably be a good fit for the data type mechanism and custom regions, although I have no idea how the UI for that would look considering the values aren't byte aligned/contained.

@solemnwarning
Owner

Alternatively, I could add something like the values tool panel, with options to mask/shift/etc. a value in or out of the underlying byte(s).
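
As a rough illustration of what "mask/shift a value in/out of bytes" could mean, here is a small sketch. It treats the selected bytes as one big-endian integer, which is an assumption for the example; `extract_field` and `insert_field` are hypothetical names.

```python
def extract_field(data: bytes, shift: int, width: int) -> int:
    """Shift the selected bytes right by `shift` bits and keep `width` bits."""
    word = int.from_bytes(data, "big")
    return (word >> shift) & ((1 << width) - 1)


def insert_field(data: bytes, shift: int, width: int, value: int) -> bytes:
    """Return a copy of `data` with the masked field replaced by `value`."""
    mask = ((1 << width) - 1) << shift
    word = int.from_bytes(data, "big")
    # Clear the old field, then OR in the new value (truncated to the mask).
    word = (word & ~mask) | ((value << shift) & mask)
    return word.to_bytes(len(data), "big")


selection = bytes([0xAB, 0xCD])
middle = extract_field(selection, shift=4, width=8)
patched = insert_field(selection, shift=4, width=8, value=0x12)
```

Bits outside the mask are left untouched, which is the property a panel like this would need when the field shares bytes with its neighbours.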

@solemnwarning
Owner

solemnwarning commented Aug 5, 2023

Okay, my plan here is to add not-whole-byte sized types and allow setting types/comments/etc on bit boundaries rather than bytes.

Probably incomplete list of things to do:

  • Implement off_t replacement which stores offset with bit precision
  • Change cursor position and selection to bit precision
  • Use bit precision for DocumentCtrl region boundaries
  • Use bit precision for metadata types (ByteRangeSet, ByteRangeMap, etc)
  • Update Lua APIs to use byte+bit in place of byte offsets/lengths
  • Add optional binary view alongside hex/ascii to enable selecting bit-level offsets
  • Update DocumentCtrl region cursor handling APIs to use bit precision
  • Support displaying sub-byte remainders in basic data region where bitfields don't align to byte boundaries
  • Add bitfields to template language
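
For the first item in the list, one possible shape for a bit-precision offset is sketched below. This is not the project's actual implementation; the `BitOffset` class and its methods are hypothetical, and it simply stores a single count of bits and derives the byte and bit parts from it.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class BitOffset:
    """Hypothetical offset type with bit precision, stored as a total bit count."""
    total_bits: int

    @classmethod
    def from_parts(cls, byte: int, bit: int = 0) -> "BitOffset":
        return cls(byte * 8 + bit)

    @property
    def byte(self) -> int:
        # Whole-byte part of the offset.
        return self.total_bits // 8

    @property
    def bit(self) -> int:
        # Remaining bits within that byte (0..7).
        return self.total_bits % 8

    def byte_aligned(self) -> bool:
        return self.bit == 0

    def __add__(self, other: "BitOffset") -> "BitOffset":
        return BitOffset(self.total_bits + other.total_bits)

    def __sub__(self, other: "BitOffset") -> "BitOffset":
        return BitOffset(self.total_bits - other.total_bits)


start = BitOffset.from_parts(2, 5)   # byte 2, bit 5
length = BitOffset.from_parts(0, 6)  # a 6-bit field
end = start + length
```

Keeping one integer internally means comparisons and arithmetic need no carry handling, and existing byte-based code can keep working through the `byte` property wherever `byte_aligned()` holds.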

@arizvisa

If you do end up taking it this far and eventually getting your templating engine to support fields/structures with conditional dependencies on certain values of bits, keep bit/byte ordering in mind.

Although you don't have to be concerned about bit-sequential mediums, since you have the contents of the whole file/stream, sub-byte editing isn't as straightforward as adding support for sub-byte offsets and decoding. The complication comes from the combination of conditional fields, LSB-first vs. MSB-first decoding, and integer endianness (integers can be unaligned depending on the bits that precede them).
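
A small sketch of why bit order matters: the same byte gives different field values depending on whether bits are consumed LSB-first or MSB-first. The `read_bits` helper and its `msb_first` flag are hypothetical names used only for this example.

```python
def read_bits(data: bytes, bit_offset: int, width: int, msb_first: bool) -> int:
    """Read `width` bits at `bit_offset` using the given bit order."""
    value = 0
    for i in range(width):
        pos = bit_offset + i
        byte = data[pos // 8]
        if msb_first:
            # Position 0 is the top bit; earlier bits are more significant.
            bit = (byte >> (7 - pos % 8)) & 1
            value = (value << 1) | bit
        else:
            # Position 0 is the bottom bit; earlier bits are less significant.
            bit = (byte >> (pos % 8)) & 1
            value |= bit << i
    return value


sample = bytes([0b10110010])
as_msb = read_bits(sample, 0, 3, msb_first=True)   # consumes bits 1, 0, 1
as_lsb = read_bits(sample, 0, 3, msb_first=False)  # consumes bits 0, 1, 0
```

A template language with bitfields would presumably need to let each format (or even each field) choose between these two orders.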

Since you're doing memory editing as well (which is pretty awesome), you'll have to keep in mind that some fields are aligned (based on their address). This makes interspersing conditional decoding of sub-byte fields a little clumsier, and could require you to distinctly separate them from templates that require fields to be aligned to the architecture's word size.
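
To illustrate the alignment point, here is a minimal sketch of rounding a bit offset up to the next boundary after a run of sub-byte fields. It assumes offsets are measured from a base address that is itself aligned; `align_up` is a hypothetical helper.

```python
def align_up(bit_offset: int, alignment_bits: int) -> int:
    """Round `bit_offset` up to the next multiple of `alignment_bits`."""
    remainder = bit_offset % alignment_bits
    if remainder == 0:
        return bit_offset
    return bit_offset + alignment_bits - remainder


# A 3-bit flag field ends at bit 3; suppose the next field needs 32-bit alignment.
after_flags = 3
next_aligned = align_up(after_flags, 32)
padding_bits = next_aligned - after_flags
```

The padding bits in between are exactly the kind of sub-byte remainder that would need to be displayed somehow in a data region.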

Deflate is notorious for not addressing the order of bits in the RFC, which typically results in LSB-first decoding (iirc), whereas Microsoft protocols are almost always MSB-first and little-endian. Some formats with sub-byte fields that are also known to be a real pita are the h264/h265 codecs, due to things like exp-Golomb encoded integers and conditional binary fields which require big-endian decoding. AS3 (ActionScript) was an older format with similarly encoded fields, but a much smaller one.
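
For readers who haven't met exp-Golomb coding, below is a minimal sketch of decoding an unsigned exp-Golomb integer from an MSB-first bitstream: count the leading zero bits, skip the terminating 1, then read that many suffix bits. `read_ue` is a hypothetical name, and the sketch does no bounds checking (running off the end of the data raises `IndexError`).

```python
from typing import Tuple


def read_ue(data: bytes, bit_offset: int) -> Tuple[int, int]:
    """Decode one unsigned exp-Golomb value; return (value, next_bit_offset)."""

    def bit_at(pos: int) -> int:
        # MSB-first: position 0 is the top bit of the first byte.
        return (data[pos // 8] >> (7 - pos % 8)) & 1

    pos = bit_offset
    leading_zeros = 0
    while bit_at(pos) == 0:
        leading_zeros += 1
        pos += 1
    pos += 1  # skip the 1 bit that terminates the zero prefix

    suffix = 0
    for _ in range(leading_zeros):
        suffix = (suffix << 1) | bit_at(pos)
        pos += 1

    return (1 << leading_zeros) - 1 + suffix, pos


# Bit string "1 010 011 00100" followed by zero padding, i.e. values 0, 1, 2, 3.
packed = bytes([0xA6, 0x40])
values = []
offset = 0
for _ in range(4):
    value, offset = read_ue(packed, offset)
    values.append(value)
```

Note that the width of each field is only known after decoding it, so the offset of every later field depends on the data, which is what makes formats like this awkward for a static template.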

solemnwarning added this to the 0.62.0 milestone Mar 9, 2024
@solemnwarning
Owner

Custom integer types underway!

(two images attached)


3 participants