Added peek, read, write, and append to std.bitmanip. #621

jmdavis · 2012-06-06T09:26:17Z

I'm constantly needing functions like these when manipulating buffers of bytes, so I thought that they would be valuable additions to Phobos.

These functions make it easier to manipulate buffers of ubytes which are meant to be parsed or written to as sequences of larger integral types. They assume that the buffers hold their data in big endian.
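As a rough illustration of the four functions described above, here is a minimal sketch. It assumes the `std.bitmanip` API as it exists in current Phobos; the exact signatures in this pull request at review time may have differed, and the buffer contents are made up for the example.

```d
import std.array : appender;
import std.bitmanip : append, peek, read, write;

void main()
{
    // write stores a value at a given index of an existing buffer
    // (big endian unless told otherwise).
    ubyte[] buffer = new ubyte[](6);
    buffer.write!uint(0x0102_0304, 0);
    buffer.write!ushort(0xABCD, 4);
    assert(buffer == [0x01, 0x02, 0x03, 0x04, 0xAB, 0xCD]);

    // peek decodes a value without consuming any bytes.
    assert(buffer.peek!uint() == 0x0102_0304);
    assert(buffer.length == 6);

    // read decodes a value and pops the bytes it used off the range.
    assert(buffer.read!uint() == 0x0102_0304);
    assert(buffer.read!ushort() == 0xABCD);
    assert(buffer.length == 0);

    // append encodes a value and puts its bytes into an output range.
    auto sink = appender!(ubyte[])();
    sink.append!ushort(261); // 261 == 0x0105
    assert(sink.data == [1, 5]);
}
```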

DmitryOlshansky · 2012-06-06T12:06:37Z

While it's cool that big endian happens to be the byte order of the TCP network stack, I would like to see little endian byte order supported by the same interface too (it should be trivial, as it's the same as the machine's byte order on x86).

jmdavis · 2012-06-06T15:20:35Z

I suppose that it could be added as another template argument, but I have never seen anything which specifically used little endian for anything which needed to operate on buffers of bytes like this. All such formats end up operating in big endian order, and I would have considered it bad practice for them to do anything else.

DmitryOlshansky · 2012-06-06T16:25:05Z

Well, going solely by "I have never seen" is rarely a good idea.
For one thing, I have never seen binary files whose fields are not little endian,
because back in the days of C I loaded the whole file into RAM and then cast the buffer to a plain struct pointer.
Recall this oldish C idiom:

struct MyFile
{
    int magic, ver;
    int length, etc;
    SOMETHING data[0]; /* zero-length trailing array (a GNU C extension) */
};
where SOMETHING is some POD struct of ints or floats.

jmdavis · 2012-06-07T03:44:09Z

Every binary format that I have ever dealt with has used big endian exclusively (or avoided the issue entirely via other encoding schemes, e.g. H.264 uses Exp-Golomb coding to do that), but I've also never dealt with a binary format where you wrote arbitrary types to it like you seem to be talking about. In every case, it's been sequences of bits or bytes which are read in as integers or chars of varying sizes.

In any case, I've now added the ability to indicate the endianness of the bytes in the range. It defaults to big endian.
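A minimal sketch of what selecting the byte order might look like, assuming the second template argument is an `Endian` value from `std.system` as in current Phobos (the form it took in this pull request at the time is not shown in the thread):

```d
import std.bitmanip : peek, read, write;
import std.system : Endian;

void main()
{
    ubyte[] bytes = [0x01, 0x02, 0x03, 0x04];

    // With no endianness argument, the bytes are decoded as big endian.
    assert(bytes.peek!uint() == 0x0102_0304);
    assert(bytes.peek!(uint, Endian.bigEndian)() == 0x0102_0304);

    // The same bytes decoded as little endian.
    assert(bytes.peek!(uint, Endian.littleEndian)() == 0x0403_0201);

    // Encoding as little endian stores the least significant byte first.
    ubyte[] output = new ubyte[](4);
    output.write!(uint, Endian.littleEndian)(0x0102_0304, 0);
    assert(output == [0x04, 0x03, 0x02, 0x01]);

    // Reading back with the matching byte order recovers the value.
    assert(output.read!(uint, Endian.littleEndian)() == 0x0102_0304);
    assert(output.length == 0);
}
```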

alexrp · 2012-06-07T03:47:30Z

For instance, the Common Intermediate Language format (which is what the Common Language Runtime uses to store program IL in) is almost exclusively little endian, so I think that support is important.

jmdavis · 2012-06-07T03:54:11Z

I tend to think that little endian should be treated like leprosy, but as D is a systems language, and the most prevalent CPU is unfortunately little endian, we don't have much choice in the general case. I just didn't see much point with these functions. But I've added it.

alexrp · 2012-06-07T03:59:25Z

Reviewed, and the API seems good and workable-with to me.

dnadlinger · 2012-06-07T07:32:34Z

@jmdavis: I can confirm that little-endian formats are not as uncommon as you seem to imply – the first examples which come to my mind are Thrift (e.g. the Compact protocol) and Protocol Buffers.

The API seems to be okay – the use of pointers is justifiable here, I suppose. You might want to add at least an assertion checking for the pointer arguments to be non-null, though.

dnadlinger · 2012-06-07T07:36:07Z

std/bitmanip.d

Might want to add a void initializer here for performance reasons – peek and get are likely to be called in performance-sensitive I/O code. In theory, the optimizer could do this anyway, but I'm not sure if it is actually likely to happen with the ref foreach below.

jmdavis · 2012-06-07T07:43:26Z

Pointers are needed as long as they're overloaded functions, since there's no way to indicate that you want the ref overload at the call point. And it seems silly to me to create a new function name just to avoid using pointers. It's perfectly @safe too, since pointers are just as @safe as references are. It's stuff like pointer arithmetic which isn't.
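To make the overload point concrete, here is a small sketch of the by-value and by-pointer index overloads of `peek`, assuming they behave as in current Phobos. The address-of operator at the call site is what selects the overload that advances the index.

```d
import std.bitmanip : peek;

void main()
{
    ubyte[] buffer = [0x00, 0x01, 0x00, 0x02, 0xFF];
    size_t index = 0;

    // Index passed by value: the caller's index is left unchanged.
    assert(buffer.peek!ushort(index) == 1);
    assert(index == 0);

    // Index passed by pointer: it is advanced by the size of the type read.
    assert(buffer.peek!ushort(&index) == 1);
    assert(index == 2);
    assert(buffer.peek!ushort(&index) == 2);
    assert(index == 4);
    assert(buffer.peek!ubyte(&index) == 0xFF);
    assert(index == 5);

    // The buffer itself is never consumed by peek.
    assert(buffer.length == 5);
}
```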

dnadlinger · 2012-06-07T08:07:50Z

Oh, darn it, with the void initializer in place, you'd have to add @trusted, but you can't because you don't know whether the range is @safe…

jmdavis · 2012-06-07T08:19:44Z

Hmmm. It would be a poor range indeed which couldn't be trusted to have front or popFront called on it. I could add a template constraint to check for it and even overload the function for that case, but it seems like overkill to me. So, the choices would be

1. Mark it as @trusted and assume that front and popFront are safe to call (which would be fairly shocking if it weren't true).
2. Remove the = void.

#2 would be safe but less efficient, whereas #1 could be less safe but is probably just fine. So, I don't know. Another thing to consider is the fact that it's only an issue when the range isn't sliceable, and there's a good chance that the code which really cares about efficiency would be using arrays anyway.
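The two choices can be sketched roughly as follows. `takeBytes` is a hypothetical helper written for this illustration, not the code under review; it only shows where the `= void` initializer would sit when bytes are copied one at a time out of a range.

```d
import std.range.primitives : empty, front, popFront;

// Hypothetical helper: copies the first N bytes of a range into a
// fixed-size buffer, popping them off the range.
ubyte[N] takeBytes(size_t N, R)(ref R range)
{
    // Choice 1: skip default initialization, since every element is
    // overwritten in the loop below.
    ubyte[N] bytes = void;
    // Choice 2 would be `ubyte[N] bytes;`, which zero-fills the buffer first.

    foreach (ref b; bytes)
    {
        assert(!range.empty, "not enough bytes in the range");
        b = range.front;
        range.popFront();
    }
    return bytes;
}

void main()
{
    ubyte[] source = [1, 2, 3, 4, 5];
    auto head = takeBytes!4(source);
    assert(head[] == [1, 2, 3, 4]);
    assert(source == [5]);
}
```

Either choice yields the same bytes; the only difference is whether the buffer is zero-filled before being overwritten.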

alexrp · 2012-06-07T08:23:29Z

Really, just go with (2). I'd be surprised if this actually matters in practice. There'd probably be a measurable difference, but not one that's worth the code explosion.

jmdavis · 2012-06-07T08:28:53Z

Okay. The last commit has now been rebased so that it's just the added check on the indices.

jmdavis · 2012-06-08T08:35:01Z

Since multiple people seem to be fine with this, merging...

dnadlinger · 2012-06-08T08:40:28Z

@blackwhale: Seems like we really need PApply for templates in Phobos… ;)

DmitryOlshansky · 2012-06-08T08:45:21Z

@klickverbot std.meta to the rescue! Oh wait... where is it? :)

Added peek, read, write, and append to std.bitmanip.

5e24378

Made it possible to give endianness to peek, read, write, and append.

b36b646

Added checks to verify indices.

9cf1eea

jmdavis added a commit that referenced this pull request Jun 8, 2012

Merge pull request #621 from jmdavis/bitmanip

ae5e758

jmdavis merged commit ae5e758 into dlang:master Jun 8, 2012
