-
Notifications
You must be signed in to change notification settings - Fork 205
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added new kernel 4ic_deinterleave_8i_x2 with generic #399
Conversation
5a1af82
to
0bbdeb0
Compare
Clearly needs formatting. There appears to also be an issue with the input or output types which doesn't surprise me. Any feedback/advice welcome. I'll test for correctness when I can, preferably after sorting out the types issue. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your PR. I put in a lot of comments. I hope that'll help.
* \b Example | ||
* \code | ||
* int N = 10000; | ||
* | ||
* volk_4ic_deinterleave_8i_x2(); | ||
* | ||
* volk_free(x); | ||
* \endcode |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An example with your use case in mind that describes input and output would be great.
Implemented generic and SSE2 protokernels
0bbdeb0
to
5e45b61
Compare
There's some const char * error at runtime. I'm not sure what the cause is. I'll have time to debug on the weekend. I changed the current SSE2 function to unaligned, I'll work on an aligned one with the "instrinsic intrinsic" setup once I have the first two kernels working. |
The problem is that the qa subsystem does not handle sub-byte types: it sees the size of the input type as zero bytes and allocates a zero byte array to store it. I think it might be easiest to rename the function to something like void volk_8i_deinterleave_nibbles_8i_x2( int8_t * lsn, int8_t * msn, const int8_t *in, unsigned int num_samples); Where |
@odrisci thanks for pointing this out! Unfortunately, our system to deduce IO is fragile. |
for (unsigned int i = 0; i < num_points; i++) { | ||
*iBufferPtr++ = (int8_t)(*complexVectorPtr) >> 4; | ||
*qBufferPtr++ = ((int8_t)(*complexVectorPtr++) << 4) >> 4; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know if a bitmask 0x0f
or bit shifts should be preferred but intuitively I'd prefer bitmasks because they seem to be more explicit.
The output is a signed type but the current implementation would never return a negative value. Is that intended? I assume no because the SSE impl takes explicit case of the sign bit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately bitshifting negative numbers is undefined behaviour (or implementation defined, I can't remember) though most compilers will do what you expect.
The other problem here is that C/C++ only define integer arithmetic over int
types, so smaller types are up-cast. So technically the result of
(int8_t)(*complexVectorPtr++) <<4
is of type int
.
I would rewrite this as either:
for (unsigned int i = 0; i < num_points; i++) {
*iBufferPtr++ = (*complexVectorPtr >> 4);
*qBufferPtr++ = (int8_t)((*complexVectorPtr++) << 4) >> 4;
}
if you are happy with the undefined behaviour, or:
for (unsigned int i = 0; i < num_points; i++) {
*iBufferPtr++ = (*complexVectorPtr) /16;
*qBufferPtr++ = (int8_t)((*complexVectorPtr++) *16) /16;
}
if you are not. The latter might not optimise as well as the former, but is guaranteed to be correct. Note that I've left the last cast to int8_t
as implicit in each case - I'm not sure if this agrees with your coding style.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just saw the use case @dkozel was working on, and both nibbles would have a sign bit. I wonder if this may be the correct way. That would put the signed bits where they need to be then scale the values down appropriately?
for (unsigned int i = 0; i < num_points; i++) {
*iBufferPtr++ = (*complexVectorPtr & 0xF0)/16;
*qBufferPtr++ = (int8_t)((*complexVectorPtr++) << 4)/16;
}
@jdemel No problem. I don't think its too fragile - just if you want to handle packed bits you will probably need to think about it and plan for that. For example, here we are looking at a datatype We have done some work in the GNSS community on a metadata format for GNSS IF data logs. Most GNSS front-ends use only 1 to 4 bits per sample and pack multiple samples from multiple frequency bands into words of 1, 2 or 4 bytes. Essentially we have seen almost every possible combination of bit and byte ordering (big endian bytes in the word with little endian arrangement of samples within a byte for example!) The metadata standard was recently approved by the Institute of Navigation in the US and can be found here if you want to have a look: https://sdr.ion.org/ Anyway, that's why I recommend using the |
I wasn't able to figure out the best path forward here, and the immediate use has been worked around separately. I'm going to close and will open a new PR if I ever loop back around to this. Thanks all for the feedback and comments. |
Here's the generic implementation of a new 4bit signed integer deinterleave kernel. I'll be adding an SSE2 kernel shortly.
I'm interested in feedback on the code arrangement and implementation style, this is my first VOLK modification.