Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

opj_malloc replacement #625

Closed
malaterre opened this issue Oct 9, 2015 · 6 comments
Closed

opj_malloc replacement #625

malaterre opened this issue Oct 9, 2015 · 6 comments

Comments

@malaterre
Copy link
Collaborator

Some section of the code rely on specific alignement using sentinel __SSE2__:

#ifdef __SSE2__
void opj_mct_encode(
    OPJ_INT32* restrict c0,
    OPJ_INT32* restrict c1,
    OPJ_INT32* restrict c2,
    OPJ_UINT32 n)
{
  OPJ_SIZE_T i;
  const OPJ_SIZE_T len = n;

  for(i = 0; i < (len & ~3U); i += 4) {
    __m128i y, u, v;
    __m128i r = _mm_load_si128((const __m128i *)&(c0[i]));

But opj_malloc.h header does not have logic to cope with this (based on totally different logic). I suggest:

  1. Move most of the implementation code to c code
  2. Provide a dummy implementation for 16 bytes alignement. See for example: http://stackoverflow.com/questions/227897/how-to-allocate-aligned-memory-only-using-the-standard-library
  3. Make sure that opj_mct_encode_sse_version is only called when buffer is 16 bytes aligned.
  4. Provide an opj_aligned_realloc !
@malaterre
Copy link
Collaborator Author

@detonin I am less convinced I need to implement (3). Technically the code in the branch now garantee that memory is 16 bytes aligned in all cases, instead of hoping the code is executed on x86_64. So I would like people to comment on my proposal now.

@mayeut
Copy link
Collaborator

mayeut commented Oct 10, 2015

@malaterre,

calloc implementation shall probably be updated too.
FYI, don't know if you've seen this but travis is failing for debug builds.

@malaterre
Copy link
Collaborator Author

@mayeut opj_calloc is there... what do you mean ? I've fix my typo in the assert() sorry for the mess.

@mayeut
Copy link
Collaborator

mayeut commented Oct 11, 2015

@malaterre I'll comment in your commit, it'll be easier I think.

@mayeut
Copy link
Collaborator

mayeut commented Oct 11, 2015

My bad for opj_calloc, I misread the code.

@mayeut
Copy link
Collaborator

mayeut commented Oct 16, 2015

Seems an aligned realloc is not needed after all. I kept it nevertheless.
Compilation with -m32 -march=i686 -mfpmath=sse -msse2 now completes the test suite. With much closer results to x86_64 than without -mfpmath=sse -msse2, as one would expect. Still #135 though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants