opj_compress fails to compress lossless on gcc/x86 (-m32) #571

Closed
malaterre opened this issue Sep 1, 2015 · 23 comments

Comments

@malaterre
Collaborator

I am having an issue with OPJ 1.5 and the following file:

http://gdcm.sourceforge.net/thingies/lossless_issue.pgm

It looks as if I cannot do a lossless compression / decompression round trip within a 32-bit schroot. Everything is fine within a 64-bit schroot (Debian Jessie, if that matters).

@malaterre
Collaborator Author

Steps from debian/jessie i386:

$ image_to_j2k -i lossless_issue.pgm -o lossless_issue.j2k
$ j2k_to_image -i lossless_issue.j2k -o lossless_issue.opj.pgm
$ vim -b lossless_issue.opj.pgm
-> remove comment
$ vbindiff lossless_issue.pgm lossless_issue.opj.pgm

@malaterre
Collaborator Author

I can reproduce the exact same behavior using openjpeg 2.1.0 (as packaged in Debian).
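
For reference, the equivalent round trip with the 2.x command-line tools looks like this (a sketch assuming the Debian-packaged binaries; output file names are illustrative):

$ opj_compress -i lossless_issue.pgm -o lossless_issue.j2k
$ opj_decompress -i lossless_issue.j2k -o lossless_issue.opj.pgm
$ # strip the decoder's comment line from the PGM header, then compare as above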

@malaterre
Collaborator Author

It seems the compression step is not working as expected:

$ crc32 *
49d5c244    lossless_issue.pgm
c611132c    lossless_issue.sid32.j2k
a913d7c3    lossless_issue.sid32.kdu.pgm
a913d7c3    lossless_issue.sid32.pgm
9922c6b1    lossless_issue.sid.j2k
49d5c244    lossless_issue.sid.kdu.pgm
49d5c244    lossless_issue.sid.pgm

@malaterre
Collaborator Author

The good news is that if I run:

$ valgrind  opj_compress -i lossless_issue.pgm -o lossless_issue.sid32.j2k

Everything works as expected. So Valgrind is doing some kind of initialisation (rounding?) step that is missing in normal code execution.

@malaterre
Collaborator Author

If I now use clang-3.5 from my sid 32-bit chroot, everything works nicely. So the issue is not in the libc or similar, but rather in the code generated by gcc.

@malaterre
Collaborator Author

gcc-4.8, gcc-4.9 and gcc-5.2 all show the same symptoms.

@malaterre malaterre changed the title OPJ 1.5 is not lossless for 16bits grayscale input openjp2/t1.c:1517:28: runtime error: left shift of negative value -128 Sep 8, 2015
@malaterre
Collaborator Author

If I recompile openjpeg with -fsanitize=undefined I get:

/home/mathieu/tmp/opj-bug/openjpeg/src/lib/openjp2/t1.c:1517:28:
runtime error: left shift of negative value -128

ref:

https://github.com/uclouvain/openjpeg/blob/master/src/lib/openjp2/t1.c#L1517
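
For context, a standalone sketch (not OpenJPEG code) of the pattern UBSan is flagging; FRACBITS is a stand-in for T1_NMSEDEC_FRACBITS and its value here is only illustrative:

/* Left-shifting a negative signed value is undefined behaviour in
 * C99 (6.5.7p4), which is what -fsanitize=undefined reports at t1.c:1517. */
#include <stdint.h>
#include <stdio.h>

#define FRACBITS 6   /* placeholder for T1_NMSEDEC_FRACBITS */

int main(void)
{
    int32_t v = -128;

    /* v <<= FRACBITS;       -- undefined behaviour when v < 0 */
    v *= (1 << FRACBITS);  /* well-defined multiplication instead */

    printf("%d\n", v);     /* -8192 with FRACBITS == 6 */
    return 0;
}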

@malaterre malaterre modified the milestones: OPJ v2.1.1, OPJ v1.5.3 Sep 8, 2015
@malaterre malaterre changed the title openjp2/t1.c:1517:28: runtime error: left shift of negative value -128 opj_compress fails to compress lossless on gcc/x86 (-m32) Sep 8, 2015
@malaterre
Collaborator Author

Even if I fix the code using:

tiledp[tileIndex] *= (1 << T1_NMSEDEC_FRACBITS);

it still fails.

@malaterre
Collaborator Author

Now if I compile the whole code using gcc and then do:

(cd /home/mathieu/tmp/opj-bug/openjpeg/bin/src/lib/openjp2 && /usr/bin/clang  -Dopenjp2_EXPORTS -O0   -g -fPIC -I/home/mathieu/tmp/opj-bug/openjpeg/bin/src/lib/openjp2    -o CMakeFiles/openjp2.dir/tcd.c.o   -c /home/mathieu/tmp/opj-bug/openjpeg/src/lib/openjp2/tcd.c)

the code works as expected (i.e. recompiling only tcd.c with clang is enough).

@malaterre
Collaborator Author

It appears that the issue is within the function opj_tcd_makelayer:

https://github.com/uclouvain/openjpeg/blob/master/src/lib/openjp2/tcd.c#L218

@malaterre
Collaborator Author

To fix the symptoms one can avoid x87 excess precision by compiling with -msse2 -mfpmath=sse.
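
One possible way to rebuild with those flags inside the i386 chroot (a sketch; the source path is illustrative):

$ cmake -DCMAKE_C_FLAGS="-msse2 -mfpmath=sse" /path/to/openjpeg
$ make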

@malaterre
Collaborator Author

It is still not clear why such a minor change leads to a lossy compressed stream; this must be only the tip of the iceberg.

@malaterre
Collaborator Author

So the patch is simple: replace

if (dd / dr >= thresh)

with:

OPJ_FLOAT64 div;
div = dd / dr;
if (div >= thresh)

Then compile the code with -ffloat-store to remove excess precision.
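
To make the mechanism concrete, here is a minimal standalone sketch (not the actual OpenJPEG patch) of the two comparison forms; the function names and values are illustrative, and whether the raw form actually misbehaves depends on the data, the gcc version and the optimisation level when targeting 32-bit x87:

#include <stdio.h>

static int compare_raw(double dd, double dr, double thresh)
{
    /* the quotient may be held in an 80-bit x87 register and compared
     * against 'thresh', which has already been rounded to 64 bits */
    return (dd / dr) >= thresh;
}

static int compare_patched(double dd, double dr, double thresh)
{
    /* the assignment forces a rounding to 64-bit double first
     * (guaranteed with -fexcess-precision=standard / -std=c99,
     * or with -ffloat-store) */
    double div = dd / dr;
    return div >= thresh;
}

int main(void)
{
    /* arbitrary illustrative values; in OpenJPEG dd, dr and thresh come
     * from the rate-distortion data in opj_tcd_makelayer */
    double dd = 1.0, dr = 3.0;
    double thresh = dd / dr;

    printf("raw:     %d\n", compare_raw(dd, dr, thresh));
    printf("patched: %d\n", compare_patched(dd, dr, thresh));
    return 0;
}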

@malaterre
Collaborator Author

The gcc documentation states that -fexcess-precision=standard should be enough, but not in this case. We are still required to use -ffloat-store, which feels like a hack.

@malaterre
Collaborator Author

Confirmed by an upstream gcc developer here. We really need to understand why the code is so sensitive to excess precision.

@mayeut
Collaborator

mayeut commented Sep 9, 2015

@malaterre,

I had a look at the piece of code mentioned here.
In this specific case, for one of the computed values (where dd/dr "==" thresh), the comparison is true with -ffloat-store and false without it.

This can be shown easily with this piece of code (with a bit of context):

  if (!dr) {
    if (dd != 0)
      n = passno + 1;
    continue;
  }
  if ((dd / dr) < 0.45) {
    printf("%.32f - %.32f\n", (dd / dr), thresh);
  }

  if ((dd / dr) >= thresh) {
    n = passno + 1;
    if ((dd / dr) < 0.45) {
      printf("True compare\n");
    }
  }
}

layer->numpasses = n - cblk->numpassesinlayers;

We should not rely on "==" comparison for floats.
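
For illustration, a tolerance-based comparison in the style the eventual fix reportedly uses (see the DBL_EPSILON discussion below); this is a sketch, not the committed OpenJPEG code:

#include <float.h>
#include <stdio.h>

/* treat dd/dr as having reached thresh when it falls within DBL_EPSILON
 * below it, instead of relying on an exact ">=" comparison */
static int reaches_thresh(double dd, double dr, double thresh)
{
    return thresh - (dd / dr) < DBL_EPSILON;
}

int main(void)
{
    /* illustrative values only */
    printf("%d\n", reaches_thresh(1.0, 3.0, 1.0 / 3.0));
    return 0;
}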

mayeut added a commit to mayeut/openjpeg that referenced this issue Sep 9, 2015
@mayeut
Collaborator

mayeut commented Sep 9, 2015

I'll add this specific image to the test suite if needed

mayeut added a commit to mayeut/openjpeg-data that referenced this issue Sep 9, 2015
mayeut added a commit to mayeut/openjpeg that referenced this issue Sep 9, 2015
@malaterre
Collaborator Author

Just in case my series of comments was not clear:

  • Using -ffloat-store is not the correct solution. I'd rather see -fexcess-precision=standard (implied by -std=c99).
  • -fexcess-precision=standard only has an effect on casts and assignments, so the code needs to be changed (either store the result in a variable or add an explicit cast with a comment).
  • Get rid of -ffast-math; this option is simply harmful here. I should have made my commit 6d7f5cc much clearer and removed it completely. If some Linux distro decides to use it, it is their right to shoot themselves in the foot, but OpenJPEG should not make it a default option when compiling in Release; that's just wrong.

And finally, as discussed with Antonin, I fail to understand this piece of code, so removing the excess precision with a compiler-specific option is just plain wrong. One would need to understand why this piece of code is so sensitive to excess precision.

Thanks for adding the dataset to the test suite !

mayeut added a commit to uclouvain/openjpeg-data that referenced this issue Sep 11, 2015
@mayeut mayeut closed this as completed in 5d95355 Sep 11, 2015
@malaterre
Collaborator Author

Should I re-open this issue for the opj 1.5 branch?

@mayeut mayeut modified the milestones: OPJ v1.5.3, OPJ v2.1.1 Sep 11, 2015
@mayeut
Collaborator

mayeut commented Sep 11, 2015

The issue can only be assigned to one milestone...
I think a new one should be created, or maybe just list this one in #574.

@vinc17fr

Note that even if you remove the excess precision, your code might still be affected by double rounding (which occurs with a probability of 1/2048 on random data). So, you really need to understand the code and how it can be affected by floating-point rounding errors.
The bug was closed by doing "thresh - (dd / dr) < DBL_EPSILON", but one question is whether DBL_EPSILON is sufficient, i.e. what is the maximum error on thresh - (dd / dr) that one can expect?

@malaterre
Collaborator Author

Salut Vincent! Indeed, I had seen this at least once through a funky example. Do you believe that simply adjusting the DBL_EPSILON bound would make the issue go away?

@vinc17fr

Depending on the context, a bound like DBL_EPSILON or something larger may or may not be correct. First, the code should be commented so that the reader knows what it tries to do, for instance via a mathematical specification.
About the floating-point implementation: the variable dr is an integer, so I suppose it is exact. The variables dd and thresh are floating-point numbers. There are two questions about them: what is the range of thresh and dd / dr, and what is the floating-point error bound on the variables dd and thresh?
Note that the current code may be regarded as suspicious, because the error bound DBL_EPSILON is an absolute one, while a relative one may be needed. For instance, if thresh and dd / dr can be large (or small) compared to 1, then the bound DBL_EPSILON does not make much sense. Hence my question about the range.
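
A small sketch of the distinction being raised here, with illustrative values only (compile with -lm); which bound is appropriate depends on the actual range of thresh and dd / dr:

#include <float.h>
#include <math.h>
#include <stdio.h>

/* absolute bound: only meaningful when the compared values are of order 1 */
static int ge_abs(double x, double thresh)
{
    return thresh - x < DBL_EPSILON;
}

/* relative bound: scales with the magnitude of the operands */
static int ge_rel(double x, double thresh)
{
    return thresh - x < DBL_EPSILON * fmax(fabs(x), fabs(thresh));
}

int main(void)
{
    double thresh = 1.0e6;
    double x = nextafter(thresh, 0.0); /* one ulp below thresh */

    /* with a large thresh, the absolute DBL_EPSILON test does not treat a
     * one-ulp difference as equal, while the relative test does */
    printf("absolute: %d  relative: %d\n", ge_abs(x, thresh), ge_rel(x, thresh));
    return 0;
}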
