There is an issue with rendering some types of JPEG files. Please see the link. #672

Closed
henrygg opened this issue Dec 14, 2015 · 9 comments

@henrygg

henrygg commented Dec 14, 2015

https://bugs.chromium.org/p/pdfium/issues/detail?id=277

@szukw000
Contributor

pdfimages -jp2 possible-error-0004841_Crash_fxdecod1_RG_001_eMag.pdf .

Syntax Warning: Found a misplaced 'cmap' box outside jp2h box
Syntax Warning: Non conformant codestream TPsot==TNsot.

j2k_to_image -i PDF-IMAGES-001.jp2 -o PDF-IMAGES-001.jp2-1.png

[WARNING] SOT marker inconsistency in tile 0: tile-part index greater (5) than number of tile-parts (5)
[INFO] tile 1 of 1
[INFO] - tiers-1 took 0.000000 s
[INFO] - dwt took 0.000000 s
[INFO] - tile decoded in 0.000000 s
Successfully generated Outfile PDF-IMAGES-001.jp2-1.png

The resulting image is 77 x 186 pixels and completely black.
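Both decoders warn about the SOT (start of tile-part) marker. The message "greater (5) than (5)" looks odd until one recalls that TPsot is a zero-based index, so the last valid value is TNsot - 1 and TPsot == TNsot is already out of range. Below is a minimal sketch of that check, assuming the SOT layout from ISO/IEC 15444-1 (after the 0xFF90 marker and the 2-byte Lsot: Isot 2 bytes, Psot 4 bytes, TPsot 1 byte, TNsot 1 byte, all big-endian). `sot_fields`, `parse_sot_payload` and `sot_is_conformant` are hypothetical names, not OpenJPEG functions.

```c
#include <stdint.h>

/* Fields of an SOT (start of tile-part, marker 0xFF90) segment as laid out
 * in ISO/IEC 15444-1. TNsot == 0 means the number of tile-parts is not
 * given in this header. */
typedef struct {
    uint16_t isot;  /* tile index */
    uint32_t psot;  /* tile-part length in bytes, counted from the SOT marker */
    uint8_t tpsot;  /* tile-part index, zero-based */
    uint8_t tnsot;  /* number of tile-parts for this tile */
} sot_fields;

/* Decode the 8 bytes that follow the marker and the 2-byte Lsot field
 * (Lsot should be 10). All multi-byte fields are big-endian. */
sot_fields parse_sot_payload(const uint8_t p[8]) {
    sot_fields s;
    s.isot = (uint16_t)((p[0] << 8) | p[1]);
    s.psot = ((uint32_t)p[2] << 24) | ((uint32_t)p[3] << 16) |
             ((uint32_t)p[4] << 8) | (uint32_t)p[5];
    s.tpsot = p[6];
    s.tnsot = p[7];
    return s;
}

/* Because TPsot counts from 0, a conformant value satisfies TPsot < TNsot.
 * TPsot == TNsot (5 and 5 in the logs above) is one past the last index. */
int sot_is_conformant(const sot_fields *s) {
    return s->tnsot == 0 || s->tpsot < s->tnsot;
}
```

A decoder that meets TPsot == TNsot can either reject the tile-part or, as both tools above do, warn and keep going.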

opj_decompress -i PDF-IMAGES-001.jp2 -o PDF-IMAGES-001.jp2-2.png

[WARNING] Found a misplaced 'cmap' box outside jp2h box
[INFO] Start to read j2k main header (166).
[INFO] Main header has been correctly decoded.
[INFO] No decoded area parameters, set the decoded area to the whole image
[WARNING] Non conformant codestream TPsot==TNsot.
[INFO] Header of tile 1 / 1 has been read.
[INFO] Tile 1/1 has been decoded.
[INFO] Image data has been updated with tile 1.

[ERROR] Invalid component index 24400 (>= 1).
[ERROR] Invalid component index 1 (>= 1).
[ERROR] Invalid component/palette index for direct mapping 51.
[ERROR] Invalid component/palette index for direct mapping 234.
[ERROR] Invalid component/palette index for direct mapping 203.
[ERROR] Component 1 doesn't have a mapping.
[ERROR] Component 2 doesn't have a mapping.
[ERROR] Component 3 doesn't have a mapping.
ERROR -> opj_decompress: failed to decode image!

@szukw000
Contributor

After downloading the current OpenJPEG from GitHub on 2015-12-15:

bin/opj_decompress -i PDF-IMAGES-001.jp2 -o PDF-IMAGES-001.jp2.png

[WARNING] Found a misplaced 'cmap' box outside jp2h box
[INFO] Start to read j2k main header (166).
[INFO] Main header has been correctly decoded.
[INFO] No decoded area parameters, set the decoded area to the whole image
[WARNING] Non conformant codestream TPsot==TNsot.
[INFO] Header of tile 1 / 1 has been read.
[INFO] Tile 1/1 has been decoded.
[INFO] Image data has been updated with tile 1.

[ERROR] Invalid component index 24400 (>= 1).
[ERROR] Invalid component index 1 (>= 1).
opj_decompress: openjpeg-master-2015-12-15/src/lib/openjp2/jp2.c:904: opj_jp2_check_color: Assertion `cmap[i].mtyp == 0 || cmap[i].mtyp == 1' failed.
Aborted

After adding an fprintf() call to jp2.c at line 904, I get:

[ERROR] Invalid component index 24400 (>= 1).
[ERROR] Invalid component index 1 (>= 1).
jp2c:904: cmap[0].mtyp(1)
jp2c:904: cmap[1].mtyp(75)
opj_decompress: openjpeg-master-2015-12-15/src/lib/openjp2/jp2.c:905: opj_jp2_check_color: Assertion `cmap[i].mtyp == 0 || cmap[i].mtyp == 1' failed.
Aborted

My simple JP2_READER shows:

FILE(PDF-IMAGES-001.jp2)
LENG(344)

read_jp2.c:1434:
name(ftyp)
brand(jp2 ) minv(0)
CL(0)(jp2 )
CL(1)(jpxb)
CL(2)(jpx )
read_jp2.c:1434:
name(rreq)
read_jp2.c:1434:
name(jp2h)
read_jp2.c:900:
read_jp2h
BOX name(ihdr) len(22)
read_ihdr
w(77) h(187) nc(1) bpc(7)
signed(0) depth(8)
compress(7) unknown_c(1) ipr(0)
read_jp2.c:900:
read_jp2h
BOX name(colr) len(15)
read_colr
meth(1) prec(2) approx(1) enumcs[12]CMYK
read_jp2.c:900:
read_jp2h
BOX name(pclr) len(19)
nr_entries(1) nr_channels(4)
----------- start pclr ----------------
channel[0]signed(0) depth(8)
channel[1]signed(0) depth(8)
channel[2]signed(0) depth(8)
channel[3]signed(0) depth(8)
---------------------------------------
entry[000]255 230 0 0
----------- end pclr ------------------
read_jp2.c:862:EXIT read_pclr
--- EXIT read_jp2h ---
read_jp2.c:1434:
name(cmap)

The CMAP box is outside the JP2H box. This file is buggy.
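One way to confirm where the cmap box sits is to walk only the top-level boxes: if 'cmap' shows up at that level, it is outside jp2h. The sketch below is a hypothetical helper (not part of OpenJPEG or of the reader above), assuming the ISO/IEC 15444-1 box header: a 4-byte big-endian LBox, a 4-byte TBox, an 8-byte XLBox when LBox is 1, and LBox 0 meaning the box extends to the end of the file. It does not check the signature or ftyp boxes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a big-endian 32-bit value. */
static uint32_t be32(const uint8_t *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Walk the top-level boxes of a JP2 file held in memory.
 * Returns 1 if a 'cmap' box appears at top level (i.e. outside 'jp2h'),
 * 0 if none does, and -1 if a box header is malformed. Boxes nested
 * inside a superbox such as 'jp2h' are skipped over, not inspected. */
int jp2_has_toplevel_cmap(const uint8_t *buf, size_t len) {
    size_t pos = 0;
    while (pos + 8 <= len) {
        uint64_t lbox = be32(buf + pos);
        const uint8_t *tbox = buf + pos + 4;
        size_t header = 8;
        if (lbox == 1) {            /* 64-bit XLBox follows the type */
            if (pos + 16 > len) return -1;
            lbox = ((uint64_t)be32(buf + pos + 8) << 32) | be32(buf + pos + 12);
            header = 16;
        } else if (lbox == 0) {     /* box runs to the end of the file */
            lbox = len - pos;
        }
        if (lbox < header || lbox > len - pos) return -1;
        if (memcmp(tbox, "cmap", 4) == 0) return 1;
        pos += (size_t)lbox;        /* jump to the next sibling box */
    }
    return 0;
}
```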

winfried

@szukw000
Contributor

@henrygg, can you create a PDF page with a JP2 file that is not buggy?

winfried

@santoch

santoch commented Dec 15, 2015

The problem is that these PDFs are provided to us by a third party.
Somehow Adobe is able to display them. I assume Adobe must be generating bad JPEG images within the PDFs? We now see many failures of this assertion (~10% of the PDFs from a pool of 70,000 or so).

Is there a way the code can handle (or fix up) these bad cmap boxes and continue, rather than asserting and thereby aborting the whole process?

@henrygg
Author

henrygg commented Dec 21, 2015

Around line 887 of jp2.c there is the following code. In the opj_jp2_pclr_t data structure, what is the relationship between nr_channels and nr_entries? Does nr_entries or nr_channels specify the number of entries contained in cmap?

    for (i = 0; i < nr_channels; i++) {
        if (cmap[i].cmp >= image->numcomps) {
            opj_event_msg(p_manager, EVT_ERROR, "Invalid component index %d (>= %d).\n", cmap[i].cmp, image->numcomps);
            is_sane = OPJ_FALSE;
        }
    }
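As I read ISO/IEC 15444-1, in the pclr box NE (nr_entries, 2 bytes) is the number of palette rows and NPC (nr_channels, 1 byte) the number of palette columns; each sample of the mapped component selects one row. The cmap box then has one 4-byte entry (CMP, MTYP, PCOL) per output channel, and OpenJPEG appears to size the cmap array from the palette's nr_channels, which is what the loop above relies on. For this file (nr_entries 1, nr_channels 4) that presumably means four cmap entries, each reading component 0 with MTYP 1 and PCOL 0 to 3. The sketch below decodes a cmap payload under those assumptions; `cmap_entry` and `parse_cmap` are hypothetical names, not the library's.

```c
#include <stddef.h>
#include <stdint.h>

/* One entry of a 'cmap' (component mapping) box: 4 bytes on disk. */
typedef struct {
    uint16_t cmp;   /* codestream component that feeds this channel */
    uint8_t mtyp;   /* 0 = use the component directly, 1 = palette mapping */
    uint8_t pcol;   /* palette column used when mtyp == 1 */
} cmap_entry;

/* Decode a cmap payload (box contents after the 8-byte header) that must
 * hold one entry per palette column (nr_channels from the pclr box).
 * Returns 1 on success, 0 if the payload is too short. */
int parse_cmap(const uint8_t *payload, size_t payload_len,
               uint8_t nr_channels, cmap_entry *out) {
    if (payload_len < (size_t)nr_channels * 4) return 0;
    for (size_t i = 0; i < nr_channels; i++) {
        const uint8_t *p = payload + 4 * i;
        out[i].cmp = (uint16_t)((p[0] << 8) | p[1]);  /* big-endian CMP */
        out[i].mtyp = p[2];
        out[i].pcol = p[3];
    }
    return 1;
}
```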

@szukw000
Contributor

@santoch,

the library already tries to handle cmap boxes outside the jp2h box. But if the values they contain are incorrect, one can only ASSERT or return FALSE.
@henrygg, @santoch,

you should download the following file:

https://www.ic.tu-berlin.de/fileadmin/fg121/Source-Coding_WS12/selected-readings/20_T-REC-T_1_.800-200208-I__PDF-E.pdf

Later editions of 15444-1 can be bought (search for '15444-1-2004').

Search for 'palette' and 'cmap'.

winfried

@henrygg
Author

henrygg commented Dec 22, 2015

It seems the cmap entries are not validated when they are read from the buggy image. Also, for this particular PDF (from https://bugs.chromium.org/p/pdfium/issues/detail?id=277), the numcomps field in the JP2 is 1 and the number of channels is 4; I am not sure whether these values are valid.

@szukw000
Copy link
Contributor

@henrygg,

CMYK has 4 channels. The color shown by Adobe or XPDF is only blue: there is one palette entry.

----------------------------------------

meth(1) prec(2) approx(1) enumcs[12]CMYK
read_jp2.c:900:
read_jp2h
BOX name(pclr) len(19)
nr_entries(1) nr_channels(4)
----------- start pclr ----------------
channel[0]signed(0) depth(8)
channel[1]signed(0) depth(8)
channel[2]signed(0) depth(8)
channel[3]signed(0) depth(8)
---------------------------------------
entry[000]255 230 0 0
----------- end pclr ------------------

One can show this blue color if one uses only the color palette (PCLR) and ignores the wrong CMAP values.
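To check that the palette alone yields blue: the single row 255 230 0 0, read as C, M, Y, K, converts with the naive formula X = (255 - C)(255 - K)/255 to RGB (0, 25, 255). The sketch below expands indices through the palette and ignores cmap entirely. `apply_cmyk_palette` and `cmyk_to_rgb` are hypothetical helpers; clamping out-of-range indices to the last row and converting without an ICC profile are assumptions of this sketch, and real viewers may convert differently.

```c
#include <stddef.h>
#include <stdint.h>

/* Expand palette indices into CMYK samples, ignoring the cmap box.
 * palette holds nr_entries rows of 4 bytes (C, M, Y, K). Out-of-range
 * indices are clamped to the last row (an assumption of this sketch). */
void apply_cmyk_palette(const uint8_t *indices, size_t n,
                        const uint8_t (*palette)[4], uint16_t nr_entries,
                        uint8_t (*cmyk_out)[4]) {
    if (nr_entries == 0) return;  /* nothing to map to */
    for (size_t i = 0; i < n; i++) {
        size_t row = indices[i] < nr_entries ? indices[i]
                                             : (size_t)nr_entries - 1;
        for (int c = 0; c < 4; c++) cmyk_out[i][c] = palette[row][c];
    }
}

/* Naive CMYK -> RGB conversion without a color profile:
 * R = (255 - C)(255 - K)/255, and likewise for G (from M) and B (from Y). */
void cmyk_to_rgb(const uint8_t cmyk[4], uint8_t rgb[3]) {
    unsigned k = 255u - cmyk[3];
    rgb[0] = (uint8_t)((255u - cmyk[0]) * k / 255u);
    rgb[1] = (uint8_t)((255u - cmyk[1]) * k / 255u);
    rgb[2] = (uint8_t)((255u - cmyk[2]) * k / 255u);
}
```

With a one-row palette every pixel, whatever its index, maps to the same color, which matches the uniform blue Adobe and XPDF show.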

winfried

@rwhitworth

This issue is easily reproducible with a debug build using afl-fuzz and a small test corpus.

The Chromium project solved their issue by returning false instead of asserting (and exiting). Is there any benefit to asserting instead of handling the error condition cleanly?
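A sketch of the non-aborting approach: validate every cmap entry, log each problem, and return a flag the caller can act on (fail the decode, or fall back to the palette alone). The struct and `cmap_is_sane` are hypothetical, loosely modeled on the component-index check quoted earlier in this thread with the mtyp assertion turned into a reported error; this is not the code of the commit that closed the issue.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

typedef struct {
    uint16_t cmp;   /* component index */
    uint8_t mtyp;   /* mapping type: 0 = direct, 1 = palette */
    uint8_t pcol;   /* palette column when mtyp == 1 */
} cmap_entry;

/* Check every cmap entry and report problems instead of asserting.
 * Returns 1 if the mapping is usable, 0 otherwise, so the caller can
 * reject the image without aborting the whole process. */
int cmap_is_sane(const cmap_entry *cmap, size_t nr_channels,
                 uint32_t numcomps, uint8_t palette_columns) {
    int sane = 1;
    for (size_t i = 0; i < nr_channels; i++) {
        if (cmap[i].cmp >= numcomps) {
            fprintf(stderr, "Invalid component index %u (>= %u).\n",
                    (unsigned)cmap[i].cmp, (unsigned)numcomps);
            sane = 0;
        }
        if (cmap[i].mtyp != 0 && cmap[i].mtyp != 1) {
            /* This is the condition the assertion at jp2.c:904 enforced. */
            fprintf(stderr, "Invalid mapping type %u for channel %zu.\n",
                    (unsigned)cmap[i].mtyp, i);
            sane = 0;
        } else if (cmap[i].mtyp == 1 && cmap[i].pcol >= palette_columns) {
            fprintf(stderr, "Invalid palette column %u for channel %zu.\n",
                    (unsigned)cmap[i].pcol, i);
            sane = 0;
        }
    }
    return sane;
}
```

Fed the entries seen in the logs above (cmp 24400 with mtyp 1, cmp 1 with mtyp 75) with numcomps 1 and 4 palette columns, it returns 0 and prints one line per problem instead of hitting an assertion.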

rouault added a commit that referenced this issue Jul 27, 2017
…eck (#672, #895)

Fixes test case openjeg-crashes-2017-07-27/id:000000,sig:06,src:000001,op:flip1,pos:808.jp2
of #895
rouault closed this as completed Jul 27, 2017
detonin added the bug label Aug 3, 2017

6 participants