Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bufRead needs to be adjusted after seek() #1790

Merged

Conversation

kevinbackhouse
Copy link
Collaborator

Fixes: GHSA-mvc4-g5pv-4qqq

It looks to me like the intention of this code is that bufRead tracks the current position in the file, so it needs to be updated after every call to read() or seek(). There was a missing update which caused an infinite loop.

@kevinbackhouse kevinbackhouse marked this pull request as draft July 18, 2021 09:45
@kevinbackhouse
Copy link
Collaborator Author

I converted this PR to draft because I'd like to do some further clean-up in JpegBase::printStructure before this is merged.

@kevinbackhouse kevinbackhouse added this to the v0.27.5 milestone Jul 18, 2021
@kevinbackhouse kevinbackhouse added bug forward-to-main Forward changes in a 0.28.x PR to main with Mergify labels Jul 18, 2021
@kevinbackhouse
Copy link
Collaborator Author

I spent some more time looking at this and ended up doing a major overhaul of the code in this file, which you can see in the third commit. I believe the new code behaves exactly as before, except for one place where I made a minor change to the output by fixing a (benign) out-of-bounds read and another place where I fixed a bug in some code that unfortunately seems to be untested.

In the original code, a variable called bufRead was used to keep track of the current position in the file. There were also a lot of calls to seek(). Often the code would use seek(..., -bufRead, ...) to rewind and re-read the data. So any mistakes in the accounting of bufRead could cause the code to get confused, which is exactly what caused this bug.

Looking at the code, I was puzzled by the hard-coded number 36. JPG files are divided into segments, each of which starts with a "marker". The code seems to assume that every segment is at least 36 bytes long, which isn't actually true. In other words, it possible for the read() on line 369 to be incomplete. For example, that happens with our test file test/data/exiv2-bug1231b.jpg.

So I have concluded that the number 36 is a hack which ought to be fixed. I have replaced it with logic which (hopefully) correctly calculates the size of the payload, allocates a buffer of the corresponding size, and then reads the whole payload. As a side-effect, this has also enabled me to remove most of the calls to seek(). The purpose of most of those seeks was to rewind and reread parts of the payload when the length of the payload was greater than 36 bytes. That's not necessary if the whole payload has already been loaded into a buffer.

It turns out that this file already contained all the necessary logic to calculate the size of the payload, but it wasn't used systematically throughout the code. There are two steps:

  1. The "marker" determines whether the segment has a non-zero payload. Markers are 1-byte numbers so this can be encoded as 256 element Boolean array, named mHasLength.
  2. If the segment has a payload, then the size of the payload is encoded as a uint16_t in the two bytes immediately following the marker. The 2 bytes that are used to encode the size of the payload are included in the size of the payload, so the size is always at least 2.

I have duplicated the code which calculates the size of the payload and loads it into a buffer in the 4 source locations where segments are parsed. I duplicated the code because I can't create a shared utility function without modifying the header files on the 0.27-maintenance branch. When this PR is forwarded to the main branch, I will refactor it to reduce the code duplication.

The change in the expected output in test/data/icc-test.out is because I fixed an out-of-bounds read caused by line 721. The size of the payload is already included in the value of size, so adding start to it causes the subsequent call to binaryToString to read 2 bytes beyond the end of the payload.

The second bug that I fixed is on line 860. I believe that calculation is wrong. It is causing a negative number to be calculated on line 879, which causes a crash in the subsequent memory allocation. Unfortunately, that code is not covered by any of our tests, so I can't be completely sure what the expected behavior is. (I found the crash by fuzzing.)

@kevinbackhouse kevinbackhouse marked this pull request as ready for review July 21, 2021 21:17
@clanmills
Copy link
Collaborator

@kevinbackhouse This is really good stuff, Kev. Thank You for digging into this. I will be very happy if somebody else reviews this. If nobody volunteers by 2021-08-01, please email me and I will get involved. robin@clanmills.com

src/jpgimage.cpp Outdated Show resolved Hide resolved
src/jpgimage.cpp Outdated Show resolved Hide resolved
@kevinbackhouse kevinbackhouse merged commit de6b706 into Exiv2:0.27-maintenance Jul 26, 2021
kevinbackhouse added a commit that referenced this pull request Jul 26, 2021
bufRead needs to be adjusted after seek() (backport #1790)
@clanmills clanmills mentioned this pull request Aug 9, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug forward-to-main Forward changes in a 0.28.x PR to main with Mergify
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants