Skip to content

Latest commit

 

History

History
158 lines (113 loc) · 10.9 KB

tis-interpreter-tutorial.md

File metadata and controls

158 lines (113 loc) · 10.9 KB

STEP 1

Typing ./configure modifies the files Makefile and zconf.h, and generates a couple of other files that weren't present. tis-interpreter, as a compilation platform, is identical neither to the host platform on which I typed ./configure nor to the minimal compilation platform that the initial zconf.h targeted without assumptions. But tis-interpreter is closer to the latter, and also I may want to compile zlib with an ordinary compiler later, so I will go with the flow and start from the zconf.h that was generated when I typed ./configure. We will not use the generated Makefile for tis-interpreter, instead passing the C source files on the command-line.


STEP 2

We will want to invoke the program minigzip, provided in the [tests] directory, in decompression mode, as if this utility had been invoked as program -d input.gz. Note that minigzip tests the name it has been invoked under. By default, tis-interpreter invokes it as “program”, but we use the command-line options to specify decompression:

gzip < compress.c > compress.c.gz
tis-mkfs -local input.gz:compress.c.gz -nb-max-files 5
tis-interpreter.sh test/minigzip.c *.c --fs -val-args=" -d input.gz"

The *.c in the list of files passed to tis-interpreter includes the mkfs_filesystem.c file that was generated by tis-mkfs.

The result we get is:

[kernel] user error: Incompatible declaration for deflate:
                     Definitions of type z_streamp are not isomorphic. Reason follows:
                     Definitions of type z_stream are not isomorphic. Reason follows:
                     Definitions of struct z_stream_s are not isomorphic. Reason follows:
                     different number of fields in struct internal_state and struct internal_state: 1 != 59.
                     First declaration was at  zlib.h:246
                     Current declaration is at zlib.h:246
[kernel] Frama-C aborted: invalid user input.

Good, but not great.


STEP 3

The error we get means that tis-interpreter, when it puts all the source files together, sees two different types for the function deflate, and more to the point, the types seen for deflate are different because they use different definitions, in fine, for struct internal_state. Grepping for struct internal_state takes us to this “hack for buggy compilers” that is only accepted by non-buggy compilers because they lack any sort of link-time type validation (in theory, having different incompatible types for the same external-linkage variable in different files is undefined behavior, but zlib's use of undefined behavior obviously works in practice with nearly all compilers, since zlib is used everywhere).

We will simply define NO_DUMMY_DECL for now, although I wish that those with the buggy C compilers would have to opt-in to the ugly hack instead of the reverse. The commands invoking tis-interpreter become:

gzip < compress.c > compress.c.gz
tis-mkfs -local input.gz:compress.c.gz -nb-max-files 5
tis-interpreter.sh test/minigzip.c *.c --fs -val-args=" -d input.gz" --cc -DNO_DUMMY_DECL

The new result we get is:

$ tis-interpreter.sh test/minigzip.c *.c --fs -val-args=" -d input.gz" --cc -DNO_DUMMY_DECL
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed

NIY WARNING: new file isn't being added to the dirent

crc32.c:257:[value] warning: The following sub-expression cannot be evaluated:
                 (long)buf_0 & (long)3
                 
                 All sub-expressions with their values:
                 long  (long)buf_0 ∈ {{ (long)&hbuf }}
                 long  (long)3 ∈ {3}
                 unsigned char const *  buf_0 ∈ {{ &hbuf[0] }}
                 int  3 ∈ {3}
                 
                 Stopping
                 stack: crc32_little :: crc32.c:222 <-
                        crc32 :: inflate.c:649 <-
                        inflate :: gzread.c:191 <-
                        gz_decomp :: gzread.c:248 <-
                        gz_fetch :: gzread.c:347 <-
                        gzread :: test/minigzip.c:439 <-
                        gz_uncompress :: test/minigzip.c:540 <-
                        file_uncompress :: test/minigzip.c:629 <-
                        main

The first message, “NIY WARNING: new file isn't being added to the dirent”, is partially good news: it means that minigzip goes as far as creating a new file input. The bad news is that support for creating new files in tis-mkfs is still partial, but this does not need to worry us for now.

The next message indicates that tis-interpreter was unable to preserve the property that the values of variables at each point of the execution are the only possible values for these variables at that respective point. The construct causing problem, here, is (ptrdiff_t)buf & 3, which takes different values depending on the alignment of the pointer buf.


STEP 4

There exists an obscure commandline option to tell tis-interpreter to remain silent while analyzing the function crc32_little as written, but I would rather not use it here and change the source code instead. The obscure option is -address-alignment 4. It tells tis-interpreter to assume that every base address is a multiple of 4. The first reason for changing the source code instead of using that option is that without the option, tis-interpreter guarantees that the meaning of the program does not depend on parameters outside the programmer's control, such as the actual integers used as representations of addresses. Using the option means trading some of tis-interpreter's sensitivity in exchange for silence in the logs. The software's behavior, as written, does depend on the alignment of the buf pointer, and that makes it more difficult to test. The zlib library could, for instance, work fine on a specific test platform for all the testcases in the world because the test platform happens to align buf, and still fail when deployed on a different platform.

The second reason I would rather change the code here is that I have been interested in “strict aliasing” violations recently, and the *buf4 in function crc32_little, using a 32-bit unsigned int to access four unsigned char, violates these rules. A C compiler would be free to start miscompiling this function tomorrow even if it has been kind enough, until now, to respect the authors' intentions.

After removing the difficult-to-test and undefined-behavior-inducing part of the crc32_little function, we get to a state where tis-interpreter no longer gets stuck in this function, and warns about something else further down the execution:

$ tis-interpreter.sh test/minigzip.c *.c --fs -val-args=" -d input.gz" --cc -DNO_DUMMY_DECL
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed

NIY WARNING: new file isn't being added to the dirent

inftrees.c:188:[kernel] warning: pointer arithmetic: assert \inside_object(base-257);
                  stack: inflate_table :: inflate.c:1000 <-
                         inflate :: gzread.c:191 <-
                         gz_decomp :: gzread.c:248 <-
                         gz_fetch :: gzread.c:347 <-
                         gzread :: test/minigzip.c:439 <-
                         gz_uncompress :: test/minigzip.c:540 <-
                         file_uncompress :: test/minigzip.c:629 <-
                         main
[value] Stopping at nth alarm

STEP 5

The inftrees.c warning is easy to confirm, but a bit unpleasant to deal with: the code is making the pointer base point to the first element of an array lbase, and then subtracts 257 from it using pointer arithmetic. The intention is to make a sort of 257-indexed array, but these aren't any more allowed than 1-indexed arrays made by subtracting one are.

An implementation-defined solution for compilation platforms with a flat address space is to use only uintptr_t to refer to addresses outside lbase. This solution may still cause issues on platforms where the mapping between pointers and integers is more complicated than the one-to-one correspondance of flat address spaces, but let's face it, if you are targeting a platform without a flat address space, you should not be playing with out-of-bounds pointers either.

There may exist a cleaner solution than the one implemented in this commit for pointers base and extra, but I do not immediately see it.


STEP 6

The next message from tis-interpreter indicates another pointer going out of bounds in a way that was quite harmless in the 1990s. Again, the pointer trick is intentional but is becoming increasingly likely to bite you as compiler authors become “more interested in trying to find out what can be allowed by the c99 specs than about making things actually work”:

inffast.c:101:[kernel] warning: pointer arithmetic: assert \inside_object(strm->next_out-1);
                  stack: inflate_fast :: inflate.c:1024 <-
                         inflate :: gzread.c:191 <-
                         gz_decomp :: gzread.c:248 <-
                         gz_fetch :: gzread.c:347 <-
                         gzread :: test/minigzip.c:439 <-
                         gz_uncompress :: test/minigzip.c:540 <-
                         file_uncompress :: test/minigzip.c:629 <-
                         main

Fortunately the solution this time is already included in the zlib 1.2.8 source code: the out of bound pointer resulting from the *++(a) anti-pattern can be avoided by defining POSTINC, which makes zlib use the *(a)++ pattern instead.

And it seems this is the last change needed to decompress input.gz without invoking undefined behaviors (as detected by tis-interpreter):

$ gzip < compress.c > compress.c.gz
$ tis-mkfs -local input.gz:compress.c.gz -nb-max-files 5
$ tis-interpreter.sh test/minigzip.c *.c --fs -val-args=" -d input.gz" --cc -DNO_DUMMY_DECL --cc -DPOSTINC
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed

NIY WARNING: new file isn't being added to the dirent


NIY WARNING: unlinked file not being removed from dirent

[value] done for function main