Typing ./configure
modifies the files Makefile
and zconf.h
, and generates a couple of other files that weren't present. tis-interpreter, as a compilation platform, is identical neither to the host platform on which I typed ./configure
nor to the minimal compilation platform that the initial zconf.h targeted without assumptions. But tis-interpreter is closer to the latter, and also I may want to compile zlib with an ordinary compiler later, so I will go with the flow and start from the zconf.h
that was generated when I typed ./configure
. We will not use the generated Makefile
for tis-interpreter, instead passing the C source files on the command-line.
We will want to invoke the program minigzip, provided in the [tests] directory, in decompression mode, as if this utility had been invoked as program -d input.gz
. Note that minigzip tests the name it has been invoked under. By default, tis-interpreter invokes it as “program”, but we use the command-line options to specify decompression:
gzip < compress.c > compress.c.gz
tis-mkfs -local input.gz:compress.c.gz -nb-max-files 5
tis-interpreter.sh test/minigzip.c *.c --fs -val-args=" -d input.gz"
The *.c
in the list of files passed to tis-interpreter includes the mkfs_filesystem.c
file that was generated by tis-mkfs
.
The result we get is:
[kernel] user error: Incompatible declaration for deflate:
Definitions of type z_streamp are not isomorphic. Reason follows:
Definitions of type z_stream are not isomorphic. Reason follows:
Definitions of struct z_stream_s are not isomorphic. Reason follows:
different number of fields in struct internal_state and struct internal_state: 1 != 59.
First declaration was at zlib.h:246
Current declaration is at zlib.h:246
[kernel] Frama-C aborted: invalid user input.
Good, but not great.
The error we get means that tis-interpreter, when it puts all the source files together, sees two different types for the function deflate
, and more to the point, the types seen for deflate
are different because they use different definitions, in fine, for struct internal_state
. Grepping for struct internal_state
takes us to this “hack for buggy compilers” that is only accepted by non-buggy compilers because they lack any sort of link-time type validation (in theory, having different incompatible types for the same external-linkage variable in different files is undefined behavior, but zlib's use of undefined behavior obviously works in practice with nearly all compilers, since zlib is used everywhere).
We will simply define NO_DUMMY_DECL
for now, although I wish that those with the buggy C compilers would have to opt-in to the ugly hack instead of the reverse. The commands invoking tis-interpreter
become:
gzip < compress.c > compress.c.gz
tis-mkfs -local input.gz:compress.c.gz -nb-max-files 5
tis-interpreter.sh test/minigzip.c *.c --fs -val-args=" -d input.gz" --cc -DNO_DUMMY_DECL
The new result we get is:
$ tis-interpreter.sh test/minigzip.c *.c --fs -val-args=" -d input.gz" --cc -DNO_DUMMY_DECL
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
NIY WARNING: new file isn't being added to the dirent
crc32.c:257:[value] warning: The following sub-expression cannot be evaluated:
(long)buf_0 & (long)3
All sub-expressions with their values:
long (long)buf_0 ∈ {{ (long)&hbuf }}
long (long)3 ∈ {3}
unsigned char const * buf_0 ∈ {{ &hbuf[0] }}
int 3 ∈ {3}
Stopping
stack: crc32_little :: crc32.c:222 <-
crc32 :: inflate.c:649 <-
inflate :: gzread.c:191 <-
gz_decomp :: gzread.c:248 <-
gz_fetch :: gzread.c:347 <-
gzread :: test/minigzip.c:439 <-
gz_uncompress :: test/minigzip.c:540 <-
file_uncompress :: test/minigzip.c:629 <-
main
The first message, “NIY WARNING: new file isn't being added to the dirent”, is partially good news: it means that minigzip goes as far as creating a new file input
. The bad news is that support for creating new files in tis-mkfs is still partial, but this does not need to worry us for now.
The next message indicates that tis-interpreter was unable to preserve the property that the values of variables at each point of the execution are the only possible values for these variables at that respective point. The construct causing problem, here, is (ptrdiff_t)buf & 3
, which takes different values depending on the alignment of the pointer buf
.
There exists an obscure commandline option to tell tis-interpreter to remain silent while analyzing the function crc32_little
as written, but I would rather not use it here and change the source code instead. The obscure option is -address-alignment 4
. It tells tis-interpreter to assume that every base address is a multiple of 4. The first reason for changing the source code instead of using that option is that without the option, tis-interpreter guarantees that the meaning of the program does not depend on parameters outside the programmer's control, such as the actual integers used as representations of addresses. Using the option means trading some of tis-interpreter's sensitivity in exchange for silence in the logs. The software's behavior, as written, does depend on the alignment of the buf
pointer, and that makes it more difficult to test. The zlib library could, for instance, work fine on a specific test platform for all the testcases in the world because the test platform happens to align buf
, and still fail when deployed on a different platform.
The second reason I would rather change the code here is that I have been interested in “strict aliasing” violations recently, and the *buf4
in function crc32_little
, using a 32-bit unsigned int
to access four unsigned char
, violates these rules. A C compiler would be free to start miscompiling this function tomorrow even if it has been kind enough, until now, to respect the authors' intentions.
After removing the difficult-to-test and undefined-behavior-inducing part of the crc32_little
function, we get to a state where tis-interpreter no longer gets stuck in this function, and warns about something else further down the execution:
$ tis-interpreter.sh test/minigzip.c *.c --fs -val-args=" -d input.gz" --cc -DNO_DUMMY_DECL
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
NIY WARNING: new file isn't being added to the dirent
inftrees.c:188:[kernel] warning: pointer arithmetic: assert \inside_object(base-257);
stack: inflate_table :: inflate.c:1000 <-
inflate :: gzread.c:191 <-
gz_decomp :: gzread.c:248 <-
gz_fetch :: gzread.c:347 <-
gzread :: test/minigzip.c:439 <-
gz_uncompress :: test/minigzip.c:540 <-
file_uncompress :: test/minigzip.c:629 <-
main
[value] Stopping at nth alarm
The inftrees.c
warning is easy to confirm, but a bit unpleasant to deal with: the code is making the pointer base
point to the first element of an array lbase
, and then subtracts 257 from it using pointer arithmetic. The intention is to make a sort of 257-indexed array, but these aren't any more allowed than 1-indexed arrays made by subtracting one are.
An implementation-defined solution for compilation platforms with a flat address space is to use only uintptr_t
to refer to addresses outside lbase
. This solution may still cause issues on platforms where the mapping between pointers and integers is more complicated than the one-to-one correspondance of flat address spaces, but let's face it, if you are targeting a platform without a flat address space, you should not be playing with out-of-bounds pointers either.
There may exist a cleaner solution than the one implemented in this commit for pointers base
and extra
, but I do not immediately see it.
The next message from tis-interpreter indicates another pointer going out of bounds in a way that was quite harmless in the 1990s. Again, the pointer trick is intentional but is becoming increasingly likely to bite you as compiler authors become “more interested in trying to find out what can be allowed by the c99 specs than about making things actually work”:
inffast.c:101:[kernel] warning: pointer arithmetic: assert \inside_object(strm->next_out-1);
stack: inflate_fast :: inflate.c:1024 <-
inflate :: gzread.c:191 <-
gz_decomp :: gzread.c:248 <-
gz_fetch :: gzread.c:347 <-
gzread :: test/minigzip.c:439 <-
gz_uncompress :: test/minigzip.c:540 <-
file_uncompress :: test/minigzip.c:629 <-
main
Fortunately the solution this time is already included in the zlib 1.2.8 source code: the out of bound pointer resulting from the *++(a)
anti-pattern can be avoided by defining POSTINC
, which makes zlib use the *(a)++
pattern instead.
And it seems this is the last change needed to decompress input.gz
without invoking undefined behaviors (as detected by tis-interpreter):
$ gzip < compress.c > compress.c.gz
$ tis-mkfs -local input.gz:compress.c.gz -nb-max-files 5
$ tis-interpreter.sh test/minigzip.c *.c --fs -val-args=" -d input.gz" --cc -DNO_DUMMY_DECL --cc -DPOSTINC
[value] Analyzing a complete application starting at main
[value] Computing initial state
[value] Initial state computed
NIY WARNING: new file isn't being added to the dirent
NIY WARNING: unlinked file not being removed from dirent
[value] done for function main