In short, a binary is the executable file a compiler produces from high-level source code, such as C or C++. It is the form of your program that the computer can actually run. I believe in hands-on learning, so let's take a look inside one to really find out.
Consider the file hello_world.c:
#include <stdio.h>
int main() {
    printf("Hello World!\n");
}
This is your average C file, more or less. It's got a main function, an include, and a little bit of code to be run. However, your computer can't run it directly. To make it usable, we must compile it:
$ gcc -m32 hello_world.c -o hello_world.bin
You can ignore the -m32 argument for now (we'll talk about it later). The -o hello_world.bin argument simply specifies the name of the output file.
From here, we can execute it:
$ ./hello_world.bin
Hello World!
Unsurprisingly, we get "Hello World!" as output. But let's go a bit deeper. We can open the binary in gdb (the GNU Debugger) and see what's happening under the hood:
$ gdb -q ./hello_world.bin
Reading symbols from ./hello_world.bin...(no debugging symbols found)...done.
gdb-peda$ disas main
Dump of assembler code for function main:
0x0804841d <+0>: push %ebp
0x0804841e <+1>: mov %esp,%ebp
0x08048420 <+3>: and $0xfffffff0,%esp
0x08048423 <+6>: sub $0x10,%esp
0x08048426 <+9>: movl $0x80484d0,(%esp)
0x0804842d <+16>: call 0x80482f0 <puts@plt>
0x08048432 <+21>: leave
0x08048433 <+22>: ret
End of assembler dump.
gdb-peda$ quit
Your prompt probably looks like (gdb), whereas mine is gdb-peda$. Don't worry about this; my gdb is modified.
The weird code that gdb displayed is called assembly language. It's the lowest-level human-readable code out there: each line maps directly to a machine instruction. Let's break this down.
0x0804841d <+0>: push %ebp
0x0804841e <+1>: mov %esp,%ebp
0x08048420 <+3>: and $0xfffffff0,%esp
0x08048423 <+6>: sub $0x10,%esp
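The hex numbers you see on the left are addresses. You can think of these just like your house address: 0x0804841d is where the instruction push %ebp lives. The numbers in angle brackets, such as <+3>, are byte offsets from the start of the function. These first four instructions are the standard setup for a function, in this case main(): they save the caller's frame pointer, set up a new one, align the stack to a 16-byte boundary, and reserve 16 bytes of stack space.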
0x08048426 <+9>: movl $0x80484d0,(%esp)
0x0804842d <+16>: call 0x80482f0 <puts@plt>
These instructions are what actually print out "Hello World!". The program moves the address of the string "Hello World!" (0x80484d0) into the memory location that %esp points to, which is the top of the stack. %esp is a register, which you can think of as a special place inside the processor for storing values it needs quick access to. On 32-bit x86, each general-purpose register holds four bytes, often a memory address. Our program then calls the puts() function, which prints out the string at the address we supplied. We wrote printf() in our source, but because the string has no format specifiers and ends in a newline, gcc swapped in the simpler puts().
0x08048432 <+21>: leave
0x08048433 <+22>: ret
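The last two instructions return control from our main() function back to the C library: leave tears down the stack frame that was set up at the start of the function, and ret jumps back to the caller. The C library then does some cleanup and exits the program. We'll be learning more about how these binaries function in later tutorials.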