Bash, Python, Lua, Perl (among many, many others) are interpreted languages. When we "run" a script written in an interpreted language, the program that actually runs is an "interpreter", which reads our code as text and carries out what it says.
A compiled program is completely different. Its code is actually loaded into memory and executed as machine language, literally a sequence of bytes that the hardware processor understands as instructions.
Let's look at our obligatory "helloworld.c":
#include <stdio.h>

int main(int argc, char **argv)
{
    printf("Hello world!\n");
}
We can use the program xxd to illustrate the difference between a Bash script and a compiled C program:
$ xxd ~/anagrams/anagram-build.sh | head -7
0000000: 2321 2f62 696e 2f62 6173 680a 0a64 6563  #!/bin/bash..dec
0000010: 6c61 7265 202d 4120 6469 6374 696f 6e61  lare -A dictiona
0000020: 7279 0a0a 2366 6f72 2828 203b 203b 2029  ry..#for(( ; ; )
0000030: 290a 7768 696c 6520 7265 6164 0a64 6f0a  ).while read.do.
0000040: 2320 2020 2072 6561 6420 0a23 2020 2020  #    read .#
0000050: 6966 205b 2024 3f20 2d67 7420 3020 5d0a  if [ $? -gt 0 ].
0000060: 2320 2020 2074 6865 6e0a 2309 6563 686f  #    then.#.echo
$ xxd ~/helloworld | head -3
0000000: 7f45 4c46 0201 0100 0000 0000 0000 0000  .ELF............
0000010: 0200 3e00 0100 0000 1004 4000 0000 0000  ..>.......@.....
0000020: 4000 0000 0000 0000 4811 0000 0000 0000  @.......H.......

The script is ordinary text, starting with the "#!/bin/bash" line that names its interpreter. The compiled program starts with the bytes 7f 45 4c 46 (a 0x7f byte followed by "ELF"), the magic number of the ELF executable format, and continues with binary data that is mostly unprintable.
Compiling a "Hello World" can be done in a single line:
$ gcc -o helloworld helloworld.c
That single command hides several important steps. The first is that the C preprocessor (cpp; in modern GCC it is built into the compiler itself) runs over helloworld.c, expanding #include directives and macros. We can ask the compiler to stop after this stage with the -E option, which writes the preprocessed source to standard output:
$ gcc -E helloworld.c
The next stage is the translation of the preprocessed C source into assembly language (a human-readable representation of actual machine language); we can ask the compiler to stop after this stage with the -S option, which writes the result to helloworld.s:
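$ gcc -S helloworld.c
$ cat helloworld.s
	.file	"helloworld.c"
	.section	.rodata
.LC0:
	.string	"Hello world!"
	.text
	.globl	main
	.type	main, @function
main:
.LFB0:
	.cfi_startproc
	pushq	%rbp
	.cfi_def_cfa_offset 16
	.cfi_offset 6, -16
	movq	%rsp, %rbp
	.cfi_def_cfa_register 6
	subq	$16, %rsp
	movl	%edi, -4(%rbp)
	movq	%rsi, -16(%rbp)
	movl	$.LC0, %edi
	call	puts
	leave
	.cfi_def_cfa 7, 8
	ret
	.cfi_endproc
.LFE0:
	.size	main, .-main
	.ident	"GCC: (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3"
	.section	.note.GNU-stack,"",@progbits

Notice that GCC has replaced our printf call with a call to puts: the format string contains no conversion specifications and ends in a newline, so puts("Hello world!") does the same job. That is also why the newline is missing from the .string directive.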
We can also ask the compiler to stop after the next stage, which produces actual machine language in an "object file", but before it "links" the program with the C runtime and its shared libraries; the -c option does this:
$ gcc -c helloworld.c
$ file helloworld.o
$ xxd helloworld.o
0000000: 7f45 4c46 0201 0100 0000 0000 0000 0000  .ELF............
0000010: 0100 3e00 0100 0000 0000 0000 0000 0000  ..>.............
0000020: 0000 0000 0000 0000 3801 0000 0000 0000  ........8.......
0000030: 0000 0000 4000 0000 0000 4000 0d00 0a00  ....@.....@.....
0000040: 5548 89e5 4883 ec10 897d fc48 8975 f0bf  UH..H....}.H.u..
0000050: 0000 0000 e800 0000 00c9 c300 4865 6c6c  ............Hell
0000060: 6f20 776f 726c 6421 0000 4743 433a 2028  o world!..GCC: (
0000070: 5562 756e 7475 2f4c 696e 6172 6f20 342e  Ubuntu/Linaro 4.
0000080: 362e 332d 3175 6275 6e74 7535 2920 342e  6.3-1ubuntu5) 4.
0000090: 362e 3300 0000 0000 1400 0000 0000 0000  6.3.............
00000a0: 017a 5200 0178 1001 1b0c 0708 9001 0000  .zR..x..........
00000b0: 1c00 0000 1c00 0000 0000 0000 1b00 0000  ................
00000c0: 0041 0e10 8602 430d 0656 0c07 0800 0000  .A....C..V......
00000d0: 002e 7379 6d74 6162 002e 7374 7274 6162  ..symtab..strtab
00000e0: 002e 736
The final stage is the linking/loading stage, where the linker resolves any outstanding references (such as the call to puts) and combines any other needed modules with our code to make the final executable. With C, we generally need at least the C runtime startup files (on GNU/Linux systems, files such as crt1.o, crti.o and crtn.o) and the C library itself.
The traditional Unix program for automating compilation is called make. It lets us write a set of rules describing how the units of a compilation (or of several compilations!) depend on each other and how to create each one.
helloworld: helloworld.c
	gcc -o helloworld helloworld.c

This is a complete Makefile. Note that the command on the second line must begin with a tab character, not spaces. Let's try it out:
$ make
make: `helloworld' is up to date.
$ rm helloworld
$ make
gcc -o helloworld helloworld.c
$ make
make: `helloworld' is up to date.
$ touch helloworld.c
$ make
gcc -o helloworld helloworld.c
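So make decides when to re-create the binary by using the dependency information on the first line and comparing modification times: it rebuilds helloworld if helloworld does not exist, or if helloworld.c is newer than it (which is exactly what touch arranges).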
We can quieten make down by prefixing a command with an "@" sign, which stops make from echoing the command before running it:
helloworld: helloworld.c
	@gcc -o helloworld helloworld.c
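We can also use make's powerful pattern rules to automate compilation. In a pattern rule, "%" matches any nonempty part of a file name, and in the command $* expands to the part that "%" matched (the "stem"), so the rule below builds helloworld.o from helloworld.c: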
%.o: %.c
	gcc -c $*.c

helloworld: helloworld.o
	gcc -o helloworld helloworld.o
Another common practice is to add targets that are merely conveniences, such as a clean target:
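%.o: %.c
	gcc -c $*.c

helloworld: helloworld.o
	gcc -o helloworld helloworld.o

.PHONY: clean
clean:
	@rm -f helloworld helloworld.o helloworld.s

The .PHONY line tells make that clean is not the name of a file, so the rule still runs even if a file called clean happens to exist.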
Now when we do a "clean", all of the generated files that might be lingering around are removed:
$ make clean
$ make
gcc -c helloworld.c
gcc -o helloworld helloworld.o
In recent years, CMake has been gaining ground as a tool that generates Makefiles (and input files for other build tools) from a higher-level description of the project.
We can use cmake with our helloworld program:
$ mkdir helloworld.d
$ cd helloworld.d
$ cp ~/helloworld.c .
$ cat > CMakeLists.txt <<EOF
cmake_minimum_required(VERSION 3.10)
project(helloworld)
add_executable(helloworld helloworld.c)
EOF
$ mkdir build-dir
$ cd build-dir
$ cmake ..
$ make
$ ./helloworld
Hello world!