Digital Forensics
Due Monday, March 18th

Assignment 2: Assembly language basics in Linux

Your assignment is to write two "hello world" programs in Linux assembly language using the GAS assembler (GAS is available on the linprog machines as as, but with a small caveat mentioned later). Also, please create a "Makefile" that assembles these two source files into two executables.

Your first program should be called hello_world.s.

Your second program should be called hello_world64.s.

Your makefile should be called Makefile. By default, it should produce two static (NOT dynamic) binaries: hello_world and hello_world64.

Your output for hello_world should look like:

langley@sophie ~/assembly $ ./hello_world
Hello World --- this is John Smith!

CIS 4385 Spring 2013

where "John Smith" is to be substituted with your name.

Your output for hello_world64 should look like:

langley@sophie ~/assembly $ ./hello_world64
Hello World (64 bit version) --- this is John Smith!

CIS 4385 Spring 2013

where, again, your name should appear rather than "John Smith".

When you run strace ./hello_world, it should look like this:

strace ./hello_world
execve("./hello_world", ["./hello_world"], [/* 40 vars */]) = 0
[ Process PID=12870 runs in 32 bit mode. ]
write(0, "Hello World --- this is John Smi"..., 59Hello World --- this is John Smith!

CIS 4385 Spring 2013

) = 59
_exit(0)

When you run strace ./hello_world64, it should look like this:

strace ./hello_world64
execve("./hello_world64", ["./hello_world64"], [/* 40 vars */]) = 0
write(1, "Hello World (64 bit version) ---"..., 76Hello World (64 bit version) --- this is John Smith!

CIS 4385 Spring 2013

) = 76
_exit(0)                                = ?

Notice that there are no shared libraries being loaded. Instead, you should see exactly three system calls: the execve of your program, a system call to write(2), and a system call to exit(2) with a value of 0. That's it. If you see anything else, then you haven't done this assignment correctly.
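
The 59 and 76 in the traces above are the third argument to write(2), the number of bytes to send, and also its return value. One quick way to get the right count for your own message is to let the shell count the bytes. The sketch below does that for the two sample messages with the placeholder name "John Smith"; the variable names msg32, msg64, len32, and len64 are made up for the illustration, and your numbers will differ once your own name is in the string.

```shell
# Count the bytes in each sample message, including the newlines.
# printf is used instead of echo so that \n is expanded predictably.
msg32='Hello World --- this is John Smith!\n\nCIS 4385 Spring 2013\n\n'
msg64='Hello World (64 bit version) --- this is John Smith!\n\nCIS 4385 Spring 2013\n\n'

# tr strips the padding that some versions of wc put around the number.
len32=$(printf "$msg32" | wc -c | tr -d ' ')
len64=$(printf "$msg64" | wc -c | tr -d ' ')

echo "32 bit message: $len32 bytes"
echo "64 bit message: $len64 bytes"
```

For the sample strings this reports 59 and 76 bytes, matching the strace output.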

Also, please make sure that both programs use an entry point named hello_world, as shown below in the ld examples.

Linux assembly tips

Invoking assemblers, loaders, and debuggers

You must use GAS, and not NASM, FASM, or any of the many other assemblers that are out there. GAS is far better documented than most of the others (see the AS manual), it's installed on most machines by default, and it handles multiple versions very gracefully.

Assembling with GAS is very simple:

as --32 -g -o hello_world.o hello_world.s       # assembles 32 bit
as --64 -g -o hello_world64.o hello_world64.s   # assembles 64 bit

The above lines will let you assemble your source files hello_world.s and hello_world64.s. The -g option adds debugging support, which you probably will want. You will end up with two object files, hello_world.o and hello_world64.o.

Next, you need to use the ld linker to create your executables:

ld -m elf_i386 -e hello_world -g -static -o hello_world hello_world.o
ld -m elf_x86_64 -e hello_world -g -static -o hello_world64 hello_world64.o

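The above invocations of ld use -m to select the output format (elf_i386 for the 32 bit binary, elf_x86_64 for the 64 bit one) and -e to name the entry point. The -static tells the linker that this is a static binary which does not depend on any dynamic libraries. GNU ld accepts -g only for compatibility with other tools and ignores it; the debug information that as -g put into the object files is kept in the executable unless you strip it.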
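
To check that a link really produced a static binary of the intended width, you can inspect the result with readelf, which ships with binutils alongside as and ld. The sketch below is only an illustration: it builds a throwaway program that does nothing but call exit(2) with status 0, then looks at the ELF header and the program headers. The file name exit32, the temporary directory, and the variables class and linkage are made up for the example, and it assumes an x86 Linux machine whose binutils supports the elf_i386 emulation.

```shell
dir=$(mktemp -d)

# A throwaway 32 bit program whose only system call is exit(0).
cat > "$dir/exit32.s" <<'EOF'
        .globl  hello_world         # make the entry symbol visible to ld
        .text
hello_world:
        mov     $1, %eax            # exit(2) is system call 1 in 32 bit mode
        mov     $0, %ebx            # argument 1: the exit status
        int     $0x80
EOF

as --32 -g -o "$dir/exit32.o" "$dir/exit32.s"
ld -m elf_i386 -e hello_world -static -o "$dir/exit32" "$dir/exit32.o"

# "Class" in the ELF header is ELF32 or ELF64.
class=$(readelf -h "$dir/exit32" | awk '/Class:/ { print $2 }')

# A dynamically linked executable has an INTERP program header
# naming the runtime loader; a static one does not.
if readelf -l "$dir/exit32" | grep -q INTERP; then
    linkage=dynamic
else
    linkage=static
fi

echo "$class $linkage"
rm -rf "$dir"
```

The same two readelf checks can be pointed at your own hello_world and hello_world64 once they are built.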

Using gdb is very simple:

% gdb ./hello_world            # start up the debugger
(gdb) break hello_world        # set a breakpoint at your entry point
(gdb) run                      # start the program, which will stop immediately at the breakpoint
(gdb) info reg                 # show your registers, the most important command here (can be abbreviated "i r")
(gdb) step                     # step one source line, which is one instruction in assembly built with -g (can be abbreviated "s"; "stepi" always steps exactly one instruction)
(gdb) help                     # everything you could want to know about GDB ;-)
(gdb) help all                 # all the possible commands...

Linux 32 bit assembly

There are two big differences between 32bit and 64bit Linux assembly, and both center around how system calls are made.

1) In 32bit assembly, you use the instruction

        int    $0x80

to make a system call.

2) In 32bit assembly, arguments to system calls are loaded into these registers:

EAX -- the system call that you want to execute
EBX -- argument 1 for the system call
ECX -- argument 2 for the system call
EDX -- argument 3 for the system call
ESI -- argument 4 for the system call
EDI -- argument 5 for the system call
EBP -- argument 6 for the system call

Linux 64 bit assembly

1) In 64bit assembly, you use the instruction

         syscall

to make a system call.

2) In 64bit assembly, arguments to system calls are loaded into these registers:

RAX -- the system call that you want to make
RDI -- argument 1 for the system call
RSI -- argument 2 for the system call
RDX -- argument 3 for the system call
R10 -- argument 4 for the system call (not RCX: the syscall instruction itself overwrites RCX and R11)
R8  -- argument 5 for the system call
R9  -- argument 6 for the system call

To CPP or not

I happen to like using cpp to replace symbolic system call names with their numbers from "unistd_32.h" and "unistd_64.h", like this:

         mov $__NR_exit, %eax    # cpp turns "__NR_exit" into "1"
         int $0x80

However, for this assignment, it's probably easier to note that the relevant system calls and their numbers are:

32bit: 
  write(2) is 4
  exit(2) is 1

64bit:
  write(2) is 1
  exit(2) is 60

Thus, the above lines of assembly could simply be written as:

         mov $1, %eax    # just put "1" in by hand rather than use cpp to find it
         int $0x80

Submission

The assignment is due by the beginning of class on Monday, March 18th.

Please create a tar file like this:

% tar cf assign2.tar Makefile hello_world.s hello_world64.s

and mail me your tar file (the email address is langley @ cs . fsu . edu).
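
Before mailing the archive, it is worth listing its contents to confirm that exactly the three required files are in it. The sketch below demonstrates this in a scratch directory with empty placeholder files (hypothetical stand-ins for your real sources); on your own work you would only need the tar tf line.

```shell
dir=$(mktemp -d)

# Empty stand-ins for the real files, for demonstration only.
(
    cd "$dir" &&
    touch Makefile hello_world.s hello_world64.s &&
    tar cf assign2.tar Makefile hello_world.s hello_world64.s
)

# "t" lists the archive's members, one per line, without extracting them.
listing=$(tar tf "$dir/assign2.tar")
echo "$listing"
rm -rf "$dir"
```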