Written by: Derick Swanepoel (derick@maple.up.ac.za)
Version 1.0 - 2002-04-19, 01:50am
JMP Quickstart
This tutorial is an introduction to coding assembly in Linux. There are two "versions" to accommodate various people:
Mainly, the reason for this tutorial is to make assembly programming easier, better and more practical by doing it in Linux instead of DOS. Also, it may teach you a bit of Linux while you're at it (unless you're already at home with it).
Programming in assembly may seem quite masochistic (and writing entire programs in it simply ridiculous), especially in these days of super-optimizing compilers and visual development tools that do just about everything for you. However, there is an advantage in understanding more about the inner workings of your processor and kernel, and assembly is a good way of learning this. Sometimes assembly can be extremely useful for sticking inline in a C/C++ program. And if your program really has a "need for speed", you can tweak and optimize the assembly generated by the compiler (of course, you need to be pretty elite to produce better code than today's compilers.)
Since there was this notion that we were to be taught how to use Linux during COS284 (sort of as an aside), the idea was that we would code assembly in Linux. But not Linux assembly - DOS assembly, in a DOS emulator, with a DOS text editor. Of course, this entirely defeats the purpose, but maybe it was to be done this way mainly because there aren't so many Linux assembly tutorials and sample code as for DOS. Well, here is a tutorial that'll teach you the basics of Linux assembly.
Linux will almost always be installed with the default assemblers as and as86 available, and quite likely also gas. However, we will be using NASM, the Netwide Assembler. It uses the Intel syntax just like TASM, MASM, and other DOS assemblers, and the structure is also fairly similar. (Useless info: as and gas use the AT&T syntax, which is somewhat different – e.g. all registers must be prefixed with a %, and the source operand comes before the destination. See the References for a link to a tut using as and AT&T syntax.)
NASM is cool because it's portable (there are Linux, Unix and DOS versions), it's free and it's powerful with lots of nice features. Trust me.
If you selected "Development Tools" when you installed Linux, chances are you already have NASM. It comes standard with most Linux distributions, so you don't need to download it. To check if you've got it, just ask Linux, "Where is NASM?" Type whereis nasm at a terminal prompt and press ENTER.
If you see a line that says something like nasm: /usr/bin/nasm then you're fine. If all you see is nasm: then you need to install NASM. Here are some instructions on how to install NASM (or anything else) on Linux.
If you feel like getting the latest and greatest version of NASM, visit their website www.cryogen.com/Nasm, or get it with FTP from our local Linux mirror ftp.kernel.za.org/pub/software/devel/nasm/binaries.
An assembly program can be divided into three sections:
section .data
    message:    db 'Hello world!'   ; Declare message to contain the bytes 'Hello world!' (without quotes)
    msglength:  equ 12              ; Declare msglength to have the constant value 12
    buffersize: dw 1024             ; Declare buffersize to be a word containing 1024
section .bss
    filename:   resb 255            ; Reserve 255 bytes
    number:     resb 1              ; Reserve 1 byte
    bignum:     resw 1              ; Reserve 1 word (1 word = 2 bytes)
    realarray:  resq 10             ; Reserve an array of 10 reals
section .text
    global _start

_start:
    pop ebx                         ; Here is where the program actually begins
    .
    .
    .
Linux system calls are called in exactly the same way as DOS system calls:
Some example code always helps:
    mov eax,1           ; The exit syscall number
    mov ebx,0           ; Have an exit code of 0
    int 80h             ; Interrupt 80h, the thing that pokes the kernel and says, "Yo, do this"
But how do you find out what these system calls are, and what they do, and what arguments they take? Firstly, all the syscalls are listed in /usr/include/asm/unistd.h, together with their numbers (the value to put in EAX before you call int 80h). However, for your convenience you can simply find them in this Linux System Call Table, together with some other useful information (eg. what arguments they take). Take a look at the list of syscalls – there are things like sys_write (4), sys_nice (34) and of course sys_exit (1). To find out just what these things do, you can look them up in the Linux manual pages (commonly called "the manpages"). That is what the next section is about.
First, open a terminal (or switch to one of the 6 consoles with CTRL+ALT+F1 through F6 - to get back to graphical mode press CTRL+ALT+F7). Say now you want to know what the "write" syscall does. Type man 2 write and press ENTER. This will bring up the manual page on "write" from section 2 of the manpages.
Under the NAME section is the syscall's name and what it does – in this case:
write - write to a file descriptor
This is the syscall you use to write to, well, a file. But you also use it to print stuff on the screen. "Why the heck is that?" you ask. See, in Linux everything is a file. Things like the screen, mice, printers, etc. are special files called "device files", but you read and write to them just like you do to a text file. This actually makes sense, because reading/writing files is one of the simplest things to do in programming, so why not do everything in the same simple way - but I digress.
Next, under the SYNOPSIS section you see a fairly ugly line:
ssize_t write(int fd, const void *buf, size_t count);
OK, if you know C it won't be ugly, because this is just the C definition of the syscall. As you can see, it takes three arguments: the file descriptor, followed by the buffer, and then how many bytes to write, which should be however long the buffer is. (The DESCRIPTION section tells us what the arguments are for.) The file descriptor (fd) is an integer, the buffer (buf) is a pointer to a memory location (that's what the * means), so it's also an integer, and the bytes to write (count) is of type size_t, which is also an integer. This makes sense because we put values for these arguments in the registers EBX, ECX and EDX, which are all 32-bit integers. Finally, the write syscall returns a value in EAX: the number of bytes actually written. This can be used to verify if all went well.
Now we can finally write our first Linux assembly program!
Of course, the appropriate way to begin would be to print out "Hello world!" To print to the screen, we write to the special file called STDOUT (standard output), which is file descriptor 1. Here is the program in full:
section .data
    hello:     db 'Hello world!',10    ; 'Hello world!' plus a linefeed character
    helloLen:  equ $-hello             ; Length of the 'Hello world!' string
                                       ; (I'll explain soon)

section .text
    global _start

_start:
    mov eax,4            ; The system call for write (sys_write)
    mov ebx,1            ; File descriptor 1 - standard output
    mov ecx,hello        ; Put the offset of hello in ecx
    mov edx,helloLen     ; helloLen is a constant, so we don't need to say
                         ; mov edx,[helloLen] to get its actual value
    int 80h              ; Call the kernel

    mov eax,1            ; The system call for exit (sys_exit)
    mov ebx,0            ; Exit with return code of 0 (no error)
    int 80h
Copy this program into a text editor of your choice (I use vi or SciTE), and save it as hello.asm in your home directory (/home/yourname).
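As noted earlier, write hands back the number of bytes actually written in EAX. Here is a minimal sketch of a variation on the Hello World program that checks that value. The labels msg, msgLen, allWritten and exit are just names picked for this illustration, and the build commands in the comments are an assumption about your toolchain (a 64-bit machine with NASM and GNU ld), so adjust them to your setup.

```nasm
; Possible build commands (assumed, for a 64-bit host):
;   nasm -f elf32 check.asm
;   ld -m elf_i386 -o check check.o

section .data
    msg:     db 'Checking write...',10
    msgLen:  equ $-msg

section .text
    global _start

_start:
    mov eax,4            ; sys_write
    mov ebx,1            ; File descriptor 1 - standard output
    mov ecx,msg
    mov edx,msgLen
    int 80h              ; On return, EAX = bytes written (negative on error)

    cmp eax,msgLen       ; Did the kernel write the whole string?
    je  allWritten
    mov ebx,1            ; No: exit code 1 (short write or error)
    jmp exit

allWritten:
    mov ebx,0            ; Yes: exit code 0

exit:
    mov eax,1            ; sys_exit, with the exit code already in EBX
    int 80h
```

After running it you can type echo $? in the shell to see the exit code: 0 if everything was written, 1 otherwise.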
Before I go on, you're probably wondering what that equ $-hello thing is doing in our Hello World program (line 3). As you may remember, when you use equ to declare a variable (instead of db, for example), you are actually declaring a constant. Declaring the length of our string as a constant is sensible because it sure isn't going to change. But how does $-hello turn out to be the length of 'Hello world!' ? When NASM sees a '$' it replaces it with the assembly position at the beginning of that line. (That is also the position at the end of the previous line.) So subtracting the position of a variable from '$' will give us the number of bytes between the variable and '$'. If we want to declare a variable that contains the length of a string we've declared by saying hello: db 'Hello world!',10 then we just stick helloLen: equ $-hello on the next line. That will make helloLen equal to the number of bytes that hello takes up in memory, which in this case is 13 (the linefeed character also counts). Don't worry if this confuses you – just remember that it's a neat and easy way to declare the length of a string.
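To see the '$' trick at work on more than one string, here is a small sketch. The labels first, second and bothLen are made up for this example. It relies on the fact that equ does not put any bytes in memory, so the two strings sit directly after each other.

```nasm
section .data
    first:      db 'abc'           ; 3 bytes
    firstLen:   equ $-first        ; 3: '$' is right after 'abc'
    second:     db 'defgh',10      ; 6 bytes (5 letters plus the linefeed)
    secondLen:  equ $-second       ; 6
    bothLen:    equ $-first        ; 9: by now '$' is past BOTH strings

section .text
    global _start

_start:
    mov eax,4            ; sys_write
    mov ebx,1            ; standard output
    mov ecx,first        ; Start at the first string...
    mov edx,bothLen      ; ...and write all 9 bytes, running on into second
    int 80h

    mov eax,1            ; sys_exit
    mov ebx,0
    int 80h
```

This should print abcdefgh followed by a linefeed. It also shows the catch: the equ line must come straight after the string it measures, otherwise the length includes whatever was declared in between (as bothLen does on purpose).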
If you're more than just casually interested, I'd encourage you to check out the NASM documentation for more information on these things, and how to use some of the other neat features that I'm not going to mention in this tutorial.
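The Hello World program only needed the .data and .text sections. As a rough sketch of how the .bss section fits in, the program below reserves a buffer there, reads a line from the keyboard into it and prints it back. It uses sys_read (3) and file descriptor 0 (standard input), which you can look up with man 2 read. The names prompt, bufSize, buffer and done are picked for this illustration only.

```nasm
section .data
    prompt:     db '> '
    promptLen:  equ 2              ; Counted by hand: '>' and a space
    bufSize:    equ 255

section .bss
    buffer:     resb bufSize       ; Reserve 255 uninitialised bytes

section .text
    global _start

_start:
    mov eax,4            ; sys_write: show the prompt
    mov ebx,1
    mov ecx,prompt
    mov edx,promptLen
    int 80h

    mov eax,3            ; sys_read
    mov ebx,0            ; File descriptor 0 - standard input
    mov ecx,buffer       ; Where to put the bytes
    mov edx,bufSize      ; Read at most this many
    int 80h              ; EAX = bytes read (0 at end of input, negative on error)

    cmp eax,0
    jle done             ; Nothing read, so nothing to echo

    mov edx,eax          ; Write back exactly as many bytes as were read
    mov eax,4            ; sys_write
    mov ebx,1
    mov ecx,buffer
    int 80h

done:
    mov eax,1            ; sys_exit
    mov ebx,0
    int 80h
```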
Getting the command line arguments from a DOS program is not an enjoyable experience, because working with the PSP and having to worry about segments is simply a pain. In Linux things are much simpler: all arguments are available on the stack when the program starts, so to get them you just pop them off.
As an example, say you run a program called program and give it three arguments:
./program foo bar 42
The stack will then look as follows:
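4 | Number of arguments, including the program name |
program | Name of the program |
foo | Argument 1, the first real argument |
bar | Argument 2 |
42 | Argument 3 (the string "42", not the number 42) |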
Now let's write the program program that takes the three arguments:
section .text
    global _start

_start:
    pop eax              ; Get the number of arguments
    pop ebx              ; Get the program name
    pop ebx              ; Get the first actual argument ("foo")
    pop ecx              ; "bar"
    pop edx              ; "42"

    mov eax,1
    mov ebx,0
    int 80h              ; Exit

After all that popping, EAX contains the number of arguments, EBX points to wherever "foo" is stored in memory, ECX points to "bar" and EDX to "42". This is obviously way more elegant and simple than in DOS. It took us just 5 lines to get the arguments and even how many there are, while in DOS it takes 14 rather complicated lines just to get one argument! Note that the 3rd pop overwrites the value we put in EBX with the 2nd pop (which was the program name). Unless you have a really good reason, you can usually chuck away the program name as we did here.
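The program above pops the arguments but doesn't do anything with them. Here is a sketch that prints the first argument. It relies on the fact that each argument is stored as a string ending in a zero byte, so the length has to be counted before calling write. The labels countLoop, printArg, done and newline are names chosen for this example.

```nasm
section .data
    newline: db 10

section .text
    global _start

_start:
    pop eax              ; Number of arguments, including the program name
    cmp eax,2
    jl  done             ; No real argument was given, so just exit

    pop ebx              ; Program name (chucked away)
    pop ecx              ; Pointer to the first real argument

    mov edx,0            ; EDX will count the length of the string
countLoop:
    cmp byte [ecx+edx],0 ; Reached the zero byte at the end?
    je  printArg
    inc edx
    jmp countLoop

printArg:
    mov eax,4            ; sys_write, with ECX = string and EDX = length
    mov ebx,1            ; standard output
    int 80h

    mov eax,4            ; Print a linefeed after the argument
    mov ebx,1
    mov ecx,newline
    mov edx,1
    int 80h

done:
    mov eax,1            ; sys_exit
    mov ebx,0
    int 80h
```

If you build this as showarg (a hypothetical name), running ./showarg foo bar should print foo on a line by itself.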
NB: NASM doesn't have procedure definitions like you may have used in TASM. That's because procedures don't really exist in assembly: everything is a label. So if you want to write a "procedure" in NASM, you don't use proc and endp, but instead just put a label (eg. fileWrite:) at the beginning of the "procedure's" code. If you want to, you can put comments at the start and end of the code just to make it look a bit more like a procedure. Here's an example in both Linux and DOS:
Linux:
; proc fileWrite - write a string to a file
fileWrite:
    mov eax,4                   ; write system call
    mov ebx,[filedesc]          ; File descriptor
    mov ecx,stuffToWrite
    mov edx,[stuffLen]
    int 80h
    ret
; endp fileWrite

DOS:
proc fileWrite
    mov ah,40h                  ; write DOS service
    mov bx,[filehandle]         ; File handle
    mov cl,[stuffLen]
    mov dx,offset stuffToWrite
    int 21h
    ret
endp fileWrite
NB2: I assume that you're familiar with labels and jumping to them with instructions like JMP, JE or JGE. Now that you've seen that "procedures" are actually labels, there is one very important thing to remember: If you are planning to return from a procedure (with the RET instruction), don't jump to it! As in "never!" Doing that will cause a segmentation fault on Linux (which is OK – all your program does is terminate), but in DOS it may blow up in your face with various degrees of terribleness. The rule to remember is:
You may jump to labels, but you must call a procedure.
Calling a procedure is of course done with the CALL instruction. This makes life a bit difficult when you want to do things like "if-then-else". If you have a situation such as "if this happens, call procedure 1, else call procedure 2" there's only one thing to do: Jump around like a kangaroo weaving a spaghetti code masterpiece. Let's look at an example. First, here is some normal, sane code:
if (AX == 'w') {
    writeFile();
}
else {
    doSomethingElse();
}

This is how you would do it in assembly:
    cmp AX,'w'              ; Does AX contain 'w'?
    jne skipWrite           ; If not, skip writing by jumping to another label, and doSomethingElse there...
    call writeFile          ; ...else call the writeFile procedure...
    jmp outOfThisMess       ; ...and jump past all of this spaghetti
skipWrite:
    call doSomethingElse
outOfThisMess:
    ...                     ; The rest of the program goes on here

Note that this is applicable to any assembly, not just Linux or NASM.
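To try the jump-around pattern for real, here is a self-contained sketch. The two "procedures" are stand-ins that only print a message (this writeFile does not write to a file), and the value in AX is hard-coded so you can change it and watch the other branch run. All labels and messages are made up for this illustration.

```nasm
section .data
    yesMsg: db 'Calling writeFile',10
    yesLen: equ $-yesMsg
    noMsg:  db 'Doing something else',10
    noLen:  equ $-noMsg

section .text
    global _start

_start:
    mov ax,'w'              ; Change this to another character to take the else branch

    cmp ax,'w'              ; Does AX contain 'w'?
    jne skipWrite
    call writeFile          ; We CALL the procedure, so its RET comes back here
    jmp outOfThisMess
skipWrite:
    call doSomethingElse
outOfThisMess:
    mov eax,1               ; sys_exit
    mov ebx,0
    int 80h

; proc writeFile - stand-in that just prints a message
writeFile:
    mov eax,4
    mov ebx,1
    mov ecx,yesMsg
    mov edx,yesLen
    int 80h
    ret
; endp writeFile

; proc doSomethingElse - stand-in that just prints a message
doSomethingElse:
    mov eax,4
    mov ebx,1
    mov ecx,noMsg
    mov edx,noLen
    int 80h
    ret
; endp doSomethingElse
```

Note that the procedures are placed after the exit call, so the program never falls into them by accident; they are only ever reached with CALL.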
Now we can finally take a look at a program that does something remotely useful, containing almost everything we've covered. In the Quickstart version of this tutorial, I have included a Linux and a DOS version of the program we wrote in Practical 3 (the one that writes 'Hello world!' to the file given as a command line argument). Check it out and see how much simpler and logical the Linux program is compared to the DOS one.
Well, that's about it for this tutorial. I hope this has been a suitable introduction to doing
assembly programming in Linux. If you have any questions, suggestions or problems, feel free to
e-mail me at derick@maple.up.ac.za. This is my first
tutorial and I'm no assembly hacker either, so I welcome your comments.
Good luck and happy coding!
The terminal / console is an integral and very useful part of Linux. Linux has an excellent set of command line utilities and programs, and you can control the whole system without a GUI. Sometimes this is actually easier and faster. For programming in assembly you are obviously going to have to work in the terminal, and this part will show you how.
Before you start, keep in mind that Unix/Linux is case sensitive, so "Blah" is not the same as "blah" or "blaH".
[delta@quantumcow asmtut]$
The part before the '@' tells you your username (mine is delta), then comes the computer name (quantumcow), and then the name of the current directory (asmtut), without the rest of its path.
[delta@quantumcow asmtut]$ pwd
/home/delta/asmtut
[delta@quantumcow asmtut]$ cd /usr/share/doc
[delta@quantumcow doc]$ pwd
/usr/share/doc
[delta@quantumcow doc]$ cd ..
[delta@quantumcow share]$ cd ../..
[delta@quantumcow /]$
At the end of this example, you end up in the root directory, / (similar to C:\). Now to get back to your home directory, type cd ~
[delta@quantumcow asmtut]$ cat foo.txt
Hello, world!
In order to install programs on your Linux system, you must be root (administrator). You can decide whether you want to do this with the GUI utilities or in a terminal – I recommend you try both, for the added experience ;)
If you're working in KDE / Gnome, installing things is fairly straightforward:
Installing stuff by means of a terminal isn't difficult either:
Writing a useful program with NASM
The NASM documentation
Introduction to UNIX assembly programming
Linux Assembler Tutorial by Robin Miyagi
Section 2 of the manpages