Custom Operating System Series
Part Two: Using NASM to Create a Disk Image File
This article is the second part in a series introducing system software development concepts for the Intel x86 or compatible processors running in protected mode.
In this series we are developing a simple protected mode operating system in assembly language, using the Netwide Assembler (NASM) to assemble our code and VMware Player as our test platform.
In our first article, we downloaded, installed and configured VMware Player to run a virtual machine that boots from a virtual floppy disk image file (os.flp).
Now we will use NASM to create a floppy disk image file to run when our virtual machine is started.
Step 1. Download NASM
Although these lab notes use NASM to assemble code, other assemblers can be used.
Our samples are assembled using version 2.09.10 of NASM, which is available at www.nasm.us.
Newer versions of NASM are made available periodically.
NASM can be downloaded as a simple .zip archive of binaries or as a Windows installer.
As of this writing, the download link for the .zip archive for win32 is:
Create a directory, unzip the contents of this archive into the directory and add the directory to your path.
A PDF version of the NASM user manual is available here.
Step 2. Create Our First Program
Our first program simply displays a message when our virtual machine starts.
The program demonstrates that VMware finds our program on the first sector of the floppy disk image (the boot sector), loads it into memory and runs it.
For these articles, I am using the Crimson Editor, first copyrighted by Ingyu Kang and now copyrighted by the Emerald Editor Community.
The version I am using can be downloaded here.
The program's source code is shown below.
Its entry point is at the line labeled "Start".
There is one subroutine that begins at the label "PutTTYString".
The program calls the subroutine to send a series of characters to the screen using a BIOS interrupt call.
After displaying the message, the program enters an infinite loop executing the halt (HLT) instruction.
Save this code and assemble it using the nasm command, "nasm os.asm -f bin -o os.flp -l os.lst".
NASM creates both the binary output, os.flp, and the assembly listing, os.lst.
Note that the output file is not an ".exe" or ".com" file.
That is because we are not creating a program to run on Windows or some other operating system.
In our first article we configured VMware Player to load the file os.flp as a floppy-disk image.
After assembling our program, we need to copy it into the VMware directory to update the disk image.
To make assembling this program and copying it into the VMware directory easier, we created the following batch file, build.bat, and stored it with the program source in a directory reserved for this version of the program.
The contents of the Floppy Disk image file can be inspected using a hex editor.
In these samples, we are using HxD version 18.104.22.168.
Information about this product can be found here.
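If a hex editor is not at hand, the first bytes of the image can also be inspected with a few lines of script. The following is a minimal Python sketch, not part of the article's toolchain; the function name hex_dump and the stand-in sample bytes are assumptions made for illustration.

```python
def hex_dump(data: bytes, width: int = 16) -> list[str]:
    """Format bytes as offset, hex columns and printable ASCII, one row per `width` bytes."""
    rows = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02X}" for b in chunk)
        # Non-printable bytes are shown as '.', as most hex editors do.
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        rows.append(f"{offset:08X}  {hex_part:<{width * 3 - 1}}  {text_part}")
    return rows

# Stand-in data (assumed). To inspect the real image, use:
#   sample = open("os.flp", "rb").read(512)
sample = b"Hello, World!\r\n\x00"
for row in hex_dump(sample):
    print(row)
```

Each row shows the offset, the bytes in hexadecimal, and their ASCII rendering, which is roughly the layout a hex editor presents.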
Here is the assembly listing found in the file os.lst:
Step 3. Test the Program using VMware Player
After assembling the os.flp file and copying it into the VMware directory, we can start our virtual machine to test it. To make this easier, I've created a batch file, "run.bat", in the directory for this program that simply refers to the os.vmx file in the VMware directory. Installing VMware associates the ".vmx" extension with VMware Player.
Executing the run.bat file will launch VMware Player using the os.vmx file.
Step 4. Examine the VMware Log File
As we have just demonstrated, VMware emulates a PC Basic Input/Output System (BIOS) when our virtual machine is started.
This emulation allowed our program to be loaded into memory and run by VMware Player.
It also allowed our program to display a message by using a BIOS function call.
VMware Player intercepted this BIOS function call and emulated its operation, displaying output to the virtual machine window.
On startup, the BIOS searches for a bootable disk - one that contains program code on its first sector, known as the boot sector.
If you are familiar with the boot process used by PC-compatible computers, you may have noticed that the code we just tested is not a valid boot sector.
It is missing two important parts: a disk parameter table and a valid boot sector signature.
The disk parameter table tells the BIOS, among other things, how large a sector is.
This lets the BIOS know how many bytes it needs to read from the disk to have our entire boot sector program code in memory.
The boot sector signature is a two-byte code (0x55 0xAA) at the end of the boot sector.
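The signature test can be expressed in a few lines. This is a minimal Python sketch that assumes a 512-byte sector; the helper name has_boot_signature and the sample bytes are hypothetical and are not taken from the article's program.

```python
SECTOR_SIZE = 512              # assumed sector size of a standard floppy
BOOT_SIGNATURE = b"\x55\xAA"   # the two bytes expected at offsets 510 and 511

def has_boot_signature(sector: bytes) -> bool:
    """Return True if a full sector ends with the bytes 0x55 0xAA."""
    return (len(sector) >= SECTOR_SIZE
            and sector[SECTOR_SIZE - 2:SECTOR_SIZE] == BOOT_SIGNATURE)

# Hypothetical code bytes: HLT followed by a short JMP back to the HLT.
unsigned_image = b"\xF4\xEB\xFD"
signed_image = unsigned_image.ljust(SECTOR_SIZE - 2, b"\x00") + BOOT_SIGNATURE

print(has_boot_signature(unsigned_image))  # False
print(has_boot_signature(signed_image))    # True
```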
Although our boot sector was missing these two critical parts, VMware was still able to load and run our program.
How was VMware able to do this if it didn't find a disk parameter table to tell it how big the boot sector is?
To answer this question, we can look at a file called "vmware.log".
VMware writes to this file each time it runs, keeping a log of what happened when it ran our virtual machine.
After running our virtual machine with our first boot sector code, we see these lines in the vmware.log file:
These lines in vmware.log show that VMware found invalid parameters (our code) where a disk parameter table should be, and that it did not find a valid signature at the end of the sector.
But we also see that VMware used default disk parameters, making some assumptions about the disk geometry.
Always check the vmware.log file after running your virtual machine.
We don't want VMware making assumptions.
In our next article, we will supply the missing boot sector elements and add code to the boot sector to search for, load and run an operating system program from the disk image file.
To stop the virtual machine, open the "Player" menu, select "Power" and then select "Shut Down Guest".
If prompted to confirm, select "Yes".
This will stop the virtual machine and return VMware Player to the Home screen.
We have successfully assembled a small program that uses a BIOS function to display a message.
We copied this file onto our floppy disk image file and saw that VMware would load and run this code, making assumptions about our disk geometry.
In the next article, we will replace this simple program with a new boot sector program that actually searches the floppy disk directory for an operating system file, loads the file into memory and transfers control to it.
Source Lines 1-31:
The assembler interprets characters following a semicolon as comments.
At the start of a source code file, in addition to including the name of the file, it is also helpful to include a description of the program, any instructions helpful in creating the program and any additional notes.
We include in our program notes the NASM syntax used to assemble the program.
In this case, the "nasm" program name is followed by four arguments which can appear in any order.
"os.asm" indicates the name of the source code file to assemble.
"-f bin" tells NASM to create a binary executable and not an object deck.
"-o os.flp" tells NASM to name the binary executable "os.flp".
"-l os.lst" tells NASM to create an assembly listing named "os.lst".
Source Lines 32 and 33:
Here we define two symbolic constants.
We prefer to use symbolic names for constants instead of coding literal values wherever they are used.
This is so if the literal value must change, we can change it only once where its symbolic name is defined.
Source Line 45:
We include a "cpu" assembler directive at the start of our source code to let NASM know that we expect only minimal 8086 instructions at this point. This will make sure we do not introduce more advanced instructions until we verify what kind of CPU is executing our code.
We'll add a programmatic check of the CPU type in a later article.
Source Lines 46 and 47:
At the start of our program we can assume that our Code Segment (CS) register and Instruction Pointer (IP) register point to the first instruction of our program.
Beyond that, however, we should limit our assumptions about what values our registers hold upon entry to the program.
In fact, we really don't even know what exact values CS and IP will be set to when our program starts.
BIOS implementations vary.
Some will enter our code with CS:IP set to 700:C00, others with 0:7C00 or even 7C0:0.
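All three of these segment:offset pairs name the same physical location. A short Python sketch of the real-mode address calculation (segment times 16 plus offset) shows this; the function name physical_address is an assumption made for illustration.

```python
def physical_address(segment: int, offset: int) -> int:
    """Real-mode address: segment * 16 + offset, wrapped to 20 bits as on an 8086."""
    return ((segment << 4) + offset) & 0xFFFFF

# The three entry values mentioned above, written as hexadecimal segment:offset pairs.
entry_points = [(0x0700, 0x0C00), (0x0000, 0x7C00), (0x07C0, 0x0000)]
for seg, off in entry_points:
    print(f"{seg:04X}:{off:04X} -> {physical_address(seg, off):05X}")
```

Every pair resolves to physical address 07C00, yet the offset part (IP) differs in each case, which is why the code cannot rely on a fixed offset.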
Since the instruction pointer, an offset into our code segment, is unknown at program startup, we cannot tell the assembler what offset to assume when it generates machine-code addresses in the executable.
For this reason, trying to put a label on our message and loading that label's offset address into SI requires making an unsafe assumption about our CS:IP values at assembly time.
Unlike a hosting operating system, the BIOS alone won't "fix up" any address references when it loads our boot sector.
Now we certainly will want to define various constants and variables with labels.
We will demonstrate how to do this safely in our next article.
For now, we will use only relative addressing in the CALL and JMP/Jcc instructions.
Note that since we are asking NASM to simply output machine code without making any addressing assumptions, we do not need any "section" statements yet.
Line 46 carries the label "Start".
The first line of a program does not need to have a label.
But, a few lines further down we find two special labels, ".10" and ".20".
These are called "local" labels because they begin with a period.
Local labels are a NASM feature that allows the same labels to be reused within the same program.
The rules are that local labels can only be used after a non-local label has been defined and a local label cannot be repeated until another non-local label appears.
So, in order to use local labels on lines 50 and 52, we need a non-local label somewhere before them.
That's the only reason we've put a non-local label, "Start", on line 46.
Although not required, using a label called "Start" might also make it easier to understand where the program begins.
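One way to picture the local label rules is that each local label is attached to the most recent non-local label. The Python sketch below models only that naming rule as described above; it is not how NASM is implemented, and the function name qualify_labels is hypothetical.

```python
def qualify_labels(labels: list[str]) -> list[str]:
    """Expand local labels (leading '.') to 'Parent.local' names, in source order."""
    qualified = []
    seen = set()
    parent = None
    for label in labels:
        if label.startswith("."):
            if parent is None:
                raise ValueError(f"local label {label} has no preceding non-local label")
            name = parent + label
        else:
            parent = label
            name = label
        if name in seen:
            raise ValueError(f"label {name} is defined twice")
        seen.add(name)
        qualified.append(name)
    return qualified

# Labels in the order they appear in the program described in this article.
print(qualify_labels(["Start", ".10", ".20", "PutTTYString", ".10", ".20"]))
```

The two uses of ".10" become "Start.10" and "PutTTYString.10", so they do not collide.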
Source Lines 48 and 49:
Here we "call" our local label ".10".
The CALL instruction pushes the address of the next sequential instruction onto the stack.
Because the bytes of our message immediately follow the CALL, that pushed address is the address of our message.
The message we want to display is defined on line 49.
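The arithmetic behind this trick can be sketched in Python. The offset of the CALL within the sector and the terminator bytes of the message (CR, LF, NUL) are assumptions made for illustration; only the three-byte length of a near CALL with a 16-bit displacement is a property of the instruction itself.

```python
CALL_LENGTH = 3  # opcode E8 plus a 16-bit relative displacement

def call_near(call_offset: int, displacement: int) -> tuple[int, int]:
    """Return (pushed return address, jump target) for a 16-bit near CALL."""
    return_address = (call_offset + CALL_LENGTH) & 0xFFFF
    target = (return_address + displacement) & 0xFFFF
    return return_address, target

message = b"Hello, World!\r\n\x00"  # assumed terminator bytes: CR, LF, NUL
call_offset = 0x0002                 # hypothetical offset of the CALL instruction

# The displacement skips over the message, so it equals the message length.
pushed, target = call_near(call_offset, len(message))
print(f"pushed {pushed:04X} (start of message), target {target:04X} (first byte after it)")
```

Whatever value IP has at run time, the pushed address and the message address move together, so no absolute address needs to be known at assembly time.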
Source Lines 50 and 51:
The POP SI instruction takes the value off the top of the stack and puts it into our SI register.
This puts our message address into SI in preparation for our call to "PutTTYString".
Since SI now holds what the instruction pointer, IP, held at the time of the call, and because we have set DS to equal CS, we can be confident that DS:SI actually addresses our message.
Finally we call "PutTTYString" on line 51.
This places the address of our next instruction, HLT, onto the stack.
Source Lines 52 and 53:
After we return from our subroutine we've run out of things to do.
Usually a program will "return" to its caller.
But a boot sector doesn't do that. It moves on to call whatever program it loads.
For now, we just want the program to keep running so we can admire the message it displays.
To do this we issue the HLT instruction to halt the CPU.
But, since HLT can be interrupted, we also place a JMP instruction to repeat our HLT until the virtual machine is powered off or restarted.
Source Line 63:
Our subroutine, "PutTTYString", calls a BIOS subroutine using a software interrupt.
To tell BIOS what function we want it to perform, we put a function code in register AH using one of our symbolic constant names.
The function code we use, 14 (0Eh), is the teletype function, which instructs the BIOS interrupt handler to output a text character to the display.
Fortunately, this BIOS routine doesn't overwrite AH on return, so we can load it only once prior to our loop.
Source Line 64:
At the top of our loop, we have the LODSB instruction, which reads a byte at DS:SI into AL and advances SI depending on our direction flag.
This program assumes the direction flag is zero at power-up.
To avoid making this assumption, insert a "CLD" instruction before line 64.
Source Lines 65 and 66:
Having loaded a character into AL, we test to see if we have reached the end of our message, which is indicated by a NUL.
If we find a NUL, line 66 performs a conditional jump to the local label ".20".
Source Lines 67 and 68:
Having found a non-null byte, we can now call the BIOS function using interrupt 16 (10h).
After returning from the BIOS call, we JMP to local label ".10" to repeat our loop.
Source Line 69:
Line 69 contains our RET instruction to return from our subroutine.
This takes the address of our HLT instruction off the top of the stack and places it in the instruction pointer, IP, register, returning us to our mainline logic.
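The control flow of the subroutine can be modeled in a few lines of Python. This is only a sketch of the logic described above, with the direction flag assumed clear; the memory contents and the function name put_tty_string are hypothetical, and the BIOS call is replaced by collecting characters in a list.

```python
def put_tty_string(memory: bytes, si: int) -> tuple[str, int]:
    """Model the loop: load the byte at SI, advance SI, stop at NUL, otherwise output it."""
    shown = []
    while True:
        al = memory[si]         # LODSB: AL = byte at DS:SI
        si += 1                 # SI advances by one because the direction flag is 0
        if al == 0:             # end-of-message test, then jump to .20
            break
        shown.append(chr(al))   # stands in for the BIOS teletype call
    return "".join(shown), si   # .20: RET

# Hypothetical memory: two filler bytes, the text "Hi!", a NUL, then one more byte.
memory = b"\x90\x90Hi!\x00\xF4"
text, final_si = put_tty_string(memory, 2)
print(text, final_si)  # Hi! 6
```

Note that SI ends up one byte past the NUL, because LODSB advances SI before the test is made.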
Three final points about the program listing, os.lst:
Observe how NASM represents operand bytes in the listing.
These appear in precisely the order in which they appear in the binary executable, byte for byte.
This is different from how other assemblers might display WORD or larger operands.
For example the bytes produced on listing line 48, "E81000", would appear as "E8 0010" on a listing created by MASM.
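The byte order can be checked with a short Python sketch that splits the listing bytes into the opcode and its little-endian 16-bit displacement. The variable names are chosen for illustration only.

```python
encoded = bytes.fromhex("E81000")  # the bytes as NASM lists them, in memory order

opcode = encoded[0]
# The low byte is stored first, so bytes 10 00 represent the word 0010h.
displacement = int.from_bytes(encoded[1:3], byteorder="little", signed=True)

print(f"opcode {opcode:02X}, displacement {displacement:04X}")  # opcode E8, displacement 0010
```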
Note how NASM reports all output bytes, wrapping long lines, such as with the bytes of our "Hello, World!" message.
This is helpful because it lets us see each and every output byte.
But it can cause the line numbers of the listing to mismatch when compared to the line numbers of our source code.
Note how our source code ended at label ".20" on line 69, but our listing has 70 lines.
You may notice that the assembled operation codes produced by NASM do not match those produced by other assemblers.
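Often this is simply because the two assemblers choose different opcode forms for a register-to-register operation: one form places the destination in the "r/m" field of the operand byte and the other places it in the "reg" field.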
So, for example, NASM might encode an "xor al,al" instruction as "30C0" while MASM might encode it as "32C0".
The processor will produce the same results either way.
This difference is just a result of the fact that there are often several ways to encode the same instruction on x86 processors.
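To see why the two encodings are equivalent, the operand byte can be decoded by hand. The Python sketch below handles only the two 8-bit register-to-register forms of XOR (opcodes 30 and 32); the function name decode_xor8 is an assumption made for illustration, and this is not a general disassembler.

```python
REG8 = ["al", "cl", "dl", "bl", "ah", "ch", "dh", "bh"]

def decode_xor8(encoding: bytes) -> str:
    """Decode a two-byte, register-direct 8-bit XOR (opcode 30 or 32)."""
    opcode, modrm = encoding
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 0b11:
        raise ValueError("only register-direct operands are handled in this sketch")
    if opcode == 0x30:      # XOR r/m8, r8: the destination is the r/m field
        dest, src = REG8[rm], REG8[reg]
    elif opcode == 0x32:    # XOR r8, r/m8: the destination is the reg field
        dest, src = REG8[reg], REG8[rm]
    else:
        raise ValueError("opcode not handled in this sketch")
    return f"xor {dest},{src}"

print(decode_xor8(bytes.fromhex("30C0")))  # xor al,al
print(decode_xor8(bytes.fromhex("32C0")))  # xor al,al
```

With two different registers the operand byte differs between the two forms, but each pair still decodes to the same instruction: 30C8 and 32C1 both mean "xor al,cl".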
Revised 10 October 2014