duskos

dusk os fork
git clone git://git.alexwennerberg.com/duskos

commit dd04bc4bc2b35f03229a4f28144cd55272f9dc0c
parent c48f263a4adba663308ab35afa23ad270c30272c
Author: Virgil Dupras <hsoft@hardcoded.net>
Date:   Sun, 30 Oct 2022 14:58:05 -0400

doc: add i386 assembler docs

I hadn't realized that it was missing!

Diffstat:
D fs/doc/asm.txt       |  72 ------------------------------------------------------------------------
A fs/doc/asm/i386.txt  | 139 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
A fs/doc/asm/index.txt | 108 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 247 insertions(+), 72 deletions(-)

diff --git a/fs/doc/asm.txt b/fs/doc/asm.txt
@@ -1,72 +0,0 @@
-# Assemblers
-
-## Labels and flow
-
-The labels system at asm/label.fs is arch-neutral and rests on top of
-arch-specific jump words, which have these properties:
-
-* They take their jump target directly from PS.
-* They can be either relative or absolute jumps.
-* In the case of a relative jump, "0" means an infinite loop. When the native
-  op doesn't have the same scheme, the op itself has to adjust the offset.
-
-For jumps to work, assemblers all need to have a point of reference, which is
-called "org". It's the address at which the binary you're assembling begins. If
-you're assembling for the live system, "org" is 0, its default value. If you're
-cross-compiling, you'll want to set "org" to "here" when you begin assembling.
-
-If the binary you're assembling isn't designed to run at address 0, then you
-also need to define "binstart". For example, a x86 PC boot sector will want to
-have "binstart" set to $7c00.
-
-These 2 values give us a new word, "pc" (for Program Counter), which will yield
-the address where the next assembled byte will be, returned in terms of the
-target machine.
-
-Jumps to locations in Dusk are made easier by the use of labels. Labels are a
-simple "value" that you declare before beginning the code:
-
-    0 value mylabel
-
-There are two possible types of jumps, backward or forward. Backward jumps are
-easy:
-
-    pc to mylabel
-    nop, nop,
-    mylabel jmp, \ assuming "jmp," takes an absolute address
-
-If the jump you want to use takes a relative address, you can use "abs>rel":
-
-    mylabel abs>rel jmp,
-
-Forward jumps are something else. We don't know in advance what's the target pc,
-so what we need to do is to emit the jump with a dummy value and save the
-address where the jump was written in a label. Then, when we reach that forward
-label location, we go back to the jump address saved and fiddle with its offset:
-
-    forward jmp, to mylabel
-    nop, nop,
-    mylabel forward!
-
-That simple interface hides a few complexities we'll explain right here. You're
-probably asking yourself: how does the "forward!" word know about the
-properties of the "jmp," op? namely:
-
-* is it absolute or relative?
-* is the immediate 1b, 2b, 4b?
-* is its opcode part 1b or 2b in length?
-* if relative, what is it's "zero point"?
-
-and you would be quite right to ask these questions, because it's a tricky one.
-The answer is that the "forward!" word is implemented in an arch-specific manner
-that reads the target address itself and then decodes its own opcodes to answer
-those questions.
-
-Another question: how does "forward" know in advance the magnitude (4b? 2b? 1b?)
-of the jump? In most assemblers, jump words auto-detect their width based on the
-value they are passed.
-
-Answer: It doesn't. "forward" sends to jmp a value that intentionally doesn't
-fit in 2b so that, by default, forward jumps don't end up with space problems
-(at the expense of binary space). If you want to create tighter forward jumps,
-use "forward16" or "forward8" instead.
diff --git a/fs/doc/asm/i386.txt b/fs/doc/asm/i386.txt
@@ -0,0 +1,139 @@
+# i386 assembler
+
+The i386 assembler lives at asm/i386.fs and covers a big part of the
+architecture (some parts are not implemented yet).
+
+You write code by calling operand specifiers followed by an operation word. For
+example, "ax bx add," writes the ADD operation with EAX as a destination and
+EBX as a source.
+
+Register specifiers don't have an "e" prefix and their meaning depends on the
+current operation width (see below).
+
+The assembler keeps track of specified operands and will error out with "asm
+error" when operands are nonsensical. For example, "ax add," errors out. "ax bx
+cx" errors out because it's impossible to have more than 2 operands.
+
+Error checking is not foolproof and the assembler might let you assemble
+nonsensical opcodes, for example if you use an addressing mode that is not
+supported by the operation you're writing.
+
+## Register operand
+
+You use a register operand by typing its name. The list of registers is:
+
+ax bx cx dx
+sp bp si di
+al bl cl dl
+ah bh ch dh
+
+There is also a list of "special" registers which can only be used with "mov,"
+and another "regular" register operand:
+
+es ss ds fs gs
+cr0 cr2 cr3
+dr0 dr1 dr2 dr3 dr6 dr7
+tr6 tr7
+
+## Dynamic register operand
+
+If you want to specify a register in a dynamic manner, you can use the uppercase
+versions of the registers listed above (which are simple constants that
+yield the value encoded in the opcode) and use the "opreg!" word. For example:
+
+create mylist DX c, SI c, BP c,
+mylist 2 + c@ opreg! inc,
+
+would result in the equivalent of "bp inc," being written.
+
+## Immediate operand
+
+You can specify an immediate operand with the "i)" word. For example "si 42 i)
+sub," writes the SUB operation with ESI as destination and 42 as an immediate
+source.
+
+An immediate operand is always the source so order doesn't matter, but it's
+usually placed last. You will not get an error by placing it first.
+
+## Memory operand
+
+You can refer to a memory address with the "m)" word. "cl $1234 m) mov," loads
+the byte at memory address $1234 into register CL.
+
+Order is important: "$2345 m) dx mov," writes the contents of EDX to memory
+address $2345.
+
+## Indirect register operand
+
+We can also refer to a memory address stored in a register and add a constant
+offset to it. For example, if EDI is $1200, "ax di 42 d) mov," loads the 4-byte
+value at memory address $122a. Again, order is important and "di 42 d) ax mov,"
+does the opposite.
+
+If you want indirect addressing without offset, use "0 d)". The assembler will
+automatically use the operation form that is more compact (because it contains
+no offset).
+
+## Operation width
+
+By default, operations are written in their 32-bit wide versions. But operations
+can be 32-bit, 16-bit or 8-bit wide. There are multiple factors deciding on that
+width.
+
+First, using an 8-bit register operand (al, ch, etc.) implicitly switches the
+assembler to 8-bit mode (for one operation, of course).
+
+Some operations can be 8-bit and not involve any register. For example "$1234 m)
+inc,". To have such an operation operate in 8-bit or 16-bit mode, you prefix it
+with "8b!" or "16b!". Example: "8b! $1234 m) inc,". This override lasts one
+operation.
+
+You can set the "realmode" global value to 1 to put the assembler in real mode.
+In this mode, the default width is 16-bit until you set "realmode" back to 0.
+
+## Jumps and calls
+
+jmp, and call, have two possible forms: immediate or mod/rm.
+
+In mod/rm mode, these operations work like others. For example, "ax jmp," works
+as you'd expect.
+
+The immediate offset form is used directly, without the "i)" word. The number
+supplied to it is expected to be an offset relative to the operation's
+*beginning* position (yes, *beginning*, unlike what the i386 operation expects,
+which is an offset from the end of the operation). This means that "0 jmp," is
+always an infinite loop.
+
+At this point a bit of fiddling happens to this offset. First, we check if the
+offset is small enough to fit in 8 bits. If it is, we will write the 8-bit form
+of the jump/call. If it's not, we will write the 32-bit form (or 16-bit form if
+we're in real mode).
+
+Then, after that, we need to adjust that offset so that it jumps where it's
+supposed to. This means subtracting 2, 3 or 5 bytes from that offset (depending
+on the width) before writing it.
+
+Conditional jumps (jz, jnc, etc.) work the same way except that they only
+support the immediate mode (again, no "i)") and will subtract an additional 1
+from the resulting offset in 16-bit/32-bit because the opcode is 2 bytes wide.
+
+## mul, and div,
+
+With mul, and div, the destination is always ax and you don't specify it. So,
+you'll write them like "bx mul," or "cx div,"
+
+## in, and out,
+
+The in, and out, operations support both their immediate form and their ax/dx
+form. You have to specify registers even if only al and ax are legal. Examples:
+
+42 i) al out, \ 8-bit out to port 42
+16b! 42 i) ax in, \ 16-bit in from port 42
+dx ax out, \ 32-bit out to port DX
+
+## Convenience macros
+
+The i386 assembler has a single convenience macro: movclr,
+
+It is called like mov, but if the operation is 8-bit or 16-bit, it ensures that
+the destination register is zeroed out before performing the move.
diff --git a/fs/doc/asm/index.txt b/fs/doc/asm/index.txt
@@ -0,0 +1,108 @@
+# Assemblers
+
+Assemblers are a crucial part of Dusk and a Dusk system will almost always have
+its native assembler loaded in memory because one of the machine drivers
+required it to load.
+
+On other systems, we're used to assemblers that parse a document containing a
+specific syntax and produce a resulting binary blob. In Dusk, assemblers are
+simple collections of words that sprinkle binary code around and can be called
+in many different contexts. For example, on an i386 system, you can insert
+binary contents in the middle of a word definition:
+
+: foo 42 + [ bp 0 d) 2 i) shl, ] ;
+
+which is the faster equivalent to:
+
+: foo 42 + << << ;
+
+Each assembler is different and you'll want to refer to arch-specific assembler
+documentation before you begin using it:
+
+* i386 (doc/asm/i386)
+
+While each assembler is different, they follow broad conventions which are
+described below.
+
+## Write symbol in words
+
+Words that write binary contents in assemblers all have a "," (write) suffix to
+indicate that effect.
+
+## Operands order
+
+Operands are specified before the operator word is called and in the same order
+as the architecture "natural" order (the order specified on the CPU assembler
+documentation). For example, on i386, the destination register is specified
+first, followed by the source operand, if any.
+
+## Labels and flow
+
+The labels system at asm/label.fs is arch-neutral and rests on top of
+arch-specific jump words, which have these properties:
+
+* They take their jump target directly from PS.
+* They can be either relative or absolute jumps.
+* In the case of a relative jump, "0" means an infinite loop. When the native
+  op doesn't have the same scheme, the op itself has to adjust the offset.
+
+For jumps to work, assemblers all need to have a point of reference, which is
+called "org". It's the address at which the binary you're assembling begins. If
+you're assembling for the live system, "org" is 0, its default value. If you're
+cross-compiling, you'll want to set "org" to "here" when you begin assembling.
+
+If the binary you're assembling isn't designed to run at address 0, then you
+also need to define "binstart". For example, an x86 PC boot sector will want to
+have "binstart" set to $7c00.
+
+These 2 values give us a new word, "pc" (for Program Counter), which will yield
+the address where the next assembled byte will be, returned in terms of the
+target machine.
+
+Jumps to locations in Dusk are made easier by the use of labels. Labels are a
+simple "value" that you declare before beginning the code:
+
+    0 value mylabel
+
+There are two possible types of jumps, backward or forward. Backward jumps are
+easy:
+
+    pc to mylabel
+    nop, nop,
+    mylabel jmp, \ assuming "jmp," takes an absolute address
+
+If the jump you want to use takes a relative address, you can use "abs>rel":
+
+    mylabel abs>rel jmp,
+
+Forward jumps are something else. We don't know in advance what's the target pc,
+so what we need to do is to emit the jump with a dummy value and save the
+address where the jump was written in a label. Then, when we reach that forward
+label location, we go back to the jump address saved and fiddle with its offset:
+
+    forward jmp, to mylabel
+    nop, nop,
+    mylabel forward!
+
+That simple interface hides a few complexities we'll explain right here. You're
+probably asking yourself: how does the "forward!" word know about the
+properties of the "jmp," op? namely:
+
+* is it absolute or relative?
+* is the immediate 1b, 2b, 4b?
+* is its opcode part 1b or 2b in length?
+* if relative, what is its "zero point"?
+
+and you would be quite right to ask these questions, because it's a tricky one.
+The answer is that the "forward!" word is implemented in an arch-specific manner
+that reads the target address itself and then decodes its own opcodes to answer
+those questions.
+
+Another question: how does "forward" know in advance the magnitude (4b? 2b? 1b?)
+of the jump? In most assemblers, jump words auto-detect their width based on the
+value they are passed.
+
+Answer: It doesn't. "forward" sends to jmp a value that intentionally doesn't
+fit in 2b so that, by default, forward jumps don't end up with space problems
+(at the expense of binary space). If you want to create tighter forward jumps,
+use "forward16" or "forward8" instead.
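
Reading aid (not part of the commit): the "Labels and flow" text added in fs/doc/asm/index.txt describes relative jumps whose "0" means an infinite loop, and forward jumps that are emitted with a dummy value and patched later by decoding the opcode already written. The Python sketch below illustrates that idea for 32-bit i386 only. It is not Dusk code and not the project's implementation: ToyAsm, its method names and the dummy offset 0x10000 are hypothetical. Only the opcodes (EB for a short jump, E9 for a near jump, 90 for NOP) are real i386 encodings. The sketch also checks the 8-bit fit after adjusting the offset, which may differ from what asm/i386.fs does at the boundary.

```python
import struct


class ToyAsm:
    """Hypothetical byte-buffer assembler, for illustration only."""

    def __init__(self):
        self.buf = bytearray()

    def nop(self):
        self.buf.append(0x90)  # i386 NOP

    def jmp(self, offset):
        """Emit a relative JMP; offset counts from the instruction's first byte."""
        if -128 <= offset - 2 <= 127:
            # Short form is 2 bytes long, and the CPU counts from its end.
            self.buf += b"\xeb" + struct.pack("<b", offset - 2)
        else:
            # Near form is 5 bytes long in 32-bit mode.
            self.buf += b"\xe9" + struct.pack("<i", offset - 5)

    def forward(self):
        """Emit a jump with a dummy offset and return where it was written."""
        at = len(self.buf)
        self.jmp(0x10000)  # assumed dummy: too big for 2 bytes, forces the wide form
        return at

    def forward_patch(self, at):
        """Make the jump written at `at` land on the current position."""
        opcode = self.buf[at]  # decode the op to learn its displacement width
        if opcode == 0xEB:
            size, fmt = 2, "<b"
        elif opcode == 0xE9:
            size, fmt = 5, "<i"
        else:
            raise ValueError("not a jump this sketch understands")
        struct.pack_into(fmt, self.buf, at + 1, len(self.buf) - (at + size))


def demo():
    """Mirror 'forward jmp, to mylabel / nop, nop, / mylabel forward!'."""
    a = ToyAsm()
    label = a.forward()
    a.nop()
    a.nop()
    a.forward_patch(label)
    return bytes(a.buf)
```

In this sketch, demo() yields a 5-byte near jump that skips the two NOPs, and ToyAsm().jmp(0) emits the classic two-byte infinite loop. Decoding the opcode at patch time is what lets a single patch routine serve both jump widths without the caller having to remember which one was emitted.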