commit dd04bc4bc2b35f03229a4f28144cd55272f9dc0c
parent c48f263a4adba663308ab35afa23ad270c30272c
Author: Virgil Dupras <hsoft@hardcoded.net>
Date: Sun, 30 Oct 2022 14:58:05 -0400
doc: add i386 assembler docs
I hadn't realized that it was missing!
Diffstat:
D fs/doc/asm.txt | 72 ------------------------------------------------------------------------
A fs/doc/asm/i386.txt | 139 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
A fs/doc/asm/index.txt | 108 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 247 insertions(+), 72 deletions(-)
diff --git a/fs/doc/asm.txt b/fs/doc/asm.txt
@@ -1,72 +0,0 @@
-# Assemblers
-
-## Labels and flow
-
-The labels system at asm/label.fs is arch-neutral and rests on top of
-arch-specific jump words, which have these properties:
-
-* They take their jump target directly from PS.
-* They can be either relative or absolute jumps.
-* In the case of a relative jump, "0" means an infinite loop. When the native
- op doesn't have the same scheme, the op itself has to adjust the offset.
-
-For jumps to work, assemblers all need to have a point of reference, which is
-called "org". It's the address at which the binary you're assembling begins. If
-you're assembling for the live system, "org" is 0, its default value. If you're
-cross-compiling, you'll want to set "org" to "here" when you begin assembling.
-
-If the binary you're assembling isn't designed to run at address 0, then you
-also need to define "binstart". For example, a x86 PC boot sector will want to
-have "binstart" set to $7c00.
-
-These 2 values give us a new word, "pc" (for Program Counter), which will yield
-the address where the next assembled byte will be, returned in terms of the
-target machine.
-
-Jumps to locations in Dusk are made easier by the use of labels. Labels are a
-simple "value" that you declare before beginning the code:
-
- 0 value mylabel
-
-There are two possible types of jumps, backward or forward. Backward jumps are
-easy:
-
- pc to mylabel
- nop, nop,
- mylabel jmp, \ assuming "jmp," takes an absolute address
-
-If the jump you want to use takes a relative address, you can use "abs>rel":
-
- mylabel abs>rel jmp,
-
-Forward jumps are something else. We don't know in advance what's the target pc,
-so what we need to do is to emit the jump with a dummy value and save the
-address where the jump was written in a label. Then, when we reach that forward
-label location, we go back to the jump address saved and fiddle with its offset:
-
- forward jmp, to mylabel
- nop, nop,
- mylabel forward!
-
-That simple interface hides a few complexities we'll explain right here. You're
-probably asking yourself: how does the "forward!" word know about the
-properties of the "jmp," op? namely:
-
-* is it absolute or relative?
-* is the immediate 1b, 2b, 4b?
-* is its opcode part 1b or 2b in length?
-* if relative, what is it's "zero point"?
-
-and you would be quite right to ask these questions, because it's a tricky one.
-The answer is that the "forward!" word is implemented in an arch-specific manner
-that reads the target address itself and then decodes its own opcodes to answer
-those questions.
-
-Another question: how does "forward" know in advance the magnitude (4b? 2b? 1b?)
-of the jump? In most assemblers, jump words auto-detect their width based on the
-value they are passed.
-
-Answer: It doesn't. "forward" sends to jmp a value that intentionally doesn't
-fit in 2b so that, by default, forward jumps don't end up with space problems
-(at the expense of binary space). If you want to create tighter forward jumps,
-use "forward16" or "forward8" instead.
diff --git a/fs/doc/asm/i386.txt b/fs/doc/asm/i386.txt
@@ -0,0 +1,139 @@
+# i386 assembler
+
+The i386 assembler lives at asm/i386.fs and covers a big part of the
+architecture (some parts are not implemented yet).
+
+You write code by calling operand specifiers followed by an operation word. For
+example, "ax bx add," writes the ADD operation with EAX as a destination and
+EBX as a source.
+
+Register specifiers don't have an "e" prefix and their meaning depends on the
+current operation width (see below).
+
+The assembler keeps track of specified operands and will error out with "asm
+error" when operands are nonsensical. For example, "ax add," errors out. "ax bx
+cx" errors out because it's impossible to have more than 2 operands.
+
+Error checking is not foolproof and the assembler might let you assemble
+nonsensical opcodes, for example if you use an addressing mode that is not
+supported by the operation you're writing.
+
+## Register operand
+
+You use a register operand by typing its name. The list of registers is:
+
+ax bx cx dx
+sp bp si di
+al bl cl dl
+ah bh ch dh
+
+There is also a list of "special" registers which can only be used with "mov,"
+and another "regular" register operand:
+
+es ss ds fs gs
+cr0 cr2 cr3
+dr0 dr1 dr2 dr3 dr6 dr7
+tr6 tr7
+
+## Dynamic register operand
+
+If you want to specify a register in a dynamic manner, you can use the uppercase
+versions of the registers listed above (which are simple constants that yield
+the value encoded in the opcode) and use the "opreg!" word. For example:
+
+create mylist DX c, SI c, BP c,
+mylist 2 + c@ opreg! inc,
+
+would result in the equivalent of "bp inc," being written.
+
+## Immediate operand
+
+You can specify an immediate operand with the "i)" word. For example "si 42 i)
+sub," writes the SUB operation with ESI as destination and 42 as an immediate
+source.
+
+An immediate operand is always the source so order doesn't matter, but it's
+usually placed last. You will not get an error by placing it first.
+
+## Memory operand
+
+You can refer to a memory address with the "m)" word. "cl $1234 m) mov," loads
+the byte at memory address $1234 into register CL.
+
+Order is important: "$2345 m) dx mov," writes the contents of EDX to memory
+address $2345.
+
+## Indirect register operand
+
+We can also refer to a memory address stored in a register and add a constant
+offset to it. For example, if EDI is $1200, "ax di 42 d) mov," loads the 4-byte
+value at memory address $122a into EAX. Again, order is important and
+"di 42 d) ax mov," does the opposite.
+
+If you want indirect addressing without offset, use "0 d)". The assembler will
+automatically use the operation form that is more compact (because it contains
+no offset).
+
+## Operation width
+
+By default, operations are written in their 32-bit wide versions. But operations
+can be 32-bit, 16-bit or 8-bit wide. There are multiple factors deciding on that
+width.
+
+First, using an 8-bit register operand (al, ch, etc.) implicitly switches the
+assembler to 8-bit mode (for one operation, of course).
+
+Some operations can be 8-bit and not involve any register. For example "$1234 m)
+inc,". To have such an operation operate in 8-bit or 16-bit mode, you prefix it
+with "8b!" or "16b!". Example: "8b! $1234 m) inc,". This override lasts one
+operation.
+
+You can set the "realmode" global value to 1 to put the assembler in real mode.
+In this mode, the default width is 16-bit until you set "realmode" back to 0.
+
+## Jumps and calls
+
+jmp, and call, have two possible forms: immediate or mod/rm.
+
+In mod/rm mode, these operations work like others. For example, "ax jmp," works
+as you'd expect.
+
+The immediate offset form is used directly, without the "i)" word. The number
+supplied to it is expected to be an offset relative to the operation's
+*beginning* position (yes, *beginning*, unlike what the i386 operation expects,
+which is an offset from the end of the operation). This means that "0 jmp," is
+always an infinite loop.
+
+At this point a bit of fiddling happens to this offset. First, we check if the
+offset is small enough to fit in 8 bits. If it is, we will write the 8-bit form
+of the jump/call. If it's not, we will write the 32-bit form (or 16-bit form if
+we're in real mode).
+
+Then, after that, we need to adjust that offset so that it jumps where it's
+supposed to. This means subtracting 2, 3 or 5 bytes from that offset (depending
+on the width) before writing it.
+
+Conditional jumps (jz, jnc, etc.) work the same way except that they only
+support the immediate mode (again, no "i)") and will subtract an additional 1
+from the resulting offset in 16-bit/32-bit because the opcode is 2 bytes wide.
+
+## mul, and div,
+
+With mul, and div, the destination is always ax and you don't specify it. So,
+you'll write them like "bx mul," or "cx div,".
+
+## in, and out,
+
+The in, and out, operations support both their immediate form and their ax/dx
+form. You have to specify registers even if only al and ax are legal. Examples:
+
+42 i) al out, \ 8-bit out to port 42
+16b! 42 i) ax in, \ 16-bit in from port 42
+dx ax out, \ 32-bit out to port DX
+
+## Convenience macros
+
+The i386 assembler has a single convenience macro: movclr,
+
+It is called like mov, but if the operation is 8-bit or 16-bit, it ensures that
+the destination register is zeroed out before performing the move.
diff --git a/fs/doc/asm/index.txt b/fs/doc/asm/index.txt
@@ -0,0 +1,108 @@
+# Assemblers
+
+Assemblers are a crucial part of Dusk and a Dusk system will almost always have
+its native assembler loaded in memory because one of the machine drivers
+required it to load.
+
+On other systems, we're used to assemblers that parse a document containing a
+specific syntax and produce a resulting binary blob. In Dusk, assemblers are
+simple collections of words that sprinkle binary code around and can be called
+in many different contexts. For example, on an i386 system, you can insert
+binary contents in the middle of a word definition:
+
+: foo 42 + [ bp 0 d) 2 i) shl, ] ;
+
+which is the faster equivalent to:
+
+: foo 42 + << << ;
+
+Each assembler is different and you'll want to refer to arch-specific assembler
+documentation before you begin using it:
+
+* i386 (doc/asm/i386)
+
+While each assembler is different, they follow broad conventions which are
+described below.
+
+## Write symbol in words
+
+Words that write binary contents in assemblers all have a "," (write) suffix to
+indicate that effect.
+
+## Operands order
+
+Operands are specified before the operator word is called and in the same order
+as the architecture's "natural" order (the order specified in the CPU's
+assembler documentation). For example, on i386, the destination register is
+specified first, followed by the source operand, if any.
+
+## Labels and flow
+
+The labels system at asm/label.fs is arch-neutral and rests on top of
+arch-specific jump words, which have these properties:
+
+* They take their jump target directly from PS.
+* They can be either relative or absolute jumps.
+* In the case of a relative jump, "0" means an infinite loop. When the native
+ op doesn't have the same scheme, the op itself has to adjust the offset.
+
+For jumps to work, assemblers all need to have a point of reference, which is
+called "org". It's the address at which the binary you're assembling begins. If
+you're assembling for the live system, "org" is 0, its default value. If you're
+cross-compiling, you'll want to set "org" to "here" when you begin assembling.
+
+If the binary you're assembling isn't designed to run at address 0, then you
+also need to define "binstart". For example, an x86 PC boot sector will want to
+have "binstart" set to $7c00.
+
+These 2 values give us a new word, "pc" (for Program Counter), which will yield
+the address where the next assembled byte will be, returned in terms of the
+target machine.
+
+Jumps to locations in Dusk are made easier by the use of labels. Labels are a
+simple "value" that you declare before beginning the code:
+
+ 0 value mylabel
+
+There are two possible types of jumps, backward or forward. Backward jumps are
+easy:
+
+ pc to mylabel
+ nop, nop,
+ mylabel jmp, \ assuming "jmp," takes an absolute address
+
+If the jump you want to use takes a relative address, you can use "abs>rel":
+
+ mylabel abs>rel jmp,
+
+Forward jumps are something else. We don't know the target pc in advance, so
+what we need to do is to emit the jump with a dummy value and save the address
+where the jump was written in a label. Then, when we reach that forward label
+location, we go back to the jump address saved and fiddle with its offset:
+
+ forward jmp, to mylabel
+ nop, nop,
+ mylabel forward!
+
+That simple interface hides a few complexities we'll explain right here. You're
+probably asking yourself: how does the "forward!" word know about the
+properties of the "jmp," op? Namely:
+
+* is it absolute or relative?
+* is the immediate 1b, 2b, 4b?
+* is its opcode part 1b or 2b in length?
+* if relative, what is its "zero point"?
+
+and you would be quite right to ask these questions, because they're tricky ones.
+The answer is that the "forward!" word is implemented in an arch-specific manner
+that reads the target address itself and then decodes its own opcodes to answer
+those questions.
+
+Another question: how does "forward" know in advance the magnitude (4b? 2b? 1b?)
+of the jump? In most assemblers, jump words auto-detect their width based on the
+value they are passed.
+
+Answer: It doesn't. "forward" sends to jmp a value that intentionally doesn't
+fit in 2b so that, by default, forward jumps don't end up with space problems
+(at the expense of binary space). If you want to create tighter forward jumps,
+use "forward16" or "forward8" instead.