Author: Virgil Dupras <firstname.lastname@example.org>
Date: Thu, 20 Oct 2022 20:00:03 -0400
doc: pimp up and clean up
11 files changed, 106 insertions(+), 165 deletions(-)
diff --git a/ROADMAP.md b/ROADMAP.md
@@ -36,9 +36,9 @@ I was thinking of beginning with something like [GNU diffutils][diffutils].
-I was thinking of not only porting the C uxn VM, but also making uxn a first
-class citizen of Dusk, with the possibility of creating adhoc uxn words. I think
-it would open interesting doors.
+I was thinking of not only porting the C [uxn][uxn] VM, but also making uxn a
+first-class citizen of Dusk, with the possibility of creating ad hoc uxn words.
+I think it would open interesting doors.
## Word annotations
@@ -77,3 +77,4 @@ What about the Raspberry pi?
Especially to access mass storage through it.
diff --git a/fs/doc/arch.txt b/fs/doc/arch.txt
@@ -1,18 +1,18 @@
# Dusk OS Architecture
-# Subroutine Threaded Code
+## Subroutine Threaded Code
This Forth is a Subroutine Threaded Code (STC) Forth, that is, each reference to
a word is compiled as a native call rather than as an address to be followed by
an interpreter. This means that we don't have a "next" interpret loop. It's
calls all the way down.
-# Linked lists
+## Linked lists
The linked list is a data structure that is heavily used in Dusk: dictionaries
are a specialized form of linked lists. A linked list is a structure where each
element begins with a 4-byte pointer to the next element.
-# Dictionary structure
+## Dictionary structure
Words in this Forth are embedded in a dictionary, which is a list of entries
each pointing to the previous entry. We keep the last added entry in "current".
@@ -43,18 +43,18 @@ we add 5 bytes.
Except for words specifically made for manipulating dictionary entries, we
rarely deal with "entry" pointers. We most often deal with word pointers.
When we use the word "cross-compiled" below, it means that the binary that is
being run was not compiled by the system running it.
-# The layers of the system
+## The layers of the system
A running Dusk OS instance has a few layers upon which the prompt is laid out.
We try to keep a consistent terminology for each of those layers; this
terminology is defined below.
-## Native Kernel (kernel)
+### Native Kernel (kernel)
This is the very core of the system, written in assembler and cross-compiled.
Its role is to implement a core set of words such as "word", "runword", "parse",
@@ -67,7 +67,7 @@ being".
The source for this kernel is posix/vm.c for the CVM and fs/xcomp/i386.fs for
the i386 kernel.
-## Boot layer (boot)
+### Boot layer (boot)
At this point, we're done with cross-compiled binaries and we're now entirely
on our own. Let's pick ourselves up by the bootstraps!
@@ -109,7 +109,7 @@ The "boothi" part takes all of this and implements "fload", and then loads
/sys/file.fs and then /init.fs. Then, it executes the word "init".
This is sourced from /xcomp/boothi.fs.
-## Initialization layer (init)
+### Initialization layer (init)
The first two layers are machine-dependent and will not change unless something
fundamental changes with your machine. The "init" layer, however, is
@@ -128,7 +128,7 @@ which sets ConsoleIn to RdlnIn, which makes the system interactive.
The system is yours.
-# lib or sys?
+## lib or sys?
What's in /lib? What's in /sys? This question can sometimes lead to confusion.
@@ -160,7 +160,7 @@ a subsystem.
For example, sys/scratch is centered around a buffer which can vary in size
depending on what the sysop wants.
-# Parens words ()
+## Parens words ()
At the core of each kernel is a set of words that all have their name wrapped
inside parentheses, such as (br), (val), etc. These words are designed to not
diff --git a/fs/doc/dict.txt b/fs/doc/dict.txt
@@ -272,23 +272,6 @@ entry 'dict s --
Create entry with name s in dictionary 'dict.
code "x" -- Same as "entry", but reads name from input stream.
current -- w Yield the last word to be added to the system dictionary.
-emeta w -- m Yield the value of the "metadata" field of the entry
- associated with word w.
-'emeta w -- 'm Same as "emeta", but yield the address of that field.
-wordlen w -- n Yield the length of the name of word w.
-wordname w -- a u Yields a range with the name of word w.
-.word w -- Emit name of word w.
-words -- Emit names of all words of the system dictionary.
-## Entry metadata
-An entry metadata is a linked list followed by a type id.
-emetatype m -- n Yield the type id of metadata m.
-'emetadata m -- a Yield the address of the "arbitrary data" part of m.
-findmeta typeid m -- m-or-0
- Find a metadata element of type "typeid" in meta LL "m" and
- yield this meta if found. Otherwise, yield 0.
@@ -296,7 +279,11 @@ struct[ "x" -- Create new struct "x" and begin defining it.
]struct -- Exit current struct definition.
extends "x" -- Find struct "x" in system dictionary and make the next
defined struct extend it.
-sfield "x" -- Add a new struct field named "x".
+sfield "x" -- Add a new struct 4b field named "x".
+sfieldw "x" -- Add a new struct 2b field named "x".
+sfieldb "x" -- Add a new struct 1b field named "x".
+sconst "x" -- Add a new struct read-only 4b field named "x".
+sfield' sz "x" -- Add a new struct buffer of size sz named "x".
smethod "x" -- Add a new struct method named "x".
structbind 'data "x y" --
Create a new binding named "x" that binds 'data to struct
diff --git a/fs/doc/dirs.txt b/fs/doc/dirs.txt
@@ -1,7 +0,0 @@
-# Directory structure
-/lib: collections of words to be used in multiple apps/subsystems
-/sys: subsystems. self-contained set of words that provide a "background"
- service. Readline interface, Grid interface, etc. They're similar to
- libraries, but only a handful of words in them are externally usable.
diff --git a/fs/doc/io.txt b/fs/doc/io.txt
@@ -1,6 +1,6 @@
-The lib/io subsystem offers a unified API to read and write on any device or
+The sys/io subsystem offers a unified API to read from and write to any device or
filesystem. All words in this API revolve around a structure that we call the
"IO handle". Each handle represents one "place" we read from and write to, saving,
if needed, all the state required to fulfill the API's specifications.
diff --git a/fs/doc/loading.txt b/fs/doc/loading.txt
@@ -1,20 +0,0 @@
-# loading files
-most of these words are defined in lib/file.fs and boot.fs
-## parsing words
-these words all parse a filename from the input stream.
-* unconditionally load file: f<<
-* load file if not already loaded: ?f<<
-* throw error if file is not loaded: require
-## string words
-these all accept a filename as a string
-* unconditionally load file: fload
-* test if a file is loaded: floaded?
-* print a list of all loaded files: .floaded
diff --git a/fs/doc/selfhost.txt b/fs/doc/selfhost.txt
@@ -1,21 +0,0 @@
-Dusk OS is self-hosting. The documentation to do so is preliminary, but here's
-an example of self-hosting under QEMU. From the POSIX shell:
-dd if=/dev/zero of=tgt.img bs=1M count=1
-qemu-system-i386 -drive file=pc.img,format=raw -drive file=tgt.img,unit=1,format=raw
-Then from the Dusk shell:
-ATA0:1 1938 buildPC ( fat )
-S" /xcomp/pc/init.fs" S" /xcomp/init.fs" rot combineInit
-Then, from the POSIX shell:
-qemu-system-i386 -drive file=tgt.img,format=raw
-You're running a freshly self-hosted Dusk!
diff --git a/fs/doc/usage.txt b/fs/doc/usage.txt
@@ -1,9 +1,36 @@
# Dusk OS usage
-NOTE: this document isn't complete. I'm only writing a few notes that will end
-up being in the usage guide once it's done.
+Dusk OS is a Forth that generally follows conventions described in "Starting
+Forth" by Leo Brodie, except that words are lowercase. If you don't know Forth,
+it's recommended that you start there.
-# String literals
+Then, you can look at doc/dict to get a broad idea of the vocabulary that is
+available to you. You should recognize many words in there from Starting Forth
+and should be able to get started.
+That being said, Dusk OS has some additional features that need explaining:
+## Number literals
+Dusk has no DEC/HEX mode. Number literals are parsed using a prefix system.
+* "naked" numbers are parsed as decimal: 1234
+* "$" is the prefix for hexadecimal notation: $12fe
+* "'" is the prefix for a character literal and must be closed: 'A'
+A string is the address of an area in memory starting with a length byte followed
+by that many characters. When we refer to a "string", we refer to that address.
+For example, this code will yield a "hello" string to PS (Parameter Stack):
+here 5 c, 'h' c, 'e' c, 'l' c, 'l' c, 'o' c,
+The code above is the equivalent of:
+S" hello"
+## String literals
When a string literal word such as S" ." or ," is used, the following content
is parsed in an almost verbatim manner until the closing " is reached. We say
@@ -19,7 +46,34 @@ character:
Any other character following the '\' results in that character being parsed as-
is, the preceding '\' being ignored.
-# "to" semantics
+## Values, cells, constants, aliases
+A "cell" is a word that refers to an area in memory. Calling this word yields
+the address directly following it:
+create mycell 5 c, 'h' c, 'e' c, 'l' c, 'l' c, 'o' c,
+Calling "mycell" will yield the string "hello".
+A "value" is a 4-byte area where a value is stored. It's a bit like a cell,
+but calling the value dereferences its address.
+42 value myvalue
+Calling "myvalue" yields 42. Moreover, it obeys "to" semantics (see below).
+A constant is a read-only value that doesn't obey "to" semantics:
+42 const myconst
+An alias is a shortcut to another word:
+alias noop myalias
+Calling "myalias" is the same as calling "noop". Aliases obey "to" semantics and
+can thus be changed.
+## "to" semantics
Values and aliases are very similar to cells: they're a piece of memory attached
to a "handling" routine. With the cell, the routine is a noop, it returns the
@@ -31,7 +85,7 @@ the second jumps to the address contained by that memory.
These routines come with... side effects. How can you modify a value or an
alias? You need a "to" word.
-The "to" words ("to", "to+", "to'") set a global variable with a pointer to an
+The "to" words ("to", "to+", etc.) set a global variable with a pointer to an
alternate routine for a value or alias to execute. For example, the "to" word
makes the "to" global pointer point to "!".
@@ -47,7 +101,7 @@ close to your value/alias call.
to+ sets "to" to "+!"
to' sets "to" to "noop" (returns the address)
It's a common pattern to want to "chain" behaviors in aliases. For example, one
could want to set the "emit" alias to a word that calls the previous "emit"
@@ -73,7 +127,7 @@ would work too:
: moduleinit chain emit myemitroutine ;
-# Linked lists
+## Linked lists
Linked lists are a fundamental data structure in Dusk. They are simply addresses
in memory pointing to each other, with the last element of the list pointing to
@@ -87,7 +141,7 @@ When you want to add a new element to the list, you can call "lladd", which
makes the list's last element point to "here". You can then write your new
-# Dictionary entry metadata
+## Dictionary entry metadata
Each entry in the dictionary can have metadata linked to it in the form of a
linked list. The pointer to the first element (or 0 if none) for an entry is
@@ -99,7 +153,7 @@ Each metadata element has this structure:
4b type ID
( any other type-specific data )
-# Local variables
+## Local variables
It's a common pattern, to avoid PS juggling, to place an element on RS and
recall that element with r@. It works well, but unfortunately, this only works
@@ -133,7 +187,7 @@ RS. All those variables give you are "to" semantics to a "RS slot". Example:
: inc5 ( a -- a+5 ) >r 5 >r begin 1 to+ V1 next V1 rdrop ;
42 inc5 . \ prints 47
What's this "rfree" used above? It's an automatic RS adjuster. It looks at the
"R counter" and emits an RS adjustment equivalent to its current level, and then
@@ -143,7 +197,7 @@ rdrop".
Be aware that the "R counter" is not always accurate! If you have conditional
modifications to RS levels, "rfree" is going to be broken. See section below.
-## Manual [rcnt] adjustments
+### Manual [rcnt] adjustments
The "R counter" that determines local variable slots is oblivious to conditional
code. It's not common to have code that conditionally maintains separate RS
@@ -155,7 +209,7 @@ V1, you would precede it with:
[ 0 [rcnt] ! ]
-# Binary width modulation
+## Binary width modulation
In a 32-bit system, it is common to want to access memory in 3 widths: 32-bit,
16-bit and 8-bit.
@@ -194,7 +248,7 @@ TODO: allow creation of width-modulable words in Forth. Something like:
TODO: add dictionary entry flag to indicate that the word is binary modulable.
This way, we can avoid crashes, making the system a bit easier to debug.
Structures are an effective way to address offsets from base addresses while
keeping the general namespace clean. Structures have a name and a list of fields
@@ -218,16 +272,16 @@ words inside the struct directly.
automatically place themselves inside the struct at the correct offset and
increase the struct's size.
-A struct size can be obtained with "structsz". It returns the size, in bytes, of
-the fields included in the struct. It can also be used inside a struct
-definition to get the struct size "up until now". This can be useful for buffers
+A struct's size can be obtained with "SZ", a word automatically added to every
+struct. It returns the size, in bytes, of the fields included in the struct. It
+can also be used inside a struct definition to get the struct size "up until
+now". This can be useful for initialization methods:
- ' Foo structsz &+ buf( \ yields the address at the end of the fields
+ : :new ( -- 'foo ) here SZ allot0 ;
A struct holds no data by itself and can't be used directly to access fields from
@@ -282,6 +336,15 @@ to the data. It can be used to get a structbind's data reference:
data2 :self \ MyData1 is on PS TOS
+There are several kinds of fields:
+* sfield: a 4-byte field
+* sfieldw: a 2-byte field
+* sfieldb: a 1-byte field
+* sconst: a 4-byte field that doesn't obey "to" semantics
+* sfield': a field that yields its address instead of a value. Useful for
+ buffers. It must be called with a size argument.
You can also extend a previous struct with a new struct:
extends Foo struct[ Bar
@@ -340,4 +403,3 @@ redirectable output" word.
The basic Dusk console, the sys/rdln subsystem, inserts itself between "key" and
"stdin". It feeds itself from key and provides line editing capabilities. When
a whole line is ready to be interpreted, it is fed to stdin.
diff --git a/fs/doc/value.txt b/fs/doc/value.txt
@@ -1,55 +0,0 @@
-# value and alias semantics in-depth
-There are several words that are used to define other words, such as
-":" and "code", but the three that will be explained here are
-"create", "value", and "alias".
-create is mostly straightforward, it defines a word that pushes an
-address to the stack. The address that it pushes is the same as the
-one returned by "here" after the word is defined. To use this to
-store a cell-sized value (4 bytes), you would do something like this:
- create foo 7 ,
- foo @ . \ 7
- 9 foo !
-This works fine, but always having to specify the extra "@" can become
-a bit repetitive. "value" exists to reduce the amount of calls
-needed. It does this by making the defined word implicitly call "@"
-after putting the address on the stack.
-This poses a new problem: if the word is always calling "@", how do we
-change the value? We can do this using "to", which sets a global flag
-telling the next value-defined word to execute "!" instead of "@".
-Using this, our example above becomes:
- 7 value foo
- foo .
- 9 to foo
-There are two more words that modify the same global variable as "to":
-"to+" and "to'". to+ replaces the implicit "@" with "+!", while to'
-replaces it with "noop", causing the address to be pushed to the
-stack, same as if the word was defined with "create".
-"alias" works very similar to "value", except instead of words just
-implicitly calling "@", they implicitly call "@" then "execute". this
-is useful when you want to (re)define a word later. "to" and similar
-words all behave the same on alias-defined words, replacing the
-implicit "@ execute" call.
-all of these words use different syntax for defining words.
-"create" is once again the simplest, only parsing a name for the new
-word and requiring you to do everything else (such as "allot"-ing
-"value" is nearly the same, except it takes an argument on the stack
-for the initial value.
-"alias" parses two words, first the name of an already defined word,
-then the name of the new word being defined. It does not
-have any effect on the stack.
diff --git a/fs/doc/x86.txt b/fs/doc/x86.txt
@@ -1,18 +1,12 @@
# x86 architecture
-The x86 kernel source code is /dusk.asm. Register roles:
+The x86 kernel source code is xcomp/i386.fs. Register roles:
All other registers are free.
-For now, this kernel needs to run on a Linux kernel and uses its syscalls
-for user interaction and file reading.
-It includes the boot source in its data section and makes boot< point to it
## EBP and PS
Here is a schema of PS with ( 3 2 1 ) in it, 1 being the top
diff --git a/fs/xcomp/pc/init.fs b/fs/xcomp/pc/init.fs
@@ -15,10 +15,10 @@ f<< /sys/grid.fs
-ahci$ ahci? [if]
- ." Using AHCI driver...\n"
- 0 AHCIDrive :new dup bootfs to Filesystem drv ( drv )
- AHCIDrive :enable [then]
+\ ahci$ ahci? [if]
+\ ." Using AHCI driver...\n"
+\ 0 AHCIDrive :new dup bootfs to Filesystem drv ( drv )
+\ AHCIDrive :enable [then]