duskos

dusk os fork
git clone git://git.alexwennerberg.com/duskos

commit 4595025cbc70bf5d61d7a98cedaa7ac860c54636
parent 8a80671fbe99ef4c96023b52febc1cd87e70644d
Author: Virgil Dupras <hsoft@hardcoded.net>
Date:   Sat,  7 Jan 2023 21:13:19 -0500

doc/design/simple: update

Diffstat:
M fs/doc/design/simple.txt | 32 +++++++++++++++++++++++++-------
1 file changed, 25 insertions(+), 7 deletions(-)

diff --git a/fs/doc/design/simple.txt b/fs/doc/design/simple.txt
@@ -14,12 +14,12 @@ reference when comparing complexity is Fabrice Bellard's Tiny C Compiler.
 
 Tcc enjoys a very good reputation among geeks, and Fabrice Bellard is generally
 considered to be a genius. Nevertheless, Dusk's C compiler, excluding backends,
-is 1200 lines of code and tcc, excluding backend is roughly 30,000 lines of
+is 1300 lines of code and tcc, excluding backend is roughly 30,000 lines of
 code. At the time of this writing, Dusk CC isn't quite completed yet, but there
 isn't much left to add, I don't think it will exceed 2000 lines by much.
 
 The i386 backend of Dusk CC, including its assembler, is 600 lines of code. In
-tcc, the i386 backend weighs in at 3800 lines of code.
+tcc, the i386 backend weighs in at 1600 lines of code.
 
 How can we explain this difference? It's true that Forth code is generally
 denser than C, but not by a factor of 15. It's true that I'm sometimes clever,
@@ -37,11 +37,7 @@ see some of the complexity associated with computing as unavoidable. It's not.
 That is why Forth's approach to simplicity is revolutionary, because it removes
 a blindfold.
 
-(TODO: there used to be a comparison between DuskCC's macro system and tcc's
-pre-processor, but the macro system since changed significantly and that
-comparison didn't hold. Re-compare when the new macro system is completed.)
-
-A third simplicity factor is parsing boilerplate. Tcc's assembler's input is
+Another simplicity factor is parsing boilerplate. Tcc's assembler's input is
 text formatted in GNU assembler format. This parsing boilerplate is a
 significant part of tcc assembler-related complexity. This contraint in UNIX is
 inevitable because inter-process communication in UNIX generally has to be done
@@ -50,3 +46,25 @@ serialization and deserialization boilerplate at multiple levels.
 In Forth, memory is shared and no such constraint exists. Words communicate
 through structured memory. We can thus afford to sidestep this complexity and
 use regular Forth words to assemble binaries.
+
+We have a good example of the kind of constraints UNIX imposes on programs by
+looking at i386-gen.c. In there, we see that the assembler included in tcc isn't
+used. Instead opcodes are directly generated in binary format. This makes sense
+because proceeding this way is simpler than generating textual assembler syntax
+in a buffer and then processing it through the assembler.
+
+It is unfortunate, however, to have to do this because it makes the code more
+cryptic than in has to be. DuskCC can freely use the words from its assemblers
+and doesn't have to go through a clunky text interface.
+
+Inline assembly is another interesting one. Yes, it's nice to have the option to
+add inline assembly to a C unit. To this end, tcc includes an assembler that
+weights 1200 lines with 2100 lines for the i386 backend. It's nice, but DuskCC
+doesn't need it because we already have the ability to mix and match C and Forth
+words freely and Forth words can be written in native code. We simply don't need
+this, we have this feature for free.
+
+In conclusion, I think we can say that DuskCC is simpler than TinyCC because it
+has less features. Some of these features would, if implemented in DuskCC, make
+it nicer, but a good chunk of them are simply not needed in a Forth world. That
+is what I call "side-stepping".