C version #4

Open
Kroc opened this issue May 24, 2024 · 16 comments
Labels
help wanted Extra attention is needed

Comments

@Kroc
Owner

Kroc commented May 24, 2024

v80 is intended to allow development of software for 8-bit systems (both modern and historic) using the same source code and toolchain on both PC and the retro system itself. A system that can't fix and deploy its own software isn't a computer, it's an appliance.

At the moment, v80 is written in WLA-DX assembly; WLA-DX is a very good assembler for many processors, written in C89. Once complete, the goal is for v80 to assemble itself. Using emulators as part of a build process is unfortunately rather clumsy and fraught with problems getting the automation to work.

I wish for there to be a C version of v80 that can assemble v80 source files on PC the same as the Z80 version can on retro systems. That way, developers can have the best of both worlds: keeping their GitHub-driven development without excluding the ability to develop and build on the retro system itself.

These are the requirements for a C version of v80:

  • Must be portable C89. It should, in theory, be compilable on ancient systems such as MS-DOS
  • Use no dependencies and require the minimal amount to be installed to build;
    if the compiler is small and self-contained enough to be included in the Git repository, even better
  • It should not be a direct translation of the Z80 version's code.
    It should do things in a natural "C" way and follow the syntax defined in the ReadMe
@Kroc Kroc added the help wanted Extra attention is needed label May 24, 2024
@Kroc Kroc pinned this issue May 24, 2024
@Kroc
Owner Author

Kroc commented Oct 2, 2024

I had some time to think about the handling of multiple ISAs. Naturally, the 8-bit native versions will use a separate executable for each ISA for memory reasons, and I believe this should also be the case for C (let's call the C port "c80" for shorthand) for a handful of reasons. Whilst it wouldn't be difficult for the C version to support multiple ISAs in one executable, each ISA has unique parsing and opcode-emitting rules (e.g. Z80 has shadow registers, 6502 doesn't), and because of this I don't want the ISA tables to be external; they should be compiled into the executable. I say this because I think the problem of v80/c80 building its own ISA tables can be solved without requiring an intermediate format or a custom definition syntax (as you've currently done in your C code): compile/assemble a version of v80/c80 without instruction support and use that to assemble the ISA table. This can be done both in C and Z80. The definition syntax (.m) only duplicates the ISA definitions, and I'd rather avoid c80-specific syntax and the extra workload for anybody wanting to add new ISAs.

That said, the format of the ISA tables (v2-ISA has been merged with main) will be slightly altered to be consistent and to be embeddable as a pre-assembled binary. The 26-word "a"-"z" jump table will be moved to the beginning of the ISA table and will be changed to offsets into the table rather than absolute addresses. This means the table can be placed anywhere in RAM/ROM on an 8-bit system without having to be assembled with v80. v80 doesn't support binary includes yet, but that's something that may yet be added.
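To make the relocatable layout concrete, here is a minimal C sketch of how a host-side tool could find a letter's sub-table in a pre-assembled ISA blob. It assumes, hypothetically (the final v2-ISA format may differ), that the blob starts with 26 little-endian 16-bit words, one per letter "a" to "z", each holding an offset from the start of the blob. `isa_letter_table` is an illustrative name, not part of v80.

```c
#include <stddef.h>

/* Number of entries in the hypothetical "a".."z" offset table. */
#define ISA_LETTERS 26
/* Bytes occupied by that table at the start of the blob (2 per word). */
#define ISA_HEADER_SIZE (ISA_LETTERS * 2)

/* Return a pointer to the sub-table for lowercase letter `first`, or NULL
 * if the letter is out of range or its offset points outside the blob.
 * Offsets are assumed to be 16-bit little-endian and relative to the start
 * of the blob, so the blob works wherever it is loaded. Assumes ASCII,
 * where 'a'..'z' are contiguous. */
const unsigned char *isa_letter_table(const unsigned char *isa,
                                      size_t isa_len, int first)
{
    size_t index, offset;

    if (first < 'a' || first > 'z' || isa_len < ISA_HEADER_SIZE)
        return NULL;
    index = (size_t)(first - 'a') * 2;
    offset = (size_t)isa[index] | ((size_t)isa[index + 1] << 8);
    /* An offset inside the header (e.g. 0) means "no entries". */
    if (offset < ISA_HEADER_SIZE || offset >= isa_len)
        return NULL;
    return isa + offset;
}
```

Because only offsets are stored, the same bytes work whether they sit in ROM on the 8-bit machine or in a buffer c80 has read from disk; the base address is only added at lookup time.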

Now that I've pushed the v2-ISA parser to main, and since I'll be modifying the v2-ISA format as above, be aware that you'll need to pull from GitHub before you commit changes! I'd like it if you could rename "bootstrap" to just "c", as this is clearer; the v0 WLA-DX code will remain as a tried-and-tested "source of truth" for ISA opcode verification.

@Kroc Kroc assigned Kroc and unassigned Kroc Oct 2, 2024
@gvvaughan
Collaborator

I think you're asking for the C version to use the isa_z80.v80 table in memory to lookup what opcodes to write when encountering an assembly instruction?

But, the C version already correctly assembles z80 and 6502 ISAs without the need to artificially split it into two binaries. And the tbl_*.v80 ISA tables are much simpler to write and to parse than the assembly lookup tables from v2. This makes bootstrapping a new ISA from the C version a simple matter of creating a new e.g. tbl_6809.v80, and then using that to assemble a 6809 native assembly assembler. No need to simultaneously write a new isa_6809.v80, and a new parser for it in assembly and the assembler itself...

If you strongly prefer a separate binary for each C bootstrap assembler, the easiest way to do that is to convert the tbl files into a C string array, compile those into each binary and always load only the table in each binary's memory instead of picking an ISA at run time by loading the appropriate mnemonics from a file. I think writing a C parser for the assembled in-memory isa_*.v80 is chasing a moving target and much more work every time you want to add support for a new ISA.
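A rough sketch of the "compile the tbl files into a C string array" option. The entry format `"<mnemonic> <hex opcode>"` is an assumption made for illustration (the real tbl_*.v80 syntax isn't shown in this thread), and `tbl_z80`/`tbl_lookup` are hypothetical names. The three opcodes are the standard Z80 encodings of NOP, HALT and RET.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical: lines of a tbl file converted to a C string array at build
 * time, NULL-terminated. Each entry is assumed to be "<mnemonic> <hex>". */
static const char *const tbl_z80[] = {
    "nop 00",
    "halt 76",
    "ret c9",
    NULL
};

/* Look up `mnemonic` in a NULL-terminated table and return its opcode,
 * or -1 if there is no exact match. */
int tbl_lookup(const char *const *tbl, const char *mnemonic)
{
    size_t len = strlen(mnemonic);

    for (; *tbl != NULL; tbl++) {
        /* Require an exact match: the mnemonic must be followed by a space. */
        if (strncmp(*tbl, mnemonic, len) == 0 && (*tbl)[len] == ' ')
            return (int)strtol(*tbl + len + 1, NULL, 16);
    }
    return -1;
}
```

Each per-ISA binary would link in only its own array, so nothing is read from disk at run time.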

Of course, I'll stop working on the tbl->isa table generation if you don't want to use that approach?

And, by all means, please go ahead and rename the directory or make any changes you would like to the C code. Or I can make a PR for that after my last one is merged if you'd prefer :-)

@Kroc
Owner Author

Kroc commented Oct 2, 2024

Ultimately the goal of v80 is to enable native development on 8-bit machines without dependence on PC infrastructure. That includes developing and extending v80 itself. If v80 requires C to process tbl_* files then ultimately we've got an 8-bit CP/M program that requires 10+GB of OS and compiler tooling :P The C version is secondary to the 8-bit versions, not the other way -- yes, things can be done better and more intelligently in C, but that isn't part of the goal. The C version should bend to suit the 8-bit version, not the other way around, sorry :( That is not to say that my understanding is correct or complete, though; it takes me time to absorb code, and I need to spend more time with the C code to work out the right approach.

@gvvaughan
Collaborator

gvvaughan commented Oct 2, 2024

I'm not convinced that hand-rolling thousands of complex assembly lines and manually updating an ISA specific parser in C is a good use of time and effort. Doubly so as you improve the v80 table layouts and parsers that will all need to be tracked in the C version for it to keep working. That's not to say that you can't still hand-roll a massive lookup table for each assembly assembler if you want to... but either way, whether it's generated from a simple input format, or lovingly created by hand, it's a checked in file that doesn't have to be rewritten every time you build the assembler and makes no difference to use of the assemblers on 8-bit platforms.

Let's say you want to tweak the assembler lookup table layout in memory in future... you can either adjust the C code that creates it, regenerate and commit the result, or you can manually update the actual isa lookup table and commit that. The result is the same. I certainly agree that I shouldn't write a table generator that won't be used. I'm offering to write it so that bootstrapping a new ISA would only require creating a simple tbl_*.v80 file, and then running the C assembler on new ISA assembler sources as they are built out until it's self hosting -- that saves having to create all the moving parts in parallel.

I also agree that ultimately each ISA will need its own tbl-to-z80/tbl-to-6502/etc program since each ISA has a different lookup table format. I do, however, strongly disagree that the C bootstrap assembler should be tied to the individual memory layouts of each ISA lookup table. That will be significantly more difficult for both of us to maintain. I have a mild preference for one simple ISA definition file as a source of truth, but since they are very quick and easy to write, if you much prefer to hand maintain individual v80 isa_*.v80 instruction lookup tables and parsers for current and future architectures, then I'm not at all unhappy with maintaining the tbl files in the c/ tree that let the C assembler work on any tbl-defined instruction set without recompiling. Either way, I'm certain that you'll find adding a tbl_*.v80 file for a new ISA to the c/ directory as you bootstrap into self hosted assembly language assemblers will be much easier for you than writing the new isa_*.v80, while simultaneously designing the memory layout and bit flags, and a parser as well as creating the assembler itself.

@Kroc
Owner Author

Kroc commented Oct 2, 2024

We are probably talking somewhat at cross-purposes, neither quite fully understanding the other. I'll try to respond to this as I understand it:

> Let's say you want to tweak the assembler lookup table layout in memory in future... you can either adjust the C code that creates it, regenerate and commit the result, or you can manually update the actual isa lookup table and commit that. The result is the same

Regeneration is not an option (for 8-bit native). It can't be done on an 8-bit system (unless compiling the C code on CP/M?). The goal is no reliance on a PC if the developer so wants, including developing new ISAs for v80. Otherwise what's the point in owning real hardware? :) The C version is not without use; it is better suited to automating an array of host CPU + ISA combinations, i.e. producing binaries for all versions of v80 in seconds rather than a minute each, manually, on real hardware. But if I want to write a from-scratch OS on a real Z80/6502, a PC shouldn't be involved once I have the first binary and the source code to rebuild it.

> I'm not convinced that hand-rolling thousands of complex assembly lines and manually updating an ISA specific parser in C is a good use of time and effort

No, but making yet another Z80 assembler isn't either. There are goals other than complete efficiency. You know what they say about early optimisation. Nobody is even using the thing yet. It will evolve into something better in due time and it's better to let experience in practical use guide improvements than chasing code purity. The parser is 99% the same code, with a tiny bit of logic for ISA-specific quirks. For Z80 this is <128 bytes of Z80 code; for 6502 this is 32 bytes. ISAs will not be added quickly. Far more work is involved in tailoring the v80 source code to specific 8-bit HW / OSes like CP/M. Sometimes a little bit of work now and again saves a lot of work abstracting away the problem.
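As an illustration of "99% the same code, with a tiny bit of logic for ISA-specific quirks", a shared scanner can consult a small per-ISA flag: here, whether a trailing apostrophe (as in the Z80 shadow register AF') belongs to a register name. The struct, its field and the lowercase-only identifier rule are hypothetical and not taken from the v80 sources.

```c
#include <stddef.h>

/* Hypothetical per-ISA quirk flags consulted by an otherwise shared parser. */
struct isa_quirks {
    const char *name;
    int shadow_suffix;   /* nonzero if a trailing ' is legal (Z80 af') */
};

static const struct isa_quirks isa_z80 = { "z80", 1 };
static const struct isa_quirks isa_6502 = { "6502", 0 };

/* Shared scanner: length of the identifier at `src` (lowercase letters,
 * digits, underscore), including a trailing apostrophe only when the
 * current ISA has shadow registers. */
size_t scan_identifier(const char *src, const struct isa_quirks *isa)
{
    size_t n = 0;

    while ((src[n] >= 'a' && src[n] <= 'z') ||
           (src[n] >= '0' && src[n] <= '9') || src[n] == '_')
        n++;
    /* The only ISA-specific branch in this scanner. */
    if (n > 0 && isa->shadow_suffix && src[n] == '\'')
        n++;
    return n;
}
```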

> Doubly so as you improve the v80 table layouts and parsers that will all need to be tracked in the C version for it to keep working.

I've settled on the ISA table layout. This will not be changing in any meaningful way. The way v1-ISA worked was the only way I could get it to work early on, when I had an incomplete assembler; it was ugly and very difficult to write and follow, but it was necessary to move on to more important things. v2 is how I always wanted it to work.

> I certainly agree that I shouldn't write a table generator that won't be used. I'm offering to write it so that bootstrapping a new ISA would only require creating a simple tbl_*.v80 file, and then running the C assembler on new ISA assembler sources as they are built out until it's self hosting -- that saves having to create all the moving parts in parallel.

I understand a PC-first attitude is normal, but none of that is possible from an 8-bit machine. If I can't modify and assemble the assembler on an 8-bit machine then we're not even Turing-complete. Could we also generate these files on the machine? Probably. Maybe that'll come down the line when there is more demand to do so, but I fear you're overestimating the need to pump out ISAs :P

> I also agree that ultimately each ISA will need its own tbl-to-z80/tbl-to-6502/etc program since each ISA has a different lookup table format. I do, however, strongly disagree that the C bootstrap assembler should be tied to the individual memory layouts of each ISA lookup table. That will be significantly more difficult for both of us to maintain. I have a mild preference for one simple ISA definition file as a source of truth, but since they are very quick and easy to write, if you much prefer to hand maintain individual v80 isa_*.v80 instruction lookup tables and parsers for current and future architectures, then I'm not at all unhappy with maintaining the tbl files in the c/ tree that let the C assembler work on any tbl-defined instruction set without recompiling. Either way, I'm certain that you'll find adding a tbl_*.v80 file for a new ISA to the c/ directory as you bootstrap into self hosted assembly language assemblers will be much easier for you than writing the new isa_*.v80, while simultaneously designing the memory layout and bit flags, and a parser as well as creating the assembler itself.

You are not wrong, and this is a hard path to navigate. I can see the benefit of having a way to define an ISA that isn't itself code, but I will need much time to ruminate on how to reconcile the differences in approach with the extremely strict limitations and minimalist goals of the 8-bit code. New ISAs will not be appearing quickly; there is much work to do in platform enablement, and I do not work quickly. I am slow, methodical and often wait for a perfect solution to pop into my head once I've digested all the information necessary and left it to stew for a while.

There are two paths I would like to follow with the 8-bit code. One will be expanding the Z80 ISA to eZ80 so as to support the Agon Light and then, eventually, a native 24-bit port of v80 in eZ80. The second is to write a version of v80 that runs on 6502 platforms. I expect that porting the entire code base will enlighten me as to any abstraction needed, and I'd rather wait until then before jumping the gun on how ISAs should be portable.

@gvvaughan
Collaborator

> We are probably talking somewhat at cross-purposes; neither quite fully understanding the other

You could very well be right. Rather than continuing along that road, I think I have a more concise argument for you:

  • The only reason to have the C version at all is to enable bootstrapping an 8-bit assembler binary from your sources without the need for a CP/M emulator or WLA-DX. Once you have a working 8-bit binary on your 8-bit machine, you can continue working on an 8-bit machine. One way to get a working 8-bit binary is to assemble the sources for the first time with the compiled C version of the assembler, and after that I don't think it's needed again.
  • The best way to have the C version stay useful after feature completion is for it to be completely self-contained. It shouldn't rely on parsers or memory blobs from various existing or upcoming 8-bit assembly sources, otherwise it will need ongoing maintenance and debugging as the 8-bit sources grow and improve. Even without #15 ("Use a radix tree instead of a hash table, to support in-order traversal") to switch from hash tables to radix trees for the symbol table, what's on the c branch is already standalone and feature complete (and arguably, #15 should be closed without merging if in-order symbol traversal is not going to be used, since the hash table is simpler and smaller). I think adding anything else makes it less useful as a bootstrap tool, and harder to maintain.
  • Inevitable bugs aside, apart from supporting new features that are needed to assemble the 8-bit sources, I don't expect very much work at all should be required on the C version in future. If I learn about reproducible issues that prevent compiling with an 8-bit C-compiler, I think those are worth fixing too.
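For comparison with the radix tree proposed in #15, this is roughly how small a fixed-capacity chained hash table for symbols can be in C89. It is a sketch, not the code on the c branch: the capacities, the djb2-style hash and the names `sym_define`/`sym_find` are assumptions. Walking the buckets yields symbols in hash order rather than sorted order, which is exactly the in-order traversal a radix tree would add.

```c
#include <string.h>

#define SYM_BUCKETS 64   /* hypothetical bucket count */
#define SYM_MAX 256      /* hypothetical capacity limit */
#define SYM_NAME_MAX 32  /* hypothetical max label length, including NUL */

struct symbol {
    char name[SYM_NAME_MAX];
    unsigned int value;
    int next;            /* 1-based index of next symbol in bucket, 0 = end */
};

static struct symbol sym_pool[SYM_MAX];
static int sym_head[SYM_BUCKETS];  /* 1-based index, 0 = empty bucket */
static int sym_count;

/* djb2-style string hash reduced to a bucket index. */
static unsigned int sym_hash(const char *s)
{
    unsigned long h = 5381;
    while (*s != '\0')
        h = h * 33 + (unsigned char)*s++;
    return (unsigned int)(h % SYM_BUCKETS);
}

/* Return the symbol called `name`, or NULL if it is not defined. */
struct symbol *sym_find(const char *name)
{
    int i = sym_head[sym_hash(name)];
    while (i != 0) {
        struct symbol *s = &sym_pool[i - 1];
        if (strcmp(s->name, name) == 0)
            return s;
        i = s->next;
    }
    return NULL;
}

/* Define or redefine `name`; returns 0 if the name is too long or the
 * pool is full, 1 on success. */
int sym_define(const char *name, unsigned int value)
{
    struct symbol *s;
    unsigned int b;

    if (strlen(name) >= SYM_NAME_MAX)
        return 0;
    s = sym_find(name);
    if (s == NULL) {
        if (sym_count >= SYM_MAX)
            return 0;
        b = sym_hash(name);
        s = &sym_pool[sym_count];
        strcpy(s->name, name);
        s->next = sym_head[b];         /* push onto the bucket's chain */
        sym_head[b] = ++sym_count;     /* store as 1-based index */
    }
    s->value = value;
    return 1;
}
```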

All of that aside, my ulterior motive in writing it this way is that I have been fiddling with emulators for fantasy computers with self-designed ISAs, and getting myself past the "I need an assembler to write programs to evaluate whether the ISA design is good" stage has been a (years-long) pain for me. Especially as, halfway through writing the self-hosted assembler, I can't resist improving the ISA, which means throwing away all the hand-assembled code so far and starting over. The existing C version is already almost perfect to get me past that stage (I just need to add support for prefix arguments), because tweaking the ISA is a simple matter of tweaking the tbl file rather than hand assembling hex codes from scratch again whenever I change my mind. Of course, I have no plans to burden v80.c with code that doesn't help your project, but I do expect to fork it, remove the polyfills to simplify for C11, and use the result for this upcoming Masto #DecemberAdventure :-D

@Kroc
Owner Author

Kroc commented Oct 4, 2024

> The only reason to have the C version at all is to enable bootstrapping an 8-bit assembler binary from your sources without the need for a CP/M emulator or WLA-DX.

I would also add to that rapid prototyping on modern systems / IDEs. There are a couple of integration issues c80 can solve too; for the different builds of v80 across different platforms there will be a mix of shared source code (the v80 core) and platform-specific code, on top of which I would like to produce a set of optional libraries for interfacing with specific platforms (e.g. Apple II, C64, MSX etc.) -- I can't just put all of these files in the one directory! Whilst v80 can't support sub-directories due to 8-bit system limitations, c80 can include additional folders from the command parameters to produce on-the-fly builds without a mass of duplicating source files for each build. For native 8-bit builds, the necessary files will be copied for each platform release since the user is likely only using one platform.
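A minimal sketch of how c80 could resolve a source file against several folders given on the command line, trying each directory in order and taking the first match. The function name, the 260-byte path limit and the use of `/` as the separator (which the Windows C runtime also accepts in `fopen` paths) are assumptions for illustration.

```c
#include <stdio.h>
#include <string.h>

#define PATH_MAX_LEN 260  /* hypothetical limit for joined paths */

/* Try `name` in each directory of the NULL-terminated list `dirs` (e.g.
 * collected from command-line options) and return the first file that
 * opens, or NULL. On success the path used is copied into `found`. */
FILE *open_in_search_path(const char *const *dirs, const char *name,
                          char *found, size_t found_size)
{
    char path[PATH_MAX_LEN];
    FILE *fp;

    for (; *dirs != NULL; dirs++) {
        /* Skip combinations that would overflow the buffer. */
        if (strlen(*dirs) + 1 + strlen(name) + 1 > sizeof path)
            continue;
        strcpy(path, *dirs);
        strcat(path, "/");
        strcat(path, name);
        fp = fopen(path, "r");
        if (fp != NULL) {
            if (found != NULL && found_size > 0) {
                strncpy(found, path, found_size - 1);
                found[found_size - 1] = '\0';
            }
            return fp;
        }
    }
    return NULL;
}
```

Putting the platform-specific folder before the shared core folder would let a platform file override a core file of the same name, without copying either.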

@Kroc
Owner Author

Kroc commented Nov 18, 2024

I want to use a portable C compiler to be able to compile c80 on Windows without the user/developer having to install LLVM/GCC/MSVC. I've created a branch, "tcc", where I've added the binary for the Tiny C Compiler and the basics of the command to "build.bat", but I can't get it to compile; it errors on line 24 in "polyfill/stdio.h" with "invalid type".


Is this something you would be able to look at? Do you have Windows at all? If not, if you could divine the correct compiler invocation on Linux, I can test that on Windows.

@gvvaughan
Collaborator

Hey @Kroc ! No, I haven't touched a Windows machine in almost 30 years. And tcc is incompatible with macOS Catalina (the 2019 release) or newer macOS... so not a terribly portable compiler :-)

$ brew install tcc
tcc: This formula either does not compile or function as expected on macOS
versions newer than Catalina due to an upstream incompatibility.
Error: tcc: An unsatisfied requirement failed this build.

Can you show me the error message? If it's just that size_t is not supported, I already have a shim for that which can be adjusted on Windows. If it's FILE, then that means the entire C stdio library is unsupported, which is a much bigger problem.

@Kroc
Owner Author

Kroc commented Nov 18, 2024

I already added -DNO_CTYPE_H for size_t, but yes, I think it's FILE. I guess I'll have to try a different compiler :P

@gvvaughan
Collaborator

gvvaughan commented Nov 18, 2024

Because C headers are organized hysterically, <ctype.h> is a different thing. Hopefully you can fix it by adding -DNO_SIZE_T to get the shim I defined in https://github.com/Kroc/v80/blob/c/c/polyfill/stdio.h#L9 ? (technically I should have put the size_t definition in <stddef.h>, but since there's only one use in zgetdelim I didn't think it was worth an extra header for just that one thing)
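For readers unfamiliar with the pattern, a polyfill guard selects a fallback typedef only when the build defines a macro such as `NO_SIZE_T`, and otherwise uses the standard definition from `<stddef.h>`. This is a sketch: the `unsigned long` fallback and the `span_until` helper are illustrative assumptions, and the actual shim is the one in polyfill/stdio.h linked above. Note that `-DNO_CTYPE_H` has no effect on this guard, and the fallback is only safe when no standard header that also declares `size_t` is included.

```c
/* Hypothetical polyfill guard: toolchains whose headers lack size_t are
 * built with -DNO_SIZE_T and get a fallback typedef; everyone else uses
 * the standard definition. */
#ifdef NO_SIZE_T
typedef unsigned long size_t;
#else
#include <stddef.h>
#endif

/* Example consumer: count characters before `delim` or the end of the
 * string, the kind of length a getdelim-style reader works with. */
size_t span_until(const char *s, int delim)
{
    size_t n = 0;

    while (s[n] != '\0' && s[n] != (char)delim)
        n++;
    return n;
}
```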

The compiler itself might be fine, but you need to point it at the headers for a separate C standard library. I think Windows probably only ships the runtime and not the headers or libraries, so you might need to install VC++ anyway? Or there might be some 3rd party libc you can link against on Windows without needing to install MSYS or similar? Do any of the vintage DOS environments like Turbo-C include a compiler and libc? Or I could be talking absolute rubbish after mostly ignoring Windows for so long.

@gvvaughan
Collaborator

Actually, looking at your new tcc branch, it looks like there's some kind of linker script for tcc to link against the visual C runtime, and it refers to fopen and friends, which return FILE *: https://github.com/Kroc/v80/blob/tcc/bin/tcc/lib/msvcrt.def#L1185-L1188

Although there are some shenanigans with Annex K of the C standard (the bounds-checking interfaces, added in C11 and still present in C23), where Microsoft wants everyone to use their fopen_s (and friends) instead.
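If MSVC's deprecation warnings about plain `fopen` become a nuisance, one option is a small wrapper that calls `fopen_s` (MSVC signature: `errno_t fopen_s(FILE **, const char *, const char *)`) under `_MSC_VER` and C89 `fopen` everywhere else. This is a sketch: `portable_fopen` is a hypothetical name, and it assumes tcc does not define `_MSC_VER`. Defining `_CRT_SECURE_NO_WARNINGS` and keeping plain `fopen` is the simpler alternative.

```c
#include <stdio.h>

/* Open a file portably: MSVC gets its fopen_s, every other compiler
 * (assumed to include tcc) uses plain C89 fopen. Returns NULL on failure. */
FILE *portable_fopen(const char *path, const char *mode)
{
#if defined(_MSC_VER)
    FILE *fp = NULL;
    if (fopen_s(&fp, path, mode) != 0)
        return NULL;
    return fp;
#else
    return fopen(path, mode);
#endif
}
```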

Anyway, I feel like you should be able to install MSVC and have tcc link against that. That would almost certainly be a lot less work than implementing a shim for fprintf ;-) (https://www.ijs.si/software/snprintf/ just reminded me that I once wrote a POSIX-compliant printf library, so there might be hope if that's the route you prefer.)

@Kroc
Owner Author

Kroc commented Nov 18, 2024

Hmm. Thanks for looking into this. I don't know C unfortunately, I can read it, but I can't fathom how the C libraries work, I just assumed TCC included everything needed :/ There is a recommendation that the headers from MinGW can be used, and some kind of hint in "docs/win32-tcc.txt" that msvcrt can be linked with those def files but I don't understand if that requires more. Surely Windows ships with msvcrt.dll???

@gvvaughan
Copy link
Collaborator

You're most welcome!

Generally, to compile a C program you need a C compiler (and possibly additional toolchain binaries like an assembler and linker if the compiler doesn't come with those), and a development environment that includes the libraries themselves (msvcrt.dll and maybe other libs it depends on) & the headers for those libraries that tell the compiler what API symbols can be used by linking against the libraries.

If we can assume that msvcrt.dll provides the runtime library, and tcc provides the preprocessor, compiler, assembler and linker, then only the headers are missing from your environment.

From the docs file you linked:

    Header Files:
    -------------
    The system header files (except _mingw.h) are from the MinGW
    distribution:

	http://www.mingw.org/

    From the windows headers, only a minimal set is included.  If you need
    more,  get MinGW's "w32api" package.  Extract the files from "include"
    into your "tcc/include/winapi" directory.

mingw.org is no more, but looks like you might be able to extract the necessary headers from one of the tarballs linked here: https://osdn.net/projects/mingw/releases/74926 ...and then put the headers in tcc/include/winapi to make more progress with your build.

The only other issue I foresee is whether tcc, msvcrt.dll and the mingw32 header files all agree on whether to use a 32 or 64 bit ABI...
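One quick way to see which ABI a given tcc build targets is to compile a program that reports the sizes of a pointer and a `long`: 32-bit Windows is ILP32 (both 32 bits), while 64-bit Windows is LLP64 (64-bit pointers, 32-bit `long`). The helper names are hypothetical; call them from a `main` and print the results.

```c
#include <limits.h>

/* Width in bits of a data pointer for the current compiler/target. */
int pointer_bits(void)
{
    return (int)(sizeof(void *) * CHAR_BIT);
}

/* Width in bits of long; distinguishes LLP64 (Windows) from LP64 (Unix). */
int long_bits(void)
{
    return (int)(sizeof(long) * CHAR_BIT);
}
```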

(I don't fully understand the instructions about generating def files, that's a tcc peculiarity... but probably necessary to get around Windows not making headers readily available, but leaving enough symbol info in the dll itself that this is a workaround somehow -- but that doesn't explain why the msvcrt.def file isn't enough to replace the necessary headers for a standard compilation run?)

@gvvaughan
Collaborator

gvvaughan commented Nov 18, 2024

Wait, no, I take all that back. It seems like there is a standard set of headers provided with tcc here: https://github.com/Kroc/v80/tree/tcc/bin/tcc/include and they seem to include size_t, FILE and all the usual stdio trimmings. I don't understand enough about .bat files or tcc to see what needs changing to have the compiler use those headers though, sorry!
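Since the reported error is in polyfill/stdio.h, it may help to first compile something that uses none of the project's polyfill headers, to separate a tcc setup problem from a polyfill problem. A minimal smoke test under that assumption: if it compiles and `stdio_roundtrip` returns 1, the compiler found a `<stdio.h>` providing `FILE` and `size_t`, and the runtime's stdio functions link. The function name is hypothetical; tcc's `-I` option can add an include directory such as bin/tcc/include if the headers are not found.

```c
#include <stdio.h>
#include <string.h>

/* Write a few bytes to `path`, read them back and delete the file.
 * Returns 1 if the round trip matched, 0 on any failure. Uses only
 * standard C89 headers, none of the project's polyfills. */
int stdio_roundtrip(const char *path)
{
    const char msg[] = "v80";
    char buf[sizeof msg];
    size_t n;
    FILE *fp = fopen(path, "wb");

    if (fp == NULL)
        return 0;
    n = fwrite(msg, 1, sizeof msg, fp);
    fclose(fp);
    if (n != sizeof msg)
        return 0;

    fp = fopen(path, "rb");
    if (fp == NULL)
        return 0;
    n = fread(buf, 1, sizeof buf, fp);
    fclose(fp);
    remove(path);
    return n == sizeof msg && memcmp(buf, msg, sizeof msg) == 0;
}
```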

@Kroc
Owner Author

Kroc commented Nov 18, 2024

No, that's quite alright. If I can't get it to work, I can always compile with GCC/VS and include a binary in the repo for Windows.

Projects
None yet
Development

No branches or pull requests

2 participants