M68000: implement lineA/lineF opcodes #487

Open
lab313ru opened this issue Apr 21, 2019 · 29 comments
Labels: Feature: Processor/68000; Status: Internal (This is being tracked internally by the Ghidra team); Type: Enhancement (New feature or request)

Comments

@lab313ru
Contributor

The Sega Mega Drive has special handlers for opcodes whose top nibble is 0xF or 0xA. In these cases it calls the vectors placed at positions 0x0B (Line F) and 0x0A (Line A) in the vector table.

Since the minimum size of a Motorola opcode is 2 bytes and the maximum is not defined, a handler may parse the data that follows such an opcode in its own way. Again, though, the minimum size is 2 bytes.

Task: implement lineA/lineF opcodes.
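
As a rough illustration (a Python sketch, not Ghidra code), the behaviour described above amounts to checking the top nibble of the opcode word and reading the handler address from the matching vector-table slot. The function names and the fake ROM image are made up for this example. The only facts assumed are that vector-table entries are 4-byte big-endian addresses, so vector 0x0A sits at offset 0x28 and vector 0x0B at offset 0x2C.

```python
import struct

LINE_A_VECTOR = 0x0A  # vector number used for opcodes 0xAxxx
LINE_F_VECTOR = 0x0B  # vector number used for opcodes 0xFxxx


def classify_opcode(word):
    """Return the vector number for a line-A/line-F opcode word, else None."""
    top_nibble = (word >> 12) & 0xF
    if top_nibble == 0xA:
        return LINE_A_VECTOR
    if top_nibble == 0xF:
        return LINE_F_VECTOR
    return None


def handler_address(rom, vector_number):
    """Read the 32-bit big-endian handler address from the vector table."""
    return struct.unpack_from(">I", rom, vector_number * 4)[0]


# Hypothetical ROM image: only the two vectors of interest are filled in.
rom = bytearray(0x100)
struct.pack_into(">I", rom, LINE_A_VECTOR * 4, 0x00000200)
struct.pack_into(">I", rom, LINE_F_VECTOR * 4, 0x00000300)

for word in (0xA123, 0xF800, 0x4E75):
    vector = classify_opcode(word)
    if vector is None:
        print(f"{word:04X}: ordinary opcode")
    else:
        address = handler_address(rom, vector)
        print(f"{word:04X}: vector 0x{vector:02X} -> handler 0x{address:08X}")
```

What the handler does with any bytes after the 2-byte opcode is platform-specific, which is why the sketch stops at finding the handler.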

@epozzobon

I made a small change to 68000.sinc: master...epozzobon:ghidra:68000-lineA-lineF

It's by no means a complete implementation of lineA/lineF opcodes but at least it prevents the disassembler from considering those invalid instructions.

@jduerstock

This is needed for m68k "classic" Mac OS binaries as well.

@GhidorahRex GhidorahRex self-assigned this Apr 16, 2024
@eschaton

What would be required for at least the A-line portion of @epozzobon's tweak to be merged? It's currently extremely inconvenient to disassemble classic Macintosh code because Ghidra stops disassembly at an A-line instruction, rather than treating it as a single-word instruction that disassembles 0xA123 to something like ATRAP #0x123 and decompiles it to m68k_atrap_0x123();
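
To make the request concrete, here is a small Python sketch of the mapping I mean. It is not SLEIGH and not how Ghidra would implement it; the function name is hypothetical, and the mnemonic and call name simply follow the example in the paragraph above.

```python
def decode_generic_atrap(word):
    """Map a 16-bit A-line word to a (mnemonic, decompiled call) pair."""
    if ((word >> 12) & 0xF) != 0xA:
        raise ValueError(f"0x{word:04X} is not an A-line opcode")
    trap = word & 0x0FFF  # the low 12 bits identify the trap
    return f"ATRAP #0x{trap:x}", f"m68k_atrap_0x{trap:x}();"


mnemonic, call = decode_generic_atrap(0xA123)
print(mnemonic)  # ATRAP #0x123
print(call)      # m68k_atrap_0x123();
```

The important property is that the instruction is always exactly one word long, so disassembly can carry on at the next word.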

@jduerstock

A little movement on this would be a minimal amount of work and help us retro project people a lot. Thanks.

@hippietrail
Contributor

Apple Lisa also uses A-line traps, in a completely different way to how Classic Mac OS uses them. I think I saw them used in another m68k platform like Sinclair QL, Atari ST, or Sharp X68000, but don't quote me on that.

I've done some hacking on Ghidra Loaders and less on Processors. Loaders have programmable options, and Processors have a couple of features which might be relevant: "conditional compilation", where two related processors are defined in the same .slaspec file, and the ability to "include" other processor definition files, so that the common part of a family of related processors lives in one file while specialized files for specific family members include the common part and implement the differences.

Based on my limited knowledge I don't think the Loader options will be useful. I think the solution would involve making a new "fork" of the m68k processor for each platform that uses A-line or F-line traps in a different way.

Classic Mac uses them like syscalls, but I've read that there are some quirks, like the same opcode being used in a couple of different ways depending on the value of a register. Of course more were added over time, and it's possible some undocumented ones may have changed meaning. I don't remember off the top of my head whether Mac used only the 2 bytes of the trap or followed them with more bytes before normal code resumes.

Lisa uses I think four A-line opcodes such as IUJSR where IU stands for "Intrinsic Unit" which is more or less like a segment based on the size of relative jumps on the 68000. Each opcode is 2 bytes and is followed by 2 more bytes which seem to be a parameter.

This all could have an impact on what the disassembly would look like. A fake mnemonic like "ALINE #$0000" would be lowest common denominator. Replacing each call with its own mnemonic might be the ideal but might be difficult or impossible to achieve, or might require too many processor variants.

Ghidra tries to follow the processor spec for its assembly syntax, so it might make sense to copy what each platform did in its preferred assembler to handle A-line and F-line traps. Then again, some platforms already use assembly syntax that is really different from that of the official processor docs.

@jduerstock

The MacsBug A-trap list is pretty thorough when it comes to how to parse them. How to translate that into something Ghidra understands, well... that's another issue entirely. Still, I'm willing to take a stab at it if I have a few examples.

@hippietrail
Contributor

> The MacsBug A-trap list is pretty thorough when it comes to how to parse them. How to translate that into something Ghidra understands, well... that's another issue entirely. Still, I'm willing to take a stab at it if I have a few examples.

Have you hacked on the Ghidra codebase before? I'm not a Java guy but I'm finding it not too bad despite having to use Eclipse. I'm having much more of a struggle learning how to hack on processor definitions. The maintainers are very good at helping out through the GitHub discussion section though.

Which part(s) do you want to take a stab at? What kind of examples do you need? My feeling so far is that we can't do much without hacking on the 68000 family processor definitions.

@jduerstock

I have not, and Java is a bit alien to me.

I would guess it's also not that far removed from x86 MS-DOS interrupt fix-up/clean-up code, but I have not dug into Ghidra source enough to figure out how/where that is done either.

@hippietrail
Contributor

> I have not, and Java is a bit alien to me.
>
> I would guess it's also not that far removed from x86 MS-DOS interrupt fix-up/clean-up code, but I have not dug into Ghidra source enough to figure out how/where that is done either.

OK leave a note here when you have a look at it. I'll keep an eye on this thread.

@eschaton

eschaton commented Oct 27, 2024

I think one thing that'll be important here is to not let the better be the enemy of the good, or let the good be the enemy of incremental progress.

That's why I specifically brought up creating a simple "generic" mechanism (e.g. 0xA123 → ATRAP #0x123 → m68k_atrap_123();) that would work with most 68K uses of this functionality. Once that's in place, then it makes more sense to start adding things like "Macintosh uses these two thousand traps, each of which takes its arguments in a different way" and "Lisa uses these three traps and they take arguments after the trap."

Eventually of course it'd be great to even have things like Macintosh A-traps decoded to their specific inputs and outputs (e.g. _NewHandle takes the number of bytes in A0 and returns the handle or nil in A0 and an error code in D0), but some of that can be hard to translate to C without faking semantics in ways that don't match the assembly.

For example, the decompiled C could pass a pointer to a local variable to NewHandle representing where the error code goes, despite no local variable or pointer actually being present in the assembly. That would deviate from the traditional C projection of the Macintosh APIs though, plus I'm not even sure Ghidra could represent that.

@eschaton

@hippietrail Every Macintosh trap is just a single 16-bit word; the "dispatch" traps you're referring to that have their actual function selected by a value in D0 (or whatever) don't really need to be treated distinctly, since they wouldn't be when authoring or debugging assembly; they only typically had unique handling in C/Pascal for ease of writing code.

@hippietrail
Contributor

> That's why I specifically brought up creating a simple "generic" mechanism (e.g. 0xA123 → ATRAP #0x123 → m68k_atrap_123();) that would work with most 68K uses of this functionality. Once that's in place, then it makes more sense to start adding things like "Macintosh uses these two thousand traps, each of which takes its arguments in a different way" and "Lisa uses these three traps and they take arguments after the trap."

Yep definitely the right approach. The problem for the Lisa is that a disassembler will see two junk instructions. The A-line one and then a random one. I only pushed my basic Lisa Loader a few weeks ago and I'm sure nobody has tried it yet. But let me paste an example:

        4a6ff00c 42 67                                   clr.w           -(SP)=>local_e
        4a6ff00e a0 c0 00 88                             ddw             A0C00088h
        4a6ff012 2f 2d 00 0c                             move.l          (0xc,A5),-(SP)
        4a6ff016 a0 c0 00 80                             ddw             A0C00080h
        4a6ff01a a0 22 02 30                             ddw             A0220230h
        4a6ff01e 4e 5e                                   unlk            A6
        4a6ff020 20 5f                                   movea.l         (SP)+,A0
        4a6ff022 5c 4f                                   addq.w          #0x6,SP
        4a6ff024 4e d0                                   jmp             (A0)
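
To show how the three `ddw` values in the listing above would split up, here is a small Python sketch (not Ghidra code). It assumes each Lisa A-line trap is a 2-byte opcode followed by a 2-byte parameter, which is what the bytes appear to show. The function name is made up, and treating the low 12 bits plus the parameter as a single 28-bit operand is just one possible way to present it.

```python
import struct


def decode_lisa_aline(data, offset=0):
    """Decode one Lisa-style A-line trap: 2-byte opcode + 2-byte parameter."""
    opcode, parameter = struct.unpack_from(">HH", data, offset)
    if (opcode >> 12) != 0xA:
        raise ValueError(f"0x{opcode:04X} is not an A-line opcode")
    # Alternative view: everything after the 0xA nibble as one 28-bit operand.
    operand28 = ((opcode & 0x0FFF) << 16) | parameter
    return {"opcode": opcode, "parameter": parameter,
            "operand28": operand28, "length": 4}


# The three A-line sequences from the listing above.
for raw in ("a0c00088", "a0c00080", "a0220230"):
    trap = decode_lisa_aline(bytes.fromhex(raw))
    print(f"{raw}: opcode 0x{trap['opcode']:04X}, "
          f"parameter 0x{trap['parameter']:04X}, "
          f"28-bit operand 0x{trap['operand28']:07X}")
```

A generic 2-byte decoder would consume only the first word and then try to decode the parameter as an instruction, which is the "two junk instructions" problem.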

> Eventually of course it'd be great to even have things like Macintosh A-traps decoded to their specific inputs and outputs (e.g. _NewHandle takes the number of bytes in A0 and returns the handle or nil in A0 and an error code in D0), but some of that can be hard to translate to C without faking semantics in ways that don't match the assembly.
>
> For example, the decompiled C could pass a pointer to a local variable to NewHandle representing where the error code goes, despite no local variable or pointer actually being present in the assembly. That would deviate from the traditional C projection of the Macintosh APIs though, plus I'm not even sure Ghidra could represent that.

I think things like that would belong at a higher level, probably an Analyser, but I haven't looked into those yet. I just wanted to add some thoughts from the bits I have explored as I'm not sure how much people following this issue know about Ghidra's guts in comparison to how well they know their platform of choice.

Some of this would be quite similar to other processors with syscall type functions except that those probably each have a convention used by all their platforms where A-line traps get used in ad-hoc and incompatible ways.

The easy way to get something working for classic Mac is to just duplicate the m68k processor into a new extension, change the names, and add two instructions. That'd "just work" for disassembly but for decompiling it's going to break I think without some basic P-code for data flow.

@hippietrail
Contributor

I went ahead and hacked this up quickly:
image
I'm guessing the first atrap at $1e2 is some kind of "exit" call. Note that, as I added no p-code, the decompiler window has no code for this and just runs into the following function.

Let me know if you want me to put this up on GitHub.

@eschaton

eschaton commented Oct 28, 2024

0xA9F4 is, in fact, the _ExitToShell trap on Macintosh. Does this patch also ensure automatic disassembly continues through an A-trap? If so, ship it. (IMO anyway.) Generating p-code for m68k_trap_9f4(); or whatever sounds like a good next step, not a blocker for integrating.

@hippietrail
Contributor

> 0xA9F4 is, in fact, the _ExitToShell trap on Macintosh. Does this patch also ensure automatic disassembly continues through an A-trap? If so, ship it. (IMO anyway.) Generating p-code for m68k_trap_9f4(); or whatever sounds like a good next step, not a blocker for integrating.

It basically does nothing other than recognize the bit pattern and convert it to a readable mnemonic and operand. So disassembly continues just as with any other recognized instruction.

I have not even begun to learn Ghidra p-code yet. It looks like more of a mystery than the sleigh for disassembling did before I started, and I've only learned a subset of that so far.

I also haven't learned much git yet, so I'm still trying to figure out how to make it its own repo that's a proper fork of just the m68k support from Ghidra so they don't drift apart, while also being part of a GhidraDev repo so it can be shipped as a proper Ghidra extension with releases anyone can download and use the regular way.

I think I just read somewhere that there's a special command for unimplemented p-code.

If you can't wait and you're ready to use it straight from a raw git repo let me know and I'll make a quick and dirty temporary one.

@GhidorahRex GhidorahRex added the Status: Prioritize This is currently being prioritized label Oct 28, 2024
@hippietrail
Contributor

I've just published the forked 68000 with A-Line trap support on GitHub now. I tried hard to keep the 68000 files from Ghidra properly set up as a git fork with full history and to be able to keep new changes in sync, but it looks like it didn't work. So this might get redone at some point. Git help would be appreciated.

https://github.com/hippietrail/Motorola68000ALine

@hippietrail
Contributor

I've added support for Lisa-style A-line traps with a 28-bit operand alongside the Classic-Mac-style ones with a 12-bit operand.
image
image
Unfortunately I'm having so much frustration with Git that I can't figure out how to push the changes. It's likely the way it's set up now is broken and I'll have to replace this repo at some point. Oh well as I said for now it's just a hack and a proof-of-concept to show something can be done to address this problem.

@eschaton

Thanks, as a proof-of-concept it's fine. I think what's needed is a proper Ghidra fork with a "Macintosh" variant of the 68000/68020/68040, similar to how there are Real Mode and Protected Mode variants for x86. I assume those aren't full clones of the x86 CPU handlers but options that can be passed through the CPU handler mechanism and adjust its behavior.

@eschaton

OK, I got set up to build Ghidra and it looks like @epozzobon's tweak to 68000.sinc is sufficient to get the behavior that I (at least) want. Everything beyond that is gravy (for my purposes). :)

@hippietrail
Contributor

> Thanks, as a proof-of-concept it's fine. I think what's needed is a proper Ghidra fork with a "Macintosh" variant of the 68000/68020/68040, similar to how there are Real Mode and Protected Mode variants for x86. I assume those aren't full clones of the x86 CPU handlers but options that can be passed through the CPU handler mechanism and adjust its behavior.

Yes, I have a patch of the built-in 68000 support in Ghidra also working, and feedback from the maintainers so far makes it look like they're amenable to it. I'm a bit worried about the number of CPU combinations exploding. I've only included A-line traps so far, since neither Classic Mac nor Lisa uses F-line traps that I know of, and it could well be that a platform that does might use them as differently again as those two platforms.

@hippietrail
Contributor

Here's some info on how TI's graphing calculators used them. I'll gather more here for other platforms as a handy reference. https://www.omnimaga.org/other-calculator-discussion-and-news/best-ti-89-(titanium)-shell/15/#msg_116730

On the TI-68k, A-Line are used by AMS for throwing exceptions / errors. F-Line are unused from AMS 1.00 to 2.03 and create an error; from AMS 2.04 onwards, F800 + n calls ROM_CALL number n. Lately, other F-Line instructions were added to PreOS / PedroM (F-Line RAM_CALLs, etc.). On all AMS versions, Line 1010 or Line 1111 errors mean that the processor executed an A-Line or F-Line that was not caught by the error handlers, i.e. the calculator completely crashed (it started executing data or whatever).
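
As a sketch of the "F800 + n" rule quoted above, in Python (not Ghidra code): the function name is made up, and accepting every word from 0xF800 up to 0xFFFF is my assumption, since the quote does not say how large n may get or which F-line words PreOS / PedroM reserve for other purposes.

```python
ROM_CALL_BASE = 0xF800  # from the quote: F800 + n calls ROM_CALL number n


def ti68k_rom_call_index(word):
    """Return n if the opcode word encodes 'F800 + n', otherwise None.

    Assumption: every word in 0xF800..0xFFFF is treated as a ROM_CALL.
    Real AMS / PreOS / PedroM versions may carve out parts of this range.
    """
    if ROM_CALL_BASE <= word <= 0xFFFF:
        return word - ROM_CALL_BASE
    return None


for word in (0xF800, 0xF812, 0xF123, 0x4E75):
    index = ti68k_rom_call_index(word)
    if index is None:
        print(f"{word:04X}: not a ROM_CALL F-line word")
    else:
        print(f"{word:04X}: ROM_CALL 0x{index:X}")
```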

@eschaton

eschaton commented Nov 2, 2024

All of this is why Ghidra CPUs appear to have the concepts of variants and compilers. There shouldn't be an explosion of CPUs, just different variants of the existing CPUs for Macintosh, Lisa, Atari ST, SEGA Genesis, TI 89, etc. and of course a generic variant.

What I'm suggesting here is that Ghidra take the generic variant and then add other variants under other issues as time allows, rather than try to figure out all the variants that are needed and how to fit them all together before taking anything.

And then perhaps we can also start looking at adding the different compilers to ensure the different ABI conventions can be accommodated.

@hippietrail
Contributor

In PalmOS the A-line traps were used in combination with the trap instruction with the operand 4F. In Pascal the Trap opcode + operand was named SYSTRAP; the A-line traps themselves were not named, but the combinations were inline functions. This may have differed in C or ASM but I haven't found that yet. Here's a Pascal file from a Palm SDK: http://zigloo.ch/index.php/PalmAPI2.pas

@hippietrail
Contributor

hippietrail commented Nov 2, 2024

> What I'm suggesting here is that Ghidra take the generic variant and then add other variants under other issues as time allows, rather than try to figure out all the variants that are needed and how to fit them all together before taking anything.
>
> And then perhaps we can also start looking at adding the different compilers to ensure the different ABI conventions can be accommodated.

Well I'm not a Ghidra employee. I'm just doing this for fun. So I guess we're working asynchronously (Ghidra people and me). I'll continue researching how different platforms used these features. The Ghidra people will decide what goes in the release.

One problem with a generic-variant-only approach is that it will always break decompilation and it will break disassembly on Lisa. That's just for A-line.

The generic approach for F-line is to leave it just for the legit FPU instructions, even on 68000. Ghidra's current approach to 680x0 is that 68040 is basic and there's optional stuff of 68020, 68030, and ColdFire, and always decode F-line instructions as FPU instructions. That breaks X68000, TI, and Atari ST.

I would suggest that even a minimal approach would involve A-line and F-line being separate features.

I just found out that there was a duplicate of this request in 2019 that included valuable notes on how to get A-line traps working even for decompiling for a subset of Apple Macintosh because it turns out some of the bitfields encode semantics of the operations: #140

@hippietrail
Contributor

There's a full official Atari document on the A-line traps in the ST line of computers on the web archive: https://archive.org/details/rearc_atari-st-e-tt-toolkit-b1-31-still-another-line-a-document-salad-1987-12-17

@hippietrail
Contributor

Here's some decent documentation on the Sharp X68000's use of F-line traps: https://gamesx.com/wiki/doku.php?id=x68000:doscall

@hippietrail
Contributor

I've checked all the 680x0 platforms I know of other than Unixes, CP/M, and TRSDOS (maybe Xenix?). And no others use the Apple Lisa style or any other exotic style. I also can't find any substantial evidence of any assemblers using special pseudo-instructions for either, so we can pick arbitrarily.

So I think we just need to be able to opt in to A-line and F-line instructions individually.

When F-line is opted in, we'll probably need to opt out of FPU F-line instructions since they may clash. From what I can see, all FPU F-line instructions are already inside @ifdef 68030 or @ifdef 68040.

When A-line is opted in, we'll have to opt out of ColdFire A-line instructions.

If it turns out some platforms used their F-line extensions on 68020, 68030, ColdFire, or alongside FPU emulation, those would require more work to check for clashes.
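
To illustrate the opt-in idea, here is a toy Python model (not SLEIGH, and not how the .slaspec conditionals actually look). The function, flag names, and category labels are all invented for the example. It only shows that each opt-in takes over the whole 0xA or 0xF opcode space from whatever decoded it before.

```python
def classify_word(word, aline_traps=False, fline_traps=False):
    """Label an opcode word according to which trap families are opted in."""
    top_nibble = (word >> 12) & 0xF
    if top_nibble == 0xA:
        # Opting in to A-line traps means opting out of ColdFire A-line
        # instructions, because both claim the same opcode space.
        return "aline_trap" if aline_traps else "coldfire_or_invalid"
    if top_nibble == 0xF:
        # Likewise, F-line traps displace FPU/coprocessor decoding.
        return "fline_trap" if fline_traps else "fpu_or_coprocessor"
    return "regular"


# A hypothetical "Classic Mac" style configuration: A-line only.
print(classify_word(0xA9F4, aline_traps=True))  # aline_trap
print(classify_word(0xF200, aline_traps=True))  # fpu_or_coprocessor

# A hypothetical "TI-68k" style configuration: both opted in.
print(classify_word(0xF812, aline_traps=True, fline_traps=True))  # fline_trap
```

The two flags are independent on purpose, matching the suggestion that A-line and F-line be separate features.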

@sarnau

sarnau commented Nov 4, 2024

BTW: The Classic Mac is particularly bad/complicated, because it uses different Line-A conventions: register based vs. stack based. AND the stack-based one uses a Pascal calling convention (parameters in reverse order, the return value is also on the stack). AND it can optionally clean up the stack pointer during the call. Thankfully it is all very well documented in the Inside Mac documentation.
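
For reference, here is a Python sketch of how the trap word itself is split, going by the trap word layout as I understand it from Inside Macintosh: bit 11 selects Toolbox (generally stack based) vs. Operating System (generally register based), Toolbox traps have an auto-pop bit and a 10-bit number, and OS traps have two flag bits, an A0-handling bit, and an 8-bit number. The function and field names are my own, and the details should be checked against the documentation before relying on them.

```python
def decode_mac_atrap(word):
    """Split a Classic Mac A-trap word into its bit fields (see assumptions)."""
    if (word >> 12) != 0xA:
        raise ValueError(f"0x{word:04X} is not an A-line opcode")
    if word & 0x0800:  # bit 11 set: Toolbox trap
        return {
            "kind": "toolbox",
            "auto_pop": bool(word & 0x0400),  # bit 10: stack clean-up variant
            "number": word & 0x03FF,          # bits 9-0
        }
    return {                                  # bit 11 clear: OS trap
        "kind": "os",
        "flags": (word >> 9) & 0x3,           # bits 10-9: routine-specific
        "a0_bit": bool(word & 0x0100),        # bit 8: affects A0 save/restore
        "number": word & 0x00FF,              # bits 7-0
    }


print(decode_mac_atrap(0xA9F4))  # _ExitToShell, a Toolbox trap
print(decode_mac_atrap(0xA122))  # _NewHandle, an OS trap
```

A decompiler would still need a per-trap table for argument counts and types, but the convention (stack vs. register) can at least be guessed from the word alone.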

@hippietrail
Contributor

> BTW: The Classic Mac is particularly bad/complicated, because it uses different Line-A conventions: register based vs. stack based.

For normal subroutine-calling opcodes I believe there are calling conventions and those can relate to the "compiler" part of a language spec. But I've been wondering for a while if there is such a thing for trap/svc/swi/int type instructions.

I think this is going to be a problem for decompiling. I'd like to hear from Sleigh experts. Disassembling is the easy part.

@GhidorahRex GhidorahRex added Status: Internal This is being tracked internally by the Ghidra team and removed Status: Prioritize This is currently being prioritized labels Nov 6, 2024

8 participants