Decompiling DOS MZ EXE to C #1234
Comments
Ghidra has some limitations with real mode, but it is a good option overall. That said, Ghidra is not focused on DOS or real-mode support.
Thank you.
Do you know what these limitations are? I'm playing with it now and potentially see issues with pointers not being converted correctly.
IIRC, mainly detecting the real end of functions. Most of the time it works; sometimes it doesn't. There are also issues like this one (instruction disassembly in real mode sometimes not working):
First off, thanks for posting the issue. At present, no decompiler that I'm aware of generates compilable code except for the simplest binary files. Decompilation is much harder than compilation, as it tries to reconstruct information that is destroyed during compilation. User assistance is critical; user-provided type information in particular vastly improves the output. Reko is maintained by a small number of contributors working in their spare time, which forces us into an "implement on demand" model. We depend on a cooperative process of users asking for features and supplying binaries that exhibit problems, and the Reko contributors providing those features "just in time". So if you find features missing in Reko, please continue reporting them so we can evaluate them and provide implementations.
By default, Reko generates a separate C file for each segment it decompiles. If a segment is larger than 64 KiB, Reko breaks it up into 64 KiB chunks. This avoids nightmarishly large source files when decompiling large binaries. You can change this behavior by loading a file, selecting
The
#include "basic_types.h"
#include "myexe_seg0800.c"
#include "myexe_seg0C13.c"
Inside of
typedef short int int16;
typedef int int32;
typedef float real32;
// etc...
A case could be made for using the
The various unresolved
The
These are remnants of the Reko intermediate language that have not been cleaned up. This is very likely due to a failure in the data flow analysis phase, but without seeing the binary it is impossible to determine why. Consider making the binary available so I can debug it.
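For readers unfamiliar with the SLICE and SEQ calls mentioned in the original report, here is a rough sketch of what operations of that kind usually mean. It assumes SEQ joins a high part and a low part into one wider value, and SLICE extracts a narrower value starting at a bit offset. The helper names `seq16` and `slice16` are hypothetical and are not something Reko emits; the exact semantics in Reko's output should be checked against its documentation.

```c
#include <stdint.h>

/* Hypothetical helper (not part of Reko): join a high and a low
 * 16-bit word into one 32-bit value, as a dx:ax register pair would. */
static uint32_t seq16(uint16_t hi, uint16_t lo)
{
    return ((uint32_t)hi << 16) | (uint32_t)lo;
}

/* Hypothetical helper (not part of Reko): extract the 16 bits of
 * `value` that start at `bit_offset` (expected range 0..16). */
static uint16_t slice16(uint32_t value, unsigned bit_offset)
{
    return (uint16_t)(value >> bit_offset);
}
```

With these definitions, `seq16(0x1234, 0x5678)` yields `0x12345678`, and `slice16` with offsets 16 and 0 recovers the two halves. Stubs like these can help a merged file compile, but they do not replace fixing the underlying analysis failure.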
This is another case of the "implement on demand" process I write about above.
Reko has only implemented the first two -- but it's easy to add support for the others. However, even after implementing those sub-services, Reko requires knowing what the value in Consider providing the names of the missing
The best way is to provide Reko with type information. You can specify the type of global memory variables (select a chunk of memory and select
Thanks again for taking the time to write this issue. I will close this issue, and open other more specific and actionable issues.
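On the typedefs quoted in the reply above: a minimal sketch of how such a header could be written on top of `<stdint.h>` when compiling the output with a modern compiler such as gcc. Only `int16`, `int32` and `real32` come from the thread; `byte`, `word16` and `word32` are assumed names added for illustration, and the header Reko actually generates may differ. Fixed-width types matter here because plain `int` is 16 bits on typical DOS compilers but 32 bits under gcc on current platforms.

```c
#include <stdint.h>

/* Names quoted in the thread, mapped to fixed-width types. */
typedef int16_t int16;
typedef int32_t int32;
typedef float   real32;

/* Assumed names for illustration only; check the generated header. */
typedef uint8_t  byte;
typedef uint16_t word16;
typedef uint32_t word32;

/* Compile-time checks (C11) that the sizes match a 16-bit DOS program's expectations. */
_Static_assert(sizeof(int16) == 2, "int16 must be 2 bytes");
_Static_assert(sizeof(int32) == 4, "int32 must be 4 bytes");
_Static_assert(sizeof(real32) == 4, "real32 must be 4 bytes");
```

If a size assumption is wrong on the target platform, the build fails at the `_Static_assert` line instead of producing subtly incorrect arithmetic.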
Thank you very much for your detailed reply. It's much appreciated. Here is the EXE for your further analysis:
Sure:
Ok thanks. I will start looking deeper into this. That I guess will need a debugger to work this out given the information at hand. To add to the on demand list:
Just some quick notes from my experience with setting the function signatures and data types. It would be much better to be able to do this via a shortcut or a right-click context menu in the C decompiler view. Also, it seems that after I set a function signature, I need to re-analyse the project, which is a little unexpected. Many thanks.
I tried Reko to decompile a DOS MZ EXE to C. After loading and analysing the EXE I was able to export the code to C. It didn't come out as one C file, which was weird, so I tried merging the C files together. After looking at the code and trying to compile it with gcc I ran into all sorts of issues. First, many types were not defined:
Second, I see many function calls in the files that are not defined, e.g.
SLICE, SEQ, Test etc.
Also many dos_* function calls, including msdos_unknown_2143. Of course I can try to port the msdos_* to POSIX, but msdos_unknown_2143 is very unknown!
Lastly, I see many occurrences of <type-error>, <code>, <invalid> and <anonymous>.
Could someone kindly explain the process I should go through to get clean code that compiles? I'm starting to think I will need to manually reverse-engineer the whole programme, which is 15k lines of code from a 53 KB EXE. Maybe Ghidra would be a better tool for this?