-
Notifications
You must be signed in to change notification settings - Fork 14
Automatically exported from code.google.com/p/libdasm
License
jtpereyda/libdasm
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
libdasm -- simple x86 disassembly library ========================================= Copyright (c) 2004-2007 Jarkko Turkulainen <jt / nologin.org> <turkja / github.com>) Copyright (c) 2005 Ero Carrera Ventura <ero / dkbza.org> (Python binding) Copyright (c) 2006 Matt Miller skape <mmiller / hick.org> (Ruby binding) Copyright (c) 2009-2010 Ange Albertini <angea / github.com> Copyright (c) 2015-2018 Joshua Pereyda <jtpereyda / github.com> 1. Acknowledgments =================== Thanks to skape, thief, spoonm, Silvio Cesare, Georg oxff Wicherski, BeatriX and fine folks @nologin for bug reports, ideas and support. Special thanks to ero for creating and contributing pydasm and to skape for rbdasm. 2. What is libdasm? =================== libdasm is a C-library that tries to provide simple and convenient way to disassemble Intel x86 raw opcode bytes (machine code). It can parse and print out opcodes in AT&T and Intel syntax. The opcodes are based on IA-32 Intel Architecture Software Developer's Manual Volume 2: Instruction Set Reference, order number 243667, year 2004. Non-Intel instructions are not supported atm (also, non-Intel but Intel-compatible cpu extensions, like AMD 3DNow! are not supported). libdasm should compile with all decent C-compilers (only gcc and MSVC tested). 3. How to use libdasm? ====================== Compiling your application with libdasm is very easy. As usual, there are several ways to do it: - Include "libdasm.c" and compile as usual. Remember to copy "libdasm.h" and "opcode_tables.h" in the same directory as they are included by the main c-file. - Include "libdasm.h" and compile with "libdasm.c" (and remember to copy also "opcode_tables.h"). - Compile libdasm as library and link against it statically or dynamically, depending on the system and your needs. Win32 DLL and Unix static/dynamic libraries can be built with the supplied makefiles. See the file LIB.txt for more information. - Compile pydasm and use libdasm as a python module (see directory "pydasm" for more information). - Compile rbdasm and use libdasm as a ruby module (see directory "rbdasm" for more information). For basic disassembling, there are are only one or two libdasm functions you will need. First and the most important function is get_instruction. 3.1. get_instruction ==================== get_instruction analyzes data stream and fills in a structure presenting the instruction. This structure, defined as struct INSTRUCTION, can be later used for formatting the instruction to printable form or for analyzing the instruction contents. It is defined as follows: int get_instruction( INSTRUCTION *inst, // pointer to INSTRUCTION structure BYTE *addr, // data buffer enum Mode mode // mode: MODE_32 or MODE_16 ); First argument is a reference to INSTRUCTION structure. There is no need to initialize the structure prior to function call, get_instruction will take care of filling it. Second argument is an address of code buffer. get_instruction will read data starting from that address and parse a single instruction. INSTRUCTION structure is filled with the components of the returned instruction. Normally you don't need to know about the contents of the structure, but if you need to, read the next chapter. Third argument, the mode is either 32-bit (MODE_32) or 16-bit (MODE_16). This is the desired addressing mode. Note that the instruction might override the mode. get_instruction returns the instruction length. If the returned value is zero, it indicates illegal instruction. When get_instruction returns, you can print the instruction with get_instruction_string or do analysis of the instruction members. When ready, increment data buffer pointer to next instruction and call get_instruction again. Here is pseudo-code presenting this procedure: INSTRUCTION inst; int len, buflen, c = 0; BYTE *buf; do { len = get_instruction(&inst, buf+c, MODE_32); // do something with the instruction c += len; } while (c < buflen); 3.2. get_instruction_string =========================== get_instruction_string parses the instruction structure and fills in a string presenting the instruction in given format. Currently, ATT and Intel formats are supported. The function is defined as: int get_instruction_string( INSTRUCTION *instr, // pointer to INSTRUCTION structure enum Format format, // format: FORMAT_ATT or FORMAT_INTEL DWORD offset, // instruction absolute address char *string, // string buffer int length // string length ); The offset is needed only if you need to make relational offsets look nice (jmp/call/loop etc.). If you are parsing instructions in known virtual address, use the virtual address. Otherwise, you can use zero. DWORD is defined in libdasm.h as unsigned 32-bit number (libdasm only supports IA-32 atm). string is the pointer to instruction buffer, length is the size of the buffer. Note that the text is truncated if it doesn't fit in buffer. get_instruction_string will initialize the string and terminate it correctly for convenience. It returns zero if the operation is not successful. That's it! Check out sample disassembler programs "simple.c" and "das.c" for examples. 3.3 Other libdasm functions =========================== libdasm uses internally lot of useful functions that might help in instruction formatting etc. For example, get_instruction_string calls get_mnemonic_string and get_operand_string for simple instruction formatting. These functions are defined as: int get_mnemonic_string( INSTRUCTION *inst, enum Format format, char *string, int length ); int get_operand_string( INSTRUCTION *inst, OPERAND *op, enum Format format, DWORD offset, char *string, int length ); Both functions initialize and terminate the string buffer and return data formatted as defined in member "format". There are also many useful helper functions defined in libdasm.h for analyzing instruction contents. 4. INSTRUCTION structure ======================== If all you need is to fetch and print out instructions in the data buffer, you can skip this chapter. But if you need to inspect the individual components that make up an instruction, you will need this information. All libdasm functions inspect and/or manipulate INSTRUCTION structure. It is defined as follows: typedef struct _INSTRUCTION { int length; // Instruction length enum Instruction type; // Instruction type enum Mode mode; // Addressing mode BYTE opcode; // Actual opcode BYTE modrm; // MODRM byte BYTE sib; // SIB byte int extindex; // Extension table index int fpuindex; // FPU table index int dispbytes; // Displacement bytes (0 = no displacement) int immbytes; // Immediate bytes (0 = no immediate) int sectionbytes; // Section prefix bytes (0 = no section prefix) OPERAND op1; // First operand (if any) OPERAND op2; // Second operand (if any) OPERAND op3; // Additional operand (if any) int flags; // Instruction flags } INSTRUCTION, *PINSTRUCTION; Most important members are probably "length", "opcode", and the operands. "length" is the instruction size, also returned by get_instruction. If the instruction size is zero, the instruction is illegal. "opcode" is the instruction opcode byte. Some of the most common instructions also have a meaningful "type" member. This member can have one of the following values: INSTRUCTION_TYPE_MOV, INSTRUCTION_TYPE_ADD, INSTRUCTION_TYPE_SUB, INSTRUCTION_TYPE_INC, INSTRUCTION_TYPE_DEC, INSTRUCTION_TYPE_DIV, INSTRUCTION_TYPE_MUL, INSTRUCTION_TYPE_IMUL, INSTRUCTION_TYPE_XOR, INSTRUCTION_TYPE_LEA, INSTRUCTION_TYPE_XCHG, INSTRUCTION_TYPE_CMP, INSTRUCTION_TYPE_TEST, INSTRUCTION_TYPE_PUSH, // includes enter, pusha and pushf INSTRUCTION_TYPE_AND, INSTRUCTION_TYPE_OR, INSTRUCTION_TYPE_POP, // includes popa and popf INSTRUCTION_TYPE_JMP, // includes jmpf INSTRUCTION_TYPE_JMPC, // conditional jump INSTRUCTION_TYPE_LOOP, INSTRUCTION_TYPE_CALL, // includes callf INSTRUCTION_TYPE_RET, // includes leave, retn and retf INSTRUCTION_TYPE_INT, // interrupt INSTRUCTION_TYPE_FPU, // FPU-related instruction INSTRUCTION_TYPE_OTHER, // Other instructions :-) The list above is not complete, check out libdasm.h for complete listing of all possible instruction types. Individual operands can be accessed by the OPERAND structures. All instructions have 0-3 operands which are ordered in INTEL order (op1 is the first operand in INTEL syntax). struct OPERAND is defined as: typedef struct _OPERAND { enum Operand type; // Operand type (register, memory, etc) int reg; // Register (if any) int basereg; // Base register (if any) int indexreg; // Index register (if any) int scale; // Scale (if any) int dispbytes; // Displacement bytes (0 = no displacement) int dispoffset; // Displacement offset (0 = no diplacement) int immbytes; // Immediate bytes (0 = no immediate) int immoffset; // Immediate offset (0 = no immediate) int sectionbytes; // Section prefix bytes (0 = no section prefix) WORD section; // Section prefix value DWORD displacement; // Displacement value DWORD immediate; // Immediate value int flags; // Operand flags } OPERAND, *POPERAND; Operand type is always defined in member "type". This member can have one of the following values: OPERAND_TYPE_NONE OPERAND_TYPE_MEMORY OPERAND_TYPE_REGISTER OPERAND_TYPE_IMMEDIATE If the type is OPERAND_TYPE_NONE, operand is not present in the instruction. If the type is OPERAND_TYPE_REGISTER, OPERAND member "reg" is present. If the type is OPERAND_TYPE_MEMORY, some combination of the members "basereg", "indexreg", "scale", "dispbytes" and "displacement" is present. These members form the memory operand as follows: [ basereg + scale * indexreg + displacement ] (INTEL) displacement(basereg, indexreg, scale) (ATT) If the type is OPERAND_TYPE_IMMEDIATE, some combination of the members "immbytes", "sectionbytes", "section" and "immediate" is present. Section-specific members are used only in far type call/jmp. Member "immediate" is filled with the actual immediate value. Example: in "mov eax, 0x11" second operand "immediate" value is 0x11. If present, register members "reg", "basereg" and "indexreg" can have one of the following values: REGISTER_EAX REGISTER_ECX REGISTER_EDX REGISTER_EBX REGISTER_ESP REGISTER_EBP REGISTER_ESI REGISTER_EDI If registers are not present, they are defined as REGISTER_NOP. Note that the register is not necessarily general purpose register. Only way to detect this is to inspect operand flags. You can also use helper function get_register_type for determining the register type. Register type can be one of the following: REGISTER_TYPE_GEN REGISTER_TYPE_SEGMENT REGISTER_TYPE_DEBUG REGISTER_TYPE_CONTROL REGISTER_TYPE_TEST REGISTER_TYPE_XMM REGISTER_TYPE_MMX REGISTER_TYPE_FPU get_register_type returns some of the values only if the operand type is OPERAND_TYPE_REGISTER. If the operand is OPERAND_TYPE_MEMORY, the registers are always general purpose and for immediate operands, there are of course no registers involved. 5. Miscellaneous notes ====================== 5.1. General output formatting get_instruction_string tries to follow INTEL/ATT conventions but not too strictly. There are some compromises that are made to keep the implementation simple (or because the current implementation is already too complex..). 5.2. Segment prefix formatting Libdasm is modeled after the assumption that there is only one memory operand at maximum in the instruction. If there is segment register override, the segment register is placed in front of the memory operand, like this: mov eax, fs:[0x30] If there are no memory operands, the segment prefix is placed in front of the instruction: fs mov eax, 0x30 Some string instructions are also considered containing no memory operands, like cmps. In reality, it contains two memory operands. So the following: fs cmpsd is equivalent to: cmpsd fs:[esi], es:[edi] And BTW, if you are wondering what are those weird "(bt)" and "(bnt)" prefixes in front of conditional jumps, they are branch hint prefixes ("branch taken" and "branch not taken"). 5.3. Instruction correctness There is not too much sanity checking in current code. So if you feed libdasm enough with random data or illegally constructed instructions it probably gives wrong disassembly at some point. But libdasm should always disassemble correctly "real" code. 5.4. Boundary checks Libdasm will not check for read buffer boundaries. It means that if the opcode requires additional data to be read and that data cannot be accessed, libdasm might access violate, depending on the implementation. There is no platform-independent way of checking this condition, so you better make sure of it by yourself. If the data is real machine code, there is no problem (unless of course there is a bug in libdasm) because libdasm needs to read exactly what the instruction requires and of course the full instruction is in buffer, right? But in some rare cases when disassembling random data this could cause some troubles. 5.5. Endianness Endianness might not be identified correctly on all platforms (see libdasm.h for definition of __LITTLE_ENDIAN__). If you encounter endianness related problems, please report the system and possible workaround for the problem. 5.6. Inline functions Some functions are defined as inline, this might not work for all compilers. Only gcc and MSVC are tested by the author. 5.7. Other issues There are probably MANY unknown bugs in code and in instruction tables. Some known issues are listed in file TODO.txt. 6. Licensing ============ libdasm was originally released as public domain software. You can do whatever you like with it. Due to some legal uncertainty of "public domain" in some countries it was re-licensed to 2-clause BSD type license to allow both commercial and non-commercial use. 7. How to contact the author ============================ If you have bug report or some improvement ideas or want to harass the author for some other reason, please raise the issue on GitHub: https://github.com/jtpereyda/libdasm/issues
About
Automatically exported from code.google.com/p/libdasm
Resources
License
Stars
Watchers
Forks
Packages 0
No packages published