This repository has been archived by the owner on Sep 7, 2020. It is now read-only.

IDA .pat format

JohnDMcMaster edited this page Oct 26, 2010 · 30 revisions

IDA .pat format is part of the FLAIR implementation of FLIRT (Fast Library Identification and Recognition Technology).
This info was gathered from the program rpat, which provides an open source specification of sorts for the .pat format, and from miscellaneous notes and sample .pat files I found across the internet. An improved version of the utility can be found at util/rpat (NOTE: I had some misconceptions about this format when fixing it and may have introduced other errors; use uvobj2pat instead). Basic idea:

  • One line per main object file module. IDA calls these “modules.” For dynamic libraries (.dll, .so, and such) these seem to be functions, but if you feed it, say, an object file archive (.a), these will be the .o files within the archive. In the simple case, a single .o file should produce a single output pattern line.
  • The last line of the file should be a triple dash (---) plus line termination
  • Lines
    • Leading bytes: the first 32 bytes, or fewer if the module is shorter
    • CRC16 bytes: the bytes after the first 32, up to (not including) the first relocated byte
    • Trailing bytes: the remaining bytes after the CRC16 bytes
    • Termination: unknown. \r\n is standard; not sure if a bare \n is acceptable
  • General notes
    • Relocations should be marked with .’s in the raw bytes. Some of these will be listed in the referenced names
    • Some object formats seem to need an _ prepended to names?
    • Names shorter than 3 characters are treated specially
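The split into leading, CRC16 and trailing bytes described above can be sketched as follows. This is a minimal sketch, assuming the module's raw bytes and the set of byte offsets touched by relocations are already known; `split_module` and `to_pat_hex` are hypothetical helpers, not functions from rpat or uvudec. It also assumes each relocated byte is written as two dots (one per hex digit), as in the sample .pat files, and it takes the 255-byte cap on the CRC16 span from the CRC16 length field described below.

```python
from typing import Iterable

LEADING_LEN = 32    # leading pattern bytes
MAX_CRC_LEN = 0xFF  # the CRC16 length field tops out at 255


def to_pat_hex(data: bytes, start: int, relocs: set[int]) -> str:
    """Uppercase hex for `data` (located at module offset `start`),
    writing '..' for every byte covered by a relocation."""
    return "".join(".." if start + i in relocs else f"{b:02X}"
                   for i, b in enumerate(data))


def split_module(module: bytes, relocs: Iterable[int]) -> tuple[str, bytes, str]:
    """Split a module into (leading_hex, crc_bytes, trailing_hex).

    `relocs` holds the module offsets of relocated bytes (hypothetical input;
    a real tool would derive it from the object file's relocation entries).
    """
    reloc_set = set(relocs)

    # leading bytes: first 32, padded with dots to 64 characters if shorter
    leading_hex = to_pat_hex(module[:LEADING_LEN], 0, reloc_set).ljust(2 * LEADING_LEN, ".")

    # CRC16 span: bytes after the first 32, stopping at the first relocated
    # byte, at the end of the module, or after 255 bytes
    crc_end = LEADING_LEN
    while (crc_end < len(module)
           and crc_end - LEADING_LEN < MAX_CRC_LEN
           and crc_end not in reloc_set):
        crc_end += 1
    crc_bytes = module[LEADING_LEN:crc_end]

    # trailing bytes: whatever is left, with relocations still masked
    trailing_hex = to_pat_hex(module[crc_end:], crc_end, reloc_set)
    return leading_hex, crc_bytes, trailing_hex
```

For a 40-byte module with relocations at offsets 4 to 7 and 36, the leading field contains 8 dots, the CRC16 span is bytes 32 to 35 (length 4), and the trailing field starts with `..`. A relocation at offset 32 itself gives a CRC16 length of 0.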

All hex digits should be uppercase. I have no idea what tools would break if they were lowercase. Each line looks like this:
<leading bytes> <CRC16 len> <CRC16> <total length> <public name(s)> <referenced name(s)> <trailing bytes>
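Putting the fields together into one line might look like the sketch below. It is a minimal illustration, assuming the field widths and markers described in the line parts (two hex digits for the CRC16 length, four for the CRC16 and the total length, `:` for public names, `^` for referenced names, `-` for negative offsets, `@` for local names, `?` when no public name is usable). `format_offset`, `format_pat_line` and `format_pat_file` are hypothetical names, not part of FLAIR or uvudec.

```python
def format_offset(offset: int, local: bool = False) -> str:
    """4 zero-padded uppercase hex digits, '-' prefix if negative, '@' suffix if local."""
    sign = "-" if offset < 0 else ""
    return f"{sign}{abs(offset):04X}" + ("@" if local else "")


def format_pat_line(leading_hex: str, crc_len: int, crc16: int, total_len: int,
                    publics: list[tuple[int, str, bool]],
                    references: list[tuple[int, str]],
                    trailing_hex: str = "") -> str:
    """Assemble one module line:
    <leading> <CRC16 len> <CRC16> <total length> <publics> <references> <trailing>
    publics are (offset, name, is_local); references are (offset, name).
    """
    fields = [leading_hex.upper(), f"{crc_len:02X}", f"{crc16:04X}", f"{total_len:04X}"]
    # with no usable public name, emit '?' at offset 0
    for offset, name, local in publics or [(0, "?", False)]:
        fields.append(f":{format_offset(offset, local)} {name}")
    for offset, name in references:
        fields.append(f"^{format_offset(offset)} {name}")
    if trailing_hex:
        fields.append(trailing_hex.upper())
    return " ".join(fields)


def format_pat_file(lines: list[str]) -> str:
    """Join module lines with CRLF and close the file with a '---' line."""
    return "".join(line + "\r\n" for line in lines) + "---\r\n"
```

For example, a 3-byte module `55 89 E5` with public name `bfd_init` and no relocations comes out as `5589E5` followed by 58 dots, then ` 00 0000 0003 :0000 bfd_init`. Per the FLAIR notes, FLAIR itself leaves a space after each public name even at the end of the line; this sketch does not.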

Line parts:

  • start pattern bytes: initial relocation-friendly data sequence, 32 bytes (64 hex digits). Each hex digit of a relocated byte is written as a dot (.), so a relocated byte appears as "..". If our data is shorter than 32 bytes, it should be padded with dots to 64 characters.
  • CRC16 length: how many bytes after the first 32 are covered by the CRC16, i.e. the bytes up to (not including) the first relocated byte, with a maximum value of 0xFF (255). Written as two hex digits.
  • CRC16: CRC16 computed over those relocation-free bytes after the leading bytes, written as four hex digits. If there are no such bytes (CRC16 length 0), it should be set to 0000
  • total length of module: byte length of the whole module, i.e. leading bytes + CRC16 bytes + trailing bytes, written as four hex digits. This must not exceed 0x8000 (8000 as written in the file)
  • public name(s): what this function/symbol is known as, e.g. printf. Each entry has the form :<offset> <symbol name>, where offset is 4 zero-padded hex digits telling where in the module the symbol occurs, and symbol name is the name of the symbol we are trying to create a signature for
  • Public names are the public symbols. Usually they are at offset 0000, but it’s possible to have more that aren’t
    • Standard entries have a colon followed by 4 zero-padded uppercase hexadecimal digits
      • Ex: :0000 bfd_init
    • If no public names are present/valid, public name should be “?” at an offset of 0
      • Ex: :0000 ?
      • Happens when, say, there is one name that is only 1 or 2 letters long
    • Negative offset public names should have a negative sign
      • Ex: :-1234 bfd_init
      • This seems like an advanced detection mechanism and likely won’t be supported in the near future
      • I’m assuming this has some PE/COFF relevance I’m not familiar with
    • Local public names should have a trailing @ on the offset
      • Ex: :1234@ do_bfd_init
      • i.e. a static (file-local) function
  • Referenced names contain the record marker caret (^) plus the offset and the name
    • Ex: ^00A3 bfd_init
    • Only the first symbol reference for a given symbol should be included
    • Anonymous symbols should not be recorded as relocations
      • Ex: the unnamed jump target in if( blah ) bleh(); yuck(); that skips over bleh() when !blah
  • trailing bytes: any bytes left over after the leading 32 bytes and the CRC16 bytes. Relocated bytes should be marked with .’s here as well
    • NOTE: this is NOT stored in the .sig file. It is only there to help you resolve conflicts
  • uvudec (obj2pat) implementation notes
    Maybe out of date, but better than nothing!
    • The module vs. function distinction makes sense on desktop systems because of the way dynamic libraries are linked: the linker takes all of the given object files and exports their functions as symbols in that order. However, embedded systems use static binaries that have been more heavily optimized, so this assumption no longer holds. Take a look at --functions-as-modules (flirt.pattern.functions_as_modules) to control this.
    • I’m considering adding architecture and similar annotations, since they can usually be derived from the source object file. I got the impression that FLAIR has a way to do this, but I’m not sure how it’s done
    • Behavior for short names is loosely defined, so we will probably not keep FLAIR compatibility for this
  • FLAIR implementation notes
    • There is always a space after a public name, even at the end of the line. It’s unknown whether sigmake depends on this.
    • There may be some bugs in FLAIR related to short names, or at least the files I was given for testing, which were generated with that particular version, had bugs. See uvudec/testing/flirt/pat/short_names.c. This has made it difficult to decide what the expected output should be for short names
    • It is unknown what sigmake will do if there is data after the --- line
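The notes above do not say which CRC16 variant fills the CRC16 field. The sketch below assumes the variant used by open-source FLIRT code such as radare2's signature loader: reflected CCITT polynomial 0x8408, initial value 0xFFFF, result inverted and then byte-swapped (CRC-16/X-25 with its two output bytes exchanged). `pat_crc16` is a hypothetical name, and the claim that this matches FLAIR exactly is an assumption; check it against a known-good .pat file before relying on it.

```python
POLY = 0x8408  # bit-reversed form of the CCITT polynomial 0x1021


def pat_crc16(data: bytes) -> int:
    """CRC16 over the relocation-free bytes that follow the leading 32 bytes.

    Assumed variant: reflected CRC-16, init 0xFFFF, inverted, then byte-swapped.
    """
    if not data:
        return 0  # an empty CRC16 span is written as 0000
    crc = 0xFFFF
    for byte in data:
        bits = byte
        for _ in range(8):
            # process bits least-significant first (reflected CRC)
            if (crc ^ bits) & 1:
                crc = (crc >> 1) ^ POLY
            else:
                crc >>= 1
            bits >>= 1
    crc = ~crc & 0xFFFF
    # swap the two bytes before the value is written to the .pat line
    return ((crc << 8) & 0xFF00) | (crc >> 8)
```

The result would then be written as four uppercase hex digits, e.g. `f"{pat_crc16(crc_bytes):04X}"`, where `crc_bytes` is the span counted by the CRC16 length field.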