Ghidra: BinExport: link CallGraph.Vertex messages to Module messages #126

Open
mike-hunhoff opened this issue Apr 9, 2024 · 4 comments
Labels
enhancement New feature or request

Comments

@mike-hunhoff
Contributor

Ghidra's extension does not link CallGraph.Vertex messages to Module messages for imported functions, e.g. kernel32.ReadFile. I'm honestly not sure whether CallGraph.Vertex messages representing imported functions should be linked to Module or Library messages. The IDA plugin links them to Module messages. @cblichmann, can you provide additional insight here?

@cblichmann cblichmann added the enhancement New feature or request label Apr 9, 2024
@cblichmann
Member

It would be useful to link the functions to Module, but when I initially wrote the Ghidra extension, I did not know enough about the API to extract this information.
I consider the IDA Pro plugin to be the standard, most detailed implementation of BinExport, so we should aim to achieve that level of fidelity.

@mike-hunhoff
Contributor Author

closed by 39f6445

@williballenthin

@cblichmann I still don't quite understand the difference between a library and a module. Would you please explain a bit when to use one versus the other (or both)?

From the vertex documentation:

      // If this is a library function, what is its index in library arrays.
      optional int32 library_index = 5;

      // If module name, such as class name for DEX files, is present - index in
      // module table.
      optional int32 module_index = 6;

And the definitions for the two message types:

  message Library {
    // If this library is statically linked.
    optional bool is_static = 1;

    // Address where this library was loaded, 0 if unknown.
    optional uint64 load_address = 2 [default = 0];

    // Name of the library (format is platform-dependent).
    optional string name = 3;
  }

  message Module {
    // Name, such as Java class name. Platform-dependent.
    optional string name = 1;
  }

From this, my impression is that a library tracks a unit of code/data: either an ELF shared object or PE DLL (when is_static=false), or statically linked code such as zlib (when is_static=true). A module, by contrast, seems to describe source-level namespacing, such as a Java namespace and/or class name.

With this in mind, when a PE file refers to kernel32.dll!CreateFileW, I would expect the vertex to have a library_index pointing to Library{ is_static=false, name="kernel32.dll" }. Likewise, when some Java code refers to System.out.println, the vertex would have a module_index pointing to Module{ name="System.out" }. I suppose a vertex could have both a library index and a module index, for example when a .NET assembly references a namespaced C# routine in another .NET assembly.
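To make this reading concrete, here is a minimal C++ sketch of how a consumer might resolve the two indices. It assumes the interpretation described in this comment, not confirmed BinExport semantics. The structs are hypothetical plain stand-ins for the quoted messages (real code would use the classes generated from the .proto file), and the `name` field on `Vertex` as well as the helpers `Lookup` and `Describe` are invented for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-ins for the Library and Module messages quoted above.
struct Library {
  bool is_static = false;
  std::uint64_t load_address = 0;
  std::string name;
};

struct Module {
  std::string name;
};

// Hypothetical stand-in for CallGraph.Vertex; `name` is an illustrative field.
struct Vertex {
  std::string name;
  std::optional<std::int32_t> library_index;
  std::optional<std::int32_t> module_index;
};

struct Tables {
  std::vector<Library> libraries;
  std::vector<Module> modules;
};

// Returns a pointer to table[index], or nullptr if the index is unset or
// out of range.
template <typename T>
const T* Lookup(const std::vector<T>& table,
                std::optional<std::int32_t> index) {
  if (!index || *index < 0 ||
      static_cast<std::size_t>(*index) >= table.size()) {
    return nullptr;
  }
  return &table[static_cast<std::size_t>(*index)];
}

// Renders "library!module.name", leaving out any part that is not linked.
std::string Describe(const Vertex& v, const Tables& t) {
  std::string out;
  if (const Library* lib = Lookup(t.libraries, v.library_index)) {
    out += lib->name + "!";
  }
  if (const Module* mod = Lookup(t.modules, v.module_index)) {
    out += mod->name + ".";
  }
  return out + v.name;
}
```

Under this reading, a vertex for CreateFileW with library_index set would render with the library name as its prefix, and a vertex for println with module_index set would render with the module name as its prefix. A vertex with both indices set would carry both prefixes.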

However, as far as I can tell, and as @mike-hunhoff notes above, the IDA extractor doesn't appear to use library entries and seems to put dynamically linked library names into module entries:

  const std::string module = GetModuleName(address, modules);
  if (!module.empty()) {
    function.SetType(Function::TYPE_IMPORTED);
    function.SetModuleName(module);
  }
  if (function.GetType() == Function::TYPE_NONE ||
      function.GetTypeHeuristic() == Function::TYPE_STANDARD) {
    if (function.GetBasicBlocks().empty()) {
      function.SetType(Function::TYPE_IMPORTED);
    } else {
      function.SetType(Function::TYPE_STANDARD);
    }
  }

Is my understanding of the types above correct, or are they meant to be used differently? And does the IDA extractor behave as intended?
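Until the intended semantics are settled, a consumer may want to accept either convention. The following C++ sketch shows one possible approach, not an established BinExport API: it prefers a library entry when a vertex links to one and falls back to the module entry otherwise, which would cover the IDA behavior described above. `ImportVertex`, `NameAt`, and `ImportContainerName` are hypothetical names, and the name tables are simplified to plain string vectors.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, reduced view of a vertex: only the two optional indices.
struct ImportVertex {
  std::optional<std::int32_t> library_index;
  std::optional<std::int32_t> module_index;
};

// Returns names[index], or nullopt if the index is unset or out of range.
std::optional<std::string> NameAt(const std::vector<std::string>& names,
                                  std::optional<std::int32_t> index) {
  if (!index || *index < 0 ||
      static_cast<std::size_t>(*index) >= names.size()) {
    return std::nullopt;
  }
  return names[static_cast<std::size_t>(*index)];
}

// Prefers the library name; falls back to the module name so that exports
// which store the DLL name in the module table still resolve.
std::optional<std::string> ImportContainerName(
    const ImportVertex& v, const std::vector<std::string>& library_names,
    const std::vector<std::string>& module_names) {
  if (auto name = NameAt(library_names, v.library_index)) {
    return name;
  }
  return NameAt(module_names, v.module_index);
}
```

The fallback order is a design choice: if the documentation later assigns DLL names to Library entries only, the module branch can be dropped without changing callers.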

When this discussion is resolved, I'd be happy to update the protobuf documentation to better explain how producers and consumers should use this data.

@williballenthin

@cblichmann polite bump

@mike-hunhoff mike-hunhoff reopened this Aug 27, 2024