[BOLT] Introduce binary analysis tool based on BOLT #115330

Merged (5 commits) on Dec 12, 2024

kbeyls (Collaborator) commented Nov 7, 2024

This initial commit does not add any specific binary analyses yet;
it merely contains the boilerplate to introduce a new BOLT-based
tool.

This basically combines the first 4 patches from the prototype
pac-ret and stack-clash binary analyzer discussed in RFC
https://discourse.llvm.org/t/rfc-bolt-based-binary-analysis-tool-to-verify-correctness-of-security-hardening/78148
and published at llvm/llvm-project@main...kbeyls:llvm-project:bolt-gadget-scanner-prototype

The introduction of such a BOLT-based binary analysis tool was
proposed and discussed in at least the following places:
- The RFC pointed to above
- EuroLLVM 2024 round table
  https://discourse.llvm.org/t/summary-of-bolt-as-a-binary-analysis-tool-round-table-at-eurollvm/78441
  The round table showed quite a few people interested in being
  able to build a custom binary analysis quickly with a tool
  like this.
- Also at the US LLVM dev meeting a few weeks ago, I heard
  interest from a few people, asking when the tool would be
  available upstream.
- The presentation "Adding Pointer Authentication ABI support for your ELF platform"
  (https://llvm.swoogo.com/2024devmtg/session/2512720/adding-pointer-authentication-abi-support-for-your-elf-platform)
  explicitly mentioned interest in extending the prototype tool to
  verify correct implementation of pauthabi.
llvmbot (Member) commented Nov 7, 2024

@llvm/pr-subscribers-bolt

Author: Kristof Beyls (kbeyls)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/115330.diff

13 Files Affected:

  • (added) bolt/docs/BinaryAnalysis.md (+20)
  • (modified) bolt/include/bolt/Rewrite/RewriteInstance.h (+3)
  • (modified) bolt/include/bolt/Utils/CommandLineOpts.h (+2)
  • (modified) bolt/lib/Rewrite/RewriteInstance.cpp (+7)
  • (modified) bolt/lib/Utils/CommandLineOpts.cpp (+2)
  • (modified) bolt/test/CMakeLists.txt (+1)
  • (added) bolt/test/binary-analysis/Inputs/dummy.txt (+1)
  • (added) bolt/test/binary-analysis/cmdline-args.test (+36)
  • (added) bolt/test/binary-analysis/lit.local.cfg (+8)
  • (modified) bolt/test/lit.cfg.py (+1)
  • (modified) bolt/tools/CMakeLists.txt (+1)
  • (added) bolt/tools/binary-analysis/CMakeLists.txt (+19)
  • (added) bolt/tools/binary-analysis/binary-analysis.cpp (+108)
diff --git a/bolt/docs/BinaryAnalysis.md b/bolt/docs/BinaryAnalysis.md
new file mode 100644
index 00000000000000..95d0a915c0c8ad
--- /dev/null
+++ b/bolt/docs/BinaryAnalysis.md
@@ -0,0 +1,20 @@
+# BOLT-based binary analysis
+
+As part of post-link-time optimizing, BOLT needs to perform a range of analyses
+on binaries such as reconstructing control flow graphs, and more.
+
+The `llvm-bolt-binary-analysis` tool enables running requested binary analyses
+on binaries, and generating reports. It does this by building on top of the
+analyses implemented in the BOLT libraries.
+
+## Which binary analyses are implemented?
+
+At the moment, no binary analyses are implemented.
+
+The goal is to make it easy using a plug-in framework to add your own analyses.
+
+## How to add your own binary analysis
+
+_TODO: this section needs to be written. Ideally, we should have a simple
+"example" or "template" analysis that can be the starting point for implementing
+custom analyses_
diff --git a/bolt/include/bolt/Rewrite/RewriteInstance.h b/bolt/include/bolt/Rewrite/RewriteInstance.h
index e5b7ad63007cab..c30f21793129a5 100644
--- a/bolt/include/bolt/Rewrite/RewriteInstance.h
+++ b/bolt/include/bolt/Rewrite/RewriteInstance.h
@@ -164,6 +164,9 @@ class RewriteInstance {
 
   void preregisterSections();
 
+  /// Run analyses requested in binary analysis mode.
+  void runBinaryAnalyses();
+
   /// Run optimizations that operate at the binary, or post-linker, level.
   void runOptimizationPasses();
 
diff --git a/bolt/include/bolt/Utils/CommandLineOpts.h b/bolt/include/bolt/Utils/CommandLineOpts.h
index 04bf7db5de9527..111eb650c37465 100644
--- a/bolt/include/bolt/Utils/CommandLineOpts.h
+++ b/bolt/include/bolt/Utils/CommandLineOpts.h
@@ -18,6 +18,7 @@
 namespace opts {
 
 extern bool HeatmapMode;
+extern bool BinaryAnalysisMode;
 
 extern llvm::cl::OptionCategory BoltCategory;
 extern llvm::cl::OptionCategory BoltDiffCategory;
@@ -27,6 +28,7 @@ extern llvm::cl::OptionCategory BoltOutputCategory;
 extern llvm::cl::OptionCategory AggregatorCategory;
 extern llvm::cl::OptionCategory BoltInstrCategory;
 extern llvm::cl::OptionCategory HeatmapCategory;
+extern llvm::cl::OptionCategory BinaryAnalysisCategory;
 
 extern llvm::cl::opt<unsigned> AlignText;
 extern llvm::cl::opt<unsigned> AlignFunctions;
diff --git a/bolt/lib/Rewrite/RewriteInstance.cpp b/bolt/lib/Rewrite/RewriteInstance.cpp
index 32ec7abe8b666a..c1a6451358b677 100644
--- a/bolt/lib/Rewrite/RewriteInstance.cpp
+++ b/bolt/lib/Rewrite/RewriteInstance.cpp
@@ -698,6 +698,11 @@ Error RewriteInstance::run() {
   if (opts::DiffOnly)
     return Error::success();
 
+  if (opts::BinaryAnalysisMode) {
+    runBinaryAnalyses();
+    return Error::success();
+  }
+
   preregisterSections();
 
   runOptimizationPasses();
@@ -3419,6 +3424,8 @@ void RewriteInstance::runOptimizationPasses() {
   BC->logBOLTErrorsAndQuitOnFatal(BinaryFunctionPassManager::runAllPasses(*BC));
 }
 
+void RewriteInstance::runBinaryAnalyses() {}
+
 void RewriteInstance::preregisterSections() {
   // Preregister sections before emission to set their order in the output.
   const unsigned ROFlags = BinarySection::getFlags(/*IsReadOnly*/ true,
diff --git a/bolt/lib/Utils/CommandLineOpts.cpp b/bolt/lib/Utils/CommandLineOpts.cpp
index de82420a167131..17f090aa61ee9e 100644
--- a/bolt/lib/Utils/CommandLineOpts.cpp
+++ b/bolt/lib/Utils/CommandLineOpts.cpp
@@ -29,6 +29,7 @@ const char *BoltRevision =
 namespace opts {
 
 bool HeatmapMode = false;
+bool BinaryAnalysisMode = false;
 
 cl::OptionCategory BoltCategory("BOLT generic options");
 cl::OptionCategory BoltDiffCategory("BOLTDIFF generic options");
@@ -38,6 +39,7 @@ cl::OptionCategory BoltOutputCategory("Output options");
 cl::OptionCategory AggregatorCategory("Data aggregation options");
 cl::OptionCategory BoltInstrCategory("BOLT instrumentation options");
 cl::OptionCategory HeatmapCategory("Heatmap options");
+cl::OptionCategory BinaryAnalysisCategory("BinaryAnalysis options");
 
 cl::opt<unsigned> AlignText("align-text",
                             cl::desc("alignment of .text section"), cl::Hidden,
diff --git a/bolt/test/CMakeLists.txt b/bolt/test/CMakeLists.txt
index d468ff984840fc..6e18b028bddfcd 100644
--- a/bolt/test/CMakeLists.txt
+++ b/bolt/test/CMakeLists.txt
@@ -37,6 +37,7 @@ list(APPEND BOLT_TEST_DEPS
   lld
   llvm-config
   llvm-bolt
+  llvm-bolt-binary-analysis
   llvm-bolt-heatmap
   llvm-bat-dump
   llvm-dwarfdump
diff --git a/bolt/test/binary-analysis/Inputs/dummy.txt b/bolt/test/binary-analysis/Inputs/dummy.txt
new file mode 100644
index 00000000000000..2995a4d0e74917
--- /dev/null
+++ b/bolt/test/binary-analysis/Inputs/dummy.txt
@@ -0,0 +1 @@
+dummy
\ No newline at end of file
diff --git a/bolt/test/binary-analysis/cmdline-args.test b/bolt/test/binary-analysis/cmdline-args.test
new file mode 100644
index 00000000000000..3ed6c4323bcff8
--- /dev/null
+++ b/bolt/test/binary-analysis/cmdline-args.test
@@ -0,0 +1,36 @@
+# This file tests error messages produced on invalid command line arguments.
+# It also checks that help messages are generated as expected.
+
+# Verify that an error message is provided if an input file is missing or incorrect
+
+RUN: not llvm-bolt-binary-analysis 2>&1 | FileCheck -check-prefix=NOFILEARG %s
+NOFILEARG:       llvm-bolt-binary-analysis: Not enough positional command line arguments specified!
+NOFILEARG-NEXT:  Must specify at least 1 positional argument: See: {{.*}}llvm-bolt-binary-analysis --help
+
+RUN: not llvm-bolt-binary-analysis non-existing-file 2>&1 | FileCheck -check-prefix=NONEXISTINGFILEARG %s
+NONEXISTINGFILEARG:       llvm-bolt-binary-analysis: 'non-existing-file': No such file or directory.
+
+RUN: not llvm-bolt-binary-analysis %p/Inputs/dummy.txt 2>&1 | FileCheck -check-prefix=NOELFFILEARG %s
+NOELFFILEARG:       llvm-bolt-binary-analysis: '{{.*}}/Inputs/dummy.txt': The file was not recognized as a valid object file.
+
+RUN: %clang %cflags %p/../Inputs/asm_foo.s %p/../Inputs/asm_main.c -o %t.exe
+RUN: llvm-bolt-binary-analysis %t.exe 2>&1 | FileCheck -check-prefix=VALIDELFFILEARG --allow-empty %s
+# Check that there are no BOLT-WARNING or BOLT-ERROR output lines
+VALIDELFFILEARG:     BOLT-INFO:
+VALIDELFFILEARG-NOT: BOLT-WARNING:
+VALIDELFFILEARG-NOT: BOLT-ERROR:
+
+# Check --help output
+
+RUN: llvm-bolt-binary-analysis --help 2>&1 | FileCheck -check-prefix=HELP %s
+
+HELP:       OVERVIEW: BinaryAnalysis
+HELP-EMPTY:
+HELP-NEXT:  USAGE: llvm-bolt-binary-analysis [options] <executable>
+HELP-EMPTY:
+HELP-NEXT:  OPTIONS:
+HELP-EMPTY:
+HELP-NEXT:  Generic Options:
+
+
+
diff --git a/bolt/test/binary-analysis/lit.local.cfg b/bolt/test/binary-analysis/lit.local.cfg
new file mode 100644
index 00000000000000..f3a023296dde56
--- /dev/null
+++ b/bolt/test/binary-analysis/lit.local.cfg
@@ -0,0 +1,8 @@
+# FIXME: should we instead create a binary-analysis/AArch64 sub-directory to put most tests in?
+if "AArch64" not in config.root.targets:
+    config.unsupported = True
+
+flags = "--target=aarch64-linux-gnu -nostartfiles -nostdlib -ffreestanding -Wl,--emit-relocs"
+
+config.substitutions.insert(0, ("%cflags", f"%cflags {flags}"))
+config.substitutions.insert(0, ("%cxxflags", f"%cxxflags {flags}"))
diff --git a/bolt/test/lit.cfg.py b/bolt/test/lit.cfg.py
index da3ae34ba3bddb..0d05229be2bf3a 100644
--- a/bolt/test/lit.cfg.py
+++ b/bolt/test/lit.cfg.py
@@ -110,6 +110,7 @@
     ),
     ToolSubst("llvm-boltdiff", unresolved="fatal"),
     ToolSubst("llvm-bolt-heatmap", unresolved="fatal"),
+    ToolSubst("llvm-bolt-binary-analysis", unresolved="fatal"),
     ToolSubst("llvm-bat-dump", unresolved="fatal"),
     ToolSubst("perf2bolt", unresolved="fatal"),
     ToolSubst("yaml2obj", unresolved="fatal"),
diff --git a/bolt/tools/CMakeLists.txt b/bolt/tools/CMakeLists.txt
index 22ea3b9bd805f3..3383902cffc405 100644
--- a/bolt/tools/CMakeLists.txt
+++ b/bolt/tools/CMakeLists.txt
@@ -7,3 +7,4 @@ add_subdirectory(llvm-bolt-fuzzer)
 add_subdirectory(bat-dump)
 add_subdirectory(merge-fdata)
 add_subdirectory(heatmap)
+add_subdirectory(binary-analysis)
diff --git a/bolt/tools/binary-analysis/CMakeLists.txt b/bolt/tools/binary-analysis/CMakeLists.txt
new file mode 100644
index 00000000000000..841fc5b3711859
--- /dev/null
+++ b/bolt/tools/binary-analysis/CMakeLists.txt
@@ -0,0 +1,19 @@
+set(LLVM_LINK_COMPONENTS
+  ${LLVM_TARGETS_TO_BUILD}
+  MC
+  Object
+  Support
+  )
+
+add_bolt_tool(llvm-bolt-binary-analysis
+  binary-analysis.cpp
+  DISABLE_LLVM_LINK_LLVM_DYLIB
+  )
+
+target_link_libraries(llvm-bolt-binary-analysis
+  PRIVATE
+  LLVMBOLTRewrite
+  LLVMBOLTUtils
+  )
+
+add_dependencies(bolt llvm-bolt-binary-analysis)
diff --git a/bolt/tools/binary-analysis/binary-analysis.cpp b/bolt/tools/binary-analysis/binary-analysis.cpp
new file mode 100644
index 00000000000000..750eed224f1412
--- /dev/null
+++ b/bolt/tools/binary-analysis/binary-analysis.cpp
@@ -0,0 +1,108 @@
+#include "bolt/Rewrite/RewriteInstance.h"
+#include "bolt/Utils/CommandLineOpts.h"
+#include "llvm/MC/TargetRegistry.h"
+#include "llvm/Object/Binary.h"
+#include "llvm/Object/ELFObjectFile.h"
+#include "llvm/Support/CommandLine.h"
+#include "llvm/Support/Errc.h"
+#include "llvm/Support/ManagedStatic.h"
+#include "llvm/Support/PrettyStackTrace.h"
+#include "llvm/Support/Program.h"
+#include "llvm/Support/Signals.h"
+#include "llvm/Support/TargetSelect.h"
+#include "llvm/Support/VirtualFileSystem.h"
+
+#define DEBUG_TYPE "bolt"
+
+using namespace llvm;
+using namespace object;
+using namespace bolt;
+
+namespace opts {
+
+static cl::OptionCategory *BinaryAnalysisCategories[] = {
+    &BinaryAnalysisCategory};
+
+static cl::opt<std::string> InputFilename(cl::Positional,
+                                          cl::desc("<executable>"),
+                                          cl::Required,
+                                          cl::cat(BinaryAnalysisCategory),
+                                          cl::sub(cl::SubCommand::getAll()));
+
+} // namespace opts
+
+static StringRef ToolName = "llvm-bolt-binary-analysis";
+
+static void report_error(StringRef Message, std::error_code EC) {
+  assert(EC);
+  errs() << ToolName << ": '" << Message << "': " << EC.message() << ".\n";
+  exit(1);
+}
+
+static void report_error(StringRef Message, Error E) {
+  assert(E);
+  errs() << ToolName << ": '" << Message << "': " << toString(std::move(E))
+         << ".\n";
+  exit(1);
+}
+
+void ParseCommandLine(int argc, char **argv) {
+  cl::HideUnrelatedOptions(ArrayRef(opts::BinaryAnalysisCategories));
+  // Register the target printer for --version.
+  cl::AddExtraVersionPrinter(TargetRegistry::printRegisteredTargetsForVersion);
+
+  cl::ParseCommandLineOptions(argc, argv, "BinaryAnalysis\n");
+}
+
+static std::string GetExecutablePath(const char *Argv0) {
+  SmallString<256> ExecutablePath(Argv0);
+  // Do a PATH lookup if Argv0 isn't a valid path.
+  if (!llvm::sys::fs::exists(ExecutablePath))
+    if (llvm::ErrorOr<std::string> P =
+            llvm::sys::findProgramByName(ExecutablePath))
+      ExecutablePath = *P;
+  return std::string(ExecutablePath.str());
+}
+
+int main(int argc, char **argv) {
+  // Print a stack trace if we signal out.
+  sys::PrintStackTraceOnErrorSignal(argv[0]);
+  PrettyStackTraceProgram X(argc, argv);
+
+  std::string ToolPath = GetExecutablePath(argv[0]);
+
+  llvm_shutdown_obj Y; // Call llvm_shutdown() on exit.
+
+  // Initialize targets and assembly printers/parsers.
+  llvm::InitializeAllTargetInfos();
+  llvm::InitializeAllTargetMCs();
+  llvm::InitializeAllAsmParsers();
+  llvm::InitializeAllDisassemblers();
+
+  llvm::InitializeAllTargets();
+  llvm::InitializeAllAsmPrinters();
+
+  ParseCommandLine(argc, argv);
+
+  opts::BinaryAnalysisMode = true;
+
+  if (!sys::fs::exists(opts::InputFilename))
+    report_error(opts::InputFilename, errc::no_such_file_or_directory);
+
+  Expected<OwningBinary<Binary>> BinaryOrErr =
+      createBinary(opts::InputFilename);
+  if (Error E = BinaryOrErr.takeError())
+    report_error(opts::InputFilename, std::move(E));
+  Binary &Binary = *BinaryOrErr.get().getBinary();
+
+  if (auto *e = dyn_cast<ELFObjectFileBase>(&Binary)) {
+    auto RIOrErr = RewriteInstance::create(e, argc, argv, ToolPath);
+    if (Error E = RIOrErr.takeError())
+      report_error(opts::InputFilename, std::move(E));
+    RewriteInstance &RI = *RIOrErr.get();
+    if (Error E = RI.run())
+      report_error(opts::InputFilename, std::move(E));
+  }
+
+  return EXIT_SUCCESS;
+}
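
The checks in `main()` above run in a fixed order, which is what `cmdline-args.test` exercises: a path that does not exist, then a file that is not recognized as an object file, then a valid ELF input. The sketch below models only that ordering using the C++ standard library. It is not the tool's code: `classifyInput` and `InputStatus` are hypothetical names, and the 4-byte ELF magic check is a simplified stand-in for LLVM's `createBinary`, which recognizes more formats and validates far more than the first four bytes.

```cpp
#include <array>
#include <filesystem>
#include <fstream>
#include <ios>
#include <string>
#include <system_error>

// Hypothetical result type for this sketch; the real tool reports errors
// through std::error_code and llvm::Error instead.
enum class InputStatus { NoSuchFile, NotAnObjectFile, ElfFile };

// Mirrors the order of checks in main(): existence first, then format.
// Assumption: "is an object file" is approximated by the ELF magic bytes.
inline InputStatus classifyInput(const std::string &Path) {
  std::error_code EC;
  if (!std::filesystem::exists(Path, EC))
    return InputStatus::NoSuchFile;

  // Read the first four bytes; a shorter or unreadable file cannot be ELF.
  std::ifstream In(Path, std::ios::binary);
  std::array<char, 4> Magic{};
  In.read(Magic.data(), static_cast<std::streamsize>(Magic.size()));
  if (In.gcount() != 4)
    return InputStatus::NotAnObjectFile;

  const std::array<char, 4> ElfMagic = {0x7f, 'E', 'L', 'F'};
  return Magic == ElfMagic ? InputStatus::ElfFile
                           : InputStatus::NotAnObjectFile;
}
```

In this model, `NoSuchFile` corresponds to the `NONEXISTINGFILEARG` check in the test, `NotAnObjectFile` to `NOELFFILEARG`, and `ElfFile` to the `VALIDELFFILEARG` path that goes on to create and run a `RewriteInstance`.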

kbeyls (Collaborator, Author) commented Nov 7, 2024

FYI @ilovepi @asl

asl (Collaborator) commented Nov 7, 2024

Tagging @atrosinenko

ilovepi (Contributor) commented Nov 7, 2024

Thanks for posting this. I'll try to play with the prototype sometime next week and see if I can get something simple working, like dumping the stack layout or printing info on function args, so I can give you some feedback.

github-actions bot commented Nov 21, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

Comment on lines +37 to +39
static cl::OptionCategory *BinaryAnalysisCategories[] = {
&BinaryAnalysisCategory};

Contributor (review comment):

It looks like BoltCategory is also useful, at least its -print-* options. Though it contains a number of options not relevant to the "read-only" use case.

Collaborator, Author (reply):

I tried out adding the BoltCategory too.
Before doing so, the output of ./bin/llvm-bolt-binary-analysis --help is:

OVERVIEW: BinaryAnalysis

USAGE: llvm-bolt-binary-analysis [options] <executable>

OPTIONS:

Generic Options:

  --help      - Display available options (--help-hidden for more)
  --help-list - Display list of available options (--help-list-hidden for more)
  --version   - Display the version of this program

After doing so, the output for the same command is:

OVERVIEW: BinaryAnalysis

USAGE: llvm-bolt-binary-analysis [options] <executable>

OPTIONS:

BOLT generic options:

  --bolt-id=<string>                            - add any string to tag this execution in the output binary via bolt info section
  --create-debug-names-section                  - Creates .debug_names section, if the input binary doesn't have it already, for DWARF5 CU/TUs.
  --debug-thread-count=<uint>                   - specifies thread count for the multithreading for updating DWO debug info
  --dump-cg=<string>                            - dump callgraph to the given file
  --dwarf-output-path=<string>                  - Path to where .dwo files will be written out to.
  --dyno-stats                                  - print execution info based on profile
  --enable-bat                                  - write BOLT Address Translation tables
  --hot-data                                    - hot data symbols support (relocation mode)
  --hot-functions-at-end                        - if reorder-functions is used, order functions putting hottest last
  --hot-text                                    - Generate hot text symbols. Apply this option to a precompiled binary that manually calls into hugify, such that at runtime hugify call will put hot code into 2M pages. This requires relocation.
  --hot-text-move-sections=<sec1,sec2,sec3,...> - list of sections containing functions used for hugifying hot text. BOLT makes sure these functions are not placed on the same page as the hot text. (default='.stub,.mover').
  --insert-retpolines                           - run retpoline insertion pass
  --lite                                        - skip processing of cold functions
  --no-threads                                  - disable multithreading
  --print-profile-stats                         - print profile quality/bias analysis
  --r11-availability=<value>                    - determine the availability of r11 before indirect branches
    =never                                      -   r11 not available
    =always                                     -   r11 available before calls and jumps
    =abi                                        -   r11 available before calls but not before jumps
  --relocs                                      - use relocations in the binary (default=autodetect)
  --remove-symtab                               - Remove .symtab section
  --strict                                      - trust the input to be from a well-formed source
  --tasks-per-thread=<uint>                     - number of tasks to be created per thread
  --thread-count=<uint>                         - number of threads
  --update-debug-sections                       - update DWARF debug sections of the executable
  --use-gnu-stack                               - use GNU_STACK program header for new segment (workaround for issues with strip/objcopy)
  --use-old-text                                - re-use space in old .text if possible (relocation mode)
  -v <uint>                                     - set verbosity level for diagnostic output

Generic Options:

  --help                                        - Display available options (--help-hidden for more)
  --help-list                                   - Display list of available options (--help-list-hidden for more)
  --version                                     - Display the version of this program

It seems the majority of the options in the BoltCategory aren't applicable (at the moment).
But it is a good point that some options are applicable/useful, so ought to be shown with --help. I guess this will require splitting the BoltCategory into 2 sets: one with options useful for both llvm-bolt and llvm-bolt-binary-analysis and one with options that are only useful for llvm-bolt.
I think doing so is best left for a separate PR though....
Maybe we should create an issue with the "good first issue" label for this?
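
The split suggested above could be modeled roughly as follows. This is a standard-library sketch, not LLVM's `cl::` API: `Option`, `Category`, `visibleOptions` and `exampleOptions` are hypothetical names, and the assignment of individual options to the shared or llvm-bolt-only set is purely illustrative, since that decision has not been made in this PR. The sketch shows a tool listing only the options whose category it registered, similar in spirit to what `cl::HideUnrelatedOptions` does for `llvm-bolt-binary-analysis`.

```cpp
#include <string>
#include <vector>

// Hypothetical categories modeling the proposed split of BoltCategory.
enum class Category { BoltShared, BoltRewriteOnly, BinaryAnalysis };

struct Option {
  std::string Name;
  Category Cat;
};

// Returns the names of the options whose category the tool registered,
// in registration order. Everything else is left out of the help output.
inline std::vector<std::string>
visibleOptions(const std::vector<Option> &All,
               const std::vector<Category> &Registered) {
  std::vector<std::string> Names;
  for (const Option &O : All)
    for (Category C : Registered)
      if (O.Cat == C) {
        Names.push_back(O.Name);
        break;
      }
  return Names;
}

// Illustrative assignment only; which options belong where is undecided.
inline std::vector<Option> exampleOptions() {
  return {{"print-cfg", Category::BoltShared},
          {"print-disasm", Category::BoltShared},
          {"hot-text", Category::BoltRewriteOnly},
          {"insert-retpolines", Category::BoltRewriteOnly}};
}
```

Under this model, the analysis tool would register the shared and analysis categories, while `llvm-bolt` would register the shared and rewrite-only ones, so neither tool's `--help` lists options that do not apply to it.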

Contributor (reply):

Interestingly, it looks like most of the options relevant for the analysis use-case are only printed with --help-hidden instead of --help:

OVERVIEW: BinaryAnalysis

USAGE: llvm-bolt-binary-analysis [options] <executable>

OPTIONS:

BOLT generic options:

  --align-text=<uint>                                       - alignment of .text section
  --allow-stripped                                          - allow processing of stripped binaries
  --alt-inst-feature-size=<uint>                            - size of feature field in .altinstructions
  --alt-inst-has-padlen                                     - specify that .altinstructions has padlen field
  --asm-dump[=<dump folder>]                                  - dump function into assembly
  --bolt-id=<string>                                        - add any string to tag this execution in the output binary via bolt info section
  --break-funcs=<func1,func2,func3,...>                     - list of functions to core dump on (debugging)
  --check-encoding                                          - perform verification of LLVM instruction encoding/decoding. Every instruction in the input is decoded and re-encoded. If the resulting bytes do not match the input, a warning message is printed.
  --comp-dir-override=<string>                              - overrides DW_AT_comp_dir, and provides an alternative base location, which is used with DW_AT_dwo_name to construct a path to *.dwo files.
  --compact-code-model                                      - generate code for binaries <128MB on AArch64
  --create-debug-names-section                              - Creates .debug_names section, if the input binary doesn't have it already, for DWARF5 CU/TUs.
  --cu-processing-batch-size=<uint>                         - Specifies the size of batches for processing CUs. Higher number has better performance, but more memory usage. Default value is 1.
  --debug-skeleton-cu                                       - prints out offsets for abbrev and debug_info of Skeleton CUs that get patched.
  --debug-thread-count=<uint>                               - specifies thread count for the multithreading for updating DWO debug info
  --dot-tooltip-code                                        - add basic block instructions as tool tips on nodes
  --dump-alt-instructions                                   - dump Linux alternative instructions info
  --dump-cg=<string>                                        - dump callgraph to the given file
  --dump-data                                               - dump parsed bolt data for debugging
  --dump-dot-all                                            - dump function CFGs to graphviz format after each stage;enable '-print-loops' for color-coded blocks
  --dump-linux-exceptions                                   - dump Linux kernel exception table
  --dump-orc                                                - dump raw ORC unwind information (sorted)
  --dump-para-sites                                         - dump Linux kernel paravitual patch sites
  --dump-pci-fixups                                         - dump Linux kernel PCI fixup table
  --dump-smp-locks                                          - dump Linux kernel SMP locks
  --dump-static-calls                                       - dump Linux kernel static calls
  --dump-static-keys                                        - dump Linux kernel static keys jump table
  --dwarf-output-path=<string>                              - Path to where .dwo files will be written out to.
  --dwp=<string>                                            - Path and name to DWP file.
  --dyno-stats                                              - print execution info based on profile
  --dyno-stats-all                                          - print dyno stats after each stage
  --dyno-stats-scale=<uint>                                 - scale to be applied while reporting dyno stats
  --enable-bat                                              - write BOLT Address Translation tables
  --force-data-relocations                                  - force relocations to data sections to always be processed
  --force-patch                                             - force patching of original entry points
  --funcs=<func1,func2,func3,...>                           - limit optimizations to functions from the list
  --funcs-file=<string>                                     - file with list of functions to optimize
  --funcs-file-no-regex=<string>                            - file with list of functions to optimize (non-regex)
  --funcs-no-regex=<func1,func2,func3,...>                  - limit optimizations to functions from the list (non-regex)
  --hot-data                                                - hot data symbols support (relocation mode)
  --hot-functions-at-end                                    - if reorder-functions is used, order functions putting hottest last
  --hot-text                                                - Generate hot text symbols. Apply this option to a precompiled binary that manually calls into hugify, such that at runtime hugify call will put hot code into 2M pages. This requires relocation.
  --hot-text-move-sections=<sec1,sec2,sec3,...>             - list of sections containing functions used for hugifying hot text. BOLT makes sure these functions are not placed on the same page as the hot text. (default='.stub,.mover').
  --insert-retpolines                                       - run retpoline insertion pass
  --keep-aranges                                            - keep or generate .debug_aranges section if .gdb_index is written
  --keep-tmp                                                - preserve intermediate .o file
  --lite                                                    - skip processing of cold functions
  --long-jump-labels                                        - always use long jumps/nops for Linux kernel static keys
  --max-data-relocations=<uint>                             - maximum number of data relocations to process
  --max-funcs=<uint>                                        - maximum number of functions to process
  --no-huge-pages                                           - use regular size pages for code alignment
  --no-threads                                              - disable multithreading
  --pad-funcs=<func1:pad1,func2:pad2,func3:pad3,...>        - list of functions to pad with amount of bytes
  --pad-funcs-before=<func1:pad1,func2:pad2,func3:pad3,...> - list of functions to pad with amount of bytes
  --print-aliases                                           - print aliases when printing objects
  --print-all                                               - print functions after each stage
  --print-cfg                                               - print functions after CFG construction
  --print-debug-info                                        - print debug info when printing functions
  --print-disasm                                            - print function after disassembly
  --print-dyno-opcode-stats=<uint>                          - print per instruction opcode dyno stats and the functionnames:BB offsets of the nth highest execution counts
  --print-dyno-stats-only                                   - while printing functions output dyno-stats and skip instructions
  --print-exceptions                                        - print exception handling data
  --print-globals                                           - print global symbols after disassembly
  --print-jump-tables                                       - print jump tables
  --print-loops                                             - print loop related information
  --print-mem-data                                          - print memory data annotations when printing functions
  --print-normalized                                        - print functions after CFG is normalized
  --print-only=<func1,func2,func3,...>                      - list of functions to print
  --print-orc                                               - print ORC unwind information for instructions
  --print-profile                                           - print functions after attaching profile
  --print-profile-stats                                     - print profile quality/bias analysis
  --print-pseudo-probes=<value>                             - print pseudo probe info
    =decode                                                 -   decode probes section from binary
    =address_conversion                                     -   update address2ProbesMap with output block address
    =encoded_probes                                         -   display the encoded probes in binary section
    =all                                                    -   enable all debugging printout
  --print-relocations                                       - print relocations when printing functions/objects
  --print-reordered-data                                    - print section contents after reordering
  --print-retpoline-insertion                               - print functions after retpoline insertion pass
  --print-sdt                                               - print all SDT markers
  --print-sections                                          - print all registered sections
  --print-unknown                                           - print names of functions with unknown control flow
  --profile-format=<value>                                  - format to dump profile output in aggregation mode, default is fdata
    =fdata                                                  -   offset-based plaintext format
    =yaml                                                   -   dense YAML representation
  --r11-availability=<value>                                - determine the availability of r11 before indirect branches
    =never                                                  -   r11 not available
    =always                                                 -   r11 available before calls and jumps
    =abi                                                    -   r11 available before calls but not before jumps
  --relocs                                                  - use relocations in the binary (default=autodetect)
  --remove-symtab                                           - Remove .symtab section
  --reorder-skip-symbols=<symbol1,symbol2,symbol3,...>      - list of symbol names that cannot be reordered
  --reorder-symbols=<symbol1,symbol2,symbol3,...>           - list of symbol names that can be reordered
  --retpoline-lfence                                        - determine if lfence instruction should exist in the retpoline
  --skip-funcs=<func1,func2,func3,...>                      - list of functions to skip
  --skip-funcs-file=<string>                                - file with list of functions to skip
  --strict                                                  - trust the input to be from a well-formed source
  --tasks-per-thread=<uint>                                 - number of tasks to be created per thread
  --terminal-trap                                           - Assume that execution stops at trap instruction
  --thread-count=<uint>                                     - number of threads
  --time-build                                              - print time spent constructing binary functions
  --time-rewrite                                            - print time spent in rewriting passes
  --top-called-limit=<uint>                                 - maximum number of functions to print in top called functions section
  --trap-avx512                                             - in relocation mode trap upon entry to any function that uses AVX-512 instructions
  --trap-old-code                                           - insert traps in old function bodies (relocation mode)
  --update-debug-sections                                   - update DWARF debug sections of the executable
  --use-gnu-stack                                           - use GNU_STACK program header for new segment (workaround for issues with strip/objcopy)
  --use-old-text                                            - re-use space in old .text if possible (relocation mode)
  -v <uint>                                                 - set verbosity level for diagnostic output

Generic Options:

  -h                                                        - Alias for --help
  --help                                                    - Display available options (--help-hidden for more)
  --help-hidden                                             - Display all available options
  --help-list                                               - Display list of available options (--help-list-hidden for more)
  --help-list-hidden                                        - Display list of all available options
  --print-all-options                                       - Print all option values after command line parsing
  --print-options                                           - Print non-default options after command line parsing
  --version                                                 - Display the version of this program

Maybe creating an issue is a good idea. This should be rather straightforward to implement, but it still requires some familiarity with BOLT to understand which options are relevant for the "read-only" use case and which only influence code generation.

@maksfb maksfb left a comment

Looks good from my perspective.

bolt/test/binary-analysis/cmdline-args.test (outdated, resolved)
bolt/test/binary-analysis/lit.local.cfg (outdated, resolved)
@maksfb maksfb changed the title [bolt] Introduce binary analysis tool based on BOLT. [BOLT] Introduce binary analysis tool based on BOLT Dec 11, 2024
@kbeyls kbeyls merged commit ceb7214 into llvm:main Dec 12, 2024
7 checks passed