diff --git a/bolt/docs/CommandLineArgumentReference.md b/bolt/docs/CommandLineArgumentReference.md
new file mode 100644
index 0000000000000..1951ad5a2dc5e
--- /dev/null
+++ b/bolt/docs/CommandLineArgumentReference.md
@@ -0,0 +1,1213 @@
+# BOLT - a post-link optimizer developed to speed up large applications
+
+## SYNOPSIS
+
+`llvm-bolt <executable> [-o outputfile] <executable>.bolt [-data=perf.fdata] [options]`
+
+## OPTIONS
+
+### Generic options
+
+- `-h`
+
+  Alias for `--help`
+
+- `--help`
+
+  Display available options (`--help-hidden` for more).
+
+- `--help-hidden`
+
+  Display all available options.
+
+- `--help-list`
+
+  Display list of available options (`--help-list-hidden` for more).
+
+- `--help-list-hidden`
+
+  Display list of all available options.
+
+- `--print-all-options`
+
+  Print all option values after command line parsing.
+
+- `--print-options`
+
+  Print non-default options after command line parsing.
+
+- `--version`
+
+  Display the version of this program.
+
+### Output options
+
+- `-o <string>`
+
+  Output file
+
+- `-w <string>`
+
+  Save recorded profile to a file
+
+### BOLT generic options
+
+- `--align-text=<uint>`
+
+  Alignment of .text section
+
+- `--allow-stripped`
+
+  Allow processing of stripped binaries
+
+- `--asm-dump[=<dump folder>]`
+
+  Dump function into assembly
+
+- `-b`
+
+  Alias for -data
+
+- `--bolt-id=<string>`
+
+  Add any string to tag this execution in the output binary via bolt info section
+
+- `--break-funcs=<func1,func2,func3,...>`
+
+  List of functions to core dump on (debugging)
+
+- `--check-encoding`
+
+  Perform verification of LLVM instruction encoding/decoding. Every instruction
+  in the input is decoded and re-encoded. If the resulting bytes do not match
+  the input, a warning message is printed.
+
+- `--cu-processing-batch-size=<uint>`
+
+  Specifies the size of batches for processing CUs. A higher number improves
+  performance at the cost of more memory. Default value is 1.
+
+- `--data=<string>`
+
+  Profile data file
+
+- `--debug-skeleton-cu`
+
+  Prints out offsets for abbrev and debug_info of Skeleton CUs that get patched.
+
+- `--deterministic-debuginfo`
+
+  Disables parallel execution of tasks that may produce nondeterministic debug info
+
+- `--dot-tooltip-code`
+
+  Add basic block instructions as tool tips on nodes
+
+- `--dump-cg=<string>`
+
+  Dump callgraph to the given file
+
+- `--dump-data`
+
+  Dump parsed bolt data for debugging
+
+- `--dump-dot-all`
+
+  Dump function CFGs to graphviz format after each stage; enable '-print-loops'
+  for color-coded blocks
+
+- `--dump-orc`
+
+  Dump raw ORC unwind information (sorted)
+
+- `--dwarf-output-path=<string>`
+
+  Path where .dwo files or the dwp file will be written.
+
+- `--dwp=<string>`
+
+  Path and name to DWP file.
+
+- `--dyno-stats`
+
+  Print execution info based on profile
+
+- `--dyno-stats-all`
+
+  Print dyno stats after each stage
+
+- `--dyno-stats-scale=<uint>`
+
+  Scale to be applied while reporting dyno stats
+
+- `--enable-bat`
+
+  Write BOLT Address Translation tables
+
+- `--force-data-relocations`
+
+  Force relocations to data sections to always be processed
+
+- `--force-patch`
+
+  Force patching of original entry points
+
+- `--funcs=<func1,func2,func3,...>`
+
+  Limit optimizations to functions from the list
+
+- `--funcs-file=<string>`
+
+  File with list of functions to optimize
+
+- `--funcs-file-no-regex=<string>`
+
+  File with list of functions to optimize (non-regex)
+
+- `--funcs-no-regex=<func1,func2,func3,...>`
+
+  Limit optimizations to functions from the list (non-regex)
+
+- `--hot-data`
+
+  Hot data symbols support (relocation mode)
+
+- `--hot-functions-at-end`
+
+  If reorder-functions is used, order functions putting hottest last
+
+- `--hot-text`
+
+  Generate hot text symbols. Apply this option to a precompiled binary that
+  manually calls into hugify, so that at runtime the hugify call puts hot
+  code into 2M pages. This requires relocation.
+
+- `--hot-text-move-sections=<sec1,sec2,sec3,...>`
+
+  List of sections containing functions used for hugifying hot text. BOLT makes
+  sure these functions are not placed on the same page as the hot text.
+  (default='.stub,.mover').
+
+- `--insert-retpolines`
+
+  Run retpoline insertion pass
+
+- `--keep-aranges`
+
+  Keep or generate .debug_aranges section if .gdb_index is written
+
+- `--keep-tmp`
+
+  Preserve intermediate .o file
+
+- `--lite`
+
+  Skip processing of cold functions
+
+- `--max-data-relocations=<uint>`
+
+  Maximum number of data relocations to process
+
+- `--max-funcs=<uint>`
+
+  Maximum number of functions to process
+
+- `--no-huge-pages`
+
+  Use regular size pages for code alignment
+
+- `--no-threads`
+
+  Disable multithreading
+
+- `--pad-funcs=<func1:pad1,func2:pad2,func3:pad3,...>`
+
+  List of functions to pad, with the amount of padding in bytes
+
+- `--profile-format=<value>`
+
+  Format to dump profile output in aggregation mode, default is fdata
+  - `=fdata`: offset-based plaintext format
+  - `=yaml`: dense YAML representation
+
+- `--r11-availability=<value>`
+
+  Determine the availability of r11 before indirect branches
+  - `=never`: r11 not available
+  - `=always`: r11 available before calls and jumps
+  - `=abi`: r11 available before calls but not before jumps
+
+- `--relocs`
+
+  Use relocations in the binary (default=autodetect)
+
+- `--remove-symtab`
+
+  Remove .symtab section
+
+- `--reorder-skip-symbols=<symbol1,symbol2,symbol3,...>`
+
+  List of symbol names that cannot be reordered
+
+- `--reorder-symbols=<symbol1,symbol2,symbol3,...>`
+
+  List of symbol names that can be reordered
+
+- `--retpoline-lfence`
+
+  Determine if lfence instruction should exist in the retpoline
+
+- `--skip-funcs=<func1,func2,func3,...>`
+
+  List of functions to skip
+
+- `--skip-funcs-file=<string>`
+
+  File with list of functions to skip
+
+- `--strict`
+
+  Trust the input to be from a well-formed source
+
+- `--tasks-per-thread=<uint>`
+
+  Number of tasks to be created per thread
+
+- `--thread-count=<uint>`
+
+  Number of threads
+
+- `--top-called-limit=<uint>`
+
+  Maximum number of functions to print in top called functions section
+
+- `--trap-avx512`
+
+  In relocation mode trap upon entry to any function that uses AVX-512 instructions
+
+- `--trap-old-code`
+
+  Insert traps in old function bodies (relocation mode)
+
+- `--update-debug-sections`
+
+  Update DWARF debug sections of the executable
+
+- `--use-gnu-stack`
+
+  Use GNU_STACK program header for new segment (workaround for issues with
+  strip/objcopy)
+
+- `--use-old-text`
+
+  Re-use space in old .text if possible (relocation mode)
+
+- `-v <uint>`
+
+  Set verbosity level for diagnostic output
+
+- `--write-dwp`
+
+  Output a single dwarf package file (dwp) instead of multiple non-relocatable
+  dwarf object files (dwo).
+
+### BOLT optimization options
+
+- `--align-blocks`
+
+  Align basic blocks
+
+- `--align-blocks-min-size=<uint>`
+
+  Minimal size of the basic block that should be aligned
+
+- `--align-blocks-threshold=<uint>`
+
+  Align only blocks whose execution frequency exceeds the given percentage of
+  the containing function's execution frequency. E.g., 1000 means aligning
+  blocks that are executed 10 times more frequently than the containing function.
+
+- `--align-functions=<uint>`
+
+  Align functions at a given value (relocation mode)
+
+- `--align-functions-max-bytes=<uint>`
+
+  Maximum number of bytes to use to align functions
+
+- `--assume-abi`
+
+  Assume the ABI is never violated
+
+- `--block-alignment=<uint>`
+
+  Boundary to use for alignment of basic blocks
+
+- `--bolt-seed=<uint>`
+
+  Seed for randomization
+
+- `--cg-from-perf-data`
+
+  Use perf data directly when constructing the call graph for stale functions
+
+- `--cg-ignore-recursive-calls`
+
+  Ignore recursive calls when constructing the call graph
+
+- `--cg-use-split-hot-size`
+
+  Use hot/cold data on basic blocks to determine hot sizes for call graph functions
+
+- `--cold-threshold=<uint>`
+
+  Tenths of percents of main entry frequency to use as a threshold when
+  evaluating whether a basic block is cold (0 means it is only considered
+  cold if the block has zero samples). Default: 0
+
+- `--elim-link-veneers`
+
+  Run veneer elimination pass
+
+- `--eliminate-unreachable`
+
+  Eliminate unreachable code
+
+- `--equalize-bb-counts`
+
+  Use same count for BBs that should have equivalent count (used in non-LBR
+  and shrink wrapping)
+
+- `--execution-count-threshold=<uint>`
+
+  Perform profiling accuracy-sensitive optimizations only if function execution
+  count >= the threshold (default: 0)
+
+- `--fix-block-counts`
+
+  Adjust block counts based on outgoing branch counts
+
+- `--fix-func-counts`
+
+  Adjust function counts based on basic blocks execution count
+
+- `--force-inline=<func1,func2,func3,...>`
+
+  List of functions to always consider for inlining
+
+- `--frame-opt=<value>`
+
+  Optimize stack frame accesses
+  - `none`: do not perform frame optimization
+  - `hot`: perform FOP on hot functions
+  - `all`: perform FOP on all functions
+
+- `--frame-opt-rm-stores`
+
+  Apply additional analysis to remove stores (experimental)
+
+- `--function-order=<string>`
+
+  File containing an ordered list of functions to use for function reordering
+
+- `--generate-function-order=<string>`
+
+  File to dump the ordered list of functions to use for function reordering
+
+- `--generate-link-sections=<string>`
+
+  Generate a list of function sections in a format suitable for inclusion in a
+  linker script
+
+- `--group-stubs`
+
+  Share stubs across functions
+
+- `--hugify`
+
+  Automatically put hot code on 2MB page(s) (hugify) at runtime. No manual call
+  to hugify is needed in the binary (which is what --hot-text relies on).
+
+- `--icf`
+
+  Fold functions with identical code
+
+- `--icp`
+
+  Alias for --indirect-call-promotion
+
+- `--icp-calls-remaining-percent-threshold=<uint>`
+
+  The percentage threshold against remaining unpromoted indirect call count
+  for the promotion for calls
+
+- `--icp-calls-topn`
+
+  Alias for --indirect-call-promotion-calls-topn
+
+- `--icp-calls-total-percent-threshold=<uint>`
+
+  The percentage threshold against total count for the promotion for calls
+
+- `--icp-eliminate-loads`
+
+  Enable load elimination using memory profiling data when performing ICP
+
+- `--icp-funcs=<func1,func2,func3,...>`
+
+  List of functions to enable ICP for
+
+- `--icp-inline`
+
+  Only promote call targets eligible for inlining
+
+- `--icp-jt-remaining-percent-threshold=<uint>`
+
+  The percentage threshold against remaining unpromoted indirect call count for
+  the promotion for jump tables
+
+- `--icp-jt-targets`
+
+  Alias for --icp-jump-tables-targets
+
+- `--icp-jt-topn`
+
+  Alias for --indirect-call-promotion-jump-tables-topn
+
+- `--icp-jt-total-percent-threshold=<uint>`
+
+  The percentage threshold against total count for the promotion for jump tables
+
+- `--icp-jump-tables-targets`
+
+  For jump tables, optimize indirect jmp targets instead of indices
+
+- `--icp-mp-threshold`
+
+  Alias for --indirect-call-promotion-mispredict-threshold
+
+- `--icp-old-code-sequence`
+
+  Use old code sequence for promoted calls
+
+- `--icp-top-callsites=<uint>`
+
+  Optimize hottest calls until at least this percentage of all indirect calls
+  frequency is covered. 0 = all callsites
+
+- `--icp-topn`
+
+  Alias for --indirect-call-promotion-topn
+
+- `--icp-use-mp`
+
+  Alias for --indirect-call-promotion-use-mispredicts
+
+- `--indirect-call-promotion=<value>`
+
+  Indirect call promotion
+  - `none`: do not perform indirect call promotion
+  - `calls`: perform ICP on indirect calls
+  - `jump-tables`: perform ICP on jump tables
+  - `all`: perform ICP on calls and jump tables
+
+- `--indirect-call-promotion-calls-topn=<uint>`
+
+  Limit number of targets to consider when doing indirect call promotion on
+  calls. 0 = no limit
+
+- `--indirect-call-promotion-jump-tables-topn=<uint>`
+
+  Limit number of targets to consider when doing indirect call promotion on
+  jump tables. 0 = no limit
+
+- `--indirect-call-promotion-mispredict-threshold=<uint>`
+
+  Misprediction threshold for skipping ICP on an indirect call
+
+- `--indirect-call-promotion-topn=<uint>`
+
+  Limit number of targets to consider when doing indirect call promotion.
+  0 = no limit
+
+- `--indirect-call-promotion-use-mispredicts`
+
+  Use misprediction frequency for determining whether or not ICP should be
+  applied at a callsite. The `-indirect-call-promotion-mispredict-threshold`
+  value will be used by this heuristic
+
+- `--infer-fall-throughs`
+
+  Infer execution count for fall-through blocks
+
+- `--infer-stale-profile`
+
+  Infer counts from stale profile data.
+
+- `--inline-all`
+
+  Inline all functions
+
+- `--inline-ap`
+
+  Adjust function profile after inlining
+
+- `--inline-limit=<uint>`
+
+  Maximum number of call sites to inline
+
+- `--inline-max-iters=<uint>`
+
+  Maximum number of inline iterations
+
+- `--inline-memcpy`
+
+  Inline memcpy using 'rep movsb' instruction (X86-only)
+
+- `--inline-small-functions`
+
+  Inline functions if increase in size is less than defined by `-inline-small-functions-bytes`
+
+- `--inline-small-functions-bytes=<uint>`
+
+  Max number of bytes for the function to be considered small for inlining purposes
+
+- `--instrument`
+
+  Instrument code to generate accurate profile data
+
+- `--iterative-guess`
+
+  In non-LBR mode, guess edge counts using iterative technique
+
+- `--jt-footprint-optimize-for-icache`
+
+  With jt-footprint-reduction, only process PIC jumptables and turn off other
+  transformations that increase code size
+
+- `--jt-footprint-reduction`
+
+  Make jump tables smaller at the cost of using more instructions at jump
+  sites
+
+- `--jump-tables=<value>`
+
+  Jump tables support (default=basic)
+  - `none`: do not optimize functions with jump tables
+  - `basic`: optimize functions with jump tables
+  - `move`: move jump tables to a separate section
+  - `split`: split jump tables section into hot and cold based on function
+  execution frequency
+  - `aggressive`: aggressively split jump tables section based on usage of the
+  tables
+
+- `--keep-nops`
+
+  Keep no-op instructions. By default they are removed.
+
+- `--lite-threshold-count=<uint>`
+
+  Similar to '-lite-threshold-pct' but specify threshold using absolute function
+  call count. I.e. limit processing to functions executed at least the specified
+  number of times.
+
+- `--lite-threshold-pct=<uint>`
+
+  Threshold (in percent) for selecting functions to process in lite mode. A higher
+  threshold means fewer functions to process. E.g., a threshold of 90 means only the
+  top 10 percent of functions with profile will be processed.
+
+- `--mcf-use-rarcs`
+
+  In MCF, consider the possibility of cancelling flow to balance edges
+
+- `--memcpy1-spec=<func1,func2:cs1:cs2,func3:cs1,...>`
+
+  List of functions with call sites for which to specialize memcpy() for size 1
+
+- `--min-branch-clusters`
+
+  Use a modified clustering algorithm geared towards minimizing branches
+
+- `--no-inline`
+
+  Disable all inlining (overrides other inlining options)
+
+- `--no-scan`
+
+  Do not scan cold functions for external references (may result in slower binary)
+
+- `--peepholes=<value>`
+
+  Enable peephole optimizations
+  - `none`: disable peepholes
+  - `double-jumps`: remove double jumps when able
+  - `tailcall-traps`: insert tail call traps
+  - `useless-branches`: remove useless conditional branches
+  - `all`: enable all peephole optimizations
+
+- `--plt=<value>`
+
+  Optimize PLT calls (requires linking with -znow)
+  - `none`: do not optimize PLT calls
+  - `hot`: optimize executed (hot) PLT calls
+  - `all`: optimize all PLT calls
+
+- `--preserve-blocks-alignment`
+
+  Try to preserve basic block alignment
+
+- `--profile-ignore-hash`
+
+  Ignore hash while reading function profile
+
+- `--profile-use-dfs`
+
+  Use DFS order for YAML profile
+
+- `--reg-reassign`
+
+  Reassign registers so as to avoid using REX prefixes in hot code
+
+- `--reorder-blocks=<value>`
+
+  Change layout of basic blocks in a function
+  - `none`: do not reorder basic blocks
+  - `reverse`: layout blocks in reverse order
+  - `normal`: perform optimal layout based on profile
+  - `branch-predictor`: perform optimal layout prioritizing branch predictions
+  - `cache`: perform optimal layout prioritizing I-cache behavior
+  - `cache+`: perform layout optimizing I-cache behavior
+  - `ext-tsp`: perform layout optimizing I-cache behavior
+  - `cluster-shuffle`: perform random layout of clusters
+
+- `--reorder-data=<section1,section2,section3,...>`
+
+  List of sections to reorder
+
+- `--reorder-data-algo=<value>`
+
+  Algorithm used to reorder data sections
+  - `count`: sort hot data by read counts
+  - `funcs`: sort hot data by hot function usage and count
+
+- `--reorder-data-inplace`
+
+  Reorder data sections in place
+
+- `--reorder-data-max-bytes=<uint>`
+
+  Maximum number of bytes to reorder
+
+- `--reorder-data-max-symbols=<uint>`
+
+  Maximum number of symbols to reorder
+
+- `--reorder-functions=<value>`
+
+  Reorder and cluster functions (works only with relocations)
+  - `none`: do not reorder functions
+  - `exec-count`: order by execution count
+  - `hfsort`: use hfsort algorithm
+  - `hfsort+`: use hfsort+ algorithm
+  - `cdsort`: use cache-directed sort
+  - `pettis-hansen`: use Pettis-Hansen algorithm
+  - `random`: reorder functions randomly
+  - `user`: use function order specified by -function-order
+
+- `--reorder-functions-use-hot-size`
+
+  Use a function's hot size when doing clustering
+
+- `--report-bad-layout=<uint>`
+
+  Print top <uint> functions with suboptimal code layout on input
+
+- `--report-stale`
+
+  Print the list of functions with stale profile
+
+- `--runtime-hugify-lib=<string>`
+
+  Specify file name of the runtime hugify library
+
+- `--runtime-instrumentation-lib=<string>`
+
+  Specify file name of the runtime instrumentation library
+
+- `--sctc-mode=<value>`
+
+  Mode for simplify conditional tail calls
+  - `always`: always perform sctc
+  - `preserve`: only perform sctc when branch direction is preserved
+  - `heuristic`: use branch prediction data to control sctc
+
+- `--sequential-disassembly`
+
+  Performs disassembly sequentially
+
+- `--shrink-wrapping-threshold=<uint>`
+
+  Percentage of prologue execution count to use as threshold when evaluating
+  whether a block is cold enough to be profitable to move eligible spills there
+
+- `--simplify-conditional-tail-calls`
+
+  Simplify conditional tail calls by removing unnecessary jumps
+
+- `--simplify-rodata-loads`
+
+  Simplify loads from read-only sections by replacing the memory operand with
+  the constant found in the corresponding section
+
+- `--split-align-threshold=<uint>`
+
+  When deciding to split a function, apply this alignment while doing the size
+  comparison (see -split-threshold). Default value: 2.
+
+- `--split-all-cold`
+
+  Outline as many cold basic blocks as possible
+
+- `--split-eh`
+
+  Split C++ exception handling code
+
+- `--split-functions`
+
+  Split functions into fragments
+
+- `--split-strategy=<value>`
+
+  Strategy used to partition blocks into fragments
+
+  - `profile2`: split each function into a hot and cold fragment using
+  profiling information
+  - `cdsplit`: split each function into a hot, warm, and cold fragment using
+  profiling information
+  - `random2`: split each function into a hot and cold fragment at a randomly
+  chosen split point (ignoring any available profiling information)
+  - `randomN`: split each function into N fragments at randomly chosen split
+  points (ignoring any available profiling information)
+  - `all`: split all basic blocks of each function into fragments such that
+  each fragment contains exactly a single basic block
+
+- `--split-threshold=<uint>`
+
+  Split function only if its main size is reduced by more than given amount of
+  bytes. Default value: 0, i.e. split iff the size is reduced. Note that on
+  some architectures the size can increase after splitting.
+
+- `--stale-matching-max-func-size=<uint>`
+
+  The maximum size of a function to consider for inference.
+
+- `--stale-threshold=<uint>`
+
+  Maximum percentage of stale functions to tolerate (default: 100)
+
+- `--stoke`
+
+  Turn on the stoke analysis
+
+- `--strip-rep-ret`
+
+  Strip 'repz' prefix from 'repz retq' sequence (on by default)
+
+- `--tail-duplication=<value>`
+
+  Duplicate unconditional branches that cross a cache line
+
+  - `none`: do not apply
+  - `aggressive`: aggressive strategy
+  - `moderate`: moderate strategy
+  - `cache`: cache-aware duplication strategy
+
+- `--tsp-threshold=<uint>`
+
+  Maximum number of hot basic blocks in a function for which to use a precise TSP solution while re-ordering basic blocks
+
+- `--use-aggr-reg-reassign`
+
+  Use register liveness analysis to try to find more opportunities for -reg-reassign optimization
+
+- `--use-compact-aligner`
+
+  Use compact approach for aligning functions
+
+- `--use-edge-counts`
+
+  Use edge count data when doing clustering
+
+- `--verify-cfg`
+
+  Verify the CFG after every pass
+
+- `--x86-align-branch-boundary-hot-only`
+
+  Only apply branch boundary alignment in hot code
+
+- `--x86-strip-redundant-address-size`
+
+  Remove redundant Address-Size override prefix
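+
+As a rough sketch of how the optimization options above combine, the command below applies a layout-oriented set of passes. The file names `app`, `app.bolt`, and `perf.fdata` are placeholders, and this particular flag combination is only an illustration, not a recommended default. Per the description of `--reorder-functions`, function reordering works only with relocations.
+
+```bash
+# Hypothetical file names; every flag used here is documented in this reference.
+llvm-bolt app -o app.bolt --data=perf.fdata \
+  --reorder-blocks=ext-tsp \
+  --reorder-functions=hfsort \
+  --split-functions --split-all-cold --split-eh \
+  --dyno-stats
+```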
+
+### BOLT options in relocation mode
+
+- `--align-macro-fusion=<value>`
+
+  Fix instruction alignment for macro-fusion (x86 relocation mode)
+
+  - `none`: do not insert alignment no-ops for macro-fusion
+  - `hot`: only insert alignment no-ops on hot execution paths (default)
+  - `all`: always align instructions to allow macro-fusion
+
+### BOLT instrumentation options
+
+`llvm-bolt <executable> -instrument [-o outputfile] <instrumented-executable>`
+
+- `--conservative-instrumentation`
+
+  Disable instrumentation optimizations that sacrifice profile accuracy (for
+  debugging, default: false)
+
+- `--instrument-calls`
+
+  Record profile for inter-function control flow activity (default: true)
+
+- `--instrument-hot-only`
+
+  Only insert instrumentation on hot functions (needs profile, default: false)
+
+- `--instrumentation-binpath=<string>`
+
+  Path to the instrumented binary, in case /proc/self/map_files is not
+  accessible due to access restrictions
+
+- `--instrumentation-file=<string>`
+
+  File name where instrumented profile will be saved (default: /tmp/prof.fdata)
+
+- `--instrumentation-file-append-pid`
+
+  Append PID to saved profile file name (default: false)
+
+- `--instrumentation-no-counters-clear`
+
+  Don't clear counters across dumps (use with `instrumentation-sleep-time` option)
+
+- `--instrumentation-sleep-time=<uint>`
+
+  Interval between profile writes (default: 0 = write only at program end).
+  This is useful for service workloads when you want to dump profile every X
+  minutes or if you are killing the program and the profile is not being
+  dumped at the end.
+
+- `--instrumentation-wait-forks`
+
+  Wait until all forks of the instrumented process have finished (use with
+  `instrumentation-sleep-time` option)
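+
+The instrumentation options can be combined into a collect-then-optimize workflow. The sketch below assumes a binary named `app` and a profile path of `/tmp/app.fdata`; both are placeholders chosen for illustration.
+
+```bash
+# 1. Produce an instrumented copy of the binary (file names are hypothetical).
+llvm-bolt app --instrument -o app.inst --instrumentation-file=/tmp/app.fdata
+
+# 2. Run a representative workload. By default the profile is written at
+#    program end; --instrumentation-sleep-time enables periodic writes.
+./app.inst
+
+# 3. Feed the collected profile back into BOLT.
+llvm-bolt app -o app.bolt --data=/tmp/app.fdata
+```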
+
+### Data aggregation options (perf2bolt)
+
+`perf2bolt -p perf.data [-o outputfile] perf.fdata <executable>`
+
+- `--autofdo`
+
+  Generate autofdo textual data instead of bolt data
+
+- `--filter-mem-profile`
+
+  If processing a memory profile, filter out stack or heap accesses that won't
+  be useful for BOLT to reduce profile file size
+
+- `--ignore-build-id`
+
+  Continue even if build-ids in input binary and perf.data mismatch
+
+- `--ignore-interrupt-lbr`
+
+  Ignore kernel interrupt LBR that happens asynchronously
+
+- `--itrace=<string>`
+
+  Generate LBR info with perf itrace argument
+
+- `--nl`
+
+  Aggregate basic samples (without LBR info)
+
+- `--pa`
+
+  Skip perf and read data from a pre-aggregated file format
+
+- `--perfdata=<string>`
+
+  Data file
+
+- `--pid=<ulong>`
+
+  Only use samples from process with specified PID
+
+- `--time-aggr`
+
+  Time BOLT aggregator
+
+- `--use-event-pc`
+
+  Use event PC in combination with LBR sampling
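+
+A minimal sketch of the aggregation step, assuming a binary named `app` and a profile recorded with Linux `perf`. The `perf record` flags shown are one common way to sample with LBR; they are an assumption about the environment and are not part of this reference.
+
+```bash
+# Record a profile with branch (LBR) samples; requires hardware support.
+perf record -e cycles:u -j any,u -o perf.data -- ./app
+
+# Convert perf.data into BOLT's fdata format.
+perf2bolt -p perf.data -o perf.fdata app
+
+# If perf.data was recorded without LBR (no -j), aggregate basic samples
+# with --nl instead.
+perf2bolt -p perf.data -o perf.fdata --nl app
+```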
+
+### BOLT printing options
+
+#### Generic options
+
+- `--print-aliases`
+
+  Print aliases when printing objects
+
+- `--print-all`
+
+  Print functions after each stage
+
+- `--print-cfg`
+
+  Print functions after CFG construction
+
+- `--print-debug-info`
+
+  Print debug info when printing functions
+
+- `--print-disasm`
+
+  Print function after disassembly
+
+- `--print-dyno-opcode-stats=<uint>`
+
+  Print per-instruction opcode dyno stats and the function name:BB offsets of
+  the nth highest execution counts
+
+- `--print-dyno-stats-only`
+
+  While printing functions output dyno-stats and skip instructions
+
+- `--print-exceptions`
+
+  Print exception handling data
+
+- `--print-globals`
+
+  Print global symbols after disassembly
+
+- `--print-jump-tables`
+
+  Print jump tables
+
+- `--print-loops`
+
+  Print loop related information
+
+- `--print-mem-data`
+
+  Print memory data annotations when printing functions
+
+- `--print-normalized`
+
+  Print functions after CFG is normalized
+
+- `--print-only=<func1,func2,func3,...>`
+
+  List of functions to print
+
+- `--print-orc`
+
+  Print ORC unwind information for instructions
+
+- `--print-profile`
+
+  Print functions after attaching profile
+
+- `--print-profile-stats`
+
+  Print profile quality/bias analysis
+
+- `--print-pseudo-probes=<value>`
+
+  Print pseudo probe info
+  - `=decode`: decode probes section from binary
+  - `=address_conversion`: update address2ProbesMap with output block address
+  - `=encoded_probes`: display the encoded probes in binary section
+  - `=all`: enable all debugging printout
+
+- `--print-relocations`
+
+  Print relocations when printing functions/objects
+
+- `--print-reordered-data`
+
+  Print section contents after reordering
+
+- `--print-retpoline-insertion`
+
+  Print functions after retpoline insertion pass
+
+- `--print-sdt`
+
+  Print all SDT markers
+
+- `--print-sections`
+
+  Print all registered sections
+
+- `--print-unknown`
+
+  Print names of functions with unknown control flow
+
+- `--time-opts`
+
+  Print time spent in each optimization
+
+#### Optimization options
+
+- `--print-after-branch-fixup`
+
+  Print function after fixing local branches
+
+- `--print-after-jt-footprint-reduction`
+
+  Print function after jt-footprint-reduction pass
+
+- `--print-after-lowering`
+
+  Print function after instruction lowering
+
+- `--print-cache-metrics`
+
+  Calculate and print various metrics for instruction cache
+
+- `--print-clusters`
+
+  Print clusters
+
+- `--print-finalized`
+
+  Print function after CFG is finalized
+
+- `--print-fix-relaxations`
+
+  Print functions after fix relaxations pass
+
+- `--print-fix-riscv-calls`
+
+  Print functions after fix RISCV calls pass
+
+- `--print-fop`
+
+  Print functions after frame optimizer pass
+
+- `--print-function-statistics=<uint>`
+
+  Print statistics about basic block ordering
+
+- `--print-icf`
+
+  Print functions after ICF optimization
+
+- `--print-icp`
+
+  Print functions after indirect call promotion
+
+- `--print-inline`
+
+  Print functions after inlining optimization
+
+- `--print-longjmp`
+
+  Print functions after longjmp pass
+
+- `--print-optimize-bodyless`
+
+  Print functions after bodyless optimization
+
+- `--print-output-address-range`
+
+  Print output address range for each basic block in the function
+  when BinaryFunction::print is called
+
+- `--print-peepholes`
+
+  Print functions after peephole optimization
+
+- `--print-plt`
+
+  Print functions after PLT optimization
+
+- `--print-regreassign`
+
+  Print functions after regreassign pass
+
+- `--print-reordered`
+
+  Print functions after layout optimization
+
+- `--print-reordered-functions`
+
+  Print functions after clustering
+
+- `--print-sctc`
+
+  Print functions after conditional tail call simplification
+
+- `--print-simplify-rodata-loads`
+
+  Print functions after simplification of RO data loads
+
+- `--print-sorted-by=<value>`
+
+  Print functions sorted by order of dyno stats
+  - `executed-forward-branches`: executed forward branches
+  - `taken-forward-branches`: taken forward branches
+  - `executed-backward-branches`: executed backward branches
+  - `taken-backward-branches`: taken backward branches
+  - `executed-unconditional-branches`: executed unconditional branches
+  - `all-function-calls`: all function calls
+  - `indirect-calls`: indirect calls
+  - `PLT-calls`: PLT calls
+  - `executed-instructions`: executed instructions
+  - `executed-load-instructions`: executed load instructions
+  - `executed-store-instructions`: executed store instructions
+  - `taken-jump-table-branches`: taken jump table branches
+  - `taken-unknown-indirect-branches`: taken unknown indirect branches
+  - `total-branches`: total branches
+  - `taken-branches`: taken branches
+  - `non-taken-conditional-branches`: non-taken conditional branches
+  - `taken-conditional-branches`: taken conditional branches
+  - `all-conditional-branches`: all conditional branches
+  - `linker-inserted-veneer-calls`: linker-inserted veneer calls
+  - `all`: sorted by all names
+
+- `--print-sorted-by-order=<value>`
+
+  Use ascending or descending order when printing functions ordered by dyno stats
+
+- `--print-split`
+
+  Print functions after code splitting
+
+- `--print-stoke`
+
+  Print functions after stoke analysis
+
+- `--print-uce`
+
+  Print functions after unreachable code elimination
+
+- `--print-veneer-elimination`
+
+  Print functions after veneer elimination pass
+
+- `--time-build`
+
+  Print time spent constructing binary functions
+
+- `--time-rewrite`
+
+  Print time spent in rewriting passes