Releases: koute/bytehound
0.11.0
0.10.0
Major changes:
- Performance improvements; the CPU overhead of allocation-heavy, heavily multithreaded programs was cut by up to ~80%
- You can now control whether child processes are profiled with the `MEMORY_PROFILER_TRACK_CHILD_PROCESSES` environment variable (disabled by default)
- The fragmentation timeline was removed from the UI
- `mmap`/`munmap` calls are now gathered by default (you can disable this with `MEMORY_PROFILER_GATHER_MAPS`)
- Total actual memory usage is now gathered by periodically polling `/proc/self/smaps`
- Maps can now be browsed in the UI and analyzed through the scripting API
- Maps are now named according to their source using `PR_SET_VMA_ANON_NAME` (Linux 5.17 or newer; on older kernels this is emulated in user space)
- Glibc-internal `__mmap` and `__munmap` are now hooked into
- Bytehound-internal allocations now exclusively use mimalloc as their allocator
- New scripting APIs:
  - `AllocationList::only_alive_at`
  - `AllocationList::only_from_maps`
  - `Graph::start_at`
  - `Graph::end_at`
  - `Graph::show_address_space`
  - `Graph::show_rss`
  - `MapList`
  - `Map`
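As a sketch of how the new methods compose in the scripting console or via the `script` subcommand (assuming the `graph()` and `allocations()` accessors and the `h()` duration helper from the scripting API; names unverified against this release):

```
// A sketch, not verified against this release: plot RSS alongside
// the allocations still alive one hour into the profiling run.
graph()
    .add(allocations().only_alive_at(h(1)))
    .show_rss()
    .save();
```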
- Removed scripting APIs:
  - `AllocationList::only_not_deallocated_after_at_least`
  - `AllocationList::only_not_deallocated_until_at_most`
  - `Graph::truncate_until`
  - `Graph::extend_until`
- Removed lifetime filters in the UI: `only_not_deallocated_in_current_range`, `only_deallocated_in_current_range`
- Fixed a rare crash when profiling programs using jemalloc
- Added support for `aligned_alloc`
- Added support for `memalign`
- The relative scale in the generated graphs is now always relative to the start of profiling
- Gathered backtraces now include an extra Bytehound-specific frame at the bottom to indicate which function was called
- Minor improvements to the UI
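The new knobs can be combined in a typical `LD_PRELOAD` launch; a minimal sketch, where the library path, binary name, and values are placeholders rather than anything from the release notes:

```shell
# Illustrative launch; the library path and binary are placeholders.
export MEMORY_PROFILER_TRACK_CHILD_PROCESSES=1  # also profile child processes
export MEMORY_PROFILER_GATHER_MAPS=0            # opt out of mmap/munmap gathering
LD_PRELOAD=./libbytehound.so ./your-program
```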
0.9.0
Major changes:
- Deallocation backtraces are now gathered by default; you can use the `MEMORY_PROFILER_GRAB_BACKTRACES_ON_FREE` environment variable to turn this off
- Deallocation backtraces are now shown in the GUI for each allocation
- Allocations can now be filtered according to where exactly they were deallocated
- Allocations can now be filtered according to whether the last allocation in their realloc chain was leaked or not
- Profiling of executables larger than 4GB is now supported
- Profiling of executables using unprefixed jemalloc is now supported
- New scripting APIs:
  - `AllocationList::only_matching_deallocation_backtraces`
  - `AllocationList::only_not_matching_deallocation_backtraces`
  - `AllocationList::only_position_in_chain_at_least`
  - `AllocationList::only_position_in_chain_at_most`
  - `AllocationList::only_chain_leaked`
- The `server` subcommand of the CLI should now use less memory when loading large data files
- The behavior of `malloc_usable_size` when called with a `NULL` argument now matches glibc
- At minimum Rust 1.62 is now required to build the crates; older versions might still work, but will not be supported
- The way the profiler is initialized was reworked; this should increase compatibility and might fix some of the crashes seen when trying to profile certain programs
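To opt out of the new default, the variable can be set before launching the profiled program; a sketch with placeholder paths:

```shell
# Illustrative launch with deallocation backtraces turned back off.
export MEMORY_PROFILER_GRAB_BACKTRACES_ON_FREE=0
LD_PRELOAD=./libbytehound.so ./your-program
```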
0.8.0
Major changes:
- Significantly lower CPU usage when temporary allocation culling is turned on
- Each thread now has its own first-level backtrace cache; this might result in higher memory usage when profiling
- The `MEMORY_PROFILER_BACKTRACE_CACHE_SIZE` environment variable knob was replaced with `MEMORY_PROFILER_BACKTRACE_CACHE_SIZE_LEVEL_1` and `MEMORY_PROFILER_BACKTRACE_CACHE_SIZE_LEVEL_2`, which control the size of the per-thread caches and the global cache respectively
- The `MEMORY_PROFILER_PRECISE_TIMESTAMPS` environment variable knob was removed (always gathering precise timestamps is fast enough on amd64)
- The default value of `MEMORY_PROFILER_TEMPORARY_ALLOCATION_PENDING_THRESHOLD` is now unset, which means that allocations will be buffered indefinitely until they're either culled or live long enough to no longer be eligible for culling (this might increase memory usage in certain cases)
- Backtraces are no longer emitted for allocations which were completely culled
- You can now see whether a given allocation was made through jemalloc, and filter according to that
- You can now see when a given allocation group reached its maximum memory usage, and filter according to that
- New scripting APIs:
  - `Graph::show_memory_usage`
  - `Graph::show_live_allocations`
  - `Graph::show_new_allocations`
  - `Graph::show_deallocations`
  - `AllocationList::only_group_max_total_usage_first_seen_at_least`
  - `AllocationList::only_jemalloc`
- New subcommand: `extract` (unpacks all of the files embedded in a given data file)
- The `strip` subcommand no longer buffers allocations indefinitely when using the `--threshold` option, which results in significantly lower memory usage when stripping huge data files from long profiling runs
- `malloc_usable_size` now works properly when compiled with the `jemalloc` feature
- `reallocarray` doesn't segfault anymore
- The compilation should now work on distributions with an ancient version of Yarn
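The two replacement cache knobs can be set per launch; a sketch where the sizes are illustrative placeholders (the release notes don't specify units or defaults):

```shell
# Illustrative cache sizing; the numbers are placeholders, not defaults.
export MEMORY_PROFILER_BACKTRACE_CACHE_SIZE_LEVEL_1=4096    # per-thread caches
export MEMORY_PROFILER_BACKTRACE_CACHE_SIZE_LEVEL_2=65536   # global cache
LD_PRELOAD=./libbytehound.so ./your-program
```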
0.7.0
Major changes:
- The project was rebranded from `memory-profiler` to `bytehound`
- Profiling of applications using jemalloc is now fully supported (AMD64-only, `jemallocator` crate only)
- Added built-in scripting capabilities which can be used for automated analysis and report generation; these can be accessed through the `script` subcommand
- Added a scripting console to the GUI
- Added the ability to define programmatic filters in the GUI
- Allocation graphs are now shown in the GUI when browsing through the allocations grouped by backtraces
- Improved support for tracking and analyzing reallocations
- Improved parallelization of the analyzer's internals, which should result in snappier behavior on modern multicore machines
- The cutoff point for determining an allocation's lifetime is now the end of profiling for those allocations which were never deallocated
- The `squeeze` subcommand was renamed to `strip`
- You can now use the `strip` subcommand to strip away only a subset of temporary allocations
- Information about allocations culled at runtime is now emitted on a per-backtrace basis during profiling
- Fixed an issue where the shadow stack based unwinding was incompatible with Rust's ABI in certain rare cases
- `mmap` calls are now always gathered in order (if you have enabled their gathering)
- Improved runtime backtrace deduplication, which should result in smaller data files
- Many other miscellaneous bugfixes
0.6.1
0.6.0
Major changes:
- Added a runtime backtrace cache; backtraces are now deduplicated when profiling, which results in less data being generated.
- Added automatic culling of temporary allocations when running with `MEMORY_PROFILER_CULL_TEMPORARY_ALLOCATIONS` set to `1`.
- Added support for `reallocarray`.
- Added support for unwinding through JITed code, provided the JIT compiler registers its unwinding tables through `__register_frame`.
- Added support for unwinding through frames which require arbitrary DWARF expressions to be evaluated when resolving register values.
- Added support for DWARF expressions that fetch memory.
- Allocations are no longer tracked by their addresses; they're now tracked by unique IDs, which fixes a race condition when multiple threads simultaneously allocate and deallocate memory in quick succession.
- `mmap` calls are no longer gathered by default.
- Rewrote TLS state management; some deallocations from TLS destructors which were previously missed by the profiler are now gathered.
- When profiling is disabled at runtime the profiler no longer completely shuts down, and will keep gathering data for allocations made before it was disabled; when reenabled it won't create a new file and will instead keep writing to the same file as before.
- The profiler now requires Rust nightly to compile.
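Culling can be enabled as described above; a minimal sketch with placeholder paths (this release predates the rebrand, so the library is still `libmemory_profiler.so`):

```shell
# Illustrative launch with temporary-allocation culling enabled.
export MEMORY_PROFILER_CULL_TEMPORARY_ALLOCATIONS=1
LD_PRELOAD=./libmemory_profiler.so ./your-program
```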
0.5.0
Major changes:
- Shadow stack based unwinding is now supported on stable Rust and turned on by default.
- Systems where `perf_event_open` is unavailable (e.g. unpatched MIPS64 systems, Docker containers, etc.) are now supported.
- The mechanism for exception handling when using shadow stack based unwinding was completely rewritten using proper landing pads.
- Programs which call `longjmp`/`setjmp` are now partially supported when using shadow stack based unwinding.
- Shared objects dynamically loaded through `dlopen` are now properly handled.
- Rust symbol demangling is now supported.
- Fixed an issue where calling `backtrace` on certain architectures while using shadow stack based unwinding would crash the program.
- The profiler can now be compiled with the `jemalloc` feature to use jemalloc instead of the system allocator.
- The profiler can now be started and stopped programmatically through the `memory_profiler_start` and `memory_profiler_stop` functions exported by `libmemory_profiler.so`. These are equivalent to controlling the profiler through signals.
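The signal-based equivalent can be driven from another terminal; a sketch assuming `SIGUSR1` is one of the control signals the profiler registers and that `pidof` finds a single matching process:

```shell
# Toggle profiling of a running process from outside; assumes SIGUSR1 is
# one of the registered control signals and a single matching process.
kill -SIGUSR1 "$(pidof your-program)"
```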
0.4.0
Major changes:
- The profiler can now be compiled on stable Rust, with the caveat that shadow stack based unwinding will then be disabled.
- The profiler is now fully lazily initialized; if disabled with `MEMORY_PROFILER_DISABLE_BY_DEFAULT` the profiler will neither initialize itself nor create an output file.
- The signal handler registration can now be disabled with `MEMORY_PROFILER_REGISTER_SIGUSR1` and `MEMORY_PROFILER_REGISTER_SIGUSR2`.
- When profiling is disabled at runtime the profiler will more thoroughly deinitialize itself, and when reenabled it will create a new output file instead of continuing to write data to the old one.
- The embedded server is now disabled by default and can be reenabled with the `MEMORY_PROFILER_ENABLE_SERVER` environment variable.
- The base port of the embedded server can now be set with the `MEMORY_PROFILER_BASE_SERVER_PORT` environment variable.
- `MEMORY_PROFILER_OUTPUT` now supports a `%n` placeholder.
- The GUI now has a graph which shows allocations and deallocations per second.
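These knobs compose into a launch sketch; all values and the `%n` output pattern below are illustrative placeholders, not recommendations from the release notes:

```shell
# Illustrative launch: start dormant, expose the embedded server, and
# number the output files; all values here are placeholders.
export MEMORY_PROFILER_DISABLE_BY_DEFAULT=1
export MEMORY_PROFILER_ENABLE_SERVER=1
export MEMORY_PROFILER_BASE_SERVER_PORT=8080
export MEMORY_PROFILER_OUTPUT="memory-profiling_%n.dat"
LD_PRELOAD=./libmemory_profiler.so ./your-program
```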
0.3.0
Major changes:
- More performance improvements. In the average case the cost of a single allocation was cut down to approximately 75% of what it was. Every thread now has its own unwind context, so stack traces can be gathered in parallel.
- The profiler should no longer crash on systems with a recent version of `libstdc++` when a C++ exception is thrown.