SONiC core dump utility #3499
- Install systemd-coredump in base o/s
- Remove existing simple coredump facility
- Enable persistent journald to store coredump history
- Minimal default coredump configuration
- Added a coredump-config service to generate coredump configuration

Core files generated by the kernel are created by the host o/s and also stored on the host o/s. This applies to processes running inside a container as well. Containers have access to neither the journal nor the core files. Containers are supposed to have limited access to the host o/s, and core files and the journal may contain sensitive information.

To improve debugging of crashes inside a container, the following changes are made:
- When INSTALL_DEBUG_TOOLS=y is set in the build, the systemd-coredump tool is installed in all containers alongside gdb
- When SONIC_DEBUGGING_ON=y is set in the build, /var/log/journal and /var/lib/systemd/coredump are mapped inside the container

To inspect a core file from a container shell, issue the command below:
docker exec -ti <container-name> /bin/bash
Refer to |
slave.mk
@@ -637,6 +637,7 @@ $(addprefix $(TARGET_PATH)/, $(SONIC_INSTALLERS)) : $(TARGET_PATH)/% : \
export sonic_asic_platform="$(patsubst %-$(CONFIGURED_ARCH),%,$(CONFIGURED_PLATFORM))"
export enable_organization_extensions="$(ENABLE_ORGANIZATION_EXTENSIONS)"
export enable_dhcp_graph_service="$(ENABLE_DHCP_GRAPH_SERVICE)"
export sonic_debugging_on="$(SONIC_DEBUGGING_ON)"
Is there a good reason to separate this from INSTALL_DEBUG_TOOLS? Maybe we can combine them both into a BUILD_DEBUG_IMAGE flag?
There will always be cases where we want to treat INSTALL_DEBUG_TOOLS differently from SONIC_DEBUGGING_ON. The idea I followed was that INSTALL_DEBUG_TOOLS installs additional debug tools (coredumpctl) within the container. This adds some size to the container, so users may want to skip it unless they really need to debug using gdb.
SONIC_DEBUGGING_ON allows the container to access /var/log/journal so that it can find matching core reports.
You are correct that we need both for running gdb, but I wanted to keep them separate to give flexibility to the user.
DISABLE_COREDUMP_CONF="/etc/sysctl.d/50-disable-coredump.conf"

if [ "$(redis-cli -n 4 HGET "COREDUMP|config" "enabled")" = "false" ] ; then
If the key is present and carries the value false, then we disable. In other words, the default behavior is "enabled=true"; to disable, one has to create this key explicitly.
Instead, why not require a key for disabling, which would imply that the default is enabled?
"COREDUMP|disabled" == true
It is commonplace to see configuration knobs with a positive intent. If we flip this around, it may confuse users who are used to other true/false configurations.
DISABLE_COREDUMP_CONF="/etc/sysctl.d/50-disable-coredump.conf"

if [ "$(redis-cli -n 4 HGET "COREDUMP|config" "enabled")" = "false" ] ; then
echo "kernel.core_pattern=" > ${DISABLE_COREDUMP_CONF}
What does this mean?
Can you please explain the impact of disabling?
When the coredump admin mode is disabled in config db, core files will not be generated. We are creating a sysctl entry to disable core dump.
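The enable/disable logic described above can be sketched as a small shell function. This is only an illustration under stated assumptions: `write_coredump_sysctl` is a hypothetical helper name, and the "remove the drop-in when enabled" branch is a guess at sensible behavior, since the diff shown here only covers the disabled case.

```shell
#!/bin/sh
# Sketch only: write_coredump_sysctl is a hypothetical helper, not code from this PR.
# $1 = value of the "enabled" field (as returned by redis-cli HGET)
# $2 = path of the sysctl drop-in file to manage
write_coredump_sysctl() {
    enabled="$1"
    conf="$2"
    if [ "$enabled" = "false" ]; then
        # An empty core_pattern is what the PR writes to disable core dumps.
        echo "kernel.core_pattern=" > "$conf"
    else
        # Assumption: any other value (including a missing key) means enabled,
        # so the drop-in is removed and the default pattern stays in effect.
        rm -f "$conf"
    fi
}

# On a SONiC host this might be driven as follows (needs redis-cli and root):
#   write_coredump_sysctl "$(redis-cli -n 4 HGET 'COREDUMP|config' 'enabled')" \
#       /etc/sysctl.d/50-disable-coredump.conf
```

Keeping the Redis lookup outside the function means the file-writing logic can be exercised without a running ConfigDB.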
Can you give me a use case where you would not want a core file dump?
A core file dump implies that some unexpected error occurred, or that the user explicitly created one with kill for a reason. In either case, the dump is required for analysis.
If this is the only purpose of coredump-config.service, I don't see a need.
- Core files can be a space hog.
- Multiple core files may be generated if a process is in an infinite loop.
- Core files may contain sensitive information, so some applications may not want them to be recorded.
- We plan to re-use coredump-config.service for enabling/disabling the kernel core dump as well. There may also be additional parameters that you would like to configure for core files (e.g. a limit on the size of a core file). In the current mode of operation we chose some fixed numbers, but future extensions may make them configurable. To start with, we are providing a framework to enable/disable the feature.
There is already another initiative to do core file rotation and limit the count of core files at any time per process. Turning off is definitely not the solution.
The limitation on count / size is also from Broadcom only. I need to look for it.
In fact, you are referring to PR #468, which has the following:
"a. Support per-process core file rotation and archiving to optimize disk space"
I am referring to the various configuration options provided by systemd-coredump. Below is a link to them.
https://www.freedesktop.org/software/systemd/man/coredump.conf.html
Users might want some of them to be part of ConfigDB.
The PR I was referring to is PR#729, which enables the kernel core dump feature. For this feature, it is desirable that users have an enable/disable knob, as kdump requires a dedicated 512MB of memory.
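As an illustration of the kind of settings that could later be driven from ConfigDB, here is a minimal sketch that generates a coredump.conf with disk-usage limits. `write_coredump_conf` and the values passed to it are hypothetical; `Storage`, `Compress`, `MaxUse`, and `KeepFree` are options documented in the coredump.conf man page linked above, but this PR does not necessarily set them this way.

```shell
#!/bin/sh
# Sketch only: write_coredump_conf is a hypothetical generator, not code from this PR.
# $1 = output path, $2 = MaxUse value, $3 = KeepFree value
write_coredump_conf() {
    cat > "$1" <<EOF
[Coredump]
Storage=external
Compress=yes
MaxUse=$2
KeepFree=$3
EOF
}

# Hypothetical usage on a host (the sizes are example values only):
#   write_coredump_conf /etc/systemd/coredump.conf 1G 2G
```

Limits of this kind address the disk-space concern without switching core dumps off entirely.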
Let me add Guohan to this thread. What we need is the "requirement/use case". I don't see one.
I find it rather risky to have a DB variable control this, as it could get saved and persist across reboots transparently, which makes it likely to block core dumps unintentionally.
I would rather have this ability as a CLI tool that disables core dumps temporarily, with everything back to the default/enabled state upon reboot.
What we do need is the ability to limit the count of cores per process and the overall disk space taken by cores.
@@ -11,7 +11,9 @@ VIM = vim
OPENSSH = openssh-client
SSHPASS = sshpass
STRACE = strace
$(DOCKER_BASE_STRETCH)_DBG_IMAGE_PACKAGES += $(GDB) $(GDBSERVER) $(VIM) $(OPENSSH) $(SSHPASS) $(STRACE)
SYSTEMD_COREDUMP = systemd-coredump
W/o this package installed, will there be core dumps created?
Plus this is already installed in build_debian.sh unconditionally. Can you please explain?
W/O this package installed inside the container, core dumps will still be created.
Core files are always generated on host o/s and stored in the /var/lib/systemd/coredump directory.
The application that has crashed may be part of a container. So if you want to run gdb using the coredumpctl gdb command, it will not find the application binary when executed on host o/s.
So, we map the /var/lib/systemd/coredump directory to the containers (see change in docker_image_ctl.j2) and also install the coredumpctl tool here. Now, the corefile and the handy coredumpctl tool are ready for debugging the application inside the container.
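The directory mapping described above could look roughly like the following. This is a sketch, not the actual change in docker_image_ctl.j2: `debug_volume_args` is a hypothetical helper, and the exact way the template passes these arguments to docker is an assumption.

```shell
#!/bin/sh
# Sketch only: debug_volume_args is a hypothetical helper, not code from this PR.
# $1 = value of sonic_debugging_on ("y" enables the mappings)
debug_volume_args() {
    if [ "$1" = "y" ]; then
        # Map the host journal and core file directories into the container.
        printf '%s\n' "-v /var/log/journal:/var/log/journal -v /var/lib/systemd/coredump:/var/lib/systemd/coredump"
    fi
}

# Hypothetical usage when creating a container (word splitting is intended):
#   docker create $(debug_volume_args "$sonic_debugging_on") --name <container-name> <image>
```

With both directories visible inside the container, coredumpctl can match journal entries to core files next to the application binary.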
retest vsimage please
- What I did
Added new way to collect and manage application core files.
- How I did it
Core files generated by the kernel are created by the host o/s and also stored on the host o/s. This applies to processes running inside a container as well. Containers have access to neither the journal nor the core files. Containers are supposed to have limited access to the host o/s, and core files and the journal may contain sensitive information.
To improve debugging of crashes inside a container, the following changes are made:
Install systemd-coredump in base o/s
Remove existing simple coredump facility
Enable persistent journald to store coredump history
Minimal default coredump configuration
Added a coredump-config service to generate coredump configuration
When INSTALL_DEBUG_TOOLS=y is set in the build, the systemd-coredump tool is installed in all containers alongside gdb
When SONIC_DEBUGGING_ON=y is set in the build, /var/log/journal and /var/lib/systemd/coredump are mapped inside the container
To inspect a core file from a container shell, issue the command below:
docker exec -ti <container-name> coredumpctl
- How to verify it
kill -ABRT <pid>
coredumpctl list
coredumpctl info
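The verification steps can be tried end to end with a throwaway process. The sketch below is an assumption-laden illustration: `crash_test_process` is a hypothetical helper, and whether a core file is actually recorded depends on the host's core dump configuration.

```shell
#!/bin/sh
# Sketch only: crash_test_process is a hypothetical helper, not code from this PR.
# Starts a throwaway process, aborts it, and prints the exit status seen by the shell.
crash_test_process() {
    sleep 60 >/dev/null 2>&1 &
    pid=$!
    kill -ABRT "$pid"
    status=0
    wait "$pid" || status=$?
    # On Linux a process killed by SIGABRT (signal 6) is reported as 128 + 6.
    echo "$status"
}

# Hypothetical usage on a host with systemd-coredump installed:
#   crash_test_process      # prints 134 on Linux
#   coredumpctl list        # the aborted sleep process should appear here
```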
- Description for the changelog
Use systemd-coredump for core file management in SONiC