Add support for agent self-restarting: Development Phase #386

Draft
wants to merge 2 commits into base: master

Conversation

lchico
Member

@lchico lchico commented Dec 4, 2024

Related issue
#77

Description

This PR introduces the initial development phase of self-restart support in the agent. The agent detects a failure in one of its processes and restarts itself to keep running. The implementation extends the existing task and module management systems and adds a monitoring process to handle agent failures.

Implementation Details:

The self-restarting feature follows this high-level process flow:

  • The agent communicates with the Task Manager, which in turn interacts with the Module Manager, Communicator, Command Handler, and the new Self-Restart component.
  • When a monitored process fails, the Self-Restart component triggers the monitoring process, which checks whether the process has been unresponsive for more than 30 seconds.
  • If the process remains unresponsive, the Kill Blocked Process mechanism is activated: it kills the current agent process and starts a new instance to recover.
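The flow above can be sketched roughly as follows. This is an illustrative Python model, not the agent's C++ code: the `Watchdog` class, its heartbeat mechanism, and the `kill_process` / `start_process` callbacks are hypothetical names. Only the 30-second threshold and the kill-then-start order come from the description.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Optional

UNRESPONSIVE_TIMEOUT_S = 30.0  # threshold taken from the PR description


@dataclass
class Watchdog:
    """Hypothetical monitor; names and structure are illustrative only."""

    kill_process: Callable[[], None]   # assumed hook: kills the blocked agent process
    start_process: Callable[[], None]  # assumed hook: starts a new agent instance
    timeout_s: float = UNRESPONSIVE_TIMEOUT_S
    last_heartbeat: float = field(default_factory=time.monotonic)

    def heartbeat(self, now: Optional[float] = None) -> None:
        """Record that the monitored process is still responsive."""
        self.last_heartbeat = time.monotonic() if now is None else now

    def is_unresponsive(self, now: Optional[float] = None) -> bool:
        """True when no heartbeat has been seen for more than timeout_s."""
        now = time.monotonic() if now is None else now
        return (now - self.last_heartbeat) > self.timeout_s

    def check(self, now: Optional[float] = None) -> bool:
        """Kill and restart the process if it is unresponsive.

        Returns True when a restart was triggered.
        """
        if not self.is_unresponsive(now):
            return False
        self.kill_process()
        self.start_process()
        self.heartbeat(now)  # the new instance starts with a fresh heartbeat
        return True
```

Injecting the kill and start actions as callbacks keeps the timeout logic testable without spawning real processes.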

Test

Server

I used the Python mock server:

wazuh-agent#  python3 src/agent/testtool/mock_server/mock_server.py --protocol http

The following command writes a restart command to the mock server's responses file to initiate the self-restart:

wazuh-agent/# echo '{"commands": [{"document_id": "1a96a0ab", "action": {"name": "restart", "version": "v5.0.0", "args": ""}, "target": {"type": "agent", "id": "1a96a0ab-5bef-415c-bb3c-ea3e294215a0"}, "status": "sent"}]}' > src/agent/testtool/mock_server/responses/commands.json
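For repeated test runs, the same payload could be generated instead of hand-written. The sketch below assumes nothing beyond the JSON shape shown in the command above; `build_restart_payload` and `write_commands` are hypothetical helper names, not part of the mock server.

```python
import json
from pathlib import Path


def build_restart_payload(document_id: str, agent_id: str,
                          version: str = "v5.0.0") -> dict:
    """Build a restart command with the same shape as the echo command above."""
    return {
        "commands": [
            {
                "document_id": document_id,
                "action": {"name": "restart", "version": version, "args": ""},
                "target": {"type": "agent", "id": agent_id},
                "status": "sent",
            }
        ]
    }


def write_commands(path: Path, payload: dict) -> None:
    """Write the payload as JSON to the given responses file."""
    path.write_text(json.dumps(payload) + "\n", encoding="utf-8")
```

For example, `write_commands(Path("src/agent/testtool/mock_server/responses/commands.json"), build_restart_payload("1a96a0ab", "1a96a0ab-5bef-415c-bb3c-ea3e294215a0"))` would reproduce the file written by the echo command.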

Agent

Configure the agent to use the server address:

sed -i 's|server_url:.*|server_url: http://192.168.100.250:27000 |' /etc/wazuh-agent/wazuh-agent.yml

# Register the agent
/usr/share/wazuh-agent/bin/wazuh-agent --user wazuh --password wazuh --url http://192.168.100.250:27000/ --name wazuh --register

# Run the agent
/usr/share/wazuh-agent/bin/wazuh-agent

Logs


[2024-12-24 03:49:57.112] [wazuh-agent] [info] [INFO] [communicator.cpp:27] [SendAuthenticationRequest] Successfully authenticated with the manager.
[2024-12-24 03:49:59.401] [wazuh-agent] [info] [INFO] [agent.cpp:164] [operator()] Restart: Initiating self-restart
[2024-12-24 03:49:59.402] [wazuh-agent] [info] [INFO] [inventory.cpp:62] [Stop] Inventory module stopped.
[2024-12-24 03:49:59.402] [wazuh-agent] [info] [INFO] [logcollector.cpp:64] [Stop] Logcollector module stopped.
[2024-12-24 03:49:59.402] [wazuh-agent] [info] [INFO] [inventoryImp.cpp:981] [Scan] Starting evaluation.
[2024-12-24 03:49:59.402] [wazuh-agent] [info] [INFO] [inventoryImp.cpp:993] [Scan] Evaluation finished.
[2024-12-24 03:49:59.403] [wazuh-agent] [info] [INFO] [inventory.cpp:36] [Start] Inventory module finished.
[2024-12-24 03:49:59.403] [wazuh-agent] [info] [INFO] [command_handler.hpp:98] [operator()] Done processing command: restart(restart)
[2024-12-24 03:50:00.404] [wazuh-agent] [info] [INFO] [process_options_unix.cpp:86] [StartAgent] Starting wazuh-agent
[2024-12-24 03:50:00.406] [wazuh-agent] [info] [INFO] [communicator.hpp:55] [Communicator] Using insecure connection.
[2024-12-24 03:50:00.407] [wazuh-agent] [info] [INFO] [communicator.cpp:27] [SendAuthenticationRequest] Successfully authenticated with the manager.
[2024-12-24 03:50:00.407] [wazuh-agent] [info] [INFO] [inventory.cpp:17] [Start] Inventory module started.
[2024-12-24 03:50:00.407] [wazuh-agent] [info] [INFO] [logcollector.cpp:25] [Start] Logcollector module started.
[2024-12-24 03:50:00.407] [wazuh-agent] [info] [INFO] [inventoryImp.cpp:998] [SyncLoop] Module started.
[2024-12-24 03:50:00.407] [wazuh-agent] [info] [INFO] [inventoryImp.cpp:981] [Scan] Starting evaluation.
[2024-12-24 03:50:00.518] [wazuh-agent] [info] [INFO] [inventoryImp.cpp:993] [Scan] Evaluation finished.
[2024-12-24 03:50:58.408] [wazuh-agent] [info] [INFO] [communicator.cpp:27] [SendAuthenticationRequest] Successfully authenticated with the manager.
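A log like the one above can be checked automatically. The following sketch assumes the line layout shown in these logs (a timestamp followed by five bracketed fields and the message) and looks for three messages in order: the restart request, the new instance starting, and a successful authentication. The function names are hypothetical, and this is not part of the agent's test suite.

```python
import re
from datetime import datetime
from typing import Iterable, List, Optional, Tuple

# Layout assumed from the log excerpt: [timestamp] plus five bracketed fields, then the message.
LOG_LINE = re.compile(r"^\[(?P<ts>[^\]]+)\](?: \[[^\]]+\]){5} (?P<msg>.*)$")
TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"

EXPECTED_SEQUENCE = (
    "Restart: Initiating self-restart",
    "Starting wazuh-agent",
    "Successfully authenticated with the manager.",
)


def parse_line(line: str) -> Optional[Tuple[datetime, str]]:
    """Return (timestamp, message) or None if the line does not match the layout."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    return datetime.strptime(match["ts"], TS_FORMAT), match["msg"]


def restart_timeline(lines: Iterable[str]) -> Optional[List[datetime]]:
    """Return the timestamps of the expected messages, in order.

    Returns None when the sequence is incomplete, i.e. the restart
    was not observed end to end.
    """
    remaining = list(EXPECTED_SEQUENCE)
    stamps: List[datetime] = []
    for line in lines:
        parsed = parse_line(line)
        if parsed is None or not remaining:
            continue
        timestamp, message = parsed
        if message == remaining[0]:
            stamps.append(timestamp)
            remaining.pop(0)
    return stamps if not remaining else None
```

The difference between the first two timestamps gives the time between the restart request and the new instance starting.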

Minor changes:

  • The prerm script passed the signal to kill by name (SIGTERM), which was not recognized, so I replaced the name with its numeric value to avoid the following error:
# apt remove wazuh-agent
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following package was automatically installed and is no longer required:
  lsb-release
Use 'apt autoremove' to remove it.
The following packages will be REMOVED:
  wazuh-agent
0 upgraded, 0 newly installed, 1 to remove and 0 not upgraded.
After this operation, 15.5 MB disk space will be freed.
Do you want to continue? [Y/n] 
(Reading database ... 9917 files and directories currently installed.)
Removing wazuh-agent (5.0.0-0) ...
Call pid 4119 with sigterm to stop the service
/var/lib/dpkg/info/wazuh-agent.prerm: 43: kill: Illegal option -S
dpkg: error processing package wazuh-agent (--remove):
 installed wazuh-agent package pre-removal script subprocess returned error exit status 2
dpkg: too many errors, stopping
Errors were encountered while processing:
 wazuh-agent
Processing was halted because there were too many errors.
E: Sub-process /usr/bin/dpkg returned an error code (1)	
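As a quick way to look up the numeric value behind a signal name, Python's standard `signal` module can be used. This is only a side illustration: the actual fix lives in the package's prerm shell script, and `signal_number` is a hypothetical helper.

```python
import signal


def signal_number(name: str) -> int:
    """Resolve a name such as 'SIGTERM', 'TERM', or 'term' to its number on this platform."""
    normalized = name.strip().upper()
    if not normalized.startswith("SIG"):
        normalized = "SIG" + normalized
    # signal.Signals is an enum of the signals available on the running platform.
    return signal.Signals[normalized].value
```

On Linux, `signal_number("SIGTERM")` resolves to 15, the number that can be passed to `kill` in place of the name.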

Pending Tasks:

  • Improve Signal Handler: Refactor the signal handler to use a unified class structure for better maintainability and extensibility.
  • Test Self-Restart using systemctl: Add tests to verify that the self-restarting feature works as expected when controlled by systemctl.
  • Refactor Agent Startup: Simplify the fork logic in the agent's startup sequence by creating a dedicated function for process creation, improving readability and reducing complexity.
  • Status Validations After Self-Restart: Implement checks to validate the agent's status after the self-restart, ensuring that it is functioning correctly and has recovered from the failure.

@lchico lchico linked an issue Dec 4, 2024 that may be closed by this pull request
3 tasks
@lchico lchico force-pushed the enhancement/77-add-support-agent-self-restarting branch from e65a3ff to d0089ef Compare December 4, 2024 14:20
@lchico lchico linked an issue Dec 4, 2024 that may be closed by this pull request
@lchico lchico force-pushed the enhancement/77-add-support-agent-self-restarting branch 4 times, most recently from fdda710 to c009b4a Compare December 10, 2024 01:33
@lchico lchico force-pushed the enhancement/77-add-support-agent-self-restarting branch 9 times, most recently from 6124ac7 to 3a71fae Compare December 20, 2024 00:09
@@ -65,7 +65,6 @@ target_link_libraries(Agent
MultiTypeQueue
ModuleManager
ModuleCommand
CentralizedConfiguration
Member Author

It was duplicated, so I removed it.

@lchico lchico force-pushed the enhancement/77-add-support-agent-self-restarting branch 4 times, most recently from 26d4599 to d9ff744 Compare December 20, 2024 03:48
@lchico lchico force-pushed the enhancement/77-add-support-agent-self-restarting branch 4 times, most recently from 7cb6b33 to 6a2deb8 Compare December 24, 2024 01:20
@lchico lchico force-pushed the enhancement/77-add-support-agent-self-restarting branch from 6a2deb8 to 3ca559f Compare December 24, 2024 03:09
Development

Successfully merging this pull request may close these issues.

Add support for agent self-restarting: Development Phase Add support for agent self-restarting
1 participant