
Dist-upgrade process stuck - migration not progressing #384

Open
SC-JBG opened this issue Oct 30, 2024 · 10 comments

@SC-JBG

SC-JBG commented Oct 30, 2024

Hi,

I have the following issue:

I am migrating from CentOS 7.9 to AlmaLinux using the latest (as of Oct. 30, 2024) centos2alma. After cleaning up a few issues during the preparation stage, I was able to get the tool running, and the preparation stage looked fine. It also moved into the conversion stage fairly quickly and performed the first reboot.

But then it got stuck. I could not SSH into the server for over an hour, so I decided to give the server a manual reboot. After the reboot I was able to SSH into the server again and saw the following welcome message:

Message from the Plesk dist-upgrader tool:
The server is being converted to AlmaLinux 8. Please wait. During the conversion the
server may reboot itself a few times.
To see the current conversion status, run the '/root/centos2alma --status' command.
To monitor the conversion progress in real time, run the '/root/centos2alma --monitor' command.

When I ran ./centos2alma --status I got the following message:


The dist-upgrade process is taking too long. It may be stuck. Please verify if the process is
still running by checking if logfile /var/log/plesk/centos2alma.log continues to update.
It is safe to interrupt the process with Ctrl+C and restart it from the same stage.
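
The check the message suggests (whether the log file keeps growing) can be scripted. Below is a minimal Python sketch, assuming that a log which does not grow within a chosen sampling window is a hint, not proof, that the process is stuck. The helper name `log_is_growing` and the 60-second default window are hypothetical choices, not part of centos2alma.

```python
import os
import time

LOG_PATH = "/var/log/plesk/centos2alma.log"


def log_is_growing(path: str = LOG_PATH, interval: float = 60.0) -> bool:
    """Return True if the file at `path` grew during `interval` seconds.

    Hypothetical helper: it compares file sizes rather than modification
    times, so the result does not depend on the wall clock.
    """
    size_before = os.path.getsize(path)
    time.sleep(interval)
    size_after = os.path.getsize(path)
    return size_after > size_before
```

For example, `log_is_growing(interval=120)` waits two minutes and returns `False` if nothing was written to the log in that time.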


So I checked with ps -aux | grep centos2alma and found a process. Since the message above said it is safe to interrupt the process, and the log file had not been updated for a while, I killed it with kill -9 <PID>.
After that I did a reboot of the server.

When I SSH'ed back into the server, no centos2alma process was running anymore. Still, ./centos2alma --status returned the same "The dist-upgrade process is taking too long. It may be stuck..." message, even though ps -aux | grep centos2alma showed no running processes.
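
A `ps -aux | grep centos2alma` check can also list the `grep` command itself, which makes the output easy to misread. A minimal Python sketch of the same check that reads `/proc/<pid>/cmdline` directly (Linux-only; the helper name `find_processes` is hypothetical):

```python
import os


def find_processes(pattern: str, proc_root: str = "/proc") -> list[tuple[int, str]]:
    """Return sorted (pid, command line) pairs whose command line contains `pattern`.

    Arguments in /proc/<pid>/cmdline are separated by NUL bytes.
    The current process is skipped so the search never matches itself.
    """
    matches = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        pid = int(entry)
        if pid == os.getpid():
            continue
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as f:
                raw = f.read()
        except OSError:
            continue  # the process exited meanwhile or is not readable
        cmdline = raw.replace(b"\0", b" ").decode(errors="replace").strip()
        if pattern in cmdline:
            matches.append((pid, cmdline))
    return sorted(matches)
```

On the affected server, `find_processes("centos2alma")` returning an empty list would correspond to the "no running processes" situation described above.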

Restarting the tool via ./centos2alma & does not work either; it only prints the following message:

[root@server ~]# ./centos2alma &
[1] 6721
[root@server ~]# centos2alma is ongoing. To check its status, please use --monitor or --status

even though no process is running anymore.

Any idea what could have happened and how I can fix the stuck-while-not-running process?
I attached the feedback archive as well.

Thank you for your support. I am running out of ideas about what to do now.

@SandakovMM
Collaborator

Hello.
Could you please supply a feedback archive, or at least the log file from /var/log/plesk/centos2alma?
Please note that on AlmaLinux 8, you should run ./centos2alma --resume.

@SC-JBG
Author

SC-JBG commented Oct 31, 2024

Hi,

Sorry, I thought I had already attached the archive to my initial post. Let me attach it to this comment.
I will also restore from a snapshot and restart the migration with the --resume argument. I will keep you posted.

Thank you for the swift reply, it's really appreciated!

PS: The archive is too big to attach to this comment directly, so I uploaded it to WeTransfer (the link expires in 7 days): https://we.tl/t-tgcHyvxp5d

@SandakovMM
Collaborator

I noticed unusual time behavior on the server during the finishing stage:

2024-10-30 20:56:37,745 - DEBUG - Adopt repository with id 'plesk-ext-grafana' is extracted.
2024-10-30 20:56:37,745 - INFO - Running: ['/usr/bin/dnf', '-y', 'update']. Output:
2024-10-30 21:57:05,034 - INFO - stdout: Extra Packages for Enterprise Linux 7 - x86_64  4.9 MB/s |  17 MB     00:03    
2024-10-30 21:57:47,482 - INFO - stdout: Alma grafana extension repository               5.1 kB/s | 108 kB     00:21    
2024-10-30 21:58:35,767 - INFO - stdout: Alma docker extension repository                2.7 kB/s |  66 kB     00:24
...
2024-10-30 22:01:36,997 - INFO - Going to obtain lockfile '/usr/local/psa/var/centos2alma/centos2alma.lock'...
2024-10-30 22:01:36,997 - INFO - Lock already obtained by another process
2024-10-30 22:01:36,998 - ERROR - centos2alma is ongoing. To check its status, please use `--monitor` or `--status`
2024-10-30 21:03:22,263 - INFO - Started with arguments ['/root/centos2alma', '--state-dir', '/usr/local/psa/var/centos2alma', '--resume', '--verbose', '--log-file', '/var/log/plesk/centos2alma.log']
2024-10-30 21:03:22,266 - DEBUG - Resuming with command-line arguments ['./centos2alma']
2024-10-30 21:03:22,266 - DEBUG - Detected current OS distribution as CentOS 7
2024-10-30 21:03:22,266 - DEBUG - Current system: CentOS 7
2024-10-30 21:03:22,266 - DEBUG - Available upgraders: [Centos2AlmaConverterFactory(upgrader_name=Plesk::Centos2AlmaConverter)]
2024-10-30 21:03:22,266 - DEBUG - Looking for upgrader by the name 'Plesk::Centos2AlmaConverter'
2024-10-30 21:03:22,267 - DEBUG - Found upgraders: [Centos2AlmaConverterFactory(upgrader_name=Plesk::Centos2AlmaConverter)]
2024-10-30 21:03:22,268 - INFO - Selected upgrader: Centos2AlmaConverter (1.4.3-2fe72842)
2024-10-30 21:03:22,268 - DEBUG - Upgrader Centos2AlmaConverter support of your system: as source = True, as target = False
2024-10-30 21:03:22,269 - INFO - Create signals handler with keep MOTD: True
2024-10-30 21:03:22,269 - INFO - Going to obtain lockfile '/usr/local/psa/var/centos2alma/centos2alma.lock'...
2024-10-30 21:03:22,269 - INFO - Lock already obtained by another process
2024-10-30 21:03:22,270 - ERROR - centos2alma is ongoing. To check its status, please use `--monitor` or `--status`
2024-10-30 22:05:31,923 - INFO - Started with arguments ['./centos2alma']
2024-10-30 22:05:31,923 - DEBUG - Detected current OS distribution as CentOS 7
2024-10-30 22:05:31,923 - DEBUG - Current system: CentOS 7
2024-10-30 22:05:31,923 - DEBUG - Available upgraders: [Centos2AlmaConverterFactory(upgrader_name=Plesk::Centos2AlmaConverter)]
2024-10-30 22:05:31,923 - DEBUG - Looking for upgrader from CentOS 7
2024-10-30 22:05:31,923 - DEBUG - Found upgraders: [Centos2AlmaConverterFactory(upgrader_name=Plesk::Centos2AlmaConverter)]
2024-10-30 22:05:31,923 - INFO - Selected upgrader: Centos2AlmaConverter (1.4.3-2fe72842)
2024-10-30 22:05:31,924 - DEBUG - Upgrader Centos2AlmaConverter support of your system: as source = True, as target = False

As you can see, during the final stage of the conversion, the clock jumped from 20:xx to 21:xx, from 21:xx to 22:xx, and even from 22:xx back to 21:xx. It seems you might have a remote NTP device providing the current time, or something similar. Is it possible that the connection to that device is unstable? Or perhaps you have two different NTP servers providing different times?
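
Jumps like these can be spotted mechanically. A minimal Python sketch, assuming every relevant log line starts with a timestamp in the `YYYY-MM-DD HH:MM:SS,mmm` format shown above; the helper name `find_time_jumps` and the 30-minute threshold are hypothetical choices:

```python
from datetime import datetime, timedelta

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def parse_timestamp(line: str):
    """Return the datetime at the start of a log line, or None if there is none."""
    try:
        return datetime.strptime(line[:23], TIMESTAMP_FORMAT)
    except ValueError:
        return None  # continuation lines, "..." and similar


def find_time_jumps(lines, threshold=timedelta(minutes=30)):
    """Yield (previous, current, delta) for consecutive timestamps that go
    backwards or move forward by more than `threshold`."""
    previous = None
    for line in lines:
        current = parse_timestamp(line)
        if current is None:
            continue
        if previous is not None:
            delta = current - previous
            if delta < timedelta(0) or delta > threshold:
                yield previous, current, delta
        previous = current
```

Fed the excerpt above, it reports the 20:56 to 21:57 step forward and the 22:01 to 21:03 step backward.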

The only issue this causes is with the progress bar of the centos2alma conversion. We currently rely on the server's time, which leads to these issues. It might be a good idea to calculate time manually or perhaps disable any NTP synchronization during the conversion. I will have to consider this further.
Anyway, you can still perform the conversion; just don't rely on the progress bar. Instead, you can use tail -f /var/log/plesk/centos2alma.log to monitor the progress. It may not be as convenient as the progress bar, but it works.
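
The idea of calculating elapsed time independently of the server clock can be illustrated with Python's `time.monotonic()`, which is not affected by system clock updates such as the clock being stepped forwards or backwards. This is a minimal sketch of a hypothetical `StageTimer` helper, not centos2alma's actual progress-bar code. Note that a monotonic clock restarts at boot, so it can only time a stage within a single boot.

```python
import time


class StageTimer:
    """Track elapsed time for one conversion stage with a monotonic clock.

    time.monotonic() never goes backwards, so wall-clock jumps do not
    distort the result. It restarts at boot, so this only measures time
    within a single boot.
    """

    def __init__(self, expected_seconds: float):
        self.expected_seconds = expected_seconds
        self._start = time.monotonic()

    def elapsed(self) -> float:
        return time.monotonic() - self._start

    def progress(self) -> float:
        """Fraction of the expected duration that has passed, capped at 1.0."""
        return min(self.elapsed() / self.expected_seconds, 1.0)

    def is_overdue(self, grace_factor: float = 2.0) -> bool:
        """True once the stage has run `grace_factor` times longer than expected."""
        return self.elapsed() > self.expected_seconds * grace_factor
```

The `expected_seconds` and `grace_factor` parameters are illustrative; a real tool would need per-stage estimates.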

@SC-JBG
Author

SC-JBG commented Oct 31, 2024

Hi,

Following your latest response, I reran the migration after restoring an earlier snapshot of my server (VPS).

This time I again had to reboot the server manually. After centos2alma's preparation checks, the migration started as usual, and all steps/actions were listed and progressed nicely. After the conversion step, the server rebooted automatically but did not come back up properly, and again I was not able to SSH into it. After giving it about 30 minutes, I rebooted the server manually, after which I was able to SSH back in.

A quick ./centos2alma --status returned the same message as last time:


The dist-upgrade process is taking too long. It may be stuck. Please verify if the process is
still running by checking if logfile /var/log/plesk/centos2alma.log continues to update.
It is safe to interrupt the process with Ctrl+C and restart it from the same stage.


So I did as you suggested and ran tail -f /var/log/plesk/centos2alma.log. The log was still being updated, so I waited until centos2alma rebooted the server again automatically. The reboot went fine, I could SSH back into my server without problems, and I was greeted with a new welcome message:

===============================================================================
Message from the Plesk dist-upgrader tool:
The dovecot configuration '/etc/dovecot/dovecot.conf' has been restored from original distro. Modern configuration was placed in '/usr/local/psa/var/centos2alma/dovecot.conf.conversion.bak'.
The logrotate configuration for rsyslog has been updated. The old configuration has been saved as /usr/local/psa/var/centos2alma/syslog.logrotate.bak
The server has been upgraded to AlmaLinux 8.
You can remove this message from the /etc/motd file.
===============================================================================

Also, ./centos2alma --status no longer returned anything.

Then I ran a few more commands to check whether the migration succeeded, and all of them report that I am still on CentOS 7.9. The Plesk Panel also still shows CentOS 7.9 as my current OS.

[root@server ~]# uname
Linux

[root@server ~]# cat /etc/redhat-release
CentOS Linux release 7.9.2009 (Core)

[root@server ~]# cat /etc/os-release
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"

[root@server ~]# hostnamectl
Static hostname: <anonymized>
Icon name: computer-vm
Chassis: VM
Machine ID: <anonymized>
Boot ID: <anonymized>
Virtualization: kvm
Operating System: CentOS Linux 7 (Core)
CPE OS Name: cpe:/o:centos:centos:7
Kernel: Linux 3.10.0-1160.119.1.el7.tuxcare.els10.x86_64
Architecture: x86-64
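
The same check can be scripted by reading `/etc/os-release`, whose `KEY=value` format is shown above. A minimal Python sketch; the helper names are hypothetical, and the expectation that a converted system reports `ID="almalinux"` with a `VERSION_ID` starting with `8` is an assumption, not something confirmed in this thread.

```python
import shlex


def parse_os_release(text: str) -> dict:
    """Parse os-release style KEY=value lines into a dict, removing quotes."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blank lines, comments and malformed lines
        key, _, value = line.partition("=")
        # shlex handles both quoted ("CentOS Linux") and bare (7) values
        parts = shlex.split(value)
        info[key] = parts[0] if parts else ""
    return info


def is_almalinux_8(info: dict) -> bool:
    # Assumption: AlmaLinux 8 reports ID="almalinux" and VERSION_ID="8.x".
    major = info.get("VERSION_ID", "").split(".")[0]
    return info.get("ID") == "almalinux" and major == "8"
```

On the server, `is_almalinux_8(parse_os_release(open("/etc/os-release").read()))` returns `False` for the CentOS output shown above.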

So I am a bit confused now. Apparently the tool just ran as intended but my OS is not being recognized as AlmaLinux. Any ideas what might cause this?

I have attached a new feedback archive from the now apparently successful run.

centos2alma_feedback.zip

Thank you so much for your support!

PS: I just noticed that the log file is appended to on every call/run. The run that finished this time started at 11:46 and ended at 12:33 according to the log.
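
Because the log file is appended to on every run, it helps to split it at each run's first line. Judging by the excerpts in this thread, each invocation logs a `Started with arguments` line at INFO level; the sketch below assumes that line reliably marks the start of a run, and `split_runs` and `run_span` are hypothetical helper names.

```python
RUN_MARKER = " - INFO - Started with arguments "


def split_runs(lines):
    """Split centos2alma log lines into runs, one list per invocation.

    Assumes every run begins with a "Started with arguments" line; any lines
    before the first marker are returned as a leading run of their own.
    """
    runs = []
    current = []
    for line in lines:
        if RUN_MARKER in line and current:
            runs.append(current)
            current = []
        current.append(line)
    if current:
        runs.append(current)
    return runs


def run_span(run):
    """Return the first and last 'YYYY-MM-DD HH:MM:SS' timestamps of a run."""
    stamps = [line[:19] for line in run if line[:4].isdigit()]
    return (stamps[0], stamps[-1]) if stamps else (None, None)
```

Printing `run_span(run)` for each element of `split_runs(open("/var/log/plesk/centos2alma.log"))` would make it easier to isolate a single run such as the 11:46 to 12:33 one.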

@SandakovMM
Collaborator

This seems strange. I can only assume that the temporary container did not have enough time to replace the el7 packages with el8 packages before you restarted the server.
Is there any chance you have a console connected to the server so you can monitor the process inside the temporary container?
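
One way to check this hypothesis after the fact is to count installed packages by their release tag. A minimal sketch, assuming the package list comes from `rpm -qa` (full name-version-release.arch strings, like the `3.10.0-1160.119.1.el7.tuxcare.els10.x86_64` kernel above) and that the `.el7`/`.el8` dist tags identify the target release; `count_dist_tags` and `installed_packages` are hypothetical helpers.

```python
import re
import subprocess
from collections import Counter

DIST_TAG = re.compile(r"\.el(\d+)")


def count_dist_tags(packages):
    """Count packages per Enterprise Linux release tag (el7, el8, ...).

    Packages without a recognizable .elN tag are counted as "other".
    """
    counts = Counter()
    for nevra in packages:
        match = DIST_TAG.search(nevra)
        counts[f"el{match.group(1)}" if match else "other"] += 1
    return counts


def installed_packages():
    """Return the output of `rpm -qa` as a list (requires rpm on the host)."""
    result = subprocess.run(["rpm", "-qa"], capture_output=True, text=True, check=True)
    return result.stdout.split()
```

A mostly-el7 result from `count_dist_tags(installed_packages())` on a system that reports a finished conversion would be consistent with the packages never having been replaced.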

@SC-JBG
Author

SC-JBG commented Oct 31, 2024

Hey, no, I don't have a console connected to it. It's a virtual private server hosted by a hosting company. I can only SSH or VNC into the server.

I will start over again and give it more time. It is just weird because after it reboots for the first time automatically, it looks like the server isn't getting up at all. I cannot ping its IP, SSH into it, or anything. I left it in that state for about 30 minutes the last time before I decided to give it a manual reboot, but I will give it more time now and see what happens...

@SandakovMM
Collaborator

SandakovMM commented Oct 31, 2024

It is just weird because after it reboots for the first time automatically, it looks like the server isn't getting up at all. I cannot ping its IP, SSH into it

Yes, unfortunately, this is the expected behavior of the temporary container that reinstalls packages. I hope to find a way to start the network interface for it in the future or encourage the AlmaLinux/leapp framework developers to address it.

@SC-JBG
Author

SC-JBG commented Oct 31, 2024

Yes, unfortunately, this is the expected behavior of the temporary container that reinstalls packages.

But is it also normal for it to never come back by itself? I had that situation already yesterday where I left it running overnight and still had to reboot in the morning. Or is the normal behavior that it reboots automatically into an SSH-able state after the temporary container OS did its work? If so, that's a state my server never seems to reach by itself, for some reason.

@SC-JBG
Author

SC-JBG commented Oct 31, 2024

OK, I redid it from a fresh snapshot, and it went exactly as described in my [earlier comment](https://github.com//issues/384#issuecomment-2449675216).

The only difference is that I left it in the dist-upgrade stage for 80 minutes before forcing the reboot, so it had plenty of time to finish all tasks in the temporary OS container.

The result is still the same: the OS still comes back as CentOS 7.9, even after centos2alma reported that it had finished. The feedback archive from this run is again attached to this ticket.
centos2alma_feedback.zip

I am really clueless. Any ideas what I could do here?

Thank you for your ongoing support!

@SandakovMM
Collaborator

I had that situation already yesterday where I left it running overnight and still had to reboot in the morning. Or is the normal behavior that it reboots automatically into an SSH-able state after the temporary container OS did its work?

Normally, the temporary container should reboot into an SSH-able state. To move forward, I believe we need to determine the exact reason why the temporary container gets stuck.

Could you please execute centos2alma --no-reboot from the snapshot and provide the following files:

  • /var/log/leapp/leapp-report.txt
  • /var/log/leapp/leapp-report.json
  • /var/log/leapp/leapp-upgrade.log
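
Collecting the requested files into a single archive for upload can be scripted. A minimal Python sketch; the archive name and the `bundle_files` helper are hypothetical, and missing files are reported rather than treated as errors, on the assumption that not every file may exist if leapp stops early.

```python
import os
import tarfile

LEAPP_FILES = [
    "/var/log/leapp/leapp-report.txt",
    "/var/log/leapp/leapp-report.json",
    "/var/log/leapp/leapp-upgrade.log",
]


def bundle_files(paths, archive_path="leapp-logs.tar.gz"):
    """Pack the existing files from `paths` into a gzip-compressed tarball.

    Returns the list of paths that were missing and therefore skipped.
    """
    missing = []
    with tarfile.open(archive_path, "w:gz") as tar:
        for path in paths:
            if os.path.isfile(path):
                # Store only the file name so the archive has a flat layout.
                tar.add(path, arcname=os.path.basename(path))
            else:
                missing.append(path)
    return missing
```

Running `bundle_files(LEAPP_FILES)` as root after `centos2alma --no-reboot` would write `leapp-logs.tar.gz` to the current directory and return any paths it could not find.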

We might be able to identify something unusual in these reports. At the moment, I have no other suggestions on how to proceed.
