Restore failed on Oracle Database Server 12c R2 #412
Comments
Attached the restore.log |
@dineshputchala, which kernel version do you use? |
From the shell version, the glibc in use, and other information in the restore.log, this could be CentOS or RHEL. Which CRIU version are you using? It is strange that the CRIU version is not visible in the restore.log. We should also put the kernel version in the dump and restore logs. CRIU on CentOS/RHEL needs an extra patch if built from source: https://git.centos.org/blob/rpms!criu.git/c7/SOURCES!aio-fix.patch |
My tests migrating the Oracle database have always failed, probably due to problems with monotonic time. |
Or better: migration works, but the database shuts down after migration. |
This is because the formula in the kernel has changed. We do not support old kernels, as Pasha said. So if the kernel is really old, we do not support it. |
Docker host details:
bash-4.2$ docker -v
bash-4.2$ uname -a
bash-4.2$ cat /etc/oracle-release
OS is Oracle Linux 7.3.
CRIU version on the Docker host: criu-2.12-2.el7.x86_64
It is easy to reproduce: start the db container and checkpoint it, then try to restore from the checkpoint. It is reproduced every time. |
@dineshputchala could you try the same with criu 3.6? |
In this bug you hit a different issue, but #296 will very probably be the next one. |
So this is interesting. If Oracle Linux uses the RHEL criu package, with the special AIO patch I added for RHEL, on a newer kernel, it will not work. @dineshputchala you need to talk to your vendor and tell them that their criu package is wrong. |
I installed CRIU 3.6 on my machine by building it from source, as the package was not available in my repos. It took some time because many dependencies had to be resolved to build CRIU, and it was not easy or straightforward. But finally I could install CRIU 3.6! |
@avagin I attempted the checkpoint/restore experiment again on Oracle Database Server 12c R2 with the latest CRIU version (3.6). This time it is a different story: the checkpoint worked and the restore did not throw any error, but the db inside the container was not brought up successfully.
bash-4.2$ docker checkpoint create cont_criu3 cont_criu3_chk
bash-4.2$ docker checkpoint ls cont_criu3
bash-4.2$ docker start --checkpoint cont_criu3_chk cont_criu3
I checked the alert logs and could see the following errors and warnings:
Error attempting to elevate VKTM's priority: no further priority changes will be attempted for this process
Warning: VKTM detected a forward time drift.
This seems to be the same issue as observed in #296. |
@dineshputchala nice, now you need to talk to Oracle and tell them that they should support migration. The Oracle database seems to have problems if the time changes. This is expected, as time keeps running while your container is stopped. It is even worse for migration, as the kernel timers on the destination system will be completely different. So this is unrelated to CRIU and needs to be changed in the database. A time namespace in the kernel could be a solution, but this needs to be implemented in the kernel. |
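To make the time argument concrete, here is a minimal sketch of how a periodic timekeeper could end up reporting a forward drift after a restore. It is not how Oracle's VKTM is implemented (that is not public in this thread); the names `classify_tick` and `watch` and the tolerance value are assumptions made up for illustration. The idea is only that a process frozen by a checkpoint sees a much larger gap between two ticks than it asked for, and that after migration the monotonic clock of the destination host may even be behind the last value seen.

```python
import time
from typing import List


def classify_tick(prev: float, cur: float, interval: float,
                  tolerance: float = 0.5) -> str:
    """Classify the gap between two monotonic readings (in seconds).

    Hypothetical helper: 'forward' means far more time passed than the
    requested sleep interval (e.g. the process was frozen in a checkpoint),
    'backward' means the clock went back (e.g. restored on another host
    whose monotonic clock started later).
    """
    elapsed = cur - prev
    if elapsed < 0:
        return "backward"
    if elapsed > interval + tolerance:
        return "forward"
    return "ok"


def watch(interval: float = 0.01, ticks: int = 3) -> List[str]:
    """Sleep `interval` seconds `ticks` times and classify each gap."""
    results = []
    prev = time.monotonic()
    for _ in range(ticks):
        time.sleep(interval)
        cur = time.monotonic()
        results.append(classify_tick(prev, cur, interval))
        prev = cur
    return results
```

With synthetic readings, a container that stayed checkpointed for five minutes would look like `classify_tick(100.0, 400.0, 0.01)`, and a migration to a freshly booted host like `classify_tick(5000.0, 120.0, 0.01)`.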
A similar issue is observed in bug #296, which requires changes in the kernel and then in CRIU. |
From the alert log: Warning: VKTM detected a forward time drift. The same issue is observed in bug #296, which requires changes in the kernel and then in CRIU. |
Any update on the implementation of the time namespace feature in the kernel and CRIU? |
Andrey can say more about the CRIU status, since he is diving into this at the moment. But I want to touch on another direction. @dineshputchala, have you tried asking Oracle to work around this issue for the time being, until we have a solution in the kernel and CRIU? |
@adrianreber @avagin Any update on the implementation of the time namespace? |
@dineshputchala we are going to send an RFC next week: |
Any update on the implementation of the time namespace feature in the kernel? Any update on the CRIU changes to support it? |
@dineshputchala We sent the rfc version: then we discussed it on LCP: And now we are working on the second version of these patches. We are going to post them this month. |
@avagin After the kernel changes, CRIU will also need changes to use this feature, right? |
@dineshputchala yes, we will need to add some code in CRIU to support time namespaces. But this should not be hard. |
@avagin Which kernel version has support for time namespaces? Is CRIU support for time namespaces done? |
@dineshputchala the patch series for time namespace is not merged upstream yet. The link above is to the latest version of this patch series. |
@dineshputchala Time namespaces have been merged in v5.6, but the current implementation doesn't fix this issue. We need to save/restore start_time for processes to fix this issue. |
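For readers wondering what a per-process start time looks like from user space, here is a small sketch. It assumes the `start_time` mentioned above corresponds to what the kernel exposes as field 22 (`starttime`) of `/proc/[pid]/stat`, expressed in clock ticks since boot. The helper names are made up for this example. Because the value is relative to the boot of the host the process was created on, it would not line up with the clocks of a host the process is later restored on.

```python
import os


def parse_starttime_ticks(stat_line: str) -> int:
    """Return field 22 (starttime, in clock ticks) from a /proc/[pid]/stat line.

    Field 2 (comm) is wrapped in parentheses and may itself contain spaces
    or ')' characters, so everything up to the last ')' is skipped first.
    """
    rest = stat_line[stat_line.rindex(")") + 1:].split()
    # rest[0] is field 3 (state), so field 22 sits at index 22 - 3 = 19.
    return int(rest[19])


def start_seconds_after_boot(pid: str = "self") -> float:
    """Start time of a process in seconds after boot (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        ticks = parse_starttime_ticks(f.read())
    return ticks / os.sysconf("SC_CLK_TCK")
```

Calling `start_seconds_after_boot()` before a checkpoint and after a restore on another machine is one way to see which value the restored process reports.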
@avagin @dineshputchala Just in case it helps someone, here is how we do it in Virtuozzo criu+kernel (sadly without time namespaces yet). criu patch: I am not saying that this is the right way; I understand that the time namespace approach is the right one, but I hope it can help. |
A friendly reminder that this issue had no activity for 30 days. |
@avagin Is the time namespace feature completely implemented in the kernel? If yes, which version of Oracle Linux 7.x or Oracle Linux 8 has this feature? |
@dineshputchala I am pretty sure nobody here knows which OracleLinux kernel has which feature. CRIU's CI is enabling time namespace tests on anything >= 5.11. So you need to figure out if OracleLinux has all the time namespace patches from 5.11. |
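A rough way to start that investigation on a given host is sketched below. The function names are invented for this example and the release strings in the usage note are hypothetical. The version comparison is only a hint: distribution kernels often backport features, so a release string below 5.11 does not prove the patches are missing, and one above it does not prove a vendor kept them. The existence of `/proc/self/ns/time` shows that the running kernel exposes time namespaces at all, which upstream does since v5.6.

```python
import os
import re
from typing import Tuple


def parse_kernel_release(release: str) -> Tuple[int, int]:
    """Extract (major, minor) from a `uname -r` style release string."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def meets_criu_ci_threshold(release: str,
                            minimum: Tuple[int, int] = (5, 11)) -> bool:
    """True if the release is at least the version CRIU's CI tests against."""
    return parse_kernel_release(release) >= minimum


def exposes_time_namespace() -> bool:
    """True if the running kernel exposes a time namespace for this process."""
    return os.path.exists("/proc/self/ns/time")
```

On the host itself, `meets_criu_ci_threshold(os.uname().release)` together with `exposes_time_namespace()` gives a first answer; for example, `meets_criu_ci_threshold("4.18.0-305.el8.x86_64")` is false even though such a kernel could still carry backports.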
Oracle Linux 8.4 maybe?
https://docs.oracle.com/en/operating-systems/oracle-linux/8/relnotes8.4/ol8-features-changes.html |
There is a CRIU bug on Oracle Linux 8.4, and it looks like this needs to be fixed in OL8. |
Trying to verify the checkpoint/restore feature on Oracle Database Server 12c R2.
There was a similar issue (#255) last year when I tried this on a non-production docker-1.10.0-dev version.
This time I tried the latest version, as checkpoint/restore is available as an experimental feature in the regular Docker release.
Steps followed are:
Enabled experimental flag on "Docker version 17.06.2-ol, build d02b7ab"
bash-4.2$ docker run -d --env-file db_env.dat -p :1521 -p :5500 --name tc --security-opt seccomp:unconfined store/oracle/database-enterprise:12.2.0.1
b1ed6b3ff854241230e357432e779238e4b0a14a32ea9b0661f87697161ac51c
Created a checkpoint once the db came up:
bash-4.2$ docker checkpoint create tc tc_ck1
tc_ck1
bash-4.2$ docker checkpoint ls tc
CHECKPOINT NAME
tc_ck1
Tried to start the container again from the checkpoint:
bash-4.2$ docker start --checkpoint tc_ck1 tc
Error response from daemon: oci runtime error: criu failed: type NOTIFY errno 0
log file: /var/lib/docker/containers/b1ed6b3ff854241230e357432e779238e4b0a14a32ea9b0661f87697161ac51c/checkpoints/tc_ck1/criu.work/restore-2017-11-17T02:06:14.324615919-08:00/restore.log
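For anyone who wants to script the reproduction, here is a minimal sketch. It only uses the two Docker commands shown above and pulls the restore.log path out of the daemon's error text. The helper names are invented for this example, and it is an assumption that the "log file:" line arrives on the command's stdout or stderr in the form shown in this report.

```python
import re
import subprocess
from typing import List, Optional


def checkpoint_cmd(container: str, checkpoint: str) -> List[str]:
    """argv for `docker checkpoint create <container> <checkpoint>`."""
    return ["docker", "checkpoint", "create", container, checkpoint]


def restore_cmd(container: str, checkpoint: str) -> List[str]:
    """argv for `docker start --checkpoint <checkpoint> <container>`."""
    return ["docker", "start", "--checkpoint", checkpoint, container]


def find_restore_log(daemon_output: str) -> Optional[str]:
    """Return the restore.log path from a 'log file: ...' line, if present."""
    match = re.search(r"log file:\s*(\S+restore\.log)", daemon_output)
    return match.group(1) if match else None


def restore_and_locate_log(container: str, checkpoint: str) -> Optional[str]:
    """Try the restore; on failure return the restore.log path, else None."""
    proc = subprocess.run(restore_cmd(container, checkpoint),
                          capture_output=True, text=True)
    if proc.returncode == 0:
        return None
    return find_restore_log(proc.stdout + proc.stderr)
```

With the names from this report, that would be `subprocess.run(checkpoint_cmd("tc", "tc_ck1"))` followed by `restore_and_locate_log("tc", "tc_ck1")`.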