Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

System backup fails with error E_COPY_DATA #38

Open
tbertels opened this issue Dec 28, 2021 · 22 comments
Open

System backup fails with error E_COPY_DATA #38

tbertels opened this issue Dec 28, 2021 · 22 comments

Comments

@tbertels
Copy link

The btrfs-auto-snapshot.sh script (https://github.com/xbianonpi/xbian-package-config-shell/blob/master/content/usr/local/sbin/xbian-storager) stops with error E_COPY_DATA

Version:
5.10.82+ #1 SMP PREEMPT Wed Dec 1 19:48:08 CET 2021 armv7l
XBian 20210911-0 - Bleeding Edge

This seems to be a known problem: http://forum.xbian.org/thread-4106-post-36879.html

Home backup works fine on the same NFS network share.

Log excerpt:

Create a readonly snapshot of '/run/xbian-storage/root/@' in '/run/xbian-storage/root/@.ro'
Cloning root/@, 241130903 of 1148242399, 21% done
Cloning root/@, 562638775 of 1148242399, 49% done
Cloning root/@, 918593919 of 1148242399, 80% done
Cloning root/@, 1136759975 of 1148242399, 99% done
Cloning subvolume root snapshot @ failed
Operation failed with error E_COPY_DATA!

I can see the backup file being created on the network share before the error.

Would any other log help?

@tbertels
Copy link
Author

Would using the much simpler script at https://github.com/hunleyd/btrfs-auto-snapshot as a base make it any easier to debug?

@mkreisl
Copy link
Contributor

mkreisl commented Dec 28, 2021

I vaguely remember the problem, but it was more with backuphome at the time. That was about two and a half years ago, I still have a test procedure for it.
The error comes from the fact that in very rare cases the procedure that is responsible for the progress display returns an error (why this is so I can unfortunately not say). in the backuphome function this was corrected, but unfortunately this seems to be forgotten in the xbiancopy function. With me this has never occurred all the years, neither with backuphome nor with xbiancopy.

Anyway you can try to bring the following change in xbian-storager and test it again

diff --git a/content/usr/local/sbin/xbian-storager b/content/usr/local/sbin/xbian-storager
index beda774..5eedb10 100755
--- a/content/usr/local/sbin/xbian-storager
+++ b/content/usr/local/sbin/xbian-storager
@@ -1403,7 +1403,7 @@ EOF
                                                fi
                                        }
                                        ar=0
-                                       ( copy_volume $opt_cpmode ) 2>&1 | \
+                                       ( copy_volume $opt_cpmode ) 2>&1 | { \
                                                while read a; do
                                                        if [ -n "${a##*[!0-9]*}" ]; then
                                                                (( $a < $ar )) && a=100
@@ -1414,7 +1414,8 @@ EOF
                                                                fi
                                                                ar=$a
                                                        fi
-                                               done
+                                               done || :
+                                       }
                                        [ "$?" -eq 0 ] || { [ "$BACKGROUND" == 1 ] && printf "%s\n" "$wmsg subvolume $v snapshot $s failed"; exit $E_COPY_DATA; }
                                        if [ "$opt_fstype" = btrfs ]; then
                                                if [ "$opt_cpmode" = tar ]; then

If this works for you without errors, i will apply the patch

@tbertels
Copy link
Author

No luck with that change. :(
What should I remove to try without that progress display procedure? Or is that change supposed to do that?
By the way, the error happens when starting the system backup manually through the GUI. I've added a daily automatic system backup to see if that works.

@mkreisl
Copy link
Contributor

mkreisl commented Dec 28, 2021

No luck with that change. :(

Strange

What should I remove to try without that progress display procedure? Or is that change supposed to do that?

Something like this:

                                   ( copy_volume $opt_cpmode ) >/dev/null 2>&1
                                   [ "$?" -eq 0 ] || { [ "$BACKGROUND" == 1 ] && printf "%s\n" "$wmsg subvolume $v snapshot $s failed"; exit $E_COPY_DATA; }

By the way, the error happens when starting the system backup manually through the GUI. I've added a daily automatic system backup to see if that works.

This should make no difference, as it runs in the background in both cases

@tbertels
Copy link
Author

So that confirms it: the progress display procedure isn't the cause. Without that change, no progress display is shown but still the same error:

Done processing subvolume modules
Create subvolume '/run/root-dest/root'
Create a readonly snapshot of '/run/xbian-storage/root/@' in '/run/xbian-storage/root/@.ro'
Cloning subvolume root snapshot @ failed
Operation failed with error E_COPY_DATA!

I tried a sudo btrfs scrub start / just in case, but it reported no error.

So I guess the error comes from the function copy_volume.
Is there a command that I can try to check that the btrfs send | btrfs receive works as intended (and maybe get a more detailed error)?

@mkreisl
Copy link
Contributor

mkreisl commented Dec 28, 2021

Here is my next patch:

diff --git a/content/usr/local/sbin/xbian-storager b/content/usr/local/sbin/xbian-storager
index beda774..d0c9b66 100755
--- a/content/usr/local/sbin/xbian-storager
+++ b/content/usr/local/sbin/xbian-storager
@@ -1396,11 +1396,12 @@ EOF
                                                                to='--warning=none --acls --xattrs'
                                                                nice -n 10 tar c $to -C ${vs}/$s.ro . | pv -i $ival -ns $siz | eval $opt_remotecmd nice -n 10 tar x $to -C "${vd}/$s"
                                                        else
-                                                               nice -n 10 btrfs send "${vs}/$s.ro" | pv -i $ival -ns $siz | eval $opt_remotecmd nice -n 10 btrfs receive "${vd}"
+                                                               nice -n 10 btrfs send 2>/tmp/btrfs.send.log "${vs}/$s.ro" | pv -i $ival -ns $siz | eval $opt_remotecmd nice -n 10 btrfs receive 2>/tmp/btrfs.receive.log "${vd}"
                                                        fi
                                                else
                                                        nice -n 10 zfs send --props $v@ro | pv -i $ival -ns $siz | eval $opt_remotecmd nice -n 10 zfs receive -u "$opt_clonename/${v#*/}"
                                                fi
+                                               echo "ps=\"${PIPESTATUS[*]}\"" >/run/ps.xbiancopy
                                        }
                                        ar=0
                                        ( copy_volume $opt_cpmode ) 2>&1 | \
@@ -1415,7 +1416,14 @@ EOF
                                                                ar=$a
                                                        fi
                                                done
-                                       [ "$?" -eq 0 ] || { [ "$BACKGROUND" == 1 ] && printf "%s\n" "$wmsg subvolume $v snapshot $s failed"; exit $E_COPY_DATA; }
+                                       . /run/ps.xbiancopy && rm -f /run/ps.xbiancopy
+                                       for RC in $ps; do
+                                               if [ "$RC" -ne 0 ]; then
+                                                       print_log error "error(s) happened while copying data ($ps)"
+                                                       break
+                                               fi
+                                       done
+                                       [ "$RC" -eq 0 ] || { [ "$BACKGROUND" == 1 ] && printf "%s\n" "$wmsg subvolume $v snapshot $s failed (ps=$ps)"; exit $E_COPY_DATA; }
                                        if [ "$opt_fstype" = btrfs ]; then
                                                if [ "$opt_cpmode" = tar ]; then
                                                        eval $opt_remotecmd nice -n 10 btrfs fi sync "${vd}/$s" && sync && btrfs sub delete "${vs}/$s.ro" || exit $E_SUB_DELETE

Reverted the previous patch. Now, in case of error the pipe return values are in the error message (ps=X X X)

And, for debugging btrfs send and btrfs receive prints output into file /tmp/btrfs.send.log resp /tmp/btrfs.receive.log

But I can't say if there are reasonable error messages in case of an error. I never had any errors there before

@tbertels
Copy link
Author

tbertels commented Dec 28, 2021

Nice!
I think the problem seems obvious now:

xbiancopy.log:

Create subvolume '/run/root-dest/root'
Create a readonly snapshot of '/run/xbian-storage/root/@' in '/run/xbian-storage/root/@.ro'
Cloning root/@, 241138429 of 1148278236, 21% done
Cloning root/@, 528207988 of 1148278236, 46% done
Cloning root/@, 895657024 of 1148278236, 78% done
Cloning root/@, 1136795453 of 1148278236, 99% done
Error: error(s) happened while copying data (141 0 1)
Cloning subvolume root snapshot @ failed (ps=141 0 1)
Operation failed with error E_COPY_DATA!

btrfs.send.log:

At subvol /run/xbian-storage/root/@.ro

btrfs.receive.log:

At subvol @.ro
ERROR: writing to var/cache/apt/srcpkgcache.bin failed: No space left on device

Here's the result of df -h:

Filesystem      Size  Used Avail Use% Mounted on
udev            1.6G  4.0K  1.6G   1% /dev
tmpfs           360M  1.6M  358M   1% /run
/dev/mmcblk0p2   30G  2.3G   27G   8% /
/dev/mmcblk0p2   30G  2.3G   27G   8% /lib/modules
/dev/mmcblk0p2   30G  2.3G   27G   8% /home
/dev/mmcblk0p2   30G  2.3G   27G   8% /xbmc-backup
/dev/mmcblk0p1  100M   46M   55M  46% /boot
none            4.0K     0  4.0K   0% /sys/fs/cgroup

@mkreisl
Copy link
Contributor

mkreisl commented Dec 29, 2021

That is not yet obvious to me
Either the image file is created too small or no space available on the NFS share

How big is the image file? Should have at least 2.3GB (minus used space by snapshots)

@tbertels
Copy link
Author

tbertels commented Dec 29, 2021

Here's what df- h returns just before the error:

Filesystem                                                          Size  Used Avail Use% Mounted on
udev                                                                1.6G  4.0K  1.6G   1% /dev
tmpfs                                                               360M  1.6M  358M   1% /run
/dev/mmcblk0p2                                                       30G  2.3G   27G   8% /
/dev/mmcblk0p2                                                       30G  2.3G   27G   8% /lib/modules
/dev/mmcblk0p2                                                       30G  2.3G   27G   8% /home
/dev/mmcblk0p2                                                       30G  2.3G   27G   8% /xbmc-backup
/dev/mmcblk0p1                                                      100M   46M   55M  46% /boot
none                                                                4.0K     0  4.0K   0% /sys/fs/cgroup
/dev/mmcblk0p2                                                       30G  2.3G   27G   8% /run/xbian-storage
192.168.1.5:/volume1/3 Tera Backup/Backup/Raspberry-backups/Bureau  2.7T  2.5T  251G  91% /run/kodimap
/dev/mapper/loop0p2                                                 1.4G  1.3G  2.6M 100% /run/root-dest

And the usage of /dev/mapper/loop0p2 increases until the error.

@tbertels
Copy link
Author

tbertels commented Dec 29, 2021

Here's the full xbiancopy.log

As I understand it, /dev/mapper/loop0p2 is the snapshot, so it's normal that it gets full.
I still wonder why I get "No space left on device" type errors in btrfs.receive.log:

ERROR: writing to usr/include/linux/pkt_sched.h failed: No space left on device

or

ERROR: writing to usr/include/linux/sysctl.h failed: No space left on device

@mkreisl
Copy link
Contributor

mkreisl commented Dec 29, 2021

Yes, /dev/mapper/loop0p2 is the container for the image.
But as you can see it is practically full. So btrfs is probably right with its error message.
The question now is, why is the container so tight for you. For me this is never the case

btrfs receive writes this error message because this command writes into the container. That's correct

@tbertels
Copy link
Author

tbertels commented Dec 29, 2021

Since it seems to need just a little more space, I tried enabling zlib compression, and it worked.

Here's the df -h just before the end:

Filesystem                                                          Size  Used Avail Use% Mounted on
/dev/mmcblk0p2                                                       30G  2.3G   27G   8% /
devtmpfs                                                            1.6G  4.0K  1.6G   1% /dev
none                                                                360M  1.5M  358M   1% /run
/dev/mmcblk0p2                                                       30G  2.3G   27G   8% /lib/modules
/dev/mmcblk0p2                                                       30G  2.3G   27G   8% /home
/dev/mmcblk0p1                                                      100M   46M   55M  46% /boot
none                                                                4.0K     0  4.0K   0% /sys/fs/cgroup
/dev/mmcblk0p2                                                       30G  2.3G   27G   8% /run/xbian-storage
192.168.1.5:/volume1/3 Tera Backup/Backup/Raspberry-backups/Bureau  2.7T  2.5T  251G  91% /run/kodimap
/dev/mapper/loop0p2                                                 1.4G  814M  500M  62% /run/root-dest

And the xbiancopy.log
The btrfs.send.log and btrfs.receive.log contain no error.

@mkreisl
Copy link
Contributor

mkreisl commented Dec 29, 2021

The only question is how much, I checked today at noon, currently there is 20% more space than actually needed

https://github.com/xbianonpi/xbian-package-config-shell/blob/master/content/usr/local/sbin/xbian-storager#L1188

You could add a startvalue for variable size in bytes as shown below

					size=<startvalueinbytes>
					for v in $FS_LOCAL_LIST; do
						[ -n "$v" ] || continue
						for s in $snapextra; do
							sv=$(btrfs-auto-snapshot volsize "$v/$s")
							size=$((size+sv))
						done
					done

this is how it looks with a backup on one of my Pi's

/dev/mapper/loop0p2 5921792 5269068 536660 91% /tmp/m

@tbertels
Copy link
Author

tbertels commented Dec 29, 2021

With size=51200000, I get

/dev/mapper/loop0p2 1515520 1378832 10992 100% /run/root-dest

and it works.

@mkreisl
Copy link
Contributor

mkreisl commented Dec 31, 2021

I have now made a kind of dynamic adjustment of the correction size. works here so far quite well

diff --git a/content/usr/local/sbin/xbian-storager b/content/usr/local/sbin/xbian-storager
index beda774..b2a5b13 100755
--- a/content/usr/local/sbin/xbian-storager
+++ b/content/usr/local/sbin/xbian-storager
@@ -45,6 +45,7 @@ opt_name=''
 opt_exclude="XXXXXXXX"
 opt_quiet=''
 opt_size=''
+opt_sizeadj='300M'
 
 # Default dynamcally set program options/variables for BTRFS/ZFS
 
@@ -1177,15 +1178,17 @@ EOF
                                        exit $E_SAME_FS
                                fi
 
+                               size=$(numfmt --from=iec $opt_sizeadj)
+                               size_fix=105
                                if [ -z "$opt_size" ]; then
                                        for v in $FS_LOCAL_LIST; do
                                                [ -n "$v" ] || continue
                                                for s in $snapextra; do
-                                                       sv=$(btrfs-auto-snapshot volsize "$v/$s")
+                                                       sv=$(xbian-storager volsize "$v/$s")
                                                        size=$((size+sv))
                                                done
                                        done
-                                       opt_size="$(($size*12/10/1024/1024))M"
+                                       opt_size="$(($size*$size_fix/100/1024/1024))M"
                                fi
                                size="$(numfmt --from=iec $opt_size)"
 
@@ -1365,6 +1368,33 @@ EOF
                                fi
                        fi
 
+                       optimize_size () {
+                               if [ "$opt_fstype" = btrfs ]; then
+                                       size=$(df -h | grep -m1 "$root_dest" | awk '{print $4}')
+                               else
+                                       size=$(zfs list -Ho space $opt_clonename | awk '{print $2}')
+                               fi
+                               [ "$BACKGROUND" == 1 ] && printf "Space remaining in %s: %s (%s)\n" "$(basename "$fn")" $size $opt_sizeadj
+
+                               size_avail=$(numfmt --from=iec $size)
+
+                               opt_sizeadj=$(numfmt --from=iec $opt_sizeadj)
+                               if [ "$size_avail" -gt "$(numfmt --from=iec 150M)" ]; then
+                                       opt_sizeadj=$((opt_sizeadj-50*1024*1024))
+                               elif [ "$size_avail" -lt "$(numfmt --from=iec 50M)" ]; then
+                                       opt_sizeadj=$((opt_sizeadj+50*1024*1024))
+                               else
+                                       opt_sizeadj=''
+                               fi
+                               if [ -n "$opt_sizeadj" ] && [ -e /etc/default/xbian-internals ]; then
+                                       if grep -q "OPT_SIZEADJ=" /etc/default/xbian-internals; then
+                                               sed -i "s/OPT_SIZEADJ=.*/OPT_SIZEADJ=$(numfmt --to=iec $opt_sizeadj)/g" /etc/default/xbian-internals
+                                       else
+                                               echo "OPT_SIZEADJ=$(numfmt --to=iec $opt_sizeadj)" >> /etc/default/xbian-internals
+                                       fi
+                               fi
+                       }
+
                        set -o pipefail
                        for v in $FS_LOCAL_LIST; do
                                if [ -z "$v" ] || echo "$v" | grep -qE "\.delete$|\.zlib$|\.lzo$|\.lz4$|\.zstd$"; then
@@ -1396,11 +1426,12 @@ EOF
                                                                to='--warning=none --acls --xattrs'
                                                                nice -n 10 tar c $to -C ${vs}/$s.ro . | pv -i $ival -ns $siz | eval $opt_remotecmd nice -n 10 tar x $to -C "${vd}/$s"
                                                        else
-                                                               nice -n 10 btrfs send "${vs}/$s.ro" | pv -i $ival -ns $siz | eval $opt_remotecmd nice -n 10 btrfs receive "${vd}"
+                                                               nice -n 10 btrfs send 2>/tmp/btrfs.send.log "${vs}/$s.ro" | pv -i $ival -ns $siz | eval $opt_remotecmd nice -n 10 btrfs receive 2>/tmp/btrfs.receive.log "${vd}"
                                                        fi
                                                else
                                                        nice -n 10 zfs send --props $v@ro | pv -i $ival -ns $siz | eval $opt_remotecmd nice -n 10 zfs receive -u "$opt_clonename/${v#*/}"
                                                fi
+                                               echo "ps=\"${PIPESTATUS[*]}\"" >/run/ps.xbiancopy
                                        }
                                        ar=0
                                        ( copy_volume $opt_cpmode ) 2>&1 | \
@@ -1415,7 +1446,13 @@ EOF
                                                                ar=$a
                                                        fi
                                                done
-                                       [ "$?" -eq 0 ] || { [ "$BACKGROUND" == 1 ] && printf "%s\n" "$wmsg subvolume $v snapshot $s failed"; exit $E_COPY_DATA; }
+                                       . /run/ps.xbiancopy && rm -f /run/ps.xbiancopy
+                                       for RC in $ps; do
+                                               if [ "$RC" -ne 0 ]; then
+                                                       optimize_size
+                                                       [ "$BACKGROUND" == 1 ] && printf "%s\n" "$wmsg subvolume $v snapshot $s failed (ps=$ps)"; exit $E_COPY_DATA
+                                               fi
+                                       done
                                        if [ "$opt_fstype" = btrfs ]; then
                                                if [ "$opt_cpmode" = tar ]; then
                                                        eval $opt_remotecmd nice -n 10 btrfs fi sync "${vd}/$s" && sync && btrfs sub delete "${vs}/$s.ro" || exit $E_SUB_DELETE
@@ -1435,12 +1472,13 @@ EOF
                                [ "$BACKGROUND" == 1 ] && printf "%s\n" "Done processing subvolume $v"
                        done
 
+                       optimize_size
+
                        if [ "$opt_fstype" = zfs ]; then
                                vdr="$(zpool get -H bootfs -o value "$opt_rpool" | sed "s%$opt_rpool%$opt_clonename%g")"
                                eval $opt_remotecmd zpool set bootfs=$vdr $opt_clonename
                                echo "vdr=$vdr" >>/run/vars.xbiancopy.$$
                        fi
-
                        if [ -n "$opt_recompress" ]; then
                                for v in $FS_LOCAL_LIST; do
                                        if [ -z "$v" ] || echo "$v" | grep -qE "\.delete$|\.zlib$|\.lzo$|\.lz4$|\.zstd$"; then
@@ -1813,7 +1851,8 @@ esac
 [ -e /etc/default/xbian-internals ] && . /etc/default/xbian-internals
 [ -z "$OPT_COPY_MODE" ] || opt_cpmode="$OPT_COPY_MODE"
 [ -z "$OPT_COMPRESS" ]  || opt_compress="$OPT_COMPRESS"
-[ -z "$OPT_REMOTECMD" ] || opt_remotecmd="OPT_REMOTECMD"
+[ -z "$OPT_REMOTECMD" ] || opt_remotecmd="$OPT_REMOTECMD"
+[ -z "$OPT_SIZEADJ" ]   || opt_sizeadj="$OPT_SIZEADJ"
 
 if [ "$1" != '//' -a "${1%%-*}" != '' ]; then
        opt_command="$1"
@@ -2165,7 +2204,7 @@ elif [ "$opt_command" = volsize ]; then
                do_run mount -t btrfs $opt_dev $md 2>/dev/null
                [ ! -d $md/$1 ] || printf "$(du -xsb $md/$1 | awk '{print $1}')\n"
        else
-               do_run zfs list -Hpo used ${1%%/@} 2>/dev/null
+               do_run zfs list -Hpo space ${1%%/@} 2>/dev/null | awk '{printf "%d\n", $5}'
        fi
        exit 0
 elif [ "$opt_command" = fstype ]; then

Since it is nearly impossible to determine the exact size of the
required image due to metadata and other imponderables, a dynamic
adjustment is used. The value is set for each run so that the
free space in the image is around 100M

@tbertels
Copy link
Author

tbertels commented Jan 2, 2022

I'm having a hard time merging those changes on Windows (I did it manually for the other ones). Could you attach the modified file?

@mkreisl
Copy link
Contributor

mkreisl commented Jan 2, 2022

This should be much easier:

  1. download the patch
  2. ssh into xbian
  3. cd /usr/local/sbin
  4. sudo patch -lp5 < /pathtopatch/downloadedpatch

Oh I forgot, of course restore the original state with sudo apt-get install --reinstall xbian-package-config-shell first

@tbertels
Copy link
Author

tbertels commented Jan 2, 2022

Thanks, the patching works fine with that command, even on Windows (at least with the patch tool from Git, not the one from GnuWin32).

It works here too, with 77M (300M) remaining.

@mkreisl
Copy link
Contributor

mkreisl commented Jan 2, 2022

Great, so it seems that 300M start value will be a good choice.

@tbertels
Copy link
Author

tbertels commented Jan 6, 2022

It works fine too on an other rpi4, with 62M remaining there.

@mkreisl
Copy link
Contributor

mkreisl commented Mar 3, 2022

After a longer test and some corrections, this should now be fixed with this commit: 97e54ab

@tbertels
Copy link
Author

tbertels commented Mar 3, 2022

An adaptive size change, it should be future proof!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants