Bug: Initial sync crashes with std.utf.UTFException: Invalid UTF-8 sequence #2829

Closed
phlibi opened this issue Sep 19, 2024 · 4 comments · Fixed by #2816 or #2851
Labels: Bug, Duplicate

phlibi commented Sep 19, 2024

Describe the bug

This crash has already been mentioned in #2813 and might be related. It happened at the end of an initial sync of a SharePoint folder. Re-running the exact same command (again with --resync --resync-auth) then completed normally.

Operating System Details

Debian Bookworm (12) with backports enabled
Linux phiptp 6.10.6+bpo-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.10.6-1~bpo12+1 (2024-08-26) x86_64 GNU/Linux

Client Installation Method

From Source

OneDrive Account Type

SharePoint

What is your OneDrive Application Version

v2.5.0-6-g280d369 (with PR 2816)

What is your OneDrive Application Configuration

$ ./onedrive --sync --confdir=/home/phip/.config/onedrive/swxxxod --verbose --download-only --resync --resync-auth --display-config
Reading configuration file: /home/phip/.config/onedrive/swxxxod/config
Configuration file successfully loaded
Using 'user' configuration path for application config and state data: /home/phip/.config/onedrive/swxxxod
Application version                          = onedrive v2.5.0-6-g280d369
Compiled with                                = DMD 2109
User Application Config path                 = /home/phip/.config/onedrive/swxxxod
System Application Config path               = /etc/onedrive
Applicable Application 'config' location     = /home/phip/.config/onedrive/swxxxod/config
Configuration file found in config location  = true - using 'config' file values to override application defaults
Applicable 'sync_list' location              = /home/phip/.config/onedrive/swxxxod/sync_list
Applicable 'items.sqlite3' location          = /home/phip/.config/onedrive/swxxxod/items.sqlite3
Config option 'drive_id'                     = b!ozVsZqWFpU.........b5nb-SaXtsp
Config option 'sync_dir'                     = ~/phipsfiles/swxxx/swxxxod
Config option 'enable_logging'               = false
Config option 'log_dir'                      = /var/log/onedrive
Config option 'disable_notifications'        = false
Config option 'skip_dir'                     = 
Config option 'skip_dir_strict_match'        = false
Config option 'skip_file'                    = ~*|.~*|*.tmp|*.swp|*.partial
Config option 'skip_dotfiles'                = false
Config option 'skip_symlinks'                = true
Config option 'monitor_interval'             = 300
Config option 'monitor_log_frequency'        = 12
Config option 'monitor_fullscan_frequency'   = 12
Config option 'read_only_auth_scope'         = false
Config option 'dry_run'                      = false
Config option 'upload_only'                  = false
Config option 'download_only'                = true
Config option 'local_first'                  = false
Config option 'check_nosync'                 = false
Config option 'check_nomount'                = false
Config option 'resync'                       = true
Config option 'resync_auth'                  = true
Config option 'cleanup_local_files'          = false
Config option 'classify_as_big_delete'       = 1000
Config option 'disable_upload_validation'    = false
Config option 'disable_download_validation'  = false
Config option 'bypass_data_preservation'     = false
Config option 'no_remote_delete'             = false
Config option 'remove_source_files'          = false
Config option 'sync_dir_permissions'         = 700
Config option 'sync_file_permissions'        = 600
Config option 'space_reservation'            = 52428800
Config option 'application_id'               = d50ca740-c83f-4d1b-b616-12c519384f0c
Config option 'azure_ad_endpoint'            = 
Config option 'azure_tenant_id'              = 
Config option 'user_agent'                   = ISV|abraunegg|OneDrive Client for Linux/v2.5.0-6-g280d369
Config option 'force_http_11'                = false
Config option 'debug_https'                  = false
Config option 'rate_limit'                   = 0
Config option 'operation_timeout'            = 3600
Config option 'dns_timeout'                  = 60
Config option 'connect_timeout'              = 10
Config option 'data_timeout'                 = 60
Config option 'ip_protocol_version'          = 0
Config option 'threads'                      = 8
Compile time option --enable-notifications   = false

Selective sync 'sync_list' configured        = false

Config option 'sync_business_shared_items'   = false

Config option 'webhook_enabled'              = false

What is your 'curl' version

curl 8.9.1 (x86_64-pc-linux-gnu) libcurl/8.9.1 GnuTLS/3.7.9 zlib/1.2.13 brotli/1.0.9 zstd/1.5.4 libidn2/2.3.3 libpsl/0.21.2 libssh2/1.10.0 nghttp2/1.52.0 ngtcp2/1.6.0 nghttp3/1.4.0 librtmp/2.3 OpenLDAP/2.5.13
Release-Date: 2024-07-31, security patched: 8.9.1-2~bpo12+1
Protocols: dict file ftp ftps gopher gophers http https imap imaps ipfs ipns ldap ldaps mqtt pop3 pop3s rtmp rtsp scp sftp smb smbs smtp smtps telnet tftp ws wss
Features: alt-svc AsynchDNS brotli GSS-API HSTS HTTP2 HTTP3 HTTPS-proxy IDN IPv6 Kerberos Largefile libz NTLM PSL SPNEGO SSL threadsafe TLS-SRP UnixSockets zstd

Where is your 'sync_dir' located

Local

What are all your system 'mount points'

sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
udev on /dev type devtmpfs (rw,nosuid,relatime,size=15715820k,nr_inodes=3928955,mode=755,inode64)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
tmpfs on /run type tmpfs (rw,nosuid,nodev,noexec,relatime,size=3152504k,mode=755,inode64)
zroot/ROOT/debian on / type zfs (rw,relatime,xattr,noacl,casesensitive)
securityfs on /sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,relatime)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev,inode64)
tmpfs on /run/lock type tmpfs (rw,nosuid,nodev,noexec,relatime,size=5120k,inode64)
cgroup2 on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime,nsdelegate,memory_recursiveprot)
pstore on /sys/fs/pstore type pstore (rw,nosuid,nodev,noexec,relatime)
efivarfs on /sys/firmware/efi/efivars type efivarfs (rw,nosuid,nodev,noexec,relatime)
bpf on /sys/fs/bpf type bpf (rw,nosuid,nodev,noexec,relatime,mode=700)
systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=30,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=11726)
hugetlbfs on /dev/hugepages type hugetlbfs (rw,nosuid,nodev,relatime,pagesize=2M)
mqueue on /dev/mqueue type mqueue (rw,nosuid,nodev,noexec,relatime)
none on /sys/kernel/debug type debugfs (rw,nosuid,nodev,noexec,relatime)
tracefs on /sys/kernel/tracing type tracefs (rw,nosuid,nodev,noexec,relatime)
fusectl on /sys/fs/fuse/connections type fusectl (rw,nosuid,nodev,noexec,relatime)
configfs on /sys/kernel/config type configfs (rw,nosuid,nodev,noexec,relatime)
zroot on /zroot type zfs (rw,relatime,xattr,noacl,casesensitive)
zroot/data/home on /home type zfs (rw,relatime,xattr,noacl,casesensitive)
zroot/data/scratch on /scratch type zfs (rw,relatime,xattr,noacl,casesensitive)
zroot/data/home/phipsfiles on /home/phip/phipsfiles type zfs (rw,relatime,xattr,noacl,casesensitive)
zroot/data/home/phipsfiles/developing on /home/phip/phipsfiles/developing type zfs (rw,relatime,xattr,noacl,casesensitive)
zroot/data/home/phipsfiles/documents on /home/phip/phipsfiles/documents type zfs (rw,relatime,xattr,noacl,casesensitive)
zroot/data/home/phipsfiles/swxxx on /home/phip/phipsfiles/swxxx type zfs (rw,relatime,xattr,noacl,casesensitive)
binfmt_misc on /proc/sys/fs/binfmt_misc type binfmt_misc (rw,nosuid,nodev,noexec,relatime)

What are all your local file system partition types

All local data is stored on ZFS

How do you use 'onedrive'

The folder is a company share to which about 10 other people have access. Changes are rare, though, so it is rather unlikely that a file was changed online during the run.

Steps to reproduce the behaviour

I have not tried this yet, but removing all local files and state could possibly trigger the same failure again. I can do this if requested, but since this might be closely related to #2813, it will probably not provide much more insight.

Complete Verbose Log Output

NOTE: Stripped log, as all this is already being handled by abraunegg.

$ ./onedrive --sync --confdir=/home/phip/.config/onedrive/swxxxod --verbose --download-only --resync --resync-auth
...
Processing: heliumv/bestellungen/2020
The directory has not changed
Attempting to perform a database vacuum to optimise database
Database vacuum is complete
std.utf.UTFException@/home/phip/dlang/dmd-2.109.1/linux/bin64/../../src/phobos/std/utf.d(1556): Invalid UTF-8 sequence (at index 1)
----------------
/home/phip/dlang/dmd-2.109.1/linux/bin64/../../src/phobos/std/utf.d:1594 pure dchar std.utf.decodeImpl!(true, 0, const(char)[]).decodeImpl(ref const(char)[], ref ulong) [0x5627578b3090]
/home/phip/dlang/dmd-2.109.1/linux/bin64/../../src/phobos/std/utf.d:1186 pure @trusted dchar std.utf.decode!(0, const(char)[]).decode(scope ref const(char)[], ref ulong) [0x5627578b3003]
/home/phip/dlang/dmd-2.109.1/linux/bin64/../../src/phobos/std/regex/internal/ir.d:827 pure @safe bool std.regex.internal.ir.Input!(char).Input.nextChar(ref dchar, ref ulong) [0x562757888cb2]
/home/phip/dlang/dmd-2.109.1/linux/bin64/../../src/phobos/std/regex/internal/thompson.d:789 pure @trusted bool std.regex.internal.thompson.ThompsonMatcher!(char, std.regex.internal.ir.Input!(char).Input).ThompsonMatcher.next() [0x56275788ccc4]
/home/phip/dlang/dmd-2.109.1/linux/bin64/../../src/phobos/std/regex/internal/thompson.d:943 pure @trusted int std.regex.internal.thompson.ThompsonMatcher!(char, std.regex.internal.ir.Input!(char).Input).ThompsonMatcher.match(std.regex.internal.ir.Group!(ulong).Group[]) [0x5627578910d1]
/home/phip/dlang/dmd-2.109.1/linux/bin64/../../src/phobos/std/regex/package.d:775 pure void std.regex.RegexMatch!(immutable(char)[]).RegexMatch.__ctor!(std.regex.internal.ir.Regex!(char).Regex).__ctor(immutable(char)[], std.regex.internal.ir.Regex!(char).Regex).__lambda4!(std.regex.internal.ir.Group!(ulong).Group[]).__lambda4(std.regex.internal.ir.Group!(ulong).Group[]) [0x5627578a7050]
/home/phip/dlang/dmd-2.109.1/linux/bin64/../../src/phobos/std/regex/internal/ir.d:1122 pure void std.regex.internal.ir.SmallFixedArray!(std.regex.internal.ir.Group!(ulong).Group, 3u).SmallFixedArray.mutate(scope void delegate(std.regex.internal.ir.Group!(ulong).Group[]) pure) [0x56275789819a]
/home/phip/dlang/dmd-2.109.1/linux/bin64/../../src/phobos/std/regex/package.d:775 ref @trusted std.regex.RegexMatch!(immutable(char)[]).RegexMatch std.regex.RegexMatch!(immutable(char)[]).RegexMatch.__ctor!(std.regex.internal.ir.Regex!(char).Regex).__ctor(immutable(char)[], std.regex.internal.ir.Regex!(char).Regex) [0x5627578a6fb6]
/home/phip/dlang/dmd-2.109.1/linux/bin64/../../src/phobos/std/regex/package.d:1013 @safe std.regex.RegexMatch!(immutable(char)[]).RegexMatch std.regex.match!(immutable(char)[], std.regex.internal.ir.Regex!(char).Regex).match(immutable(char)[], std.regex.internal.ir.Regex!(char).Regex) [0x5627578a6e1d]
src/util.d:521 bool util.isValidUTCDateTime(immutable(char)[]) [0x5627578ad9ee]
src/itemdb.d:701 itemdb.Item itemdb.ItemDatabase.buildItem(sqlite.Statement.Result) [0x56275790b920]
src/itemdb.d:505 itemdb.Item[] itemdb.ItemDatabase.selectChildren(const(char)[], const(char)[]) [0x562757909db7]
src/sync.d:3371 void syncEngine.SyncEngine.checkDirectoryDatabaseItemForConsistency(itemdb.Item, immutable(char)[]) [0x5627578e1f4b]
src/sync.d:3217 void syncEngine.SyncEngine.checkDatabaseItemForConsistency(itemdb.Item) [0x5627578e0eb1]
src/sync.d:3373 void syncEngine.SyncEngine.checkDirectoryDatabaseItemForConsistency(itemdb.Item, immutable(char)[]) [0x5627578e1fce]
src/sync.d:3217 void syncEngine.SyncEngine.checkDatabaseItemForConsistency(itemdb.Item) [0x5627578e0eb1]
src/sync.d:3373 void syncEngine.SyncEngine.checkDirectoryDatabaseItemForConsistency(itemdb.Item, immutable(char)[]) [0x5627578e1fce]
src/sync.d:3217 void syncEngine.SyncEngine.checkDatabaseItemForConsistency(itemdb.Item) [0x5627578e0eb1]
src/sync.d:3373 void syncEngine.SyncEngine.checkDirectoryDatabaseItemForConsistency(itemdb.Item, immutable(char)[]) [0x5627578e1fce]
src/sync.d:3217 void syncEngine.SyncEngine.checkDatabaseItemForConsistency(itemdb.Item) [0x5627578e0eb1]
src/sync.d:3132 void syncEngine.SyncEngine.performDatabaseConsistencyAndIntegrityCheck() [0x5627578e09c2]
src/main.d:763 _Dmain [0x562757788393]

Screenshots

No response

Other Log Information or Details

No response

Additional context

#2816
Client compiled from source with --enable-debug
Total synchronized data is about 2.4GB in 5300 files

phlibi added the Bug label Sep 19, 2024
abraunegg added this to the v2.5.1 milestone Sep 19, 2024
abraunegg linked a pull request Sep 20, 2024 that will close this issue
abraunegg (Owner) commented

@phlibi
I have updated the #2816 PR with a number of changes this morning.

Could you please rebuild your client using this PR to validate the fix for this issue?

phlibi commented Sep 22, 2024

I am closing this as a duplicate of #2813. Although a different exception is reported, it appears that the actual cause (corruption of the mtime field in the items.sqlite3 DB) randomly results in either a TimeException or a UTFException.
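
The failure mode described here (a corrupted mtime value surfacing as one of two different exceptions) suggests checking the stored bytes before any regex or date parsing touches them. The following is a minimal Python sketch of that idea, not the client's D code: the function names `is_valid_utf8`, `is_valid_utc_timestamp`, and `parse_mtime`, the timestamp pattern, and the fallback to the current UTC time are all assumptions made for illustration.

```python
import re
from datetime import datetime, timezone

# Assumed timestamp shape: YYYY-MM-DDTHH:MM:SS, optional fraction, trailing Z.
_UTC_TIMESTAMP = re.compile(
    r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d{1,6})?Z", re.ASCII
)


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the raw bytes decode cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def is_valid_utc_timestamp(raw: bytes) -> bool:
    """Check the encoding first, and only then the timestamp shape."""
    if not is_valid_utf8(raw):
        return False
    return _UTC_TIMESTAMP.fullmatch(raw.decode("utf-8")) is not None


def parse_mtime(raw: bytes) -> datetime:
    """Parse a stored mtime; fall back to the current UTC time (an assumption)."""
    if is_valid_utc_timestamp(raw):
        text = raw.decode("utf-8")
        fmt = "%Y-%m-%dT%H:%M:%S.%fZ" if "." in text else "%Y-%m-%dT%H:%M:%SZ"
        try:
            return datetime.strptime(text, fmt).replace(tzinfo=timezone.utc)
        except ValueError:
            # Shape matched but the date itself is impossible (e.g. month 13).
            pass
    return datetime.now(timezone.utc)
```

With this ordering, a value containing invalid bytes is rejected by the encoding check and never reaches the regex, so corrupted data leads to one predictable code path instead of two different exceptions.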

abraunegg (Owner) commented

Agreed

abraunegg added a commit that referenced this issue Sep 26, 2024
* Add isValidUTCDateTime function to validate timestamp as received from OneDrive API to ensure it is valid
* Use new function before attempting to call SysTime.fromISOExtString to ensure this call will be successful
* If there is no timestamp in the JSON, set it to the system time
* Add assertion when building an item from DB data
* Add new function (isValidUTF8) to check UTF-8 validity of a string before timestamp regex check
* In a --resync scenario, if the file hash is the same, use the online timestamp as source of truth
* Ensure that the session URL data is a valid JSON response before use
* Ensure a local time in UTC is being used if the JSON data has no date
* Ensure the DB is opened in the most threadsafe manner possible
* Add patch provided by @phlibi to add synchronized() around DB access methods
* Align timestamp creation method with itemdb if element is missing
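
The two items above about thread-safe database access can be illustrated with a small sketch. This is Python rather than the client's D code, and it only shows the general idea of serialising every DB call behind one lock; `ItemStore`, its table layout, and its method names are hypothetical and not taken from the client.

```python
import sqlite3
import threading


class ItemStore:
    """Hypothetical wrapper that serialises all access to one SQLite connection."""

    def __init__(self, path: str = ":memory:") -> None:
        self._lock = threading.Lock()
        # check_same_thread=False lets worker threads share the connection;
        # the lock is what makes that sharing safe.
        self._conn = sqlite3.connect(path, check_same_thread=False)
        with self._lock:
            self._conn.execute(
                "CREATE TABLE IF NOT EXISTS item "
                "(id TEXT PRIMARY KEY, mtime TEXT NOT NULL)"
            )
            self._conn.commit()

    def upsert(self, item_id: str, mtime: str) -> None:
        """Insert or overwrite one item while holding the lock."""
        with self._lock:
            self._conn.execute(
                "INSERT OR REPLACE INTO item (id, mtime) VALUES (?, ?)",
                (item_id, mtime),
            )
            self._conn.commit()

    def mtime_of(self, item_id: str):
        """Return the stored mtime for an item, or None if it is unknown."""
        with self._lock:
            row = self._conn.execute(
                "SELECT mtime FROM item WHERE id = ?", (item_id,)
            ).fetchone()
        return row[0] if row else None
```

Because every method takes the same lock, several worker threads can call `upsert` concurrently without interleaving statements on the shared connection.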
abraunegg reopened this Sep 26, 2024
abraunegg linked a pull request Sep 26, 2024 that will close this issue
abraunegg (Owner) commented

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

Repository owner locked as resolved and limited conversation to collaborators Oct 4, 2024