add path predicates for various cloud systems #109

BurntSushi · 2022-07-06T13:02:07Z

I read and understand that you want bstr to allow for more possibilities around bytes:
- strings of bytes conforming to UTF-8
- strings of bytes not conforming to UTF-8, whose encoding is yet to be determined.

Yesterday, an important and lengthy cloud backup failed to complete. Why? Because the name of either a file or a directory contained macOS emojis, and the backup's destination service does not accept those (non-conformant) characters. This is relevant here because the file systems on macOS/UNIX/BSD accept those character ranges, visible or invisible. Linux XFS/Btrfs/ext4 don't have any issues with those character ranges either.

Here is a short, specific example of an error caused by non-conformance. It may not seem like a big deal, but it is when the data is a company's backup: the error delays the backup of the entire data set, and human intervention becomes necessary.

rsync --archive someSourceDir/ /mnt/tapedrive/someDestinationDir/

Somewhere within a subdirectory, however, there is a file called "blah:foo"; notice the colon. The backup runs for hours, then errors out on the ':' character and does not back up that file. If a directory name contains a colon, that directory is not backed up either, which complicates matters further. Why does this happen? LTFS-formatted tape file systems do not accept the ':' character in file or directory names. The emoji case above triggered the same kind of error, only on a cloud backup instead of a tape backup. These characters are not a problem on traditional UNIX-based file systems, but they are a big deal on anything that isn't. I'm guessing AWS S3 and other cloud services have similar issues.
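The failure above only surfaces hours into the run. A minimal sketch of a pre-flight check, assuming a Unix host, is to walk the source tree before calling `rsync` and list every entry whose name contains a forbidden byte (here ':'). It uses only the Rust standard library (`std::os::unix::ffi::OsStrExt` exposes a name as raw bytes); the function name `find_offending_names` is hypothetical and is not part of bstr, rsync or any LTFS tooling.

```rust
use std::fs;
use std::io;
use std::os::unix::ffi::OsStrExt; // Unix-only: view an OsStr as raw bytes
use std::path::{Path, PathBuf};

/// Hypothetical pre-flight check: recursively collects every file or
/// directory under `root` whose own name contains any byte in `forbidden`.
/// Symlinks are reported if their name matches but are never followed.
fn find_offending_names(root: &Path, forbidden: &[u8]) -> io::Result<Vec<PathBuf>> {
    let mut hits = Vec::new();
    let mut stack = vec![root.to_path_buf()];
    while let Some(dir) = stack.pop() {
        for entry in fs::read_dir(&dir)? {
            let entry = entry?;
            let name = entry.file_name();
            // Compare raw bytes, so names that are not valid UTF-8 are still checked.
            if name.as_bytes().iter().any(|b| forbidden.contains(b)) {
                hits.push(entry.path());
            }
            // file_type() does not follow symlinks, so a link cannot cause a loop.
            if entry.file_type()?.is_dir() {
                stack.push(entry.path());
            }
        }
    }
    hits.sort();
    Ok(hits)
}
```

Calling `find_offending_names(Path::new("someSourceDir"), b":")` before the `rsync` run would report `blah:foo` and any other offending paths immediately rather than after hours of copying.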

This leads me to suggest adding functions like these to the bstr API, to help determine whether file and directory names will be acceptable before they land on the backup storage destination:
bstr::Is_LTFS_Conformant()
bstr::Is_S3_Conformant()
bstr::Is_Google_Cloud_BitbucketName_Conformant()
bstr::Is_Optical_UDF_FS_Conformant()
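As a sketch of how such predicates could be written in user code, here are two hypothetical functions that take raw bytes (`&[u8]`, the kind of data bstr's byte-string types wrap) but use only `std`. The rule sets are assumptions for illustration, not taken from any specification: `is_ltfs_conformant` rejects empty names, invalid UTF-8, ':', '/' and NUL; `is_bmp_only` rejects any character above U+FFFF as a rough proxy for emoji. The actual LTFS specification and each cloud provider's naming documentation would have to be checked before relying on either.

```rust
/// Hypothetical rule set (assumption): the name must be non-empty,
/// valid UTF-8, and contain none of ':', '/' or NUL.
fn is_ltfs_conformant(name: &[u8]) -> bool {
    const FORBIDDEN: &[char] = &[':', '/', '\0'];
    match std::str::from_utf8(name) {
        Ok(s) => !s.is_empty() && !s.contains(FORBIDDEN),
        Err(_) => false, // invalid UTF-8 is treated as non-conformant
    }
}

/// Hypothetical "conservative cloud" rule (assumption): valid UTF-8 and every
/// character inside the Basic Multilingual Plane (U+0000..=U+FFFF).
/// Most emoji live above U+FFFF, but some (e.g. U+2764) would still pass.
fn is_bmp_only(name: &[u8]) -> bool {
    match std::str::from_utf8(name) {
        Ok(s) => s.chars().all(|c| (c as u32) <= 0xFFFF),
        Err(_) => false,
    }
}
```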

Oddly enough, the only workaround I've found is to tar someSourceDir and then rsync the tarball to the backup destination, but that's not always ideal. It would be better to find non-conformant file and directory names beforehand and propose auto-corrections that conform to each storage destination's allowed character set.
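The auto-correction idea could look roughly like the hypothetical `suggest_name` helper below. It decodes the name lossily with `String::from_utf8_lossy` (each invalid UTF-8 sequence becomes U+FFFD) and replaces every character rejected by a caller-supplied predicate with '_'. Both the replacement character and the predicate are assumptions; a real tool would derive them from the destination's documented rules.

```rust
use std::borrow::Cow;

/// Hypothetical auto-correction helper. Returns `Cow::Borrowed` only when the
/// input was valid UTF-8 and every character passed `is_allowed`; otherwise
/// it returns an owned string with each rejected character replaced by '_'.
fn suggest_name<'a>(name: &'a [u8], is_allowed: impl Fn(char) -> bool) -> Cow<'a, str> {
    // Invalid UTF-8 sequences become U+FFFD here.
    let decoded = String::from_utf8_lossy(name);
    if decoded.chars().all(&is_allowed) {
        return decoded;
    }
    Cow::Owned(
        decoded
            .chars()
            .map(|c| if is_allowed(c) { c } else { '_' })
            .collect(),
    )
}
```

One design caveat: different names can map to the same suggestion (with a predicate rejecting ':' and emoji, both `blah:foo` and `blah🎉foo` become `blah_foo`), so a real tool would need to check for collisions before renaming anything.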

Why mention this here? Because strings, bstr and paths are the usual go-to types for holding file and directory names. Having such helpers available would save everybody time and let their backups complete without errors.

Also, to prevent these kinds of interoperability issues in the first place, why not add new constraints to file system drivers that disallow emojis and other unwanted character ranges in file and directory names from the start? That is, opening or creating a file would reject ':', emojis and the like on every operating system. Get all the operating system vendors and storage providers to collaborate on this once and for all.

Thank you for listening. Cheers.

Originally posted by @omac777 in #40 (comment)


BurntSushi · 2022-07-06T13:04:51Z

@omac777 I'm pretty sure bstr is not the right place for weird cloud specific path validations. Now, bstr should let you write those path validations, but I don't think there's anything obviously actionable here. So I'm going to close this issue.

BurntSushi mentioned this issue Jul 6, 2022

RFC: 1.0 release? #40 (closed)

BurntSushi closed this as completed Jul 6, 2022

BurntSushi added the wontfix This will not be worked on label Jul 6, 2022
