add path predicates for various cloud systems #109

Closed
BurntSushi opened this issue Jul 6, 2022 · 1 comment
Labels
wontfix This will not be worked on

Comments

@BurntSushi
Owner

I have read and understand that you want bstr to allow for more possibilities around bytes:
- a string of bytes conforming to UTF-8
- a string of bytes not conforming to UTF-8, whose encoding is yet to be determined

Yesterday, an important and lengthy cloud backup failed to complete. Why? Because the name of a file or directory contained macOS emojis, and the backup's destination service does not accept those kinds of characters (it treats them as non-conformant). It's related because file systems on macOS/UNIX/BSD accept those character ranges, whether visible or invisible. Linux xfs/btrfs/ext4 don't have any issues with those character ranges either.

Here is a short, specific example of an error caused by non-conformance. It may not seem like a big thing, but it is, considering that it affects a company's data backup and delays the backup of the entire data set. Human intervention is necessary at this point.

rsync --archive someSourceDir/ /mnt/tapedrive/someDestinationDir/

Within some subdirectory, however, there is a file called "blah:foo". Notice the colon in there. The backup runs for hours, then errors out on the ':' character and does not back up the file in question. If it's a directory name with a colon, it does not back up the directory in question. In other words, it complicates matters. Why is that? LTFS-formatted tape file systems don't accept the ':' character in file or directory names. The emoji in a file or directory name triggered the same kind of error, this time not on tape backups but on cloud backups. These are not big deals on traditional UNIX-based file systems, but they are on anything that isn't one. I'm guessing AWS S3 and other cloud services have issues like this.

The above brings me to suggest adding these to the bstr API, to help determine whether file and directory names are going to be OK before they land on the backup storage destination:
bstr::Is_LTFS_Conformant()
bstr::Is_S3_Conformant()
bstr::Is_Google_Cloud_BitbucketName_Conformant()
bstr::Is_Optical_UDF_FS_Conformant()
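
To make the idea concrete, here is a rough sketch of what such a check could look like when written by hand over raw bytes, using only the standard library. Everything in it is an assumption for illustration: `check_name`, `NameProblem`, and the rule parameters are hypothetical names and are not part of bstr, and the only rules used are the ones mentioned in this issue (a forbidden ':' and a destination that rejects emojis, approximated here as "ASCII only"). The real rules for LTFS, S3, or UDF would have to come from their own specifications.

```rust
/// Why a name was rejected. Hypothetical type, not part of bstr.
#[derive(Debug, PartialEq)]
enum NameProblem {
    /// The name contains a byte the destination forbids, e.g. b':'.
    ForbiddenByte(u8),
    /// The name is not valid UTF-8.
    NotUtf8,
    /// The name is valid UTF-8 but contains non-ASCII characters (e.g. emojis).
    NonAscii,
}

/// Checks a single file or directory name (not a whole path) against an
/// assumed rule set: a list of forbidden bytes and an optional ASCII-only rule.
fn check_name(name: &[u8], forbidden: &[u8], ascii_only: bool) -> Result<(), NameProblem> {
    // Forbidden bytes are checked first so they are reported even if the
    // rest of the name is not valid UTF-8.
    if let Some(b) = name.iter().copied().find(|b| forbidden.contains(b)) {
        return Err(NameProblem::ForbiddenByte(b));
    }
    let text = std::str::from_utf8(name).map_err(|_| NameProblem::NotUtf8)?;
    if ascii_only && !text.is_ascii() {
        return Err(NameProblem::NonAscii);
    }
    Ok(())
}
```

Under these assumptions, `check_name(b"blah:foo", b":", false)` reports the colon, and a name containing an emoji is rejected when `ascii_only` is `true`. Running such a check over the source tree before starting rsync would surface the problem names up front instead of hours into the backup.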

Oddly enough, the only way I've found to circumvent these issues is to tar someSourceDir and then rsync the tarball to the destination backup storage, but that's not always ideal. It would be better to find those non-conformant file and directory names beforehand and propose some auto-correcting measures so that they conform to the character range the respective storage accepts.
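
As one possible shape for such an auto-correcting measure, the sketch below replaces every offending character with an underscore. It is only an illustration under assumptions: `sanitize_name` is a hypothetical helper (not a bstr function), the replacement character is arbitrary, and "offending" is taken to mean a caller-supplied forbidden byte, any non-ASCII character, or any invalid UTF-8 sequence.

```rust
/// Returns a copy of `name` with every forbidden or non-ASCII character
/// replaced by '_'. Hypothetical helper for illustration only.
fn sanitize_name(name: &[u8], forbidden: &[u8]) -> String {
    // Invalid UTF-8 sequences become U+FFFD here, which is non-ASCII and is
    // therefore replaced by '_' in the next step as well.
    String::from_utf8_lossy(name)
        .chars()
        .map(|c| {
            // `c as u8` is only evaluated for ASCII characters.
            let bad = !c.is_ascii() || forbidden.contains(&(c as u8));
            if bad { '_' } else { c }
        })
        .collect()
}
```

With `b":"` as the forbidden set, "blah:foo" becomes "blah_foo". Note that this kind of rewriting can make two different source names collide (for example "a:b" and "a_b"), so a real tool would need to detect collisions and keep a mapping back to the original names.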

Why am I mentioning it here? Because strings, bstr, and paths are the usual go-to types for holding file and directory names. It would be good to have such helper tools available so that everybody can save time and run their own backups without errors.

Also, to further prevent these kinds of interoperability issues, why not place new constraints in the filesystem drivers that disallow emojis and other unwanted character ranges in file and directory names from the beginning? That is, openfile/createfile would disallow ':', emojis, and such on every operating system. Get all the operating system and storage providers to collaborate on this once and for all.

Thank you for listening. Cheers.

Originally posted by @omac777 in #40 (comment)

@BurntSushi
Owner Author

@omac777 I'm pretty sure bstr is not the right place for weird cloud-specific path validations. Now, bstr should let you write those path validations, but I don't think there's anything obviously actionable here. So I'm going to close this issue.

@BurntSushi BurntSushi added the wontfix This will not be worked on label Jul 6, 2022