From 6e0ba1be0f370d0ad44bd6ff572e235c3f70f532 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Micha=C5=82=20Szczepanik?=
Date: Mon, 13 Dec 2021 00:21:00 +0100
Subject: [PATCH] Update filenaming.rst

---
 docs/intro/filenaming.rst | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/docs/intro/filenaming.rst b/docs/intro/filenaming.rst
index 8c7904ea8..6b24e6788 100644
--- a/docs/intro/filenaming.rst
+++ b/docs/intro/filenaming.rst
@@ -8,7 +8,7 @@ One of the most fundamental data management tasks is naming files.
 .. figure:: https://imgs.xkcd.com/comics/documents.png
 This may sound mundane to some --- "yeah, I guess I can't manage data if it doesn't have a file name, but what's the big deal? I already know that `I shouldn't use spaces in file names `_".
-Other people take it to the edge of `bikeshedding `_ an beyond: "Let's start each file name with the ISO 8601 date format of its first modification for better sorting!"
+Other people take it to the edge of `bikeshedding `_ and beyond: "Let's start each file name with the ISO 8601 date format of its first modification for better sorting!"
 But between the two extremes, there are a surprising amount of *interoperability* considerations --- practices and tips to ensure cross-platform compatibility --- in order to create file names that work on your own as well as on anyone else's machine.
 This matters because the files, directories, or repositories that you may at some point share with friends, colleagues, or the entire internet should successfully make it to those other people's computers, and on those computers they should be usable for their intended purpose.
@@ -62,7 +62,7 @@ If you can, try to avoid trouble with unicode
 Lucky are the people with boring names without accents and special characters.
 The others may have an extra bit of fun in their lives when software can not handle their names.
-Even though certain names look identical across file system or operating systems, their underlying unicode character sequences can differ.
+Even though certain names look identical across file systems or operating systems, their underlying unicode character sequences can differ.
 For example, the character "é" can be represented as the single Unicode character u+00E9 (latin small letter e with acute), or as the two Unicode characters u+0065 and u+0301 (the letter "e" plus a combining acute symbol).
 This is called `canonical equivalence `_ and can be confusing: While file names are visually indistinguishable, certain tools, operating systems, or file systems can normalize their underlying unicode differently and cause errors in the process.
 It becomes a problem, potentially even leading to permanent data loss, when `one tool or filesystem won't recognize a file anymore that has been normalized by a different tool or filesystem `_.
@@ -84,8 +84,8 @@ And places that ban certain file names will not be able to create, receive, or u
 Avoid illegal characters
 ========================
-Different operating system disallow certain characters in file names, and things will be messy if you were to share a file with a character that works on your machine with a machine that regards it as illegal.
-Let's start easy and with characters that you can actually find on your keyboard:
+Different operating systems disallow certain characters in file names, and things will be messy if you were to share a file with a character that works on your machine with a machine that regards it as illegal.
+Let's start easy, and with characters that you can actually find on your keyboard...
 On Unix systems, the forward slash ``/`` can not be used in file names.
 This is because this character is used to denote directory boundaries.
@@ -106,7 +106,7 @@ On Windows systems, on the other hand, there is quite a long list of characters:
 And in addition, its also not possible to end a file name with a period (``.``) or a space on Windows.
 Especially Unix users can thus inadvertently create files that a Windows system couldn't handle.
-But in the case of files with illegal names being committed into Git repositories, the consequences are actually more sever than just a single file that can't be used or copied.
+But in the case of files with illegal names being committed into Git repositories, the consequences are actually more severe than just a single file that can't be used or copied.
 A dataset with a file with invalid characters likely fails to be cloned, because a checkout of that file will fail.
 If a file with an invalid character exists on the non-default :term:`branch`, then that branch likely can't be checked out.
 So while having invalid characters in your files is 1) a considerably convoluted way of keeping a Git repository private from that one co-worker who uses Windows, it is mostly 2) a `major interoperability hassle `_, even more so in the context of version control.
@@ -149,15 +149,15 @@ Here are more examples on how to be nice to the system that receives your files:
 Keep file names below 260 characters
 ====================================
-If you want to annoy a foe with a Windows-based machine, or convince a Windows-friend to switch to Unix, send them files that exceed Window's (default) maximum path length limitation of 260 characters, or make them copy files with acceptable length into a too long, nested directory hierarchy [#f3]_.
+If you want to annoy a foe with a Windows-based machine, or convince a Windows-friend to switch to Unix, send them files that exceed Windows' (default) maximum path length limitation of 260 characters, or make them copy files with acceptable length into a too long, nested directory hierarchy [#f3]_.
 This limit exists for all machines running Windows before Windows 10 version 1607, and for all later versions of Windows if the maximum path limit has not been manually removed in the settings.
 For more gotchas that Windows users may run into, take a look at the section :ref:`ohnowindows`.
 Prevent paths to be interpreted as command line arguments
 =========================================================
-While its not "illegal" to start a directory of file name with a hyphen (``-``) its a bad idea and disallowed by certain tools due to security risks.
-In theory, a file name with a hyphen can clash with a command line argument, and a tool called to operate on that file may then misinterpret it as an argument.
+While it's not "illegal" to start a directory of file name with a hyphen (``-``), it's a bad idea, and doing so is disallowed by certain tools due to security risks.
+In theory, a file name starting with a hyphen can clash with a command line argument, and a tool called to operate on that file may then misinterpret it as an argument name.
 If you were to create a file called ``-n`` on a Unix system, an ``ls`` or ``cat`` on this file (unless you would add a ``./`` prefix to indicate a file in the current directory) would behave different than expected, parametrizing the command line tool instead of displaying any file information.
 Because this can be a security hazard, for example leading to remote code execution, `Git will refuse to operate on submodules that start with a hyphen (CVE-2018-17456) `_.
@@ -175,7 +175,7 @@ Much information and some general structure of this page is taken from `RFC 8493
 The links used throughout this overview provide details and further information for particular issues.
 A good general overview on how to name files can be found at `psychoinformatics-de.github.io/rdm-course/02-structuring-data/index.html `_.
-Do you know more? `We're eager to include your advice `_!
+Do you know more? `We're eager to include your advice `_!
 .. rubric:: Footnotes
@@ -183,4 +183,4 @@ Do you know more? `We're eager to include your advice