Showing 4 changed files with 272 additions and 247 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,6 +1,4 @@ | ||
Typing `man 7 glob` (on Debian) will tell you that long ago there was a | ||
program called `/etc/glob` that would expand wildcard patterns. | ||
This chapter will explain **file globbing**. Typing `man 7 glob` (on Debian) will tell you that long ago there was a program called `/etc/glob` that would expand *wildcard patterns*. Soon afterward, this became a shell built-in. | ||
|
||
Today the shell is responsible for `file globbing` (or | ||
dynamic filename generation). This chapter will explain `file globbing`. | ||
A string is a wildcard pattern if it contains `?`, `*` or `[`. *Globbing* (or dynamic filename generation) is the operation that expands a wildcard pattern into a list of pathnames that match the pattern. | ||
|
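As a minimal sketch of the definition above: the shell replaces the pattern with the matching names before the command starts, so the command only ever sees the expanded list. The scratch directory and the variable names `demo_dir`, `glob_count` and `glob_first` are assumptions made for this illustration.

```shell
# Work in a throwaway directory (hypothetical, created only for this demo).
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch file1 file2 file3

# The shell expands the pattern into separate words;
# 'set --' stores those words as the positional parameters.
set -- file*
glob_count=$#      # number of words the pattern expanded to
glob_first=$1      # first matching name

echo "$glob_count names, the first one is $glob_first"
```

Because the expansion happens in the shell, any command (`ls`, `echo`, `rm`, ...) receives the same list of names.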
305 changes: 164 additions & 141 deletions
modules/shell_globbing/030_shell_file_globbing_theory.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,145 +1,168 @@ | ||
## \* asterisk | ||
|
||
The asterisk `*` is interpreted by the shell as a sign to | ||
generate filenames, matching the asterisk to any combination of | ||
characters (even none). When no path is given, the shell will use | ||
filenames in the current directory. See the man page of | ||
`glob(7)` for more information. (This is part of LPI topic | ||
1.103.3.) | ||
|
||
[student@linux gen]$ ls | ||
file1 file2 file3 File4 File55 FileA fileab Fileab FileAB fileabc | ||
[student@linux gen]$ ls File* | ||
File4 File55 FileA Fileab FileAB | ||
[student@linux gen]$ ls file* | ||
file1 file2 file3 fileab fileabc | ||
[student@linux gen]$ ls *ile55 | ||
File55 | ||
[student@linux gen]$ ls F*ile55 | ||
File55 | ||
[student@linux gen]$ ls F*55 | ||
File55 | ||
[student@linux gen]$ | ||
|
||
## ? question mark | ||
|
||
Similar to the asterisk, the question mark `?` is | ||
interpreted by the shell as a sign to generate filenames, matching the | ||
question mark with exactly one character. | ||
|
||
[student@linux gen]$ ls | ||
file1 file2 file3 File4 File55 FileA fileab Fileab FileAB fileabc | ||
[student@linux gen]$ ls File? | ||
File4 FileA | ||
[student@linux gen]$ ls Fil?4 | ||
File4 | ||
[student@linux gen]$ ls Fil?? | ||
File4 FileA | ||
[student@linux gen]$ ls File?? | ||
File55 Fileab FileAB | ||
[student@linux gen]$ | ||
|
||
## \[\] square brackets | ||
|
||
The square bracket `[` is interpreted by the shell as a | ||
sign to generate filenames, matching any of the characters between `[` | ||
and the first subsequent `]`. The order in this list between the | ||
brackets is not important. Each pair of brackets is replaced by exactly | ||
one character. | ||
|
||
[student@linux gen]$ ls | ||
file1 file2 file3 File4 File55 FileA fileab Fileab FileAB fileabc | ||
[student@linux gen]$ ls File[5A] | ||
FileA | ||
[student@linux gen]$ ls File[A5] | ||
FileA | ||
[student@linux gen]$ ls File[A5][5b] | ||
File55 | ||
[student@linux gen]$ ls File[a5][5b] | ||
File55 Fileab | ||
[student@linux gen]$ ls File[a5][5b][abcdefghijklm] | ||
ls: File[a5][5b][abcdefghijklm]: No such file or directory | ||
[student@linux gen]$ ls file[a5][5b][abcdefghijklm] | ||
fileabc | ||
[student@linux gen]$ | ||
|
||
You can also exclude characters from a list between square brackets with | ||
the exclamation mark `!`. And you are allowed to make | ||
combinations of these `wild cards`. | ||
|
||
[student@linux gen]$ ls | ||
file1 file2 file3 File4 File55 FileA fileab Fileab FileAB fileabc | ||
[student@linux gen]$ ls file[a5][!Z] | ||
fileab | ||
[student@linux gen]$ ls file[!5]* | ||
file1 file2 file3 fileab fileabc | ||
[student@linux gen]$ ls file[!5]? | ||
fileab | ||
[student@linux gen]$ | ||
|
||
## a-z and 0-9 ranges | ||
|
||
The bash shell will also understand ranges of characters between | ||
brackets. | ||
|
||
[student@linux gen]$ ls | ||
file1 file3 File55 fileab FileAB fileabc | ||
file2 File4 FileA Fileab fileab2 | ||
[student@linux gen]$ ls file[a-z]* | ||
fileab fileab2 fileabc | ||
[student@linux gen]$ ls file[0-9] | ||
file1 file2 file3 | ||
[student@linux gen]$ ls file[a-z][a-z][0-9]* | ||
fileab2 | ||
[student@linux gen]$ | ||
|
||
## \$LANG and square brackets | ||
|
||
But, don\'t forget the influence of the `LANG` variable. | ||
Some languages include lower case letters in an upper case range (and | ||
vice versa). | ||
|
||
student@linux:~/test$ ls [A-Z]ile? | ||
file1 file2 file3 File4 | ||
student@linux:~/test$ ls [a-z]ile? | ||
file1 file2 file3 File4 | ||
student@linux:~/test$ echo $LANG | ||
en_US.UTF-8 | ||
student@linux:~/test$ LANG=C | ||
student@linux:~/test$ echo $LANG | ||
C | ||
student@linux:~/test$ ls [a-z]ile? | ||
file1 file2 file3 | ||
student@linux:~/test$ ls [A-Z]ile? | ||
File4 | ||
student@linux:~/test$ | ||
|
||
If `$LC_ALL` is set, then this will also need to be reset to prevent | ||
file globbing. | ||
## `*` asterisk | ||
|
||
The asterisk `*` is interpreted by the shell as a sign to generate filenames, matching the asterisk to any combination of characters (even none). When no path is given, the shell will use filenames in the current directory. See the man page of `glob(7)` for more information. | ||
|
||
```console | ||
student@linux:~/gen$ ls | ||
file1 file2 file3 File4 File55 FileA fileå fileab Fileab FileAB fileabc fileæ fileø filex filey filez | ||
student@linux:~/gen$ ls File* | ||
File4 File55 FileA Fileab FileAB | ||
student@linux:~/gen$ ls file* | ||
file1 file2 file3 fileå fileab fileabc fileæ fileø filex filey filez | ||
student@linux:~/gen$ ls *ile55 | ||
File55 | ||
student@linux:~/gen$ ls F*ile55 | ||
File55 | ||
student@linux:~/gen$ ls F*55 | ||
File55 | ||
``` | ||
|
||
## `?` question mark | ||
|
||
Similar to the asterisk, the question mark `?` is interpreted by the shell as a sign to generate filenames, matching the question mark with exactly one character. | ||
|
||
```console | ||
student@linux:~/gen$ ls File? | ||
File4 FileA | ||
student@linux:~/gen$ ls Fil?4 | ||
File4 | ||
student@linux:~/gen$ ls Fil?? | ||
File4 FileA | ||
student@linux:~/gen$ ls File?? | ||
File55 Fileab FileAB | ||
``` | ||
|
||
## `[]` square brackets | ||
|
||
The square bracket `[` is interpreted by the shell as a sign to generate filenames, matching any of the characters between `[` and the first subsequent `]`. The order in this list between the brackets is not important. Each pair of brackets is replaced by exactly one character. | ||
|
||
```console | ||
student@linux:~/gen$ ls File[5A] | ||
FileA | ||
student@linux:~/gen$ ls File[A5]3 | ||
ls: cannot access 'File[A5]3': No such file or directory | ||
student@linux:~/gen$ ls File[A5] | ||
FileA | ||
student@linux:~/gen$ ls File[A5][5b] | ||
File55 | ||
student@linux:~/gen$ ls File[a5][5b] | ||
File55 Fileab | ||
student@linux:~/gen$ ls File[a5][5b][abcdefghijklm] | ||
ls: cannot access 'File[a5][5b][abcdefghijklm]': No such file or directory | ||
student@linux:~/gen$ ls file[a5][5b][abcdefghijklm] | ||
fileabc | ||
``` | ||
|
||
You can also exclude characters from a list between square brackets with the exclamation mark `!`. And you are allowed to make combinations of these *wildcards*. | ||
|
||
```console | ||
student@linux:~/gen$ ls file[a5][!Z] | ||
fileab | ||
student@linux:~/gen$ ls file[!5]* | ||
file1 file2 file3 fileå fileab fileabc fileæ fileø filex filey filez | ||
student@linux:~/gen$ ls file[!5]? | ||
fileab | ||
``` | ||
|
||
## `a-z` and `0-9` ranges | ||
|
||
The bash shell will also understand ranges of characters between brackets. | ||
|
||
```console | ||
student@linux:~/gen$ ls file[a-z]* | ||
fileab fileabc filex filey filez | ||
student@linux:~/gen$ ls file[0-9] | ||
file1 file2 file3 | ||
student@linux:~/gen$ ls file[a-z][a-z][0-9]* | ||
ls: cannot access 'file[a-z][a-z][0-9]*': No such file or directory | ||
student@linux:~/gen$ ls file[a-z][a-z][a-z]* | ||
fileabc | ||
``` | ||
|
||
## named character classes | ||
|
||
Instead of ranges, you can also specify named character classes: `[[:alnum:]]`, `[[:alpha:]]`, `[[:blank:]]`, `[[:cntrl:]]`, `[[:digit:]]`, `[[:graph:]]`, `[[:lower:]]`, `[[:print:]]`, `[[:punct:]]`, `[[:space:]]`, `[[:upper:]]`, `[[:xdigit:]]`. For example, you can write `[[:lower:]]` instead of `[a-z]`. | ||
|
||
```console | ||
student@linux:~/gen$ ls file[a-z]* | ||
fileab fileabc filex filey filez | ||
student@linux:~/gen$ ls file[[:lower:]]* | ||
fileå fileab fileabc fileæ fileø filex filey filez | ||
``` | ||
|
||
Note that the named character classes work better for international characters. In the example above, `[a-z]` does not match the Danish characters `æ`, `ø`, and `å`, but `[[:lower:]]` does. | ||
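The other named classes are used in the same way. The sketch below tries three of them on a few file names that were made up for this illustration (`file1`, `file2`, `fileA`, `fileb`, `file_`). The scratch directory and variable names are assumptions as well.

```shell
# Throwaway directory with a handful of hypothetical file names.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch file1 file2 fileA fileb file_

# Each class stands for exactly one character, just like a range.
digits=$(echo file[[:digit:]])   # names ending in one digit
uppers=$(echo file[[:upper:]])   # names ending in one upper case letter
puncts=$(echo file[[:punct:]])   # names ending in one punctuation character

echo "digit: $digits"
echo "upper: $uppers"
echo "punct: $puncts"
```

The underscore counts as punctuation, so `file_` is the only name matched by `[[:punct:]]` here.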
|
||
## `$LANG` and square brackets | ||
|
||
But, don't forget the influence of the `$LANG` variable. Depending on the selected language or locale, the shell will interpret the square brackets and named character classes differently. Sort order may also be affected. | ||
|
||
For example, when we select the default locale called `C`: | ||
|
||
```console | ||
student@linux:~/gen$ sudo localectl set-locale C | ||
[... log out and log in again ...] | ||
student@linux:~/gen$ echo $LANG | ||
C | ||
student@linux:~/gen$ ls | ||
File4 File55 FileA FileAB Fileab file1 file2 file3 fileab fileabc filex filey filez 'file'$'\303\245' 'file'$'\303\246' 'file'$'\303\270' | ||
student@linux:~/gen$ ls file[[:lower:]]* | ||
fileab fileabc filex filey filez | ||
``` | ||
|
||
The Danish characters can't be displayed properly and don't match the `[[:lower:]]` character class. | ||
|
||
Let us change the locale to `da_DK.UTF-8` (Danish/Denmark with UTF-8 support) and see what happens: | ||
|
||
```console | ||
student@linux:~/gen$ sudo localectl set-locale da_DK.UTF-8 | ||
[... log out and log in again ...] | ||
student@linux:~/gen$ echo $LANG | ||
da_DK.UTF-8 | ||
student@linux:~/gen$ ls | ||
file1 file2 file3 File4 File55 FileA FileAB Fileab fileab fileabc filex filey filez fileæ fileø fileå | ||
student@linux:~/gen$ ls file[[:lower:]]* | ||
fileab fileabc filex filey filez fileæ fileø fileå | ||
``` | ||
|
||
Now the Danish characters are displayed properly and match the `[[:lower:]]` character class. | ||
|
||
In the `en_US.UTF-8` locale (US English, with UTF-8 support), the Danish characters are displayed properly, and also match the `[[:lower:]]` character class. However, they are sorted differently: | ||
|
||
```console | ||
student@linux:~/gen$ sudo localectl set-locale en_US.UTF-8 | ||
[... log out and log in again ...] | ||
student@linux:~/gen$ echo $LANG | ||
en_US.UTF-8 | ||
student@linux:~/gen$ ls | ||
file1 file2 file3 File4 File55 FileA fileå fileab Fileab FileAB fileabc fileæ fileø filex filey filez | ||
student@linux:~/gen$ ls file[[:lower:]]* | ||
fileå fileab fileabc fileæ fileø filex filey filez | ||
``` | ||
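For a quick experiment, the locale can also be changed for the current shell only, without `localectl` and without logging out. The following is a sketch under that assumption: it sets `LC_ALL` (which takes precedence over `LANG`) to `C`, the one locale that should be available on every system. The directory and variable names are hypothetical.

```shell
# Throwaway directory with one ASCII and one Danish file name.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch filex fileå

# LC_ALL overrides LANG and the other LC_* variables for this shell.
LC_ALL=C
export LC_ALL

# In the C locale only ASCII letters belong to [[:lower:]],
# so the name ending in 'å' is not matched.
c_matches=$(echo file[[:lower:]])
echo "C locale: $c_matches"
```

Running the same `echo` with `LC_ALL` set to a UTF-8 locale that is installed on your system should list both names, as in the `localectl` examples above.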
|
||
## preventing file globbing | ||
|
||
The screenshot below should be no surprise. The `echo *` | ||
will echo a \* when in an empty directory. And it will echo the names of | ||
all files when the directory is not empty. | ||
|
||
student@linux:~$ mkdir test42 | ||
student@linux:~$ cd test42 | ||
student@linux:~/test42$ echo * | ||
* | ||
student@linux:~/test42$ touch file42 file33 | ||
student@linux:~/test42$ echo * | ||
file33 file42 | ||
|
||
Globbing can be prevented using quotes or by escaping the | ||
special characters, as shown in this screenshot. | ||
|
||
student@linux:~/test42$ echo * | ||
file33 file42 | ||
student@linux:~/test42$ echo \* | ||
* | ||
student@linux:~/test42$ echo '*' | ||
* | ||
student@linux:~/test42$ echo "*" | ||
* | ||
If a wildcard pattern does not match any filenames, the shell will not expand the pattern. Consequently, when in an empty directory, `echo *` will display a `*`. It will echo the names of all files when the directory is not empty. | ||
|
||
```console | ||
student@linux:~$ mkdir test42 | ||
student@linux:~$ cd test42/ | ||
student@linux:~/test42$ echo * | ||
* | ||
student@linux:~/test42$ touch test{1,2,3} | ||
student@linux:~/test42$ echo * | ||
test1 test2 test3 | ||
``` | ||
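This behaviour matters in scripts: a `for` loop over a pattern that matches nothing still runs once, with the literal pattern as its value. A common way to guard against this, sketched below with hypothetical `*.log` names, is to test whether each name actually exists.

```shell
# Start in an empty throwaway directory.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

count=0
for name in *.log; do
    # No match: 'name' holds the literal string '*.log', which is not a file.
    [ -e "$name" ] || continue
    count=$((count + 1))
done
empty_count=$count

# Create two matching files and run the same loop again.
touch a.log b.log
count=0
for name in *.log; do
    [ -e "$name" ] || continue
    count=$((count + 1))
done
full_count=$count

echo "empty directory: $empty_count, after touch: $full_count"
```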
|
||
Globbing can be prevented by quoting the special characters or by escaping them with a backslash, as shown below. | ||
|
||
```console | ||
student@linux:~/test42$ echo * | ||
test1 test2 test3 | ||
student@linux:~/test42$ echo \* | ||
* | ||
student@linux:~/test42$ echo '*' | ||
* | ||
student@linux:~/test42$ echo "*" | ||
* | ||
``` | ||
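The same rule applies when the pattern is stored in a variable. In this sketch (the variable names are assumptions for the illustration), the double-quoted expansion keeps the pattern literal, while the unquoted expansion is globbed by the shell.

```shell
# Throwaway directory with the same file names as above.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch test1 test2 test3

pattern='test*'              # single quotes: the pattern is stored literally

quoted=$(echo "$pattern")    # double quotes: no globbing
unquoted=$(echo $pattern)    # no quotes: the shell globs the expanded value

echo "quoted:   $quoted"
echo "unquoted: $unquoted"
```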
|