Review module shell_globbing
bertvv committed Oct 28, 2024
1 parent 17693d0 commit 6de5c67
Showing 4 changed files with 272 additions and 247 deletions.
6 changes: 2 additions & 4 deletions modules/shell_globbing/020_shell_file_globbing_about.md
@@ -1,6 +1,4 @@
Typing `man 7 glob` (on Debian) will tell you that long ago there was a
program called `/etc/glob` that would expand wildcard patterns.
This chapter will explain **file globbing**. Typing `man 7 glob` (on Debian) will tell you that long ago there was a program called `/etc/glob` that would expand *wildcard patterns*. Soon afterward, this became a shell built-in.

Today the shell is responsible for `file globbing` (or
dynamic filename generation). This chapter will explain `file globbing`.
A string is a wildcard pattern if it contains `?`, `*` or `[`. *Globbing* (or dynamic filename generation) is the operation that expands a wildcard pattern into a list of pathnames that match the pattern.

305 changes: 164 additions & 141 deletions modules/shell_globbing/030_shell_file_globbing_theory.md
Original file line number Diff line number Diff line change
@@ -1,145 +1,168 @@
## \* asterisk

The asterisk `*` is interpreted by the shell as a sign to
generate filenames, matching the asterisk to any combination of
characters (even none). When no path is given, the shell will use
filenames in the current directory. See the man page of
`glob(7)` for more information. (This is part of LPI topic
1.103.3.)

[student@linux gen]$ ls
file1 file2 file3 File4 File55 FileA fileab Fileab FileAB fileabc
[student@linux gen]$ ls File*
File4 File55 FileA Fileab FileAB
[student@linux gen]$ ls file*
file1 file2 file3 fileab fileabc
[student@linux gen]$ ls *ile55
File55
[student@linux gen]$ ls F*ile55
File55
[student@linux gen]$ ls F*55
File55
[student@linux gen]$

## ? question mark

Similar to the asterisk, the question mark `?` is
interpreted by the shell as a sign to generate filenames, matching the
question mark with exactly one character.

[student@linux gen]$ ls
file1 file2 file3 File4 File55 FileA fileab Fileab FileAB fileabc
[student@linux gen]$ ls File?
File4 FileA
[student@linux gen]$ ls Fil?4
File4
[student@linux gen]$ ls Fil??
File4 FileA
[student@linux gen]$ ls File??
File55 Fileab FileAB
[student@linux gen]$

## \[\] square brackets

The square bracket `[` is interpreted by the shell as a
sign to generate filenames, matching any of the characters between `[`
and the first subsequent `]`. The order in this list between the
brackets is not important. Each pair of brackets is replaced by exactly
one character.

[student@linux gen]$ ls
file1 file2 file3 File4 File55 FileA fileab Fileab FileAB fileabc
[student@linux gen]$ ls File[5A]
FileA
[student@linux gen]$ ls File[A5]
FileA
[student@linux gen]$ ls File[A5][5b]
File55
[student@linux gen]$ ls File[a5][5b]
File55 Fileab
[student@linux gen]$ ls File[a5][5b][abcdefghijklm]
ls: File[a5][5b][abcdefghijklm]: No such file or directory
[student@linux gen]$ ls file[a5][5b][abcdefghijklm]
fileabc
[student@linux gen]$

You can also exclude characters from a list between square brackets with
the exclamation mark `!`. And you are allowed to make
combinations of these `wild cards`.

[student@linux gen]$ ls
file1 file2 file3 File4 File55 FileA fileab Fileab FileAB fileabc
[student@linux gen]$ ls file[a5][!Z]
fileab
[student@linux gen]$ ls file[!5]*
file1 file2 file3 fileab fileabc
[student@linux gen]$ ls file[!5]?
fileab
[student@linux gen]$

## a-z and 0-9 ranges

The bash shell will also understand ranges of characters between
brackets.

[student@linux gen]$ ls
file1 file3 File55 fileab FileAB fileabc
file2 File4 FileA Fileab fileab2
[student@linux gen]$ ls file[a-z]*
fileab fileab2 fileabc
[student@linux gen]$ ls file[0-9]
file1 file2 file3
[student@linux gen]$ ls file[a-z][a-z][0-9]*
fileab2
[student@linux gen]$

## \$LANG and square brackets

But, don\'t forget the influence of the `LANG` variable.
Some languages include lower case letters in an upper case range (and
vice versa).

student@linux:~/test$ ls [A-Z]ile?
file1 file2 file3 File4
student@linux:~/test$ ls [a-z]ile?
file1 file2 file3 File4
student@linux:~/test$ echo $LANG
en_US.UTF-8
student@linux:~/test$ LANG=C
student@linux:~/test$ echo $LANG
C
student@linux:~/test$ ls [a-z]ile?
file1 file2 file3
student@linux:~/test$ ls [A-Z]ile?
File4
student@linux:~/test$

If `$LC_ALL` is set, then this will also need to be reset to prevent
file globbing.
## `*` asterisk

The asterisk `*` is interpreted by the shell as a sign to generate filenames, matching the asterisk to any combination of characters (even none). When no path is given, the shell will use filenames in the current directory. See the man page of `glob(7)` for more information.

```console
student@linux:~/gen$ ls
file1 file2 file3 File4 File55 FileA fileå fileab Fileab FileAB fileabc fileæ fileø filex filey filez
student@linux:~/gen$ ls File*
File4 File55 FileA Fileab FileAB
student@linux:~/gen$ ls file*
file1 file2 file3 fileå fileab fileabc fileæ fileø filex filey filez
student@linux:~/gen$ ls *ile55
File55
student@linux:~/gen$ ls F*ile55
File55
student@linux:~/gen$ ls F*55
File55
```
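
As a small runnable sketch of the two points above (that `*` can stand for zero characters, and that a directory prefix changes where the shell looks), the script below works in a throwaway directory. The directory layout and the names `file`, `sub` and `note` are made up for this illustration.

```bash
# Work in a scratch directory so no real files are touched.
scratch=$(mktemp -d)
mkdir "$scratch/sub"
touch "$scratch/file" "$scratch/file1" "$scratch/fileab" "$scratch/sub/note"
cd "$scratch" || exit 1

# "file*" also matches the bare name "file": * can stand for zero characters.
in_cwd=$(echo file*)
echo "$in_cwd"          # file file1 fileab

# With a directory prefix, the pattern is matched inside that directory.
in_sub=$(echo sub/*)
echo "$in_sub"          # sub/note

cd / && rm -rf "$scratch"
```

Capturing the result of `echo` in a variable is only done here to make the outcome easy to inspect; in practice the pattern is usually passed straight to a command such as `ls`.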

## `?` question mark

Similar to the asterisk, the question mark `?` is interpreted by the shell as a sign to generate filenames, matching the question mark with exactly one character.

```console
student@linux:~/gen$ ls File?
File4 FileA
student@linux:~/gen$ ls Fil?4
File4
student@linux:~/gen$ ls Fil??
File4 FileA
student@linux:~/gen$ ls File??
File55 Fileab FileAB
```

## `[]` square brackets

The square bracket `[` is interpreted by the shell as a sign to generate filenames, matching any of the characters between `[` and the first subsequent `]`. The order in this list between the brackets is not important. Each pair of brackets is replaced by exactly one character.

```console
student@linux:~/gen$ ls File[5A]
FileA
student@linux:~/gen$ ls File[A5]3
ls: cannot access 'File[A5]3': No such file or directory
student@linux:~/gen$ ls File[A5]
FileA
student@linux:~/gen$ ls File[A5][5b]
File55
student@linux:~/gen$ ls File[a5][5b]
File55 Fileab
student@linux:~/gen$ ls File[a5][5b][abcdefghijklm]
ls: cannot access 'File[a5][5b][abcdefghijklm]': No such file or directory
student@linux:~/gen$ ls file[a5][5b][abcdefghijklm]
fileabc
```

You can also exclude characters from a list between square brackets with the exclamation mark `!`. And you are allowed to make combinations of these *wildcards*.

```console
student@linux:~/gen$ ls file[a5][!Z]
fileab
student@linux:~/gen$ ls file[!5]*
file1 file2 file3 fileå fileab fileabc fileæ fileø filex filey filez
student@linux:~/gen$ ls file[!5]?
fileab
```
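
The exclusion syntax can be tried out in isolation with a short script. This is a minimal sketch with hypothetical file names, run in a temporary directory:

```bash
# Work in a scratch directory so no real files are touched.
scratch=$(mktemp -d)
cd "$scratch" || exit 1
touch file1 file2 file5 fileab

# [!5] matches exactly one character that is not 5.
not_five=$(echo file[!5])
echo "$not_five"        # file1 file2

# Combined with ?, the name must be exactly six characters long.
two_more=$(echo file[!5]?)
echo "$two_more"        # fileab

cd / && rm -rf "$scratch"
```

Note that `fileab` is absent from the first result: `[!5]` still consumes exactly one character, so only five-character names can match `file[!5]`.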

## `a-z` and `0-9` ranges

The bash shell will also understand ranges of characters between brackets.

```console
student@linux:~/gen$ ls file[a-z]*
fileab fileabc filex filey filez
student@linux:~/gen$ ls file[0-9]
file1 file2 file3
student@linux:~/gen$ ls file[a-z][a-z][0-9]*
ls: cannot access 'file[a-z][a-z][0-9]*': No such file or directory
student@linux:~/gen$ ls file[a-z][a-z][a-z]*
fileabc
```

## named character classes

Instead of ranges, you can also specify named character classes: `[[:alnum:]]`, `[[:alpha:]]`, `[[:blank:]]`, `[[:cntrl:]]`, `[[:digit:]]`, `[[:graph:]]`, `[[:lower:]]`, `[[:print:]]`, `[[:punct:]]`, `[[:space:]]`, `[[:upper:]]`, `[[:xdigit:]]`. For example, instead of `[a-z]`, you can use `[[:lower:]]`.

```console
student@linux:~/gen$ ls file[a-z]*
fileab fileabc filex filey filez
student@linux:~/gen$ ls file[[:lower:]]*
fileå fileab fileabc fileæ fileø filex filey filez
```

Note that the named character classes work better for international characters. In the example above, `[a-z]` does not match the Danish characters `æ`, `ø`, and `å`, but `[[:lower:]]` does.
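
A few of the other classes can be explored the same way. The sketch below uses made-up, ASCII-only file names in a temporary directory, so the result should not depend on the locale in use:

```bash
# Work in a scratch directory so no real files are touched.
scratch=$(mktemp -d)
cd "$scratch" || exit 1
touch file1 fileA fileb file_

# One decimal digit after "file".
digits=$(echo file[[:digit:]])
echo "$digits"          # file1

# One upper case letter after "file".
uppers=$(echo file[[:upper:]])
echo "$uppers"          # fileA

# A class can be negated inside the brackets, just like a character list.
others=$(echo file[![:alnum:]])
echo "$others"          # file_

cd / && rm -rf "$scratch"
```

The outer brackets delimit the bracket expression and the inner `[:name:]` selects the class, which is why the negation `!` goes between the two opening brackets.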

## `$LANG` and square brackets

Keep in mind the influence of the `$LANG` variable. Depending on the selected language or locale, the shell will interpret the square brackets and named character classes differently. Sort order may also be affected.

For example, when we select the default locale called `C`:

```console
student@linux:~/gen$ sudo localectl set-locale C
[... log out and log in again ...]
student@linux:~/gen$ echo $LANG
C
student@linux:~/gen$ ls
File4 File55 FileA FileAB Fileab file1 file2 file3 fileab fileabc filex filey filez 'file'$'\303\245' 'file'$'\303\246' 'file'$'\303\270'
student@linux:~/gen$ ls file[[:lower:]]*
fileab fileabc filex filey filez
```

The Danish characters can't be displayed properly and don't match the `[[:lower:]]` character class.

Let us change the locale to `da_DK.UTF-8` (Danish/Denmark with UTF-8 support) and see what happens:

```console
student@linux:~/gen$ sudo localectl set-locale da_DK.UTF-8
[... log out and log in again ...]
student@linux:~/gen$ echo $LANG
da_DK.UTF-8
student@linux:~/gen$ ls
file1 file2 file3 File4 File55 FileA FileAB Fileab fileab fileabc filex filey filez fileæ fileø fileå
student@linux:~/gen$ ls file[[:lower:]]*
fileab fileabc filex filey filez fileæ fileø fileå
```

Now the Danish characters are displayed properly and match the `[[:lower:]]` character class.

In the `en_US.UTF-8` locale (US English, with UTF-8 support), the Danish characters are displayed properly, and also match the `[[:lower:]]` character class. However, they are sorted differently:

```console
student@linux:~/gen$ sudo localectl set-locale en_US.UTF-8
[... log out and log in again ...]
student@linux:~/gen$ echo $LANG
en_US.UTF-8
student@linux:~/gen$ ls
file1 file2 file3 File4 File55 FileA fileå fileab Fileab FileAB fileabc fileæ fileø filex filey filez
student@linux:~/gen$ ls file[[:lower:]]*
fileå fileab fileabc fileæ fileø filex filey filez
```
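
Changing the system locale with `localectl` is not the only way to observe this. As a rough sketch, the script below switches to the `C` locale for a single pattern by assigning `LC_ALL` inside a command substitution, so the rest of the session is unaffected. It assumes the script itself is saved as UTF-8; the file names are made up for the illustration.

```bash
# Work in a scratch directory so no real files are touched.
scratch=$(mktemp -d)
cd "$scratch" || exit 1
touch fileab filex fileå

# The assignment only lives inside the $( ... ) subshell.
# In the C locale, the bytes that encode å are not lower case letters.
c_matches=$(LC_ALL=C; echo file[[:lower:]]*)
echo "$c_matches"       # fileab filex

cd / && rm -rf "$scratch"
```

The assignment is a separate command here, ahead of the `echo`. A prefix assignment such as `LC_ALL=C ls file[[:lower:]]*` would generally come too late for this purpose, because the shell expands the pattern before it applies the assignment to the command.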

## preventing file globbing

The screenshot below should be no surprise. The `echo *`
will echo a \* when in an empty directory. And it will echo the names of
all files when the directory is not empty.

student@linux:~$ mkdir test42
student@linux:~$ cd test42
student@linux:~/test42$ echo *
*
student@linux:~/test42$ touch file42 file33
student@linux:~/test42$ echo *
file33 file42

Globbing can be prevented using quotes or by escaping the
special characters, as shown in this screenshot.

student@linux:~/test42$ echo *
file33 file42
student@linux:~/test42$ echo \*
*
student@linux:~/test42$ echo '*'
*
student@linux:~/test42$ echo "*"
*
If a wildcard pattern does not match any filenames, the shell will, by default, leave the pattern unchanged. Consequently, in an empty directory, `echo *` will display a literal `*`. When the directory is not empty, it will display the names of all (non-hidden) files.

```console
student@linux:~$ mkdir test42
student@linux:~$ cd test42/
student@linux:~/test42$ echo *
*
student@linux:~/test42$ touch test{1,2,3}
student@linux:~/test42$ echo *
test1 test2 test3
```

Globbing can be prevented using quotes or by escaping the special characters, as shown in this example.

```console
student@linux:~/test42$ echo *
test1 test2 test3
student@linux:~/test42$ echo \*
*
student@linux:~/test42$ echo '*'
*
student@linux:~/test42$ echo "*"
*
```
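
Quoting matters just as much when the pattern is stored in a variable. The following sketch (the variable name `pattern` is arbitrary) compares an unquoted and a quoted expansion, and also shows the standard `set -f` option, which switches globbing off altogether until `set +f` is given:

```bash
# Work in a scratch directory so no real files are touched.
scratch=$(mktemp -d)
cd "$scratch" || exit 1
touch test1 test2

pattern='test*'

# Unquoted: the value of the variable is subject to globbing.
expanded=$(echo $pattern)
echo "$expanded"        # test1 test2

# Quoted: the pattern is passed through literally.
literal=$(echo "$pattern")
echo "$literal"         # test*

# set -f disables globbing entirely; set +f turns it back on.
set -f
disabled=$(echo test*)
set +f
echo "$disabled"        # test*

cd / && rm -rf "$scratch"
```

Quoting is usually the better choice in scripts, since it protects a single word; `set -f` affects every pattern until it is reverted.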
