Regular expression concept with exercise
Fixes #2037

* Concept: regular expressions (in Go)
* Concept exercise

The first four tasks in the concept exercise are based on tasks in the corresponding C# exercise.
Showing 14 changed files with 833 additions and 0 deletions.
@@ -0,0 +1,5 @@

```json
{
  "blurb": "Regular expressions",
  "authors": ["norbs57"],
  "contributors": []
}
```
@@ -0,0 +1,125 @@

# About

Package [regexp][package-regexp] offers support for regular expressions in Go.

## Syntax

The [syntax][regexp-syntax] of the regular expressions accepted is the same general syntax used by Perl, Python, and other languages.
More precisely, it is the [syntax accepted by RE2][re2-syntax] except for `\C`.

All characters are UTF-8-encoded code points.
Following `utf8.DecodeRune`, each byte of an invalid UTF-8 sequence is treated as if it encoded `utf8.RuneError` (U+FFFD).

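To illustrate the point about code points, the sketch below uses a hypothetical helper `matchesOneRune` (not part of the package) that checks whether a string consists of exactly one character. The pattern `.` consumes one code point, so the two-byte string `"\u00e9"` (é) matches, and a lone invalid byte is consumed as a single `utf8.RuneError`.

```go
package main

import (
	"fmt"
	"regexp"
)

// oneRune matches input that consists of exactly one character (code point).
var oneRune = regexp.MustCompile(`^.$`)

// matchesOneRune is a hypothetical helper used only for this illustration.
func matchesOneRune(s string) bool {
	return oneRune.MatchString(s)
}

func main() {
	fmt.Println(len("\u00e9"), matchesOneRune("\u00e9")) // 2 true: two bytes, one code point
	fmt.Println(len("\xff"), matchesOneRune("\xff"))     // 1 true: invalid byte read as U+FFFD
	fmt.Println(matchesOneRune("ab"))                    // false: two code points
}
```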
It is convenient to write regular expressions as [raw string literals][raw-string-literals] enclosed by backticks.
This avoids having to escape the backslashes.

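For example, the two patterns below describe the same regular expression: in the interpreted string literal every backslash has to be doubled, while the raw string literal is written exactly as the regexp syntax expects. The pattern itself is only an illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// Interpreted string literal: each backslash must be escaped.
	interpreted = regexp.MustCompile("\\d+\\.\\d+")
	// Raw string literal: the backslashes are taken literally.
	raw = regexp.MustCompile(`\d+\.\d+`)
)

func main() {
	fmt.Println(interpreted.String() == raw.String())    // true
	fmt.Println(raw.FindString("version 1.21 released")) // 1.21
}
```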
## Type `Regexp`

The package defines a type `Regexp` for compiled regular expressions.
Function `regexp.Compile` compiles a string into a regular expression.
The function returns `nil` and an error if compilation fails:

```go
re, err := regexp.Compile(`(a|b)+`)
fmt.Println(re, err) // => (a|b)+ <nil>
re, err = regexp.Compile(`a|b)+`)
fmt.Println(re, err) // => <nil> error parsing regexp: unexpected ): `a|b)+`
```

Function `MustCompile` is like `Compile` but panics if the expression cannot be parsed.
It simplifies safe initialization of global variables holding compiled regular expressions.

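A common way to use `MustCompile` is sketched below: the expression is compiled once into a package-level variable and then reused. According to the package documentation, a `Regexp` is safe for concurrent use by multiple goroutines (except for configuration methods such as `Longest`), so sharing one compiled value is fine. The names `isoDate` and `yearOf` are made up for this example.

```go
package main

import (
	"fmt"
	"regexp"
)

// isoDate is compiled once, during package initialization. If the pattern
// had a syntax error, MustCompile would panic at start-up, so the mistake
// surfaces immediately rather than on first use.
var isoDate = regexp.MustCompile(`(\d{4})-(\d{2})-(\d{2})`)

// yearOf is a hypothetical helper that returns the year of the first
// ISO-style date found in text, or "" if there is none.
func yearOf(text string) string {
	m := isoDate.FindStringSubmatch(text)
	if m == nil {
		return ""
	}
	return m[1]
}

func main() {
	fmt.Println(yearOf("released on 2021-08-30")) // 2021
	fmt.Println(yearOf("no date here") == "")     // true
}
```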
There are 16 methods of `Regexp` that match a regular expression and identify the matched text.
Their names are matched by this regular expression:

```text
Find(All)?(String)?(Submatch)?(Index)?
```

* If `All` is present, the routine matches successive non-overlapping matches of the entire expression.
* If `String` is present, the argument is a string; otherwise it is a slice of bytes; return values are adjusted as appropriate.
* If `Submatch` is present, the return value is a slice identifying the successive submatches of the expression.
* If `Index` is present, matches and submatches are identified by byte index pairs within the input string.

There are also methods for:

* replacing matches of regular expressions with replacement strings, and
* splitting strings separated by regular expressions.

All in all, the `regexp` package defines more than 40 functions and methods.
We will demonstrate the use of a few methods below.
Please see the [API documentation][package-regexp] for details of these and other functions.

## Examples

Method `MatchString` reports whether a string contains any match of a regular expression.

```go
re = regexp.MustCompile(`[a-z]+\d*`)
b = re.MatchString("[a12]") // => true
b = re.MatchString("12abc34(ef)") // => true
b = re.MatchString(" abc!") // => true
b = re.MatchString("123 456") // => false
```

Method `FindString` returns a string holding the text of the leftmost match of the regular expression.

```go
re = regexp.MustCompile(`[a-z]+\d*`)
s = re.FindString("[a12]") // => "a12"
s = re.FindString("12abc34(ef)") // => "abc34"
s = re.FindString(" abc!") // => "abc"
s = re.FindString("123 456") // => ""
```

Method `FindStringSubmatch` returns a slice of strings holding the text of the leftmost match of the regular expression and the matches, if any, of its subexpressions.
This can be used to identify the strings matching capturing groups.
A return value of `nil` indicates no match.

```go
re = regexp.MustCompile(`[a-z]+(\d*)`)
sl = re.FindStringSubmatch("[a12]") // => []string{"a12","12"}
sl = re.FindStringSubmatch("12abc34(ef)") // => []string{"abc34","34"}
sl = re.FindStringSubmatch(" abc!") // => []string{"abc",""}
sl = re.FindStringSubmatch("123 456") // => nil
```

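Capturing groups can also be named with the `(?P<name>...)` syntax. The sketch below uses a variant of the pattern above in which both parts are named groups, and looks up each submatch by name via `SubexpIndex`, which returns -1 for an unknown name. `SubexpIndex` was added in Go 1.15, so this assumes a reasonably recent toolchain.

```go
package main

import (
	"fmt"
	"regexp"
)

// Both parts of the match are captured, under the names "word" and "num".
var re = regexp.MustCompile(`(?P<word>[a-z]+)(?P<num>\d*)`)

func main() {
	m := re.FindStringSubmatch("12abc34(ef)")
	fmt.Printf("%q\n", m)                  // ["abc34" "abc" "34"]
	fmt.Printf("%q\n", re.SubexpNames())   // ["" "word" "num"]
	fmt.Println(m[re.SubexpIndex("word")]) // abc
	fmt.Println(m[re.SubexpIndex("num")])  // 34
	fmt.Println(re.SubexpIndex("missing")) // -1
}
```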
Method `re.ReplaceAllString(src, repl)` returns a copy of `src`, replacing matches of the regular expression `re` with the replacement string `repl`.

```go
re = regexp.MustCompile(`[a-z]+\d*`)
s = re.ReplaceAllString("[a12]", "X") // => "[X]"
s = re.ReplaceAllString("12abc34(ef)", "X") // => "12X(X)"
s = re.ReplaceAllString(" abc!", "X") // => " X!"
s = re.ReplaceAllString("123 456", "X") // => "123 456"
```

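The replacement string is not always copied verbatim: `ReplaceAllString` expands `$1`, `${name}` and similar references to capturing groups, whereas `ReplaceAllLiteralString` inserts the replacement as-is. In the `$name` form the name is taken to be as long as possible, so `$1x` means `${1x}`, not `${1}x`, and a reference to a group that does not exist expands to the empty string. The sketch below shows all three cases.

```go
package main

import (
	"fmt"
	"regexp"
)

// Group 1 captures the letters, group 2 the (possibly empty) digits.
var re = regexp.MustCompile(`([a-z]+)(\d*)`)

const src = "12abc34(ef)"

func main() {
	// ${2}${1} swaps the groups; the braces delimit each group number.
	fmt.Println(re.ReplaceAllString(src, "${2}${1}")) // 1234abc(ef)

	// $1x is read as ${1x}, a group that does not exist, so it expands to "".
	fmt.Println(re.ReplaceAllString(src, "$1x")) // 12()

	// No expansion at all: "$1" is inserted literally.
	fmt.Println(re.ReplaceAllLiteralString(src, "$1")) // 12$1($1)
}
```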
Method `re.Split(s, n)` slices a text `s` into substrings separated by the expression and returns a slice of the substrings between those expression matches.
The count `n` determines the maximum number of substrings to return.
If `n < 0`, the method returns all substrings.

```go
re = regexp.MustCompile(`[a-z]+\d*`)
sl = re.Split("[a12]", -1) // => []string{"[","]"}
sl = re.Split("12abc34(ef)", 2) // => []string{"12","(ef)"}
sl = re.Split(" abc!", -1) // => []string{" ","!"}
sl = re.Split("123 456", -1) // => []string{"123 456"}
```

## Performance

The regexp implementation provided by this package is guaranteed to run in
[time linear in the size of the input][re2-performance].

## Caveat

As mentioned earlier, the `regexp` package implements RE2 regular expressions (except for `\C`).
The syntax is largely compatible with PCRE ("Perl Compatible Regular Expressions"), but there are some differences.
Please see the "Caveat section" in [this article][reg-exp-wild] for details.

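Two of the best-known differences are that RE2 supports neither backreferences nor lookaround assertions. The sketch below uses a hypothetical helper `supported` to show that `regexp.Compile` reports an error for such patterns instead of accepting them.

```go
package main

import (
	"fmt"
	"regexp"
)

// supported is a hypothetical helper: it reports whether Go's regexp
// package accepts pattern.
func supported(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err == nil
}

func main() {
	for _, pattern := range []string{
		`(a)\1`,      // backreference: accepted by PCRE, rejected by RE2
		`foo(?=bar)`, // lookahead: accepted by PCRE, rejected by RE2
		`(a|b)+`,     // ordinary alternation and repetition: accepted
	} {
		fmt.Printf("%-12s supported=%v\n", pattern, supported(pattern))
	}
}
```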
[raw-string-literals]: https://yourbasic.org/golang/regexp-cheat-sheet/#raw-strings
[package-regexp]: https://pkg.go.dev/regexp
[regexp-syntax]: https://pkg.go.dev/regexp/syntax
[re2-syntax]: https://golang.org/s/re2syntax
[reg-exp-wild]: https://swtch.com/~rsc/regexp/regexp3.html
[re2-performance]: https://swtch.com/~rsc/regexp/regexp1.html
@@ -0,0 +1,83 @@

# Introduction

Package [regexp][package-regexp] offers support for regular expressions in Go.
The [syntax][regexp-syntax] of the regular expressions accepted is the same general syntax used by Perl, Python, and other languages.

The package defines a type `Regexp` for compiled regular expressions.
Function `regexp.Compile` compiles a string into a regular expression.
The function returns `nil` and an error if compilation fails:

```go
re, err := regexp.Compile(`(a|b)+`)
fmt.Println(re, err) // => (a|b)+ <nil>
re, err = regexp.Compile(`a|b)+`)
fmt.Println(re, err) // => <nil> error parsing regexp: unexpected ): `a|b)+`
```

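When the pattern is only known at run time (for example, when it comes from user input), `Compile` together with an explicit error check is usually preferable to `MustCompile`. The helper `countMatches` below is a hypothetical illustration of that approach.

```go
package main

import (
	"fmt"
	"regexp"
)

// countMatches is a hypothetical helper: it compiles a pattern supplied at
// run time and reports how many non-overlapping matches it has in text.
func countMatches(pattern, text string) (int, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return 0, fmt.Errorf("bad pattern %q: %w", pattern, err)
	}
	return len(re.FindAllStringIndex(text, -1)), nil
}

func main() {
	n, err := countMatches(`[a-z]+\d*`, "12abc34(ef)")
	fmt.Println(n, err) // 2 <nil>

	_, err = countMatches(`a|b)+`, "")
	fmt.Println(err != nil) // true
}
```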
Function `MustCompile` is like `Compile` but panics if the expression cannot be parsed.
It simplifies safe initialization of global variables holding compiled regular expressions.

It is convenient to write regular expressions as [raw string literals][raw-string-literals] enclosed by backticks.
This avoids having to escape the backslashes.

The `regexp` package defines more than 40 functions and methods.
We will demonstrate the use of a few methods below.
Please see the [API documentation][package-regexp] for details of these and other functions.

Method `MatchString` reports whether a string contains any match of a regular expression.

```go
re = regexp.MustCompile(`[a-z]+\d*`)
b = re.MatchString("[a12]") // => true
b = re.MatchString("12abc34(ef)") // => true
b = re.MatchString(" abc!") // => true
b = re.MatchString("123 456") // => false
```

Method `FindString` returns a string holding the text of the leftmost match of the regular expression.

```go
re = regexp.MustCompile(`[a-z]+\d*`)
s = re.FindString("[a12]") // => "a12"
s = re.FindString("12abc34(ef)") // => "abc34"
s = re.FindString(" abc!") // => "abc"
s = re.FindString("123 456") // => ""
```

Method `FindStringSubmatch` returns a slice of strings holding the text of the leftmost match of the regular expression and the matches, if any, of its subexpressions.
This can be used to identify the strings matching capturing groups.
A return value of `nil` indicates no match.

```go
re = regexp.MustCompile(`[a-z]+(\d*)`)
sl = re.FindStringSubmatch("[a12]") // => []string{"a12","12"}
sl = re.FindStringSubmatch("12abc34(ef)") // => []string{"abc34","34"}
sl = re.FindStringSubmatch(" abc!") // => []string{"abc",""}
sl = re.FindStringSubmatch("123 456") // => nil
```

Method `re.ReplaceAllString(src, repl)` returns a copy of `src`, replacing matches of the regular expression `re` with the replacement string `repl`.

```go
re = regexp.MustCompile(`[a-z]+\d*`)
s = re.ReplaceAllString("[a12]", "X") // => "[X]"
s = re.ReplaceAllString("12abc34(ef)", "X") // => "12X(X)"
s = re.ReplaceAllString(" abc!", "X") // => " X!"
s = re.ReplaceAllString("123 456", "X") // => "123 456"
```

Method `re.Split(s, n)` slices a text `s` into substrings separated by the expression and returns a slice of the substrings between those expression matches.
The count `n` determines the maximum number of substrings to return.
If `n < 0`, the method returns all substrings.

```go
re = regexp.MustCompile(`[a-z]+\d*`)
sl = re.Split("[a12]", -1) // => []string{"[","]"}
sl = re.Split("12abc34(ef)", 2) // => []string{"12","(ef)"}
sl = re.Split(" abc!", -1) // => []string{" ","!"}
sl = re.Split("123 456", -1) // => []string{"123 456"}
```

[raw-string-literals]: https://yourbasic.org/golang/regexp-cheat-sheet/#raw-strings
[package-regexp]: https://pkg.go.dev/regexp
[regexp-syntax]: https://pkg.go.dev/regexp/syntax
@@ -0,0 +1,26 @@

```json
[
  {
    "url": "https://pkg.go.dev/regexp",
    "description": "Go standard library: package regexp"
  },
  {
    "url": "https://pkg.go.dev/regexp/syntax",
    "description": "Go standard library: package regexp/syntax"
  },
  {
    "url": "https://regex101.com/",
    "description": "regex101.com: online regular expression tester"
  },
  {
    "url": "https://en.wikipedia.org/wiki/Regular_expression",
    "description": "Wikipedia: regular expressions"
  },
  {
    "url": "https://gobyexample.com/regular-expressions",
    "description": "Go by example: regular expressions"
  },
  {
    "url": "https://yourbasic.org/golang/regexp-cheat-sheet",
    "description": "YourBasic.org: Regexp tutorial and cheat sheet"
  }
]
```
@@ -0,0 +1,38 @@

# General

- API documentation: [package regexp][package-regexp]
- Regular expression syntax: [regexp/syntax][regexp-syntax]
- The website [regex101.com][regex101] has an online regular expression tester.
- It is recommended to write regular expressions as [raw string literals][raw-string-literals] enclosed by backticks.

## 1. Identify garbled log lines

- Function [regexp.MatchString][fun-match-string] or method [MatchString][method-match-string] could be useful here.

## 2. Split the log line

- Method [Split][regexp-split] could be useful here.

## 3. Count the number of lines containing `password` in quoted text

- You can make expression matching case insensitive by prefixing the regular expression with `(?i)`.
  This sets the `i` flag; see, for example, [yourbasic.org][yourbasic-i-flag].

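As a quick illustration of the flag (with a made-up input, not taken from the exercise), compare the same pattern with and without `(?i)`:

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// Matches only the lower-case spelling.
	caseSensitive = regexp.MustCompile(`password`)
	// The (?i) prefix sets the i flag, so letter case is ignored.
	caseInsensitive = regexp.MustCompile(`(?i)password`)
)

func main() {
	line := "Enter your PassWord"
	fmt.Println(caseSensitive.MatchString(line))   // false
	fmt.Println(caseInsensitive.MatchString(line)) // true
}
```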
## 4. Remove artifacts from log

- Method [ReplaceAllString][replace-all-string] could be useful here.

## 5. Tag lines with user names

- Method [FindStringSubmatch][find-string-submatch] could be useful here.

[raw-string-literals]: https://yourbasic.org/golang/regexp-cheat-sheet/#raw-strings
[package-regexp]: https://pkg.go.dev/regexp
[regexp-syntax]: https://pkg.go.dev/regexp/syntax
[regex101]: https://regex101.com/
[fun-match-string]: https://pkg.go.dev/regexp#MatchString
[method-match-string]: https://pkg.go.dev/regexp#Regexp.MatchString
[regexp-split]: https://pkg.go.dev/regexp#Regexp.Split
[yourbasic-i-flag]: https://yourbasic.org/golang/regexp-cheat-sheet/#case-insensitive-and-multiline-matches
[replace-all-string]: https://pkg.go.dev/regexp#Regexp.ReplaceAllString
[find-string-submatch]: https://pkg.go.dev/regexp#Regexp.FindStringSubmatch