Regular expression concept with exercise
Fixes #2037
* Concept: regular expressions (in Go)
* Concept exercise

The first four tasks in the concept exercise are based on tasks in the corresponding C# exercise.
norbs57 committed Apr 7, 2022
1 parent c086f44 commit f1f3282
Showing 14 changed files with 833 additions and 0 deletions.
5 changes: 5 additions & 0 deletions concepts/regular-expressions/.meta/config.json
@@ -0,0 +1,5 @@
{
"blurb": "Regular expressions",
"authors": ["norbs57"],
"contributors": []
}
125 changes: 125 additions & 0 deletions concepts/regular-expressions/about.md
@@ -0,0 +1,125 @@
# About

Package [regexp][package-regexp] offers support for regular expressions in Go.

## Syntax

The [syntax][regexp-syntax] of the regular expressions accepted is the same general syntax used by Perl, Python, and other languages.
More precisely, it is the [syntax accepted by RE2][re2-syntax] except for `\C`.

All characters are UTF-8-encoded code points.
Following `utf8.DecodeRune`, each byte of an invalid UTF-8 sequence is treated as if it encoded `utf8.RuneError` (U+FFFD).
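
The following sketch illustrates what this means in practice. The pattern `^.$` matches a string that consists of exactly one character, so it can be used to count characters as the matcher sees them. The helper name `isSingleChar` is hypothetical and exists only for this illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// singleChar matches input consisting of exactly one character (code point).
var singleChar = regexp.MustCompile(`^.$`)

// isSingleChar is a hypothetical helper used only for this illustration.
func isSingleChar(s string) bool {
	return singleChar.MatchString(s)
}

func main() {
	// "\u00e9" (é) occupies two bytes but is a single code point.
	fmt.Println(len("\u00e9"), isSingleChar("\u00e9")) // => 2 true
	// A lone invalid byte is treated as one U+FFFD character.
	fmt.Println(isSingleChar("\xff")) // => true
	// Two invalid bytes are treated as two characters.
	fmt.Println(isSingleChar("\xff\xfe")) // => false
}
```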

It is convenient to write regular expressions as [raw string literals][raw-string-literals] enclosed by backticks.
This avoids having to quote the backslashes.

## Type `Regexp`

The package defines a type `Regexp` for compiled regular expressions.
Function `regexp.Compile` compiles a string into a regular expression.
The function returns `nil` and an error if compilation failed:

```go
re, err := regexp.Compile(`(a|b)+`)
fmt.Println(re, err) // => (a|b)+ <nil>
re, err = regexp.Compile(`a|b)+`)
fmt.Println(re, err) // => <nil> error parsing regexp: unexpected ): `a|b)+`
```

Function `MustCompile` is like `Compile` but panics if the expression cannot be parsed.
It simplifies safe initialization of global variables holding compiled regular expressions.
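
A minimal sketch of this pattern follows. The variable `wordWithDigits` and the helper `firstToken` are hypothetical names chosen for the example.

```go
package main

import (
	"fmt"
	"regexp"
)

// wordWithDigits is compiled once, when the package is initialized.
// An invalid pattern would cause a panic at program start
// instead of an error at first use.
var wordWithDigits = regexp.MustCompile(`[a-z]+\d*`)

// firstToken is a hypothetical helper that reuses the compiled expression.
func firstToken(s string) string {
	return wordWithDigits.FindString(s)
}

func main() {
	fmt.Println(firstToken("12abc34(ef)")) // => abc34
}
```

Because the pattern is a constant, a failure to compile it indicates a programming error, which is why panicking is acceptable here.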

There are 16 methods of `Regexp` that match a regular expression and identify the matched text.
Their names are matched by this regular expression:

```text
Find(All)?(String)?(Submatch)?(Index)?
```

* If `All` is present, the routine matches successive non-overlapping matches of the entire expression.
* If `String` is present, the argument is a string; otherwise it is a slice of bytes; return values are adjusted as appropriate.
* If `Submatch` is present, the return value is a slice identifying the successive submatches of the expression.
* If `Index` is present, matches and submatches are identified by byte index pairs within the input string.
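
The naming scheme can be tried out with a short program such as the one below, which calls a few members of the `Find` family on the same input. The variable name `wordRe` is an assumption made for this sketch.

```go
package main

import (
	"fmt"
	"regexp"
)

var wordRe = regexp.MustCompile(`[a-z]+(\d*)`)

func main() {
	s := "12abc34(ef)"

	// All + String: every non-overlapping match, as strings.
	fmt.Println(wordRe.FindAllString(s, -1)) // => [abc34 ef]

	// String + Index: byte offsets of the leftmost match.
	fmt.Println(wordRe.FindStringIndex(s)) // => [2 7]

	// All + String + Submatch: each match followed by its submatches.
	fmt.Println(wordRe.FindAllStringSubmatch(s, -1)) // => [[abc34 34] [ef ]]

	// String + Submatch + Index: offsets of the match and of each submatch.
	fmt.Println(wordRe.FindStringSubmatchIndex(s)) // => [2 7 5 7]

	// No String: the argument and the result are byte slices.
	fmt.Println(string(wordRe.Find([]byte(s)))) // => abc34
}
```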

There are also methods for:

* replacing matches of regular expressions with replacement strings and
* splitting of strings separated by regular expressions.

All in all, the `regexp` package defines more than 40 functions and methods.
We will demonstrate the use of a few methods below.
Please see the [API documentation][package-regexp] for details of these and other functions.

## Examples

Method `MatchString` reports whether a string contains any match of a regular expression.

```go
re = regexp.MustCompile(`[a-z]+\d*`)
b = re.MatchString("[a12]") // => true
b = re.MatchString("12abc34(ef)") // => true
b = re.MatchString(" abc!") // => true
b = re.MatchString("123 456") // => false
```

Method `FindString` returns a string holding the text of the leftmost match of the regular expression.

```go
re = regexp.MustCompile(`[a-z]+\d*`)
s = re.FindString("[a12]") // => "a12"
s = re.FindString("12abc34(ef)") // => "abc34"
s = re.FindString(" abc!") // => "abc"
s = re.FindString("123 456") // => ""
```

Method `FindStringSubmatch` returns a slice of strings holding the text of the leftmost match of the regular expression and the matches, if any, of its subexpressions.
This can be used to identify the strings matching capturing groups.
A return value of `nil` indicates no match.

```go
re = regexp.MustCompile(`[a-z]+(\d*)`)
sl = re.FindStringSubmatch("[a12]") // => []string{"a12","12"}
sl = re.FindStringSubmatch("12abc34(ef)") // => []string{"abc34","34"}
sl = re.FindStringSubmatch(" abc!") // => []string{"abc",""}
sl = re.FindStringSubmatch("123 456") // => <nil>
```

Method `re.ReplaceAllString(src,repl)` returns a copy of `src`, replacing matches of the regular expression `re` with the replacement string `repl`.

```go
re = regexp.MustCompile(`[a-z]+\d*`)
s = re.ReplaceAllString("[a12]", "X") // => "[X]"
s = re.ReplaceAllString("12abc34(ef)", "X") // => "12X(X)"
s = re.ReplaceAllString(" abc!", "X") // => " X!"
s = re.ReplaceAllString("123 456", "X") // => "123 456"
```

Method `re.Split(s,n)` slices a text `s` into substrings separated by the expression and returns a slice of the substrings between those expression matches.
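The count `n` determines the maximal number of substrings to return: if `n > 0`, at most `n` substrings are returned and the last one is the unsplit remainder.
If `n == 0`, the result is `nil`; if `n < 0`, the method returns all substrings.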

```go
re = regexp.MustCompile(`[a-z]+\d*`)
sl = re.Split("[a12]", -1) // => []string{"[","]"}
sl = re.Split("12abc34(ef)", 2) // => []string{"12","(ef)"}
sl = re.Split(" abc!", -1) // => []string{" ","!"}
sl = re.Split("123 456", -1) // => []string{"123 456"}
```
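
To see the effect of the count `n` in isolation, the sketch below splits the same input with different values of `n`. The variable name `sep` is hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

var sep = regexp.MustCompile(`[a-z]+\d*`)

func main() {
	s := "12abc34(ef)"

	fmt.Printf("%q\n", sep.Split(s, -1)) // => ["12" "(" ")"]
	fmt.Printf("%q\n", sep.Split(s, 2))  // => ["12" "(ef)"]
	fmt.Printf("%q\n", sep.Split(s, 1))  // => ["12abc34(ef)"]
	fmt.Printf("%q\n", sep.Split(s, 0))  // => []
}
```

With `n == 1` no splitting takes place, and with `n == 0` the returned slice is `nil`.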

## Performance

The regexp implementation provided by this package is guaranteed to run in
[time linear in the size of the input][re2-performance].

## Caveat

As mentioned earlier, the `regexp` package implements RE2 regular expressions (except for `\C`).
The syntax is largely compatible with PCRE ("Perl Compatible Regular Expressions"), but there are some differences.
Please see the "Caveats" section in [this article][reg-exp-wild] for details.
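
Two well-known differences are that RE2 syntax supports neither backreferences nor lookahead assertions. The sketch below checks this by attempting to compile such patterns. The helper `compiles` is a hypothetical name introduced for this example.

```go
package main

import (
	"fmt"
	"regexp"
)

// compiles reports whether the regexp package accepts the pattern.
func compiles(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err == nil
}

func main() {
	fmt.Println(compiles(`(\w+) (\w+)`)) // => true
	fmt.Println(compiles(`(\w+) \1`))    // => false (backreference)
	fmt.Println(compiles(`foo(?=bar)`))  // => false (lookahead)
}
```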

[raw-string-literals]: https://yourbasic.org/golang/regexp-cheat-sheet/#raw-strings
[package-regexp]: https://pkg.go.dev/regexp
[regexp-syntax]: https://pkg.go.dev/regexp/syntax
[re2-syntax]: https://golang.org/s/re2syntax
[reg-exp-wild]: https://swtch.com/~rsc/regexp/regexp3.html
[re2-performance]: https://swtch.com/~rsc/regexp/regexp1.html
83 changes: 83 additions & 0 deletions concepts/regular-expressions/introduction.md
@@ -0,0 +1,83 @@
# Introduction

Package [regexp][package-regexp] offers support for regular expressions in Go.
The [syntax][regexp-syntax] of the regular expressions accepted is the same general syntax used by Perl, Python, and other languages.

The package defines a type `Regexp` for compiled regular expressions.
Function `regexp.Compile` compiles a string into a regular expression.
The function returns `nil` and an error if compilation failed:

```go
re, err := regexp.Compile(`(a|b)+`)
fmt.Println(re, err) // => (a|b)+ <nil>
re, err = regexp.Compile(`a|b)+`)
fmt.Println(re, err) // => <nil> error parsing regexp: unexpected ): `a|b)+`
```

Function `MustCompile` is like `Compile` but panics if the expression cannot be parsed.
It simplifies safe initialization of global variables holding compiled regular expressions.

It is convenient to write regular expressions as [raw string literals][raw-string-literals] enclosed by backticks.
This avoids having to quote the backslashes.

The `regexp` package defines more than 40 functions and methods.
We will demonstrate the use of a few methods below.
Please see the [API documentation][package-regexp] for details of these and other functions.

Method `MatchString` reports whether a string contains any match of a regular expression.

```go
re = regexp.MustCompile(`[a-z]+\d*`)
b = re.MatchString("[a12]") // => true
b = re.MatchString("12abc34(ef)") // => true
b = re.MatchString(" abc!") // => true
b = re.MatchString("123 456") // => false
```

Method `FindString` returns a string holding the text of the leftmost match of the regular expression.

```go
re = regexp.MustCompile(`[a-z]+\d*`)
s = re.FindString("[a12]") // => "a12"
s = re.FindString("12abc34(ef)") // => "abc34"
s = re.FindString(" abc!") // => "abc"
s = re.FindString("123 456") // => ""
```

Method `FindStringSubmatch` returns a slice of strings holding the text of the leftmost match of the regular expression and the matches, if any, of its subexpressions.
This can be used to identify the strings matching capturing groups.
A return value of `nil` indicates no match.

```go
re = regexp.MustCompile(`[a-z]+(\d*)`)
sl = re.FindStringSubmatch("[a12]") // => []string{"a12","12"}
sl = re.FindStringSubmatch("12abc34(ef)") // => []string{"abc34","34"}
sl = re.FindStringSubmatch(" abc!") // => []string{"abc",""}
sl = re.FindStringSubmatch("123 456") // => <nil>
```

Method `re.ReplaceAllString(src,repl)` returns a copy of `src`, replacing matches of the regular expression `re` with the replacement string `repl`.

```go
re = regexp.MustCompile(`[a-z]+\d*`)
s = re.ReplaceAllString("[a12]", "X") // => "[X]"
s = re.ReplaceAllString("12abc34(ef)", "X") // => "12X(X)"
s = re.ReplaceAllString(" abc!", "X") // => " X!"
s = re.ReplaceAllString("123 456", "X") // => "123 456"
```

Method `re.Split(s,n)` slices a text `s` into substrings separated by the expression and returns a slice of the substrings between those expression matches.
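The count `n` determines the maximal number of substrings to return: if `n > 0`, at most `n` substrings are returned and the last one is the unsplit remainder.
If `n == 0`, the result is `nil`; if `n < 0`, the method returns all substrings.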

```go
re = regexp.MustCompile(`[a-z]+\d*`)
sl = re.Split("[a12]", -1) // => []string{"[","]"}
sl = re.Split("12abc34(ef)", 2) // => []string{"12","(ef)"}
sl = re.Split(" abc!", -1) // => []string{" ","!"}
sl = re.Split("123 456", -1) // => []string{"123 456"}
```

[raw-string-literals]: https://yourbasic.org/golang/regexp-cheat-sheet/#raw-strings
[package-regexp]: https://pkg.go.dev/regexp
[regexp-syntax]: https://pkg.go.dev/regexp/syntax
26 changes: 26 additions & 0 deletions concepts/regular-expressions/links.json
@@ -0,0 +1,26 @@
[
{
"url": "https://pkg.go.dev/regexp",
"description": "Go standard library: package regexp"
},
{
"url": "https://pkg.go.dev/regexp/syntax",
"description": "Go standard library: package regexp/syntax"
},
{
"url": "https://regex101.com/",
"description": "regex101.com: online regular expression tester"
},
{
"url": "https://en.wikipedia.org/wiki/Regular_expression",
"description": "Wikipedia: regular expressions"
},
{
"url": "https://gobyexample.com/regular-expressions",
"description": "Go by example: regular expressions"
},
{
"url": "https://yourbasic.org/golang/regexp-cheat-sheet",
"description": "Yourbasic.org: Regexp tutorial and cheat sheet"
}
]
19 changes: 19 additions & 0 deletions config.json
@@ -414,6 +414,20 @@
"slices"
],
"status": "beta"
},
{
"slug": "parsing-log-files",
"name": "Parsing Log Files",
"uuid": "cb5de717-6c7c-4bf7-85ac-e5658bd84bf5",
"concepts": [
"regular-expressions"
],
"prerequisites": [
"runes",
"methods",
"slices"
],
"status": "beta"
}
],
"practice": [
@@ -2022,6 +2036,11 @@
"slug": "randomness",
"uuid": "5d354412-a87d-458b-af6c-fadfa58cfdc3"
},
{
"name": "Regular Expressions",
"slug": "regular-expressions",
"uuid": "fa43f573-0dcb-47cf-a006-8d464112f7db"
},
{
"name": "Runes",
"slug": "runes",
38 changes: 38 additions & 0 deletions exercises/concept/parsing-log-files/.docs/hints.md
@@ -0,0 +1,38 @@
# General

- API documentation: [package regexp][package-regexp]
- Regular expression syntax: [regexp/syntax][regexp-syntax]
- Website [regex101.com][regex101] has an online regular expression tester.
- It is recommended to write regular expressions as [raw string literals][raw-string-literals] enclosed by backticks.

## 1. Identify garbled log lines

- Function [regexp.MatchString][fun-match-string] or method [MatchString][method-match-string] could be useful here.

## 2. Split the log line

- Method [Split][regexp-split] could be useful here.

## 3. Count the number of lines containing `password` in quoted text

- You can make expression matching case insensitive by prefixing the regular expression with `(?i)`.
  This sets the `i` flag; see for example [yourbasic.org][yourbasic-i-flag].
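
A generic sketch of the flag, unrelated to the task's actual pattern, could look as follows. The variable name `errorWord` and the sample strings are made up for this illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// The (?i) prefix makes the rest of the expression case insensitive.
var errorWord = regexp.MustCompile(`(?i)error`)

func main() {
	fmt.Println(errorWord.MatchString("ERROR: disk full")) // => true
	fmt.Println(errorWord.MatchString("Error: disk full")) // => true
	fmt.Println(errorWord.MatchString("all good"))         // => false
}
```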

## 4. Remove artifacts from log

- Method [ReplaceAllString][replace-all-string] could be useful here.

## 5. Tag lines with user names

- Method [FindStringSubmatch][find-string-submatch] could be useful here.

[raw-string-literals]: https://yourbasic.org/golang/regexp-cheat-sheet/#raw-strings
[package-regexp]: https://pkg.go.dev/regexp
[regexp-syntax]: https://pkg.go.dev/regexp/syntax
[regex101]: https://regex101.com/
[fun-match-string]: https://pkg.go.dev/regexp#MatchString
[method-match-string]: https://pkg.go.dev/regexp#Regexp.MatchString
[regexp-split]: https://pkg.go.dev/regexp#Regexp.Split
[yourbasic-i-flag]: https://yourbasic.org/golang/regexp-cheat-sheet/#case-insensitive-and-multiline-matches
[replace-all-string]: https://pkg.go.dev/regexp#Regexp.ReplaceAllString
[find-string-submatch]: https://pkg.go.dev/regexp#Regexp.FindStringSubmatch