Commit 45b4cb4
[pkg/stanza] Add 'regex_replace' operator (#37443)
#### Description

We use the filelog receiver to ingest Artifactory logs. Some, but not all, of them contain ANSI color sequences. This makes writing regexes uncomfortable because both the log field separators and the ANSI color codes contain `[` characters.

I first considered using the `add` operator and stripping the sequences with a regex inside `EXPR()`, but I could not find a suitable regex function that works within that expression.

I imagine I'm not the only one who would like to get rid of colors. I therefore believe that an easy-to-use operator improves the readability of the pipeline and is less error-prone than a (potentially) hand-crafted regex.

#### Testing

Added unit tests and tested locally against Artifactory log files.
1 parent 6fa23c0 commit 45b4cb4

10 files changed, +790 -0 lines changed

.chloggen/regex-replace.yaml

+27
@@ -0,0 +1,27 @@
# Use this changelog template to create an entry for release notes.

# One of 'breaking', 'deprecation', 'new_component', 'enhancement', 'bug_fix'
change_type: 'enhancement'

# The name of the component, or a single word describing the area of concern, (e.g. filelogreceiver)
component: pkg/stanza

# A brief description of the change. Surround your text with quotes ("") if it needs to start with a backtick (`).
note: Add 'regex_replace' operator

# Mandatory: One or more tracking issues related to the change. You can use the PR number here if no issue exists.
issues: [37443]

# (Optional) One or more lines of additional information to render under the primary note.
# These lines will be padded with 2 spaces and then inserted directly into the document.
# Use pipe (|) for multiline entries.
subtext:

# If your change doesn't affect end users or the exported elements of any package,
# you should instead start your pull request title with [chore] or use the "Skip Changelog" label.
# Optional: The change log or logs in which this entry should be included.
# e.g. '[user]' or '[user, api]'
# Include 'user' if the change is relevant to end users.
# Include 'api' if there is a change to a library API.
# Default: '[user]'
change_logs: []

pkg/stanza/adapter/register.go

+1
@@ -26,6 +26,7 @@ import (
 	_ "github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/operator/transformer/move"
 	_ "github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/operator/transformer/noop"
 	_ "github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/operator/transformer/recombine"
+	_ "github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/operator/transformer/regexreplace"
 	_ "github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/operator/transformer/remove"
 	_ "github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/operator/transformer/retain"
 	_ "github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/operator/transformer/router"

pkg/stanza/docs/operators/README.md

+1
@@ -41,6 +41,7 @@ General purpose:
 - [move](./move.md)
 - [noop](./noop.md)
 - [recombine](./recombine.md)
+- [regex_replace](./regex_replace.md)
 - [remove](./remove.md)
 - [retain](./retain.md)
 - [router](./router.md)
+142
@@ -0,0 +1,142 @@
## `regex_replace` operator

The `regex_replace` operator matches the string-typed field selected by `field` against a user-defined or well-known regular expression and replaces each match with `replace_with`.
When `replace_with` is not set, the matched text is removed.

#### Regex Syntax

This operator uses [Go regular expressions](https://github.com/google/re2/wiki/Syntax). When writing a regex, consider using a tool such as [regex101](https://regex101.com/?flavor=golang).

### Configuration Fields

| Field | Default | Description |
| --- | --- | --- |
| `id` | `regex_replace` | A unique identifier for the operator. |
| `output` | Next in pipeline | The connected operator(s) that will receive all outbound entries. |
| `field` | required | The [field](../types/field.md) to apply the regex to. Must be a string. |
| `regex` | `regex` or `regex_name` required | A [Go regular expression](https://github.com/google/re2/wiki/Syntax). Mutually exclusive with `regex_name`. |
| `regex_name` | `regex` or `regex_name` required | A well-known regex to use. See below for a list of possible values. Mutually exclusive with `regex`. |
| `replace_with` | optional | The string that replaces each match. It may reference capture groups, for example `${1}`. Defaults to an empty string, which removes the match. |
| `on_error` | `send` | The behavior of the operator if it encounters an error. See [on_error](../types/on_error.md). |
| `if` | | An [expression](../types/expression.md) that, when set, will be evaluated to determine whether this operator should be used for the given entry. This allows you to do easy conditional parsing without branching logic with routers. |

#### Well-known regular expressions

| Name | Description |
| --- | --- |
| `ansi_control_sequences` | ANSI "Control Sequence Introducer (CSI)" escape codes starting with `ESC [` |

### Example Configurations

#### Collapse spaces

Configuration:
```yaml
- type: regex_replace
  regex: " +"
  replace_with: " "
  field: body
```

<table>
<tr><td> Input Entry </td> <td> Output Entry </td></tr>
<tr>
<td>

```json
{
  "resource": { },
  "attributes": { },
  "body": "Hello  World"
}
```

</td>
<td>

```json
{
  "resource": { },
  "attributes": { },
  "body": "Hello World"
}
```

</td>
</tr>
</table>

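The effect of this configuration can be reproduced outside the collector with Go's standard `regexp` package. The following is a minimal sketch, assuming the operator replaces every match in the way `ReplaceAllString` does; `collapseSpaces` is a hypothetical helper for illustration and not part of the operator.

```go
package main

import (
	"fmt"
	"regexp"
)

// spaceRun mirrors the `regex: " +"` setting: one or more consecutive spaces.
var spaceRun = regexp.MustCompile(" +")

// collapseSpaces is a hypothetical helper (not part of the operator) that
// replaces every run of spaces with a single space, like `replace_with: " "`.
func collapseSpaces(s string) string {
	return spaceRun.ReplaceAllString(s, " ")
}

func main() {
	fmt.Println(collapseSpaces("Hello   World")) // prints "Hello World"
}
```

A single space also matches `" +"` and is replaced by itself, so text that is already collapsed passes through unchanged.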
#### Match and replace with groups

Configuration:
```yaml
- type: regex_replace
  regex: "{(.*?)}"
  replace_with: "${1}"
  field: body
```

<table>
<tr><td> Input Entry </td> <td> Output Entry </td></tr>
<tr>
<td>

```json
{
  "resource": { },
  "attributes": { },
  "body": "{a}{bb}{ccc}"
}
```

</td>
<td>

```json
{
  "resource": { },
  "attributes": { },
  "body": "abbccc"
}
```

</td>
</tr>
</table>

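Whether the quantifier inside the group is lazy or greedy matters for this kind of input. The sketch below uses Go's standard `regexp` package and assumes the operator behaves like `ReplaceAllString`; `replaceGroups` is a hypothetical helper for illustration. With a lazy `.*?` each brace pair is matched on its own, while a greedy `.*` matches once, from the first `{` to the last `}`.

```go
package main

import (
	"fmt"
	"regexp"
)

// replaceGroups is a hypothetical helper (not part of the operator). It
// compiles pattern and replaces every match in input with replacement,
// expanding group references such as ${1}.
func replaceGroups(pattern, input, replacement string) string {
	return regexp.MustCompile(pattern).ReplaceAllString(input, replacement)
}

func main() {
	in := "{a}{bb}{ccc}"

	// Lazy quantifier: three separate matches, one per brace pair.
	fmt.Println(replaceGroups(`{(.*?)}`, in, "${1}")) // prints "abbccc"

	// Greedy quantifier: a single match spanning the whole string.
	fmt.Println(replaceGroups(`{(.*)}`, in, "${1}")) // prints "a}{bb}{ccc"
}
```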
#### Remove all ANSI color escape codes from the body

Configuration:
```yaml
- type: regex_replace
  regex_name: ansi_control_sequences
  field: body
```

<table>
<tr><td> Input Entry </td> <td> Output Entry </td></tr>
<tr>
<td>

```json
{
  "resource": { },
  "attributes": { },
  "body": "\u001b[31mred\u001b[0m"
}
```

</td>
<td>

```json
{
  "resource": { },
  "attributes": { },
  "body": "red"
}
```

</td>
</tr>
</table>
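
The `ansi_control_sequences` pattern can be tried in isolation. This sketch copies the pattern from `ansiCsiEscapeRegex` in this commit and assumes the operator removes each match when `replace_with` is unset; `stripANSI` is a hypothetical helper for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// csi is copied from ansiCsiEscapeRegex in this commit: ESC, '[', optional
// parameter bytes (0x30-0x3F), optional intermediate bytes (0x20-0x2F), and
// one final byte (0x40-0x7E).
var csi = regexp.MustCompile(`\x1B\[[\x30-\x3F]*[\x20-\x2F]*[\x40-\x7E]`)

// stripANSI is a hypothetical helper (not part of the operator) that removes
// every CSI sequence from s.
func stripANSI(s string) string {
	return csi.ReplaceAllString(s, "")
}

func main() {
	fmt.Printf("%q\n", stripANSI("\x1b[31mred\x1b[0m")) // prints "red" (quoted)
}
```

Because the pattern requires the leading ESC byte, a plain `[` used as a log field separator is left untouched.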
@@ -0,0 +1,80 @@
// Copyright The OpenTelemetry Authors
// SPDX-License-Identifier: Apache-2.0

package regexreplace // import "github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/operator/transformer/regexreplace"

import (
	"fmt"
	"regexp"

	"go.opentelemetry.io/collector/component"

	"github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/entry"
	"github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/operator"
	"github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/operator/helper"
)

const operatorType = "regex_replace"

// derived from https://en.wikipedia.org/wiki/ANSI_escape_code#CSIsection
var ansiCsiEscapeRegex = regexp.MustCompile(`\x1B\[[\x30-\x3F]*[\x20-\x2F]*[\x40-\x7E]`)

func init() {
	operator.Register(operatorType, func() operator.Builder { return NewConfig() })
}

// NewConfig creates a new regex_replace config with default values
func NewConfig() *Config {
	return NewConfigWithID(operatorType)
}

// NewConfigWithID creates a new regex_replace config with default values
func NewConfigWithID(operatorID string) *Config {
	return &Config{
		TransformerConfig: helper.NewTransformerConfig(operatorID, operatorType),
	}
}

// Config is the configuration of a regex_replace operator.
type Config struct {
	helper.TransformerConfig `mapstructure:",squash"`
	RegexName                string      `mapstructure:"regex_name"`
	Regex                    string      `mapstructure:"regex"`
	ReplaceWith              string      `mapstructure:"replace_with"`
	Field                    entry.Field `mapstructure:"field"`
}

func (c *Config) getRegexp() (*regexp.Regexp, error) {
	if (c.RegexName == "") == (c.Regex == "") {
		return nil, fmt.Errorf("either regex or regex_name must be set")
	}

	switch c.RegexName {
	case "ansi_control_sequences":
		return ansiCsiEscapeRegex, nil
	case "":
		return regexp.Compile(c.Regex)
	default:
		return nil, fmt.Errorf("regex_name %s is unknown", c.RegexName)
	}
}

// Build will build a regex_replace operator.
func (c Config) Build(set component.TelemetrySettings) (operator.Operator, error) {
	transformerOperator, err := c.TransformerConfig.Build(set)
	if err != nil {
		return nil, err
	}

	re, err := c.getRegexp() // named re to avoid shadowing the regexp package
	if err != nil {
		return nil, err
	}

	return &Transformer{
		TransformerOperator: transformerOperator,
		field:               c.Field,
		regexp:              re,
		replaceWith:         c.ReplaceWith,
	}, nil
}
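
The check in `getRegexp` accepts a configuration only when exactly one of `regex` and `regex_name` is set, because comparing the two emptiness tests for equality is true both when neither is set and when both are. The following standalone sketch illustrates that logic without the collector dependencies; `selectRegexp` and the `wellKnown` map are hypothetical names introduced here, not part of the commit.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
)

// wellKnown is a hypothetical lookup table of named patterns; the commit
// itself uses a switch statement instead of a map.
var wellKnown = map[string]*regexp.Regexp{
	"ansi_control_sequences": regexp.MustCompile(`\x1B\[[\x30-\x3F]*[\x20-\x2F]*[\x40-\x7E]`),
}

// selectRegexp is a hypothetical stand-in for Config.getRegexp. It returns an
// error unless exactly one of name and pattern is non-empty.
func selectRegexp(name, pattern string) (*regexp.Regexp, error) {
	// True when both are empty or both are set, which are the invalid cases.
	if (name == "") == (pattern == "") {
		return nil, errors.New("exactly one of regex or regex_name must be set")
	}
	if name == "" {
		return regexp.Compile(pattern)
	}
	re, ok := wellKnown[name]
	if !ok {
		return nil, fmt.Errorf("regex_name %s is unknown", name)
	}
	return re, nil
}

func main() {
	cases := []struct{ name, pattern string }{
		{"ansi_control_sequences", ""}, // valid: well-known regex
		{"", " +"},                     // valid: user-defined regex
		{"", ""},                       // invalid: neither set
		{"ansi_control_sequences", " +"}, // invalid: both set
		{"nope", ""},                   // invalid: unknown name
	}
	for _, c := range cases {
		_, err := selectRegexp(c.name, c.pattern)
		fmt.Printf("regex_name=%q regex=%q err=%v\n", c.name, c.pattern, err)
	}
}
```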
