Skip to content

Commit

Permalink
[extension/filestorage] remove feature flag `extension.filestorage.re…
Browse files Browse the repository at this point in the history
…placeUnsafeCharacters` (#33236)

**Description:**

Removes the feature flag according to schedule.

**Link to tracking Issue:**

-
#3148

**Testing:**

No changes.

**Documentation:**

Replaced the description of the feature gate with the description of the
feature.
  • Loading branch information
andrzej-stencel authored May 31, 2024
1 parent 4dd718b commit 2b1c4a6
Show file tree
Hide file tree
Showing 5 changed files with 39 additions and 51 deletions.
27 changes: 27 additions & 0 deletions .chloggen/remove-file-storage-feature-gate.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,27 @@
# Use this changelog template to create an entry for release notes.

# One of 'breaking', 'deprecation', 'new_component', 'enhancement', 'bug_fix'
change_type: breaking

# The name of the component, or a single word describing the area of concern, (e.g. filelogreceiver)
component: extension/filestorage

# A brief description of the change. Surround your text with quotes ("") if it needs to start with a backtick (`).
note: Replace path-unsafe characters in component names

# Mandatory: One or more tracking issues related to the change. You can use the PR number here if no issue exists.
issues: [3148]

# (Optional) One or more lines of additional information to render under the primary note.
# These lines will be padded with 2 spaces and then inserted directly into the document.
# Use pipe (|) for multiline entries.
subtext: The feature gate `extension.filestorage.replaceUnsafeCharacters` is now removed.

# If your change doesn't affect end users or the exported elements of any package,
# you should instead start your pull request title with [chore] or use the "Skip Changelog" label.
# Optional: The change log or logs in which this entry should be included.
# e.g. '[user]' or '[user, api]'
# Include 'user' if the change is relevant to end users.
# Include 'api' if there is a change to a library API.
# Default: '[user]'
change_logs: []
41 changes: 10 additions & 31 deletions extension/storage/filestorage/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -100,44 +100,23 @@ exporters:
nop:
```

## Feature Gates
## Replacing unsafe characters in component names

See the [Collector feature gates](https://github.com/open-telemetry/opentelemetry-collector/blob/main/featuregate/README.md#collector-feature-gates) for an overview of feature gates in the collector.
The extension uses the type and name of the component using the extension to create a file where the component's data is stored.
For example, if a Filelog receiver named `filelog/logs` uses the extension, its data is stored in a file named `receiver_filelog_logs`.

### `extension.filestorage.replaceUnsafeCharacters`
Sometimes the component name contains characters that either have special meaning in paths - like `/` - or are problematic or even forbidden in file names (depending on the host operating system), like `?` or `|`.
To prevent surprising or erroneous behavior, some characters in the component names are replaced before creating the file name to store data by the extension.

When enabled, characters that are not safe in file names are replaced in component name using the extension before creating the file name to store data by the extension.
For example, for a Filelog receiver named `filelog/logs/container`, the component name `logs/container` is sanitized into `logs~007Econtainer` and the data is stored in a file named `receiver_filelog_logs~007Econtainer`.

For example, for a Filelog receiver named `filelog/logs/json`, the data is stored:

- in path `receiver_filelog_logs/json` with the feature flag disabled (note that this is a file named `json` inside directory named `receiver_filelog_logs`).
- in file `receiver_filelog_logs~007Ejson` with the feature flag enabled.

This replacement is done to prevent surprising behavior or errors in the File Storage extension.

The feature replaces all usafe characters with a tilde `~` and the character's [Unicode number][unicode_chars] in hex.
Every unsafe character is replaced with a tilde `~` and the character's [Unicode number][unicode_chars] in hex.
The only safe characters are: uppercase and lowercase ASCII letters `A-Z` and `a-z`, digits `0-9`, dot `.`, hyphen `-`, underscore `_`.

Changing the state of this feature gate may change the path to the file that the extension is writing component's data to. This may lead to loss of the data stored in the original path.

Before enabling this feature gate, ideally make sure that all component names that use the File Storage extension have names that only contain safe characters.
In case you want to keep using unsafe characters in your component names, you may want to rename the files used for storage before enabling this feature gate.
For example, `mv ./receiver_filelog_logs/json ./receiver_filelog_logs~007Ejson`.

For more details, see the following issues:

- [File storage extension - invalid file name characters must be encoded #3148](https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/3148)
- [[filestorage] receiver name sanitization #20731](https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/20731)

The schedule for this feature gate is:

- Introduced in v0.87.0 (October 2023) as `alpha` - disabled by default.
- Moved to `beta` in v0.92.0 (January 2024) - enabled by default.
- Moved to `stable` in v0.99.0 (April 2024) - cannot be disabled.
- Removed in v0.102.0 (three releases after `stable`).
The tilde `~` character is also replaced even though it is a safe character, to make sure that the sanitized component name never overlaps with a component name that does not require sanitization.

[unicode_chars]: https://en.wikipedia.org/wiki/List_of_Unicode_characters


## Troubleshooting

_Currently, the File Storage extension uses [bbolt](https://github.com/etcd-io/bbolt) to store and read data on disk. The
Expand Down Expand Up @@ -165,4 +144,4 @@ default
file_input.knownFiles2
{"Fingerprint":{"first_bytes":"MzEwNzkKMjE5Cg=="},"Offset":10,"FileAttributes":{"log.file.name":"1.log"},"HeaderFinalized":false,"FlushState":{"LastDataChange":"2024-03-20T18:16:18.164331-07:00","LastDataLength":0}}
{"Fingerprint":{"first_bytes":"MjQ0MDMK"},"Offset":6,"FileAttributes":{"log.file.name":"2.log"},"HeaderFinalized":false,"FlushState":{"LastDataChange":"2024-03-20T18:16:39.96429-07:00","LastDataLength":0}}
```
```
16 changes: 2 additions & 14 deletions extension/storage/filestorage/extension.go
Original file line number Diff line number Diff line change
Expand Up @@ -14,19 +14,9 @@ import (
"go.opentelemetry.io/collector/component"
"go.opentelemetry.io/collector/extension"
"go.opentelemetry.io/collector/extension/experimental/storage"
"go.opentelemetry.io/collector/featuregate"
"go.uber.org/zap"
)

var replaceUnsafeCharactersFeatureGate = featuregate.GlobalRegistry().MustRegister(
"extension.filestorage.replaceUnsafeCharacters",
featuregate.StageStable,
featuregate.WithRegisterDescription("When enabled, characters that are not safe in file paths are replaced in component name using the extension. For example, the data for component `filelog/logs/json` will be stored in file `receiver_filelog_logs~007Ejson` and not in `receiver_filelog_logs/json`."),
featuregate.WithRegisterReferenceURL("https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/3148"),
featuregate.WithRegisterFromVersion("v0.87.0"),
featuregate.WithRegisterToVersion("v0.102.0"),
)

type localFileStorage struct {
cfg *Config
logger *zap.Logger
Expand Down Expand Up @@ -66,9 +56,7 @@ func (lfs *localFileStorage) GetClient(_ context.Context, kind component.Kind, e
rawName = fmt.Sprintf("%s_%s_%s_%s", kindString(kind), ent.Type(), ent.Name(), name)
}

if replaceUnsafeCharactersFeatureGate.IsEnabled() {
rawName = sanitize(rawName)
}
rawName = sanitize(rawName)
absoluteName := filepath.Join(lfs.cfg.Directory, rawName)
client, err := newClient(lfs.logger, absoluteName, lfs.cfg.Timeout, lfs.cfg.Compaction, !lfs.cfg.FSync)

Expand Down Expand Up @@ -110,7 +98,7 @@ func sanitize(name string) string {
// https://en.wikipedia.org/wiki/List_of_Unicode_characters
// For example, the slash is replaced with "~002F", and the tilde itself is replaced with "~007E".
// We perform replacement on the tilde even though it is a safe character to make sure that the sanitized component name
// never overlaps with a component name that does not reqire sanitization.
// never overlaps with a component name that does not require sanitization.
var sanitized strings.Builder
for _, character := range name {
if isSafe(character) {
Expand Down
2 changes: 0 additions & 2 deletions extension/storage/filestorage/go.mod
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,6 @@ require (
go.opentelemetry.io/collector/component v0.101.1-0.20240529223953-eaab76e46d38
go.opentelemetry.io/collector/confmap v0.101.1-0.20240529223953-eaab76e46d38
go.opentelemetry.io/collector/extension v0.101.1-0.20240529223953-eaab76e46d38
go.opentelemetry.io/collector/featuregate v1.8.1-0.20240529223953-eaab76e46d38
go.opentelemetry.io/otel/metric v1.27.0
go.opentelemetry.io/otel/trace v1.27.0
go.uber.org/goleak v1.3.0
Expand All @@ -25,7 +24,6 @@ require (
github.com/go-logr/stdr v1.2.2 // indirect
github.com/gogo/protobuf v1.3.2 // indirect
github.com/google/uuid v1.6.0 // indirect
github.com/hashicorp/go-version v1.7.0 // indirect
github.com/knadh/koanf/maps v0.1.1 // indirect
github.com/knadh/koanf/providers/confmap v0.1.0 // indirect
github.com/knadh/koanf/v2 v2.1.1 // indirect
Expand Down
4 changes: 0 additions & 4 deletions extension/storage/filestorage/go.sum

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

0 comments on commit 2b1c4a6

Please sign in to comment.