File source has suboptimal performance degradation as line sizes increase #6730

Open
blt opened this issue Mar 11, 2021 · 0 comments
Labels
domain: performance (Anything related to Vector's performance)
source: file (Anything `file` source related)
type: bug (A code related bug.)

Comments

Contributor

blt commented Mar 11, 2021

Vector Version

Custom build of 0327d46. Build string:

RUSTFLAGS="-g" cargo build --no-default-features --features "sources-file,sources-internal_metrics,sinks-prometheus,sinks-blackhole" --release 

Vector Configuration File

data_dir = "data"

[sources.internal]
type = "internal_metrics"

[sources.file]
type = "file"
max_line_bytes = 102400
fingerprint.strategy = "device_and_inode"
include = ["logs/*.log"]

[sinks.prometheus]
type = "prometheus_exporter"
inputs = ["internal"]
address = "0.0.0.0:9001"

[sinks.blackhole]
type = "blackhole"
inputs = ["file"]

Debug Output

Expected Behavior

As of the above commit, the file source shows a sharp throughput drop-off once line sizes go beyond 100Kb. Below that size the source can process at least 25Mb/s (see vectordotdev/vector-test-harness#74 for details), but above it throughput falls significantly, down to as little as 900Kb/s.

Actual Behavior

vectordotdev/vector-test-harness#74 (comment)

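As a result, Vector holds on to file descriptors for longer than it should. Because the space of a deleted file is not reclaimed while a descriptor to it is still open, this fills up user disks, given enough time.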

Example Data

See referenced issue.
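
The actual data is in the referenced issue. For a quick local reproduction, synthetic input with a fixed line length could be produced with something like the sketch below. The function name `write_fixed_width_lines` and the `x` fill byte are hypothetical and are not part of Vector or the test harness.

```rust
use std::io::{self, Write};

/// Hypothetical helper: write `lines` newline-terminated lines, each carrying
/// `line_len` payload bytes, and return the total number of bytes written.
pub fn write_fixed_width_lines<W: Write>(
    out: &mut W,
    line_len: usize,
    lines: usize,
) -> io::Result<u64> {
    // Build one line up front and reuse it for every write.
    let mut line = vec![b'x'; line_len];
    line.push(b'\n');

    let mut written = 0u64;
    for _ in 0..lines {
        out.write_all(&line)?;
        written += line.len() as u64;
    }
    Ok(written)
}
```

Pointing `out` at a file matching the `logs/*.log` glob from the configuration above, once with a `line_len` well below `max_line_bytes` and once close to it, should give two inputs whose throughput can be compared.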

Additional Context

See referenced issue.

References

blt added the type: bug, source: file, and domain: performance labels Mar 11, 2021
blt self-assigned this Mar 11, 2021
blt added a commit that referenced this issue Mar 16, 2021
* Begin rework of `file-source`

This commit is the start of a process to address #6730. The major changes
introduced in this commit so far are application of clippy suggestions and model
checks of the `read_until_with_max_size` test. I have a pretty good idea of how
that function works now and will be introducing benchmarks.

Signed-off-by: Brian L. Troutwine <brian@troutwine.us>

* Clippy fixes

Signed-off-by: Brian L. Troutwine <brian@troutwine.us>

* Introduce benchmark

In this commit I have introduced a benchmark for the read_until etc
function. To do this I've had to make it part of the public API of the crate,
but since the crate sits inside a larger project I'm less chuffed about this. I
have fiddled with the test layout some as well.

Signed-off-by: Brian L. Troutwine <brian@troutwine.us>

* end Cargo.toml in newline

Signed-off-by: Brian L. Troutwine <brian@troutwine.us>

* Slim down the feature list slightly, relax bounds

Signed-off-by: Brian L. Troutwine <brian@troutwine.us>

* Reverse 'bytes' upgrade

This crate looks to be challenging to upgrade. Best to do once
tokio is updated in this project.

Signed-off-by: Brian L. Troutwine <brian@troutwine.us>

* Use libc on Windows

Signed-off-by: Brian L. Troutwine <brian@troutwine.us>

* remove unused file

Signed-off-by: Brian L. Troutwine <brian@troutwine.us>
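
For readers unfamiliar with the kind of function the commit message refers to, here is a minimal sketch of a size-bounded "read until delimiter" loop over `std::io::BufRead`. It is not Vector's `read_until_with_max_size`: the name `read_until_bounded`, its signature, and the choice to silently skip oversized lines are all assumptions made for illustration.

```rust
use std::io::{self, BufRead};

/// Hypothetical sketch: append bytes up to (not including) `delim` to `buf`.
/// Lines longer than `max_size` are dropped and reading moves on to the next line.
/// Returns `Ok(Some(n))` with the bytes consumed from `reader` once a line is
/// complete, or `Ok(None)` if EOF is reached first (any partial line stays in `buf`).
pub fn read_until_bounded<R: BufRead>(
    reader: &mut R,
    delim: u8,
    buf: &mut Vec<u8>,
    max_size: usize,
) -> io::Result<Option<usize>> {
    let mut total = 0;
    let mut discarding = false;
    loop {
        let available = reader.fill_buf()?;
        if available.is_empty() {
            return Ok(None);
        }
        // Scan the buffered chunk for the delimiter; copy only if we are
        // not currently skipping an oversized line.
        let (found, used) = match available.iter().position(|&b| b == delim) {
            Some(i) => {
                if !discarding {
                    buf.extend_from_slice(&available[..i]);
                }
                (true, i + 1)
            }
            None => {
                if !discarding {
                    buf.extend_from_slice(available);
                }
                (false, available.len())
            }
        };
        reader.consume(used);
        total += used;

        // Once the line exceeds the limit, throw it away and stop copying.
        if !discarding && buf.len() > max_size {
            buf.clear();
            discarding = true;
        }
        if found {
            if discarding {
                // The oversized line has ended; resume with the next one.
                discarding = false;
                continue;
            }
            return Ok(Some(total));
        }
    }
}
```

The cost of a loop like this is dominated by how each buffered chunk is scanned and copied, which is why a benchmark over varying line sizes is a natural first step before changing anything.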