Initial new profiling work #74

Merged
merged 6 commits into master from blt-new_perf on May 20, 2021
Conversation

blt
Contributor

@blt blt commented Mar 10, 2021

This commit introduces a new performance test, `file_to_blackhole`. What we have
here is not ready for prime time yet, but the basic stubs of a repeatable
process are in place. We have indications that users are struggling with
topologies whose throughput is capped by the file source, which is why I've
pointed the output at a blackhole rather than an http sink. As much as possible
this work focuses on the file source alone.

Throughput looks to fall off pretty rapidly as line sizes increase. See
`notes.md` in this commit for details.

Signed-off-by: Brian L. Troutwine <brian@troutwine.us>
@blt blt requested review from lukesteensen and binarylogic March 10, 2021 22:24
Comment on lines +56 to +70
| 10 | 25 Kb/sec | 5 Kb | 250 Kb/sec | 250 Kb/sec | 267 Kb/sec | 0 Kb/sec |
| 10 | 250 Kb/sec | 5 Kb | 2.5 Mb/sec | 2.5 Mb/sec | 2.68 Mb/sec | 0 Mb/sec |
| 10 | 2.5 Mb/sec | 5 Kb | 25 Mb/sec | 25 Mb/sec | 26.8 Mb/sec | 0 Mb/sec |
| 10 | 2.5 Mb/sec | 10 Kb | 25 Mb/sec | 25 Mb/sec | 26.2 Mb/sec | 0 Mb/sec |
| 10 | 2.5 Mb/sec | 100 Kb | 25 Mb/sec | 25 Mb/sec | 25.7 Mb/sec | 0 Mb/sec |
| 10 | 2.5 Mb/sec | 100 Kb | 25 Mb/sec | 25 Mb/sec | 25.7 Mb/sec | 0 Mb/sec |
| 10 | 2.5 Mb/sec | 105 Kb | 25 Mb/sec | 23.9 Mb/sec | 24.6 Mb/sec | 1.1 Mb/sec |
| 10 | 2.5 Mb/sec | 110 Kb | 25 Mb/sec | 22 Mb/sec | 22.06 Mb/sec | 3 Mb/sec |
| 10 | 2.5 Mb/sec | 125 Kb | 25 Mb/sec | 17 Mb/sec | 17.04 Mb/sec | 9 Mb/sec |
| 10 | 2.5 Mb/sec | 150 Kb | 25 Mb/sec | 12.3 Mb/sec | 12.6 Mb/sec | 12.7 Mb/sec |
| 10 | 2.5 Mb/sec | 200 Kb | 25 Mb/sec | 6 Mb/sec | 6.15 Mb/sec | 19 Mb/sec |
| 10 | 2.5 Mb/sec | 300 Kb | 25 Mb/sec | 3 Mb/sec | 3.08 Mb/sec | 22 Mb/sec |
| 10 | 2.5 Mb/sec | 400 Kb | 25 Mb/sec | 1.53 Mb/sec | 1.57 Mb/sec | 23.47 Mb/sec |
| 10 | 2.5 Mb/sec | 500 Kb | 25 Mb/sec | 1.05 Mb/sec | 1.07 Mb/sec | 23.95 Mb/sec |
| 10 | 2.5 Mb/sec | 900 Kb | 25 Mb/sec | 231 Kb/sec | 237 Kb/sec | 24.77 Mb/sec |
Contributor Author


@lukesteensen @binarylogic this is probably the most relevant chunk for our ongoing conversation.

@blt blt requested a review from jszwedko March 10, 2021 22:26
@blt
Contributor Author

blt commented Mar 10, 2021

@tsantero this is also potentially relevant to what you folks were experiencing.

@blt
Contributor Author

blt commented Mar 10, 2021

Relevant flamegraphs: https://gist.github.com/blt/87b90377ae8d8c6e8a08e28c6c437037

The size in the file names refers to the line size that file_gen produces.
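
For context, `file_gen` writes fixed-size lines into target log files at a configured per-file rate. Here is a minimal sketch of that pattern (illustrative only, not `file_gen`'s actual code; the output path and parameter values are made up):

```rust
use std::fs::OpenOptions;
use std::io::{BufWriter, Write};
use std::thread::sleep;
use std::time::Duration;

fn main() -> std::io::Result<()> {
    // Example knobs mirroring the ones discussed in this PR.
    let line_size_bytes = 10 * 1024; // "10 Kb" lines
    let bytes_per_second = 2_500_000; // ~2.5 Mb/sec for this one file
    let lines_per_second = bytes_per_second / line_size_bytes;

    // Hypothetical output path; file_gen manages many such files.
    let file = OpenOptions::new()
        .create(true)
        .append(true)
        .open("logs/generated-10kb.log")?;
    let mut out = BufWriter::new(file);

    // Payload is one byte short so the trailing newline lands on the target size.
    let line = "a".repeat(line_size_bytes - 1);

    loop {
        // Crude pacing: write one second's worth of lines, then sleep a second.
        for _ in 0..lines_per_second {
            writeln!(out, "{}", line)?;
        }
        out.flush()?;
        sleep(Duration::from_secs(1));
    }
}
```

As the tables in this PR suggest, `duplicates` is the number of such files written in parallel and `bytes_per_second` is the per-file rate, so the total offered load is their product.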

@blt
Contributor Author

blt commented Mar 10, 2021

The diff doesn't mean much. The two flamegraphs for each line size are pretty radically different, which in itself is interesting.

@binarylogic
Contributor

That was fast, very nice. @lukesteensen has a much better understanding of the file source performance characteristics. I'm curious what he thinks.

@blt
Contributor Author

blt commented Mar 11, 2021 via email

@tsantero

@blt thanks for tagging me here. I'll be able to spend some time with this tonight and report back anything interesting.

@blt
Contributor Author

blt commented Mar 11, 2021

@tsantero rad, I appreciate it. If you pull 0.12 and flip debug logs on you'll get utilization numbers -- vectordotdev/vector#6634 -- that'll show us which part of your topology is consuming the most time.

Member

@lukesteensen lukesteensen left a comment


This looks great! Really happy to see this progressing.


```toml
[sources.file]
type = "file"
max_line_bytes = 102400
```
Member


Unless I'm missing something, I think this should explain why we see a drop-off right at 100kb line size 😄 . If we add another zero it should align better with the generated data.

That being said, I'm not terribly worried about extremely large individual lines. My assumption is that 100kb is a reasonable upper limit for "normal" use cases.

Contributor Author


D'oh, good eye. I thought I had that set to 1MB.

Contributor Author

@blt blt Mar 12, 2021


Well, that's good: correcting the line size limit actually makes the results I'm seeing more explicable. I'll add a section to the notes tomorrow.

I can for sure produce a situation where vector falls behind and never catches up, and it happens cleanly, without my `max_line_bytes` bug muddying the waters. If I have `file_gen` write 10 files with 500 Kb lines and adjust `max_line_bytes` to actually be 1 MB, the effect still exists, except now the throughput from the file source is 5 Mb/sec where I'd expect 25 Mb/sec (10 files at 2.5 Mb/sec each). Likewise, if `file_gen` produces 255 files with 100 Kb lines, the throughput from the file source is ~460 Mb/s where `file_gen` is producing at ~630 Mb/s.

Contributor Author


If I adjust `file_gen` to produce 255 files with 10 Kb lines, the numbers look like 600 Mb/s from `file_gen` and 400 Mb/s from the file source. Excellent result. I'll produce a clearer table tomorrow, but these numbers are promising.

Signed-off-by: Brian L. Troutwine <brian@troutwine.us>
Comment on lines +76 to +144
### Source Exploration

The main entry point for the file source is the `file_source` function in
`src/sources/file.rs`. This wires up the user config, the data directory and
the output `Pipeline`. The interior return value is a boxed `async move`;
`FileServer` from the `file_server` crate is set up inside a
[`spawn_blocking`](https://docs.rs/tokio/1.3.0/tokio/task/fn.spawn_blocking.html)
task. Possibly `block_in_place` would be a better fit, but realistically we'd
want to be async all the way down. `FileServer` is adapted from
[cernan](https://github.com/postmates/cernan/tree/master/src/source/file) code,
notably written before Rust's async story had matured. That said, absent other
indications of blocking, I don't see how that alone would explain the fall-off
once line lengths go beyond a certain limit.

## 2021/03/12

As noted by Luke Steensen, the drop-off above 100 Kb is partially explicable
owing to a bug in the vector config. I had intended the `max_line_bytes`
setting to be 1 MB but it was set to 100 KB. The vector config is now adjusted
to:

```toml
data_dir = "data"

[sources.internal]
type = "internal_metrics"

[sources.file]
type = "file"
max_line_bytes = 1024000
fingerprint.strategy = "device_and_inode"
include = ["logs/*.log"]

[sinks.prometheus]
type = "prometheus_exporter"
inputs = ["internal"]
address = "0.0.0.0:9001"

[sinks.blackhole]
type = "blackhole"
inputs = ["file"]
```

New results, with `bytes_per_second` held to 2.5 Mb/sec except where noted:

| `duplicates` | `bytes_per_second` | `maximum_line_size_bytes` | `file_gen` bytes/sec throughput | file source `sum(rate(vector_processed_bytes_total[1m]))` | delta `file_gen`, `file` |
| ------------ | ------------------ | ------------------------- | ------------------------------- | ---------------------------------------------------------- | ------------------------ |
| 10 | 2.5 Mb/sec | 10 Kb | 25 Mb/sec | 25 Mb/sec | 0 Kb/sec |
| 10 | 25 Mb/sec | 10 Kb | 246 Mb/sec | 219 Mb/sec | 27 Mb/sec |
| 10 | 50 Mb/sec | 10 Kb | 475 Mb/sec | 221 Mb/sec | 254 Mb/sec |
| 10 | 2.5 Mb/sec | 100 Kb | 25 Mb/sec | 25 Mb/sec | 0 Kb/sec |
| 50 | 2.5 Mb/sec | 10 Kb | 125 Mb/sec | 125 Mb/sec | 0 Mb/sec |
| 75 | 2.5 Mb/sec | 10 Kb | 187 Mb/sec | 187 Mb/sec | 0 Mb/sec |
| 85 | 2.5 Mb/sec | 10 Kb | 212 Mb/sec | 212 Mb/sec | 0 Mb/sec |
| 90 | 2.5 Mb/sec | 10 Kb | 225 Mb/sec | 222 Mb/sec | 3 Mb/sec |
| 95 | 2.5 Mb/sec | 10 Kb | 237 Mb/sec | 216 Mb/sec | 19 Mb/sec |
| 100 | 2.5 Mb/sec | 10 Kb | 250 Mb/sec | 221 Mb/sec | 29 Mb/sec |
| 255 | 2.5 Mb/sec | 10 Kb | 616 Mb/sec | 222 Mb/sec | 394 Mb/sec |
| 510 | 100 Kb/sec | 1 Kb | 48 Mb/sec | 40 Mb/sec | 8 Mb/sec |

One thing stood out in the tests: if I allowed vector to develop a backlog and
then stopped `file_gen`, vector would gradually catch up, with throughput
inevitably falling to zero once all the bytes were processed. Based on the
above table there are two dimensions that cause the file source to fall behind:
the total number of files and the byte throughput of those files. 510 files of
relatively meager throughput -- 48 Mb/sec -- forced the file source to fall
behind by 8 Mb/sec, while a relatively small number of files -- 10 -- producing
at 50 Mb/sec caused the file source to fall behind by 254 Mb/sec. The
progenitor code for `file-source` was concerned with fairness rather than
throughput, and I believe that shows in today's data.
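
To make the fairness/throughput trade-off concrete, here is a minimal sketch of
the kind of round-robin read loop described above. This is illustrative only,
not the actual `FileServer` implementation, and `max_read_bytes` is a made-up
per-pass cap:

```rust
use std::fs::File;
use std::io::{BufRead, BufReader};

/// Illustrative round-robin pass: read at most `max_read_bytes` from each
/// file before moving on. This keeps quiet files from being starved
/// (fairness) but caps how quickly any single busy file can be drained
/// (throughput), consistent with the fall-behind pattern in the table.
fn fair_poll_pass(
    readers: &mut [BufReader<File>],
    max_read_bytes: usize,
) -> std::io::Result<usize> {
    let mut total_read = 0;
    for reader in readers.iter_mut() {
        let mut read_this_file = 0;
        let mut line = String::new();
        while read_this_file < max_read_bytes {
            line.clear();
            let n = reader.read_line(&mut line)?;
            if n == 0 {
                break; // at EOF for now; pick this file up again next pass
            }
            read_this_file += n;
            // ... hand `line` off to the rest of the pipeline here ...
        }
        total_read += read_this_file;
    }
    Ok(total_read)
}
```

Raising the per-pass cap, or letting busy files read until dry, favors
throughput at the cost of fairness; that is roughly the tuning question this
data raises.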
Contributor Author


@lukesteensen new results with the config bug corrected. Thanks again for catching that.

blt added 4 commits March 15, 2021 12:05
This case intends to demonstrate the difference between a multi-hop
remap config and a single-hop one. The remap program in `vector-nohop` has
problems, but the basic idea is here.

Once the nohop is corrected we should also generate a differential flamegraph.

Signed-off-by: Brian L. Troutwine <brian@troutwine.us>
Signed-off-by: Brian L. Troutwine <brian@troutwine.us>
Signed-off-by: Brian L. Troutwine <brian@troutwine.us>
Signed-off-by: Brian L. Troutwine <brian@troutwine.us>
@blt blt requested a review from tobz May 20, 2021 18:26
@blt blt marked this pull request as ready for review May 20, 2021 18:27

@tobz tobz left a comment


Nothing egregious jumps out here.

As discussed in chat, we know this is a first pass, originally designed for manual benchmarking, so we can continue to refine, standardize, etc.

@blt blt merged commit 37d17c3 into vectordotdev:master May 20, 2021
@blt blt deleted the blt-new_perf branch May 20, 2021 18:42