Initial new profiling work #74
Conversation
This commit introduces a new performance test, `file_to_blackhole`. What we have here is not ready for prime time yet, but the basic stubs of a repeatable process are in place. We have indications that users are struggling with file-source-capped topologies keeping up, which is why I've dropped into blackhole rather than into an http sink. As much as possible this work focuses on the file source alone. It looks like throughput falls off pretty rapidly as line sizes increase. See `notes.md` in this commit for details.

Signed-off-by: Brian L. Troutwine <brian@troutwine.us>
| 10 | 25 Kb/sec | 5 Kb | 250 Kb/sec | 250 Kb/sec | 267 Kb/sec | 0 Kb/sec |
| 10 | 250 Kb/sec | 5 Kb | 2.5 Mb/sec | 2.5 Mb/sec | 2.68 Mb/sec | 0 Mb/sec |
| 10 | 2.5 Mb/sec | 5 Kb | 25 Mb/sec | 25 Mb/sec | 26.8 Mb/sec | 0 Mb/sec |
| 10 | 2.5 Mb/sec | 10 Kb | 25 Mb/sec | 25 Mb/sec | 26.2 Mb/sec | 0 Mb/sec |
| 10 | 2.5 Mb/sec | 100 Kb | 25 Mb/sec | 25 Mb/sec | 25.7 Mb/sec | 0 Mb/sec |
| 10 | 2.5 Mb/sec | 100 Kb | 25 Mb/sec | 25 Mb/sec | 25.7 Mb/sec | 0 Mb/sec |
| 10 | 2.5 Mb/sec | 105 Kb | 25 Mb/sec | 23.9 Mb/sec | 24.6 Mb/sec | 1.1 Mb/sec |
| 10 | 2.5 Mb/sec | 110 Kb | 25 Mb/sec | 22 Mb/sec | 22.06 Mb/sec | 3 Mb/sec |
| 10 | 2.5 Mb/sec | 125 Kb | 25 Mb/sec | 17 Mb/sec | 17.04 Mb/sec | 9 Mb/sec |
| 10 | 2.5 Mb/sec | 150 Kb | 25 Mb/sec | 12.3 Mb/sec | 12.6 Mb/sec | 12.7 Mb/sec |
| 10 | 2.5 Mb/sec | 200 Kb | 25 Mb/sec | 6 Mb/sec | 6.15 Mb/sec | 19 Mb/sec |
| 10 | 2.5 Mb/sec | 300 Kb | 25 Mb/sec | 3 Mb/sec | 3.08 Mb/sec | 22 Mb/sec |
| 10 | 2.5 Mb/sec | 400 Kb | 25 Mb/sec | 1.53 Mb/sec | 1.57 Mb/sec | 23.47 Mb/sec |
| 10 | 2.5 Mb/sec | 500 Kb | 25 Mb/sec | 1.05 Mb/sec | 1.07 Mb/sec | 23.95 Mb/sec |
| 10 | 2.5 Mb/sec | 900 Kb | 25 Mb/sec | 231 Kb/sec | 237 Kb/sec | 24.77 Mb/sec |
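For context on how these rows are driven: each scenario comes from a `file_gen` configuration whose knobs correspond to the first three columns. A hypothetical sketch -- the field names mirror the column names, and the exact `file_gen` schema and value syntax are assumptions, not taken from this PR:

```toml
# Hypothetical file_gen config for the 100 Kb row above: 10 duplicate
# files, 2.5 Mb/sec written per file, 100 Kb maximum line size. The
# path_template is assumed to land output where vector's file source
# `include` glob is watching.
duplicates = 10
bytes_per_second = "2.5 Mb"
maximum_line_size_bytes = 102400  # 100 Kb
path_template = "logs/%NNN%.log"
```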
@lukesteensen @binarylogic this is probably the most relevant chunk for our ongoing conversation.
@tsantero this is also potentially relevant to what you folks were experiencing.
Relevant flamegraphs: https://gist.github.com/blt/87b90377ae8d8c6e8a08e28c6c437037 The size in the file names refers to the line size used in that run.
The diff doesn't mean much. The two flamegraphs for each line size are pretty radically different, which in itself is interesting.
That was fast, very nice. @lukesteensen has a much better understanding of the file source performance characteristics. I'm curious what he thinks.
Worth calling out that this is not necessarily THE problem we’ve been working on, but it is a problem.
@blt thanks for tagging me here. I'll be able to spend some time with this tonight and report back anything interesting.
@tsantero rad, I appreciate it. If you pull 0.12 and flip debug logs on you'll get utilization numbers -- vectordotdev/vector#6634 -- that'll show us which part of your topology is consuming the most time.
This looks great! Really happy to see this progressing.
```toml
[sources.file]
type = "file"
max_line_bytes = 102400
```
Unless I'm missing something, I think this should explain why we see a drop-off right at 100Kb line size 😄 . If we add another zero it should align better with the generated data.
That being said, I'm not terribly worried about extremely large individual lines. My assumption is that 100Kb is a reasonable upper limit for "normal" use cases.
D'oh, good eye. I thought I had that set to 1MB.
Well, that's good, correcting the line size actually makes the results I'm seeing more explicable. I'll add a section to the notes tomorrow.

I can for sure produce a situation where vector falls behind and never catches up, and this happens cleanly without my `max_line_bytes` bug muddying the waters. If I have `file_gen` produce 10 files at 500Kb line bytes and adjust `max_line_bytes` to actually be 1MB, the effect still exists, except now the throughput from the file source is 5MB/s where I'd expect 25MB/s. Likewise, if `file_gen` produces 255 files at 100Kb line bytes the throughput from the file source is ~460Mb/s where `file_gen` is producing at ~630Mb/s.
If I adjust `file_gen` to produce 255 10Kb-line files the numbers look like 600Mb/s from `file_gen` and 400Mb/s from the file source. Excellent result. I'll produce a clearer table tomorrow, but these numbers are promising.
### Source Exploration

The main entry point for the file source is the `file_source` function in
`src/sources/file.rs`. It wires up the user config, the data directory and the
out `Pipeline`. The interior return is a boxed async move block; `FileServer`
from the `file_server` crate is set up as a
[`spawn_blocking`](https://docs.rs/tokio/1.3.0/tokio/task/fn.spawn_blocking.html)
task. It's possible that `block_in_place` would be more suitable, but
realistically we'd want to be async all the way down. `FileServer` is adapted
from [cernan](https://github.com/postmates/cernan/tree/master/src/source/file)
code, notably written before Rust's async story was ideal. That said, absent
other indications of blocking, I don't find it plausible that this alone
explains the fall-off once line lengths go beyond a certain limit.

## 2021/03/12

As noted by Luke Steensen, the drop-off above 100Kb is partially explicable
owing to a bug in the vector config. I had intended the `max_line_bytes`
setting to be 1MB but it was set to 100KB. The vector config is now adjusted
to:
```toml
data_dir = "data"

[sources.internal]
type = "internal_metrics"

[sources.file]
type = "file"
max_line_bytes = 1024000
fingerprint.strategy = "device_and_inode"
include = ["logs/*.log"]

[sinks.prometheus]
type = "prometheus_exporter"
inputs = ["internal"]
address = "0.0.0.0:9001"

[sinks.blackhole]
type = "blackhole"
inputs = ["file"]
```
New results, with `bytes_per_second` held to 2.5 Mb/sec except where noted:
| `duplicates` | `bytes_per_second` | `maximum_line_size_bytes` | `file_gen` bytes/sec throughput | file source `sum(rate(vector_processed_bytes_total[1m]))` | delta `file_gen`, `file` |
| ------------ | ------------------ | ------------------------- | ------------------------------- | ---------------------------------------------------------- | ------------------------ |
| 10 | 2.5 Mb/sec | 10 Kb | 25 Mb/sec | 25 Mb/sec | 0 Kb/sec |
| 10 | 25 Mb/sec | 10 Kb | 246 Mb/sec | 219 Mb/sec | 27 Mb/sec |
| 10 | 50 Mb/sec | 10 Kb | 475 Mb/sec | 221 Mb/sec | 254 Mb/sec |
| 10 | 2.5 Mb/sec | 100 Kb | 25 Mb/sec | 25 Mb/sec | 0 Kb/sec |
| 50 | 2.5 Mb/sec | 10 Kb | 125 Mb/sec | 125 Mb/sec | 0 Mb/sec |
| 75 | 2.5 Mb/sec | 10 Kb | 187 Mb/sec | 187 Mb/sec | 0 Mb/sec |
| 85 | 2.5 Mb/sec | 10 Kb | 212 Mb/sec | 212 Mb/sec | 0 Mb/sec |
| 90 | 2.5 Mb/sec | 10 Kb | 225 Mb/sec | 222 Mb/sec | 3 Mb/sec |
| 95 | 2.5 Mb/sec | 10 Kb | 237 Mb/sec | 216 Mb/sec | 19 Mb/sec |
| 100 | 2.5 Mb/sec | 10 Kb | 250 Mb/sec | 221 Mb/sec | 29 Mb/sec |
| 255 | 2.5 Mb/sec | 10 Kb | 616 Mb/sec | 222 Mb/sec | 394 Mb/sec |
| 510 | 100 Kb/sec | 1 Kb | 48 Mb/sec | 40 Mb/sec | 8 Mb/sec |

One thing stood out in the tests: if I allowed vector to develop a backlog and
then stopped `file_gen`, throughput would gradually catch up before inevitably
falling to zero once all the bytes were processed. Based on the table above,
two dimensions cause the file source to fall behind: the total number of files
and the byte throughput of those files. 510 files of relatively meager
aggregate throughput -- 48 Mb/sec -- forced the file source to fall behind by
8 Mb/sec, while a relatively small number of files -- 10 -- producing at
50 Mb/sec caused the file source to fall behind by 254 Mb/sec. The progenitor
code for `file-source` was concerned with fairness rather than throughput,
which I believe shows in the data today.
@lukesteensen new results with the config bug corrected. Thanks again for catching that.
This case intends to demonstrate the difference between a multi-hop remap config and a single hop. The remap program in vector-nohop has problems, but the basic idea is here. Once the nohop is corrected we should also generate a differential flamegraph.

Signed-off-by: Brian L. Troutwine <brian@troutwine.us>
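As a rough illustration of the difference being measured -- the transform names and VRL programs below are made up for the sketch, not the actual configs in this commit:

```toml
# Multi-hop: each remap transform is a separate hop in the topology,
# so every event crosses two inter-component channels.
[transforms.hop_one]
type = "remap"
inputs = ["file"]
source = '.stage_one = true'

[transforms.hop_two]
type = "remap"
inputs = ["hop_one"]
source = '.stage_two = true'

# Single hop ("nohop"): the same work folded into one remap program,
# paying the channel cost once.
[transforms.nohop]
type = "remap"
inputs = ["file"]
source = '''
.stage_one = true
.stage_two = true
'''
```

In a real comparison the sink would take its input from either the end of the multi-hop chain or the nohop transform, not both at once.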
Nothing egregious jumps out here.
As discussed in chat, we know this is a first pass, originally designed for manual benchmarking, so we can continue to refine, standardize, etc.