Exit non-zero when Vector fails to gracefully shutdown #13731

Closed

Sh4d1 opened this issue Jul 27, 2022 · 8 comments
Labels
domain: shutdown (Anything related to the shutdown of Vector), type: bug (A code related bug)

Comments

@Sh4d1
Contributor

Sh4d1 commented Jul 27, 2022

A note for the community

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Problem

Hello, when I reload Vector and there is an issue with a sink (http in my case), this happens:

Jul 27 14:37:47 vector03 vector[33482]: 2022-07-27T14:37:47.739008Z ERROR vector::topology::running: Failed to gracefully shut down in time. Killing components. components="http_out"
Jul 27 14:37:47 vector03 systemd[1]: vector.service: Succeeded.

It looks like it exits with exit code 0, which then stops systemd from restarting Vector.
I'll take a look when I have some time, but if somebody has an idea before then, go for it 😄 (I can't really run with RUST_BACKTRACE at the moment.)

Configuration

[sinks.http_out]
type = "http"
inputs = [ "filter" ]
uri = "http://x.x.x.x"
compression = "none"
encoding.codec = "json"

Version

0.23.0

Debug Output

No response

Example Data

No response

Additional Context

No response

References

No response

Sh4d1 added the type: bug label Jul 27, 2022
Sh4d1 changed the title from "Killing a componenet exits Vector" to "Killing a component exits Vector" Jul 27, 2022
@jszwedko
Member

Hi @Sh4d1 !

Can you share the full log output and full Vector configuration? I think I reproduced this, but the behavior is just that the sink continuously tries to send the data, which is the correct behavior since the error might recover (in my case DNS resolution). Vector doesn't shut down, but it would if it had consumed all of the input. If Vector does start to shut down, then the sink will eventually time out shutting down and exit as you saw.

I could see having a different exit code for hard shutdowns though.

For systemd, you can configure it to always restart Vector, even when it exits 0, with Restart=always.
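
As a concrete sketch (the unit name vector.service comes from the logs above; the drop-in path follows the standard systemd convention and is an assumption, not something from this issue), that override could live in /etc/systemd/system/vector.service.d/override.conf:

[Service]
Restart=always
RestartSec=5

followed by systemctl daemon-reload and systemctl restart vector to pick it up.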

@Sh4d1
Contributor Author

Sh4d1 commented Jul 27, 2022

Indeed, I ended up changing the Restart setting, of course :D

For the config, it's the same as #13718 (the first block).
Here is the full log:


Jul 27 14:36:47 vector03 systemd[1]: Reloaded Vector.
Jul 27 14:36:47 vector03 vector[33482]: 2022-07-27T14:36:47.720410Z  INFO vector::topology::running: Reloading running topology with new configuration.
Jul 27 14:36:47 vector03 vector[33482]: thread 'vector-worker' panicked at 'Pausing non-existent sink is not valid: haproxy_filter', lib/vector-core/src/fanout.rs:315:17
Jul 27 14:36:47 vector03 vector[33482]: note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Jul 27 14:36:47 vector03 vector[33482]: 2022-07-27T14:36:47.720950Z  INFO vector::topology::running: Running healthchecks.
Jul 27 14:36:47 vector03 vector[33482]: 2022-07-27T14:36:47.721078Z  INFO vector::topology::running: New configuration loaded successfully.
Jul 27 14:36:47 vector03 vector[33482]: 2022-07-27T14:36:47.722491Z  INFO vector: Vector has reloaded. path=[File("/etc/vector/vector.toml", None)]
Jul 27 14:36:47 vector03 vector[33482]: 2022-07-27T14:36:47.737052Z ERROR transform{component_kind="transform" component_id=vector_parser component_type=remap component_name=vector_parser}: vector::topology: An error occurred that Vector couldn't handle.
Jul 27 14:36:47 vector03 vector[33482]: 2022-07-27T14:36:47.737218Z  INFO vector: Vector has stopped.
Jul 27 14:36:47 vector03 vector[33482]: thread 'vector-worker' panicked at 'unit error: ()', lib/vector-core/src/fanout.rs:369:42
Jul 27 14:36:47 vector03 vector[33482]: thread 'vector-worker' panicked at 'join error: JoinError::Panic(Id(32), ...)', src/topology/builder.rs:212:30
Jul 27 14:36:47 vector03 vector[33482]: 2022-07-27T14:36:47.737962Z ERROR source{component_kind="source" component_id=vector_in component_type=vector component_name=vector_in}: vector::topology: An error occurred that Vector couldn't handle.
Jul 27 14:36:47 vector03 vector[33482]: 2022-07-27T14:36:47.738751Z ERROR source{component_kind="source" component_id=vector_in component_type=vector component_name=vector_in}: vector::internal_events::common: Failed to forward event(s), downstream is closed. error_code="stream_closed" error_type="writer_failed" stage="sending" count=1000
Jul 27 14:36:47 vector03 vector[33482]: 2022-07-27T14:36:47.739053Z  INFO vector::topology::running: Shutting down... Waiting on running components. remaining_components="elasticsearch_out, vector_in, s3_out, journald_parser, http_out" time_remaining="59 seconds left"
Jul 27 14:36:47 vector03 vector[33482]: 2022-07-27T14:36:47.740422Z ERROR source{component_kind="source" component_id=vector_in component_type=vector component_name=vector_in}: vector::internal_events::common: Failed to forward event(s), downstream is closed. error_code="stream_closed" error_type="writer_failed" stage="sending" count=1000
Jul 27 14:36:52 vector03 vector[33482]: 2022-07-27T14:36:52.739198Z  INFO vector::topology::running: Shutting down... Waiting on running components. remaining_components="s3_out, http_out" time_remaining="54 seconds left"
Jul 27 14:36:57 vector03 vector[33482]: 2022-07-27T14:36:57.739343Z  INFO vector::topology::running: Shutting down... Waiting on running components. remaining_components="s3_out, http_out" time_remaining="49 seconds left"
Jul 27 14:37:02 vector03 vector[33482]: 2022-07-27T14:37:02.738900Z  INFO vector::topology::running: Shutting down... Waiting on running components. remaining_components="s3_out, http_out" time_remaining="44 seconds left"
Jul 27 14:37:07 vector03 vector[33482]: 2022-07-27T14:37:07.738703Z  INFO vector::topology::running: Shutting down... Waiting on running components. remaining_components="s3_out, http_out" time_remaining="39 seconds left"
Jul 27 14:37:12 vector03 vector[33482]: 2022-07-27T14:37:12.739411Z  INFO vector::topology::running: Shutting down... Waiting on running components. remaining_components="s3_out, http_out" time_remaining="34 seconds left"
Jul 27 14:37:17 vector03 vector[33482]: 2022-07-27T14:37:17.738839Z  INFO vector::topology::running: Shutting down... Waiting on running components. remaining_components="s3_out, http_out" time_remaining="29 seconds left"
Jul 27 14:37:22 vector03 vector[33482]: 2022-07-27T14:37:22.738611Z  INFO vector::topology::running: Shutting down... Waiting on running components. remaining_components="http_out" time_remaining="24 seconds left"
Jul 27 14:37:27 vector03 vector[33482]: 2022-07-27T14:37:27.738634Z  INFO vector::topology::running: Shutting down... Waiting on running components. remaining_components="http_out" time_remaining="19 seconds left"
Jul 27 14:37:32 vector03 vector[33482]: 2022-07-27T14:37:32.739271Z  INFO vector::topology::running: Shutting down... Waiting on running components. remaining_components="http_out" time_remaining="14 seconds left"
Jul 27 14:37:37 vector03 vector[33482]: 2022-07-27T14:37:37.738683Z  INFO vector::topology::running: Shutting down... Waiting on running components. remaining_components="http_out" time_remaining="9 seconds left"
Jul 27 14:37:42 vector03 vector[33482]: 2022-07-27T14:37:42.739096Z  INFO vector::topology::running: Shutting down... Waiting on running components. remaining_components="http_out" time_remaining="4 seconds left"
Jul 27 14:37:47 vector03 vector[33482]: 2022-07-27T14:37:47.739008Z ERROR vector::topology::running: Failed to gracefully shut down in time. Killing components. components="http_out"
Jul 27 14:37:47 vector03 systemd[1]: vector.service: Succeeded.

> I think I reproduced this, but the behavior is just that the sink continuously tries to send the data, which is the correct behavior since the error might recover

Yes, that's my case too (well, some timeout in my case, but the same behaviour I guess).

@jszwedko
Member

Aha, I see. That's interesting that you managed to hit a panic too, which seems to be related to reload. I think there are two issues here:

  • You hit a panic during reload (having the before/after config would help with reproducing that)
  • When Vector isn't able to gracefully shut down we could consider exiting non-zero

I'll repurpose this issue for the second point: that Vector could exit non-zero when it fails to shut down gracefully. If you are able to reproduce that panic, would you mind opening it as a separate issue?
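
As a rough illustration of that second point (a minimal sketch of the idea only, not Vector's actual code, which landed later), the graceful-shutdown result would be mapped to the process exit status, roughly:

use std::process::ExitCode;

// Hypothetical stand-in for the real shutdown logic: returns false when the
// deadline expires and components have to be killed, as in the log above.
fn wait_for_graceful_shutdown() -> bool {
    false
}

fn main() -> ExitCode {
    if wait_for_graceful_shutdown() {
        ExitCode::SUCCESS
    } else {
        // Non-zero exit so supervisors like systemd can treat this as a
        // failure and restart the service.
        ExitCode::FAILURE
    }
}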

jszwedko changed the title from "Killing a component exits Vector" to "Exit non-zero when Vector fails to gracefully shutdown" Jul 27, 2022
jszwedko added the domain: shutdown label Jul 27, 2022
@Sh4d1
Contributor Author

Sh4d1 commented Jul 27, 2022

Ah yeah, I forgot about the panic; it's totally unrelated. Apparently it panicked with this condition (on a filter transform):

.ident == "haproxy" && (contains(string!(."hp.name.fe"), "...") || contains(string(".hp.name.fe"), "..."))

I have some logs:

Jul 27 14:14:46 par2dc5-sis-prd-cl0vector03 vector[32767]: 2022-07-27T14:14:46.970112Z ERROR transform{component_kind="transform" component_id=haproxy_filter component_type=filter component_name=haproxy_filter}: vector::internal_events::conditions: VRL condition execution failed. error=function call error for "contains" at (24:67): function call error for "string" at (33:53): expected string, got null internal_log_rate_secs=120 error_type="script_failed" stage="processing"
Jul 27 14:14:46 par2dc5-sis-prd-cl0vector03 vector[32767]: 2022-07-27T14:14:46.970345Z ERROR transform{component_kind="transform" component_id=haproxy_filter component_type=filter component_name=haproxy_filter}: vector::internal_events::conditions: Internal log [VRL condition execution failed.] is being rate limited.
Jul 27 14:16:47 par2dc5-sis-prd-cl0vector03 vector[32767]: 2022-07-27T14:16:47.404175Z ERROR transform{component_kind="transform" component_id=haproxy_filter component_type=filter component_name=haproxy_filter}: vector::internal_events::conditions: Internal log [VRL condition execution failed.] has been rate limited 61896 times.
Jul 27 14:16:47 par2dc5-sis-prd-cl0vector03 vector[32767]: 2022-07-27T14:16:47.404269Z ERROR transform{component_kind="transform" component_id=haproxy_filter component_type=filter component_name=haproxy_filter}: vector::internal_events::conditions: VRL condition execution failed. error=function call error for "contains" at (24:67): function call error for "string" at (33:53): expected string, got null internal_log_rate_secs=120 error_type="script_failed" stage="processing"
Jul 27 14:16:47 par2dc5-sis-prd-cl0vector03 vector[32767]: 2022-07-27T14:16:47.404365Z ERROR transform{component_kind="transform" component_id=haproxy_filter component_type=filter component_name=haproxy_filter}: vector::internal_events::conditions: Internal log [VRL condition execution failed.] is being rate limited.

I was trying to make the previous filter work. I'm not sure which change caused the panic, though. However, now I have to use:

condition = ".ident == \"haproxy\" && exists(.\"hp.name.fe\") && is_string(.\"hp.name.fe\") && (contains(string!(.\"hp.name.fe\"), \"public_in\") || contains(string(\".hp.name.fe\"), \"infra_in\"))"

in order to avoid those errors (not sure if I should need it, but yeah, different issue). I'll see if I can reproduce the panic, but now I don't have any timeout issues, so I'm not sure it'll work 😅
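
For context, here is roughly how that condition sits in the config; the transform name haproxy_filter comes from the panic and log messages above, while the inputs value is an assumption rather than the original configuration:

[transforms.haproxy_filter]
type = "filter"
# "journald_parser" is assumed here; the real upstream component isn't shown in this issue
inputs = [ "journald_parser" ]
condition = ".ident == \"haproxy\" && exists(.\"hp.name.fe\") && is_string(.\"hp.name.fe\") && (contains(string!(.\"hp.name.fe\"), \"public_in\") || contains(string(\".hp.name.fe\"), \"infra_in\"))"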

@mans2singh
Contributor

Hi Folks:

Just wanted to know if there is any targeted release for fixing this issue.

Thanks.

@mans2singh
Contributor

👍

@jszwedko
Member

Hey! No, not yet, unfortunately. Happy to accept a PR in this area though.

github-merge-queue bot pushed a commit that referenced this issue Jun 23, 2023
…shutdown times out (#17676)


Issue: [Exit non-zero when Vector fails to gracefully shutdown](#13731)
@jszwedko
Member

jszwedko commented Aug 2, 2023

Closed by #17676

@jszwedko jszwedko closed this as completed Aug 2, 2023