Description
A note for the community
- Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
- If you are interested in working on this issue or have submitted a pull request, please leave a comment
Problem
We use Vector to archive incoming syslog messages to S3. After we upgraded from 0.20.0 to 0.22.2, the S3 sink stopped working and Vector dropped every incoming event. Vector received the events and put them into a disk buffer (which can't be too big because of the very high memory requirement on startup, but that's another issue), tried to send them to S3, failed with the following error message, and then dropped the events, clearing the disk buffer. We lost a few days of events before we noticed that they were missing from the archive.
2022-06-17T08:45:19.883623Z ERROR vector::topology::builder: msg="Healthcheck: Failed Reason." error=failed to construct request: Failed to load credentials from the credentials provider: An error occurred while loading credentials: Error response from IMDS (code: 301). Response { status: 301, version: HTTP/1.1, headers: {"content-type": "text/html; charset=utf-8", "location": "/latest/meta-data/iam/security-credentials/", "date": "Fri, 17 Jun 2022 08:45:19 GMT", "content-length": "78"}, body: SdkBody { inner: Once(Some(b"<a href=\"/latest/meta-data/iam/security-credentials/\">Moved Permanently</a>.\n\n")), retryable: true } } component_kind="sink" component_type="aws_s3" component_id=s3_archive component_name=s3_archive
The strange thing is that if I curl the same URL from the pod, it works.
~ # curl -v http://169.254.169.254/latest/meta-data/iam/security-credentials/ ; echo
* Trying 169.254.169.254:80...
* Connected to 169.254.169.254 (169.254.169.254) port 80 (#0)
> GET /latest/meta-data/iam/security-credentials/ HTTP/1.1
> Host: 169.254.169.254
> User-Agent: curl/7.79.1
> Accept: */*
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Date: Fri, 17 Jun 2022 08:46:59 GMT
< Content-Length: 94
< Content-Type: text/plain; charset=utf-8
<
* Connection #0 to host 169.254.169.254 left intact
arn:aws:iam::1234567890abcd:role/role-name
We enabled both IMDSv1 and IMDSv2 on the instance. We are using KIAM, which intercepts the requests sent to the IMDS, but this setup worked fine in 0.20.0.
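One detail in the output above may be worth checking: the 301 in the Vector log carries a `location` header that ends in a trailing slash, while the curl command already requests the path with the trailing slash. That would be consistent with Vector requesting the path without the slash, although the log does not show the requested path, so this is only a guess. The following Python sketch is one way to compare the two variants from inside the pod. The helper names `probe` and `is_trailing_slash_redirect` are made up for this illustration, and the script uses only the standard library, which does not follow redirects on its own.

```python
import http.client

# Host and path taken from the curl command and the error log in this report.
IMDS_HOST = "169.254.169.254"
CREDS_PATH = "/latest/meta-data/iam/security-credentials"


def probe(path, host=IMDS_HOST, timeout=2.0):
    """Send one GET request without following redirects.

    Returns a (status, location) tuple, where location is the value of the
    Location header or None if the response has no such header.
    """
    conn = http.client.HTTPConnection(host, 80, timeout=timeout)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        resp.read()  # drain the body so the connection closes cleanly
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def is_trailing_slash_redirect(path, status, location):
    """Return True when a 3xx response only points at the same path plus '/'."""
    return 300 <= status < 400 and location == path + "/"
```

Calling `probe(CREDS_PATH)` and `probe(CREDS_PATH + "/")` from the pod and passing each result to `is_trailing_slash_redirect` would show whether the path without the slash is the one that returns the 301. It does not show which component (the IMDS itself or KIAM) issues the redirect.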
We are also very worried about the silent loss of events. We expected that the buffer would not be emptied while the destination is unavailable or unreachable, and we think this is the real issue. We will migrate away from KIAM in the future, so if 0.22.2 and later no longer support that setup, we can live with it and stay on 0.20.0 for a while, but the silent data loss is frightening.
Configuration
data_dir = "/var/lib/vector"
[api]
enabled = false
### sources
[sources.internal_metrics]
type = "internal_metrics"
scrape_interval_secs = 5
[sources.syslog]
type = "syslog"
address = "0.0.0.0:601"
connection_limit = 200
keepalive = { time_secs = 50 }
max_length = 102400
mode = "tcp"
### transforms
### sinks
[sinks.prometheus_exporter]
type = "prometheus_exporter"
inputs = [ "internal_metrics" ]
address = "0.0.0.0:9090"
flush_period_secs = 60
[sinks.s3_archive]
type = "aws_s3"
inputs = [ "syslog" ]
batch = { max_bytes = 536870912, timeout_secs = 600 }
bucket = "our-bucket"
buffer = { max_size = 1073741824, type = "disk", when_full = "block" }
compression = "gzip"
encoding = { codec = "text", timestamp_format = "rfc3339" }
filename_time_format = "%T"
healthcheck = { enabled = true }
key_prefix = "logs/foobar/year=%Y/month=%m/day=%d/"
region = "us-west-2"
server_side_encryption = "aws:kms"
ssekms_key_id = "alias/log-archive"
Version
vector 0.22.2 (x86_64-unknown-linux-musl 0024c92 2022-06-15)
Debug Output
No response
Example Data
No response
Additional Context
Vector is running in Kubernetes and uses EC2 instance roles which are provided by KIAM.
References
No response