S3 sink stopped working in 0.22 and lost events #13211

@akunszt

A note for the community

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Problem

We use Vector to archive incoming syslog messages to S3. After we upgraded from 0.20.0 to 0.22.2, the S3 sink stopped working and Vector dropped every incoming event. It received the events and put them into a disk buffer (which can't be too big because of the insane memory requirement on startup, but that's another issue), tried to send them to S3, failed with the following error message, and then dropped the events and cleared the disk buffer. We lost a few days of events before we noticed that they were missing from the archive.

2022-06-17T08:45:19.883623Z ERROR vector::topology::builder: msg="Healthcheck: Failed Reason." error=failed to construct request: Failed to load credentials from the credentials provider: An error occurred while loading credentials: Error response from IMDS (code: 301). Response { status: 301, version: HTTP/1.1, headers: {"content-type": "text/html; charset=utf-8", "location": "/latest/meta-data/iam/security-credentials/", "date": "Fri, 17 Jun 2022 08:45:19 GMT", "content-length": "78"}, body: SdkBody { inner: Once(Some(b"<a href=\"/latest/meta-data/iam/security-credentials/\">Moved Permanently</a>.\n\n")), retryable: true } } component_kind="sink" component_type="aws_s3" component_id=s3_archive component_name=s3_archive

The strange thing is that if I curl the same URL from the pod, it works.

~ # curl -v http://169.254.169.254/latest/meta-data/iam/security-credentials/ ; echo
*   Trying 169.254.169.254:80...
* Connected to 169.254.169.254 (169.254.169.254) port 80 (#0)
> GET /latest/meta-data/iam/security-credentials/ HTTP/1.1
> Host: 169.254.169.254
> User-Agent: curl/7.79.1
> Accept: */*
> 
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Date: Fri, 17 Jun 2022 08:46:59 GMT
< Content-Length: 94
< Content-Type: text/plain; charset=utf-8
< 
* Connection #0 to host 169.254.169.254 left intact
arn:aws:iam::1234567890abcd:role/role-name

Both IMDSv1 and IMDSv2 are enabled on the instance. We are using KIAM, which intercepts the requests sent to the IMDS, but this setup worked fine in 0.20.0.
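
One detail in the output above may be worth checking. The 301 response in the error carries a `location` header that points at the same path with a trailing slash, while the curl test already requested the path with the trailing slash. A possible reading (an assumption, not confirmed anywhere in this report) is that the client in 0.22.2 requests the path without the slash and the interceptor answers with a redirect. The Python sketch below issues both requests without following redirects so the two responses can be compared from inside the pod. The helper names `probe` and `compare_trailing_slash` are hypothetical.

```python
import http.client
from urllib.parse import urlsplit


def probe(url, timeout=2.0):
    """Send one GET without following redirects.

    Returns (status_code, Location header or None).
    """
    parts = urlsplit(url)
    conn = http.client.HTTPConnection(parts.hostname, parts.port or 80, timeout=timeout)
    try:
        conn.request("GET", parts.path or "/")
        resp = conn.getresponse()
        resp.read()  # drain the body so the connection closes cleanly
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def compare_trailing_slash(base):
    """Probe the credentials path with and without the trailing slash.

    `base` is the scheme and host of the metadata endpoint, e.g. the
    address used in the curl test above.
    """
    path = "/latest/meta-data/iam/security-credentials"
    return {
        "without_slash": probe(base + path),
        "with_slash": probe(base + path + "/"),
    }
```

Calling `compare_trailing_slash("http://169.254.169.254")` from the pod returns both status codes side by side. If only the variant without the slash returns 301, that would be consistent with the reading above, but it would not by itself prove what Vector sends.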

We are also very worried about the silent loss of events. We expected that the buffer would not be emptied while the destination is unavailable or unreachable. We think this is the real issue. We plan to migrate away from KIAM, so if 0.22.2 and later no longer support that setup, we can live with it and stay on 0.20.0 for a while, but the silent data loss is frightening.

Configuration

data_dir = "/var/lib/vector"

[api]
enabled = false

### sources
[sources.internal_metrics]
type = "internal_metrics"
scrape_interval_secs = 5

[sources.syslog]
type = "syslog"
address = "0.0.0.0:601"
connection_limit = 200
keepalive = { time_secs = 50 }
max_length = 102400
mode = "tcp"

### transforms

### sinks
[sinks.prometheus_exporter]
type = "prometheus_exporter"
inputs = [ "internal_metrics" ]
address = "0.0.0.0:9090"
flush_period_secs = 60

[sinks.s3_archive]
type = "aws_s3"
inputs = [ "syslog" ]
batch = { max_bytes = 536870912, timeout_secs = 600 }
bucket = "our-bucket"
buffer = { max_size = 1073741824, type = "disk", when_full = "block" }
compression = "gzip"
encoding = { codec = "text", timestamp_format = "rfc3339" }
filename_time_format = "%T"
healthcheck = { enabled = true }
key_prefix = "logs/foobar/year=%Y/month=%m/day=%d/"
region = "us-west-2"
server_side_encryption = "aws:kms"
ssekms_key_id = "alias/log-archive"
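
Since this configuration already exposes `internal_metrics` through the `prometheus_exporter` sink on port 9090, a growing gap between events received and events sent by `s3_archive` could be used to notice this kind of loss earlier. The Python sketch below parses the Prometheus text format and computes that gap. The metric names `vector_component_received_events_total` and `vector_component_sent_events_total` are assumptions and should be checked against the actual `/metrics` output of the running version; the function names are hypothetical helpers.

```python
import re

# One sample line: metric name, optional {labels}, value, optional timestamp.
SAMPLE_LINE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Parse Prometheus text exposition into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_LINE.match(line)
        if not match:
            continue
        name, labels, value = match.groups()
        samples.append((name, dict(LABEL.findall(labels or "")), float(value)))
    return samples


def component_total(samples, metric, component_id):
    """Sum all samples of `metric` that belong to one component."""
    return sum(
        value
        for name, labels, value in samples
        if name == metric and labels.get("component_id") == component_id
    )


def delivery_gap(
    samples,
    component_id,
    received_metric="vector_component_received_events_total",  # assumed name
    sent_metric="vector_component_sent_events_total",  # assumed name
):
    """Events received minus events sent for one component."""
    received = component_total(samples, received_metric, component_id)
    sent = component_total(samples, sent_metric, component_id)
    return received - sent
```

A non-zero gap is normal while events sit in the buffer or in an open batch (here up to `timeout_secs = 600`), so only a gap that keeps growing well past the batch window would be a signal worth alerting on.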

Version

vector 0.22.2 (x86_64-unknown-linux-musl 0024c92 2022-06-15)

Debug Output

No response

Example Data

No response

Additional Context

Vector is running in Kubernetes and uses EC2 instance roles which are provided by KIAM.

References

No response

Metadata

    Labels

    sink: aws_s3 (Anything `aws_s3` sink related), type: bug (A code related bug.)
