[ECS] [request]: Support Mem_Buf_Limit in FireLens #964
Comments
Hi @PettitWesley, I would like to add Mem_Buf_Limit to the input plugin for FireLens to avoid the container OOM issue. However, I found that FireLens only lets users modify the output plugin in the task definition, as you mentioned. [1] https://github.com/aws/aws-for-fluent-bit
Hey @chikinchoi, there is a workaround that allows you to edit Mem_Buf_Limit right now, though it is slightly inconvenient. I am working on writing and publishing a short tutorial on that; I can post a shortened version here before the full post is published. Stay tuned. |
Hi @PettitWesley, looking forward to your short tutorial! Thank you very much!
Hey @PettitWesley, could you share the general idea of how to implement the workaround? :)
@chikinchoi Sorry for the delay, here is the short tutorial (which will be improved, cleaned up, and published elsewhere in some time).
Background: How FireLens configures Fluentd and Fluent Bit
Before we learn how to set input parameters, we need to understand how FireLens works in detail. As explained in Under the Hood: FireLens for ECS Tasks: https://aws.amazon.com/blogs/containers/under-the-hood-firelens-for-amazon-ecs-tasks/
Thus, while fundamentally FireLens just aimed to enable Fluentd and Fluent Bit in ECS and ECS Fargate, we built configuration management features to make that easy. This involved two things:
Consequently, the configuration file for Fluentd or Fluent Bit is “fully managed” by ECS. With the config-file-type option, you can import your own configuration. However, the input definitions are always generated by ECS, and your additional config is then imported using the Fluentd/Fluent Bit include statement. Internally, Fluentd and Fluent Bit concatenate the two config files together, so your config is appended to the generated config. The generated config is always mounted into your log routing container at set locations:
Tutorial: Setting input parameters (WIP)
The configuration for Fluent Bit is generated by the ECS Agent and mounted into the FireLens container at /fluent-bit/etc/fluent-bit.conf. The AWS for Fluent Bit container image and the official open source container distribution of Fluent Bit use this as the default configuration path. The input configuration for FireLens can be seen here; the input definitions are always the same and do not change based on user input: https://github.com/aws-samples/amazon-ecs-firelens-under-the-hood/blob/master/generated-configs/fluent-bit/generated_by_firelens.conf#L3
Basically, logs are always read from a Unix socket mounted into the container at /var/run/fluent.sock. As a FireLens user, you can set your own input configuration by overriding the default entry point command for the Fluent Bit container. See the following:
If you use AWS for Fluent Bit, override the entry point command to be something like:
/fluent-bit/bin/fluent-bit -e /fluent-bit/firehose.so -e /fluent-bit/cloudwatch.so -e /fluent-bit/kinesis.so -c /fluent-bit/alt/fluent-bit.conf
Build a custom Fluent Bit image with your own configuration file at that location. Remember to set the input definition with the same unix path:
You can then add additional options in this input section. To make your config dynamic at runtime, remember that you can use environment variables in Fluent Bit config:
You can then set the values of those environment variables in the FireLens container. |
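A minimal sketch of the steps above, written as a shell script so it can be run anywhere: it writes a custom config that redeclares the FireLens forward input on the same Unix socket and takes Mem_Buf_Limit from an environment variable. CONFIG_PATH and FLB_INPUT_MEM_BUF_LIMIT are assumed names for illustration (the latter matches the entrypoint script posted later in this thread), and the file is written to a temp directory here rather than to /fluent-bit/alt/.

```shell
#!/bin/bash
# Sketch only: CONFIG_PATH is a hypothetical variable. In a custom image the file
# would be copied to the path passed to `fluent-bit -c`,
# for example /fluent-bit/alt/fluent-bit.conf.
CONFIG_PATH="${CONFIG_PATH:-${TMPDIR:-/tmp}/fluent-bit.conf}"

# The quoted 'EOF' stops the shell from expanding ${...}, so the placeholder is
# written literally and Fluent Bit resolves it from the container environment.
cat > "$CONFIG_PATH" <<'EOF'
[INPUT]
    Name          forward
    unix_path     /var/run/fluent.sock
    Mem_Buf_Limit ${FLB_INPUT_MEM_BUF_LIMIT}
EOF

echo "Wrote $CONFIG_PATH"
```

Setting FLB_INPUT_MEM_BUF_LIMIT in the FireLens container's environment then changes the limit without rebuilding the image.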
Let me know if any of it is confusing |
Hi @PettitWesley, thank you for your update!
However, does this mean that Fluent Bit cannot read the log configuration keys and values from FireLens to update the output plugin? For example, to connect to an external Fluentd, I would add 'Host' and 'Port' to the FireLens log configuration in the sidecar application task definition [1]. [1] https://docs.aws.amazon.com/AmazonECS/latest/developerguide/using_firelens.html
At the moment, this is not possible. My workaround is the only solution.
Yes, all configuration must be in the custom fluent bit configuration file that you add. You can not use the logConfiguration options- specify |
I think a possibly good alternative to fixing this in FireLens would be a Service section configuration for Fluent Bit that governs the max memory used for buffering by all inputs.
Hi @PettitWesley, is there any update on this?
The detailed blog on the workaround has been published: https://aws.amazon.com/blogs/containers/how-to-set-fluentd-and-fluent-bit-input-parameters-in-firelens/ Other than that we don't have an updated ETA on this feature at this time. |
Is this option ( |
It looks like the answer is NO, it is not supported on a Service level. 😞 |
One issue with this approach when you are building a custom docker image is inability to generate dynamic records ( Example:
UPDATE: I've devised a way to pass these dynamic records to FluentBit. The same approach also lets you pass FluentBit configuration parameters via environment variables.
#!/bin/bash
### Fluent Bit configuration parameters (defaults)
## Service section
export FLB_SERVICE_FLUSH=${FLB_SERVICE_FLUSH:-"1"}
export FLB_SERVICE_GRACE=${FLB_SERVICE_GRACE:-"30"}
export FLB_SERVICE_LOG_LEVEL=${FLB_SERVICE_LOG_LEVEL:-"info"}
## Input section
export FLB_INPUT_MEM_BUF_LIMIT=${FLB_INPUT_MEM_BUF_LIMIT:-"100MB"}
### Collect EC2 and ECS metadata
# Expansions are quoted so the metadata JSON reaches python without word splitting or globbing
export EC2_INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
export ECS_METADATA=$(curl -s "${ECS_CONTAINER_METADATA_URI_V4}")
export ECS_CLUSTER=$(echo "${ECS_METADATA}" | python -c "import json, sys; print(json.load(sys.stdin)['Labels']['com.amazonaws.ecs.cluster'])")
export ECS_TASK_ARN=$(echo "${ECS_METADATA}" | python -c "import json, sys; print(json.load(sys.stdin)['Labels']['com.amazonaws.ecs.task-arn'])")
export ECS_TASK_DEFINITION_FAMILY=$(echo "${ECS_METADATA}" | python -c "import json, sys; print(json.load(sys.stdin)['Labels']['com.amazonaws.ecs.task-definition-family'])")
export ECS_TASK_DEFINITION_VERSION=$(echo "${ECS_METADATA}" | python -c "import json, sys; print(json.load(sys.stdin)['Labels']['com.amazonaws.ecs.task-definition-version'])")
export ECS_IMAGE_VERSION=$(echo "${ECS_METADATA}" | python -c "import json, sys; print(json.load(sys.stdin)['Image'].split(':')[-1])")
export ECS_TASK_DEFINITION="${ECS_TASK_DEFINITION_FAMILY}:${ECS_TASK_DEFINITION_VERSION}"
echo "AWS for Fluent Bit Container Image Version ${ECS_IMAGE_VERSION}"
exec /fluent-bit/bin/fluent-bit -e /fluent-bit/firehose.so -e /fluent-bit/cloudwatch.so -e /fluent-bit/kinesis.so -c /fluent-bit/alt/fluent-bit.conf
Full example of the FluentBit configuration file:
[SERVICE]
    Flush     ${FLB_SERVICE_FLUSH}
    Grace     ${FLB_SERVICE_GRACE}
    Log_Level ${FLB_SERVICE_LOG_LEVEL}

[INPUT]
    Name          forward
    unix_path     /var/run/fluent.sock
    Mem_Buf_Limit ${FLB_INPUT_MEM_BUF_LIMIT}

[INPUT]
    Name          forward
    Listen        0.0.0.0
    Port          24224
    Mem_Buf_Limit ${FLB_INPUT_MEM_BUF_LIMIT}

[INPUT]
    Name          tcp
    Tag           firelens-healthcheck
    Listen        127.0.0.1
    Port          8877
    Mem_Buf_Limit ${FLB_INPUT_MEM_BUF_LIMIT}

[FILTER]
    Name   record_modifier
    Match  *
    Record ec2_instance_id ${EC2_INSTANCE_ID}
    Record ecs_cluster ${ECS_CLUSTER}
    Record ecs_task_arn ${ECS_TASK_ARN}
    Record ecs_task_definition ${ECS_TASK_DEFINITION}

[OUTPUT]
    Name              cloudwatch_logs
    Match             *
    region            us-east-1
    log_group_name    /aws/ecs/${ENV_NAME}/${SERVICE_NAME}
    log_stream_prefix ${LOG_STREAM_PREFIX}-
    auto_create_group true
    #log_key log

[OUTPUT]
    Name  null
    Match firelens-healthcheck
ARG DOCKER_BASE_IMAGE
FROM ${DOCKER_BASE_IMAGE}
COPY entrypoint.sh /
COPY fluent-bit.conf /fluent-bit/alt/fluent-bit.conf
CMD ["/bin/bash", "-c", "/entrypoint.sh"]
Build the custom docker image by executing:
docker build --no-cache --build-arg DOCKER_BASE_IMAGE=amazon/aws-for-fluent-bit:2.16.1 -t xxxxxxxxxxxx.dkr.ecr.us-east-1.amazonaws.com/custom-fluent-bit:2.16.1-1.0.0 .
ECS task definition file:
{
"containerDefinitions": [
{
"cpu": 128,
"environment": [
{
"name": "SERVICE_NAME",
"value": "my-service"
},
{
"name": "ENV_NAME",
"value": "test"
}
],
"essential": true,
"image": "xxxxxxxxxxxx.dkr.ecr.us-east-1.amazonaws.com/my-service:latest",
"logConfiguration": {
"logDriver": "awsfirelens"
},
"linuxParameters": {
"initProcessEnabled": true
},
"memory": 256,
"name": "my-service",
"portMappings": [
{
"containerPort": 80,
"hostPort": 0,
"protocol": "tcp"
}
],
"volumesFrom": []
},
{
"environment": [
{
"name": "SERVICE_NAME",
"value": "my-service"
},
{
"name": "ENV_NAME",
"value": "test"
},
{
"name": "LOG_STREAM_PREFIX",
"value": "test"
},
{
"name": "FLB_SERVICE_LOG_LEVEL",
"value": "info"
},
{
"name": "FLB_INPUT_MEM_BUF_LIMIT",
"value": "100MB"
}
],
"essential": true,
"image": "xxxxxxxxxxxx.dkr.ecr.us-east-1.amazonaws.com/custom-fluent-bit:2.16.1-1.0.0",
"name": "log_router",
"firelensConfiguration": {
"type": "fluentbit"
},
"logConfiguration": {
"logDriver": "awslogs",
"options": {
"awslogs-create-group": "true",
"awslogs-group": "/aws/ecs/test/my-service",
"awslogs-stream-prefix": "test-firelens",
"awslogs-region": "us-east-1"
}
},
"memoryReservation": 100
}
],
"family": "test-my-service",
"executionRoleArn": "arn:aws:iam::xxxxxxxxxxxx:role/ecs-firelens-execution-role",
"placementConstraints": []
}
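One thing worth checking with a setup like this is how the per-input limits add up. Mem_Buf_Limit applies to each input separately, so the three inputs above at 100MB each could in principle hold about 300 MB of buffered data, while the log_router container reserves only 100 MiB (memoryReservation is a soft limit). The sketch below does that arithmetic. The helper to_mib and the variable names are hypothetical, and the result is a rough upper bound on buffered data, not on Fluent Bit's total memory use.

```shell
#!/bin/bash
# Hypothetical helper: convert a size such as "100MB", "512K" or "1G" to whole MiB.
to_mib() {
  local value number
  value=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
  number="${value%%[KMG]*}"            # strip the unit suffix, keep the digits
  case "$value" in
    *G|*GB) echo $(( number * 1024 )) ;;
    *M|*MB) echo "$number" ;;
    *K|*KB) echo $(( number / 1024 )) ;;
    *)      echo $(( number / 1024 / 1024 )) ;;   # no suffix: plain bytes
  esac
}

# Values taken from the example above; adjust them to your own task definition.
INPUT_COUNT=3
MEM_BUF_LIMIT="100MB"
MEMORY_BUDGET_MIB=100

worst_case_mib=$(( INPUT_COUNT * $(to_mib "$MEM_BUF_LIMIT") ))
if [ "$worst_case_mib" -gt "$MEMORY_BUDGET_MIB" ]; then
  echo "Buffers may grow to ~${worst_case_mib} MiB, above the ${MEMORY_BUDGET_MIB} MiB budget"
else
  echo "Buffers fit within the ${MEMORY_BUDGET_MIB} MiB budget"
fi
```

If the warning fires, either lower the per-input limit or raise the memory given to the log router container.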
The feature is released on EC2 Agent now: https://github.com/aws/amazon-ecs-agent/releases/tag/v1.55.0. For Fargate side, we will continue to work on it and drive it to be supported soon. Note: |
Hello all, we are working on this feature and would like to gather some real user data here. What real-world values do you expect to set for this? We are considering 256 MB of memory as the maximum value in Fargate and wonder whether that would work for your use case. Please leave a comment here if possible. Thanks! Note: On EC2 Agent, it has been released and has no limit right now. Note:
Sorry for the confusion above. Above release update is for a fluentd log driver option so it is not related to this request. For this request, mem_buf_limit is a fluent bit config option. I've opened a new issue to track the request I am working on: #1484. Thanks for the understanding. |
In this AWS doc the option |
@PettitWesley Can we use the Throttle filter in fluent-bit if we set the retry limit to |
@farazhv the log driver buffer limit is entirely different.
app stdout/stderr => container runtime buffer (1) => fluentd log driver buffer (2) => Fluent Bit forward input => Fluent Bit internal buffer (3) => log destination
I am not sure what you are looking for here. What is your use case/goal? |
For the feature request in this issue, I am now thinking we will probably just contribute this upstream instead: fluent/fluent-bit#5711
Firelens runs as a sidecar to the application container in a Fargate task. Logs emitted to stdout by the application container are sent by Firelens to CloudWatch and DataDog. My goal is to prevent an OOM from taking down the task. It is acceptable to lose application logs if I have a guarantee the task won’t go down due to an OOM in Firelens. |
@farazhv I think what you want is this tutorial: https://github.com/aws-samples/amazon-ecs-firelens-examples/tree/mainline/examples/fluent-bit/oomkill-prevention Also remember that you can set the FireLens container as non-essential so that if it fails then it won't take down the task. Also check out our health check guide: https://github.com/aws-samples/amazon-ecs-firelens-examples/tree/mainline/examples/fluent-bit/health-check |
FluentBit has a limited set of buffers constrained by its memory. With memory available, it will output the log messages it receives. However, if it isn't able to output messages, or if the inflow of messages is very high, its buffers will reach capacity, resulting in an OOM crash. If I revise my original use case to add resilience, so that FluentBit drops messages when it reaches a hard memory limit, would it be safe to configure the throttle filter and disable retries? This is based on the assumption that the throttle filter will prevent the buffers from reaching capacity if the inflow rate is very high, and disabling retries will prevent them from reaching capacity if the output is unsuccessful.
@farazhv I think this would work. I have not actually tested it or seen it used in production though, so this is a hypothesis, not something that is proven. But the thinking makes sense: if Fluent Bit, via the throttle filter, is limited in the rate at which it can accept logs, and no retries are configured, meaning that issues at the output do not really lead to backpressure, then the memory usage should not be able to increase much.
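As a rough, untested sketch of that idea, the script below writes a config fragment that pairs a throttle filter with an output that never retries. The Rate, Window, and Interval values are placeholders chosen for illustration, the output settings are borrowed from the cloudwatch_logs example earlier in this thread, and CONFIG_PATH is a hypothetical variable. Check that your Fluent Bit version accepts `Retry_Limit no_retries` before relying on it.

```shell
#!/bin/bash
# Sketch only: CONFIG_PATH is a hypothetical variable, written to a temp dir here.
CONFIG_PATH="${CONFIG_PATH:-${TMPDIR:-/tmp}/throttle-no-retry.conf}"

# Quoted 'EOF' keeps the ${...} placeholders literal for Fluent Bit to resolve.
cat > "$CONFIG_PATH" <<'EOF'
# Accept roughly Rate records per Interval, averaged over Window intervals;
# records above that rate are dropped by the filter.
[FILTER]
    Name     throttle
    Match    *
    Rate     1000
    Window   5
    Interval 1s

# Retry_Limit no_retries: a failed chunk is discarded rather than queued for retry.
[OUTPUT]
    Name              cloudwatch_logs
    Match             *
    region            us-east-1
    log_group_name    /aws/ecs/${ENV_NAME}/${SERVICE_NAME}
    log_stream_prefix ${LOG_STREAM_PREFIX}-
    auto_create_group true
    Retry_Limit       no_retries
EOF

echo "Wrote $CONFIG_PATH"
```

Both settings trade log completeness for bounded memory, which matches the stated goal of accepting log loss to keep the task alive.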
Fluentbit already supports Rather than extend fluentbit itself to have a new parameter, it seems like the simpler gap to plug is that it's not possible to control the This limitation rears its head with https://github.com/aws/aws-for-fluent-bit/blob/mainline/use_cases/init-process-for-fluent-bit/README.md: you can import additional fluentbit config files from S3, but as far as I can see you're stuck with controlling the generated INPUT via firelens options.
@adrian-skybaker Which input settings are you interested in setting? Which are highest priority? Which are critical? IMO, the critical CC @matthewfala |
Yes I think so. However... it still seems unfortunate that if you want absolute control over this input, the only choice will still be a custom image, even with the new init process that allows supplementary config from S3 includes. Perhaps ultimately it's just that I'm trying to work around the lack of an S3 custom config source for Fargate, but an option to completely suppress this INPUT would mean I could redeclare it myself with full control.
That's all well and good, but fluentbit doesn't support that today. Whereas it does support setting these on INPUT, it's just not controllable via firelens. You can also imagine a scenario where even if it was supported, I might want a specific value set for stdin, but a different service level setting (eg for several other tail inputs). But perhaps this all comes back to the same limitation that once you want to have control over this input, you have to stick with a custom image (which is a very clunky way of passing some .conf files). |
@adrian-skybaker I agree, the most ideal solution is that you could pass in arbitrary parameters to the generated input. I still think in that case that for Fluent Bit in general there are user experience reasons to have a global
Yes I agree. This feedback is a bit off-topic for this issue, but after using https://github.com/aws/aws-for-fluent-bit/blob/mainline/use_cases/init-process-for-fluent-bit/README.md for a day or so, my view is that you end up with quite a messy hybrid trying to combine firelens-config-generated fluentbit conf with hand-crafted fluentbit conf, with several surprises and limitations. IMO a simpler mental model is either 100% firelens config, with higher level options like the global buffer setting you mention, existing streamlined cloudwatch etc, or 100% self-managed fluentbit conf files (supported by some niceties like helpful plugins and env vars being available). Of course the latter is available today, it just requires your own container. |
Any updates on custom config for Fargate ECS? I tried to create a custom FireLens image to prevent OOM kills (setting the storage type to filesystem) and use it with my application container. Logs should be forwarded to Kinesis streams. However, after setting up the custom config, the logs are sent nowhere, even though the task starts successfully. When I went back to the default config with the original FireLens image, logs were successfully forwarded to Kinesis streams. Any idea how to debug this?
@rnlduaeo Can you please fully describe your issue and submit task def, Fluent bit config, and logs to an issue here: https://github.com/aws/aws-for-fluent-bit/issues |
For others, here is our oomkill guide: https://github.com/aws-samples/amazon-ecs-firelens-examples/tree/mainline/examples/fluent-bit/oomkill-prevention This guide explains all of the different buffer settings, and ways they can be used. Please read. |
Hi @PettitWesley, thank you for sharing this guide. It looks like it basically highlights the importance of being able to at least allow My point is, not allowing these settings at the input level makes this solution not production ready, as we end up with 3 options if we want to ship logs from Fargate:
I understand how it is a pain to allow certain configs in the |
@acm19 I understand the difficulties you are facing. The simplest and fastest solution to implement (which also means it has the best chance of actually being released) would be to modify the ECS init AWS for Fluent Bit image: https://github.com/aws/aws-for-fluent-bit/tree/develop/use_cases/init-process-for-fluent-bit It's an image that we vend; it's just another tag, as explained in that link. Please then also view the 3 examples we have for it here: The init tag allows you to specify additional configuration files as env vars in your task definition. The image will use these config files; they can either be built into the image, mounted into the container at runtime, or pulled from S3. Init also supports injecting ECS metadata as an env var, so you get that support. Currently, the init tag will also always include the main FireLens generated input config file: https://github.com/aws/aws-for-fluent-bit/blob/develop/init/fluent_bit_init_process.go#L388 I'm thinking I could simply add a new env var, like What do you think?
Should the value of the Options:
Hi @PettitWesley, thank you for your reply. I see some major disadvantages of using this image in a production environment:
So, even though it'd be very useful for testing configurations quickly, I'd probably consider it more sensible to manage a custom image for production applications.
I personally prefer option 2. |
Community Note
Tell us about your request
With FireLens, the input definitions for Fluent Bit are generated by ECS. This prevents customers from setting any custom options on the input configuration.
Mem_Buf_Limit is an input configuration option which sets the total memory available for buffering logs. This field should probably be optionally configurable by customers. We should determine if there are any other input options as well which might need to be configured.
UPDATE: We will likely work with Fluent Bit upstream to contribute this instead: fluent/fluent-bit#5711
Which service(s) is this request for?
ECS EC2, and ECS Fargate
Are you currently working around this issue?
There's a way to "hack" in input configuration, which is not ideal but I should possibly publish a tutorial on if it is desired...
EDIT: Here is the detailed tutorial: https://aws.amazon.com/blogs/containers/how-to-set-fluentd-and-fluent-bit-input-parameters-in-firelens/