filelogreceiver tries to read logs from beginning even start_at=end #24839

Closed
ashlee3209 opened this issue Aug 3, 2023 · 6 comments
Labels: bug, question, receiver/filelog, waiting for author

Comments

@ashlee3209

Component(s)

receiver/filelog

What happened?

Description

I have set up an otel-collector with a log pipeline consisting of the filelog receiver and the loki exporter. The filelog receiver is configured with start_at: end, which should make it read only new logs from the moment the collector starts. However, it seems to respect start_at only for the first few log paths; all other log paths are still read from the beginning, and apparently out of order, so the loki exporter returns an 'entry too far behind' error.

Steps to Reproduce

I started this collector with a clean Loki instance and no existing checkpoint file. Shortly after startup I saw errors like the following:

2023-08-02T19:49:11.119Z error exporterhelper/queued_retry.go:391 Exporting failed. The error is not retryable. Dropping data.
{"kind": "exporter", "data_type": "logs", "name": "loki", "error": "Permanent error: HTTP 400 "Bad Request":
entry with timestamp 2022-07-21 14:27:48.808964741 +0000 UTC ignored, reason: 'entry too far behind, oldest acceptable timestamp is: 2023-02-03T08:02:57Z'
for stream: {exporter="OTLP", k8s_cluster_name="my_cluster", k8s_namespace_name="openshift-ingress-operator", k8s_pod_name="ingress-operator-6c46f79974-bcrpp", log_iostream="stderr"},", "dropped_items": 100}

I found the timestamp 2022-07-21 14:27:48.808964741 +0000 in one of the old log files written before my collector started, which should not have been read since I set start_at: end.

Also, in the collector log I can see Started watching file instead of Started watching file from end for those files:

2023-08-02T19:49:06.389Z info fileconsumer/file.go:194 Started watching file from end. To read preexisting logs, configure the argument 'start_at' to 'beginning' {"kind": "receiver", "name": "filelog", "data_type": "logs", "component": "fileconsumer", "path": "/var/log/pods/default_testpod_44f7ded7-03a6-42d3-82d7-f5126a4ed125/testcontainer/0.log"}
2023-08-02T19:49:07.389Z info fileconsumer/file.go:194 Started watching file from end. To read preexisting logs, configure the argument 'start_at' to 'beginning' {"kind": "receiver", "name": "filelog", "data_type": "logs", "component": "fileconsumer", "path": "/var/log/pods/openshift-cluster-machine-approver_machine-approver-9bb775975-bwpd6_582d5452-0bfd-4790-8d83-adc9a9394ef2/machine-approver-controller/8.log"}
2023-08-02T19:49:07.901Z info fileconsumer/file.go:192 Started watching file {"kind": "receiver", "name": "filelog", "data_type": "logs", "component": "fileconsumer", "path": "/var/log/pods/openshift-ingress-canary_ingress-canary-5tb8p_57266302-0dff-4c53-9a07-93c2f8e6acd0/serve-healthcheck-canary/3.log"}

Expected Result

If start_at is end, then the receiver should only read incoming logs from the moment it starts.

Actual Result

It reads old logs.

Collector version

0.82.0

Environment information


OpenShift Kubernetes cluster.

OpenTelemetry Collector configuration

mode: "daemonset"
presets:
  # Configures the collector to collect logs.
  # Adds the filelog receiver to the logs pipeline
  # and adds the necessary volumes and volume mounts.
  # Best used with mode = daemonset.
  logsCollection:
    enabled: true
    includeCollectorLogs: true
    # Enabling this writes checkpoints in /var/lib/otelcol/ host directory.
    # Note this changes collector's user to root, so that it can write to host directory.
    storeCheckpoints: true
  # Configures the Kubernetes Processor to add Kubernetes metadata.
  # Adds the k8sattributes processor to all the pipelines
  # and adds the necessary rules to ClusterRole.
  # Best used with mode = daemonset.
  kubernetesAttributes:
    enabled: true

config:
  extensions:
    # The health_check extension is mandatory for this chart.
    # Without the health_check extension the collector will fail the readiness and liveness probes.
    # The health_check extension can be modified, but should never be removed.
    health_check: { }
    memory_ballast:
      #  The ballast should be configured to be 1/3 to 1/2 of the memory allocated to the collector.
      size_in_percentage: 50
  exporters:
    logging:
      verbosity: detailed
    loki:
      endpoint: "http://loki_address:loki_port/loki/api/v1/push"
  processors:
    attributes:
      actions:
        - action: insert
          key: loki.attribute.labels
          value: log.iostream
    resource:
      attributes:
        - action: insert
          key: loki.resource.labels
          value: k8s.cluster.name,k8s.namespace.name,k8s.pod.name
    batch:
      send_batch_max_size: 800
      timeout: 10s
      send_batch_size: 500
    memory_limiter:
      # this calculation is based on the resource allocated to this collector, check resources/limits in the bottom
      check_interval: 1s
      # limit_mib is the hard limit
      limit_percentage: 90
      # The soft limit value will be equal to (limit_mib - spike_limit_mib).
      spike_limit_percentage: 20
    k8sattributes:
      passthrough: false
      filter:
        node_from_env_var: K8S_NODE_NAME
      pod_association:
        - sources:
            - from: resource_attribute
              name: k8s.pod.ip
        - sources:
            - from: resource_attribute
              name: k8s.pod.uid
        - sources:
            - from: connection
      extract:
        metadata:
          - k8s.namespace.name
          - k8s.deployment.name
          - k8s.statefulset.name
          - k8s.daemonset.name
          - k8s.cronjob.name
          - k8s.job.name
          - k8s.pod.name
          - k8s.pod.uid
          - k8s.node.name
  receivers:
    jaeger: null
    otlp: null
    prometheus: null
    zipkin: null
    filelog:
      include: [ /var/log/pods/*/*/*.log ]
      start_at: end
      poll_interval: 1s
      retry_on_failure:
        enabled: true
  service:
    telemetry:
      logs:
        level: "info"
      metrics:
        address: "[${env:MY_POD_IP}]:8888"
    extensions:
      - health_check
      - memory_ballast
    pipelines:
      logs:
        receivers:
          - filelog
        processors:
          - memory_limiter
          - k8sattributes
          - attributes
          - resource
        exporters:
          - loki
      metrics: null
      traces: null

# Resource limits & requests. Update according to your own use case as these values might be too low for a typical deployment.
resources:
  limits:
    cpu: 256m
    memory: 5120Mi

Log output

No response

Additional context

No response

@github-actions
Contributor

github-actions bot commented Aug 3, 2023

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@djaglowski
Member

I have been unable to replicate the behavior you are describing. Would it be possible to describe a minimal scenario (e.g. set of simple files, simplest possible config, precise steps to follow), or even better, a unit test that fails?
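
For example, something as small as the sketch below would be a helpful starting point (the directory and exporter settings here are only placeholders for illustration, not taken from the report above):

receivers:
  filelog:
    include: [ /tmp/filelog-repro/*.log ]
    start_at: end
    poll_interval: 1s
exporters:
  # The logging exporter prints every received record, so any preexisting
  # line showing up here was read despite start_at: end.
  logging:
    verbosity: detailed
service:
  pipelines:
    logs:
      receivers: [filelog]
      exporters: [logging]

The steps would then be: create a few files under /tmp/filelog-repro containing old lines, start the collector, append a new line, and check whether the old lines appear in the logging exporter output.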

@fgalind1

Actually, we were having a similar issue. In our case we had more log files than the max_concurrent_files limit, which ended up enabling batch mode. It looks like when batching is enabled, fromBeginning is forced on (m.readerFactory.fromBeginning = true), and that causes only the first batch to behave correctly but not the rest.

In the meantime we worked around this by setting a very large limit of concurrent files to avoid batching, and that solved the problem. I also wasn't able to make it work with storage: file_storage; only when we did not configure a storage for the file position/offset did it stop reading files from the start and just tail from the end.
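
For reference, the workaround looks roughly like the sketch below; the exact limit is illustrative, anything comfortably above the number of files matched by include avoids batching:

receivers:
  filelog:
    include: [ /var/log/pods/*/*/*.log ]
    start_at: end
    # Default is 1024; raising it above the number of matched files
    # keeps the reader out of batch mode.
    max_concurrent_files: 40960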

@djaglowski
Member

@fgalind1, thanks for sharing your findings as well. Based on your description, I think this was likely fixed in #27773.

@atoulme
Contributor

atoulme commented Dec 6, 2023

Closing as completed; please reopen if the problem still occurs.

@obs-gh-enricogiorio

obs-gh-enricogiorio commented Oct 23, 2024

Hi @djaglowski, I am still facing this issue despite the fix.
I made sure start_at is set to "end", and I have checked the receiver's logs: nothing mentions that any batching happened.
Also, given the number of log files in the system, I can pretty much guarantee that the default limit of 1024 concurrent files has never been reached.
For instance, on 16/10/24 at 13:22:24.363 CEST the receiver read a log line with timestamp 14/05/24 14:40:41.042 CEST, which is more than 5 months old.
[screenshot: ingested log lines with BUNDLE_TIMESTAMP, timestamp, and delta columns]

From this screenshot you can see:

BUNDLE_TIMESTAMP is when the log line was parsed (technically it is when the line reached our system, but the actual parse time is likely only a few seconds earlier)
timestamp is the log's own timestamp
delta is the discrepancy between the two; the logs you see here are 4 months old
The purple bars at the top show when this data arrived over time. You can see that it did not happen constantly, but only twice, with many very old logs being read all at once

This is how I configured the receiver:

  filelog:
    exclude:  '[]'
    include:  '["/var/log/pods/*/*/*.log", "/var/log/kube-apiserver-audit.log"]'
    include_file_name: false
    include_file_path: true
    exclude_older_than: 24h
    operators:
    - id: container-parser
      max_log_size: 102400
      type: container
    retry_on_failure:
      enabled: true
      initial_interval: 1s
      max_interval:  30s
      max_elapsed_time: 5m
    start_at: "end"
    storage: file_storage

Should I create a new issue, or is it fine to ask here?

Thanks
