Transient error StatusCode.UNAVAILABLE encountered while exporting span batch #6363

Closed
umgbhalla opened this issue Oct 20, 2022 · 15 comments
Labels
bug Something isn't working

umgbhalla commented Oct 20, 2022

Describe the bug
I have noticed an issue with the OpenTelemetry Collector's OTLP HTTP port: it returns StatusCode.UNAVAILABLE when traces are sent to it.

Steps to reproduce
Set up the OpenTelemetry Collector with Docker Compose or Kubernetes (I have confirmed this on both), then use this repo to produce traces. Edit ./src/helpers/tracing/index.ts to change the endpoint if necessary.

What did you expect to see?
No status code error, and traces being collected, since OTLP over gRPC is working.

What did you see instead?
StatusCode.UNAVAILABLE, only on OTLP over HTTP.

What version did you use?
Version: 0.60.0

What config did you use?
docker-compose.yaml

version: "2.4"

services:
  otel-collector:
    container_name: otel-collector
    image: otel/opentelemetry-collector:0.60.0
    command: ["--config=/etc/otel-collector-config.yaml"]
    # user: root # required for reading docker container logs
    volumes:
      - ./otel-collector-config.yaml:/etc/otel-collector-config.yaml
    environment:
      - OTEL_RESOURCE_ATTRIBUTES=host.name=otel-host,os.type=linux
    ports:
      # - "1777:1777"     # pprof extension
      - "4317:4317"     # OTLP gRPC receiver
      - "4318:4318"     # OTLP HTTP receiver
      # - "8888:8888"     # OtelCollector internal metrics
      # - "8889:8889"     # signoz spanmetrics exposed by the agent
      # - "9411:9411"     # Zipkin port
      # - "13133:13133"   # health check extension
      # - "14250:14250"   # Jaeger gRPC
      # - "14268:14268"   # Jaeger thrift HTTP
      # - "55678:55678"   # OpenCensus receiver
      # - "55679:55679"   # zPages extension
    restart: on-failure
    networks:
      - api-dockernet

networks:
  api-dockernet:
    driver: bridge

otel-collector-config.yaml

receivers:
  jaeger:
    protocols:
      grpc:
        endpoint: 0.0.0.0:14250
      thrift_http:
        endpoint: 0.0.0.0:14268
      thrift_compact:
        endpoint: 0.0.0.0:6831
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
        cors:
          allowed_origins:
            - http://*
            - https://*
  zipkin:
    endpoint: 0.0.0.0:9411


processors:
  batch:
    send_batch_size: 4000
    send_batch_max_size: 4000
    timeout: 10s
  # If set to null, will be overridden with values based on k8s resource limits
  memory_limiter: null

exporters:
  otlp:
    endpoint: '<redacted>:80'
    tls:
      insecure: true
    sending_queue:
      queue_size: 1000000
  prometheusremotewrite:
    endpoint: 'http://<redacted>/write'
    tls:
      insecure: true


service:
  pipelines:
    traces:
      receivers: [jaeger, otlp]
      exporters: [otlp]
      processors: [batch]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [prometheusremotewrite]

Environment
OS: any

Additional context
This issue happens only with OTLP over HTTP, not with OTLP over gRPC.

umgbhalla added the bug label on Oct 20, 2022
@adityaraibytelearn

Is this resolved? I see the same issue when using gRPC.

@benjamingorman

I'm also seeing this over both grpc and http.

2023-01-12 17:08:10,079 WARNING opentelemetry.exporter.otlp.proto.grpc.exporter /usr/local/lib/python3.8/dist-packages/opentelemetry/exporter/otlp/proto/grpc/exporter.py:356   Transient error StatusCode.UNAVAILABLE encountered while exporting traces, retrying in 16s.

I'm running the Jaeger all-in-one image like this:

docker run --name jaeger   -e COLLECTOR_OTLP_ENABLED=true -e DJAEGER_AGENT_HOST=0.0.0.0  -p 16686:16686   -p 4317:4317   -p 4318:4318  jaegertracing/all-in-one:1.35

@h4ckroot

I had a similar issue, and I found that this error is emitted when your application cannot reach the collector. This can happen if the application and the collector run on two different networks (or in two different docker-compose files that do not share a network).

I hope this helps!
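
One way to check reachability from inside the application's container is a plain TCP connection test. The sketch below is only an illustration: `can_reach` is a hypothetical helper, and the host name and ports in the usage comment are taken from the compose file earlier in this thread, so adjust them to your setup.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


# Example usage (assumed names: "otel-collector" is the container name from the
# compose file above; 4317 and 4318 are the OTLP gRPC and HTTP ports):
#   can_reach("otel-collector", 4317)
#   can_reach("otel-collector", 4318)
```

A `False` result suggests a network or naming problem rather than an exporter problem. A `True` result only shows that the port is open, not that the service behind it speaks the protocol the exporter expects.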

umgbhalla closed this as not planned (won't fix, can't repro, duplicate, stale) on Feb 21, 2023
@charliebarber

I am also getting this issue in a Docker container, between an instrumented Python app and the collector. They are on the same network, with bridge as the driver. I can't seem to fix it.

LronDC commented Apr 4, 2023

May I ask why this issue has been closed?

@gilbertobr

I am also having the same problem.

Script template used:

import logging

from opentelemetry import trace
from opentelemetry._logs import set_logger_provider
from opentelemetry.exporter.otlp.proto.grpc._log_exporter import (
    OTLPLogExporter,
)
from opentelemetry.sdk._logs import LoggerProvider, LoggingHandler
from opentelemetry.sdk._logs.export import BatchLogRecordProcessor
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import (
    BatchSpanProcessor,
    ConsoleSpanExporter,
)

trace.set_tracer_provider(TracerProvider())
trace.get_tracer_provider().add_span_processor(
    BatchSpanProcessor(ConsoleSpanExporter())
)

logging.basicConfig(level=logging.DEBUG)

logger_provider = LoggerProvider(
    resource=Resource.create(
        {
            "service.name": "shoppingcart",
            "service.instance.id": "instance-12",
        }
    ),
)
set_logger_provider(logger_provider)

exporter = OTLPLogExporter(endpoint="grpc.otel-collector.my.domain.io:80", insecure=True, timeout=20)
logger_provider.add_log_record_processor(BatchLogRecordProcessor(exporter))
handler = LoggingHandler(level=logging.NOTSET, logger_provider=logger_provider)

# Attach OTLP handler to root logger
logging.getLogger().addHandler(handler)

# Log directly
logging.info("Jackdaws love my big sphinx of quartz.")

# Create different namespaced loggers
logger1 = logging.getLogger("myapp.area1")
logger2 = logging.getLogger("myapp.area2")

logger1.debug("Quick zephyrs blow, vexing daft Jim.")
logger1.info("How quickly daft jumping zebras vex.")
logger2.warning("Jail zesty vixen who grabbed pay from quack.")
logger2.error("The five boxing wizards jump quickly.")


# Trace context correlation
tracer = trace.get_tracer(__name__)
with tracer.start_as_current_span("foo"):
    # Do something
    logger2.error("Hyderabad, we have a major problem.")

logger_provider.shutdown()

@gilbertobr

I noticed that nginx (the proxy) returns 400:

 "PRI * HTTP/2.0" 400 150 "-" "-" 0 5.001 [] [] - - - - 

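The request line `PRI * HTTP/2.0` is the start of the HTTP/2 client connection preface, which a gRPC client sends first. A 400 in response usually suggests the proxy listener handled the connection as HTTP/1.x, though that reading is not confirmed in this thread. A minimal sketch for spotting such lines in an access log, with hypothetical helper names:

```python
# The 24-byte HTTP/2 client connection preface (RFC 7540, section 3.5).
HTTP2_PREFACE = b"PRI * HTTP/2.0\r\n\r\nSM\r\n\r\n"


def looks_like_http2_preface(first_bytes: bytes) -> bool:
    """True if the first bytes of a connection are the HTTP/2 client preface."""
    return first_bytes.startswith(HTTP2_PREFACE)


def is_rejected_http2(access_log_line: str) -> bool:
    """True if an nginx-style access log line shows the preface answered with 400."""
    return '"PRI * HTTP/2.0" 400' in access_log_line
```

If lines like this show up, the proxy configuration for that listener is worth checking before the exporter code.
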
@sherlockliu

Any updates on this one? It sounds like it hasn't been resolved but has been closed.

@rodrigoazv

In my case I was using the wrong host name. Because of docker-compose, we should use the name of the container, which in my case meant:

http://jaeger instead of http://localhost
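
Inside a container, `localhost` refers to the container itself, so an endpoint pointing there never reaches a collector running in another container. The sketch below is a hedged illustration: it reads the endpoint from the standard `OTEL_EXPORTER_OTLP_ENDPOINT` environment variable and flags loopback hosts. The default `http://jaeger:4317` and both function names are assumptions for this example.

```python
import os
from urllib.parse import urlsplit

# Assumption: "jaeger" is the compose container name and 4317 its OTLP gRPC port.
DEFAULT_ENDPOINT = "http://jaeger:4317"
LOOPBACK_HOSTS = {"localhost", "127.0.0.1", "::1"}


def otlp_endpoint(environ=os.environ) -> str:
    """Return OTEL_EXPORTER_OTLP_ENDPOINT if set and non-empty, else the default."""
    return environ.get("OTEL_EXPORTER_OTLP_ENDPOINT") or DEFAULT_ENDPOINT


def points_at_loopback(endpoint: str) -> bool:
    """True if the endpoint's host is a loopback name or address."""
    # Scheme-less endpoints such as "localhost:4317" need "//" to parse as host:port.
    if "//" not in endpoint:
        endpoint = "//" + endpoint
    return urlsplit(endpoint).hostname in LOOPBACK_HOSTS
```

A startup check such as `points_at_loopback(otlp_endpoint())` can log a hint when the app runs in a container but still targets `localhost`.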

@tquach-evertz

Any updates on this one? It sounds like it hasn't been resolved but has been closed.

The same issue has just happened with our application... It looks like the issue hasn't been resolved yet.

john-pl commented Jun 15, 2023

We're having the same problem. I don't feel this should be closed.

wizrds commented Jun 21, 2023

I'm encountering the same issue as well. I'm running otel-collector in a Docker container with the gRPC port exposed and connecting to it from a native Python application. The line "Transient error StatusCode.UNAVAILABLE encountered while exporting metrics, retrying in 1s." sometimes spams the logs, and other times I don't see it at all. Is there any way to at least hide the output?
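
Since the message is logged at WARNING level through Python's standard `logging` module, one option is to raise the threshold of the logger that emits it. This is only a sketch: the logger name is copied from the warning quoted earlier in this thread, other SDK versions may use a different name, and `quiet_exporter_warnings` is a hypothetical helper.

```python
import logging

# Logger name as it appears in the warning quoted earlier in this thread.
# Assumption: your SDK version logs the retry message under the same name.
EXPORTER_LOGGER = "opentelemetry.exporter.otlp.proto.grpc.exporter"


def quiet_exporter_warnings(level: int = logging.ERROR) -> None:
    """Raise the exporter logger's threshold so WARNING-level retry messages are dropped."""
    logging.getLogger(EXPORTER_LOGGER).setLevel(level)


quiet_exporter_warnings()
```

This hides the message without fixing the cause. Exports that fail are still retried and may eventually be dropped.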

@menyisskov

We're having the same issue.
We run the app on k8s (docker desktop), and the all-in-one on the same laptop with the docker run command.

Any ideas what can be causing it?

@chansonzhang

I run a jaeger-all-in-one.exe binary on Windows and export spans from an instrumented Sanic app. The export fails with the error "Failed to export batch. Status code: StatusCode.UNAVAILABLE".

kevarr commented Aug 7, 2024

The solution for me (using python-opentelemetry) was to fix my OTLPSpanExporter import. I was attempting to export gRPC spans, but was importing with:

from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter

Instead I needed to import:


from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

If you're exporting using http/protobuf, import from opentelemetry.exporter.otlp.proto.http.trace_exporter instead.

It's a very subtle difference. I suppose I should've paid closer attention when my IDE made an import suggestion for me.
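
To make the choice explicit rather than leaving it to an IDE suggestion, the protocol can be mapped to its module in one place. This is a rough sketch under stated assumptions: the module paths are the two imports quoted above, the default ports follow the 4317 (gRPC) and 4318 (HTTP) mapping from the compose file in this thread, and the helper names are hypothetical.

```python
import importlib

# protocol -> (module providing OTLPSpanExporter, conventional collector port)
EXPORTERS = {
    "grpc": ("opentelemetry.exporter.otlp.proto.grpc.trace_exporter", 4317),
    "http/protobuf": ("opentelemetry.exporter.otlp.proto.http.trace_exporter", 4318),
}


def exporter_module(protocol: str) -> str:
    """Return the module path that provides OTLPSpanExporter for this protocol."""
    if protocol not in EXPORTERS:
        raise ValueError(f"unsupported OTLP protocol: {protocol!r}")
    return EXPORTERS[protocol][0]


def default_port(protocol: str) -> int:
    """Return the conventional collector port for this protocol."""
    exporter_module(protocol)  # Reuse the validation above.
    return EXPORTERS[protocol][1]


def load_span_exporter(protocol: str):
    """Import and return the OTLPSpanExporter class (needs the matching package installed)."""
    return importlib.import_module(exporter_module(protocol)).OTLPSpanExporter
```

With this, `load_span_exporter("grpc")` and a port of 4318 in the same configuration stand out as a mismatch at a glance.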
