help request: opentelemetry + grpc-transcode not working #9270

Closed
ThalysonR opened this issue Apr 8, 2023 · 13 comments · Fixed by #9606
Labels
bug Something isn't working

Comments

@ThalysonR

Description

I've been testing APISIX and tried using the opentelemetry plugin on a route that transcodes to gRPC, and got an error. The opentelemetry plugin works fine on plain HTTP routes, so I assume the problem is the two plugins working together. This is the error I got:

 2023/04/08 05:21:57 [error] 56#56: *4242 lua entry thread aborted: runtime error: ...deps/share/lua/5.1/opentelemetry/trace/exporter/otlp.lua:104: bad argument #1 to 'encode' (type 'opentelemetry.proto.trace.v1.TracesData' does not exists)
 stack traceback:
 coroutine 0:
        [C]: in function 'encode'
        ...deps/share/lua/5.1/opentelemetry/trace/exporter/otlp.lua:104: in function 'export_spans'
        ...are/lua/5.1/opentelemetry/trace/batch_span_processor.lua:45: in function 'process_batches'
        ...are/lua/5.1/opentelemetry/trace/batch_span_processor.lua:77: in function <...are/lua/5.1/opentelemetry/trace/batch_span_processor.lua:57>, context: ngx.timer, client: 172.20.0.1, server: 0.0.0.0:9080

This is my config:

# config.yaml
plugins:
  - grpc-transcode
  - opentelemetry

plugin_attr:
  opentelemetry:
    resource:
      service.name: APISIX
      tenant.id: business_id
    collector:
      address: otel-collector:4318
      request_timeout: 3
      request_headers:
        foo: bar
    batch_span_processor:
      drop_on_queue_full: false
      max_queue_size: 6
      batch_timeout: 2
      inactive_timeout: 1
      max_export_batch_size: 2
# apisix.yaml
routes:
  - id: users
    methods: [GET]
    uri: /users
    plugins:
      grpc-transcode:
        proto_id: "1"
        service: users.UsersService
        method: FindAll
      opentelemetry:
        sampler:
          name: always_on
    upstream:
      scheme: grpc
      type: roundrobin
      nodes:
        "users:5000": 1

The transcoding itself works fine: I get the expected response from the gRPC service. Only the OpenTelemetry trace is missing.
Am I using the plugins incorrectly?

Environment

  • APISIX version (run apisix version): 3.2.0 docker image
  • Operating system (run uname -a): Ubuntu WSL2
@Sn0rt
Contributor

Sn0rt commented Apr 21, 2023

@shreemaan-abhishek could you help take a look?

@shreemaan-abhishek
Contributor

On it.

@shreemaan-abhishek
Contributor

Note that the opentelemetry plugin only supports binary-encoded OTLP over HTTP.

So I wanted to know whether you were using grpc-transcode to support OTLP over gRPC.

@ThalysonR
Author

ThalysonR commented Apr 24, 2023

@shreemaan-abhishek I was trying to get regular OTLP-over-HTTP traces. The gRPC transcoding was only being used to reach gRPC services (the users service, in this example).

@shreemaan-abhishek
Contributor

Thanks, I was able to reproduce this issue. I'll look into a fix.

@shreemaan-abhishek
Contributor

@kingluo, should we support this, or is it by design?

@kingluo
Contributor

kingluo commented Apr 25, 2023

It's a bug.
The grpc-transcode plugin clears the global pb state before compiling the configured proto, but it does not save the original state and restore it afterwards:

local function compile_proto(content)
    -- clear pb state
    pb.state(nil)

This in turn discards the OpenTelemetry protos that opentelemetry-lua compiled into that global state at startup:

https://github.com/yangxikun/opentelemetry-lua/blob/bb56b3c8e2163711a763c871ae702b36ee09c557/lib/opentelemetry/trace/exporter/pb.lua#L4
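A minimal, self-contained Lua model of this clash, for illustration only. It does not use the real lua-protobuf library: `fake_pb` is a hypothetical stand-in that only mimics how we understand `pb.state` to behave (install a new global state, or a fresh empty one for `nil`, and return the previous one), and `compile_proto_buggy`, `compile_proto_fixed` and `with_state` are illustrative helpers, not APISIX code. It shows the OTLP `TracesData` type vanishing when the state is cleared and never restored, and surviving when the original state is saved and put back.

```lua
-- ASSUMPTION: `fake_pb` is a hypothetical stand-in for lua-protobuf's `pb`.
-- It mimics pb.state(): with no argument it returns the current global state;
-- with an argument it installs that state (nil means a fresh, empty one) and
-- returns the previous state.
local fake_pb = { current = { types = {} } }

function fake_pb.state(...)
    local old = fake_pb.current
    if select("#", ...) == 0 then
        return old
    end
    local new_state = ...
    fake_pb.current = new_state or { types = {} }
    return old
end

-- Register a message type in whatever state is currently installed.
function fake_pb.load(type_name)
    fake_pb.current.types[type_name] = true
end

-- The real pb.encode raises an error for unknown types; this mock returns
-- nil plus a message instead so the outcome is easy to inspect.
function fake_pb.encode(type_name)
    if not fake_pb.current.types[type_name] then
        return nil, "type '" .. type_name .. "' does not exists"
    end
    return "<encoded " .. type_name .. ">"
end

-- Buggy pattern (hypothetical helper): wipe the global state, never restore it.
local function compile_proto_buggy(type_name)
    fake_pb.state(nil)
    fake_pb.load(type_name)
end

-- Save-and-restore pattern (hypothetical helper): compile into a fresh state,
-- put the caller's state back, and keep the compiled state for later use.
local function compile_proto_fixed(type_name)
    local old_state = fake_pb.state(nil)
    fake_pb.load(type_name)
    local compiled_state = fake_pb.state(old_state)
    return { pb_state = compiled_state }
end

-- Temporarily install `state` while running `fn`, restoring the previous
-- state afterwards even if `fn` raises.
local function with_state(state, fn)
    local old_state = fake_pb.state(state)
    local ok, res, err = pcall(fn)
    fake_pb.state(old_state)
    if not ok then
        error(res, 0)
    end
    return res, err
end

local TRACES = "opentelemetry.proto.trace.v1.TracesData"

-- Buggy run: the OTLP type loaded at "startup" disappears.
fake_pb.state(nil)
fake_pb.load(TRACES)
compile_proto_buggy("users.UsersService")
print("buggy:", fake_pb.encode(TRACES))

-- Save-and-restore run: the OTLP type survives, and the gRPC type is
-- only visible while its compiled state is installed.
fake_pb.state(nil)
fake_pb.load(TRACES)
local compiled = compile_proto_fixed("users.UsersService")
print("fixed:", fake_pb.encode(TRACES))
print("grpc: ", (with_state(compiled.pb_state, function()
    return fake_pb.encode("users.UsersService")
end)))
```

In this sketch the compiled state is kept separately and installed only around gRPC encoding, so neither plugin's message types erase the other's; see #9606 for what was actually merged.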

@monkeyDluffy6017 monkeyDluffy6017 added the bug Something isn't working label Apr 26, 2023
@coffeebe4code

Any progress on this issue? I'm also facing OpenTelemetry issues with the same reported error.

@Jamel-jun

When will this problem be fixed? It may affect me in production now, and I feel sad about this.

@leslie-tsang
Member

When will this problem be fixed? It may affect me in production now, and I feel sad about this.

Yes. @kingluo, please take a look. :)

@leslie-tsang
Member

@ThalysonR please take a look at #9606. :)

@leslie-tsang leslie-tsang reopened this Jun 8, 2023
@kingluo
Contributor

kingluo commented Jun 8, 2023

cc @coffeebe4code @Jamel-jun The bugfix has been merged; please check.

@monkeyDluffy6017
Contributor

Considered resolved.

8 participants