[exporter/clickhouse] exported data is missing, the amount does not match #33923

Open · Lincyaw opened this issue Jul 4, 2024 · 5 comments
Labels: bug (Something isn't working), exporter/clickhouse, Stale

Lincyaw commented Jul 4, 2024

Component(s)

exporter/clickhouse

What happened?

Description

I am using opentelemetry-collector-contrib 0.104.0 with the ClickHouse exporter, but the amount of data stored in ClickHouse does not match the amount reported to the collector.

Steps to Reproduce

Use the following Docker Compose file to start an instance:

services:
  opentelemetry-collector:
    image: otel/opentelemetry-collector-contrib:latest
    container_name: opentelemetry-collector
    ports:
      - "4317:4317"
    volumes:
      - ./otel-config.yml:/etc/otel-config.yml
    command: ["--config=/etc/otel-config.yml"]
    depends_on:
      clickhouse:
        condition: service_healthy

  clickhouse:
    image: clickhouse/clickhouse-server:latest
    container_name: clickhouse
    ports:
      - "8123:8123"
      - "9000:9000"
    environment:
      - CLICKHOUSE_DB=db
      - CLICKHOUSE_USER=default
      - CLICKHOUSE_DEFAULT_ACCESS_MANAGEMENT=1
      - CLICKHOUSE_PASSWORD=password
    volumes:
      - clickhouse_data:/var/lib/clickhouse
    healthcheck:
      test:
        [
          "CMD",
          "wget",
          "--spider",
          "-q",
          "0.0.0.0:8123/ping"
        ]
      interval: 30s
      timeout: 5s
      retries: 3
    ulimits:
      nproc: 65535
      nofile:
        soft: 262144
        hard: 262144

volumes:
  clickhouse_data:

The following otel config is used. Place these two files together, then run docker compose up:

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
processors:
  batch:
    timeout: 3s
    send_batch_size: 100000

exporters:
  debug:
    verbosity: normal
  clickhouse:
    endpoint: tcp://clickhouse:9000?dial_timeout=10s&compress=lz4&username=default&password=password
    database: default
    ttl: 0
    logs_table_name: otel_logs
    traces_table_name: otel_traces
    metrics_table_name: otel_metrics
    timeout: 5s
    retry_on_failure:
      enabled: true
      initial_interval: 5s
      max_interval: 30s
      max_elapsed_time: 300s

extensions:
  health_check:
  pprof:
  zpages:

service:
  extensions: [health_check, pprof, zpages]
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [debug, clickhouse]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [debug, clickhouse]
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [debug, clickhouse]

Then use the following Go program to send data:

package main

import (
	"context"
	"fmt"
	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
	"google.golang.org/protobuf/proto"
	"log"
	"os"
	"time"

	metricpb "go.opentelemetry.io/proto/otlp/collector/metrics/v1"
	pb "go.opentelemetry.io/proto/otlp/metrics/v1"
)

func sendRequest(ctx context.Context, metricData *pb.ResourceMetrics) int {
	client, err := grpc.NewClient("10.10.10.29:4317", grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		log.Fatalf("Failed to connect: %v", err)
	}
	defer client.Close()

	// Count only gauge data points; GetDataPoints() is nil-safe if a metric is not a gauge.
	count := 0
	for _, v := range metricData.ScopeMetrics {
		for _, vv := range v.Metrics {
			count += len(vv.GetGauge().GetDataPoints())
		}
	}

	metricClient := metricpb.NewMetricsServiceClient(client)
	resp, err := metricClient.Export(ctx, &metricpb.ExportMetricsServiceRequest{
		ResourceMetrics: []*pb.ResourceMetrics{
			metricData,
		},
	})
	if err != nil {
		log.Fatalf("Failed to send metrics: %v", err)
	}
	fmt.Println(resp)
	return count
}

func main() {
	data, err := os.ReadFile("data.pb")
	if err != nil {
		log.Fatalf("Failed to read file: %v", err)
	}
	var metricData pb.MetricsData
	if err := proto.Unmarshal(data, &metricData); err != nil {
		log.Fatalf("Failed to unmarshal data: %v", err)
	}
	total := 0
	for _, resource := range metricData.ResourceMetrics {
		cnt := sendRequest(context.Background(), resource)
		fmt.Println("send ", cnt, " data points")
		total += cnt
		time.Sleep(1 * time.Second)
	}
	fmt.Println("send total ", total, " data points")
}

The data.pb file is attached via Dropbox; please download it.

go.mod is:

module awesomeProject

go 1.22

require (
	go.opentelemetry.io/proto/otlp v1.3.1
	google.golang.org/grpc v1.65.0
	google.golang.org/protobuf v1.34.2
)

require (
	github.com/grpc-ecosystem/grpc-gateway/v2 v2.20.0 // indirect
	golang.org/x/net v0.25.0 // indirect
	golang.org/x/sys v0.20.0 // indirect
	golang.org/x/text v0.15.0 // indirect
	google.golang.org/genproto/googleapis/api v0.0.0-20240528184218-531527333157 // indirect
	google.golang.org/genproto/googleapis/rpc v0.0.0-20240528184218-531527333157 // indirect
)

Then run the Go program. The data will be sent to the otel collector and then to ClickHouse.
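
To compare the stored count with the sent count, the rows can be counted directly over ClickHouse's HTTP interface (port 8123 in the compose file above). A minimal sketch, assuming the exporter writes gauge data points to a table named otel_metrics_gauge in the default database:

package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
	"net/url"
	"strings"
)

func main() {
	// Table name is an assumption: the ClickHouse exporter splits metrics by type,
	// so gauge data points are expected in otel_metrics_gauge. Adjust if your
	// deployment uses a different table.
	query := "SELECT count() FROM otel_metrics_gauge"

	// Credentials match the compose file above.
	params := url.Values{}
	params.Set("user", "default")
	params.Set("password", "password")
	endpoint := "http://localhost:8123/?" + params.Encode()

	resp, err := http.Post(endpoint, "text/plain", strings.NewReader(query))
	if err != nil {
		log.Fatalf("Failed to query ClickHouse: %v", err)
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		log.Fatalf("Failed to read response: %v", err)
	}
	fmt.Printf("rows in otel_metrics_gauge: %s", body)
}

The exporter is expected to write one row per data point, so this count should line up with the total printed by the sender.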

Expected Result

The amount should be exactly 1380344.

Actual Result

It is unstable: sometimes the stored count is lower than expected, sometimes it is correct.

[screenshot attached]

Collector version

0.104.0

Environment information

Environment

OS: debian sid

OpenTelemetry Collector configuration

(Same configuration as shown in Steps to Reproduce above.)

Log output

No response

Additional context

No response

Lincyaw added the bug (Something isn't working) and needs triage (New item requiring triage) labels on Jul 4, 2024

github-actions bot commented Jul 4, 2024

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

Lincyaw (Author) commented Jul 4, 2024

SpencerTorres (Member) commented

Hello! Thanks for the detailed issue and sample data.

Are the results correct with a different exporter? If you wrote the lines to a file, would they match up in that case? I want to make sure this isn't an issue with ClickHouse.

Also it looks like you're counting from metrics. You should check to confirm that the metrics are not being grouped at any point. Possible points where they could be summed/grouped:

  • The OTel SDK in your metrics generating app
  • The OTel collector pipeline
  • ClickHouse, assuming you're using an aggregate function to sum up metrics

Also check the exporter logs to see if everything is being exported correctly (no dropped/failed batches).
Again, to confirm where this may be happening, you should run another test where you write to both ClickHouse and another exporter.
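
For reference, a minimal sketch of such a side-by-side test, assuming the contrib image's file exporter is available; it writes the exported data as JSON lines that can be inspected and counted independently of ClickHouse:

exporters:
  file:
    # Written inside the collector container; mount a volume or docker exec in to read it.
    path: /tmp/metrics.jsonl
  clickhouse:
    # ... existing clickhouse exporter settings ...

service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [debug, clickhouse, file]

If the file and ClickHouse disagree on the number of data points for the same run, the loss is happening in or after the ClickHouse exporter rather than earlier in the pipeline.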

Frapschen (Contributor) commented

Hi @Lincyaw, please add the config below to your otel config:

service:
  telemetry:
    logs:
      level: "info"
    metrics:
      address: 0.0.0.0:8888

Please check the receiver and exporter metrics exposed at 0.0.0.0:8888; they record the number of metric points exported or dropped by the clickhouse exporter.
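
For example, once the telemetry address is enabled, something like the following can pull the relevant counters. This is a rough sketch: the metric names it looks for (for example otelcol_exporter_sent_metric_points and otelcol_exporter_send_failed_metric_points) are assumptions based on the collector's standard self-telemetry and may differ between versions, and it assumes port 8888 is reachable from the host (for example by mapping it in the compose file):

package main

import (
	"bufio"
	"fmt"
	"log"
	"net/http"
	"strings"
)

func main() {
	// Scrape the collector's internal Prometheus endpoint enabled above on :8888.
	resp, err := http.Get("http://localhost:8888/metrics")
	if err != nil {
		log.Fatalf("Failed to scrape collector metrics: %v", err)
	}
	defer resp.Body.Close()

	// Print only the data-point counters, e.g. sent/failed metric points per exporter
	// and accepted/refused metric points per receiver.
	scanner := bufio.NewScanner(resp.Body)
	for scanner.Scan() {
		line := scanner.Text()
		if strings.Contains(line, "metric_points") {
			fmt.Println(line)
		}
	}
	if err := scanner.Err(); err != nil {
		log.Fatalf("Failed to read response: %v", err)
	}
}

Comparing the receiver-accepted count with the exporter-sent count for the same run should show whether points are being dropped inside the collector or on the ClickHouse side.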


github-actions bot commented Dec 2, 2024

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

github-actions bot added the Stale label on Dec 2, 2024