@titaneric
Contributor

Summary

This PR improves the Avro encoding error message as described in #24046. Luckily, simply bumping the avro-rs crate is enough to resolve it.

Vector configuration

The config below generates demo logs that fail Avro encoding, reproducing the EXACT error described in the issue.

api:
  enabled: true
  graphql: true
  playground: true
  address: "127.0.0.1:8686"

sources:
  sample_logs:
    type: demo_logs
    format: shuffle
    lines:
    - '{"timestamp": "2025-10-30T10:01:00Z", "level": "ERROR", "message": "UNION MISMATCH: user_id as string instead of long", "user_id": "not_a_number", "session_id": "def456", "metadata": null, "error_details": null, "response_time": null}'

transforms:
  # No transformation needed - the logs are already in the right format
  passthrough:
    type: remap
    inputs:
    - sample_logs
    source: |
      # Just pass through the data as-is
      message = .message
      del(.)
      . = parse_json!(message)

sinks:
  # JSON output for debugging - this should work fine
  debug_json:
    type: console
    inputs:
    - passthrough
    encoding:
      codec: json
  # Avro output with strict union types - this should trigger the union mismatch errors
  avro_strict_unions:
    type: console
    inputs:
    - passthrough
    encoding:
      codec: avro
      avro:
        schema: |
          {
            "type": "record",
            "name": "LogEntry",
            "namespace": "com.example.logs",
            "fields": [
              {
                "name": "timestamp",
                "type": "string"
              },
              {
                "name": "level",
                "type": {
                  "type": "enum",
                  "name": "LogLevel",
                  "symbols": ["DEBUG", "INFO", "WARN", "ERROR", "FATAL"]
                }
              },
              {
                "name": "message",
                "type": "string"
              },
              {
                "name": "user_id",
                "type": ["null", "long"],
                "default": null
              },
              {
                "name": "session_id",
                "type": ["null", "string"],
                "default": null
              },
              {
                "name": "metadata",
                "type": [
                  "null",
                  {
                    "type": "record",
                    "name": "Metadata",
                    "fields": [
                      {
                        "name": "ip",
                        "type": "string"
                      },
                      {
                        "name": "user_agent",
                        "type": ["null", "string"],
                        "default": null
                      }
                    ]
                  }
                ],
                "default": null
              },
              {
                "name": "error_details",
                "type": [
                  "null",
                  {
                    "type": "record",
                    "name": "ErrorDetails",
                    "fields": [
                      {
                        "name": "code",
                        "type": "int"
                      },
                      {
                        "name": "description",
                        "type": "string"
                      }
                    ]
                  }
                ],
                "default": null
              },
              {
                "name": "response_time",
                "type": ["null", "double"],
                "default": null
              }
            ]
          }

How did you test this PR?

With the Vector config above, run the following commands to see the improved error message.

cargo build --no-default-features --features sources-demo_logs --features sinks-console --features transforms-remap --features api
./target/debug/vector --config ./avro_union_file_repro.yaml -v

AS IS

2025-10-30T18:17:50.409039Z  INFO vector: Vector has started. debug="true" version="0.51.0" arch="aarch64" revision=""
2025-10-30T18:17:50.409631Z DEBUG source{component_kind="source" component_id=sample_logs component_type=demo_logs}: vector::topology::builder: Source pump starting.
2025-10-30T18:17:50.410361Z DEBUG vector::utilization: component_id=debug_json utilization=0.9952
2025-10-30T18:17:50.410373Z DEBUG vector::utilization: component_id=passthrough utilization=0.9937
2025-10-30T18:17:50.410381Z DEBUG vector::utilization: component_id=avro_strict_unions utilization=0.9949
{"error_details":null,"level":"ERROR","message":"UNION MISMATCH: user_id as string instead of long","metadata":null,"response_time":null,"session_id":"def456","timestamp":"2025-10-30T10:01:00Z","user_id":"not_a_number"}
2025-10-30T18:17:50.411742Z ERROR sink{component_kind="sink" component_id=avro_strict_unions component_type=console}: vector::internal_events::codecs: Failed serializing frame. error=Could not find matching type in union error_code="encoder_serialize" error_type="encoder_failed" stage="sending"
2025-10-30T18:17:50.411814Z ERROR sink{component_kind="sink" component_id=avro_strict_unions component_type=console}: vector_common::internal_event::component_events_dropped: Events dropped intentional=false count=1 reason="Failed serializing frame.

TO BE

2025-10-30T18:08:30.568841Z  INFO vector: Vector has started. debug="true" version="0.51.0" arch="aarch64" revision=""
2025-10-30T18:08:30.568986Z DEBUG source{component_kind="source" component_id=sample_logs component_type=demo_logs}: vector::topology::builder: Source pump starting.
2025-10-30T18:08:30.569848Z DEBUG vector::utilization: component_id=passthrough utilization=0.9936
2025-10-30T18:08:30.569864Z DEBUG vector::utilization: component_id=avro_strict_unions utilization=0.9945
2025-10-30T18:08:30.569873Z DEBUG vector::utilization: component_id=debug_json utilization=0.9944
{"error_details":null,"level":"ERROR","message":"UNION MISMATCH: user_id as string instead of long","metadata":null,"response_time":null,"session_id":"def456","timestamp":"2025-10-30T10:01:00Z","user_id":"not_a_number"}
2025-10-30T18:08:30.571085Z ERROR sink{component_kind="sink" component_id=avro_strict_unions component_type=console}: vector::internal_events::codecs: Failed serializing frame. error=Could not find matching type in UnionSchema { schemas: [Null, Long], variant_index: {Null: 0, Long: 1} } for String("not_a_number") error_code="encoder_serialize" error_type="encoder_failed" stage="sending"
2025-10-30T18:08:30.571176Z ERROR sink{component_kind="sink" component_id=avro_strict_unions component_type=console}: vector_common::internal_event::component_events_dropped: Events dropped intentional=false count=1 reason="Failed serializing frame."
2025-10-30T18:08:30.571244Z DEBUG sink{component_kind="sink" component_id=avro_strict_unions component_type=console}: vector::topology::builder: Sink finished with an error.
2025-10-30T18:08:30.571262Z ERROR sink{component_kind="sink" component_id=avro_strict_unions component_type=console}: vector::topology: An error occurred that Vector couldn't handle: the task completed with an error

Please note that the error message now prints the union schema and the value that does not match it.
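To illustrate why the extra detail helps, here is a minimal sketch of union resolution and of the two pieces of information the new message carries (the union and the offending value). The `Schema` and `Value` enums and the `accepts` and `resolve_union` functions are hypothetical, simplified stand-ins written for this example; they are not the types or functions of the Avro crate Vector depends on, and the real encoder may resolve unions differently.

```rust
// Hypothetical, simplified stand-ins for Avro schema and value types.
// These are NOT the types from the Avro crate used by Vector.
#[derive(Debug, Clone, PartialEq)]
enum Schema {
    Null,
    Long,
    Str,
}

#[derive(Debug, Clone, PartialEq)]
enum Value {
    Null,
    Long(i64),
    Str(String),
}

/// Returns true when `value` can be encoded under `schema`.
fn accepts(schema: &Schema, value: &Value) -> bool {
    matches!(
        (schema, value),
        (Schema::Null, Value::Null)
            | (Schema::Long, Value::Long(_))
            | (Schema::Str, Value::Str(_))
    )
}

/// Finds the index of the first union branch that accepts `value`.
/// On failure, the error names both the union and the offending value,
/// which is the kind of detail the improved message adds.
fn resolve_union(union: &[Schema], value: &Value) -> Result<usize, String> {
    union
        .iter()
        .position(|branch| accepts(branch, value))
        .ok_or_else(|| {
            format!("Could not find matching type in {:?} for {:?}", union, value)
        })
}

fn main() {
    // Mirrors the `user_id` field from the config: "type": ["null", "long"].
    let user_id = [Schema::Null, Schema::Long];

    // A null or a long matches one of the branches.
    println!("{:?}", resolve_union(&user_id, &Value::Null));
    println!("{:?}", resolve_union(&user_id, &Value::Long(42)));

    // A string matches neither branch, as in the demo log line.
    println!(
        "{:?}",
        resolve_union(&user_id, &Value::Str("not_a_number".to_string()))
    );
}
```

With only "Could not find matching type in union", a reader has to guess which of the five union fields in the schema failed; seeing `[Null, Long]` next to the string value narrows it down to `user_id` right away.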

Change Type

  • Bug fix
  • New feature
  • Non-functional (chore, refactoring, docs)
  • Performance

Is this a breaking change?

  • Yes
  • No

Does this PR include user facing changes?

  • Yes. Please add a changelog fragment based on our guidelines.
  • No. A maintainer will apply the no-changelog label to this PR.

References

Closes: #24046

Notes

  • Please read our Vector contributor resources.
  • Do not hesitate to use @vectordotdev/vector to reach out to us regarding this PR.
  • Some CI checks run only after we manually approve them.
    • We recommend adding a pre-push hook, please see this template.
    • Alternatively, we recommend running the following locally before pushing to the remote branch:
      • make fmt
      • make check-clippy (if there are failures it's possible some of them can be fixed with make clippy-fix)
      • make test
  • After a review is requested, please avoid force pushes to help us review incrementally.
    • Feel free to push as many commits as you want. They will be squashed into one before merging.
    • For example, you can run git merge origin master and git push.
  • If this PR changes Vector dependencies (modifies Cargo.lock), please
    run make build-licenses to regenerate the license inventory and commit the changes (if any). More details here.

@titaneric titaneric requested a review from a team as a code owner October 30, 2025 18:20
Member

@pront pront left a comment

Thank you @titaneric, this is a much more informative error.

@pront pront enabled auto-merge October 30, 2025 18:28
auto-merge was automatically disabled October 31, 2025 10:49

Head branch was pushed to by a user without write access

Signed-off-by: titaneric <chenyihuang001@gmail.com>
@titaneric titaneric force-pushed the feat/improve-avro-encode-error-msg branch from 42b58ba to c733585 Compare November 3, 2025 14:00
@titaneric
Contributor Author

@pront, I fixed the CI issue. Please take your time moving the PR forward.

@pront pront enabled auto-merge November 3, 2025 17:37
@pront pront added this pull request to the merge queue Nov 3, 2025
Merged via the queue into vectordotdev:master with commit aef66cf Nov 3, 2025
44 checks passed

Development

Successfully merging this pull request may close these issues.

Avro encoding error message lacks detail.
