Refactor Redpanda Migrator components #3026

mihaitodor · 2024-11-21T02:27:15Z

I hijacked this PR to address several issues:

Fixed

The redpanda_migrator output no longer rejects messages if it can't perform schema ID translation.
The redpanda_migrator input no longer converts the kafka key to string.

Added

New redpanda_migrator_offsets input.
Fields offset_topic, offset_group, offset_partition, offset_commit_timestamp and offset_metadata added to the redpanda_migrator_offsets output.
Field topic_lag_refresh_period added to the redpanda and redpanda_common inputs.
Metric redpanda_lag now emitted by the redpanda and redpanda_common inputs.
Metadata kafka_lag now emitted by the redpanda and redpanda_common inputs.
The redpanda_migrator_bundle input and output now set labels for their subcomponents.

Changed

The kafka_key and max_in_flight fields of the redpanda_migrator_offsets output are now deprecated.
Fields batch_size, multi_header, replication_factor, replication_factor_override and output_resource for the redpanda_migrator input are now deprecated.
Field batching for the redpanda_migrator output is now deprecated.
The redpanda_migrator input no longer emits tombstone messages.

Redpanda Migrator offset metadata

One quick way to test this is with the following config. Note how a mapping processor overwrites the kafka_offset_metadata metadata field with foobar.

input:
  redpanda_migrator_bundle:
    redpanda_migrator:
      seed_brokers: [ "localhost:9092" ]
      topics:
        - '^[^_]' # Skip internal topics which start with `_`
      regexp_topics: true
      consumer_group: migrator_bundle
      start_from_oldest: true
      replication_factor_override: true
      replication_factor: -1

    schema_registry:
      url: http://localhost:8081
      include_deleted: true
      subject_filter: ""

output:
  processors:
    - switch:
        - check: metadata("input_label") == "redpanda_migrator_offsets_input"
          processors:
            - mapping: |
                meta kafka_offset_metadata = "foobar"
  redpanda_migrator_bundle:
    redpanda_migrator:
      seed_brokers: [ "localhost:9093" ]
      max_in_flight: 1
      replication_factor_override: true
      replication_factor: -1

    schema_registry:
      url: http://localhost:8082

mihaitodor · 2024-12-16T11:39:41Z

internal/impl/kafka/franz_reader_ordered.go

+		log:               res.Logger(),
+		shutSig:           shutdown.NewSignaller(),
+		clientOpts:        optsFn,
+		topicLagGauge:     res.Metrics().NewGauge("redpanda_lag", "topic", "partition"),


When I added the redpanda_migrator input, I had both this gauge and the kafka_lag metadata field. I don't know if we want either of these available by default. Also, should this gauge name be derived somehow from the actual input type (redpanda, redpanda_common, redpanda_migrator, redpanda_migrator_offsets)? It does get the label of the input if one is set, so maybe that's sufficient.

I think the label is enough. Do we really want this lag metric for all these inputs? Probably yes, I would assume...

I also think it's a bit overkill, and I don't recall which conversation led to this pattern. I also emit the kafka_lag metadata field with each message, so one could add a metric processor to the pipeline that creates a gauge for topics as needed. One downside of that approach is that if messages stop flowing completely, the gauge wouldn't get any updates. I think the main idea was to make this metric easier for people to discover, but it's not clear what the performance impact might be if we consume from thousands of topics, each with multiple partitions. Should I remove it? (cc @Jeffail)

I like having the metric emitted here: it's relatively cheap, and extracting it from metadata is awkward enough that no one is going to do it willingly.
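
For reference, the metadata-based alternative discussed in this thread could look roughly like the sketch below. It assumes the input attaches kafka_lag, kafka_topic and kafka_partition metadata to each message; the gauge name kafka_lag_from_metadata is made up for illustration and is not part of this PR.

```yaml
pipeline:
  processors:
    # Builds a gauge from the per-message kafka_lag metadata field.
    # The metric name below is hypothetical; pick whatever fits your setup.
    - metric:
        type: gauge
        name: kafka_lag_from_metadata
        labels:
          topic: ${! metadata("kafka_topic") }
          partition: ${! metadata("kafka_partition") }
        value: ${! metadata("kafka_lag") }
```

As noted above, a gauge built this way only updates when messages flow through the pipeline, so it goes stale if consumption stops entirely, whereas the redpanda_lag gauge emitted by the input does not depend on message flow.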