Refactor Redpanda Migrator components #3026

Merged: 27 commits, merged on Jan 15, 2025

mihaitodor (Collaborator) commented on Nov 21, 2024

I hijacked this PR to address several issues:

Fixed

  • The redpanda_migrator output no longer rejects messages if it can't perform schema ID translation.
  • The redpanda_migrator input no longer converts the Kafka key to a string.
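The schema ID fix can be illustrated with a minimal Go sketch. It assumes payloads use the Confluent Schema Registry wire format (magic byte `0x00`, then a 4-byte big-endian schema ID, then the encoded record) and that a source-to-destination schema ID map has already been built; `translateSchemaID` is a hypothetical helper, not the output's actual code. The point is the fallback: a payload that is not in wire format, or whose ID has no mapping, is forwarded unchanged instead of being rejected.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// translateSchemaID rewrites the schema ID embedded in a Confluent
// wire-format payload: byte 0 is the magic byte (0x00), bytes 1-4 hold the
// schema ID as a big-endian uint32, and the encoded record follows.
// The second return value reports whether a translation happened.
func translateSchemaID(payload []byte, idMap map[uint32]uint32) ([]byte, bool) {
	if len(payload) < 5 || payload[0] != 0x00 {
		// Not wire format: forward unchanged instead of rejecting.
		return payload, false
	}
	srcID := binary.BigEndian.Uint32(payload[1:5])
	dstID, ok := idMap[srcID]
	if !ok {
		// No mapping for this ID (assumption): forward unchanged
		// instead of rejecting the message.
		return payload, false
	}
	out := make([]byte, len(payload))
	copy(out, payload) // never mutate the caller's buffer
	binary.BigEndian.PutUint32(out[1:5], dstID)
	return out, true
}

func main() {
	msg := []byte{0x00, 0x00, 0x00, 0x00, 0x07, 'h', 'i'}

	out, translated := translateSchemaID(msg, map[uint32]uint32{7: 42})
	fmt.Println(translated, out[4]) // true 42

	_, translated = translateSchemaID(msg, map[uint32]uint32{})
	fmt.Println(translated) // false
}
```

Copying before rewriting keeps the original bytes intact, so a caller can still fall back to the untranslated payload.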

Added

  • New redpanda_migrator_offsets input.
  • Fields offset_topic, offset_group, offset_partition, offset_commit_timestamp and offset_metadata added to the redpanda_migrator_offsets output.
  • Field topic_lag_refresh_period added to the redpanda and redpanda_common inputs.
  • Metric redpanda_lag now emitted by the redpanda and redpanda_common inputs.
  • Metadata kafka_lag now emitted by the redpanda and redpanda_common inputs.
  • The redpanda_migrator_bundle input and output now set labels for their subcomponents.
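The `redpanda_lag` metric and `kafka_lag` metadata report consumer lag per topic and partition. The Go sketch below keys one value per `topic`/`partition` label pair and assumes lag is the high watermark minus the consumed offset minus one, clamped at zero. That formula and the names `lagGauge`, `partitionKey` and `recordLag` are assumptions for illustration; the inputs' real implementation may compute it differently.

```go
package main

import "fmt"

// partitionKey is one label set of a gauge declared with the "topic" and
// "partition" labels, as in NewGauge("redpanda_lag", "topic", "partition").
type partitionKey struct {
	Topic     string
	Partition int32
}

// lagGauge is a hypothetical in-memory stand-in for that labelled gauge.
type lagGauge map[partitionKey]int64

// recordLag assumes lag is the number of records left after the record at
// offset, given the partition's high watermark (the offset the next produced
// record will get). Negative values are clamped to zero.
func recordLag(highWatermark, offset int64) int64 {
	if lag := highWatermark - offset - 1; lag > 0 {
		return lag
	}
	return 0
}

// set records the lag for one consumed record under its topic/partition.
func (g lagGauge) set(topic string, partition int32, highWatermark, offset int64) {
	g[partitionKey{Topic: topic, Partition: partition}] = recordLag(highWatermark, offset)
}

func main() {
	g := lagGauge{}
	g.set("orders", 0, 100, 89) // 10 records behind
	g.set("orders", 1, 50, 49)  // caught up
	fmt.Println(g[partitionKey{"orders", 0}], g[partitionKey{"orders", 1}]) // 10 0
}
```

Each topic/partition pair is its own series, so consuming thousands of topics with several partitions each multiplies the number of series; that is the cost weighed in the review thread on this PR.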

Changed

  • The kafka_key and max_in_flight fields of the redpanda_migrator_offsets output are now deprecated.
  • Fields batch_size, multi_header, replication_factor, replication_factor_override and output_resource for the redpanda_migrator input are now deprecated.
  • Field batching for the redpanda_migrator output is now deprecated.
  • The redpanda_migrator input no longer emits tombstone messages.

Redpanda Migrator offset metadata

One quick way to test this is with the following config. Note how I set kafka_offset_metadata to foobar in a mapping processor.

input:
  redpanda_migrator_bundle:
    redpanda_migrator:
      seed_brokers: [ "localhost:9092" ]
      topics:
        - '^[^_]' # Skip internal topics which start with `_`
      regexp_topics: true
      consumer_group: migrator_bundle
      start_from_oldest: true
      replication_factor_override: true
      replication_factor: -1

    schema_registry:
      url: http://localhost:8081
      include_deleted: true
      subject_filter: ""

output:
  processors:
    - switch:
        - check: metadata("input_label") == "redpanda_migrator_offsets_input"
          processors:
            - mapping: |
                meta kafka_offset_metadata = "foobar"
  redpanda_migrator_bundle:
    redpanda_migrator:
      seed_brokers: [ "localhost:9093" ]
      max_in_flight: 1
      replication_factor_override: true
      replication_factor: -1

    schema_registry:
      url: http://localhost:8082
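To see which topics the `'^[^_]'` pattern in the config above selects, here is a small Go sketch using the standard `regexp` package. `matchTopics` is a hypothetical helper that only approximates how `regexp_topics: true` treats the `topics` list; the real input may match differently.

```go
package main

import (
	"fmt"
	"regexp"
)

// matchTopics keeps the topic names that match any of the given patterns.
// Hypothetical helper, for illustration only.
func matchTopics(all []string, patterns []string) ([]string, error) {
	res := make([]*regexp.Regexp, 0, len(patterns))
	for _, p := range patterns {
		re, err := regexp.Compile(p)
		if err != nil {
			return nil, fmt.Errorf("invalid topic pattern %q: %w", p, err)
		}
		res = append(res, re)
	}
	var matched []string
	for _, t := range all {
		for _, re := range res {
			if re.MatchString(t) {
				matched = append(matched, t)
				break // one matching pattern is enough
			}
		}
	}
	return matched, nil
}

func main() {
	topics := []string{"_schemas", "__consumer_offsets", "orders", "payments"}
	// '^[^_]' requires the first character to be anything but '_',
	// so internal topics such as _schemas are skipped.
	got, err := matchTopics(topics, []string{"^[^_]"})
	if err != nil {
		panic(err)
	}
	fmt.Println(got) // [orders payments]
}
```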

mihaitodor force-pushed the mihaitodor-add-redpanda-migrator-offset-metadata branch from 34421d0 to 081592f on November 21, 2024 02:46
mihaitodor force-pushed the mihaitodor-add-redpanda-migrator-offset-metadata branch 12 times, most recently from a86bdbd to 72237c4 on December 12, 2024 01:18
mihaitodor force-pushed the mihaitodor-add-redpanda-migrator-offset-metadata branch 5 times, most recently from d37239f to 784ff42 on December 16, 2024 11:16
mihaitodor changed the title from "Add Redpanda Migrator offset metadata" to "Refactor Redpanda Migrator components" on Dec 16, 2024
mihaitodor marked this pull request as ready for review on December 16, 2024 11:21
log:           res.Logger(),
shutSig:       shutdown.NewSignaller(),
clientOpts:    optsFn,
topicLagGauge: res.Metrics().NewGauge("redpanda_lag", "topic", "partition"),
Collaborator Author

mihaitodor commented on Dec 16, 2024

When I added the redpanda_migrator input, I had both this gauge and the kafka_lag metadata field. I don't know if we want either of these available by default. Also, should the gauge name somehow be derived from the actual input type (redpanda, redpanda_common, redpanda_migrator, redpanda_migrator_offsets)? It does get the label of the input if one is set, so maybe that's sufficient.

Collaborator


I think the label is enough. Do we really want this lag metric for all of these inputs? I would assume so, probably...

Collaborator Author


I also think it's a bit overkill, and I don't recall now which conversation led to this pattern. I also emit the kafka_lag metadata field with each message, so one could add a metric processor in the pipeline that creates a gauge for topics as needed. One downside of that approach is that if messages stop flowing completely, the gauge won't get any updates. I think the main idea was to make this metric easier to discover, but it's not clear what the performance impact might be if we consume from thousands of topics, each with multiple partitions. Should I remove it? (cc @Jeffail)

Collaborator


I like having the metric emitted here: it's relatively cheap, and extracting it from metadata is awkward enough that no one is going to do it willingly.

mihaitodor force-pushed the mihaitodor-add-redpanda-migrator-offset-metadata branch from 784ff42 to 642fd09 on December 16, 2024 11:43
mihaitodor force-pushed the mihaitodor-add-redpanda-migrator-offset-metadata branch 6 times, most recently from 34c5d16 to 5749553 on December 31, 2024 14:13
mihaitodor requested review from Jeffail and removed request for Jeffail on December 31, 2024 14:50
Signed-off-by: Mihai Todor <todormihai@gmail.com>
- New `redpanda_migrator_offsets` input
- Fields `offset_topic`, `offset_group`, `offset_partition`, `offset_commit_timestamp` and `offset_metadata` added to the `redpanda_migrator_offsets` output
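Offsets may differ between the source and destination clusters, so one way to carry a consumer group commit across is by its commit timestamp rather than its raw offset. The Go sketch below assumes that approach; `offsetCommit`, `record` and `destOffsetForTimestamp` are hypothetical names, and the lookup mimics Kafka's ListOffsets-by-timestamp behaviour (earliest offset whose timestamp is at or after the target) over an in-memory slice instead of calling a real cluster.

```go
package main

import (
	"fmt"
	"sort"
)

// offsetCommit mirrors the fields the redpanda_migrator_offsets output now
// accepts. The struct itself is hypothetical.
type offsetCommit struct {
	Topic     string // offset_topic
	Group     string // offset_group
	Partition int32  // offset_partition
	Timestamp int64  // offset_commit_timestamp, in milliseconds
	Metadata  string // offset_metadata
}

// record is an (offset, timestamp) pair from a destination partition.
type record struct {
	Offset    int64
	Timestamp int64
}

// destOffsetForTimestamp returns the earliest destination offset whose
// timestamp is >= ts. If none exists, it returns the log end offset (last
// offset + 1), or 0 for an empty partition. Assumes recs is ordered by
// offset with non-decreasing timestamps, which sort.Search requires.
func destOffsetForTimestamp(recs []record, ts int64) int64 {
	i := sort.Search(len(recs), func(i int) bool { return recs[i].Timestamp >= ts })
	if i < len(recs) {
		return recs[i].Offset
	}
	if len(recs) == 0 {
		return 0
	}
	return recs[len(recs)-1].Offset + 1
}

func main() {
	dest := []record{{0, 100}, {1, 200}, {2, 300}}
	c := offsetCommit{Topic: "orders", Group: "g1", Partition: 0, Timestamp: 250, Metadata: "foobar"}
	fmt.Println(c.Group, c.Topic, destOffsetForTimestamp(dest, c.Timestamp)) // g1 orders 2
}
```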

This is required in order to pull in twmb/franz-go#838

This is needed because the `redpanda_migrator` input needs to
create all the matched topics during the first call to
`ReadBatch()`.

- Move OnConnect topic creation logic to the output to avoid the
circular dependency between the input and output (the input
doesn't need to know about the output)
- Clean up error handling

This won't work until data is actually fetched...

mihaitodor force-pushed the mihaitodor-add-redpanda-migrator-offset-metadata branch from 0ce0e32 to 9f67fea on January 15, 2025 14:07
mihaitodor force-pushed the mihaitodor-add-redpanda-migrator-offset-metadata branch from 11a3078 to 037d93b on January 15, 2025 15:47
mihaitodor merged commit e9a056c into main on Jan 15, 2025
4 checks passed
mihaitodor deleted the mihaitodor-add-redpanda-migrator-offset-metadata branch on January 15, 2025 16:54
3 participants