Skip to content

Conversation

@the-other-tim-brown
Copy link
Contributor

@the-other-tim-brown the-other-tim-brown commented Dec 30, 2025

Describe the issue this Pull Request addresses

We see occasional timeouts with the utilities tests in Azure CI so this PR aims to improve the test runtimes to ensure CI is not flakey.

Summary and Changelog

  • Updates the KafkaTestUtils to use a test container instead of the deprecated streamer library. With this change, the test utils are setup once per class. This cut the runtime of TestHoodieDeltaStreamer from 31 minutes to 12 minutes on my local machine.
  • Streamers are shutdown in tests to ensure there are no lingering threadpools or resources leftover
  • Minor fixes include fixing logging, avoiding printing

Impact

Improve CI stability

Risk Level

None

Documentation Update

Contributor's checklist

  • Read through contributor's guide
  • Enough context is provided in the sections above
  • Adequate tests were added if applicable

@github-actions github-actions bot added size:M PR with lines of changes in (100, 300] size:L PR with lines of changes in (300, 1000] and removed size:M PR with lines of changes in (100, 300] labels Dec 30, 2025
command: 'run'
arguments: >
-v $(Build.SourcesDirectory):/hudi
-v /var/run/docker.sock:/var/run/docker.sock
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is required for running docker in docker


import scala.Tuple2;

public class KafkaTestUtils {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Drop in replacement for the existing KafkaTestUtils but does not require zookeeper

HoodieTableMetaClient meta = createMetaClient(storage, tablePath);
HoodieTimeline timeline = meta.getActiveTimeline().getCommitAndReplaceTimeline().filterCompletedInstants();
LOG.info("Timeline Instants=" + meta.getActiveTimeline().getInstants());
LOG.info("Timeline Instants={}", meta.getActiveTimeline().getInstants());
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed logging to avoid the toString calls while I was updating this file

<scope>compile</scope>
</dependency>
<dependency>
<groupId>org.testcontainers</groupId>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved this to be where it is required

HoodieStreamer streamer = new HoodieStreamer(context.getConfig(), jssc, Option.ofNullable(context.getProperties()));
streamer.sync();
successTables.add(Helpers.getTableWithDatabase(context));
streamer.shutdownGracefully();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ensure the instances are shutdown properly

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor. should we move this to finally block?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, makes sense

// Initial bulk insert to ingest to first hudi table
HoodieDeltaStreamer.Config cfg = getConfig(tableBasePath, getSyncNames("MockSyncTool1", "MockSyncTool2"));
new HoodieDeltaStreamer(cfg, jsc, fs, hiveServer.getHiveConf()).sync();
syncOnce(new HoodieDeltaStreamer(cfg, jsc, fs, hiveServer.getHiveConf()));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lots of changes in this PR are using utils or other means to ensure the streamer is shutdown


Map<String, String> hudiOpts = new HashMap<>();
public KafkaTestUtils testUtils;
protected static KafkaTestUtils testUtils;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests are now only setting up kafka once to cut down on the overhead

@the-other-tim-brown the-other-tim-brown changed the title use kafka container for tests chore(ci): Hudi-utilities test improvements Dec 31, 2025
@the-other-tim-brown the-other-tim-brown marked this pull request as ready for review December 31, 2025 17:15
Copy link
Contributor

@nsivabalan nsivabalan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice job on the fixes and test run time reduction

HoodieStreamer streamer = new HoodieStreamer(context.getConfig(), jssc, Option.ofNullable(context.getProperties()));
streamer.sync();
successTables.add(Helpers.getTableWithDatabase(context));
streamer.shutdownGracefully();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor. should we move this to finally block?


@AfterEach
public void cleanupClass() {
@AfterAll
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we keep AfterEach and add testUtils.deleteTopics(); in that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It generally doesn't make any impact since the topics are so small but I can add it


@AfterEach
public void tearDown() {
@AfterAll
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we keep AfterEach and add testUtils.deleteTopics(); in that?

testUtils.setup();
}

@AfterEach
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we keep AfterEach and add testUtils.deleteTopics(); in that?


@AfterEach
public void teardown() throws Exception {
@AfterAll
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment as above.

@hudi-bot
Copy link
Collaborator

CI report:

Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build

@nsivabalan nsivabalan merged commit bd0ae66 into apache:master Jan 1, 2026
72 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size:L PR with lines of changes in (300, 1000]

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants