@kewats kewats commented Nov 28, 2025

Summary

Optimizes Docker build performance by implementing BuildKit cache mounts for APT package operations. This change reduces rebuild times by 2-6 minutes per build while maintaining identical image outputs and sizes.

Key changes:

  • Add --mount=type=cache for /var/cache/apt and /var/lib/apt/lists in all apt-get commands
  • Remove rm -rf /var/lib/apt/lists/* cleanup commands (no longer needed, since the contents of a cache mount are never written to the image layer)
  • Apply consistently across all three ingestion Dockerfiles

Performance impact:

  • First build: No change
  • Subsequent builds: faster APT operations, because cached package data is reused instead of being downloaded again
  • Zero runtime impact: Final images are identical in size and behavior

Technical Details

BuildKit cache mounts provide persistent external caches that survive between builds but are not included in final images. This eliminates redundant package downloads when rebuilding after upstream layer changes, which is especially valuable for DataHub's multi-stage builds with heavy dependencies (LDAP, Kerberos, ODBC, Kafka, JRE, etc.).

With sharing=locked, concurrent builds take turns using a cache mount instead of writing to it at the same time, which prevents cache corruption in parallel build scenarios.
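
As a rough illustration of the pattern described above (not copied from the changed Dockerfiles), a single apt-get step might look like the sketch below. The base image and the `curl` package are placeholders chosen for the example. The `docker-clean` step is an assumption about Debian/Ubuntu-based images, whose default APT configuration deletes downloaded archives after install and would leave the /var/cache/apt mount largely empty. Whether the DataHub images need that step is not stated in this PR.

```dockerfile
# syntax=docker/dockerfile:1
# Placeholder base image, for illustration only.
FROM ubuntu:22.04

# Before: package lists are removed in the same layer to keep the image small.
# RUN apt-get update && apt-get install -y --no-install-recommends curl && rm -rf /var/lib/apt/lists/*

# Assumption: on Debian/Ubuntu base images, disable the auto-clean hook so
# downloaded .deb archives stay in /var/cache/apt and can be reused.
RUN rm -f /etc/apt/apt.conf.d/docker-clean \
    && echo 'Binary::apt::APT::Keep-Downloaded-Packages "true";' > /etc/apt/apt.conf.d/keep-cache

# After: both APT directories are cache mounts. Their contents persist between
# builds on the build host but are not written to the image layer, so no
# rm -rf cleanup is needed. sharing=locked serializes concurrent builds.
RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
    --mount=type=cache,target=/var/lib/apt/lists,sharing=locked \
    apt-get update && apt-get install -y --no-install-recommends curl
```

Because the cache lives outside the image, the resulting layer should contain only the installed packages, which is consistent with the unchanged image size claimed in the summary.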

Files Changed

  • docker/datahub-actions/Dockerfile - 4 apt-get commands optimized
  • docker/datahub-ingestion-base/Dockerfile - 4 apt-get commands optimized
  • docker/datahub-ingestion/Dockerfile - 4 apt-get commands optimized

Test Plan

  • Verify Docker builds complete successfully with BuildKit enabled
  • Confirm image sizes remain unchanged
  • Measure build time improvements on second build

Requirements

Requires Docker with BuildKit support (the default builder since Docker 23.0). The legacy builder does not support --mount, so docker build without BuildKit will fail with an error indicating that BuildKit is required.

github-actions bot added the devops and community-contribution labels Nov 28, 2025
datahub-cyborg bot added the needs-review label Nov 28, 2025
codecov bot commented Nov 28, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ All tests successful. No failed tests found.
