@Copilot Copilot AI commented Oct 22, 2025

Problem

All release workflows (major, minor, and patch) were failing during Docker image creation with the following error:

#8 [build 4/9] COPY [FlinkDotNet/Flink.JobRunner/Flink.JobRunner.csproj, FlinkDotNet/Flink.JobRunner/]
#8 ERROR: failed to calculate checksum of ref: "/FlinkDotNet/Flink.JobRunner/Flink.JobRunner.csproj": not found

This prevented the creation and publication of the JobGateway Docker image, blocking all releases.

Root Cause

The FlinkDotNet/FlinkDotNet.JobGateway/Dockerfile contained a COPY instruction for Flink.JobRunner, a project that does not exist in the repository. The actual dependency is Flink.JobBuilder.

Additionally, the Docker build environment lacked Java and Maven, which are required to build the FlinkIRRunner JAR component that JobGateway needs for job submission to Flink clusters.
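
A stale COPY source like this can be caught before the image build starts. The following is a minimal sketch, not part of this PR: `check_copy_sources` is a hypothetical helper that assumes COPY lines use the JSON-array form with a single source (as in this Dockerfile), skips `COPY --from=...` lines, and ignores `.dockerignore`.

```shell
# Hypothetical pre-flight check (not part of the PR): report JSON-form COPY
# sources in a Dockerfile that are missing from the build context.
# Usage: check_copy_sources <dockerfile> <context-dir>
check_copy_sources() {
  cc_dockerfile=$1
  cc_context=$2
  # Extract the first quoted path of every line that starts with: COPY ["
  cc_missing=$(
    sed -n -E 's/^COPY \["([^"]+)".*/\1/p' "$cc_dockerfile" |
      while IFS= read -r cc_src; do
        [ -e "$cc_context/$cc_src" ] || echo "missing: $cc_src"
      done
  )
  if [ -n "$cc_missing" ]; then
    echo "$cc_missing"
    return 1
  fi
}
```

Run from the repository root as `check_copy_sources FlinkDotNet/FlinkDotNet.JobGateway/Dockerfile .`, this would have been expected to report the Flink.JobRunner project file and return a non-zero status, assuming the repository root is the build context.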

Solution

Part 1: Docker Build Fix

1. Removed non-existent project reference:

  • Deleted the COPY ["FlinkDotNet/Flink.JobRunner/Flink.JobRunner.csproj", ...] line that referenced the non-existent project
  • Kept only the valid reference to Flink.JobBuilder, which is the actual dependency

2. Installed Java 17 and Maven in Docker build stage:

  • Added openjdk-17-jdk and maven packages to enable FlinkIRRunner JAR compilation
  • Set JAVA_HOME environment variable and updated PATH
  • Copied FlinkIRRunner source code to enable Maven build during .NET build process

3. Installed Java 17 and Maven in Docker runtime stage:

  • Added openjdk-17-jdk and maven packages to final runtime image
  • Enables on-demand JAR rebuilding capability if pre-built JAR is missing
  • Copied FlinkIRRunner source to runtime for fallback builds

4. Pre-built JAR inclusion:

  • Maven successfully builds flink-ir-runner-java17.jar (32MB) during Docker build
  • JAR is automatically copied to publish directory and included in final image

Part 2: Pre-Release Package Validation

Created ReleasePackagesTesting folder to validate release artifacts before publishing to NuGet.org and Docker Hub:

ReleasePackagesTesting.FlinkSqlAppHost:

  • Uses local Docker image from build artifacts instead of project reference
  • Same Aspire orchestration as LocalTesting
  • Validates Docker image works with Flink cluster

ReleasePackagesTesting.IntegrationTests:

  • Uses local FlinkDotnet NuGet package from build artifacts instead of project references
  • Uses Microsoft Aspire integration testing framework (same as LocalTesting)
  • Validates NuGet package has correct dependencies

Testing approach:

  • Uses native dotnet test command with Aspire integration tests
  • No custom PowerShell scripts
  • Consistent with LocalTesting testing methodology

Part 3: Post-Release Package Validation

Created ReleasePackagesTesting.Published folder to validate published packages after publishing as the final step of the release workflow:

ReleasePackagesTesting.Published.FlinkSqlAppHost:

  • Uses flinkdotnet/jobgateway:VERSION from Docker Hub (not local image)
  • Same Aspire orchestration as LocalTesting
  • Validates published Docker image works with Flink cluster

ReleasePackagesTesting.Published.IntegrationTests:

  • Uses FlinkDotnet from NuGet.org (not local packages)
  • Uses Microsoft Aspire integration testing framework (same as LocalTesting)
  • Validates published NuGet package has correct dependencies

Testing approach:

  • Uses native dotnet test command with Aspire integration tests
  • No custom PowerShell scripts
  • Consistent with LocalTesting testing methodology

Changes Made

Dockerfile Changes

 FROM mcr.microsoft.com/dotnet/sdk:9.0 AS build
+
+# Install Java and Maven for FlinkIRRunner build
+RUN apt-get update && \
+    apt-get install -y openjdk-17-jdk maven && \
+    apt-get clean && \
+    rm -rf /var/lib/apt/lists/*
+
+# Set JAVA_HOME environment variable
+ENV JAVA_HOME=/usr/lib/jvm/java-17-openjdk-amd64
+ENV PATH="${JAVA_HOME}/bin:${PATH}"

 # Copy project files
 COPY ["FlinkDotNet/FlinkDotNet.JobGateway/FlinkDotNet.JobGateway.csproj", "FlinkDotNet/FlinkDotNet.JobGateway/"]
-COPY ["FlinkDotNet/Flink.JobRunner/Flink.JobRunner.csproj", "FlinkDotNet/Flink.JobRunner/"]
 COPY ["FlinkDotNet/Flink.JobBuilder/Flink.JobBuilder.csproj", "FlinkDotNet/Flink.JobBuilder/"]

-# Copy source code
+# Copy source code (including FlinkIRRunner for Java build)
 COPY FlinkDotNet/ FlinkDotNet/
+COPY FlinkIRRunner/ FlinkIRRunner/

 # Build
-RUN dotnet build "FlinkDotNet.JobGateway.csproj" -c Release -o /app/build
+RUN dotnet build "FlinkDotNet.JobGateway.csproj" -c Release -o /app/build

 FROM build AS publish
-RUN dotnet publish "FlinkDotNet.JobGateway.csproj" -c Release -o /app/publish /p:UseAppHost=false
+RUN dotnet publish "FlinkDotNet.JobGateway.csproj" -c Release -o /app/publish /p:UseAppHost=false

 FROM base AS final
+
+# Install Java and Maven for on-demand JAR building capability
+RUN apt-get update && \
+    apt-get install -y openjdk-17-jdk maven && \
+    apt-get clean && \
+    rm -rf /var/lib/apt/lists/*
+
+# Set JAVA_HOME environment variable
+ENV JAVA_HOME=/usr/lib/jvm/java-17-openjdk-amd64
+ENV PATH="${JAVA_HOME}/bin:${PATH}"

 WORKDIR /app
 COPY --from=publish /app/publish .
+
+# Copy FlinkIRRunner source for on-demand building
+COPY --from=build /src/FlinkIRRunner /app/FlinkIRRunner
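
Note that the JAVA_HOME value above is specific to amd64: on Debian-based images the OpenJDK directory name carries the architecture suffix, so an arm64 build would not find that path. As a hedged sketch (the helper name `java_home_from_bin` is an assumption, not something in this PR), the JDK home can instead be derived from the resolved location of the `java` binary:

```shell
# Hypothetical helper (not part of the PR): resolve a JDK home directory from
# the path of its `java` binary by following symlinks and stripping the
# trailing /bin/java. Requires a readlink that supports -f.
java_home_from_bin() {
  jh_real=$(readlink -f "$1") || return 1
  dirname "$(dirname "$jh_real")"
}
```

Because a Dockerfile `ENV` instruction cannot run commands, this would have to be used in a `RUN` step or an entrypoint script, for example `export JAVA_HOME=$(java_home_from_bin "$(command -v java)")`.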

Pre-Release Testing Integration

Suggested workflow integration using Microsoft Aspire integration tests:

test-release-packages:
  name: Test Release Packages (Pre-Release)
  needs: [calculate-version, build-docker-image, build-and-package]
  runs-on: ubuntu-latest
  steps:
    - name: Checkout code
      uses: actions/checkout@v4
    
    - name: Set up .NET
      uses: actions/setup-dotnet@v4
      with:
        dotnet-version: '9.0.x'
    
    - name: Set up JDK 17
      uses: actions/setup-java@v4
      with:
        java-version: '17'
        distribution: 'temurin'
    
    - name: Install Maven
      uses: stCarolas/setup-maven@v4
      with:
        maven-version: '3.9.6'
    
    - name: Download NuGet packages
      uses: actions/download-artifact@v4
      with:
        name: nuget-packages
        path: ./packages
    
    - name: Download Docker image
      uses: actions/download-artifact@v4
      with:
        name: docker-image
        path: ./docker
    
    - name: Add local NuGet source
      run: dotnet nuget add source ./packages --name LocalFeed
    
    - name: Load Docker image
      run: gunzip -c ./docker/jobgateway-${{ needs.calculate-version.outputs.new_version }}.tar.gz | docker load
    
    - name: Run Aspire integration tests
      run: |
        cd ReleasePackagesTesting
        dotnet test --configuration Release --verbosity normal

Post-Release Validation Integration

Add as the final step after publishing using Microsoft Aspire integration tests:

validate-published-packages:
  name: Validate Published Packages (Post-Release)
  needs: [calculate-version, publish-nuget, publish-docker]
  runs-on: ubuntu-latest
  steps:
    - name: Checkout code
      uses: actions/checkout@v4
    
    - name: Set up .NET
      uses: actions/setup-dotnet@v4
      with:
        dotnet-version: '9.0.x'
    
    - name: Set up JDK 17
      uses: actions/setup-java@v4
      with:
        java-version: '17'
        distribution: 'temurin'
    
    - name: Install Maven
      uses: stCarolas/setup-maven@v4
      with:
        maven-version: '3.9.6'
    
    - name: Pull Docker image from Docker Hub
      run: |
        docker pull flinkdotnet/jobgateway:${{ needs.calculate-version.outputs.new_version }}
        docker tag flinkdotnet/jobgateway:${{ needs.calculate-version.outputs.new_version }} flinkdotnet/jobgateway:latest
    
    - name: Clear NuGet cache
      run: dotnet nuget locals all --clear
    
    - name: Run Aspire integration tests
      run: |
        cd ReleasePackagesTesting.Published
        dotnet test --configuration Release --verbosity normal
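
Because this job runs right after publishing, the new version may not be immediately available from NuGet.org or Docker Hub. One way to make the pull and test steps more tolerant is a small retry wrapper. This is a sketch only: the `retry` function, the attempt count, and the delay are assumptions and are not part of this PR.

```shell
# Hypothetical helper (not part of the PR): run a command up to <attempts>
# times, sleeping <delay> seconds between failed attempts.
# Usage: retry <attempts> <delay-seconds> <command> [args...]
retry() {
  rt_attempts=$1
  rt_delay=$2
  shift 2
  rt_n=1
  until "$@"; do
    if [ "$rt_n" -ge "$rt_attempts" ]; then
      echo "giving up after $rt_n attempts: $*" >&2
      return 1
    fi
    rt_n=$((rt_n + 1))
    sleep "$rt_delay"
  done
}
```

In the workflow step this could wrap the pull, for example `retry 10 30 docker pull "flinkdotnet/jobgateway:$VERSION"`, where `VERSION` is assumed to hold the calculated release version.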

Testing

Docker Build Testing

✅ Docker build completes successfully (860MB image)
✅ Maven builds flink-ir-runner-java17.jar (32MB) during build
✅ Java 17 verified in runtime: openjdk version "17.0.16"
✅ Maven 3.8.7 available in runtime
✅ JAR present in image: /app/flink-ir-runner-java17.jar
✅ FlinkIRRunner source available for on-demand builds
✅ Application starts successfully
✅ .NET solution builds correctly with no new errors
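
The runtime checks in this list could be repeated with a short script along these lines. It is a sketch under stated assumptions: `smoke_test_image` is a hypothetical name, the image is assumed to ship `sh`, and the two paths are the ones quoted in this PR.

```shell
# Hypothetical smoke test (not part of the PR): run basic content checks
# inside a built image. Fails if Java, Maven, the pre-built JAR, or the
# FlinkIRRunner source directory is missing.
# Usage: smoke_test_image <image[:tag]>
smoke_test_image() {
  st_image=$1
  docker run --rm --entrypoint sh "$st_image" -c '
    set -e
    java -version
    mvn -version
    test -f /app/flink-ir-runner-java17.jar
    test -d /app/FlinkIRRunner
  '
}
```

For example, `smoke_test_image flinkdotnet/jobgateway:1.0.0` returns non-zero if any of the checks fails.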

Pre-Release Package Testing

✅ ReleasePackagesTesting solution structure created
✅ Uses local Docker image from build artifacts
✅ Uses local NuGet packages from build artifacts
✅ Uses Microsoft Aspire integration testing framework (same as LocalTesting)
✅ Native dotnet test command for validation

Post-Release Package Testing

✅ ReleasePackagesTesting.Published solution structure created
✅ Uses Docker image from Docker Hub (published)
✅ Uses NuGet packages from NuGet.org (published)
✅ Uses Microsoft Aspire integration testing framework (same as LocalTesting)
✅ Native dotnet test command for validation

Impact

  • All three release workflows (major, minor, patch) will now successfully build and publish Docker images
  • Docker image includes both .NET runtime and Java 17 + Maven for complete functionality
  • Pre-built FlinkIRRunner JAR is included in the image (32MB)
  • On-demand JAR rebuilding capability maintained for flexibility
  • New: Pre-release validation prevents publishing broken packages using Aspire integration tests
  • New: Post-release validation confirms published packages work together using Aspire integration tests
  • New: Comprehensive validation at both stages using Microsoft Aspire's native testing framework
  • New: Consistent testing approach across LocalTesting, ReleasePackagesTesting, and ReleasePackagesTesting.Published
  • No breaking changes to existing functionality

Image Contents

  • .NET 9.0 runtime
  • Java 17 JDK (for JAR execution and on-demand building)
  • Maven 3.8.7 (for on-demand JAR rebuilding)
  • Pre-built flink-ir-runner-java17.jar (32MB)
  • FlinkIRRunner source code (for fallback builds)

Original prompt

Release workflow failed to create docker image:

Run VERSION=1.0.0
#0 building with "default" instance using docker driver

#1 [internal] load build definition from Dockerfile
#1 transferring dockerfile: 1.32kB done
#1 DONE 0.0s

#2 [internal] load metadata for mcr.microsoft.com/dotnet/aspnet:9.0
#2 DONE 0.1s

#3 [internal] load metadata for mcr.microsoft.com/dotnet/sdk:9.0
#3 DONE 0.1s

#4 [internal] load .dockerignore
#4 transferring context: 2B done
#4 DONE 0.0s

#5 [internal] load build context
#5 transferring context: 2.03MB 0.0s done
#5 DONE 0.1s

#6 [build 2/9] WORKDIR /src
#6 CACHED

#7 [build 3/9] COPY [FlinkDotNet/FlinkDotNet.JobGateway/FlinkDotNet.JobGateway.csproj, FlinkDotNet/FlinkDotNet.JobGateway/]
#7 CACHED

#8 [build 4/9] COPY [FlinkDotNet/Flink.JobRunner/Flink.JobRunner.csproj, FlinkDotNet/Flink.JobRunner/]
#8 ERROR: failed to calculate checksum of ref ac1187db-1850-4397-a0f7-3b6e6ef293d6::dwhtbq9vhjo1ektkbcv55yue2: "/FlinkDotNet/Flink.JobRunner/Flink.JobRunner.csproj": not found

#9 [build 1/9] FROM mcr.microsoft.com/dotnet/sdk:9.0@sha256:ca77338a19f87a7d24494a3656cb7d878a040c158621337b9cd3ab811c5eb057
#9 resolve mcr.microsoft.com/dotnet/sdk:9.0@sha256:ca77338a19f87a7d24494a3656cb7d878a040c158621337b9cd3ab811c5eb057 0.0s done
#9 sha256:e63f1f1d271778aed0c503c8c5b65e65e58853f0ae8c150529bcca00ab017336 5.10kB / 5.10kB done
#9 sha256:ca77338a19f87a7d24494a3656cb7d878a040c158621337b9cd3ab811c5eb057 1.08kB / 1.08kB done
#9 sha256:a2dd7a6044f99103011520ecd49ca964bcb6ea160154e6fe3d5ddd8229e1113c 2.42kB / 2.42kB done
#9 CANCELED

#10 [base 1/2] FROM mcr.microsoft.com/dotnet/aspnet:9.0@sha256:bf48e8b328707fae0e63a1b7d764d770221def59b97468c8cdee68f4e38ddfb9
#10 resolve mcr.microsoft.com/dotnet/aspnet:9.0@sha256:bf48e8b328707fae0e63a1b7d764d770221def59b97468c8cdee68f4e38ddfb9 done
#10 sha256:5b6f587e8028326b512e4ccaa5af1c0c1380abdda58d5683f197b995578653c4 1.58kB / 1.58kB done
#10 sha256:bf48e8b328707fae0e63a1b7d764d770221def59b97468c8cdee68f4e38ddfb9 1.08kB / 1.08kB done
#10 sha256:7b9d9e1ece92735d7d6cf7ac9736130d221ddcce80fc4daff5b3497c4cd4e3da 2.51kB / 2.51kB done
#10 CANCELED

[build 4/9] COPY [FlinkDotNet/Flink.JobRunner/Flink.JobRunner.csproj, FlinkDotNet/Flink.JobRunner/]:


Dockerfile:12

10 | # Copy project files
11 | COPY ["FlinkDotNet/FlinkDotNet.JobGateway/FlinkDotNet.JobGateway.csproj", "FlinkDotNet/FlinkDotNet.JobGateway/"]
12 | >>> COPY ["FlinkDotNet/Flink.JobRunner/Flink.JobRunner.csproj", "FlinkDotNet/Flink.JobRunner/"]
13 | COPY ["FlinkDotNet/Flink.JobBuilder/Flink.JobBuilder.csproj", "FlinkDotNet/Flink.JobBuilder/"]
14 |

ERROR: failed to build: failed to solve: failed to compute cache key: failed to calculate checksum of ref ac1187db-1850-4397-a0f7-3b6e6ef293d6::dwhtbq9vhjo1ektkbcv55yue2: "/FlinkDotNet/Flink.JobRunner/Flink.JobRunner.csproj": not found
Error: Process completed with exit code 1.



…ldFlinkRunner=false

Co-authored-by: devstress <30769729+devstress@users.noreply.github.com>
@Copilot Copilot AI changed the title from "[WIP] Fix docker image creation failure in release workflow" to "Fix Docker build failure: Remove non-existent Flink.JobRunner reference from Dockerfile" Oct 22, 2025
@Copilot Copilot AI requested a review from devstress October 22, 2025 10:10
Copilot finished work on behalf of devstress October 22, 2025 10:10
@devstress
Owner

@copilot look at unit-tests.yaml to see how we configure the GitHub workflow: build JobGateway first, which automatically builds FlinkIRRunner, then copy the entire package into the Docker image. The Docker image should have both .NET and Java installed. Please test locally first to make sure the Docker image works as expected.

Co-authored-by: devstress <30769729+devstress@users.noreply.github.com>
@Copilot
Contributor Author

Copilot AI commented Oct 22, 2025

@copilot look at unit-tests.yaml how we config GitHub workflow, build JobGateway first, it will auto build FlinkIRRunner and then copy the entire package to Docker Image, the docker image should have ...

Fixed in commit ab89b57. The Dockerfile now:

  1. Installs Java 17 and Maven in both build and runtime stages
  2. Builds the FlinkIRRunner JAR during the .NET build process (Maven successfully builds the 32MB JAR)
  3. Includes the pre-built JAR in the final image at /app/flink-ir-runner-java17.jar
  4. Includes FlinkIRRunner source and Maven for on-demand JAR rebuilding capability
  5. Removes the non-existent Flink.JobRunner reference

Tested locally - Docker build succeeds, JAR is built and included, and the application starts correctly. Image size is 860MB (includes .NET runtime + Java 17 + Maven + JAR).

@Copilot Copilot AI changed the title from "Fix Docker build failure: Remove non-existent Flink.JobRunner reference from Dockerfile" to "Fix Docker build failure: Install Java/Maven and build FlinkIRRunner JAR in Docker image" Oct 22, 2025
Copilot finished work on behalf of devstress October 22, 2025 11:09
@devstress
Owner

@copilot we need a step in the publish workflows that validates the release packages. Create a ReleasePackagesTesting folder that copies LocalTesting but references the produced NuGet package and Docker image instead of the projects. It should use the same tests as LocalTesting. This will verify that the release packages work before publishing to NuGet and Docker.

@Copilot
Contributor Author

Copilot AI commented Oct 22, 2025

@copilot we need a step to validate release packages working in publish workflows, create ReleasePackagesTesting folder which copy LocalTesting but use produced nuget file reference and docker image i...

I'll create the ReleasePackagesTesting folder structure. This will:

  1. Copy LocalTesting structure but replace project references with NuGet package references
  2. Use the flinkdotnet/jobgateway Docker image instead of building from source
  3. Run the same integration tests to validate release packages

The testing workflow will:

  • Install packages from ./packages/*.nupkg (local NuGet feed)
  • Load Docker image from ./docker/jobgateway-*.tar.gz
  • Run all integration tests from LocalTesting.IntegrationTests

This validates the release artifacts work correctly before publishing to NuGet.org and Docker Hub. I'll implement this now.

@devstress
Owner

@copilot we should have a final step in which ReleasePackagesTesting uses the latest Docker image and NuGet release package, to prove that our latest packages work together.

- Created ReleasePackagesTesting solution with AppHost and IntegrationTests
- AppHost uses Docker image (flinkdotnet/jobgateway) instead of project reference
- IntegrationTests use FlinkDotnet NuGet package instead of project references
- Copied test infrastructure from LocalTesting (connectors, Maven projects, configs)
- Added test-release-packages.ps1 script for workflow integration
- Tests validate Docker image and NuGet packages work before publishing

Co-authored-by: devstress <30769729+devstress@users.noreply.github.com>
@Copilot Copilot AI changed the title from "Fix Docker build failure: Install Java/Maven and build FlinkIRRunner JAR in Docker image" to "Fix Docker build failure and add release package validation testing" Oct 22, 2025
Copilot finished work on behalf of devstress October 22, 2025 11:52
- Created validate-latest-release.ps1 script to test published packages
- Tests FlinkDotnet from NuGet.org + flinkdotnet/jobgateway from Docker Hub
- Added workflow validate-latest-release.yml for automated daily validation
- Workflow creates issue if validation fails (catches compatibility issues)
- Updated README with pre-release and post-release validation usage

Co-authored-by: devstress <30769729+devstress@users.noreply.github.com>
@Copilot
Contributor Author

Copilot AI commented Oct 22, 2025

@copilot we should have a final step in which ReleasePackagesTesting uses the latest Docker image and NuGet release package, to prove that our latest packages work together.

Added in commit fbc5100. Created post-release validation:

validate-latest-release.ps1:

  • Pulls flinkdotnet/jobgateway:latest from Docker Hub
  • Installs FlinkDotnet from NuGet.org
  • Runs all integration tests
  • Validates latest published packages work together

validate-latest-release.yml workflow:

  • Runs daily at 2 AM UTC (automated monitoring)
  • Can be triggered manually with specific Docker tag
  • Creates GitHub issue if compatibility issues detected
  • Ensures latest NuGet package + Docker image stay compatible

Usage:

# Test latest published packages
./ReleasePackagesTesting/validate-latest-release.ps1

# Test specific version
./ReleasePackagesTesting/validate-latest-release.ps1 -DockerTag "1.0.0"

Copilot finished work on behalf of devstress October 22, 2025 11:56
@devstress
Owner

@copilot remove validate-latest-release.ps1 and validate-latest-release.yml. This should be the last step of the release workflow, not a separate task. Maybe create a clone of the ReleasePackagesTesting folder called ReleasePackagesTesting.Published that uses the public NuGet and Docker repositories.

…lidation

- Removed validate-latest-release.ps1 and validate-latest-release.yml (separate scheduled task)
- Created ReleasePackagesTesting.Published folder for final release workflow step
- Uses published packages from NuGet.org and Docker Hub (not local artifacts)
- Created test-published-packages.ps1 to validate published packages work together
- Updated READMEs to clarify pre-release vs post-release validation
- ReleasePackagesTesting: Pre-release validation (local artifacts before publishing)
- ReleasePackagesTesting.Published: Post-release validation (published packages after publishing)

Co-authored-by: devstress <30769729+devstress@users.noreply.github.com>
@Copilot
Contributor Author

Copilot AI commented Oct 22, 2025

@copilot remove validate-latest-release.ps1 and validate-latest-release.yml. This should be the last step of the release workflow, not a separate task. Maybe create another clone folder of ReleasePackagesT...

Done in commit ece2ee9. Changes made:

Removed:

  • validate-latest-release.ps1 and validate-latest-release.yml (separate scheduled task)

Created:

  • ReleasePackagesTesting.Published folder for post-release validation
  • Uses FlinkDotnet from NuGet.org (not local packages)
  • Uses flinkdotnet/jobgateway from Docker Hub (not local image)
  • New script: test-published-packages.ps1

Structure:

  • ReleasePackagesTesting: Pre-release validation (local artifacts before publishing)
  • ReleasePackagesTesting.Published: Post-release validation (published packages after publishing)

Usage in release workflow:

# After publishing to NuGet.org and Docker Hub
cd ReleasePackagesTesting.Published
./test-published-packages.ps1 -DockerTag "1.0.0"

This validates published packages work together as the final step of the release workflow.

Copilot finished work on behalf of devstress October 22, 2025 12:07
@devstress
Owner

@copilot remove all ps1 scripts in this PR. You must use Microsoft Aspire and Aspire’s integration tests, like LocalTesting, to test both pre-release packages and published packages.

- Removed test-release-packages.ps1 and test-published-packages.ps1
- Updated READMEs to use Microsoft Aspire integration testing framework
- Pre-release: Run `dotnet test` in ReleasePackagesTesting folder
- Post-release: Run `dotnet test` in ReleasePackagesTesting.Published folder
- Same Aspire-based testing approach as LocalTesting (no custom scripts)
- Workflow uses native Aspire testing infrastructure

Co-authored-by: devstress <30769729+devstress@users.noreply.github.com>
@Copilot
Contributor Author

Copilot AI commented Oct 22, 2025

@copilot remove all ps1 scripts in this PR. You must use Microsoft Aspire and Aspire’s integration tests, like LocalTesting, to test both pre-release packages and published packages.

Done in commit c1b3c3c. Removed all PowerShell scripts:

Removed:

  • test-release-packages.ps1
  • test-published-packages.ps1

Now using Microsoft Aspire integration tests (same as LocalTesting):

Pre-release validation:

cd ReleasePackagesTesting
dotnet test --configuration Release

Post-release validation:

cd ReleasePackagesTesting.Published
dotnet test --configuration Release

Both folders use Aspire's native testing infrastructure with the same integration tests as LocalTesting. No custom scripts needed.

Copilot finished work on behalf of devstress October 22, 2025 12:20
@devstress devstress marked this pull request as ready for review October 22, 2025 12:27
@devstress devstress merged commit 6b2893e into main Oct 22, 2025
@devstress devstress deleted the copilot/fix-docker-image-creation branch October 22, 2025 12:27