Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Integrated code lifecycle: Improve reinitialization when pausing build agents #9939

Merged
merged 33 commits into from
Dec 22, 2024

Conversation

BBesrour
Copy link
Member

@BBesrour BBesrour commented Dec 3, 2024

Checklist

General

Server

  • Important: I implemented the changes with a very good performance and prevented too many (unnecessary) and too complex database calls.
  • I strictly followed the principle of data economy for all database calls.
  • I strictly followed the server coding and design guidelines.
  • I added multiple integration tests (Spring) related to the features (with a high test coverage).
  • I added pre-authorization annotations according to the guidelines and checked the course groups for all new REST Calls (security).
  • I documented the Java code using JavaDoc style.

Changes affecting Programming Exercises

  • High priority: I tested all changes and their related features with all corresponding user types on a test server configured with the integrated lifecycle setup (LocalVC and LocalCI).

Motivation and Context

We sometimes get this error when a build agent has been running for some time

Nov 28 21:48:53 ---[3392294]: java.lang.IllegalStateException: Connection pool shut down
...
com.github.dockerjava.httpclient5.ApacheDockerHttpClient.execute(ApacheDockerHttpClient.java:9)
Nov 28 21:48:53 artemis-production-agent23.artemis.cit.tum.de java[3392294]:         at com.github.dockerjava.core.DefaultInvocationBuilder.execute(DefaultInvocationBuilder.java:228)
Nov 28 21:48:53 artemis-production-agent23.artemis.cit.tum.de java[3392294]:         at com.github.dockerjava.core.DefaultInvocationBuilder.lambda$executeAndStream$1(DefaultInvocationBuilder.java:269)
Nov 28 21:48:53 artemis-production-agent23.artemis.cit.tum.de java[3392294]:         at java.base/java.lang.Thread.run(Thread.java:1583)

Description

We reinitialize Docker Client and ThreadPoolExecutor when pausing build agent.

Steps for Testing

Prerequisites:

  • 1 Admin
  1. Submit a few times to a programming exercise (Pls count how many submissions to verify that they all finished)
  2. Pause build agents
  3. Wait for the running build jobs to finish while making sure there are some queued jobs (you can submit a few more times after pausing the agent)
  4. Resume build agent
  5. wait for the build jobs to finish and make sure that they ran correctly (you can check this on the build overview page)

Testserver States

Note

These badges show the state of the test servers.
Green = Currently available, Red = Currently locked
Click on the badges to get to the test servers.







Review Progress

Performance Review

  • I (as a reviewer) confirm that the client changes (in particular related to REST calls and UI responsiveness) are implemented with a very good performance even for very large courses with more than 2000 students.
  • I (as a reviewer) confirm that the server changes (in particular related to database calls) are implemented with a very good performance even for very large courses with more than 2000 students.

Code Review

  • Code Review 1
  • Code Review 2

Manual Tests

  • Test 1
  • Test 2

Test Coverage

Screenshots

Summary by CodeRabbit

  • New Features

    • Enhanced management of build jobs and Docker client interactions.
    • Added lifecycle management methods for build executor and Docker client.
    • Improved error handling and logging for build job execution.
  • Bug Fixes

    • Enhanced error handling in container management and job execution.
  • Documentation

    • Updated method documentation to reflect changes in configuration and management.
  • Tests

    • Improved mock setup for Docker client interactions in tests.

@BBesrour BBesrour self-assigned this Dec 3, 2024
@BBesrour BBesrour requested a review from a team as a code owner December 3, 2024 23:16
@github-actions github-actions bot added server Pull requests that update Java code. (Added Automatically!) buildagent Pull requests that affect the corresponding module labels Dec 3, 2024
Copy link

coderabbitai bot commented Dec 3, 2024

Warning

There were issues while running some tools. Please review the errors and either fix the tool’s configuration or disable the tool if it’s a critical failure.

🔧 pmd (7.8.0)
src/main/java/de/tum/cit/aet/artemis/buildagent/service/SharedQueueProcessingService.java

The following rules are missing or misspelled in your ruleset file category/vm/bestpractices.xml: BooleanInstantiation, DontImportJavaLang, DuplicateImports, EmptyFinallyBlock, EmptyIfStmt, EmptyInitializer, EmptyStatementBlock, EmptyStatementNotInLoop, EmptySwitchStatements, EmptySynchronizedBlock, EmptyTryBlock, EmptyWhileStmt, ExcessiveClassLength, ExcessiveMethodLength, ImportFromSamePackage, MissingBreakInSwitch, SimplifyBooleanAssertion. Please check your ruleset configuration.

src/test/java/de/tum/cit/aet/artemis/shared/base/AbstractSpringIntegrationLocalCILocalVCTest.java

The following rules are missing or misspelled in your ruleset file category/vm/bestpractices.xml: BooleanInstantiation, DontImportJavaLang, DuplicateImports, EmptyFinallyBlock, EmptyIfStmt, EmptyInitializer, EmptyStatementBlock, EmptyStatementNotInLoop, EmptySwitchStatements, EmptySynchronizedBlock, EmptyTryBlock, EmptyWhileStmt, ExcessiveClassLength, ExcessiveMethodLength, ImportFromSamePackage, MissingBreakInSwitch, SimplifyBooleanAssertion. Please check your ruleset configuration.

Walkthrough

The pull request introduces significant modifications to the build agent configuration and service classes, focusing on centralizing Docker client and build job executor management. The changes primarily involve replacing direct dependencies on DockerClient and ExecutorService with a new BuildAgentConfiguration class. This refactoring aims to improve modularity, resource management, and error handling across multiple services involved in build job processing, Docker interactions, and container management.

Changes

File Change Summary
src/main/java/de/tum/cit/aet/artemis/buildagent/BuildAgentConfiguration.java - Added new private fields for build executor and Docker client
- Introduced methods to manage build executor and Docker client lifecycle
- Added event listener for application startup
src/main/java/de/tum/cit/aet/artemis/buildagent/service/BuildAgentDockerService.java - Replaced dockerClient with buildAgentConfiguration
- Updated methods to retrieve Docker client through configuration
src/main/java/de/tum/cit/aet/artemis/buildagent/service/BuildJobContainerService.java - Replaced DockerClient with BuildAgentConfiguration
- Updated container management methods
- Enhanced error handling in container stopping logic
src/main/java/de/tum/cit/aet/artemis/buildagent/service/BuildJobManagementService.java - Replaced ExecutorService with BuildAgentConfiguration
- Updated build job execution method
- Improved error handling and cancellation logic
src/main/java/de/tum/cit/aet/artemis/buildagent/service/SharedQueueProcessingService.java - Added BuildAgentConfiguration and BuildAgentDockerService
- Updated job processing and service management methods
src/test/java/de/tum/cit/aet/artemis/programming/icl/TestBuildAgentConfiguration.java - Removed mocked dockerClient() method
src/test/java/de/tum/cit/aet/artemis/shared/base/AbstractSpringIntegrationLocalCILocalVCTest.java - Added BuildAgentConfiguration mock
- Introduced methods for mocking Docker client

Sequence Diagram

sequenceDiagram
    participant App as Application
    participant BAConfig as BuildAgentConfiguration
    participant DockerClient as Docker Client
    participant Executor as Thread Pool Executor

    App->>BAConfig: onApplicationReady()
    BAConfig->>Executor: createBuildExecutor()
    BAConfig->>DockerClient: createDockerClient()
    
    App->>BAConfig: getBuildExecutor()
    App->>BAConfig: getDockerClient()
    
    App->>BAConfig: closeBuildAgentServices()
    BAConfig->>Executor: shutdownBuildExecutor()
    BAConfig->>DockerClient: closeDockerClient()
Loading

Possibly related PRs

Suggested reviewers

  • SimonEntholzer
  • krusche

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai or @coderabbitai title anywhere in the PR title to generate the title automatically.

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (10)
src/main/java/de/tum/cit/aet/artemis/buildagent/BuildAgentConfiguration.java (2)

Line range hint 175-182: Restrict visibility of createDockerClient() method

The method createDockerClient() is only used within the BuildAgentConfiguration class. To follow the principle of least privilege, consider changing its access modifier from public to private.

Apply this diff to change the method's visibility:

-public DockerClient createDockerClient() {
+private DockerClient createDockerClient() {

Line range hint 183-200: Handle unexpected memory units in parseMemoryString

The parseMemoryString method does not handle cases where memory strings have unexpected units or formats (e.g., missing quotes, uppercase letters). Consider adding validation or default cases to prevent potential NumberFormatException.

Apply this diff to enhance robustness:

         else {
+            try {
                 return Long.parseLong(memoryString);
+            } catch (NumberFormatException e) {
+                throw new LocalCIException("Invalid memory string format: " + memoryString, e);
+            }
         }
src/main/java/de/tum/cit/aet/artemis/buildagent/service/BuildJobContainerService.java (2)

208-244: Optimize executor service usage in stopUnresponsiveContainer

Creating a new ExecutorService every time stopUnresponsiveContainer is called may lead to resource overhead. Consider reusing a shared executor service or refactoring the method to avoid unnecessary executor creation.

Apply this diff to use a shared executor service:

+private final ExecutorService executor = Executors.newSingleThreadExecutor();

 public void stopUnresponsiveContainer(String containerId) {
-    try (ExecutorService executor = Executors.newSingleThreadExecutor()) {
         try {
             // Existing code...
         }
         finally {
-            executor.shutdown();
+            // No need to shutdown the shared executor here
         }
-    }
 }

Ensure the shared executor service is properly shut down when the application is stopped, possibly in a @PreDestroy method.


Line range hint 406-433: Handle potential resource leaks in executeDockerCommand

In the executeDockerCommand method, if the dockerClient.execStartCmd does not call onComplete, the CountDownLatch may not be decremented, leading to a potential indefinite wait in latch.await(). Consider adding a timeout or ensuring that onComplete is always called.

Apply this diff to prevent indefinite blocking:

 try {
     dockerClient.execStartCmd(execCreateCmdResponse.getId()).withDetach(detach).exec(new ResultCallback.Adapter<>() {
         @Override
         public void onNext(Frame item) {
             // Existing code...
         }
+        @Override
+        public void onError(Throwable throwable) {
+            latch.countDown();
+            super.onError(throwable);
+        }
+        @Override
+        public void close() throws IOException {
+            latch.countDown();
+            super.close();
+        }
     });
 } catch (ConflictException e) {
     throw new LocalCIException("Could not execute Docker command: " + String.join(" ", command), e);
 }

 try {
-    latch.await();
+    if (!latch.await(60, TimeUnit.SECONDS)) {
+        throw new LocalCIException("Timeout while executing Docker command: " + String.join(" ", command));
+    }
 } catch (InterruptedException e) {
     Thread.currentThread().interrupt();
     throw new LocalCIException("Interrupted while executing Docker command: " + String.join(" ", command), e);
 }
src/main/java/de/tum/cit/aet/artemis/buildagent/service/SharedQueueProcessingService.java (3)

340-342: Enhance exception logging in updateLocalBuildAgentInformation

The catch block logs a generic error without specific information about the exception. Consider adding the exception to the log statement to aid in debugging.

Apply this diff to include the exception in the log:

         catch (Exception e) {
-            log.error("Error while updating build agent information for agent with address {}", memberAddress);
+            log.error("Error while updating build agent information for agent with address {}", memberAddress, e);
         }

272-275: Avoid logging sensitive or excessive information

The log statement in the catch block logs the build job and executor service details, which may contain sensitive information or clutter the logs. Review the logged data to ensure it complies with logging policies.

Apply this diff to adjust the log statement:

-    log.error("Couldn't add build job to thread pool: {}\n Concurrent Build Jobs Count: {} Active tasks in pool: {}, Concurrent Build Jobs Size: {}", buildJob,
+    log.error("Couldn't add build job to thread pool. Concurrent Build Jobs Count: {} Active tasks in pool: {}, Maximum Pool Size: {}",
             localProcessingJobs.get(), buildExecutorService.getActiveCount(), buildExecutorService.getMaximumPoolSize(), e);

578-582: Simplify condition in nodeIsAvailable() method

The condition in nodeIsAvailable() checks both localProcessingJobs and buildExecutorService active count against the maximum pool size. Since both represent the number of running tasks, consider simplifying the condition to avoid redundancy.

Apply this diff to simplify the condition:

 return localProcessingJobs.get() < buildExecutorService.getMaximumPoolSize()
-    && buildExecutorService.getActiveCount() < buildExecutorService.getMaximumPoolSize()
     && buildExecutorService.getQueue().isEmpty();
src/main/java/de/tum/cit/aet/artemis/buildagent/service/BuildJobManagementService.java (1)

Line range hint 193-199: Preserve interrupt status when handling InterruptedException

In the finishBuildJobExceptionally method, when catching InterruptedException, consider resetting the thread's interrupt status to ensure proper thread interruption handling.

Apply this diff to reset the interrupt status:

     if (containerId != null) {
         buildJobContainerService.stopUnresponsiveContainer(containerId);
     }
+    if (exception instanceof InterruptedException) {
+        Thread.currentThread().interrupt();
+    }
 }
src/main/java/de/tum/cit/aet/artemis/buildagent/service/BuildAgentDockerService.java (2)

325-327: Handle potential exceptions during Docker image removal

While catching NotFoundException is appropriate, consider catching other exceptions that might occur during image removal, such as ConflictException, and log them accordingly to ensure reliable cleanup.

Apply this diff to handle additional exceptions:

 try {
     buildAgentConfiguration.getDockerClient().removeImageCmd(dockerImage).exec();
 } catch (NotFoundException e) {
     log.warn("Docker image {} not found during cleanup", dockerImage);
+} catch (ConflictException e) {
+    log.warn("Conflict occurred while removing Docker image {}: {}", dockerImage, e.getMessage());
+} catch (Exception e) {
+    log.error("Unexpected error while removing Docker image {}: {}", dockerImage, e.getMessage());
 }

Line range hint 348-385: Avoid potential race conditions in disk space check

In checkUsableDiskSpaceThenCleanUp(), the method does not lock the Docker client or the image list during the cleanup process. Consider adding synchronization to prevent race conditions when multiple threads might modify the Docker images concurrently.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 0f12a54 and 74d8477.

📒 Files selected for processing (5)
  • src/main/java/de/tum/cit/aet/artemis/buildagent/BuildAgentConfiguration.java (8 hunks)
  • src/main/java/de/tum/cit/aet/artemis/buildagent/service/BuildAgentDockerService.java (8 hunks)
  • src/main/java/de/tum/cit/aet/artemis/buildagent/service/BuildJobContainerService.java (10 hunks)
  • src/main/java/de/tum/cit/aet/artemis/buildagent/service/BuildJobManagementService.java (4 hunks)
  • src/main/java/de/tum/cit/aet/artemis/buildagent/service/SharedQueueProcessingService.java (9 hunks)
🧰 Additional context used
📓 Path-based instructions (5)
src/main/java/de/tum/cit/aet/artemis/buildagent/service/BuildAgentDockerService.java (1)

Pattern src/main/java/**/*.java: naming:CamelCase; principles:{single_responsibility,small_methods,no_duplication}; db:{perf_queries,datetime_not_timestamp}; rest:{stateless,singleton,delegate_logic,http_only,minimal_dtos}; dtos:{java_records,no_entities,min_data,single_resp}; di:constructor_injection; kiss:simple_code; file_handling:os_indep_paths; practices:{least_access,avoid_transactions,code_reuse,static_member_ref,prefer_primitives}; sql:{param_annotation,uppercase,avoid_subqueries};java:avoid_star_imports

src/main/java/de/tum/cit/aet/artemis/buildagent/service/BuildJobManagementService.java (1)

Pattern src/main/java/**/*.java: naming:CamelCase; principles:{single_responsibility,small_methods,no_duplication}; db:{perf_queries,datetime_not_timestamp}; rest:{stateless,singleton,delegate_logic,http_only,minimal_dtos}; dtos:{java_records,no_entities,min_data,single_resp}; di:constructor_injection; kiss:simple_code; file_handling:os_indep_paths; practices:{least_access,avoid_transactions,code_reuse,static_member_ref,prefer_primitives}; sql:{param_annotation,uppercase,avoid_subqueries};java:avoid_star_imports

src/main/java/de/tum/cit/aet/artemis/buildagent/service/BuildJobContainerService.java (1)

Pattern src/main/java/**/*.java: naming:CamelCase; principles:{single_responsibility,small_methods,no_duplication}; db:{perf_queries,datetime_not_timestamp}; rest:{stateless,singleton,delegate_logic,http_only,minimal_dtos}; dtos:{java_records,no_entities,min_data,single_resp}; di:constructor_injection; kiss:simple_code; file_handling:os_indep_paths; practices:{least_access,avoid_transactions,code_reuse,static_member_ref,prefer_primitives}; sql:{param_annotation,uppercase,avoid_subqueries};java:avoid_star_imports

src/main/java/de/tum/cit/aet/artemis/buildagent/service/SharedQueueProcessingService.java (1)

Pattern src/main/java/**/*.java: naming:CamelCase; principles:{single_responsibility,small_methods,no_duplication}; db:{perf_queries,datetime_not_timestamp}; rest:{stateless,singleton,delegate_logic,http_only,minimal_dtos}; dtos:{java_records,no_entities,min_data,single_resp}; di:constructor_injection; kiss:simple_code; file_handling:os_indep_paths; practices:{least_access,avoid_transactions,code_reuse,static_member_ref,prefer_primitives}; sql:{param_annotation,uppercase,avoid_subqueries};java:avoid_star_imports

src/main/java/de/tum/cit/aet/artemis/buildagent/BuildAgentConfiguration.java (1)

Pattern src/main/java/**/*.java: naming:CamelCase; principles:{single_responsibility,small_methods,no_duplication}; db:{perf_queries,datetime_not_timestamp}; rest:{stateless,singleton,delegate_logic,http_only,minimal_dtos}; dtos:{java_records,no_entities,min_data,single_resp}; di:constructor_injection; kiss:simple_code; file_handling:os_indep_paths; practices:{least_access,avoid_transactions,code_reuse,static_member_ref,prefer_primitives}; sql:{param_annotation,uppercase,avoid_subqueries};java:avoid_star_imports

coderabbitai[bot]
coderabbitai bot previously approved these changes Dec 4, 2024
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (3)
src/main/java/de/tum/cit/aet/artemis/buildagent/service/BuildJobManagementService.java (1)

Line range hint 1-290: Consider adding resilience patterns

Given that this service manages distributed build jobs and Docker containers, consider implementing additional resilience patterns:

  1. Circuit breaker pattern for Docker client operations
  2. Retry mechanism for failed executor submissions
  3. Health check mechanism for the build executor service

This would help handle temporary failures and improve system stability during connection issues.

src/main/java/de/tum/cit/aet/artemis/buildagent/service/BuildJobContainerService.java (2)

208-245: Consider extracting timeout values as configurable constants

The implementation handles container stopping and killing gracefully with proper resource management. However, the timeout values (15s for stop, 10s for kill) could be made configurable.

+ private static final int CONTAINER_STOP_TIMEOUT_SECONDS = 15;
+ private static final int CONTAINER_KILL_TIMEOUT_SECONDS = 10;
+ private static final int CONTAINER_STOP_COMMAND_TIMEOUT_SECONDS = 20;

  public void stopUnresponsiveContainer(String containerId) {
      try (ExecutorService executor = Executors.newSingleThreadExecutor()) {
          try {
              Future<Void> future = executor.submit(() -> {
-                 buildAgentConfiguration.getDockerClient().stopContainerCmd(containerId).withTimeout(15).exec();
+                 buildAgentConfiguration.getDockerClient().stopContainerCmd(containerId)
+                     .withTimeout(CONTAINER_STOP_TIMEOUT_SECONDS).exec();
                  return null;
              });
-             future.get(20, TimeUnit.SECONDS);
+             future.get(CONTAINER_STOP_COMMAND_TIMEOUT_SECONDS, TimeUnit.SECONDS);
          }
          // ... existing catch blocks ...
          try {
              Future<Void> killFuture = executor.submit(() -> {
                  buildAgentConfiguration.getDockerClient().killContainerCmd(containerId).exec();
                  return null;
              });
-             killFuture.get(10, TimeUnit.SECONDS);
+             killFuture.get(CONTAINER_KILL_TIMEOUT_SECONDS, TimeUnit.SECONDS);
          }

Line range hint 445-447: Consider enhancing path validation pattern

The current regex pattern might be too restrictive. Consider allowing additional valid path characters while maintaining security.

-    if (path == null || path.contains("..") || !path.matches("[a-zA-Z0-9_*./-]+")) {
+    if (path == null || path.contains("..") || !path.matches("^[a-zA-Z0-9_\\-.*/@\\s]+$")) {
         throw new LocalCIException("Invalid path: " + path);
     }

This pattern adds support for @ and spaces while maintaining protection against path traversal.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 74d8477 and 7fa6a0e.

📒 Files selected for processing (2)
  • src/main/java/de/tum/cit/aet/artemis/buildagent/service/BuildJobContainerService.java (10 hunks)
  • src/main/java/de/tum/cit/aet/artemis/buildagent/service/BuildJobManagementService.java (6 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
src/main/java/de/tum/cit/aet/artemis/buildagent/service/BuildJobContainerService.java (1)

Pattern src/main/java/**/*.java: naming:CamelCase; principles:{single_responsibility,small_methods,no_duplication}; db:{perf_queries,datetime_not_timestamp}; rest:{stateless,singleton,delegate_logic,http_only,minimal_dtos}; dtos:{java_records,no_entities,min_data,single_resp}; di:constructor_injection; kiss:simple_code; file_handling:os_indep_paths; practices:{least_access,avoid_transactions,code_reuse,static_member_ref,prefer_primitives}; sql:{param_annotation,uppercase,avoid_subqueries};java:avoid_star_imports

src/main/java/de/tum/cit/aet/artemis/buildagent/service/BuildJobManagementService.java (1)

Pattern src/main/java/**/*.java: naming:CamelCase; principles:{single_responsibility,small_methods,no_duplication}; db:{perf_queries,datetime_not_timestamp}; rest:{stateless,singleton,delegate_logic,http_only,minimal_dtos}; dtos:{java_records,no_entities,min_data,single_resp}; di:constructor_injection; kiss:simple_code; file_handling:os_indep_paths; practices:{least_access,avoid_transactions,code_reuse,static_member_ref,prefer_primitives}; sql:{param_annotation,uppercase,avoid_subqueries};java:avoid_star_imports

🔇 Additional comments (7)
src/main/java/de/tum/cit/aet/artemis/buildagent/service/BuildJobManagementService.java (4)

33-33: LGTM: Clean dependency injection changes

The replacement of direct ExecutorService with BuildAgentConfiguration aligns with the PR objective of managing connection lifecycles centrally.

Also applies to: 51-51


86-88: LGTM: Constructor follows dependency injection best practices

The constructor properly initializes the BuildAgentConfiguration field through constructor injection, maintaining clean dependency management.


Line range hint 170-180: LGTM: Improved exception handling and logging

The enhanced exception handling for cancelled jobs and the addition of the exception parameter to log.error improves error tracking and debugging capabilities.

Also applies to: 237-237


147-148: ⚠️ Potential issue

Check for uninitialized executor service

The direct use of buildAgentConfiguration.getBuildExecutor() without a null check could lead to a NullPointerException.

Apply this diff to add a null check:

+if (buildAgentConfiguration.getBuildExecutor() == null) {
+    throw new LocalCIException("Build executor service is not initialized.");
+}
 future = buildAgentConfiguration.getBuildExecutor().submit(buildJob);
src/main/java/de/tum/cit/aet/artemis/buildagent/service/BuildJobContainerService.java (3)

62-62: LGTM: Clean dependency injection of BuildAgentConfiguration

The refactoring to use BuildAgentConfiguration instead of direct DockerClient injection improves modularity and aligns with the PR's objective of managing Docker client lifecycle.

Also applies to: 80-83


400-403: LGTM: Proper Docker command execution handling

The Docker command execution methods have been properly updated to use BuildAgentConfiguration while maintaining robust error handling and resource cleanup.

Also applies to: 406-439


452-454: LGTM: Efficient container lookup implementation

The container listing and filtering implementation is clean and efficient, properly handling null cases and container name matching.

coderabbitai[bot]
coderabbitai bot previously approved these changes Dec 4, 2024
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (2)
src/main/java/de/tum/cit/aet/artemis/buildagent/service/BuildJobExecutionService.java (2)

Line range hint 320-365: Consider improving error handling and cleanup logic

The current implementation has several areas that could be enhanced:

  1. Extract the cleanup logic into a separate method to improve readability and maintainability
  2. Consider using a more structured approach for handling cleanup failures
  3. Implement proper exception chaining to preserve the root cause

Here's a suggested refactor:

 finally {
-    buildJobContainerService.stopContainer(containerName);
-
-    // Delete the cloned repositories
-    deleteCloneRepo(assignmentRepositoryUri, assignmentRepoCommitHash, buildJob.id(), assignmentRepositoryPath);
-    deleteCloneRepo(testRepositoryUri, assignmentRepoCommitHash, buildJob.id(), testsRepositoryPath);
-    // do not try to delete the temp repository if it does not exist or is the same as the assignment reposity
-    if (solutionRepositoryUri != null && !Objects.equals(assignmentRepositoryUri.repositorySlug(), solutionRepositoryUri.repositorySlug())) {
-        deleteCloneRepo(solutionRepositoryUri, assignmentRepoCommitHash, buildJob.id(), solutionRepositoryPath);
-    }
-
-    for (int i = 0; i < auxiliaryRepositoriesUris.length; i++) {
-        deleteCloneRepo(auxiliaryRepositoriesUris[i], assignmentRepoCommitHash, buildJob.id(), auxiliaryRepositoriesPaths[i]);
-    }
-
-    try {
-        deleteRepoParentFolder(assignmentRepoCommitHash, assignmentRepositoryPath, testRepoCommitHash, testsRepositoryPath);
-    }
-    catch (IOException e) {
-        msg = "Could not delete " + CHECKED_OUT_REPOS_TEMP_DIR + " directory";
-        buildLogsMap.appendBuildLogEntry(buildJob.id(), msg);
-        log.error(msg, e);
-    }
+    cleanupResources(containerName, buildJob.id(), 
+        new CleanupContext(
+            assignmentRepositoryUri, testRepositoryUri, solutionRepositoryUri,
+            auxiliaryRepositoriesUris, assignmentRepositoryPath, testsRepositoryPath,
+            solutionRepositoryPath, auxiliaryRepositoriesPaths,
+            assignmentRepoCommitHash, testRepoCommitHash
+        )
+    );
 }

+ private record CleanupContext(
+     VcsRepositoryUri assignmentRepositoryUri, VcsRepositoryUri testRepositoryUri,
+     VcsRepositoryUri solutionRepositoryUri, VcsRepositoryUri[] auxiliaryRepositoriesUris,
+     Path assignmentRepositoryPath, Path testsRepositoryPath,
+     Path solutionRepositoryPath, Path[] auxiliaryRepositoriesPaths,
+     String assignmentRepoCommitHash, String testRepoCommitHash
+ ) {}
+
+ private void cleanupResources(String containerName, String buildJobId, CleanupContext ctx) {
+     List<Exception> suppressedExceptions = new ArrayList<>();
+
+     try {
+         buildJobContainerService.stopContainer(containerName);
+     } catch (Exception e) {
+         suppressedExceptions.add(e);
+         log.error("Failed to stop container {}", containerName, e);
+     }
+
+     cleanupRepositories(buildJobId, ctx, suppressedExceptions);
+
+     if (!suppressedExceptions.isEmpty()) {
+         LocalCIException ex = new LocalCIException("Cleanup failed with multiple errors");
+         suppressedExceptions.forEach(ex::addSuppressed);
+         log.warn("Cleanup completed with {} errors", suppressedExceptions.size(), ex);
+     }
+ }

Line range hint 1-24: Consider architectural improvements for better maintainability

The class has several areas that could benefit from architectural improvements:

  1. The class handles multiple concerns (build execution, Docker management, Git operations)
  2. The runScriptAndParseResults method has 12 parameters, making it hard to maintain
  3. The logging pattern is duplicated across methods

Consider:

  1. Extracting Docker-specific operations into a dedicated BuildDockerService
  2. Creating a BuildContext record to encapsulate build parameters
  3. Implementing a proper logging facade to reduce duplication

Example of a BuildContext record:

public record BuildContext(
    BuildJobQueueItem buildJob,
    String containerName,
    String containerId,
    RepositoryContext repositories,
    String assignmentRepoCommitHash,
    String testRepoCommitHash
) {
    public record RepositoryContext(
        VcsRepositoryUri assignmentUri,
        VcsRepositoryUri testUri,
        VcsRepositoryUri solutionUri,
        VcsRepositoryUri[] auxiliaryUris,
        Path assignmentPath,
        Path testsPath,
        Path solutionPath,
        Path[] auxiliaryPaths
    ) {}
}

Also applies to: 303-314

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 7fa6a0e and a4d612d.

📒 Files selected for processing (1)
  • src/main/java/de/tum/cit/aet/artemis/buildagent/service/BuildJobExecutionService.java (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
src/main/java/de/tum/cit/aet/artemis/buildagent/service/BuildJobExecutionService.java (1)

Pattern src/main/java/**/*.java: naming:CamelCase; principles:{single_responsibility,small_methods,no_duplication}; db:{perf_queries,datetime_not_timestamp}; rest:{stateless,singleton,delegate_logic,http_only,minimal_dtos}; dtos:{java_records,no_entities,min_data,single_resp}; di:constructor_injection; kiss:simple_code; file_handling:os_indep_paths; practices:{least_access,avoid_transactions,code_reuse,static_member_ref,prefer_primitives}; sql:{param_annotation,uppercase,avoid_subqueries};java:avoid_star_imports

🔇 Additional comments (1)
src/main/java/de/tum/cit/aet/artemis/buildagent/service/BuildJobExecutionService.java (1)

303-306: LGTM: Well-structured logging additions

The new logging statements enhance observability by tracking critical steps in the build process, following the established logging pattern.

Also applies to: 311-314

coderabbitai[bot]
coderabbitai bot previously approved these changes Dec 4, 2024
@krusche krusche added this to the 7.8.0 milestone Dec 13, 2024
@BBesrour BBesrour requested a review from a team as a code owner December 16, 2024 18:48
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (5)
src/main/java/de/tum/cit/aet/artemis/buildagent/service/SharedQueueProcessingService.java (3)

343-345: Consider using more specific exception types

While the error handling is improved, catching a generic Exception might mask specific issues. Consider catching more specific exceptions that could occur during the update process.

-        catch (Exception e) {
+        catch (IllegalStateException | HazelcastInstanceNotActiveException e) {
             log.error("Error while updating build agent information for agent with address {}", memberAddress, e);
         }

559-563: Consider adding error handling for container cleanup

While the service reinitialization looks good, the container cleanup operation might fail silently. Consider adding error handling to log any issues during cleanup.

             buildAgentConfiguration.openBuildAgentServices();

             // Cleanup docker containers
-            buildAgentDockerService.cleanUpContainers();
+            try {
+                buildAgentDockerService.cleanUpContainers();
+                log.info("Successfully cleaned up docker containers");
+            } catch (Exception e) {
+                log.error("Failed to clean up docker containers", e);
+            }

582-586: Consider caching the executor reference

While the availability check is comprehensive, calling getBuildExecutor() multiple times could be optimized by storing the reference locally.

-        var buildExecutorService = buildAgentConfiguration.getBuildExecutor();
+        final var buildExecutorService = buildAgentConfiguration.getBuildExecutor();
         log.debug("Currently processing jobs on this node: {}, active threads in Pool: {}, maximum pool size of thread executor : {}", localProcessingJobs.get(),
                 buildExecutorService.getActiveCount(), buildExecutorService.getMaximumPoolSize());
-        return localProcessingJobs.get() < buildExecutorService.getMaximumPoolSize() && buildExecutorService.getActiveCount() < buildExecutorService.getMaximumPoolSize()
-                && buildExecutorService.getQueue().isEmpty();
+        final int maxPoolSize = buildExecutorService.getMaximumPoolSize();
+        return localProcessingJobs.get() < maxPoolSize 
+                && buildExecutorService.getActiveCount() < maxPoolSize
+                && buildExecutorService.getQueue().isEmpty();
src/test/java/de/tum/cit/aet/artemis/shared/base/AbstractSpringIntegrationLocalCILocalVCTest.java (2)

135-137: Consider making the dockerClientMock field final.

The dockerClientMock field is only assigned once in @BeforeAll and never modified afterward. Making it final would better express its immutable nature and thread-safety intentions.

-    private static DockerClient dockerClientMock;
+    private static final DockerClient dockerClientMock;

Also applies to: 226-227


287-293: Consider extracting the callback logic into a separate method.

The lambda expression in the doAnswer block could be more readable if extracted into a named method that clearly expresses its purpose.

+private static void simulateCommandCompletion(ResultCallback.Adapter<?> callback) {
+    callback.onComplete();
+}

-doAnswer(invocation -> {
-    ResultCallback.Adapter<?> callback = invocation.getArgument(0);
-    callback.onComplete();
-    return null;
-}).when(execStartCmd).exec(any());
+doAnswer(invocation -> {
+    simulateCommandCompletion(invocation.getArgument(0));
+    return null;
+}).when(execStartCmd).exec(any());
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 98342c5 and 4a44991.

📒 Files selected for processing (2)
  • src/main/java/de/tum/cit/aet/artemis/buildagent/service/SharedQueueProcessingService.java (9 hunks)
  • src/test/java/de/tum/cit/aet/artemis/shared/base/AbstractSpringIntegrationLocalCILocalVCTest.java (5 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
src/test/java/de/tum/cit/aet/artemis/shared/base/AbstractSpringIntegrationLocalCILocalVCTest.java (1)

Pattern src/test/java/**/*.java: test_naming: descriptive; test_size: small_specific; fixed_data: true; junit5_features: true; assert_use: assertThat; assert_specificity: true; archunit_use: enforce_package_rules; db_query_count_tests: track_performance; util_service_factory_pattern: true; avoid_db_access: true; mock_strategy: static_mocks; context_restart_minimize: true

src/main/java/de/tum/cit/aet/artemis/buildagent/service/SharedQueueProcessingService.java (1)

Pattern src/main/java/**/*.java: naming:CamelCase; principles:{single_responsibility,small_methods,no_duplication}; db:{perf_queries,datetime_not_timestamp}; rest:{stateless,singleton,delegate_logic,http_only,minimal_dtos}; dtos:{java_records,no_entities,min_data,single_resp}; di:constructor_injection; kiss:simple_code; file_handling:os_indep_paths; practices:{least_access,avoid_transactions,code_reuse,static_member_ref,prefer_primitives}; sql:{param_annotation,uppercase,avoid_subqueries};java:avoid_star_imports

🔇 Additional comments (7)
src/main/java/de/tum/cit/aet/artemis/buildagent/service/SharedQueueProcessingService.java (4)

131-140: LGTM: Constructor changes align with dependency injection best practices

The addition of BuildAgentConfiguration and BuildAgentDockerService as constructor dependencies follows proper dependency injection patterns and supports the reinitialization improvements.


272-275: LGTM: Improved error handling with comprehensive executor metrics

The error handling now includes detailed metrics about the executor's state, which will help with debugging rejected executions. The retry mechanism with a maximum of 5 attempts provides a good balance between reliability and avoiding infinite retries.


355-356: LGTM: Robust handling of executor configuration

The code properly handles both cases where the executor is available and when it's not, falling back to the thread pool size configuration. This makes the system more resilient to different states of the build agent.


524-525: LGTM: Proper cleanup of resources during pause

The code properly closes the build executor and docker client when pausing the build agent, which aligns with the PR's objective of improving reinitialization.

src/test/java/de/tum/cit/aet/artemis/shared/base/AbstractSpringIntegrationLocalCILocalVCTest.java (3)

12-19: LGTM: Import statements are well-organized and specific.

The new Mockito-related imports are correctly added and properly organized.


342-346: LGTM: Build agent services mock setup is concise and clear.

The mockBuildAgentServices method correctly sets up the Docker client mock for each test.


228-340: Add unit tests for the Docker client mock setup.

The extensive Docker client mocking setup should be verified with unit tests to ensure it behaves correctly across all scenarios.

Copy link
Contributor

@az108 az108 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested on TS3, always queued 10 jobs and paused artemis node 1 and 2. Jobs were successfully finished and no new jobs were queued to those nodes. After that i queued in 10 build jobs again and resumed one of the artemis nodes. Then 2 agents were working. Then I tried it again by using the resume all button. Everything works as described 👍

Copy link
Contributor

@coolchock coolchock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested on TS3:

  1. Submitted a programming exercise 5 times with build agents active; all submissions were built successfully.
  2. Paused the build agents and submitted 4 more times; these submissions went into the queue as expected.
  3. Resumed the agents and observed that all queued submissions were built without issues.

works as expected 👍

Copy link
Contributor

@muradium muradium left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code LGTM

Copy link
Contributor

@SimonEntholzer SimonEntholzer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code reapprove

@krusche krusche merged commit 7ae2791 into develop Dec 22, 2024
52 of 55 checks passed
@krusche krusche deleted the bugfix/integrated-code-lifecycle/re-init branch December 22, 2024 11:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
buildagent Pull requests that affect the corresponding module programming Pull requests that affect the corresponding module ready for review server Pull requests that update Java code. (Added Automatically!) tests
Projects
Status: Merged
Status: Done
Development

Successfully merging this pull request may close these issues.

6 participants