
@hunsche (Collaborator) commented Nov 11, 2025

Description

This PR introduces a client-side filtering mechanism to make ClusterFuzz bots more resilient to misrouted Pub/Sub messages. It ensures that bots only process tasks intended for their specific operating system version, preventing errors and wasted resources when tasks are sent to legacy, unfiltered subscriptions.

The Problem

When we introduce a new OS version, such as Ubuntu 24.04, we create new Pub/Sub subscriptions filtered on that version (e.g., my-queue-ubuntu-24-04). However, bots subscribed to the legacy, unfiltered subscriptions (e.g., my-queue) still receive every message published to the topic, including messages intended for the new OS. Because Pub/Sub subscription filters are immutable, we cannot simply add a filter to the old subscriptions. As a result, bots can attempt to execute incompatible tasks, causing errors and wasted resources.
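To see why the legacy subscription ends up holding the new OS's messages, here is a toy, in-memory model of topic fan-out. It is not the Pub/Sub client library: `Message`, `Subscription`, and `publish` are hypothetical, the attribute value `"ubuntu-24-04"` is an assumed format, and the lambda stands in for a real subscription filter, which in Pub/Sub's filter syntax would read roughly `attributes.base_os_version = "ubuntu-24-04"`.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Message:
    data: str
    attributes: Dict[str, str] = field(default_factory=dict)


@dataclass
class Subscription:
    name: str
    # None models an unfiltered (legacy) subscription; otherwise a predicate on
    # message attributes, standing in for a Pub/Sub filter expression.
    filter_fn: Optional[Callable[[Dict[str, str]], bool]] = None
    backlog: List[Message] = field(default_factory=list)

    def accepts(self, message: Message) -> bool:
        return self.filter_fn is None or self.filter_fn(message.attributes)


def publish(subscriptions: List[Subscription], message: Message) -> None:
    """Fans the message out to every subscription whose filter matches."""
    for subscription in subscriptions:
        if subscription.accepts(message):
            subscription.backlog.append(message)


legacy = Subscription("my-queue")  # no filter: receives every message
ubuntu_24_04 = Subscription(
    "my-queue-ubuntu-24-04",
    # Hypothetical equivalent of: attributes.base_os_version = "ubuntu-24-04"
    filter_fn=lambda attrs: attrs.get("base_os_version") == "ubuntu-24-04",
)
subscriptions = [legacy, ubuntu_24_04]

publish(subscriptions, Message("fuzz task", {"base_os_version": "ubuntu-24-04"}))
print([s.name for s in subscriptions if s.backlog])
# ['my-queue', 'my-queue-ubuntu-24-04']
```

Because the legacy subscription has no filter and one cannot be added to it later, the bot pulling from it is the only remaining place to catch the mismatch.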

The Solution

This change adds a centralized, client-side check to the task-pulling logic.

  1. Centralized Filtering Function: A new private function, _filter_task_for_os_mismatch, has been created in src/clusterfuzz/_internal/base/tasks/__init__.py. This function encapsulates all the logic for OS version validation.

  2. Behavior:

    • When a bot pulls a message, this function compares the base_os_version attribute on the message with the bot's BASE_OS_VERSION environment variable.
    • If a mismatch is detected, the function logs a warning and immediately acknowledges (ack()) the message.
    • Acknowledging the message permanently removes it from that subscription, effectively skipping it for the current bot. This assumes the message was also correctly delivered to another, properly filtered subscription for processing.
    • If the OS versions match, or if either the bot or the message does not have an OS version specified, the task is processed as usual.
  3. Integration: This check is performed within get_task_from_message, ensuring it is applied to all types of tasks pulled from Pub/Sub (regular, preprocess, postprocess, etc.) without code duplication.
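A minimal sketch of the check described above, assuming a pulled message exposes an `attributes` mapping and an `ack()` method. `FakeMessage`, `should_skip_for_os_mismatch`, and `get_task_from_message_sketch` are hypothetical names, not the PR's `_filter_task_for_os_mismatch` or `get_task_from_message`; the explicit `bot_os_version` parameter, the boolean return, and the extra fields in the log line are illustrative assumptions.

```python
import logging
import os
from dataclasses import dataclass, field
from typing import Dict, Optional

logger = logging.getLogger(__name__)


@dataclass
class FakeMessage:
    """Hypothetical stand-in for a pulled Pub/Sub message."""
    data: str
    attributes: Dict[str, str] = field(default_factory=dict)
    acked: bool = False

    def ack(self) -> None:
        self.acked = True


def should_skip_for_os_mismatch(message: FakeMessage,
                                bot_os_version: Optional[str]) -> bool:
    """Returns True (after acking) if the message targets a different OS."""
    message_os_version = message.attributes.get("base_os_version")
    # A missing OS version on either side means "no constraint": process it.
    if not bot_os_version or not message_os_version:
        return False
    if message_os_version == bot_os_version:
        return False
    logger.warning("Skipping task for different OS. bot=%s, message=%s",
                   bot_os_version, message_os_version)
    # Acking removes the message from this subscription only. This relies on
    # a correctly filtered subscription having received its own copy.
    message.ack()
    return True


def get_task_from_message_sketch(message: FakeMessage) -> Optional[str]:
    """Simplified pull path: returns the task payload, or None if skipped."""
    if should_skip_for_os_mismatch(message, os.environ.get("BASE_OS_VERSION")):
        return None
    return message.data


# Usage: a bot on one OS receives a message meant for another.
os.environ["BASE_OS_VERSION"] = "ubuntu-20-04"
misrouted = FakeMessage("task", {"base_os_version": "ubuntu-24-04"})
print(get_task_from_message_sketch(misrouted), misrouted.acked)  # None True
```

Keeping the decision in one helper that the single pull path calls is what lets every task type (regular, preprocess, postprocess) share the same behavior without duplicating the check.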

Benefits

  • Resilience: Bots are now resilient to misrouted messages and will not fail on incompatible tasks.
  • Cleanliness: The logic is centralized in a single, well-documented function, improving code maintainability.
  • Forward-Compatibility: This provides a safety net for future OS migrations and ensures that legacy bots can coexist with newer ones without issue.

Testing

  • Added comprehensive unit tests in src/clusterfuzz/_internal/tests/core/base/tasks/tasks_test.py to validate the filtering logic.
  • Tests cover the following scenarios:
    • OS mismatch (message is skipped and acked).
    • OS match (message is processed).
    • Bot has an OS, but the message does not (message is processed).
    • Message has an OS, but the bot does not (message is processed).
  • All new and existing tests pass.
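The four scenarios listed above can be expressed as a `unittest` sketch. To run on its own it carries a compact copy of a hypothetical `should_skip_for_os_mismatch` helper and uses `unittest.mock.MagicMock` as the message; none of this is the actual code in tasks_test.py.

```python
import logging
import unittest
from typing import Optional
from unittest import mock

logger = logging.getLogger("os_filter_sketch")


def should_skip_for_os_mismatch(message, bot_os_version: Optional[str]) -> bool:
    """Compact copy of a hypothetical filter so this block runs on its own."""
    message_os_version = message.attributes.get("base_os_version")
    if not bot_os_version or not message_os_version:
        return False
    if bot_os_version == message_os_version:
        return False
    logger.warning("Skipping task for different OS.")
    message.ack()
    return True


def make_message(os_version: Optional[str] = None) -> mock.MagicMock:
    """Builds a mock message with (or without) a base_os_version attribute."""
    message = mock.MagicMock()
    message.attributes = {} if os_version is None else {"base_os_version": os_version}
    return message


class OsFilterTest(unittest.TestCase):

    def test_os_mismatch_skips_and_acks(self):
        message = make_message("ubuntu-24-04")
        with self.assertLogs("os_filter_sketch", level="WARNING") as logs:
            self.assertTrue(should_skip_for_os_mismatch(message, "ubuntu-20-04"))
        self.assertIn("Skipping task for different OS.", logs.output[0])
        message.ack.assert_called_once()

    def test_os_match_is_processed(self):
        message = make_message("ubuntu-24-04")
        self.assertFalse(should_skip_for_os_mismatch(message, "ubuntu-24-04"))
        message.ack.assert_not_called()

    def test_message_without_os_is_processed(self):
        message = make_message(None)
        self.assertFalse(should_skip_for_os_mismatch(message, "ubuntu-24-04"))
        message.ack.assert_not_called()

    def test_bot_without_os_is_processed(self):
        message = make_message("ubuntu-24-04")
        self.assertFalse(should_skip_for_os_mismatch(message, None))
        message.ack.assert_not_called()
```

It can be run with a standard runner such as `python -m unittest`. Asserting on `ack` as well as on the return value matters, because the ack is what actually removes the misrouted message from the legacy subscription.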

hunsche and others added 4 commits November 11, 2025 15:34
Extracted the logic for checking and filtering Pub/Sub messages with a
mismatched  into its own private function,
.

This change centralizes the logic, removes code duplication, and improves
readability by documenting the behavior in a detailed docstring.

The recent introduction of an OS version filter in  caused several tests to fail. This commit adjusts the tests to account for the new logic:

- Updates the  method in  to prevent the OS filter from being triggered in unrelated tests.
- Corrects the expected log message in  to match the new implementation.
- Removes a duplicated  method.
@javanlacerda (Contributor) left a comment

I left a comment about the ack.

Could you deploy and validate it in dev before merging? Ideally, could you also provide evidence that it doesn't cause any disruption?

@hunsche (Collaborator, Author) commented Nov 11, 2025


Deployment and Validation Summary

The changes have been successfully deployed to the dev environment, and initial validation indicates no disruptions. The new OS version filter logic has been verified through unit tests.

1. Deployment Status: SUCCESS

The Cloud Build for the deployment completed successfully.

2. System Health (No Disruption Evidence)

General GCE instance logs show normal run_bot activity, indicating that the deployment did not introduce any disruptions. Bots are actively unpacking builds and processing tasks.

  • Evidence: Recent gce_instance logs (e.g., jsonPayload.name: run_bot, message: Unpacked...) confirm ongoing operations. No critical errors or unexpected behavior were observed.

3. OS Version Filter Logic Validation

Validating _filter_task_for_os_mismatch live in the dev environment was difficult because of the Pub/Sub subscription filters themselves: Pub/Sub already routes messages by their base_os_version attribute, so incompatible messages never reach the filtered subscriptions, and the client-side check only triggers when a mismatched message lands on an unfiltered, legacy subscription.

However, the core logic of the OS filter is covered by its dedicated unit test, which has been fixed and now passes:

  • Test File: src/clusterfuzz/_internal/tests/core/base/tasks/tasks_test.py
  • Relevant Test: test_os_mismatch
  • Validation: This test now correctly asserts that when an OS mismatch is detected, the task is skipped, and the appropriate warning message (Skipping task for different OS.) is logged. This confirms the intended behavior of the filter within the application's code.

@hunsche merged commit b5863ff into master Nov 11, 2025
10 checks passed
@hunsche deleted the feat/os-version-check-on-task branch November 11, 2025 19:17