@hunsche
Collaborator

@hunsche hunsche commented Nov 13, 2025

Description:

Summary:

This PR addresses a flaw in the client-side OS version filtering for Pub/Sub tasks. Legacy bots, which do not have the BASE_OS_VERSION environment variable set, were not correctly skipping tasks intended for specific, newer OS versions. This could lead to bots attempting to execute incompatible tasks, causing errors and wasting resources, particularly during OS version migrations.

This change makes the task handling more resilient by ensuring that bots only process tasks compatible with their environment.

Changes:

  • Modified _filter_task_for_os_mismatch: The core filtering logic in src/clusterfuzz/_internal/base/tasks/__init__.py has been updated. It now correctly skips a task if the incoming message has a base_os_version attribute and the bot's own OS version is either different or not set (None).
  • Updated Unit Tests: The corresponding unit tests in src/clusterfuzz/_internal/tests/core/base/tasks/tasks_test.py have been updated to validate the corrected logic and ensure there are no regressions.
  • Improved Documentation: The docstring for the filtering function has been refined to be more explicit and conform to the Google Python Style Guide. Redundant inline comments were removed to improve code clarity.
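The condition described in the first bullet can be sketched as a small predicate. This is a minimal, hypothetical sketch, not the actual `_filter_task_for_os_mismatch` implementation. The names `get_bot_os_version` and `should_skip_task`, the dict-of-attributes message shape, and the `ubuntu-24-04` version string are assumptions. Only the `BASE_OS_VERSION` environment variable and the `base_os_version` message attribute come from this PR.

```python
import os
from typing import Mapping, Optional


def get_bot_os_version() -> Optional[str]:
    """Returns the bot's BASE_OS_VERSION, or None on legacy bots where it is unset."""
    return os.environ.get('BASE_OS_VERSION')


def should_skip_task(message_attributes: Mapping[str, str],
                     bot_os_version: Optional[str]) -> bool:
    """Returns True if the bot should skip the task (hypothetical helper).

    A task is skipped when the message targets a specific OS version and the
    bot's own version is either different or not set (None).
    """
    message_os_version = message_attributes.get('base_os_version')
    if not message_os_version:
        # Untagged tasks are not filtered by OS version.
        return False
    # One inequality covers both a mismatched version and a legacy bot,
    # because None never equals a tagged version string.
    return bot_os_version != message_os_version


# Example: a legacy bot (no BASE_OS_VERSION) receiving a task tagged for a
# specific OS version ('ubuntu-24-04' is a made-up value).
print(should_skip_task({'base_os_version': 'ubuntu-24-04'}, None))
```

The key design point is that the legacy case needs no special branch: treating "unset" as just another value that fails the equality check is what the previous logic, which required both versions to be present, was missing.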

Testing:

All changes are covered by unit tests. The relevant test suite passes successfully.

This commit refines the client-side OS version filtering logic for Pub/Sub tasks
to correctly handle scenarios where a legacy bot (without a defined
BASE_OS_VERSION) receives a task intended for a specific OS version.

Previously, the filtering logic only triggered a skip if both the message and
the bot had a defined OS version, and these versions differed. This meant that
legacy bots, which do not have BASE_OS_VERSION set, would incorrectly attempt
to process tasks explicitly tagged for newer OS versions, leading to errors and
inefficiencies.

The updated logic now ensures that a task is skipped if:
1. The Pub/Sub message specifies a base OS version.
2. The bot's OS version is either different from the message's OS version
   OR the bot's OS version is not defined (e.g., None for legacy bots).
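To make the difference between the previous and updated conditions concrete, the sketch below evaluates both predicates over the possible combinations of message and bot OS versions. The function names and version strings are hypothetical and do not reflect the actual ClusterFuzz code; only the two skip conditions are taken from the description above.

```python
from typing import Optional


def old_should_skip(message_os: Optional[str], bot_os: Optional[str]) -> bool:
    # Previous behavior: skip only when BOTH versions are set and they differ.
    return bool(message_os) and bool(bot_os) and message_os != bot_os


def new_should_skip(message_os: Optional[str], bot_os: Optional[str]) -> bool:
    # Updated behavior: skip whenever the message is tagged and the bot's
    # version is different or unset.
    return bool(message_os) and bot_os != message_os


SCENARIOS = [
    # (message OS version, bot OS version); the values are made up.
    (None, None),
    (None, 'ubuntu-24-04'),
    ('ubuntu-24-04', 'ubuntu-24-04'),
    ('ubuntu-24-04', 'ubuntu-20-04'),
    ('ubuntu-24-04', None),  # Legacy bot: the case the old logic let through.
]

for message_os, bot_os in SCENARIOS:
    print(f'message={message_os!s:14} bot={bot_os!s:14} '
          f'old_skip={old_should_skip(message_os, bot_os)!s:5} '
          f'new_skip={new_should_skip(message_os, bot_os)}')
```

Only the last scenario changes: a tagged task reaching a bot without an OS version was processed before and is skipped now, while every other combination behaves as it did previously.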

This change improves the resilience of ClusterFuzz bots by preventing them from
processing incompatible tasks, especially during OS migrations or when mixing
legacy and modern bot deployments.

Unit tests have been updated to reflect this new behavior, ensuring all
scenarios are covered and pass.
@ViniciustCosta
Collaborator

ViniciustCosta commented Nov 13, 2025

I wonder if we could have better tested this in our dev/staging environment for #5023 to avoid having these constant hotfixes.

@hunsche hunsche merged commit b42d3da into master Nov 13, 2025
10 checks passed
@hunsche hunsche deleted the fix/os-task-filtering-logic branch November 13, 2025 18:22
@hunsche
Collaborator Author

hunsche commented Nov 13, 2025

You're right. In this case, it is genuinely difficult to test, since we don't have all behaviors mapped in these environments. The issue came from an edge case we had not mapped.

@ViniciustCosta
Collaborator

What kind of legacy bots don't have the variable set and prevented us from catching this case in dev/staging?

@hunsche
Collaborator Author

hunsche commented Nov 13, 2025

All bots not running on Ubuntu 24.04 are missing this environment variable. As I mentioned, we should have tested more thoroughly; our tests did not catch this case.
