Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Python 3.11 Update] Primitive progressive rollout implementation #4039

Merged
merged 30 commits into from
Jul 24, 2024

Conversation

vitorguidi
Copy link
Collaborator

@vitorguidi vitorguidi commented Jun 19, 2024

Motivation

There is no available mechanism by which two clusterfuzz versions can coexist in production. This PR assumes that, during the GCE VM creation, an user data script will:

  • Startup scripts will be able to query the GCP api and figure out candidate/prod weights in the fleet composition, and choose docker image (linux) or python interpreter (windows) during initialization.
  • The clusterfuzz docker container will be launched, during VM startup, with the CLUSTERFUZZ_STAGE environment variable, which can assume the values "prod" and "candidate".

If these assumptions hold, we can assume that clusterfuzz will start with the correct package ({platform}-3.zip or {platfor}-3-candidate.zip), and pull the corresponding file in all future update task runs.

This also requires change to butler deploy/package/remote, in order to take into account the two possible releases.
This PR belongs to this effort.

@vitorguidi vitorguidi changed the title Primitive progressive rollout implementation [WIP] Primitive progressive rollout implementation Jun 19, 2024
@vitorguidi vitorguidi force-pushed the feature/proto-progressive-rollout branch from 5bdbcf7 to 29587f7 Compare June 20, 2024 14:15
docker/base/Dockerfile Outdated Show resolved Hide resolved
docker/base/start.sh Outdated Show resolved Hide resolved
@@ -92,7 +92,8 @@ def _deploy_app_prod(project,
yaml_paths,
package_zip_paths,
deploy_appengine=True,
test_deployment=False):
test_deployment=False,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we delete the test_deployment feature?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this feature? I have never used it

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh it is the test-targets thing. We could delete it if this proves successful, we leave the test cluster alive and perform our releases on it with this approach (might be a good idea to create a windows one for that purpose too)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that I think of it, I would rather keep this feature. At some point in the future I would like to pin releases (git commit) in our deployments, so we can have granular control over what code is executed in each cluster.

For now, this is still useful.

src/local/butler/package.py Outdated Show resolved Hide resolved
butler.py Outdated Show resolved Hide resolved
docker/base/Dockerfile Outdated Show resolved Hide resolved
@jonathanmetzman
Copy link
Collaborator

Do we also need to change the windows startup script? And Android and Mac?

@vitorguidi vitorguidi force-pushed the feature/proto-progressive-rollout branch from 53593fb to 8d7fc81 Compare July 8, 2024 18:40
@vitorguidi
Copy link
Collaborator Author

Do we also need to change the windows startup script? And Android and Mac?

I would rather not touch Android and Mac. It is too much friction to implement prog rollout in an environment we do not control.

As far as windows goes, since we do have our own fleet, a change to the windows startup is on the way. Will be in the PR soon

@vitorguidi vitorguidi force-pushed the feature/proto-progressive-rollout branch from 9e70992 to 3ebc508 Compare July 10, 2024 19:53
@vitorguidi vitorguidi changed the title [WIP] Primitive progressive rollout implementation Primitive progressive rollout implementation Jul 16, 2024
@vitorguidi vitorguidi requested a review from oliverchang July 16, 2024 05:41
docker/build.sh Outdated Show resolved Hide resolved
docker/base/setup_clusterfuzz.sh Outdated Show resolved Hide resolved
src/clusterfuzz/_internal/bot/tasks/update_task.py Outdated Show resolved Hide resolved
TESTS_LAST_UPDATE_KEY = 'tests_last_update'
TESTS_UPDATE_INTERVAL_DAYS = 1

MANIFEST_FILENAME = 'clusterfuzz-source.manifest.3'
REMOTE_MANIFEST_FILENAME = 'clusterfuzz-source.manifest'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels like there's a lot of repetition everywhere with this logic to compute the suffixes. Can we instead just create helpers for this that can be used everywhere?

e.g. something along the lines of

def get_source_zip_suffix():
  ...

def get_source_zip_name():
  ...

We can remove some of these constants if necessary and just make them dynamically computed based on reading from env.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved this to clusterfuzz._internal.base.utils so butler package/deploy and update_task can share this code

src/local/butler/deploy.py Outdated Show resolved Hide resolved
src/local/butler/package.py Outdated Show resolved Hide resolved
src/clusterfuzz/_internal/metrics/logs.py Show resolved Hide resolved
Copy link
Collaborator

@jonathanmetzman jonathanmetzman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm with nits

docker/build.sh Outdated Show resolved Hide resolved
@@ -13,6 +13,13 @@
# See the License for the specific language governing permissions and
# limitations under the License.

DEPLOYMENT_ZIP="linux-3.zip"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we might need to rebuild the google images for this to be used.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The plan is to use progressive rollout to validate on internal, we should not need to perform this operation for now.

docker/oss-fuzz/host/Dockerfile Outdated Show resolved Hide resolved
src/clusterfuzz/_internal/base/utils.py Outdated Show resolved Hide resolved
@vitorguidi vitorguidi requested a review from oliverchang July 24, 2024 00:53
@vitorguidi vitorguidi merged commit 412b2e1 into master Jul 24, 2024
6 of 7 checks passed
@vitorguidi vitorguidi deleted the feature/proto-progressive-rollout branch July 24, 2024 12:42
@vitorguidi vitorguidi changed the title Primitive progressive rollout implementation [Python 3.11 Update] Primitive progressive rollout implementation Feb 14, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants