Proposed RFC Feature - Automated Review in Github Actions #101

Open
amzn-changml opened this issue Oct 18, 2024 · 0 comments

Summary:

This proposal recommends transitioning the O3DE project's Automated Review (AR) process from the current AWS Jenkins setup to GitHub Actions. Leveraging GitHub Actions for Continuous Integration (CI) offers benefits such as cost savings, improved accessibility for contributors, and streamlined workflows that align with modern open-source development practices.

What is the relevance of this feature?

The O3DE project currently relies on AWS-hosted Jenkins runners for its AR process, which introduces costs and barriers for contributors. By adopting GitHub Actions, which are free for open-source projects using standard runners, we can reduce operational expenses and lower the entry threshold for new contributors. GitHub Actions also integrates seamlessly with GitHub repositories, making it easy for contributors to replicate the build and test environment without needing to host hardware or VMs locally.

Feature design description:

The proposed shift to GitHub Actions involves configuring workflows that replicate the existing AR process performed by Jenkins. These workflows will handle tasks such as compilation, asset processing, and running tests across various platforms. Contributors will benefit from the ability to run CI workflows on their forks without incurring costs, facilitating better code quality and alignment with the project's standards before submitting pull requests.
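As a rough illustration of the entry point, a minimal workflow skeleton might look like the following. This is a sketch under assumptions, not the workflow from the linked prototype: the branch filters, job name, and placeholder build step are hypothetical, and only the file name comes from the implementation plan in this RFC.

```yaml
# .github/workflows/ar.yml
name: Automated Review

on:
  pull_request:
    branches:
      - development            # assumed target branch names
      - "stabilization/**"
  workflow_dispatch:           # allows a manual run, for example on a fork

jobs:
  build-linux:                 # hypothetical job name
    runs-on: ubuntu-latest     # standard GitHub-hosted runner
    steps:
      - uses: actions/checkout@v4
      - name: Build and test (placeholder)
        run: echo "configure, build, process assets, and run tests here"
```

Contributors would typically need to enable Actions on their fork before a workflow like this runs there.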

To address the limitations of GitHub's standard runners, such as fewer CPU cores and limited storage, we will use ccache to speed up builds, and we will use unity builds without debug symbols to reduce build sizes. 3rdParty libraries and cached data that exceed GitHub's cache limit will be stored as GitHub Actions Artifacts so that they are available to development and stabilization branch builds as well as to pull requests. Because these artifacts are stored per build, contributors can also download them to verify the output.

Technical design description:

Transitioning to GitHub Actions requires the creation of workflow files that define the steps necessary for the AR process. These YAML-based configurations will specify jobs for different stages like building, testing, and deployment across supported platforms. Because the standard runners have only 3-4 cores and 100GB of storage, optimizations are crucial. Enabling ccache across all platforms will reuse compiled objects, significantly reducing build times. Unity builds will be configured with the -DLY_STRIP_DEBUG_SYMBOLS=ON flag to minimize build sizes. In some tests, these strategies reduced build times to 30 minutes per stage and build output size by 50%. See this build for an example: https://github.com/amzn-changml/o3de/actions/runs/11444415406/job/31839246165
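To make the ccache and unity build setup concrete, here is a hedged sketch of a Linux job. Only -DLY_STRIP_DEBUG_SYMBOLS=ON comes from this RFC. The cache key, the .ccache directory, the Ninja Multi-Config generator, the -DLY_UNITY_BUILD=ON option, and the profile configuration are assumptions, and a real O3DE configure step needs further options (such as the 3rdParty path) that are omitted here.

```yaml
name: AR ccache sketch

on: pull_request

jobs:
  build-linux:
    runs-on: ubuntu-latest
    env:
      # Assumed location; keeps the ccache directory inside the workspace
      CCACHE_DIR: ${{ github.workspace }}/.ccache
    steps:
      - uses: actions/checkout@v4

      - name: Install build tools
        run: sudo apt-get update && sudo apt-get install -y ccache ninja-build

      # Restore the newest matching ccache directory and save a new one
      # under a per-commit key when the job succeeds.
      - uses: actions/cache@v4
        with:
          path: .ccache
          key: ccache-linux-${{ github.sha }}    # hypothetical key scheme
          restore-keys: |
            ccache-linux-

      - name: Configure
        run: |
          cmake -B build -S . -G "Ninja Multi-Config" \
            -DCMAKE_C_COMPILER_LAUNCHER=ccache \
            -DCMAKE_CXX_COMPILER_LAUNCHER=ccache \
            -DLY_UNITY_BUILD=ON \
            -DLY_STRIP_DEBUG_SYMBOLS=ON

      - name: Build
        run: cmake --build build --config profile
```

The compiler launcher variables are standard CMake and route every compile command through ccache, so objects that have not changed since the restored cache was written can be reused instead of recompiled.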

Since GitHub Actions' cache is limited to 10GB, we will store larger caches and essential artifacts as GitHub Actions Artifacts. These artifacts will be generated from builds on the development and stabilization branches and then downloaded during pull request workflows through a global variable. This approach ensures that dependencies and caches are readily available without exceeding storage limits.
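One way this could be wired up is sketched below as two separate workflow files. The artifact name, the paths, and the AR_ARTIFACT_RUN_ID repository variable are hypothetical. The RFC only says that the artifacts are located "through a global variable", so using a repository configuration variable that holds a run ID is an assumption about what that means.

```yaml
# Producer: runs on the development branch and publishes the large dependencies
name: Publish AR artifacts
on:
  push:
    branches: [development]
jobs:
  publish:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Fetch 3rdParty packages (placeholder)
        run: mkdir -p 3rdParty && echo "populate 3rdParty here"
      - uses: actions/upload-artifact@v4
        with:
          name: o3de-3rdparty-linux          # hypothetical artifact name
          path: 3rdParty/
---
# Consumer: pull request workflow that downloads the artifact from an earlier run
name: AR with prebuilt artifacts
on: pull_request
jobs:
  build:
    runs-on: ubuntu-latest
    permissions:
      actions: read                          # needed to read artifacts from another run
      contents: read
    steps:
      - uses: actions/checkout@v4
      - uses: actions/download-artifact@v4
        with:
          name: o3de-3rdparty-linux
          path: 3rdParty/
          run-id: ${{ vars.AR_ARTIFACT_RUN_ID }}   # hypothetical repository variable
          github-token: ${{ github.token }}
```

In this sketch, something would have to update the variable after each successful development build, which is a design decision the RFC leaves open.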

To accommodate this transition, we will update the build scripts to be compatible with the GitHub Actions environment and modify paths or environment variables as needed. One example is changing a compiler flag to support ccache: we currently use /Zi, but ccache only works with /Z7. In addition, we will try to mirror the existing AR workflow for Android, Windows, and Ubuntu Linux through nested steps. Support for Mac and Fedora may be possible in the future.
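The RFC does not define "nested steps". One plausible reading is a top-level workflow that calls one reusable workflow per platform, sketched below. The file names, the config input, and the placeholder build step are all hypothetical.

```yaml
# .github/workflows/ar.yml (caller, one job per platform)
name: Automated Review
on: pull_request
jobs:
  windows:
    uses: ./.github/workflows/windows-build.yml    # hypothetical file names
    with:
      config: profile
  linux:
    uses: ./.github/workflows/linux-build.yml
    with:
      config: profile
  android:
    uses: ./.github/workflows/android-build.yml
    with:
      config: profile
---
# .github/workflows/windows-build.yml (callee, shape shared by each platform file)
name: Windows build
on:
  workflow_call:
    inputs:
      config:
        type: string
        required: false
        default: profile
jobs:
  build:
    runs-on: windows-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build (placeholder)
        run: echo "build configuration ${{ inputs.config }}"
```

Keeping each platform in its own file would let the Windows-specific /Z7 change stay isolated from the Linux and Android jobs.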

Here's an overview of the flow:

image

Here's a screenshot of an AR run in a PR with this implemented (amzn-changml/o3de#36):

image

What are the advantages of the feature?

  • Cost Efficiency: Eliminates the expenses associated with AWS Jenkins runners, as GitHub Actions are free for open-source projects.
  • Accessibility: Lowers the barrier for contributors, who can now run AR workflows on their forks without setting up Jenkins or deploying machines for each AR platform.
  • Integration: Provides seamless integration with GitHub repositories, without having to log in to Jenkins.
  • Scalability: Simplifies scaling the AR process without managing additional infrastructure.
  • Community Engagement: Encourages more contributions by simplifying the process and reducing overhead.

What are the disadvantages of the feature?

  • Performance Limitations: Standard GitHub runners have fewer cores and limited storage (currently 3-4 cores and 100GB of storage), which may lead to longer build and test times. Some tests may have to be disabled because they would exceed the build and test timeout (6 hrs).
  • Storage Constraints: The 10GB limit on the GitHub Actions cache and the 100GB of overall runner storage may require careful management of artifacts and caches.
  • Learning Curve: Team members familiar with Jenkins may need time to adapt to GitHub Actions workflows.
  • Customization: Jenkins offers more customization for complex pipelines, which may require workarounds in GitHub Actions.
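To illustrate the timeout constraint mentioned above, explicit time budgets can be set per job and per step so that a slow run fails early instead of running into the 6-hour limit. This is a sketch under assumptions: the budgets, the build directory, and the long_running test label are hypothetical.

```yaml
name: AR timeout sketch
on: pull_request
jobs:
  test-linux:
    runs-on: ubuntu-latest
    timeout-minutes: 180            # hypothetical job budget, well below 6 hours
    steps:
      - uses: actions/checkout@v4
      - name: Build (placeholder)
        run: echo "configure and build into ./build here"
      - name: Run tests, excluding a hypothetical long-running label
        timeout-minutes: 120        # hypothetical step budget
        run: ctest --test-dir build -C profile --output-on-failure -LE long_running
```

Excluding tests by label rather than deleting them would keep them available to any environment that has more headroom.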

How will this be implemented or integrated into the O3DE environment?

  1. Workflow Configuration: Writing GitHub Actions workflow files (.github/workflows/ar.yml) to define the CI processes.
  2. Script Modification: Updating existing build and test scripts to ensure compatibility with the GitHub Actions environment.
  3. Cache Management: Setting up ccache and artifact storage within the workflows to optimize build times and manage storage limitations.
  4. Testing and Validation: Running the new workflows alongside Jenkins to validate their effectiveness before fully transitioning.
  5. Documentation: Providing clear instructions and documentation for contributors on how to use the new CI system.

Are there any alternatives to this feature?

Alternatives considered include:

  • Self-Hosted Runners: Using self-hosted GitHub runners to overcome the limitations of standard runners. However, this reintroduces infrastructure management and costs, and it adds security risks.
  • Other CI Services: Evaluating other CI platforms like Travis CI or CircleCI, but they may have similar limitations, such as lack of Windows support or no free usage.
  • Optimizing Jenkins: Continuing with Jenkins while optimizing the current setup to reduce costs. The costs can never be completely eliminated, and this option doesn't address the accessibility issues for contributors.

Not implementing this change would mean continued operational costs and barriers for contributors, potentially hindering community growth and project scalability due to cost.

How will users learn this feature?

Users and contributors will be informed through:

  • Documentation Updates: Providing updated guides in the project's repository on how to interact with the new CI system.
  • Tutorials and Examples: Offering step-by-step tutorials on setting up and running the workflows on personal forks.
  • Community Communication: Announcing the changes via the project's communication channels, including Discord and mailing lists.
  • Support Resources: Establishing support mechanisms like FAQs and forums for troubleshooting.

Are there any open questions?

  • Cache and Artifact Storage Limits: How will we manage cache invalidation and storage to stay within GitHub's limits over time?
  • Performance Optimization: Are there additional optimizations we can implement to mitigate the longer build times on standard runners?
  • Platform Compatibility: Will there be any unforeseen issues with specific platforms that need special attention in the workflows? What if we can't support the update cadence?
  • Security Considerations: How will we handle sensitive information or credentials that were previously managed within the Jenkins environment?
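
As a starting point for the storage question, the sketch below bounds the ccache size, reports the hit rate, and sets a short artifact retention period. It is one possible approach under assumptions, not a decided policy: the 5G cap, the 7-day retention, the artifact name, and the output path are hypothetical.

```yaml
name: AR storage sketch
on: pull_request
jobs:
  build-linux:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install ccache
        run: sudo apt-get update && sudo apt-get install -y ccache
      - name: Cap the cache size and reset statistics
        run: |
          ccache --max-size=5G      # hypothetical cap, below the 10GB Actions cache limit
          ccache --zero-stats
      - name: Build (placeholder)
        run: mkdir -p build/bin && echo "build here"
      - name: Report the ccache hit rate for this run
        run: ccache --show-stats
      - uses: actions/upload-artifact@v4
        with:
          name: build-output        # hypothetical artifact name
          path: build/bin/
          retention-days: 7         # hypothetical retention period
```

Tracking the reported hit rate over time would show whether the chosen cap is large enough to be useful.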
@amzn-changml amzn-changml added the rfc-feature Request for Comments for a Feature label Oct 18, 2024
@amzn-changml amzn-changml self-assigned this Oct 18, 2024