Project: Backend Operators Redesign #147

@xutongNV

Description

Have you read the Project Process docs?

  • Yes, I have read and understood the RFC docs

Summary

This project aims to redesign the OSMO backend operator component to address critical scaling, reliability, and performance issues.

Person in Charge (PIC)

@xutongNV

Motivation

To prepare for future growth, OSMO's backend operators require architectural enhancements to support production workloads for Kubernetes backends at large scale. The current system provides a solid foundation, but can be optimized in several key areas: system stability under sustained load, event delivery consistency, and workflow status update latency.

This redesign will proactively strengthen OSMO's ability to support production-scale deployments reliably and efficiently as customer workloads continue to grow. Rewriting the backend listener in Go will let us leverage the native Kubernetes Go client libraries for more performant operations and additional built-in features (such as node and pod event caching).

Problem

  1. Scaling - The single-threaded listener design limits workflow throughput
  2. Reliability - The listener restarts frequently and drops events
  3. Performance - Memory leaks consume several GB per instance, and workflow status updates experience high latency

References

No response

Metadata

Labels

project (A new project or major change)

Type

No type

Projects

Status

In Progress

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
