Add support for retries in StateManager #181

## Summary

Add built-in retry handling to the State Manager so that ERRORED states are requeued with backoff until a maximum number of attempts is reached, in line with the documented lifecycle and fault-tolerance goals. ([Exosphere Docs][1])

## Why

The architecture docs mention retry mechanisms and error handling, but the State Manager service does not yet specify the concrete behavior or configuration knobs. Implementing retries in the service that manages state lifecycles keeps the behavior consistent across runtimes and APIs. ([Exosphere Docs][1])

## Scope

* On transition to ERRORED, if attempts < max\_retries, schedule a retry and move the state back to QUEUED after backoff. Keep the existing lifecycle names. ([Exosphere Docs][1])
* Persist per-state counters and next attempt time so retries survive restarts.
* Defaults configurable at service level; optional per-graph override can follow later.
* Idempotency guard in runtimes by state id to avoid duplicate execution.
* Logs and basic metrics for attempts, successes after retry, and exhausted retries.
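
The retry transition described above could be sketched roughly as follows. This is a minimal in-memory illustration, not the State Manager's actual schema or API: `StateRecord`, `backoff_delay`, `on_error`, `requeue_due`, and the 2 second base and 300 second cap are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from enum import Enum
from typing import List, Optional


class Status(str, Enum):
    QUEUED = "QUEUED"
    ERRORED = "ERRORED"


@dataclass
class StateRecord:
    # Hypothetical persisted fields; the real schema may differ.
    state_id: str
    status: Status = Status.QUEUED
    attempts: int = 0  # retries scheduled so far
    next_attempt_at: Optional[datetime] = None
    last_error: Optional[str] = None


def backoff_delay(attempt: int, base_seconds: float = 2.0,
                  cap_seconds: float = 300.0) -> timedelta:
    """Exponential backoff: base * 2**(attempt - 1), capped at cap_seconds."""
    return timedelta(seconds=min(cap_seconds, base_seconds * 2 ** (attempt - 1)))


def on_error(state: StateRecord, error: str, max_retries: int,
             now: datetime) -> StateRecord:
    """Mark the state ERRORED and schedule a retry if budget remains."""
    state.status = Status.ERRORED
    state.last_error = error
    if state.attempts < max_retries:
        state.attempts += 1
        state.next_attempt_at = now + backoff_delay(state.attempts)
    else:
        # Retries exhausted: the state stays ERRORED with nothing scheduled.
        state.next_attempt_at = None
    return state


def requeue_due(states: List[StateRecord], now: datetime) -> List[str]:
    """Move ERRORED states whose backoff has elapsed back to QUEUED."""
    requeued = []
    for s in states:
        due = s.next_attempt_at is not None and s.next_attempt_at <= now
        if s.status is Status.ERRORED and due:
            s.status = Status.QUEUED
            s.next_attempt_at = None
            requeued.append(s.state_id)
    return requeued


if __name__ == "__main__":
    now = datetime(2024, 1, 1, tzinfo=timezone.utc)
    state = on_error(StateRecord("s1"), "boom", max_retries=3, now=now)
    print(state.attempts, state.next_attempt_at)
    print(requeue_due([state], now + timedelta(seconds=2)))
```

Because `attempts` and `next_attempt_at` live on the record rather than in process memory, persisting the record is enough for a restarted service to pick up pending retries by running `requeue_due` again.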

## Config

* Users should be able to configure retries in the graph template, including the retry method, with a default value applied when nothing is specified.
* Keep existing required envs unchanged. ([Exosphere Docs][2])
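
One possible shape for this configuration is sketched below, assuming a hypothetical `retry_policy` key in the graph template that is merged over service-level defaults. The key names (`max_retries`, `strategy`, `base_delay_seconds`), the strategy names, and the default values are illustrative assumptions, not part of the existing template format.

```python
from dataclasses import dataclass
from typing import Any, Mapping

# Hypothetical service-level defaults; the actual values are still to be decided.
SERVICE_DEFAULTS = {
    "max_retries": 3,
    "strategy": "exponential",
    "base_delay_seconds": 2.0,
}
STRATEGIES = {"fixed", "linear", "exponential"}


@dataclass(frozen=True)
class RetryPolicy:
    max_retries: int
    strategy: str
    base_delay_seconds: float

    def delay_seconds(self, attempt: int) -> float:
        """Delay before the given retry attempt (1-based)."""
        if self.strategy == "fixed":
            return self.base_delay_seconds
        if self.strategy == "linear":
            return self.base_delay_seconds * attempt
        return self.base_delay_seconds * 2 ** (attempt - 1)


def policy_from_template(template: Mapping[str, Any]) -> RetryPolicy:
    """Merge the template's optional retry_policy over the service defaults."""
    overrides = dict(template.get("retry_policy", {}))
    unknown = set(overrides) - set(SERVICE_DEFAULTS)
    if unknown:
        raise ValueError(f"unknown retry_policy keys: {sorted(unknown)}")
    merged = {**SERVICE_DEFAULTS, **overrides}
    if merged["strategy"] not in STRATEGIES:
        raise ValueError(f"unsupported strategy: {merged['strategy']!r}")
    if merged["max_retries"] < 0:
        raise ValueError("max_retries must be >= 0")
    return RetryPolicy(**merged)


if __name__ == "__main__":
    print(policy_from_template({}))
    custom = policy_from_template(
        {"retry_policy": {"strategy": "linear", "max_retries": 5}}
    )
    print([custom.delay_seconds(n) for n in range(1, 4)])
```

A template without a `retry_policy` entry falls back to the defaults unchanged, so existing graphs would keep working without edits.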

## Acceptance criteria

* Failing node is retried up to MAX\_RETRIES with exponential backoff.
* Final outcome remains ERRORED when retries are exhausted; otherwise proceeds normally.
* Counters and next attempt timestamps are persisted and visible via API or logs.
* Unit and integration tests demonstrate requeue from ERRORED to QUEUED and success after retry. ([Exosphere Docs][1])
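
As a rough idea of what such a test could check, the sketch below simulates a flaky node against a retry loop and records the status transitions and backoff delays without sleeping. `simulate` and `flaky` are hypothetical helpers, `"SUCCESS"` is an assumed placeholder for the lifecycle's success status, and the 1 second base delay is arbitrary; a real test would drive the State Manager itself.

```python
from typing import Callable, List, Tuple


def simulate(node: Callable[[], None], max_retries: int,
             base_delay: float = 1.0) -> Tuple[str, List[str], List[float]]:
    """Run node with retries; return final status, transitions, and delays."""
    transitions = ["QUEUED"]
    delays: List[float] = []
    retries = 0
    while True:
        try:
            node()
        except Exception:
            transitions.append("ERRORED")
            if retries >= max_retries:
                # Retries exhausted: the final outcome stays ERRORED.
                return "ERRORED", transitions, delays
            retries += 1
            # Exponential backoff, recorded instead of slept on.
            delays.append(base_delay * 2 ** (retries - 1))
            transitions.append("QUEUED")
        else:
            transitions.append("SUCCESS")  # assumed success status name
            return "SUCCESS", transitions, delays


def flaky(failures: int) -> Callable[[], None]:
    """Build a node that fails the first `failures` calls, then succeeds."""
    remaining = {"n": failures}

    def node() -> None:
        if remaining["n"] > 0:
            remaining["n"] -= 1
            raise RuntimeError("transient failure")

    return node


if __name__ == "__main__":
    # Succeeds on the third execution, within a budget of 3 retries.
    status, transitions, delays = simulate(flaky(2), max_retries=3)
    assert status == "SUCCESS"
    assert transitions == ["QUEUED", "ERRORED", "QUEUED", "ERRORED", "QUEUED", "SUCCESS"]
    assert delays == [1.0, 2.0]

    # Never succeeds: ends ERRORED after exactly 2 retries.
    status, transitions, delays = simulate(flaky(10), max_retries=2)
    assert status == "ERRORED"
    assert delays == [1.0, 2.0]
```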

---

[1]: https://docs.exosphere.host/exosphere/architecture/ "Architecture - Docs Exosphere"
[2]: https://docs.exosphere.host/exosphere/state-manager-setup/ "State Manager Setup - Docs Exosphere"
