@mpartipilo mpartipilo commented Nov 7, 2025

Overview

This PR adds full replication management capabilities to the cluster API surface of the C# client. Users can initiate replica COPY/MOVE operations for shards, monitor progress via a tracking abstraction, cancel in-flight operations, list/query operations with filters, and clean up (delete single/all) completed or cancelled operations.

Rebase Context

The branch feat/cluster-api was rebased onto feat/rbac-apis to incorporate the latest RBAC infrastructure and shared error-handling improvements. No RBAC functionality was modified here, but the rebase ensures consistent exception types, OpenAPI-driven REST patterns, and documentation structure.

Summary of Changes

  • Core client: Added ClusterClient.Replicate() and ReplicateSync() plus a new lazily-initialized Replications property (ReplicationsClient).
  • Models: Introduced replication domain models (ReplicationType, ReplicationOperationState, ReplicationOperationStatus, ReplicationOperationError, ReplicationOperation, ReplicateRequest, ReplicationClientConfig).
  • Tracking: Implemented ReplicationOperationTracker with background polling, manual refresh, synchronous wait (WaitForCompletion), and cancellation helpers (Cancel, CancelSync). Implements IDisposable / IAsyncDisposable for resource cleanup.
  • REST layer: Added replication endpoints to WeaviateRestClient (ReplicateAsync, ReplicationDetailsAsync, ListReplicationsAsync, CancelReplicationAsync, DeleteReplicationAsync, DeleteAllReplicationsAsync) plus URL builders in Endpoints.cs.
  • Error handling: Extended ResourceType with Replication and added a new internal WeaviateNotFoundException overload that supports contextual data (operationId).
  • Documentation: Added comprehensive guides docs/REPLICATION_API_USAGE.md and implementation summary docs/REPLICATION_IMPLEMENTATION_SUMMARY.md.
  • Tests: Added integration test suite TestReplication (covers start, get, list/filter, cancel, delete, bulk delete, tracker properties, external cancellation detection, successful completion). Uses dedicated ports to avoid conflicts with other suites.
  • CI / Compose: Added multi-node cluster Docker Compose configuration exposing ports 8087, 8088, 8089 for replication operations.

API Surface

// Build the request once; both patterns below reuse it
var request = new ReplicateRequest(
    Collection: "Articles",
    Shard: "shard-xyz",
    SourceNode: "node1",
    TargetNode: "node2",
    Type: ReplicationType.Copy);

// Start asynchronously and track progress
var tracker = await client.Cluster.Replicate(request);

// Or start and wait for completion, with a timeout
var final = await client.Cluster.ReplicateSync(request, timeout: TimeSpan.FromMinutes(10));

// Management (id is the ID of an existing replication operation)
await client.Cluster.Replications.Get(id);
await client.Cluster.Replications.List(collection: "Articles", shard: "shard-xyz", targetNode: "node2");
await client.Cluster.Replications.Cancel(id);
await client.Cluster.Replications.Delete(id);
await client.Cluster.Replications.DeleteAll();

Naming Decision (Post-Rebase Adjustment)

During the rebase, a temporary alias NodesVerbose was introduced to bridge earlier usage. This alias has been removed, and all usages now rely on the canonical client.Cluster.Nodes.ListVerbose(...) method. Documentation and tests were updated accordingly. No deprecated method will ship, which avoids API clutter.

Documentation Additions

  • REPLICATION_API_USAGE.md: 400+ line usage guide (overview, types, initiating operations, monitoring, management, filtering, advanced patterns, best practices).
  • REPLICATION_IMPLEMENTATION_SUMMARY.md: Architectural summary (files, patterns, tracker design, future enhancements).
  • Updated existing replication references to use ListVerbose consistently.

Testing & Verification

  • Integration tests require Weaviate ≥ 1.32.0 (replication endpoint availability); enforced via RequireVersion("1.32.0").
  • Tests validate: create/get, cancel, delete (with async disappearance polling), list with filters, delete-all cleanup, tracker completion, external cancellation detection.
  • Tracker auto-poll interval default: 500ms (configurable via ReplicationClientConfig).
  • Build passes post-refactor (removal of the alias). CI status checks are re-running after force push.

Migration Notes for Users

  • No breaking changes for existing cluster functionality.
  • To adopt replication: ensure server version ≥ 1.32.0; use ListVerbose(collection) to discover shards and node names.
  • Replace any experimental NodesVerbose() references (if copied from early branch previews) with Nodes.ListVerbose().
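
The version requirement above can be checked before calling any replication method. The following is a minimal sketch, not part of this PR: `VersionGate` and `SupportsReplication` are hypothetical names, and how the server version string is obtained is left out. It assumes the version is reported in the usual major.minor.patch form, optionally followed by a pre-release or build suffix.

```csharp
using System;

// Usage: gate replication calls on the reported server version.
Console.WriteLine(VersionGate.SupportsReplication("1.32.0"));      // True
Console.WriteLine(VersionGate.SupportsReplication("1.31.9"));      // False
Console.WriteLine(VersionGate.SupportsReplication("1.33.0-rc.1")); // True

// Hypothetical helper (not part of the client library).
public static class VersionGate
{
    // Minimum server version stated in this PR for the replication endpoints.
    private static readonly Version Minimum = new(1, 32, 0);

    public static bool SupportsReplication(string serverVersion)
    {
        // Drop any pre-release/build suffix, which System.Version cannot parse.
        var core = serverVersion.Split('-', '+')[0];
        return Version.TryParse(core, out var parsed) && parsed >= Minimum;
    }
}
```

Because the suffix is discarded, a pre-release such as 1.32.0-rc.1 is treated as supported; a stricter check would need a semantic-version comparison.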

Error Handling Patterns

  • ReplicateAsync: expects 200 on creation; a subsequent GET retrieves the full details.
  • ReplicationDetailsAsync: returns null on 404 (non-existent operation); tracker converts missing operations to WeaviateNotFoundException when refreshing.
  • Cancel/Delete operations are idempotent (204 regardless of prior state), matching the OpenAPI spec.
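
The 404 handling described in this list can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's code: `OperationNotFoundException`, `DetailsOrNull`, and `Refresh` are hypothetical stand-ins for WeaviateNotFoundException, ReplicationDetailsAsync, and the tracker's refresh, whose real signatures are not shown in this description. The HTTP call is replaced by a status code passed in directly.

```csharp
using System;
using System.Net;

// Usage: the REST-level lookup returns null on 404; the tracker-level refresh throws.
Console.WriteLine(Sketch.DetailsOrNull(HttpStatusCode.NotFound, "body") is null); // True
try
{
    Sketch.Refresh("op-123", HttpStatusCode.NotFound, "body");
}
catch (OperationNotFoundException ex)
{
    Console.WriteLine(ex.Message);
}

// Hypothetical stand-in for a not-found exception carrying the operation ID.
public sealed class OperationNotFoundException : Exception
{
    public string OperationId { get; }

    public OperationNotFoundException(string operationId)
        : base($"Replication operation '{operationId}' was not found.")
        => OperationId = operationId;
}

public static class Sketch
{
    // REST layer: a 404 means "no such operation" and maps to null.
    public static string? DetailsOrNull(HttpStatusCode status, string body) =>
        status switch
        {
            HttpStatusCode.NotFound => null,
            HttpStatusCode.OK => body,
            _ => throw new InvalidOperationException($"Unexpected status {(int)status}"),
        };

    // Tracker layer: a missing operation becomes a typed exception.
    public static string Refresh(string operationId, HttpStatusCode status, string body) =>
        DetailsOrNull(status, body) ?? throw new OperationNotFoundException(operationId);
}
```

Returning null at the REST layer lets a plain lookup treat "not found" as an ordinary result, while the tracker, which expects its operation to exist, surfaces the same condition as an exception.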

Performance Considerations

  • Background polling is lightweight (single request per interval) and stops automatically on terminal state.
  • Users performing many simultaneous replications can increase PollInterval to reduce load.
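
The polling behaviour described above (one request per interval, stopping on a terminal state) can be sketched as a small loop. This is a simplified illustration, not the ReplicationOperationTracker implementation: `Poller`, `PollUntilTerminal`, and the `State` enum are hypothetical, and the server is simulated with a queue of states.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Simulated server: two non-terminal answers, then a terminal one.
var states = new Queue<State>(new[] { State.Pending, State.Running, State.Done });
var polls = 0;

var final = await Poller.PollUntilTerminal<State>(
    fetch: _ => { polls++; return Task.FromResult(states.Dequeue()); },
    isTerminal: s => s is State.Done or State.Cancelled,
    interval: TimeSpan.FromMilliseconds(10), // the PR's stated default is 500ms
    ct: CancellationToken.None);

Console.WriteLine($"{final} after {polls} polls"); // Done after 3 polls

// Placeholder states; the real ReplicationOperationState values may differ.
public enum State { Pending, Running, Done, Cancelled }

public static class Poller
{
    public static async Task<T> PollUntilTerminal<T>(
        Func<CancellationToken, Task<T>> fetch,
        Func<T, bool> isTerminal,
        TimeSpan interval,
        CancellationToken ct)
    {
        while (true)
        {
            var current = await fetch(ct);            // a single request per interval
            if (isTerminal(current)) return current;  // stop polling on a terminal state
            await Task.Delay(interval, ct);           // throws if cancellation is requested
        }
    }
}
```

With many concurrent operations, the request rate is roughly the number of active trackers divided by the interval, which is why a larger interval reduces load.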

Future Enhancements (Deferred)

  • Event-driven updates (webhook / server push) to replace polling.
  • Batch / composite replication operations.
  • Progress metrics (bytes or objects transferred).
  • Retry/backoff policies for transient network errors in tracker loop.

Breaking Changes

None. The changes are purely additive, plus an internal extension of error handling.

Review Checklist

  • API surface adheres to existing fluent + async patterns.
  • No modification of generated DTO files.
  • All public types documented.
  • RBAC rebase conflicts resolved; build green.
  • Temporary alias removed; docs/tests aligned.

Please focus review on: API ergonomics (Replicate vs ReplicateSync), tracker lifecycle/disposal semantics, endpoint status handling, and naming consistency.


Let me know if you'd prefer splitting docs into shorter sections or squashing commits before merge.

@orca-security-eu orca-security-eu bot left a comment

Orca Security Scan Summary

Status: Passed
Check: Secrets
Issues by priority: high 0, medium 0, low 0, info 0

@mpartipilo mpartipilo force-pushed the feat/cluster-api branch 3 times, most recently from 8837868 to 01343c1 Compare November 11, 2025 01:05
…nd DTO conversions

- Enhanced PropertyHelper to support nested properties and collection types.
- Introduced DataTypeForCollectionType method for better handling of array types.
- Updated DataTypeForType method to streamline type resolution.
- Refactored ObjectHelper to handle GeoCoordinate and PhoneNumber conversions more efficiently.
- Added extension methods for converting models to DTOs and vice versa.
- Improved error handling for unsupported property types.
- Updated QueryClient to simplify object fetching logic.
- Enhanced gRPC result handling for date and phone number types.
- Introduced ReplicationsClient for handling replication operations.
- Added methods for initiating, listing, cancelling, and deleting replication operations.
- Implemented ReplicationOperationTracker for monitoring replication status asynchronously.
- Enhanced error handling with WeaviateNotFoundException for replication resources.
- Created models for replication operations, including ReplicationType and ReplicationOperationState.
- Updated WeaviateRestClient with new endpoints for replication operations.
- Added necessary DTOs for replication requests and responses.