-
Notifications
You must be signed in to change notification settings - Fork 3
Add metadata-first collection infrastructure for efficient list syncing #1062
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Design doc for a new generic collection type that uses lightweight metadata fetches (id + modified_gmt) to define list structure, then selectively fetches only missing or stale entities. Key design decisions: - Metadata defines list structure (enables loading placeholders) - KV store for metadata persistence (memory or disk-backed) - Generic over entity type (posts, media, etc.) - Batch fetch via `include` param for efficiency
Foundation types for metadata-based sync: - `SyncableEntity` trait: entities with `id` and `modified_gmt` fields - `EntityMetadata<Id>`: lightweight struct holding just id + modified_gmt Also adds `Clone` and `Copy` derives to `WpGmtDateTime` since the inner `DateTime<Utc>` is `Copy`.
Represents items in a metadata-backed list where entities may be: - `Loaded`: full entity available from cache - `Loading`: fetch in progress, shows placeholder with metadata - `Failed`: fetch failed, includes error message for retry UI Includes `HasId` helper trait for extracting IDs from loaded entities.
Abstracts metadata persistence so we can swap between in-memory and disk-backed storage. The in-memory implementation is useful for prototyping; can be replaced with SQLite/file-based later. Includes unit tests for all KvStore operations.
Lightweight fetch that returns only id + modified_gmt for posts, enabling the metadata-first sync strategy. Unlike `fetch_posts_page`, this does NOT upsert to the database - the metadata is used transiently to determine which posts need full fetching. Also adds `Hash` derive to `wp_content_i64_id` and `wp_content_u64_id` macros so ID types can be used as HashMap keys.
Batch fetch full post data for specific IDs using the `include` parameter. Used for selective sync - fetching only posts that are missing or stale in the cache. Returns early with empty Vec if no IDs provided.
Core collection type that: - Uses KV store to persist list metadata (id + modified_gmt) - Builds list with Loaded/Loading states based on cache status - Provides methods to find missing/stale entities for selective fetch - Supports pagination with append for subsequent pages Includes comprehensive unit tests for all operations.
- Format derive macros across multiple lines - Use or_default() instead of or_insert_with(Vec::new) - Add type aliases for complex closure types - Collapse nested if-let into single condition - Simplify test entity to unit struct
Documents what was built, where each component lives, test coverage, and differences from the original sketch. Also lists next steps.
Consolidated design after discussion covering: - Service-owned stores (`EntityStateStore`, `ListMetadataStore`) - Read-only traits for collection access - `MetadataCollection<F>` generic only over fetcher - Cross-collection state consistency - State transitions (Missing/Fetching/Cached/Stale/Failed) Also updated v1 doc with intermediate v2 notes.
Implements Phase 1 of the v3 MetadataCollection design: - `EntityMetadata` - Non-generic struct with `i64` id + `Option<WpGmtDateTime>` (optional modified_gmt for entities like Comments that lack this field) - `EntityState` - Enum tracking fetch lifecycle (Missing, Fetching, Cached, Stale, Failed) - `CollectionItem` - Combines metadata with state for list items - `SyncResult` - Result of sync operations with counts and pagination info - `MetadataFetchResult` - Updated to non-generic version Removes superseded prototype code: - Old generic `EntityMetadata<Id>` - `KvStore<Id>` trait and `InMemoryKvStore<Id>` - `ListItem<T, Id>` enum - `MetadataCollection<T, Id>` (old version) - `SyncableEntity` trait Updates `PostService::fetch_posts_metadata` to use new non-generic types.
Implements Phase 2 of the v3 MetadataCollection design: - `EntityStateStore` - Memory-backed store for entity fetch states - Maps `i64` ID to `EntityState` (Missing, Fetching, Cached, etc.) - Thread-safe via `RwLock<HashMap>` - `filter_fetchable()` excludes currently-fetching IDs to prevent duplicates - `EntityStateReader` trait - Read-only access for collections - `ListMetadataStore` - Memory-backed store for list structure - Maps filter key (String) to `Vec<EntityMetadata>` - Supports `set` (replace) and `append` (pagination) - `ListMetadataReader` trait - Read-only access for collections Both stores are memory-only; state resets on app restart.
Implements Phase 3 of the v3 MetadataCollection design: - `MetadataFetcher` trait - Async trait for fetching metadata and entities - `fetch_metadata(page, per_page, is_first_page)` - Fetch list structure - `ensure_fetched(ids)` - Fetch full entities by ID - `MetadataCollection<F>` - Generic collection over fetcher type - `refresh()` - Fetch page 1, replace metadata, sync missing - `load_next_page()` - Fetch next page, append, sync missing - `items()` - Get `CollectionItem` list with states - `is_relevant_update()` - Check DB updates for relevance - Batches large fetches into 100-item chunks (API limit) Also adds `tokio` as dev-dependency for async tests.
Add metadata sync infrastructure to PostService for efficient list syncing: - Add `state_store_with_edit_context` field for tracking per-entity fetch state (Missing, Fetching, Cached, Stale, Failed). Each context needs its own state store since the same entity can have different states across contexts. - Add `metadata_store` field for list structure per filter key. Shared across all contexts - callers include context in the key string (e.g., "site_1:edit:posts:publish"). - Add `fetch_and_store_metadata()` method that fetches lightweight metadata (id + modified_gmt) and stores it in the metadata store. - Update `fetch_posts_by_ids()` to track entity state: - Filters out already-fetching IDs to prevent duplicate requests - Sets Fetching state before API call - Sets Cached on success, Failed on error or missing posts - Add `PostMetadataFetcherWithEditContext` implementing `MetadataFetcher` trait, delegating to PostService methods. - Add reader accessor methods for collections to get read-only access: `state_reader_with_edit_context()`, `metadata_reader()`, `get_entity_state_with_edit_context()`.
Create the concrete type that wraps MetadataCollection for UniFFI:
- Add `PostMetadataCollectionWithEditContext` struct combining:
- `MetadataCollection<PostMetadataFetcherWithEditContext>` for sync logic
- Service reference for loading full entity data
- Filter for this collection
- Add `PostMetadataCollectionItem` record type with:
- `id`: Post ID
- `state`: EntityState (Missing, Fetching, Cached, Stale, Failed)
- `data`: Optional FullEntityAnyPostWithEditContext
- Add `create_post_metadata_collection_with_edit_context` to PostService
- Make types UniFFI-compatible:
- Add `uniffi::Enum` to EntityState
- Add `uniffi::Record` to SyncResult (change usize to u64)
- Use interior mutability (RwLock<PaginationState>) in MetadataCollection
for compatibility with UniFFI's Arc-wrapped objects
- Add `read_posts_by_ids_from_db` helper to PostService for bulk loading
- Document state representation approaches in design doc:
- Two-dimensional (DataState + FetchStatus)
- Flattened explicit states enum
Add Kotlin wrapper for PostMetadataCollectionWithEditContext to enable reactive UI updates when database changes occur. Changes: - Add `ObservableMetadataCollection` class with observer pattern - Update `DatabaseChangeNotifier` to support metadata collections - Add `getObservablePostMetadataCollectionWithEditContext` extension on `PostService` - Update implementation plan with Phase 7 completion
Add example screen demonstrating the metadata-first sync strategy with visual state indicators for each item (Missing, Fetching, Cached, Stale, Failed). Changes: - Add `PostMetadataCollectionViewModel` with manual refresh/loadNextPage controls - Add `PostMetadataCollectionScreen` with filter controls and state indicators - Wire up navigation and DI for the new screen - Update implementation plan marking Phase 8 complete
WordPress REST API defaults to filtering by 'publish' status, which caused drafts, pending, and other non-published posts to return "Not found" when fetching by ID. Changes: - Add explicit status filter including all standard post statuses (publish, draft, pending, private, future)
When fetching metadata, compare the API's `modified_gmt` against cached posts in the database. Posts with different timestamps are marked as `Stale`, triggering a re-fetch on the next sync. Changes: - Add `select_modified_gmt_by_ids` to PostRepository for efficient batch lookup - Add `detect_and_mark_stale_posts` to PostService for staleness comparison - Call staleness detection in `fetch_and_store_metadata` after storing metadata
Design for moving list metadata from in-memory KV store to database: - Three DB tables: list_metadata, list_metadata_items, list_metadata_state - MetadataService owns list storage, state transitions, version management - PostService orchestrates sync, owns entity state, does staleness detection - Split observers for data vs state changes in Kotlin wrapper - Version-based concurrency control for handling concurrent refreshes
Detailed plan with 5 phases, 17 commits, ordered from low-level to high-level: - Phase 1: Database foundation (DbTable, migration, types, repository) - Phase 2: MetadataService in wp_mobile - Phase 3: Integration with PostService, collection refactor - Phase 4: Observer split (data vs state) - Phase 5: Testing and cleanup Includes dependency order, risk areas, and verification checkpoints.
Implement the database layer for list metadata storage, replacing the in-memory KV store. This enables proper observer patterns and persistence between app launches. Database schema (3 tables): - list_metadata: headers with pagination, version for concurrency control - list_metadata_items: ordered entity IDs (rowid = display order) - list_metadata_state: sync state (idle, fetching_first_page, fetching_next_page, error) Changes: - Add DbTable variants: ListMetadata, ListMetadataItems, ListMetadataState - Add migration 0007-create-list-metadata-tables.sql - Add list_metadata module with DbListMetadata, DbListMetadataItem, DbListMetadataState structs and ListState enum - Add db_types/db_list_metadata.rs with column enums and from_row impls - Add repository/list_metadata.rs with read and write operations
Add helper methods for atomic state transitions during sync operations: - `begin_refresh()`: Atomically increment version, set state to FetchingFirstPage, and return info needed for the fetch - `begin_fetch_next_page()`: Check pagination state, set state to FetchingNextPage, and return page number and version for stale check - `complete_sync()`: Set state to Idle on success - `complete_sync_with_error()`: Set state to Error with message These helpers ensure correct state transitions and enable version-based concurrency control to detect when a refresh invalidates an in-flight load-more operation. Also adds `RefreshInfo` and `FetchNextPageInfo` structs to encapsulate the data returned from begin operations.
Implement MetadataService in wp_mobile to provide persistence for list metadata (ordered entity IDs, pagination, sync state). This enables data to survive app restarts unlike the in-memory ListMetadataStore. The service wraps ListMetadataRepository and implements ListMetadataReader trait, allowing MetadataCollection to use either memory or database storage through the same interface. Features: - Read operations: get_entity_ids, get_metadata, get_state, get_pagination - Write operations: set_items, append_items, update_pagination, delete_list - State management: set_state, complete_sync, complete_sync_with_error - Concurrency helpers: begin_refresh, begin_fetch_next_page Also fixes pre-existing clippy warnings in posts.rs (collapsible_if).
Add MetadataService as a field in PostService to provide database-backed list metadata storage. This enables list structure and pagination to persist across app restarts. Changes: - Add `metadata_service` field to PostService - Add `persistent_metadata_reader()` and `metadata_service()` accessors - Add `sync_post_list()` method that orchestrates full sync flow using MetadataService for persistence - Extend SyncResult with `current_page` and `total_pages` fields for pagination tracking The existing in-memory `metadata_store` is preserved for backwards compatibility with existing code paths. Future work will migrate callers to use the persistent service.
Mark completed phases and update with actual commit hashes: - Phase 1 (Database Foundation): Complete - Phase 2 (MetadataService): Complete - Phase 3 (Integration): Partial (3.2 done, 3.1 deferred) - Phase 5 (Testing): Partial (tests inline with implementation) Add status summary table and update dependency diagram with completion markers.
Replace `let _ = self.complete_sync_with_error(...)` pattern with proper error logging. This ensures we're notified if cleanup operations fail, rather than silently ignoring potential issues that could leave lists stuck in incorrect states.
Move entity fetching responsibility from MetadataCollection to PostService. The service layer now decides whether to fetch missing/stale entities based on the SyncStrategy parameter. Changes: - Add SyncStrategy enum (MetadataOnly, Full) to control sync behavior - Add sync_list and sync_list_with_strategy to PostService - Simplify MetadataFetcher trait to single sync() method - Remove sync_missing_and_stale from MetadataCollection - Update PersistentPostMetadataFetcherWithEditContext to delegate to sync_list
Replace trait-based MetadataFetcher with composition pattern: - Rename `MetadataCollection` to `MetadataCollectionCore` - Remove generic parameter - core handles query infrastructure only - Entity-specific collections compose core and own their fields - `PostMetadataCollectionWithEditContext` now stores filter, endpoint_type - Delete `MetadataFetcher` trait and `PersistentPostMetadataFetcherWithEditContext` This simplifies the architecture by eliminating the indirection through the fetcher trait. Entity-specific collections call the service directly for sync operations while delegating query methods to the core.
- Add metadata_collection_architecture.md explaining design decisions - Add DOCUMENTATION_UPDATE_REPORT.md identifying outdated docs - Remove metadata_collection_composition.md (implementation doc)
Refactor the sync module to use proper Option types instead of magic values, improving type safety and clarity throughout the codebase. Type System Improvements: - Replace `current_page: u32` (0 = not loaded) with `Option<u32>` (None = not loaded) - Replace `has_more_pages: bool` with `Option<bool>` to distinguish "unknown" from "false" - Handle type conversions at DB boundary (i64 -> u32) cleanly in one place - Update FFI types (`SyncResult`, `ListInfo`) to use Option types Code Organization: - Move `SyncStrategy` enum to its own module file - Remove trivial unit tests that only verify constructors and basic Rust features - Keep meaningful tests that verify business logic and edge cases Documentation: - Add clear doc comments explaining None vs Some semantics - Update Kotlin extension functions to handle nullable types properly - Improve comments to explain "why" rather than restate "what" Breaking Changes (FFI): - `SyncResult.current_page`: u32 -> Option<u32> - `SyncResult.has_more_pages`: bool -> Option<bool> - `ListInfo.current_page`: i64 -> Option<u32> - `PostMetadataCollectionWithEditContext.current_page()`: u32 -> Option<u32> - `PostMetadataCollectionWithEditContext.has_more_pages()`: bool -> Option<bool> - `PostService.createPostMetadataCollectionWithEditContext()`: added `per_page` parameter Kotlin clients updated to handle new Option types properly.
…ling Refactored metadata and post services to eliminate code smells, improve type safety, and ensure correct state management. This polish pass focuses on the two most critical services in the metadata collection feature. Changes: - Use `u32` for page/count types in service layer, convert to `i64` only at repository boundary - Add `FetchByIdsResult` struct to replace unclear tuple return type - Fix atomicity: handle database upsert failures in `fetch_posts_by_ids` to prevent stuck `Fetching` state - Fix race condition: don't modify state on version mismatch in `load_more` - Move failure counting into `fetch_posts_by_ids` where state transitions occur - Add `execute_with_error_handling` helper to eliminate repetitive error handling - Add `to_list_metadata_items` helper to eliminate duplicate conversion logic - Add `FIRST_PAGE` constant to replace magic value - Change `current_page` from `i64` to `Option<u32>` for correct semantics - Document status filter as UI-level decision (excludes trash, not a network default) - Add comprehensive unit tests for staleness detection and state transitions - Make `detect_and_mark_stale_posts` and `fetch_missing_and_stale_posts` `pub(crate)` for testing
Improves naming consistency to clearly communicate function contracts: Breaking Changes: - `fetch_posts_page` → `sync_posts_page` (fetches + saves to DB) - `fetch_posts_by_ids` → `load_posts_by_ids` (fetches + saves + state mgmt) - `fetch_posts_metadata` → `pub(crate)` (internal implementation detail) Documentation: - Remove layering violations (lower-level functions no longer document their callers) - Focus on what each function does, not how it's used downstream - Update design docs to reflect new naming API Convention Established: - `fetch_x`: Network only, no DB writes (reserved for future) - `sync_x`: Fetch from network + save to DB - `load_x`: Fetch + save + state management - `read_x`: DB reads only - `get_x`: Return handles/references - `create_x`: Create new objects - `delete_x`: Delete from DB Rationale: Function names should explicitly communicate side effects. "Fetch" implies read-only operations, while "sync" and "load" signal mutations (DB writes, state management).
Breaking Changes: - `detect_and_mark_stale_posts()` → `find_stale_posts_by_timestamp()` Improvements: 1. Renamed function to explicitly state detection method (timestamp comparison) 2. Separated detection logic from state mutation 3. Function now returns Vec<i64> of stale IDs instead of mutating state directly 4. Caller controls what to do with stale IDs (mark, log, fetch, etc.) 5. Added state_reader parameter for explicit dependency injection Benefits: - Clear single responsibility: detection only - Testable without side effects - Composable: can reuse in other contexts - Caller has full control over state mutations - Name clearly communicates "how" (by timestamp) and "what" (find stale) Architecture: Following the principle of separating detection from action: - Detection functions return data (findings) - Callers decide what actions to take with that data - No hidden side effects or implicit mutations
Removed helpers: - post_ids_to_i64: Trivial one-liner, used once - i64_to_post_ids: Trivial one-liner, used once - upsert_posts_and_collect_ids: Non-trivial but only 2 usages Rationale: 1. Trivial helpers add context-switching cost without clarity benefit - Inline version is just as clear: ids.iter().map(|id| id.0).collect() 2. Borderline helpers create discoverability problems - Future developers won't know they exist, duplicating logic anyway - Not worth maintaining for 2 usages 3. Inlining allows context-specific customization - Can add better error messages per call site - Each usage site is now self-contained and obvious Philosophy: Prefer obvious local code over DRY when extraction doesn't meaningfully improve clarity or maintenance burden.
Changes to new tests in this branch:
1. Move use statements to module level
- Consolidates all imports at top of test module
- Removes inline use statements from individual tests
2. Shorten test names for better readability
- test_find_stale_posts_by_timestamp_ignores_posts_without_modified_gmt_in_metadata
→ test_find_stale_requires_modified_gmt (73 → 35 chars)
- test_find_stale_posts_by_timestamp_ignores_non_cached_posts
→ test_find_stale_only_checks_cached_posts (56 → 40 chars)
3. Remove redundant state verification
- Removed unnecessary state checks after asserting empty return
- Tests verify function output, not side effects that don't occur
- find_stale_posts_by_timestamp returns IDs, doesn't mutate state
4. Extract setup helper for network error test
- Created service_with_network_error() helper
- Reduces test from 61 lines to 22 lines (64% reduction)
- Separates complex setup from test logic
- Setup-to-test ratio improved from 2.4:1 to balanced
Result: Cleaner, more focused tests that are easier to understand.
Removes helper with only 2 usages (lines 347, 489) per discoverability guidelines. Reasoning: - Medium complexity helper needs 3+ usages AND discoverability - Only 2 call sites makes future discovery unlikely - Inline code with comment is clear and self-contained - Maintains consistency with helper function guidelines Changes: - Removed to_list_metadata_items() function - Inlined conversion at both call sites in refresh() and load_more() - Added clarifying comment: "Convert domain EntityMetadata to database ListMetadataItemInput" Result: More direct code without sacrificing clarity.
Make QueryPairsExtension trait public in wp_api to allow reuse across workspace, removing the duplicated QueryPairsExt trait from wp_mobile. Also replace println debug statements in PostMetadataCollection with proper log::debug calls.
Move the EntityState + data → PostItemState conversion logic from inline implementation in load_items() into dedicated From trait implementations. This makes the conversion logic reusable, testable, and provides a clear pattern for colleagues to follow when implementing other entity collections. Reduces load_items() from 66 lines to 33 lines while improving clarity.
…strator - Add load_next_page_with() method to MetadataCollectionCore - Refactor PostMetadataCollectionWithEditContext to use core orchestrator - Reduces duplication for future collection types
- Rename sync/metadata_collection.rs to collection/core.rs - Update imports to use crate::sync instead of super:: - Export MetadataCollectionCore from collection module - Remove export from sync module - Update all references in post_metadata_collection.rs and service/posts.rs More developer-friendly organization: collection code lives together.
- Extend wp_mobile_metadata_item! macro to generate both From traits: 1. From<(EntityState, Option<FullEntity>)> for ItemState 2. From<(CollectionItem, Option<FullEntity>)> for MetadataCollectionItem - Remove manual From implementations from post_metadata_collection.rs - Eliminates ~63 lines of boilerplate per entity type - Ensures consistent conversion logic across all future entity types
These docs were already outdated and would require constant maintenance: - Referenced deleted APIs (fetch_and_store_metadata_persistent, MetadataFetcher) - Referenced old architecture (MetadataCollection<F> generic) - Mixed implementation notes with design rationale The architecture is now documented through: - Code structure and comprehensive doc comments - Type system (composition, trait bounds) - Commit messages explaining design decisions - Diagram will be included in PR description as a visual reference
Changed from KDoc links `[PostItemState.Cached]` to backticks `Cached` since sealed class variants don't work with KDoc link syntax. This makes the documentation clearer and prevents broken link references. The @Suppress("TooManyFunctions") remains necessary - detekt threshold is 11 functions, ObservableMetadataCollection has 12 (which is correct for the dual-observer pattern: 3 add + 3 remove + data/listInfo/both).
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This PR introduces a new "metadata-first" sync strategy for list collections, enabling efficient fetching and caching of WordPress entity lists (posts, pages, etc.) by separating lightweight metadata from full entity data.
Why?
The existing collection pattern fetches full entity data for entire lists. This new approach offers an alternative that:
Missing,Fetching,Cached,Stale,Failed)Two-phase sync:
id+modified_gmt) to define list structureArchitecture
Key components:
MetadataCollectionCore- Shared query infrastructure joining metadata + statePostMetadataCollectionWithEditContext- Entity-specific collection with sync logicMetadataService- Persistent list metadata management with paginationEntityStateStore- In-memory per-entity fetch state trackingObservableMetadataCollection(Kotlin) - Reactive wrapper with split data/state observersWhat's New
This is almost entirely new code - a parallel implementation that doesn't modify existing collection infrastructure:
list_metadata,list_metadata_items,list_metadata_statetablesListMetadataRepositorywith concurrency helpersMetadataServiceorchestrating refresh/load-more workflowsMetadataCollectionCore+ entity-specific collectionsObservableMetadataCollectionwith dual-observer pattern