-
Notifications
You must be signed in to change notification settings - Fork 0
fix: create collections lazily, remove qdrant config, update docker compose #26
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughImplements lazy, thread-safe Qdrant collection creation in the import tool, revises overwrite handling to drop existing collections conditionally, ensures collections before upserts, updates collection config (shards, segments, quantization, payload indexing), and removes external Qdrant config by deleting the compose volume bind and the qdrant-config.yaml file. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
actor User
participant CLI as Import CLI
participant QC as Qdrant Client
participant QS as Qdrant Server
User->>CLI: Run import
rect rgba(230,247,255,0.5)
note over CLI: For each target collection during insert
CLI->>CLI: ensureCollectionExists(name)
alt Not cached
CLI->>QS: Check collection exists
opt Missing
CLI->>QS: Create collection (shards, segments, quantization, payload indexes)
end
CLI->>CLI: Cache created/exists
else Cached
CLI->>CLI: Skip check/create
end
end
CLI->>QC: Upsert points
QC->>QS: Upsert request
QS-->>QC: Ack
QC-->>CLI: Result
sequenceDiagram
autonumber
actor User
participant CLI as Import CLI
participant QS as Qdrant Server
User->>CLI: Run import -overwrite
note over CLI: For each collection in scope
loop Collections
CLI->>QS: Check exists
alt Exists
CLI->>QS: Drop collection
QS-->>CLI: Drop result
else Not found
CLI->>CLI: No action (lazy create later)
end
end
note over CLI: Proceed with lazy creation during inserts
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
cmd/import/main.go(5 hunks)docker-compose.qdrant.yml(0 hunks)qdrant-config.yaml(0 hunks)
💤 Files with no reviewable changes (2)
- docker-compose.qdrant.yml
- qdrant-config.yaml
🔇 Additional comments (5)
cmd/import/main.go (5)
58-61: LGTM! Proper concurrent state management.The global map and mutex are correctly initialized for tracking created collections in a thread-safe manner. This is an appropriate pattern for lazy initialization in a CLI tool with concurrent workers.
349-352: LGTM! Optimized shard and segment configuration.Reducing
ShardNumberto 2 andDefaultSegmentNumberto 4 prevents memory spikes during bulk import, aligning with the PR's goal to optimize resource usage.
357-363: LGTM! Binary quantization for memory efficiency.Adding binary quantization with
AlwaysRamkeeps quantized vectors in memory for fast access while reducing memory footprint. This is an appropriate optimization for vector search workloads.
370-398: LGTM! Payload indexing improves query performance.Creating indexes on
purl,version,url,category, andrankfields enables efficient filtering and grouping operations. The error handling with warning logs is appropriate for index creation.
665-672: LGTM! Lazy creation correctly implemented.Collections are created on-demand before the first upsert, preventing unnecessary memory allocation for unused language collections. The error handling properly propagates failures. This approach aligns with the PR objective to prevent memory spikes.
| if *overwrite { | ||
| for _, collectionName := range collections { | ||
| collectionExists, err := client.CollectionExists(ctx, collectionName) | ||
| if err != nil { | ||
| cleanupAndExit(client, "Error dropping collection %s: %v", collectionName, err) | ||
| log.Printf("Error checking if collection %s exists: %v", collectionName, err) | ||
| return | ||
| } | ||
| log.Printf("Collection '%s' dropped successfully", collectionName) | ||
| collectionExists = false | ||
| } | ||
|
|
||
| // Create collection if it doesn't exist | ||
| if !collectionExists { | ||
| createCollection(ctx, client, collectionName) | ||
| } else { | ||
| log.Printf("Using existing collection: %s", collectionName) | ||
| if collectionExists { | ||
| log.Printf("Collection %s exists and overwrite flag is set. Dropping collection...", collectionName) | ||
| err = client.DeleteCollection(ctx, collectionName) | ||
| if err != nil { | ||
| cleanupAndExit(client, "Error dropping collection %s: %v", collectionName, err) | ||
| } | ||
| log.Printf("Collection '%s' dropped successfully", collectionName) | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix incomplete error handling in overwrite loop.
Line 132 uses a bare return when checking collection existence fails, which exits the entire overwrite loop early. This leaves the system in an inconsistent state where some collections may be dropped but others are never checked.
Apply this diff to ensure consistent error handling:
for _, collectionName := range collections {
collectionExists, err := client.CollectionExists(ctx, collectionName)
if err != nil {
- log.Printf("Error checking if collection %s exists: %v", collectionName, err)
- return
+ cleanupAndExit(client, "Error checking if collection %s exists: %v", collectionName, err)
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if *overwrite { | |
| for _, collectionName := range collections { | |
| collectionExists, err := client.CollectionExists(ctx, collectionName) | |
| if err != nil { | |
| cleanupAndExit(client, "Error dropping collection %s: %v", collectionName, err) | |
| log.Printf("Error checking if collection %s exists: %v", collectionName, err) | |
| return | |
| } | |
| log.Printf("Collection '%s' dropped successfully", collectionName) | |
| collectionExists = false | |
| } | |
| // Create collection if it doesn't exist | |
| if !collectionExists { | |
| createCollection(ctx, client, collectionName) | |
| } else { | |
| log.Printf("Using existing collection: %s", collectionName) | |
| if collectionExists { | |
| log.Printf("Collection %s exists and overwrite flag is set. Dropping collection...", collectionName) | |
| err = client.DeleteCollection(ctx, collectionName) | |
| if err != nil { | |
| cleanupAndExit(client, "Error dropping collection %s: %v", collectionName, err) | |
| } | |
| log.Printf("Collection '%s' dropped successfully", collectionName) | |
| } | |
| } | |
| } | |
| if *overwrite { | |
| for _, collectionName := range collections { | |
| collectionExists, err := client.CollectionExists(ctx, collectionName) | |
| if err != nil { | |
| cleanupAndExit(client, "Error checking if collection %s exists: %v", collectionName, err) | |
| } | |
| if collectionExists { | |
| log.Printf("Collection %s exists and overwrite flag is set. Dropping collection...", collectionName) | |
| err = client.DeleteCollection(ctx, collectionName) | |
| if err != nil { | |
| cleanupAndExit(client, "Error dropping collection %s: %v", collectionName, err) | |
| } | |
| log.Printf("Collection '%s' dropped successfully", collectionName) | |
| } | |
| } | |
| } |
🤖 Prompt for AI Agents
In cmd/import/main.go around lines 127 to 144, the overwrite loop currently uses
a bare `return` when CollectionExists returns an error which aborts the loop
without consistent cleanup; replace that bare `return` with a call to
`cleanupAndExit(client, "Error checking if collection %s exists: %v",
collectionName, err)` so the error is handled consistently (matching the other
error paths), ensuring proper cleanup and process exit when the existence check
fails.
| // ensureCollectionExists checks if a collection exists and creates it lazily if not. | ||
| func ensureCollectionExists(ctx context.Context, client *qdrant.Client, collectionName string) error { | ||
| // Check if we've already created this collection | ||
| collectionCreateLock.Lock() | ||
| if createdCollections[collectionName] { | ||
| collectionCreateLock.Unlock() | ||
| return nil | ||
| } | ||
| collectionCreateLock.Unlock() | ||
|
|
||
| // Check if collection exists in Qdrant | ||
| exists, err := client.CollectionExists(ctx, collectionName) | ||
| if err != nil { | ||
| return fmt.Errorf("error checking collection existence: %w", err) | ||
| } | ||
|
|
||
| if exists { | ||
| // Mark as created to avoid checking again | ||
| collectionCreateLock.Lock() | ||
| createdCollections[collectionName] = true | ||
| collectionCreateLock.Unlock() | ||
| log.Printf("Collection %s already exists, using it", collectionName) | ||
| return nil | ||
| } | ||
|
|
||
| // Collection doesn't exist, create it (thread-safe) | ||
| collectionCreateLock.Lock() | ||
| defer collectionCreateLock.Unlock() | ||
|
|
||
| // Double-check pattern: another goroutine might have created it | ||
| if createdCollections[collectionName] { | ||
| return nil | ||
| } | ||
|
|
||
| log.Printf("Creating collection: %s", collectionName) | ||
| createCollection(ctx, client, collectionName) | ||
| createdCollections[collectionName] = true | ||
| log.Printf("Collection %s created and marked as ready", collectionName) | ||
|
|
||
| return nil | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Critical: createCollection terminates program in concurrent context.
Line 300 calls createCollection which uses cleanupAndExit on errors, terminating the entire program. In this concurrent context with multiple worker goroutines, a collection creation failure will abruptly kill all workers mid-processing, potentially losing in-flight data.
Refactor to return errors gracefully:
Step 1: Modify createCollection to return an error instead of calling cleanupAndExit:
-func createCollection(ctx context.Context, client *qdrant.Client, collectionName string) {
+func createCollection(ctx context.Context, client *qdrant.Client, collectionName string) error {
log.Printf("Creating language-based collection with named vectors: %s", collectionName)
// ... rest of function ...
})
if err != nil {
- cleanupAndExit(client, "Error creating collection %s: %v", collectionName, err)
+ return fmt.Errorf("error creating collection %s: %w", collectionName, err)
}
log.Printf("Collection '%s' with named vectors created successfully", collectionName)
+ return nil
}Step 2: Update Line 300 to handle the error:
log.Printf("Creating collection: %s", collectionName)
- createCollection(ctx, client, collectionName)
+ if err := createCollection(ctx, client, collectionName); err != nil {
+ return err
+ }
createdCollections[collectionName] = trueThis allows worker goroutines to handle creation failures gracefully and report errors through the errorsChan mechanism already in place (lines 171, 187).
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| // ensureCollectionExists checks if a collection exists and creates it lazily if not. | |
| func ensureCollectionExists(ctx context.Context, client *qdrant.Client, collectionName string) error { | |
| // Check if we've already created this collection | |
| collectionCreateLock.Lock() | |
| if createdCollections[collectionName] { | |
| collectionCreateLock.Unlock() | |
| return nil | |
| } | |
| collectionCreateLock.Unlock() | |
| // Check if collection exists in Qdrant | |
| exists, err := client.CollectionExists(ctx, collectionName) | |
| if err != nil { | |
| return fmt.Errorf("error checking collection existence: %w", err) | |
| } | |
| if exists { | |
| // Mark as created to avoid checking again | |
| collectionCreateLock.Lock() | |
| createdCollections[collectionName] = true | |
| collectionCreateLock.Unlock() | |
| log.Printf("Collection %s already exists, using it", collectionName) | |
| return nil | |
| } | |
| // Collection doesn't exist, create it (thread-safe) | |
| collectionCreateLock.Lock() | |
| defer collectionCreateLock.Unlock() | |
| // Double-check pattern: another goroutine might have created it | |
| if createdCollections[collectionName] { | |
| return nil | |
| } | |
| log.Printf("Creating collection: %s", collectionName) | |
| createCollection(ctx, client, collectionName) | |
| createdCollections[collectionName] = true | |
| log.Printf("Collection %s created and marked as ready", collectionName) | |
| return nil | |
| } | |
| // ensureCollectionExists checks if a collection exists and creates it lazily if not. | |
| func ensureCollectionExists(ctx context.Context, client *qdrant.Client, collectionName string) error { | |
| // Check if we've already created this collection | |
| collectionCreateLock.Lock() | |
| if createdCollections[collectionName] { | |
| collectionCreateLock.Unlock() | |
| return nil | |
| } | |
| collectionCreateLock.Unlock() | |
| // Check if collection exists in Qdrant | |
| exists, err := client.CollectionExists(ctx, collectionName) | |
| if err != nil { | |
| return fmt.Errorf("error checking collection existence: %w", err) | |
| } | |
| if exists { | |
| // Mark as created to avoid checking again | |
| collectionCreateLock.Lock() | |
| createdCollections[collectionName] = true | |
| collectionCreateLock.Unlock() | |
| log.Printf("Collection %s already exists, using it", collectionName) | |
| return nil | |
| } | |
| // Collection doesn't exist, create it (thread-safe) | |
| collectionCreateLock.Lock() | |
| defer collectionCreateLock.Unlock() | |
| // Double-check pattern: another goroutine might have created it | |
| if createdCollections[collectionName] { | |
| return nil | |
| } | |
| log.Printf("Creating collection: %s", collectionName) | |
| if err := createCollection(ctx, client, collectionName); err != nil { | |
| return err | |
| } | |
| createdCollections[collectionName] = true | |
| log.Printf("Collection %s created and marked as ready", collectionName) | |
| return nil | |
| } |
🤖 Prompt for AI Agents
In cmd/import/main.go around lines 265 to 305, the call to createCollection at
line ~300 can terminate the whole process via cleanupAndExit which is unsafe in
a concurrent worker context; change createCollection to return an error instead
of exiting on failure, remove any direct calls to cleanupAndExit from it, and
update this function to call createCollection(ctx, client, collectionName) and
check the returned error; if non-nil, release locks as appropriate and forward
the error to the worker error channel (errorsChan) or return the error up to the
caller so the worker goroutine can handle/report it gracefully rather than
killing the process.
Summary by CodeRabbit
New Features
Refactor
Chores