fix: create collections lazily, remove qdrant config, update docker compose #26

matiasdaloia · 2025-10-14T11:40:26Z

Summary by CodeRabbit

New Features
- Lazy, thread-safe creation of Qdrant collections: each collection is created automatically on first insert/import.
- Overwrite option now drops existing collections individually with clearer feedback.
Refactor
- Updated collection defaults (fewer shards/segments), enabled quantization, and added payload indexing.
- Maintains post-import HNSW indexing with the new lazy creation flow.
- Enhanced logging for collection lifecycle events.
Chores
- Removed external Qdrant config file and its Docker bind mount to simplify setup.


coderabbitai · 2025-10-14T11:40:52Z

Walkthrough

Implements lazy, thread-safe Qdrant collection creation in the import tool, revises overwrite handling to drop existing collections conditionally, ensures collections before upserts, updates collection config (shards, segments, quantization, payload indexing), and removes external Qdrant config by deleting the compose volume bind and the qdrant-config.yaml file.

Changes

Cohort / File(s)	Summary of changes
Import tool: lazy collection management and overwrite flow `cmd/import/main.go`	Adds ensureCollectionExists with double-checked locking and in-memory cache; switches insert/import paths to lazy creation; updates overwrite flag behavior to drop existing collections iteratively; adjusts createCollection settings (reduced shards/segments, quantization, payload indexing); logging updates.
Container orchestration: config bind removal `docker-compose.qdrant.yml`	Removes read-only bind mount of host Qdrant config (`./qdrant-config.yaml:/qdrant/config/production.yaml:ro`) from the qdrant service.
Qdrant config: file removal `qdrant-config.yaml`	Deletes the configuration file entirely; no replacement content.

Sequence Diagram(s)

```mermaid
sequenceDiagram
  autonumber
  actor User
  participant CLI as Import CLI
  participant QC as Qdrant Client
  participant QS as Qdrant Server

  User->>CLI: Run import
  rect rgba(230,247,255,0.5)
    note over CLI: For each target collection during insert
    CLI->>CLI: ensureCollectionExists(name)
    alt Not cached
      CLI->>QS: Check collection exists
      opt Missing
        CLI->>QS: Create collection (shards, segments, quantization, payload indexes)
      end
      CLI->>CLI: Cache created/exists
    else Cached
      CLI->>CLI: Skip check/create
    end
  end
  CLI->>QC: Upsert points
  QC->>QS: Upsert request
  QS-->>QC: Ack
  QC-->>CLI: Result
```

```mermaid
sequenceDiagram
  autonumber
  actor User
  participant CLI as Import CLI
  participant QS as Qdrant Server

  User->>CLI: Run import -overwrite
  note over CLI: For each collection in scope
  loop Collections
    CLI->>QS: Check exists
    alt Exists
      CLI->>QS: Drop collection
      QS-->>CLI: Drop result
    else Not found
      CLI->>CLI: No action (lazy create later)
    end
  end
  note over CLI: Proceed with lazy creation during inserts
```

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

I hop through shards with gentle grace,
Creating homes only when in place.
If overwrite’s moon is bright, I sweep—
Old nests gone, new roots run deep.
No config scrolls, just burrow lore—
Lazily built, then indexed for more.
Thump-thump, imports soar! 🐇✨

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check	✅ Passed	The title clearly and accurately summarizes the main changes—introducing lazy creation of collections, removing the Qdrant config file, and updating the Docker Compose setup—in a concise, descriptive sentence.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.


coderabbitai

Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 05320e7 and 27ce608.

📒 Files selected for processing (3)

cmd/import/main.go (5 hunks)
docker-compose.qdrant.yml (0 hunks)
qdrant-config.yaml (0 hunks)

💤 Files with no reviewable changes (2)

docker-compose.qdrant.yml
qdrant-config.yaml

🔇 Additional comments (5)

cmd/import/main.go (5)

58-61: LGTM! Proper concurrent state management.

The global map and mutex are correctly initialized for tracking created collections in a thread-safe manner. This is an appropriate pattern for lazy initialization in a CLI tool with concurrent workers.
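As a minimal sketch of the same idea, the map and mutex can be wrapped in one small type so that every access is guarded. The type and method names below (`collectionCache`, `markReady`, `isReady`) and the collection name `go_files` are hypothetical; the PR itself uses a package-level map and mutex.

```go
package main

import (
	"fmt"
	"sync"
)

// collectionCache is a hypothetical wrapper around the map + mutex pair the
// review describes; it records which collections are known to exist.
type collectionCache struct {
	mu    sync.Mutex
	ready map[string]bool
}

func newCollectionCache() *collectionCache {
	return &collectionCache{ready: make(map[string]bool)}
}

// isReady reports whether name has already been marked as created/existing.
func (c *collectionCache) isReady(name string) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.ready[name]
}

// markReady records name and returns true only for the first caller, so it
// is safe to call from many goroutines at once.
func (c *collectionCache) markReady(name string) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.ready[name] {
		return false
	}
	c.ready[name] = true
	return true
}

func main() {
	cache := newCollectionCache()
	var wg sync.WaitGroup
	var countMu sync.Mutex
	firsts := 0
	// Eight goroutines race to mark the same collection; exactly one wins.
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if cache.markReady("go_files") {
				countMu.Lock()
				firsts++
				countMu.Unlock()
			}
		}()
	}
	wg.Wait()
	fmt.Println(firsts, cache.isReady("go_files"), cache.isReady("rust_files"))
}
```

Encapsulating the lock with the map makes it harder for a future caller to read the map without holding the mutex, which is the main risk with a bare global pair.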

349-352: LGTM! Optimized shard and segment configuration.

Reducing ShardNumber to 2 and DefaultSegmentNumber to 4 should help limit memory spikes during bulk import, in line with the PR's goal of reducing resource usage.

357-363: LGTM! Binary quantization for memory efficiency.

Adding binary quantization with AlwaysRam keeps quantized vectors in memory for fast access while reducing memory footprint. This is an appropriate optimization for vector search workloads.
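To put the memory saving in numbers: binary quantization stores one bit per dimension instead of a 4-byte float32, so the quantized copy kept in RAM is roughly 32 times smaller. The sketch below does that arithmetic; the 768-dimension vector size and one million points are hypothetical values, since the PR does not state them, and HNSW graph and payload overhead are ignored.

```go
package main

import "fmt"

// float32Bytes is the raw size of one dense float32 vector: 4 bytes per dimension.
func float32Bytes(dim int64) int64 { return dim * 4 }

// binaryBytes is the size after binary quantization: 1 bit per dimension,
// rounded up to whole bytes.
func binaryBytes(dim int64) int64 { return (dim + 7) / 8 }

func main() {
	const dim int64 = 768          // hypothetical embedding size; the PR does not state it
	const points int64 = 1_000_000 // hypothetical number of points in one collection
	full := float32Bytes(dim) * points
	bin := binaryBytes(dim) * points
	fmt.Printf("float32: %d MiB, binary: %d MiB (%dx smaller)\n", full>>20, bin>>20, full/bin)
}
```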

370-398: LGTM! Payload indexing improves query performance.

Creating indexes on purl, version, url, category, and rank fields enables efficient filtering and grouping operations. The error handling with warning logs is appropriate for index creation.

665-672: LGTM! Lazy creation correctly implemented.

Collections are created on-demand before the first upsert, preventing unnecessary memory allocation for unused language collections. The error handling properly propagates failures. This approach aligns with the PR objective to prevent memory spikes.

coderabbitai · 2025-10-14T11:44:52Z

cmd/import/main.go

```diff
+	if *overwrite {
+		for _, collectionName := range collections {
+			collectionExists, err := client.CollectionExists(ctx, collectionName)
 			if err != nil {
-				cleanupAndExit(client, "Error dropping collection %s: %v", collectionName, err)
+				log.Printf("Error checking if collection %s exists: %v", collectionName, err)
+				return
 			}
-			log.Printf("Collection '%s' dropped successfully", collectionName)
-			collectionExists = false
-		}

-		// Create collection if it doesn't exist
-		if !collectionExists {
-			createCollection(ctx, client, collectionName)
-		} else {
-			log.Printf("Using existing collection: %s", collectionName)
+			if collectionExists {
+				log.Printf("Collection %s exists and overwrite flag is set. Dropping collection...", collectionName)
+				err = client.DeleteCollection(ctx, collectionName)
+				if err != nil {
+					cleanupAndExit(client, "Error dropping collection %s: %v", collectionName, err)
+				}
+				log.Printf("Collection '%s' dropped successfully", collectionName)
+			}
 		}
 	}
```


⚠️ Potential issue | 🟠 Major

Fix incomplete error handling in overwrite loop.

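Line 132 uses a bare `return` when the collection-existence check fails. That returns from the enclosing function rather than skipping one collection, so the overwrite loop stops partway through: some collections may already have been dropped while the rest are never checked, and the run ends without the cleanup and error exit that the other error paths get from `cleanupAndExit`.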

Apply this diff to ensure consistent error handling:

```diff
 for _, collectionName := range collections {
 	collectionExists, err := client.CollectionExists(ctx, collectionName)
 	if err != nil {
-		log.Printf("Error checking if collection %s exists: %v", collectionName, err)
-		return
+		cleanupAndExit(client, "Error checking if collection %s exists: %v", collectionName, err)
 	}
```

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

```go
if *overwrite {
	for _, collectionName := range collections {
		collectionExists, err := client.CollectionExists(ctx, collectionName)
		if err != nil {
			cleanupAndExit(client, "Error checking if collection %s exists: %v", collectionName, err)
		}

		if collectionExists {
			log.Printf("Collection %s exists and overwrite flag is set. Dropping collection...", collectionName)
			err = client.DeleteCollection(ctx, collectionName)
			if err != nil {
				cleanupAndExit(client, "Error dropping collection %s: %v", collectionName, err)
			}
			log.Printf("Collection '%s' dropped successfully", collectionName)
		}
	}
}
```

🤖 Prompt for AI Agents

In cmd/import/main.go around lines 127 to 144, the overwrite loop currently uses a bare `return` when CollectionExists returns an error which aborts the loop without consistent cleanup; replace that bare `return` with a call to `cleanupAndExit(client, "Error checking if collection %s exists: %v", collectionName, err)` so the error is handled consistently (matching the other error paths), ensuring proper cleanup and process exit when the existence check fails.

coderabbitai · 2025-10-14T11:44:53Z

cmd/import/main.go

```diff
+// ensureCollectionExists checks if a collection exists and creates it lazily if not.
+func ensureCollectionExists(ctx context.Context, client *qdrant.Client, collectionName string) error {
+	// Check if we've already created this collection
+	collectionCreateLock.Lock()
+	if createdCollections[collectionName] {
+		collectionCreateLock.Unlock()
+		return nil
+	}
+	collectionCreateLock.Unlock()
+
+	// Check if collection exists in Qdrant
+	exists, err := client.CollectionExists(ctx, collectionName)
+	if err != nil {
+		return fmt.Errorf("error checking collection existence: %w", err)
+	}
+
+	if exists {
+		// Mark as created to avoid checking again
+		collectionCreateLock.Lock()
+		createdCollections[collectionName] = true
+		collectionCreateLock.Unlock()
+		log.Printf("Collection %s already exists, using it", collectionName)
+		return nil
+	}
+
+	// Collection doesn't exist, create it (thread-safe)
+	collectionCreateLock.Lock()
+	defer collectionCreateLock.Unlock()
+
+	// Double-check pattern: another goroutine might have created it
+	if createdCollections[collectionName] {
+		return nil
+	}
+
+	log.Printf("Creating collection: %s", collectionName)
+	createCollection(ctx, client, collectionName)
+	createdCollections[collectionName] = true
+	log.Printf("Collection %s created and marked as ready", collectionName)
+
+	return nil
+}
```


⚠️ Potential issue | 🔴 Critical

Critical: createCollection terminates program in concurrent context.

Line 300 calls createCollection which uses cleanupAndExit on errors, terminating the entire program. In this concurrent context with multiple worker goroutines, a collection creation failure will abruptly kill all workers mid-processing, potentially losing in-flight data.

Refactor to return errors gracefully:

Step 1: Modify createCollection to return an error instead of calling cleanupAndExit:

```diff
-func createCollection(ctx context.Context, client *qdrant.Client, collectionName string) {
+func createCollection(ctx context.Context, client *qdrant.Client, collectionName string) error {
 	log.Printf("Creating language-based collection with named vectors: %s", collectionName)
 	// ... rest of function ...
 	})
 	if err != nil {
-		cleanupAndExit(client, "Error creating collection %s: %v", collectionName, err)
+		return fmt.Errorf("error creating collection %s: %w", collectionName, err)
 	}
 	log.Printf("Collection '%s' with named vectors created successfully", collectionName)
+	return nil
 }
```

Step 2: Update Line 300 to handle the error:

```diff
 	log.Printf("Creating collection: %s", collectionName)
-	createCollection(ctx, client, collectionName)
+	if err := createCollection(ctx, client, collectionName); err != nil {
+		return err
+	}
 	createdCollections[collectionName] = true
```

This allows worker goroutines to handle creation failures gracefully and report errors through the errorsChan mechanism already in place (lines 171, 187).

📝 Committable suggestion


Suggested change

```go
// ensureCollectionExists checks if a collection exists and creates it lazily if not.
func ensureCollectionExists(ctx context.Context, client *qdrant.Client, collectionName string) error {
	// Check if we've already created this collection
	collectionCreateLock.Lock()
	if createdCollections[collectionName] {
		collectionCreateLock.Unlock()
		return nil
	}
	collectionCreateLock.Unlock()

	// Check if collection exists in Qdrant
	exists, err := client.CollectionExists(ctx, collectionName)
	if err != nil {
		return fmt.Errorf("error checking collection existence: %w", err)
	}

	if exists {
		// Mark as created to avoid checking again
		collectionCreateLock.Lock()
		createdCollections[collectionName] = true
		collectionCreateLock.Unlock()
		log.Printf("Collection %s already exists, using it", collectionName)
		return nil
	}

	// Collection doesn't exist, create it (thread-safe)
	collectionCreateLock.Lock()
	defer collectionCreateLock.Unlock()

	// Double-check pattern: another goroutine might have created it
	if createdCollections[collectionName] {
		return nil
	}

	log.Printf("Creating collection: %s", collectionName)
	if err := createCollection(ctx, client, collectionName); err != nil {
		return err
	}
	createdCollections[collectionName] = true
	log.Printf("Collection %s created and marked as ready", collectionName)

	return nil
}
```

🤖 Prompt for AI Agents

In cmd/import/main.go around lines 265 to 305, the call to createCollection at line ~300 can terminate the whole process via cleanupAndExit which is unsafe in a concurrent worker context; change createCollection to return an error instead of exiting on failure, remove any direct calls to cleanupAndExit from it, and update this function to call createCollection(ctx, client, collectionName) and check the returned error; if non-nil, release locks as appropriate and forward the error to the worker error channel (errorsChan) or return the error up to the caller so the worker goroutine can handle/report it gracefully rather than killing the process.

Commit 27ce608: fix: create collections lazily, remove qdrant config, update docker compose

matiasdaloia self-assigned this Oct 14, 2025

coderabbitai bot reviewed Oct 14, 2025


matiasdaloia merged commit 0c8d1ad into main Oct 14, 2025
3 checks passed

matiasdaloia deleted the fix/mdaloia/import-script branch October 14, 2025 11:47

coderabbitai bot mentioned this pull request Oct 14, 2025:
- fix: refactor import script, dynamically calculate num of workers, add progress tracking #27 (Merged)

This was referenced Oct 22, 2025:
- chore: enhance import script, add qdrant's docker compose to release artifacts #28 (Merged)
- chore: update docker compose file name #29 (Merged)

2 participants
