@icecrasher321
Collaborator

@icecrasher321 icecrasher321 commented Jul 27, 2025

Description

Adds natural-language tag names for knowledge base records, which can be used to pre-filter records before vector search.

System design:

  • Each natural-language tag maps onto an actual column (a "tag slot") in the docs / embeddings table.
  • Capped at 7 tags per KB for now; more columns can be added later.
  • Works through the KB Block and the Doc page.
  • As a result, a search can only target one KB at a time. For multi-KB search, use multiple KB blocks with the search tool.
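
The slot design above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: `TAG_SLOTS`, `TagDefinition`, and `assignTagSlot` are hypothetical names, and treating tag names as case-insensitive is an assumption made for the example.

```typescript
// Hypothetical sketch: names and behavior are assumptions, not the PR's actual code.
// The seven slot names mirror the tag1-tag7 columns mentioned in this PR.
const TAG_SLOTS = ['tag1', 'tag2', 'tag3', 'tag4', 'tag5', 'tag6', 'tag7'] as const
type TagSlot = (typeof TAG_SLOTS)[number]

interface TagDefinition {
  displayName: string // natural-language name, e.g. "Department"
  tagSlot: TagSlot // physical column the name is bound to
}

// Bind a natural-language tag name to the first free slot in one knowledge base.
// Returns the existing definition if the name is already bound (assumed case-insensitive).
function assignTagSlot(existing: TagDefinition[], displayName: string): TagDefinition {
  const duplicate = existing.find(
    (def) => def.displayName.toLowerCase() === displayName.toLowerCase()
  )
  if (duplicate) return duplicate

  const used = new Set<TagSlot>(existing.map((def) => def.tagSlot))
  const free = TAG_SLOTS.find((slot) => !used.has(slot))
  if (!free) {
    // Blocks going past the maximum number of tag slots.
    throw new Error(`All ${TAG_SLOTS.length} tag slots are in use for this knowledge base`)
  }
  return { displayName, tagSlot: free }
}
```

Because each name is bound to a real column, raising the cap in this sketch would mean adding columns and extending `TAG_SLOTS`, which matches the trade-off noted in the list above.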

Type of Change

  • New feature (non-breaking change which adds functionality)

How Has This Been Tested?

Tested adding and editing tags in the KB Block (Create Document tool), then searching with filters in the search tool.

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have added tests that prove my fix is effective or that my feature works
  • All tests pass locally and in CI (bun run test)
  • My changes generate no new warnings
  • Any dependent changes have been merged and published in downstream modules
  • I have updated version numbers as needed (if needed)
  • I confirm that I have read and agree to the terms outlined in the Contributor License Agreement (CLA)

Security Considerations:

  • My changes do not introduce any new security vulnerabilities
  • I have considered the security implications of my changes

@vercel

vercel bot commented Jul 27, 2025

The latest updates on your projects. Learn more about Vercel for Git ↗︎

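| Name | Status | Preview | Comments | Updated (UTC) |
| --- | --- | --- | --- | --- |
| sim | ✅ Ready (Inspect) | Visit Preview | 💬 Add feedback | Jul 29, 2025 1:51am |

1 Skipped Deployment

| Name | Status | Preview | Comments | Updated (UTC) |
| --- | --- | --- | --- | --- |
| docs | ⬜️ Skipped (Inspect) | | | Jul 29, 2025 1:51am |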

@vercel vercel bot temporarily deployed to Preview – docs July 29, 2025 00:56 Inactive
@icecrasher321 icecrasher321 marked this pull request as ready for review July 29, 2025 01:05
Contributor

@greptile-apps greptile-apps bot left a comment

Greptile Summary

This PR implements a comprehensive natural language tagging system for knowledge base documents that enables pre-filtering before vector search operations. The system allows users to create custom, human-readable tag names (like 'Department' or 'Priority') instead of generic numbered labels, while maintaining efficient search performance through actual database columns.

The core architecture maps natural language tag names to fixed database columns (tag1-tag7) through a new knowledge_base_tag_definitions table. This design choice provides good performance for filtering operations while capping the system at 7 tags per knowledge base. The implementation includes:

  • Database Layer: New migration adds knowledge_base_tag_definitions table and indexes tag columns on both document and embedding tables for efficient filtering
  • API Layer: New endpoints for managing tag definitions at the knowledge base level, with proper CRUD operations and cleanup utilities
  • Frontend Components: New React components (KnowledgeTagFilters, DocumentTagEntry, KnowledgeTagFilter) that integrate with the existing sub-block workflow system
  • Block Integration: Updated Knowledge block to use the new tag system, replacing individual tag1-tag7 inputs with dynamic tag management
  • Search Enhancement: Modified search functionality to support OR logic within tag groups and restrict to single knowledge base searches
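
As a rough illustration of the search behavior summarized above, the sketch below pre-filters chunks by tag before ranking them by vector similarity. It runs in memory rather than against a database, and all names (`Chunk`, `TagFilters`, `matchesFilters`, `searchWithTagFilters`) are hypothetical. Values within one tag are combined with OR, as described above; combining different tags with AND is an assumption of this sketch.

```typescript
// Hypothetical in-memory sketch; the real implementation would filter in the database.
type TagSlot = 'tag1' | 'tag2' | 'tag3' | 'tag4' | 'tag5' | 'tag6' | 'tag7'

interface Chunk {
  id: string
  embedding: number[]
  tags: Partial<Record<TagSlot, string>>
}

// Accepted values per slot. Values for one slot are OR-ed together.
type TagFilters = Partial<Record<TagSlot, string[]>>

function matchesFilters(chunk: Chunk, filters: TagFilters): boolean {
  for (const slot of Object.keys(filters) as TagSlot[]) {
    const accepted = filters[slot]
    if (!accepted || accepted.length === 0) continue // empty group filters nothing
    const value = chunk.tags[slot]
    // Assumption: every filtered slot must match (AND across slots).
    if (value === undefined || !accepted.includes(value)) return false
  }
  return true
}

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < Math.min(a.length, b.length); i++) {
    const x = a[i] ?? 0
    const y = b[i] ?? 0
    dot += x * y
    normA += x * x
    normB += y * y
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB)
  return denom === 0 ? 0 : dot / denom
}

// Apply the cheap tag filter first, then score only the surviving chunks.
function searchWithTagFilters(
  chunks: Chunk[],
  queryEmbedding: number[],
  filters: TagFilters,
  topK: number
): { id: string; score: number }[] {
  return chunks
    .filter((chunk) => matchesFilters(chunk, filters))
    .map((chunk) => ({ id: chunk.id, score: cosineSimilarity(chunk.embedding, queryEmbedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK)
}
```

Filtering before scoring keeps the similarity computation limited to chunks that already satisfy the tag constraints, which is the motivation for backing tags with indexed columns.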

The system maintains backward compatibility with existing tag workflows while providing a more intuitive user experience. Users can now define meaningful tag categories and filter knowledge base content using natural language terms before expensive vector similarity searches.
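
To make the natural-language filtering step concrete, here is a minimal sketch of translating user-facing tag names into slot-keyed filters before a search runs. `TagDefinition` and `resolveNamedFilters` are hypothetical names, and rejecting unknown tag names with an error (rather than ignoring them) is an assumption of this example.

```typescript
// Hypothetical sketch: resolves display names to tag slots using a KB's tag definitions.
interface TagDefinition {
  displayName: string // e.g. "Department"
  tagSlot: string // e.g. "tag1"
}

function resolveNamedFilters(
  definitions: TagDefinition[],
  namedFilters: Record<string, string[]>
): Record<string, string[]> {
  // Assumption: tag names are matched case-insensitively.
  const slotByName = new Map(
    definitions.map((def) => [def.displayName.toLowerCase(), def.tagSlot] as const)
  )
  const resolved: Record<string, string[]> = {}
  for (const [name, values] of Object.entries(namedFilters)) {
    const slot = slotByName.get(name.toLowerCase())
    if (!slot) {
      throw new Error(`Unknown tag name for this knowledge base: ${name}`)
    }
    resolved[slot] = values
  }
  return resolved
}
```

Since definitions live at the knowledge base level, the same display name could resolve to different slots in different knowledge bases, which is one plausible reason a single search is restricted to one knowledge base.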

Confidence score: 4/5

  • This PR introduces significant new functionality with proper architectural design and maintains backward compatibility
  • The implementation follows established patterns and includes comprehensive error handling and validation
  • Potential concerns include some type assertions, magic-number usage for ID generation, and the restriction to single knowledge base searches, which may impact existing workflows

27 files reviewed, 15 comments

@vercel vercel bot temporarily deployed to Preview – docs July 29, 2025 01:08 Inactive
@vercel vercel bot temporarily deployed to Preview – docs July 29, 2025 01:09 Inactive
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
@vercel vercel bot temporarily deployed to Preview – docs July 29, 2025 01:10 Inactive
@vercel vercel bot temporarily deployed to Preview – docs July 29, 2025 01:11 Inactive
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
@vercel vercel bot temporarily deployed to Preview – docs July 29, 2025 01:42 Inactive
@icecrasher321 icecrasher321 merged commit 5b1f948 into staging Jul 29, 2025
3 of 4 checks passed
@waleedlatif1 waleedlatif1 deleted the feat/kb-tags-natural-desc branch July 30, 2025 00:44
arenadeveloper02 pushed a commit to arenadeveloper02/p2-sim that referenced this pull request Sep 19, 2025
…ase searches (simstudioai#800)

* fix lint

* checkpoint

* works

* simplify

* checkpoint

* works

* fix lint

* checkpoint - create doc ui

* working block

* fix import conflicts

* fix tests

* add blockers to going past max tag slots

* remove console logs

* forgot a few

* Update apps/sim/tools/knowledge/search.ts

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* remove console.warn

* Update apps/sim/hooks/use-tag-definitions.ts

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* use tag slots consts in more places

* remove duplicate title

---------

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>