LlamaIndex: Add example using MCP #1032
Conversation
## Walkthrough
The changes add two new demonstration scripts for querying CrateDB using LlamaIndex with natural language SQL and MCP interfaces, update environment files to export variables, introduce new dependencies, enhance the README with setup and usage details, revise tests to run the new demos, and extend the CI workflow to include MCP service testing.
## Changes
| File(s) | Change Summary |
|------------------------------------|------------------------------------------------------------------------------------------------------------------------------|
| README.md | Updated title and intro; added MCP usage instructions; changed example run command from `python main.py` to `python demo_nlsql.py`; improved formatting and added external links. |
| boot.py | Added new `configure_llm()` function to initialize LLM and embedding models based on environment variables for OpenAI/Azure. |
| demo_nlsql.py, demo_mcp.py | Added new demo scripts illustrating querying CrateDB with LlamaIndex via natural language SQL and MCP protocol respectively. |
| env.azure, env.standalone | Added `export` keyword to all environment variable declarations for proper shell export. |
| requirements.txt | Added `cratedb-about` (version 0.0.6) and `llama-index-tools-mcp` (<0.3) dependencies. |
| test.py | Renamed `test_main` to `test_nlsql` to test `demo_nlsql.py`; added `test_mcp` to test `demo_mcp.py`, both asserting expected output. |
| .github/workflows/ml-llamaindex.yml | Updated Python version matrix; added `cratedb-mcp` service and matrix dimension for MCP version in CI workflow. |
| main.py | Deleted legacy script replaced by new demos. |
## Sequence Diagram(s)
```mermaid
sequenceDiagram
    participant User
    participant DemoScript as demo_nlsql.py / demo_mcp.py
    participant LLM as LlamaIndex LLM
    participant CrateDB
    User->>DemoScript: Run main()
    DemoScript->>LLM: Send natural language query
    LLM->>CrateDB: Generate and execute SQL query
    CrateDB-->>LLM: Return query result
    LLM-->>DemoScript: Return answer
    DemoScript-->>User: Print answer and metadata
```

Estimated code review effort: 3 (~45 minutes)
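As a rough illustration of the NL-to-SQL leg of this flow, a minimal LlamaIndex sketch might look as follows. This is not the actual `demo_nlsql.py`; the connection URL and table name are assumptions, while the question string is taken from the review excerpts below.

```python
# Minimal sketch of the natural-language-to-SQL flow from the diagram above.
# Assumptions: CrateDB reachable on localhost, a table named "time_series_data",
# and OPENAI_API_KEY already present in the environment.
from sqlalchemy import create_engine
from llama_index.core import SQLDatabase
from llama_index.core.query_engine import NLSQLTableQueryEngine

engine = create_engine("crate://localhost:4200")  # via the sqlalchemy-cratedb dialect
sql_database = SQLDatabase(engine, include_tables=["time_series_data"])

query_engine = NLSQLTableQueryEngine(sql_database=sql_database, tables=["time_series_data"])
response = query_engine.query("What is the average value for sensor 1?")

print(f"Answer was: {response}")  # natural-language answer from the LLM
print(response.metadata)          # includes the SQL statement that was generated
```

The MCP variant replaces the query engine with an agent that obtains its tools from the CrateDB MCP Server; a corresponding sketch appears further down alongside the `FunctionAgent` excerpt.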
Actionable comments posted: 7
🔭 Outside diff range comments (1)
topic/machine-learning/llama-index/test.py (1)
43-58: Add connection error handling for MCP test. The test fails due to MCP server connection issues as indicated in the pipeline failures. The test should handle connection failures gracefully or skip when the MCP server is unavailable.

```diff
 def test_mcp(cratedb, capsys):
     """
     Execute `demo_mcp.py` and verify outcome.
     """
     # Load the standalone configuration also for software testing.
     # On CI, `OPENAI_API_KEY` will need to be supplied externally.
     load_dotenv("env.standalone")

-    # Invoke the workload, in-process.
-    from demo_mcp import main
-    main()
+    # Invoke the workload, in-process.
+    try:
+        from demo_mcp import main
+        main()
+    except Exception as e:
+        # Skip test if MCP server is not available
+        if "connection" in str(e).lower() or "connect" in str(e).lower():
+            pytest.skip(f"MCP server not available: {e}")
+        raise

     # Verify the outcome.
     out = capsys.readouterr().out
     assert "Answer was: The average value for sensor 1 is approximately 17.03." in out
```
🧹 Nitpick comments (3)
topic/machine-learning/llama-index/README.md (1)
95-95: Update reflects new demo structure correctly. The command update from `python main.py` to `python demo_nlsql.py` correctly reflects the new demo structure. Consider documenting both available demos since the PR introduces multiple examples:

```diff
 python demo_nlsql.py
+
+# Alternative: Run the MCP server demo
+python demo_mcp.py
```

topic/machine-learning/llama-index/demo_nlsql.py (1)
41-41: Consider making the query configurable. The query string is hardcoded, which limits the demo's flexibility for different use cases.

```diff
-QUERY_STR = "What is the average value for sensor 1?"
+QUERY_STR = os.getenv("DEMO_QUERY", "What is the average value for sensor 1?")
```

topic/machine-learning/llama-index/demo_mcp.py (1)
49-49: Consider making the LLM model configurable. The GPT-4o model is hardcoded, which may not be suitable for all environments or cost requirements.

```diff
+import os
+
 async def get_agent(self):
     return FunctionAgent(
         name="Agent",
         description="CrateDB text-to-SQL agent",
-        llm=OpenAI(model="gpt-4o"),
+        llm=OpenAI(model=os.getenv("OPENAI_MODEL", "gpt-4o")),
         tools=await self.get_tools(),
         system_prompt=Instructions.full(),
     )
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
- topic/machine-learning/llama-index/README.md (1 hunks)
- topic/machine-learning/llama-index/boot.py (1 hunks)
- topic/machine-learning/llama-index/demo_mcp.py (1 hunks)
- topic/machine-learning/llama-index/demo_nlsql.py (1 hunks)
- topic/machine-learning/llama-index/env.azure (1 hunks)
- topic/machine-learning/llama-index/env.standalone (1 hunks)
- topic/machine-learning/llama-index/requirements.txt (1 hunks)
- topic/machine-learning/llama-index/test.py (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
topic/machine-learning/llama-index/requirements.txt (1)
Learnt from: amotl
PR: crate/cratedb-examples#937
File: topic/machine-learning/llm-langchain/requirements-dev.txt:2-2
Timestamp: 2025-05-12T20:10:38.614Z
Learning: The cratedb-toolkit package supports various extras including "io", "datasets", "influxdb", "mongodb", "testing", and many others.
🧬 Code Graph Analysis (2)
topic/machine-learning/llama-index/test.py (2)
- topic/machine-learning/llama-index/demo_nlsql.py (1): `main` (17-46)
- topic/machine-learning/llama-index/demo_mcp.py (1): `main` (61-78)

topic/machine-learning/llama-index/demo_nlsql.py (1)
- topic/machine-learning/llama-index/boot.py (1): `configure_llm` (12-47)
🪛 GitHub Actions: LlamaIndex
topic/machine-learning/llama-index/requirements.txt
[error] 1-1: Failed to get requirements to build wheel for cratedb-about package due to invalid pyproject.toml license configuration.
topic/machine-learning/llama-index/demo_mcp.py
[error] 43-76: Runtime error in async tool fetching and query execution: HTTP connection attempts failed, resulting in unhandled exception group and test failure.
topic/machine-learning/llama-index/test.py
[error] 54-54: Test failure in test_mcp: ConnectError 'All connection attempts failed' during async HTTP request in mcp client, causing test to fail.
🪛 Gitleaks (8.27.2)
topic/machine-learning/llama-index/demo_mcp.py
16-16: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.
(generic-api-key)
🔇 Additional comments (5)
topic/machine-learning/llama-index/requirements.txt (1)
7-7: LGTM: Appropriate version constraint for MCP tools. The version constraint `<0.3` for `llama-index-tools-mcp` is appropriate for ensuring compatibility with the demo while allowing patch updates.

topic/machine-learning/llama-index/boot.py (1)
1-48: LGTM: Clean refactoring preserves essential LLM configuration. The refactoring successfully extracts the main application logic while preserving the essential `configure_llm` function and its required imports. This modular approach allows the new demo scripts (`demo_nlsql.py` and `demo_mcp.py`) to reuse the LLM configuration logic.

The remaining code is well-structured and maintains support for both OpenAI and Azure OpenAI configurations.
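For readers who haven't opened `boot.py`, the pattern described here could look roughly like the sketch below; every environment variable name, deployment name, and model identifier in it is an assumption for illustration, not copied from the actual file.

```python
# Illustrative sketch only -- not the actual boot.py. It demonstrates the idea of
# selecting OpenAI vs. Azure OpenAI based on environment variables and wiring the
# result into LlamaIndex's global Settings, so the demo scripts can simply call it.
import os

from llama_index.core import Settings
from llama_index.embeddings.azure_openai import AzureOpenAIEmbedding
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.azure_openai import AzureOpenAI
from llama_index.llms.openai import OpenAI


def configure_llm() -> None:
    """Configure LLM and embedding models from environment variables."""
    azure_endpoint = os.getenv("OPENAI_AZURE_ENDPOINT")  # hypothetical variable name
    if azure_endpoint:
        Settings.llm = AzureOpenAI(
            engine=os.environ["OPENAI_AZURE_DEPLOYMENT"],        # hypothetical
            azure_endpoint=azure_endpoint,
            api_key=os.environ["OPENAI_AZURE_API_KEY"],          # hypothetical
            api_version=os.environ["OPENAI_AZURE_API_VERSION"],  # hypothetical
        )
        Settings.embed_model = AzureOpenAIEmbedding(
            model="text-embedding-ada-002",
            deployment_name=os.environ["OPENAI_AZURE_EMBEDDING_DEPLOYMENT"],  # hypothetical
            azure_endpoint=azure_endpoint,
            api_key=os.environ["OPENAI_AZURE_API_KEY"],
            api_version=os.environ["OPENAI_AZURE_API_VERSION"],
        )
    else:
        Settings.llm = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
        Settings.embed_model = OpenAIEmbedding(api_key=os.environ["OPENAI_API_KEY"])
```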
topic/machine-learning/llama-index/env.azure (1)
1-8: Excellent improvement: Export keywords ensure proper environment variable availability. Adding `export` keywords to all environment variable declarations ensures they are properly available to subprocesses and the Python demo scripts. This is a best practice for shell environment configuration files.
2-4: Consistent improvement: Export keywords properly applied. The addition of `export` keywords is consistent with the `env.azure` file changes and correctly ensures environment variables are available to the demo scripts. The commented `OPENAI_API_KEY` line appropriately remains unchanged.
25-40: LGTM! Test structure follows good practices. The test properly loads configuration, executes the demo in-process, and verifies expected output.
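As an aside for readers unfamiliar with this pattern, an in-process demo test of this shape might look roughly like the following; the fixture names and the asserted prefix mirror the excerpts quoted in this review, everything else is illustrative.

```python
# Minimal sketch of the in-process test pattern described above.
# Assumes a `cratedb` fixture that provisions the database and that
# OPENAI_API_KEY is supplied externally on CI.
from dotenv import load_dotenv


def test_nlsql(cratedb, capsys):
    """Execute `demo_nlsql.py` and verify the outcome."""
    # Load the standalone configuration also for software testing.
    load_dotenv("env.standalone")

    # Invoke the workload, in-process.
    from demo_nlsql import main
    main()

    # Verify the outcome.
    out = capsys.readouterr().out
    assert "Answer was:" in out
```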
Actionable comments posted: 2
🧹 Nitpick comments (1)
.github/workflows/ml-llamaindex.yml (1)
49-51: Hard-coding a PR tag (`pr-50`) makes the workflow fragile. Building against a transient image tag tied to another PR means nightly and manual runs will suddenly break once that PR is rebased or the tag is removed. Consider defaulting to a released version (e.g. `main` or a semver tag) and overriding it only in the rare jobs that explicitly need the PR build.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- .github/workflows/ml-llamaindex.yml (2 hunks)
🧰 Additional context used
🧠 Learnings (1)
.github/workflows/ml-llamaindex.yml (1)
Learnt from: amotl
PR: crate/cratedb-examples#937
File: topic/machine-learning/llm-langchain/requirements-dev.txt:2-2
Timestamp: 2025-05-12T20:10:38.614Z
Learning: The cratedb-toolkit package supports various extras including "io", "datasets", "influxdb", "mongodb", "testing", and many others.
b51777e to 26648cb
```yaml
cratedb-mcp-version: [
  'pr-50',
]
```
This needs to be adjusted after the next release of cratedb-mcp.
Resolved with a43ee4e.
Actionable comments posted: 0
🧹 Nitpick comments (5)
topic/machine-learning/llama-index/README.md (5)
63-66: Add language identifier to the venv code block. Markdown-lint flags this fence (MD040). Adding the language (`shell`) improves syntax highlighting and keeps the doc lint-clean.

````diff
-```
+```shell
 python3 -m venv .venv
 source .venv/bin/activate
````

105-105: Specify a language for the "expected output" block. Same MD040 issue here. Use `text` (or `console`) to satisfy the linter and give readers proper formatting.

````diff
-```
+```text
````

149-149: Missing language identifier in second output block. Repeat of the previous comment for the MCP expected output section.

````diff
-```
+```text
````
81-89: Remove superfluous quotes around the connection string. `.env` parsers typically treat surrounding quotes as literal characters, breaking the URL. Safer to drop them so copy-pasting "just works".

```diff
-CRATEDB_SQLALCHEMY_URL="crate://<Database user name>:<Database password>@<Database host>:4200/?ssl=true"
+CRATEDB_SQLALCHEMY_URL=crate://<Database user name>:<Database password>@<Database host>:4200/?ssl=true
```
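For context, the demo scripts presumably consume this URL through SQLAlchemy. A minimal sketch of that usage follows; the variable handling and the test query are illustrative, not taken from the examples themselves.

```python
# Sketch of how CRATEDB_SQLALCHEMY_URL is typically consumed (illustrative).
# If the value carries literal surrounding quotes, create_engine() cannot parse
# the URL, which is why the review suggests dropping them in the .env file.
import os
import sqlalchemy as sa

url = os.environ["CRATEDB_SQLALCHEMY_URL"]   # e.g. crate://user:password@host:4200/?ssl=true
engine = sa.create_engine(url)               # requires the sqlalchemy-cratedb dialect

with engine.connect() as connection:
    result = connection.execute(sa.text("SELECT 42"))
    print(result.scalar_one())
```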
95-97: Minor grammar / consistency tweak for the link label. "Apostrophe-s" reads better and matches earlier "NL2SQL" terminology.

```diff
-### NLSQL
-[LlamaIndex' NLSQLTableQueryEngine] is a natural language SQL table query engine.
+### NL2SQL
+[LlamaIndex's NLSQLTableQueryEngine] is a natural-language SQL table query engine.
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- .github/workflows/ml-llamaindex.yml (2 hunks)
- topic/machine-learning/llama-index/README.md (3 hunks)
- topic/machine-learning/llama-index/demo_mcp.py (1 hunks)
- topic/machine-learning/llama-index/demo_nlsql.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
- topic/machine-learning/llama-index/demo_nlsql.py
- .github/workflows/ml-llamaindex.yml
- topic/machine-learning/llama-index/demo_mcp.py
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: amotl
PR: crate/cratedb-examples#1032
File: topic/machine-learning/llama-index/demo_nlsql.py:28-29
Timestamp: 2025-07-20T00:14:38.691Z
Learning: In demonstration and example code within the cratedb-examples repository, prefer simpler code without extensive error handling to maintain clarity and readability of the examples.
🪛 markdownlint-cli2 (0.17.2)
topic/machine-learning/llama-index/README.md
149-149: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🔇 Additional comments (1)
topic/machine-learning/llama-index/README.md (1)
136-142: Clarify `uvx` prerequisite. `uvx` isn't a standard tool; newcomers may hit "command not found". Please mention how to install it (e.g. `pipx install uvx` or `pip install cratedb-mcp[cli]`) or offer an alternative launcher.
```python
async def get_agent(self):
    return FunctionAgent(
        name="Agent",
        description="CrateDB text-to-SQL agent",
        llm=OpenAI(model="gpt-4o"),
        tools=await self.get_tools(),
        system_prompt=Instructions.full(),
    )
```
Better to use the full instructions (general + MCP), including how to use the provided tools, after this patch has been merged.
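For context, the agent quoted above presumably gets its tools from the CrateDB MCP Server via `llama-index-tools-mcp` (`BasicMCPClient` / `McpToolSpec`). The sketch below shows one way that wiring could look; the endpoint URL and query string are assumptions, and the system prompt is a plain placeholder rather than `Instructions.full()` from `cratedb-about`.

```python
# Illustrative sketch of connecting a FunctionAgent to an MCP server using
# llama-index-tools-mcp; endpoint URL and query string are assumptions.
import asyncio

from llama_index.core.agent.workflow import FunctionAgent
from llama_index.llms.openai import OpenAI
from llama_index.tools.mcp import BasicMCPClient, McpToolSpec


async def main():
    # The CrateDB MCP Server is assumed to listen on this SSE endpoint.
    mcp_client = BasicMCPClient("http://localhost:8000/sse")
    tools = await McpToolSpec(client=mcp_client).to_tool_list_async()

    agent = FunctionAgent(
        name="Agent",
        description="CrateDB text-to-SQL agent",
        llm=OpenAI(model="gpt-4o"),
        tools=tools,
        system_prompt="Answer questions by querying CrateDB through the provided tools.",
    )
    response = await agent.run("What is the average value for sensor 1?")
    print(f"Answer was: {response}")


if __name__ == "__main__":
    asyncio.run(main())
```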
kneth left a comment
LGTM
About
For Text-to-SQL purposes, exercise the llama-index-tools-mcp package together with the CrateDB MCP Server, specifically to validate its OCI standard image published to GHCR.
References
- cratedb-mcp
- OCI: cratedb-mcp#50

Backlog