Create config.azure.yaml #1248
Configuration settings for Azure Embeddings as well as Azure LLM
Walkthrough
This pull request introduces a comprehensive Azure configuration file for the Wren AI service. The configuration integrates several AI components, including large language models (LLMs), embedders, document stores, and pipelines. It defines Azure-specific providers for the different service components and sets parameters for model deployment, API interactions, and AI-driven operational workflows.
Actionable comments posted: 1
🧹 Nitpick comments (2)
wren-ai-service/docs/config_examples/config.azure.yaml (2)
49-139: Consider optimizing the pipeline configuration
- Consider extracting common model references into YAML anchors to reduce redundancy and make updates easier.
- Consider adding error-handling configuration for critical pipelines (e.g., sql_generation, sql_correction).
Example optimization:
defaults: &defaults
  llm: litellm_llm.azure/gpt-4
  embedder: azure_openai_embedder.text-embedding-ada-002
  document_store: qdrant
pipes:
  - name: sql_generation
    <<: *defaults
    engine: wren_ui
🧰 Tools
🪛 YAMLlint (1.35.1)
[warning] 51-51: wrong indentation: expected 0 but found 2 (indentation)
[error] 112-112: trailing spaces (trailing-spaces)
1-151: Fix YAML formatting issues
The following formatting issues should be addressed:
- Remove trailing spaces (lines 10, 21, 25, 45, 112)
- Fix indentation (lines 23, 51)
🧰 Tools
🪛 YAMLlint (1.35.1)
[error] 10-10: trailing spaces (trailing-spaces)
[error] 21-21: trailing spaces (trailing-spaces)
[warning] 23-23: wrong indentation: expected 0 but found 2 (indentation)
[error] 25-25: trailing spaces (trailing-spaces)
[error] 45-45: trailing spaces (trailing-spaces)
[warning] 51-51: wrong indentation: expected 0 but found 2 (indentation)
[error] 112-112: trailing spaces (trailing-spaces)
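Both warning types correspond to standard yamllint rules (trailing-spaces and indentation). Below is a minimal sketch of a .yamllint file for checking the example locally. It is an assumed configuration, not one this repository or CodeRabbit is known to use. The indent-sequences: consistent setting assumes that the goal is uniform list indentation across the whole file.

```yaml
# .yamllint (hypothetical project-level configuration)
extends: default
rules:
  # Flag whitespace at the end of any line, including otherwise blank lines
  trailing-spaces: enable
  indentation:
    spaces: 2
    # Lists under a key must be either all indented or all flush with the key
    indent-sequences: consistent
```

Because this configuration extends the defaults, yamllint may also report findings from other default rules, such as line-length or document-start, in addition to the two discussed here.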
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
wren-ai-service/docs/config_examples/config.azure.yaml (1 hunk)
🔇 Additional comments (7)
wren-ai-service/docs/config_examples/config.azure.yaml (7)
6-6: Replace placeholder Azure endpoint URL
The configuration contains a placeholder Azure endpoint URL. Replace it with your actual Azure endpoint before deployment.
9-9: Consider using a stable API version
The configuration uses a preview API version (2024-02-15-preview). Consider using a stable version for production deployments to avoid potential breaking changes.
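Below is a minimal sketch of the LLM model entry with the placeholder-endpoint and API-version suggestions applied. The resource name in the URL is a placeholder. The provider line is inferred from the litellm_llm.azure/gpt-4 references in the pipeline section. The value 2024-06-01 is assumed here as an example of a generally available (non-preview) Azure OpenAI API version. Check Azure's API version documentation for the versions your resource currently supports.

```yaml
provider: litellm_llm  # inferred from the pipeline references
models:
- model: azure/gpt-4  # 'azure/' followed by your Azure deployment name
  # Placeholder: substitute your own Azure OpenAI resource endpoint
  api_base: https://<your-resource-name>.openai.azure.com/
  api_key_name: LLM_AZURE_OPENAI_API_KEY
  # Assumed GA version; quoted so YAML 1.1 loaders keep it a string, not a date
  api_version: "2024-06-01"
```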
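15-15: Verify max_tokens limit
max_tokens is set to 4096. The prompt and the completion share the model's context window, so depending on your deployment this may leave too little room for the prompt. For example, the original GPT-4 model has an 8,192-token window. Verify that this value matches your Azure deployment's limits.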
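Because the prompt and the completion share one context window, max_tokens effectively reserves part of that window for the answer. The sketch below shows the kwargs block with the arithmetic spelled out. It assumes that kwargs sits under the model entry and that the deployment is an 8,192-token GPT-4 model, which matches the window of the original GPT-4. The temperature value is also an assumption, because the review does not quote it.

```yaml
models:
- model: azure/gpt-4
  kwargs:
    temperature: 0  # assumed value
    n: 1
    seed: 0
    # Prompt tokens + max_tokens must fit in the context window.
    # With an 8,192-token window: 8,192 - 4,096 = 4,096 tokens left for the prompt.
    max_tokens: 4096
```

Lowering max_tokens leaves more room for long prompts, at the cost of shorter answers. A deployment with a larger window relaxes this trade-off.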
26-28: Update embedder API configuration
Several issues need attention:
- Replace the placeholder Azure endpoint URL.
- Consider updating to a newer API version (currently 2023-05-15, while the LLM uses 2024-02-15-preview).
- Verify that the dimension (1536) matches your deployed model's output dimension.
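Below is a sketch of the embedder block with these points addressed. The endpoint is a placeholder. Placing api_base and api_version under the model entry is an assumption based on the review excerpts. The 1536 figure holds if the deployment serves text-embedding-ada-002, whose embeddings have 1,536 dimensions, and the document store's embedding_model_dim must agree with it.

```yaml
provider: azure_openai_embedder
models:
- model: text-embedding-ada-002  # your Azure embedding deployment name
  # Placeholder: substitute your own Azure OpenAI resource endpoint
  api_base: https://<your-resource-name>.openai.azure.com/
  # Version from the file under review; use a newer one if your resource supports it
  api_version: "2023-05-15"
# text-embedding-ada-002 produces 1536-dimensional vectors, so the document
# store must be configured with embedding_model_dim: 1536 for this model.
```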
31-33: Verify wren-ui service endpoint
The configuration assumes the wren-ui service is accessible at http://wren-ui:3000. Ensure this matches your deployment environment's service discovery setup (Docker/Kubernetes).
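Below is a minimal sketch of the engine block for the two common cases. The endpoint key name is inferred from the review's description of this section. The localhost variant is a hypothetical setup in which the AI service runs directly on the host and the UI listens on port 3000.

```yaml
provider: wren_ui
# Docker Compose or Kubernetes: resolve the UI by its service name
endpoint: http://wren-ui:3000
# Hypothetical: AI service started outside Docker on the same host as the UI
# endpoint: http://localhost:3000
```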
39-39: Review Qdrant security configuration
The comment in the file suggests not setting QDRANT_API_KEY for Docker deployments. Ensure this aligns with your security requirements, and document any security measures in place.
146-150: Review production settings
Several settings need verification for production deployment:
- Verify that the cache size (1000) and TTL (3600 s) are appropriate for your workload.
- Ensure the Langfuse host URL is correct and that credentials are securely managed.
- Consider setting the logging level to INFO or WARN instead of DEBUG in production.
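The following shows one way these adjustments could look. The key names below are assumptions, because the review does not quote the settings section, so check them against the actual file before copying anything. LANGFUSE_PUBLIC_KEY and LANGFUSE_SECRET_KEY are the variable names the Langfuse SDK reads. Whether this service reads them directly is also an assumption.

```yaml
settings:
  query_cache_maxsize: 1000  # assumed key; size to your query volume
  query_cache_ttl: 3600      # assumed key; seconds
  langfuse_host: https://cloud.langfuse.com  # assumed key; use your own URL if self-hosting
  logging_level: INFO        # assumed key; DEBUG is verbose for production
# Keep Langfuse credentials out of this file, e.g. in environment variables
# such as LANGFUSE_PUBLIC_KEY and LANGFUSE_SECRET_KEY.
```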
embedding_model_dim: 1536 # Must match model dimension from embedder
timeout: 120
recreate_index: true
recreate_index: true will destroy and recreate the index on startup. This should be false in production to prevent data loss.
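Below is a sketch of the document store block for a production deployment. It uses only keys that appear in the review excerpts. The provider line is inferred from the qdrant references in the pipeline section.

```yaml
provider: qdrant
embedding_model_dim: 1536  # must match the embedder's output dimension
timeout: 120
# false: reuse the existing index on startup instead of dropping it
recreate_index: false
```

If the embedding model or its dimension changes, the existing index presumably no longer fits. That is one situation where a one-off run with recreate_index: true may be intended.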
- model: azure/gpt-4 # Your Azure deployment name, put 'azure/' before deployment name
api_base: https://endpoint.openai.azure.com/ # Replace with your custom Azure endpoint
api_key_name: LLM_AZURE_OPENAI_API_KEY
deployment_name: gpt-4 # Optional
@harshchan is this required?
This is the sample config for the LLM model, right? So yes, it is required. Let me know your thoughts.
Could you show me the error messages you get if deployment_name is removed? I don't think it's required, according to the LiteLLM docs. Thank you!
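For context, LiteLLM identifies an Azure deployment through the model string itself (azure/<deployment_name>), so a separate deployment_name key may be redundant. Below is a sketch of the entry without it. It assumes that Wren AI passes the model value through to LiteLLM unchanged. The api_base is still the placeholder from the file.

```yaml
models:
- model: azure/gpt-4  # LiteLLM reads the deployment name after 'azure/'
  api_base: https://endpoint.openai.azure.com/  # replace with your Azure endpoint
  api_key_name: LLM_AZURE_OPENAI_API_KEY
  api_version: 2024-02-15-preview  # stays a string without quotes (not a bare date)
```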
api_base: https://endpoint.openai.azure.com/ # Replace with your custom Azure endpoint
api_key_name: LLM_AZURE_OPENAI_API_KEY
deployment_name: gpt-4 # Optional
api_version: "2024-02-15-preview"
I think you could remove double quotes here
n: 1
seed: 0
max_tokens: 4096
response_format:
I think you could remove response_format here
provider: azure_openai_embedder
models:
- model: text-embedding-ada-002 # Your Azure deployment name
dimension: 1536 # Must match model output check for your model
I think you could remove dimension here
dimension: 1536 # Must match model output check for your model

api_base: https://endpoint.openai.azure.com/ # Replace with your custom Azure endpoint
api_version: "2023-05-15" # Your Azure deployment name
I think you could remove double quotes here
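There is one caveat before dropping these particular quotes. YAML 1.1 loaders such as PyYAML resolve an unquoted YYYY-MM-DD value as a date. A value with a suffix, such as 2024-02-15-preview, stays a string. Whether a date object causes problems depends on how the service reads api_version, which the review does not show. The keys below are hypothetical and only illustrate the three cases.

```yaml
unquoted_date: 2023-05-15            # PyYAML safe_load -> datetime.date(2023, 5, 15)
quoted_date: "2023-05-15"            # -> str "2023-05-15"
preview_version: 2024-02-15-preview  # -> str; the suffix prevents date resolution
```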
@harshchan Thanks for your contributions, could you check the comments I left?
Worked on the comments
Worked on the changes suggested by you.
Actionable comments posted: 0
♻️ Duplicate comments (1)
wren-ai-service/docs/config_examples/config.azure.yaml (1)
35-43: ⚠️ Potential issue
Document Store Configuration: Critical Setting & Minor Typos
- The setting recreate_index: true on line 42 is potentially dangerous in a production environment and mirrors the earlier concern about unwanted index recreation (see the past review comment by coderabbitai[bot]). Consider switching it to false for production deployments.
- On line 38, there is a minor typo ("Donot" should be "Do not").
🧹 Nitpick comments (4)
wren-ai-service/docs/config_examples/config.azure.yaml (4)
1-17: LLM Configuration Review
The LLM section is clear and well structured. However, YAMLlint detected trailing spaces at lines 10 and 16. Please remove these extraneous spaces to maintain consistent formatting.
🧰 Tools
🪛 YAMLlint (1.35.1)
[error] 10-10: trailing spaces (trailing-spaces)
[error] 16-16: trailing spaces (trailing-spaces)
18-28: Embedder Configuration: Trailing Spaces & Indentation
This section is mostly correct, but note the following:
- Line 20 has trailing spaces.
- YAMLlint reports a potential indentation issue at line 22 (wrong indentation: expected 0 but found 2). Please verify that the YAML structure complies with your project's YAML style guidelines.
- Also remove any trailing spaces on line 24.
🧰 Tools
🪛 YAMLlint (1.35.1)
[error] 20-20: trailing spaces (trailing-spaces)
[warning] 22-22: wrong indentation: expected 0 but found 2 (indentation)
[error] 24-24: trailing spaces (trailing-spaces)
44-46: Pipeline Comments: Trailing Spaces
The instructional comments in this block are useful. However, remove the trailing spaces that YAMLlint flags on line 44 to avoid formatting issues.
🧰 Tools
🪛 YAMLlint (1.35.1)
[error] 44-44: trailing spaces (trailing-spaces)
111-111: Remove Trailing Spaces in Pipeline Embedder Entry
Line 111 has trailing spaces after azure_openai_embedder.text-embedding-ada-002. Please remove these extra spaces.
🧰 Tools
🪛 YAMLlint (1.35.1)
[error] 111-111: trailing spaces (trailing-spaces)
🔇 Additional comments (3)
wren-ai-service/docs/config_examples/config.azure.yaml (3)
29-33: Engine Configuration Approved
The engine configuration is straightforward and correctly specified.
48-138: Pipeline Section: Comprehensive and Consistent
The pipeline configuration is detailed, and the pipe components are consistently defined. A few points to consider:
- YAMLlint flagged a possible indentation issue on line 50 (expected 0 but found 2). Please verify that the indentation of list items meets your project's style requirements.
- All pipeline entries consistently reference the necessary services (e.g., litellm_llm.azure/gpt-4, azure_openai_embedder.text-embedding-ada-002, wren_ui, and qdrant). Ensure that these references match your Azure deployment and service endpoints.
🧰 Tools
🪛 YAMLlint (1.35.1)
[warning] 50-50: wrong indentation: expected 0 but found 2 (indentation)
[error] 111-111: trailing spaces (trailing-spaces)
139-151: Settings Section Review
The settings section is well defined and appears to cover all required parameters for indexing, retrieval, caching, and logging. No major issues identified.
lgtm, thanks for your contribution :)
Actionable comments posted: 1
♻️ Duplicate comments (1)
wren-ai-service/docs/config_examples/config.azure.yaml (1)
33-42: ⚠️ Potential issue
Document Store Configuration – Caution on recreate_index
The document store configuration is well defined. However, the setting recreate_index: true (line 41) is potentially dangerous in a production environment, since it destroys and recreates the index on startup. This was flagged in previous reviews. Ensure that this is acceptable for your use case (e.g., for sample or non-production configurations) or update it accordingly.
🧹 Nitpick comments (7)
wren-ai-service/docs/config_examples/config.azure.yaml (7)
9-9: Remove Trailing Whitespace in LLM Block
YAMLlint has flagged trailing whitespace on line 9. Please remove any extra spaces from this blank line.
-<line 9 with trailing spaces>
+<line 9 with no trailing spaces>
🧰 Tools
🪛 YAMLlint (1.35.1)
[error] 9-9: trailing spaces (trailing-spaces)
15-15: Remove Trailing Whitespace After LLM Block
Line 15 has trailing whitespace that should be removed to adhere to YAML lint standards.
-<line 15 with trailing spaces>
+<line 15 with no trailing spaces>
🧰 Tools
🪛 YAMLlint (1.35.1)
[error] 15-15: trailing spaces (trailing-spaces)
19-19: Remove Trailing Whitespace in Embedder Block
Line 19 contains trailing whitespace. Please remove any extra spaces.
-provider: azure_openai_embedder␣␣
+provider: azure_openai_embedder
🧰 Tools
🪛 YAMLlint (1.35.1)
[error] 19-19: trailing spaces (trailing-spaces)
23-23: Remove Trailing Whitespace in Embedder Block
Line 23 has extra trailing spaces; please remove them.
-␣␣
+
🧰 Tools
🪛 YAMLlint (1.35.1)
[error] 23-23: trailing spaces (trailing-spaces)
43-43: Remove Trailing Whitespace in Pipeline Comments
Line 43 contains trailing whitespace. Please remove the extra spaces.
-# For each pipe line component ␣␣
+# For each pipe line component
🧰 Tools
🪛 YAMLlint (1.35.1)
[error] 43-43: trailing spaces (trailing-spaces)
47-137: Pipeline Configuration – Comprehensive but Verify Consistency
The pipeline section defines a broad range of operations with clear mappings to the respective components (LLM, embedder, document store, and engine). A couple of points to note:
- Some list items (e.g., on line 49) have indentation that YAMLlint flags as unexpected. Please ensure that the indentation of these list items is consistent with your YAML style guidelines.
- Across the pipeline, the LLM is referenced as litellm_llm.azure/gpt-4, while in the LLM configuration the model is defined as azure/gpt-4. Confirm that this naming (concatenating the provider and deployment name) is intentional and consistent.
🧰 Tools
🪛 YAMLlint (1.35.1)
[warning] 49-49: wrong indentation: expected 0 but found 2 (indentation)
[error] 110-110: trailing spaces (trailing-spaces)
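For reference, the pipeline entries in this file appear to address a model as <provider>.<model>. The sketch below shows how the two sides line up, using names taken from the file. The convention is inferred from these references rather than from Wren AI documentation, and other keys are omitted.

```yaml
# LLM section (other keys omitted)
provider: litellm_llm
models:
- model: azure/gpt-4
---
# Embedder section (other keys omitted)
provider: azure_openai_embedder
models:
- model: text-embedding-ada-002
---
# Pipeline section: each model reference is "<provider>.<model>"
type: pipeline
pipes:
- name: db_schema_indexing
  embedder: azure_openai_embedder.text-embedding-ada-002
  document_store: qdrant
  llm: litellm_llm.azure/gpt-4
```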
109-110: Remove Trailing Whitespace in Pipeline Block
Line 110 contains trailing whitespace. Please remove the extra spaces to comply with YAML formatting standards.
- embedder: azure_openai_embedder.text-embedding-ada-002␣
+ embedder: azure_openai_embedder.text-embedding-ada-002
🧰 Tools
🪛 YAMLlint (1.35.1)
[error] 110-110: trailing spaces (trailing-spaces)
🔇 Additional comments (6)
wren-ai-service/docs/config_examples/config.azure.yaml (6)
1-8: LLM Configuration Block – Approved and Informative
The LLM section is clear and comprehensive, providing the necessary fields (provider, timeout, models, and kwargs) for Azure LLM integration. Ensure that the placeholders (e.g., API base URL and API key name) are updated to match your deployment configuration.
10-14: LLM kwargs Settings – Looks Good
The parameters for temperature, number of responses, seed, and max_tokens are clearly specified. Ensure these values are aligned with your intended LLM behavior.
17-27: Embedder Configuration – Verify Indentation and Spacing
The embedder section correctly sets the provider to azure_openai_embedder and specifies the model and other parameters. However, please pay attention to the following:
- Line 19: trailing whitespace detected.
- Line 21: YAMLlint reports wrong indentation (expected 0 but found 2). Verify that this list item's indentation complies with your project's YAML style guide.
- Line 23: trailing whitespace detected.
🧰 Tools
🪛 YAMLlint (1.35.1)
[error] 19-19: trailing spaces (trailing-spaces)
[warning] 21-21: wrong indentation: expected 0 but found 2 (indentation)
[error] 23-23: trailing spaces (trailing-spaces)
28-32: Engine Configuration – Simple and Correct
The engine configuration (provider wren_ui with the specified endpoint) is straightforward and meets expectations.
139-149: Settings Section – Clean and Configured
The settings section is well organized, providing clear values for batching, caching, logging, and Langfuse integration. No issues were identified here.
21-21: Check List Item Indentation in Embedder Block
The list item on line 21 is indented by 2 spaces, whereas YAMLlint expected no indentation. Confirm whether this indentation is intentional per your style guide. If not, consider adjusting it:
-  - model: text-embedding-ada-002 # Your Azure deployment name
+- model: text-embedding-ada-002 # Your Azure deployment name
✅ Verification successful
Let's pull the surrounding context from the YAML file to understand the structure around line 21. This will help us determine whether the two-space indentation for the list item is intentional.
Embedder Block YAML List Indentation Inconsistency
After reviewing the surrounding context, we found that the embedder block in the configuration file indents its list item by 2 spaces:
models:
  - model: text-embedding-ada-002 # Your Azure deployment name
This contrasts with the llm block in the same file, where the list item is unindented:
models:
- model: azure/gpt-4 # Your Azure deployment name, put 'azure/' before deployment name
Since YAMLlint flags the embedder block as non-compliant and there is no indication that the extra indentation is intentional per our style guide, we recommend removing the extra indentation from the embedder block's list item.
- Location: wren-ai-service/docs/config_examples/config.azure.yaml (embedder block, line 21)
🏁 Scripts executed
The following scripts were executed for the analysis:
Error: No shell scripts found
Script:
#!/bin/bash
# Print the first 30 lines of the YAML file to review context and indentation structure
sed -n '1,30p' wren-ai-service/docs/config_examples/config.azure.yaml
Length of output: 883
🧰 Tools
🪛 YAMLlint (1.35.1)
[warning] 21-21: wrong indentation: expected 0 but found 2 (indentation)
---
type: pipeline
pipes:
  - name: db_schema_indexing
💡 Codebase verification
Fix List Item Indentation in Pipeline Block
In wren-ai-service/docs/config_examples/config.azure.yaml, the list items under the pipes: key are currently indented by 2 spaces. YAMLlint expects these list items to start at column 0 (i.e., with no preceding spaces), per our project's style guidelines.
- Current code:
pipes:
  - name: db_schema_indexing
    embedder: azure_openai_embedder.text-embedding-ada-002
    document_store: qdrant # Match document_store name
    llm: litellm_llm.azure/gpt-4
- Expected change (as per diff):
pipes:
- name: db_schema_indexing
  embedder: azure_openai_embedder.text-embedding-ada-002
  document_store: qdrant # Match document_store name
  llm: litellm_llm.azure/gpt-4
Please adjust the indentation accordingly to meet the project standards.
🔗 Analysis chain
Check List Item Indentation in Pipeline Block
YAMLlint reports that the list item on line 49 is indented by 2 spaces (expected 0). Verify whether the current indentation meets your project standards and adjust if necessary.
-  - name: db_schema_indexing
+- name: db_schema_indexing
🏁 Scripts executed
The following scripts were executed for the analysis:
Error: No shell scripts found
Script:
#!/bin/bash
echo "Displaying lines 45 to 55 of wren-ai-service/docs/config_examples/config.azure.yaml:"
sed -n '45,55p' wren-ai-service/docs/config_examples/config.azure.yaml
Length of output: 634
🧰 Tools
🪛 YAMLlint (1.35.1)
[warning] 49-49: wrong indentation: expected 0 but found 2 (indentation)