Does guardrails project support chinese language？ #711

lizhe2004 · 2024-04-12T02:37:26Z


Does the Guardrails project support the Chinese language?


jsoma · 2024-04-12T04:44:58Z


Short answer

Yes! Guardrails acts between your content and the LLM, so as long as the LLM can understand Chinese you will be good.

...but I thought this was pretty interesting so I also wrote up a few examples! I always use Guardrails with pydantic models so that's what my examples below are all using, but I'm sure it works for everything.

Long answer 1 (Asking questions in Chinese)

In this example below, I have it analyze a short text in Chinese and extract/categorize information, along with validating the responses.

import openai
from pydantic import BaseModel, Field
from guardrails.hub import ValidChoices
from guardrails import Guard

prompt = """
Comment to analyze: ${text}

${gr.complete_json_suffix_v2}
"""

class Comment(BaseModel):
    food_name: str = Field(description="食品名",)
    food_category: str = Field(description="食品类", validators=[
        ValidChoices(choices=['肉类', '蔬菜', '水果', '面条/谷物'], on_fail='reask'),
    ])
    sentiment: str = Field(description="情绪", validators=[
        ValidChoices(choices=['喜欢', '不喜欢'], on_fail='reask'),
    ])

guard = Guard.from_pydantic(output_class=Comment, prompt=prompt)

result = guard(
    llm_api=openai.chat.completions.create,
    prompt_params={
        'text': '我真的不喜欢吃西兰花'
    },
    num_reasks=3
)

We can see the good result in result.validated_output.

{'food_name': '西兰花', 'food_category': '蔬菜', 'sentiment': '不喜欢'}
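
One practical note if you log or store a result like this: Python's json.dumps escapes non-ASCII characters by default, so the Chinese text comes out as \uXXXX sequences unless you pass ensure_ascii=False. A minimal sketch, using a plain dict copied from the output above as a stand-in for result.validated_output (nothing here calls Guardrails itself):

```python
import json

# Stand-in for result.validated_output (copied from the output above).
validated_output = {'food_name': '西兰花', 'food_category': '蔬菜', 'sentiment': '不喜欢'}

# Default behaviour: non-ASCII characters are escaped as \uXXXX.
escaped = json.dumps(validated_output)

# ensure_ascii=False keeps the Chinese characters readable.
readable = json.dumps(validated_output, ensure_ascii=False)

print(escaped)
print(readable)
```

Both strings decode back to the same dict, so this only affects how readable the stored text is.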

How do Guardrails and the LLM know how to do this correctly? It's because the guard has a long conversation with GPT explaining what it wants. You can see the conversation if you run guard.history.last.tree.

Logs
└── ╭────────────────────────────────────────────────── Step 0 ───────────────────────────────────────────────────╮
    │ ╭──────────────────────────────────────────────── Prompt ─────────────────────────────────────────────────╮ │
    │ │                                                                                                         │ │
    │ │ Comment to analyze: 我真的不喜欢吃西兰花                                                                │ │
    │ │                                                                                                         │ │
    │ │                                                                                                         │ │
    │ │ Given below is XML that describes the information to extract from this document and the tags to extract │ │
    │ │ it into.                                                                                                │ │
    │ │                                                                                                         │ │
    │ │ <output>                                                                                                │ │
    │ │     <string name="food_name" description="食品名"/>                                                     │ │
    │ │     <string name="food_category" description="食品类" format="guardrails/valid_choices:                 │ │
    │ │ choices=['肉类', '蔬菜', '水果', '面条/谷物']"/>                                                        │ │
    │ │     <string name="sentiment" description="情绪" format="guardrails/valid_choices: choices=['喜欢',      │ │
    │ │ '不喜欢']"/>                                                                                            │ │
    │ │ </output>                                                                                               │ │
    │ │                                                                                                         │ │
    │ │                                                                                                         │ │
    │ │ ONLY return a valid JSON object (no other text is necessary), where the key of the field in JSON is the │ │
    │ │ `name` attribute of the corresponding XML, and the value is of the type specified by the corresponding  │ │
    │ │ XML's tag. The JSON MUST conform to the XML format, including any types and format requests e.g.        │ │
    │ │ requests for lists, objects and specific types. Be correct and concise.                                 │ │
    │ │                                                                                                         │ │
    │ │ Here are examples of simple (XML, JSON) pairs that show the expected behavior:                          │ │
    │ │ - `<string name='foo' format='two-words lower-case' />` => `{'foo': 'example one'}`                     │ │
    │ │ - `<list name='bar'><string format='upper-case' /></list>` => `{"bar": ['STRING ONE', 'STRING TWO',     │ │
    │ │ etc.]}`                                                                                                 │ │
    │ │ - `<object name='baz'><string name="foo" format="capitalize two-words" /><integer name="index"          │ │
    │ │ format="1-indexed" /></object>` => `{'baz': {'foo': 'Some String', 'index': 1}}`                        │ │
    │ │                                                                                                         │ │
    │ │                                                                                                         │ │
    │ ╰─────────────────────────────────────────────────────────────────────────────────────────────────────────╯ │
    │ ╭───────────────────────────────────────────── Instructions ──────────────────────────────────────────────╮ │
    │ │ You are a helpful assistant, able to express yourself purely through JSON, strictly and precisely       │ │
    │ │ adhering to the provided XML schemas.                                                                   │ │
    │ ╰─────────────────────────────────────────────────────────────────────────────────────────────────────────╯ │
    │ ╭──────────────────────────────────────────── Message History ────────────────────────────────────────────╮ │
    │ │ No message history.                                                                                     │ │
    │ ╰─────────────────────────────────────────────────────────────────────────────────────────────────────────╯ │
    │ ╭──────────────────────────────────────────── Raw LLM Output ─────────────────────────────────────────────╮ │
    │ │ {"food_name":"西兰花","food_category":"蔬菜","sentiment":"不喜欢"}                                      │ │
    │ ╰─────────────────────────────────────────────────────────────────────────────────────────────────────────╯ │
    │ ╭─────────────────────────────────────────── Validated Output ────────────────────────────────────────────╮ │
    │ │ {'food_name': '西兰花', 'food_category': '蔬菜', 'sentiment': '不喜欢'}                                 │ │
    │ ╰─────────────────────────────────────────────────────────────────────────────────────────────────────────╯ │
    ╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

Long answer 2 (Chinese, always and only)

But we can also make it more complicated!

Let's say we want to make sure the responses are all in Chinese. There's an existing validator that claims to do that, but it doesn't work very well, so I wrote another one we can try out. It uses the lingua-py library (https://github.com/pemistahl/lingua-py) to check the language, and if the library is over 50% confident that the text is Chinese, the validator passes it.

from typing import Dict
from lingua import Language, LanguageDetectorBuilder
from guardrails.validators import (
    FailResult,
    PassResult,
    register_validator,
    ValidationResult,
    Validator
)

# Uses https://github.com/pemistahl/lingua-py because it seems to be the best!
# Build the detector once: loading every language model on each validate() call is slow.
detector = LanguageDetectorBuilder.from_all_languages().with_preloaded_language_models().build()

@register_validator(name="is-chinese", data_type="string")
class IsChineseLanguage(Validator):
    def validate(self, value: str, metadata: Dict) -> ValidationResult:
        # Get the Chinese score from the per-language confidence values
        scores = detector.compute_language_confidence_values(value)
        chinese_score = next(filter(lambda e: e.language == Language.CHINESE, scores))

        # Accept if it's over 50%
        if chinese_score.value > 0.5:
            return PassResult()

        # Could use a fix value that's machine-translated...
        return FailResult(
            error_message=f"Text \"{value}\" should be provided in the Chinese language."
        )
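
If installing lingua is not an option, the same "pass when the score is over 50%" idea can be roughly approximated with the standard library by measuring the share of CJK ideographs in the text. This is a swapped-in heuristic, not what the validator above does: the names cjk_ratio and looks_chinese are made up for this sketch, and because it only looks at the Unicode range it cannot tell Chinese from Japanese kanji.

```python
def cjk_ratio(text: str) -> float:
    """Share of non-whitespace characters that are CJK Unified Ideographs."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    cjk = sum(1 for c in chars if '\u4e00' <= c <= '\u9fff')
    return cjk / len(chars)


def looks_chinese(text: str, threshold: float = 0.5) -> bool:
    """Hypothetical stand-in for the lingua check: pass when the ratio exceeds the threshold."""
    return cjk_ratio(text) > threshold


print(looks_chinese('荞麦面'))
print(looks_chinese('buckwheat noodles'))
```

Inside a validator, looks_chinese(value) would take the place of the chinese_score.value > 0.5 test.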

Now I will use (almost) the same model as above, but include our new validator in every part of the response. I will also make the text to analyze be in English: I love to eat buckwheat noodles.

import openai
from pydantic import BaseModel, Field
from guardrails.hub import ValidChoices
from guardrails import Guard

prompt = """
Comment to analyze: ${text}

${gr.complete_json_suffix_v2}
"""

is_chinese = IsChineseLanguage(on_fail='reask')

class Comment(BaseModel):
    food_name: str = Field(description="食品名", validators=[is_chinese])
    food_category: str = Field(description="食品类", validators=[
        ValidChoices(choices=['肉类', '蔬菜', '水果', '面条/谷物'], on_fail='reask'),
        is_chinese
    ])
    sentiment: str = Field(description="情绪", validators=[
        ValidChoices(choices=['喜欢', '不喜欢'], on_fail='reask'),
        is_chinese
    ])

guard = Guard.from_pydantic(output_class=Comment, prompt=prompt)

result = guard(
    llm_api=openai.chat.completions.create,
    prompt_params={
        'text': 'I love to eat buckwheat noodles'
    },
    num_reasks=3
)

If we look at result.validated_output, it gives us the correct answer, in Chinese.

{'food_name': '荞麦面', 'food_category': '面条/谷物', 'sentiment': '喜欢'}

It does this because even though the LLM provided "buckwheat noodles" at first, Guardrails used reask to say "no, please, use Chinese!". Again, we can see the conversation in guard.history.last.tree.

Logs
├── ╭────────────────────────────────────────────────── Step 0 ───────────────────────────────────────────────────╮
│   │ ╭──────────────────────────────────────────────── Prompt ─────────────────────────────────────────────────╮ │
│   │ │                                                                                                         │ │
│   │ │ Comment to analyze: I love to eat buckwheat noodles                                                     │ │
│   │ │                                                                                                         │ │
│   │ │                                                                                                         │ │
│   │ │ Given below is XML that describes the information to extract from this document and the tags to extract │ │
│   │ │ it into.                                                                                                │ │
│   │ │                                                                                                         │ │
│   │ │ <output>                                                                                                │ │
│   │ │     <string name="food_name" description="食品名" format="is-chinese"/>                                 │ │
│   │ │     <string name="food_category" description="食品类" format="guardrails/valid_choices:                 │ │
│   │ │ choices=['肉类', '蔬菜', '水果', '面条/谷物']; is-chinese"/>                                            │ │
│   │ │     <string name="sentiment" description="情绪" format="guardrails/valid_choices: choices=['喜欢',      │ │
│   │ │ '不喜欢']; is-chinese"/>                                                                                │ │
│   │ │ </output>                                                                                               │ │
│   │ │                                                                                                         │ │
│   │ │                                                                                                         │ │
│   │ │ ONLY return a valid JSON object (no other text is necessary), where the key of the field in JSON is the │ │
│   │ │ `name` attribute of the corresponding XML, and the value is of the type specified by the corresponding  │ │
│   │ │ XML's tag. The JSON MUST conform to the XML format, including any types and format requests e.g.        │ │
│   │ │ requests for lists, objects and specific types. Be correct and concise.                                 │ │
│   │ │                                                                                                         │ │
│   │ │ Here are examples of simple (XML, JSON) pairs that show the expected behavior:                          │ │
│   │ │ - `<string name='foo' format='two-words lower-case' />` => `{'foo': 'example one'}`                     │ │
│   │ │ - `<list name='bar'><string format='upper-case' /></list>` => `{"bar": ['STRING ONE', 'STRING TWO',     │ │
│   │ │ etc.]}`                                                                                                 │ │
│   │ │ - `<object name='baz'><string name="foo" format="capitalize two-words" /><integer name="index"          │ │
│   │ │ format="1-indexed" /></object>` => `{'baz': {'foo': 'Some String', 'index': 1}}`                        │ │
│   │ │                                                                                                         │ │
│   │ │                                                                                                         │ │
│   │ ╰─────────────────────────────────────────────────────────────────────────────────────────────────────────╯ │
│   │ ╭───────────────────────────────────────────── Instructions ──────────────────────────────────────────────╮ │
│   │ │ You are a helpful assistant, able to express yourself purely through JSON, strictly and precisely       │ │
│   │ │ adhering to the provided XML schemas.                                                                   │ │
│   │ ╰─────────────────────────────────────────────────────────────────────────────────────────────────────────╯ │
│   │ ╭──────────────────────────────────────────── Message History ────────────────────────────────────────────╮ │
│   │ │ No message history.                                                                                     │ │
│   │ ╰─────────────────────────────────────────────────────────────────────────────────────────────────────────╯ │
│   │ ╭──────────────────────────────────────────── Raw LLM Output ─────────────────────────────────────────────╮ │
│   │ │ {"food_name":"buckwheat noodles","food_category":"面条/谷物","sentiment":"喜欢"}                        │ │
│   │ ╰─────────────────────────────────────────────────────────────────────────────────────────────────────────╯ │
│   │ ╭─────────────────────────────────────────── Validated Output ────────────────────────────────────────────╮ │
│   │ │ {                                                                                                       │ │
│   │ │     'food_name': FieldReAsk(                                                                            │ │
│   │ │         incorrect_value='buckwheat noodles',                                                            │ │
│   │ │         fail_results=[                                                                                  │ │
│   │ │             FailResult(                                                                                 │ │
│   │ │                 outcome='fail',                                                                         │ │
│   │ │                 metadata=None,                                                                          │ │
│   │ │                 error_message='Text "buckwheat noodles" should be provided in the Chinese language.',   │ │
│   │ │                 fix_value=None                                                                          │ │
│   │ │             )                                                                                           │ │
│   │ │         ],                                                                                              │ │
│   │ │         path=['food_name']                                                                              │ │
│   │ │     ),                                                                                                  │ │
│   │ │     'food_category': '面条/谷物',                                                                       │ │
│   │ │     'sentiment': '喜欢'                                                                                 │ │
│   │ │ }                                                                                                       │ │
│   │ ╰─────────────────────────────────────────────────────────────────────────────────────────────────────────╯ │
│   ╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
└── ╭────────────────────────────────────────────────── Step 1 ───────────────────────────────────────────────────╮
    │ ╭──────────────────────────────────────────────── Prompt ─────────────────────────────────────────────────╮ │
    │ │                                                                                                         │ │
    │ │ I was given the following JSON response, which had problems due to incorrect values.                    │ │
    │ │                                                                                                         │ │
    │ │ {                                                                                                       │ │
    │ │   "food_name": {                                                                                        │ │
    │ │     "incorrect_value": "buckwheat noodles",                                                             │ │
    │ │     "error_messages": [                                                                                 │ │
    │ │       "Text \"buckwheat noodles\" should be provided in the Chinese language."                          │ │
    │ │     ]                                                                                                   │ │
    │ │   },                                                                                                    │ │
    │ │   "food_category": "面条/谷物",                                                                         │ │
    │ │   "sentiment": "喜欢"                                                                                   │ │
    │ │ }                                                                                                       │ │
    │ │                                                                                                         │ │
    │ │ Help me correct the incorrect values based on the given error messages.                                 │ │
    │ │                                                                                                         │ │
    │ │ Given below is XML that describes the information to extract from this document and the tags to extract │ │
    │ │ it into.                                                                                                │ │
    │ │                                                                                                         │ │
    │ │ <output>                                                                                                │ │
    │ │     <string name="food_name" description="食品名" format="is-chinese"/>                                 │ │
    │ │     <string name="food_category" description="食品类" format="guardrails/valid_choices:                 │ │
    │ │ choices=['肉类', '蔬菜', '水果', '面条/谷物']; is-chinese"/>                                            │ │
    │ │     <string name="sentiment" description="情绪" format="guardrails/valid_choices: choices=['喜欢',      │ │
    │ │ '不喜欢']; is-chinese"/>                                                                                │ │
    │ │ </output>                                                                                               │ │
    │ │                                                                                                         │ │
    │ │                                                                                                         │ │
    │ │ ONLY return a valid JSON object (no other text is necessary), where the key of the field in JSON is the │ │
    │ │ `name` attribute of the corresponding XML, and the value is of the type specified by the corresponding  │ │
    │ │ XML's tag. The JSON MUST conform to the XML format, including any types and format requests e.g.        │ │
    │ │ requests for lists, objects and specific types. Be correct and concise. If you are unsure anywhere,     │ │
    │ │ enter `null`.                                                                                           │ │
    │ │                                                                                                         │ │
    │ ╰─────────────────────────────────────────────────────────────────────────────────────────────────────────╯ │
    │ ╭───────────────────────────────────────────── Instructions ──────────────────────────────────────────────╮ │
    │ │ You are a helpful assistant, able to express yourself purely through JSON, strictly and precisely       │ │
    │ │ adhering to the provided XML schemas.                                                                   │ │
    │ ╰─────────────────────────────────────────────────────────────────────────────────────────────────────────╯ │
    │ ╭──────────────────────────────────────────── Message History ────────────────────────────────────────────╮ │
    │ │ No message history.                                                                                     │ │
    │ ╰─────────────────────────────────────────────────────────────────────────────────────────────────────────╯ │
    │ ╭──────────────────────────────────────────── Raw LLM Output ─────────────────────────────────────────────╮ │
    │ │ {                                                                                                       │ │
    │ │   "food_name": "荞麦面",                                                                                │ │
    │ │   "food_category": "面条/谷物",                                                                         │ │
    │ │   "sentiment": "喜欢"                                                                                   │ │
    │ │ }                                                                                                       │ │
    │ ╰─────────────────────────────────────────────────────────────────────────────────────────────────────────╯ │
    │ ╭─────────────────────────────────────────── Validated Output ────────────────────────────────────────────╮ │
    │ │ {'food_name': '荞麦面', 'food_category': '面条/谷物', 'sentiment': '喜欢'}                              │ │
    │ ╰─────────────────────────────────────────────────────────────────────────────────────────────────────────╯ │
    ╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

Notice that it only had to ask for help with 荞麦面 - the LLM knew to put food_category and sentiment in Chinese automatically because Guardrails gave it suggested categories.
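
To make the two steps in that log easier to follow, here is a rough sketch of the reask idea in plain Python. It is not Guardrails' actual implementation: fake_llm, find_errors and run_with_reasks are hypothetical names, the "LLM" is a stub that answers in English first and fixes itself when told about the error, and the Chinese check is a crude character-range test rather than the lingua-based validator.

```python
from typing import Callable, Dict, Optional

Check = Callable[[str], Optional[str]]  # returns an error message, or None if the value is fine


def is_chinese(value: str) -> Optional[str]:
    # Crude stand-in check: accept if any CJK ideograph is present.
    if any('\u4e00' <= c <= '\u9fff' for c in value):
        return None
    return f'Text "{value}" should be provided in the Chinese language.'


def find_errors(output: Dict[str, str], checks: Dict[str, Check]) -> Dict[str, str]:
    # Run each field's check and keep only the failures.
    errors = {}
    for field, check in checks.items():
        message = check(output[field])
        if message:
            errors[field] = message
    return errors


def run_with_reasks(llm, checks: Dict[str, Check], num_reasks: int = 3) -> Dict[str, str]:
    output, errors = None, {}
    # One initial call plus up to num_reasks follow-ups.
    for _ in range(num_reasks + 1):
        output = llm(output, errors)
        errors = find_errors(output, checks)
        if not errors:
            return output
    raise ValueError(f"still invalid after {num_reasks} reasks: {errors}")


calls = []  # records the errors passed to the stub on each call


def fake_llm(previous, errors):
    calls.append(dict(errors))
    if not errors:
        # First attempt: the food name comes back in English.
        return {'food_name': 'buckwheat noodles', 'food_category': '面条/谷物', 'sentiment': '喜欢'}
    # Reask: only the failing field is corrected.
    fixed = dict(previous)
    fixed['food_name'] = '荞麦面'
    return fixed


checks = {'food_name': is_chinese, 'food_category': is_chinese, 'sentiment': is_chinese}
result = run_with_reasks(fake_llm, checks)
print(result)
```

As in the real log, the second call only carries an error for food_name, because the other two fields already passed on the first attempt.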

In conclusion

It really is like magic, but sorry 我不会中文 ("I don't speak Chinese") 😅


lizhe2004 Apr 15, 2024
Author

I really appreciate your answer. The examples are very helpful. I believe it is a good start for me.
