Set ensure_ascii=False in JSON dump within apply_chat_template #31079

junrae6454
Contributor

What does this PR do?

  • This PR changes apply_chat_template so that the tojson filter emits non-ASCII characters as-is instead of as \uXXXX escape sequences. With ensure_ascii=False, strings such as "안녕?" and emojis appear in their original form in the rendered JSON.
  • Concretely, it sets jinja_env.policies['json.dumps_kwargs']['ensure_ascii'] = False on the ImmutableSandboxedEnvironment created inside apply_chat_template.
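For context, here is a minimal standard-library sketch of what ensure_ascii controls in json.dumps, independent of Jinja. The variable names are illustrative only and are not taken from the transformers code base.

```python
import json

tools = {"안녕?": "🤗"}

# Default behaviour (ensure_ascii=True): every non-ASCII character is
# written as a \uXXXX escape; the emoji becomes a surrogate pair.
escaped = json.dumps(tools)

# ensure_ascii=False keeps the characters as they are.
readable = json.dumps(tools, ensure_ascii=False)

print(escaped)   # {"\uc548\ub155?": "\ud83e\udd17"}
print(readable)  # {"안녕?": "🤗"}

# Both strings are valid JSON and decode to the same object.
assert json.loads(escaped) == json.loads(readable) == tools
```

Both forms decode identically, so the difference only matters for what ends up in the rendered text of the prompt.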

Before

try:
    import jinja2
    from jinja2.exceptions import TemplateError
    from jinja2.sandbox import ImmutableSandboxedEnvironment
except ImportError:
    raise ImportError("apply_chat_template requires jinja2 to be installed.")

def raise_exception(message):
    raise TemplateError(message)

jinja_env = ImmutableSandboxedEnvironment(trim_blocks=True, lstrip_blocks=True)
jinja_env.globals["raise_exception"] = raise_exception
template = jinja_env.from_string("{{tools|tojson}}")
print(template.render(tools={"안녕?": "🤗"}))
# prints: {"\uc548\ub155?": "\ud83e\udd17"}

After

try:
    import jinja2
    from jinja2.exceptions import TemplateError
    from jinja2.sandbox import ImmutableSandboxedEnvironment
except ImportError:
    raise ImportError("apply_chat_template requires jinja2 to be installed.")

def raise_exception(message):
    raise TemplateError(message)

jinja_env = ImmutableSandboxedEnvironment(trim_blocks=True, lstrip_blocks=True)
jinja_env.globals["raise_exception"] = raise_exception
jinja_env.policies['json.dumps_kwargs']['ensure_ascii'] = False
template = jinja_env.from_string("{{tools|tojson}}")
print(template.render(tools={"안녕?": "🤗"}))
# prints: {"안녕?": "🤗"}
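One possible variation, offered as a sketch rather than as what this PR or #31041 implements: replace the json.dumps_kwargs dict instead of mutating it in place. This rests on the assumption, worth verifying against your Jinja2 version, that each environment's policies start as a shallow copy of the library defaults, so an in-place change to the nested dict could leak into other environments. The helper name make_env is hypothetical.

```python
import json

from jinja2.sandbox import ImmutableSandboxedEnvironment


def make_env():
    """Hypothetical helper: build a sandboxed env whose tojson keeps non-ASCII."""
    env = ImmutableSandboxedEnvironment(trim_blocks=True, lstrip_blocks=True)
    # Assign a new dict rather than mutating the existing one, so the
    # setting stays local to this environment (assumption: the default
    # kwargs dict may be shared between environments).
    env.policies["json.dumps_kwargs"] = {
        **env.policies["json.dumps_kwargs"],
        "ensure_ascii": False,
    }
    return env


tools = {"안녕?": "🤗"}

rendered = make_env().from_string("{{ tools|tojson }}").render(tools=tools)
print(rendered)

# An environment created afterwards with default policies still escapes.
default_rendered = (
    ImmutableSandboxedEnvironment().from_string("{{ tools|tojson }}").render(tools=tools)
)
print(default_rendered)
```

Either output parses back to the same dictionary with json.loads; only the textual form differs.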

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline, Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

- Modified JSON dump function to set ensure_ascii to False, improving handling of non-ASCII characters.
@CISC
Contributor

CISC commented May 28, 2024

Hey, very cool that I'm not the only one using tojson, but this PR is a duplicate of #31041 :)

@Rocketknight1
Member

Hi @junrae6454, thank you for this PR! As mentioned, this is a duplicate of the earlier PR #31041, so I'm going to close it, but we appreciate the contribution, and feel free to add any comments or suggestions on that PR!
