Log a warning if the current locale does not support all Unicode characters #186

Open
alexjpwalker opened this issue Jul 5, 2022 · 0 comments

Comments

@alexjpwalker
Member

Problem to Solve

In typedb/typedb#6603, we ensure that our TypeDB Docker image runs in the C.UTF-8 locale, in order to guarantee that it supports all Unicode characters (e.g. Chinese and Arabic scripts).

However, we cannot make that guarantee for a user-defined execution environment, such as a user's local machine.

Proposed Solution

In this case, the best we can do is log a warning if the current locale does not support all Unicode characters.
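Before any warning can be logged, the "current locale" has to be determined. The following is a minimal sketch, assuming a POSIX-style environment in which `LC_ALL` overrides `LC_CTYPE`, which in turn overrides `LANG`. The function name `effective_ctype_locale` is hypothetical, and the `"POSIX"` fallback is an assumption (the default is implementation-defined when none of the variables is set).

```python
from typing import Mapping


def effective_ctype_locale(env: Mapping[str, str]) -> str:
    """Resolve the locale name that governs character handling.

    Precedence follows POSIX: LC_ALL, then LC_CTYPE, then LANG.
    Unset or empty variables are skipped. The "POSIX" fallback is an
    assumption made for this sketch.
    """
    for variable in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(variable, "")
        if value:
            return value
    return "POSIX"
```

Calling `effective_ctype_locale(os.environ)` (after `import os`) would apply the same lookup to the real process environment.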

@alexjpwalker alexjpwalker transferred this issue from typedb/typedb-driver Jul 5, 2022
@alexjpwalker alexjpwalker added this to the Technical Debt milestone Jul 5, 2022
flyingsilverfin pushed a commit to typedb/typedb that referenced this issue Jul 7, 2022
## What is the goal of this PR?

Previously, when running TypeDB Console from within the TypeDB Docker image, it was impossible to insert non-ASCII characters (e.g. Chinese characters) as the values of string attributes, because the image's system locale was POSIX, which is unable to represent these characters.

We've updated the system locale of our Docker image to C.UTF-8, which can represent all Unicode characters.

## What are the changes implemented in this PR?

This PR closes #6496, an issue reported by a user using Console in their Docker container, who was unable to insert Chinese text into attribute values. 

On further investigation, we discovered that even while the Docker container's locale was POSIX, we were able to correctly interface with the DB by connecting to it from a TypeDB Console running on our local Mac (which supported UTF-8). We were able to insert Chinese text and retrieve it correctly. This implied that the bug was not in TypeDB Server.

We then attempted to copy Chinese text from our Mac to a TypeDB Console process run inside Docker. This resulted in a number of "????????" symbols appearing in the terminal. Nonetheless, we committed the data.

When we then queried that data, either via the Docker-hosted Console or via Studio on our Mac, we found that it was corrupted: it printed only question marks. This indicates that the data was corrupted the moment we pasted it from our local machine into the Console process running in our Docker container. We understand this is because the POSIX locale cannot represent all Unicode characters, only (approximately) those defined in ASCII.
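The corruption described above can be reproduced in miniature. This is a sketch of one plausible mechanism, not a trace of what Console or the terminal actually did: text is pushed through an encoding that cannot represent it, and each unsupported character is replaced with `?`. The helper name `paste_through` is hypothetical.

```python
def paste_through(text: str, encoding: str) -> str:
    """Simulate text passing through a channel limited to `encoding`.

    Characters the encoding cannot represent are replaced with "?",
    so the original characters are lost for good.
    """
    return text.encode(encoding, errors="replace").decode(encoding)


original = "你好世界"
print(paste_through(original, "ascii"))  # prints "????"
print(paste_through(original, "utf-8"))  # prints "你好世界"
```

Once the replacement has happened, nothing downstream can recover the original characters. This is consistent with the question marks still appearing when the committed data was read back from a UTF-8 capable client.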

There are therefore two fixes we would like to make:

1. Set the system locale of our Docker image to C.UTF-8. This locale comes pre-installed with the image so it does not require any additional download, and it supports all Unicode characters.
2. When the user runs TypeDB Console, verify that the locale of the execution environment (typically the system's default locale) uses an encoding that supports all Unicode characters (such as UTF-8) rather than a restricted one such as ASCII. If it does not, print a warning saying that the current locale ({locale_name}) is not compatible with all Unicode characters.
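The check in (2) could be sketched as follows. This is illustrative Python, not TypeDB Console's implementation. The names `PROBE_STRINGS`, `can_encode_probes` and `unicode_warning` are hypothetical. The probe is a heuristic: passing it suggests, but does not prove, that every Unicode character is supported. How the locale name and its encoding are read from the environment is left to the caller.

```python
from typing import Optional

# Sample strings from several scripts, plus one character outside the
# Basic Multilingual Plane. Assumption: an encoding that handles all of
# these is treated as supporting Unicode in general.
PROBE_STRINGS = ("é", "中文", "العربية", "😀")


def can_encode_probes(encoding: str) -> bool:
    """Return True if `encoding` can encode every probe string."""
    try:
        for text in PROBE_STRINGS:
            text.encode(encoding)
    except (LookupError, UnicodeEncodeError):
        # LookupError means Python does not recognise the encoding name.
        return False
    return True


def unicode_warning(locale_name: str, encoding: str) -> Optional[str]:
    """Return a warning message, or None if the encoding passes the probe."""
    if can_encode_probes(encoding):
        return None
    return (
        f"The current locale ({locale_name}) is not compatible with all "
        "Unicode characters."
    )
```

For example, `unicode_warning("POSIX", "ascii")` returns the warning text, while `unicode_warning("C.UTF-8", "utf-8")` returns `None`. Probing the encoding, rather than matching on locale names, means a restricted encoding such as Latin-1 is also flagged.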

In this PR, we implement (1). (2) is more involved. It may also be unnecessary: if a user sees question marks when pasting Chinese text into Console, the cause can be inferred. Nonetheless, I've raised:
- typedb/typedb-console#186
@flyingsilverfin flyingsilverfin removed this from the Technical Debt milestone Nov 1, 2023