add common cjk encoding (gb18030 for simple Chinese, big5 for traditional Chinese, euc-kr for Korean, euc-jp for Japanese) datatype support #465

Merged
merged 4 commits into from
Feb 2, 2024

Conversation

liudonghua123
Contributor

@liudonghua123 liudonghua123 commented Oct 30, 2023

Closes #464.

@liudonghua123 liudonghua123 changed the title add gb2312/gbk/gb18030 datatype support add common cjk encoding (gb18030 for simple Chinese, big5 for traditional Chinese, euc-kr for Korean, euc-jp for Japanese) datatype support Oct 30, 2023
@lramos15
Member

I do worry that this will make the data inspector fairly long. I wonder if there's a way we can support all the formats that the TextDecoder supports without adding a bunch of these

@lramos15 lramos15 self-assigned this Oct 30, 2023
@liudonghua123
Contributor Author

> I do worry that this will make the data inspector fairly long. I wonder if there's a way we can support all the formats that the TextDecoder supports without adding a bunch of these

Yeah, an interactive select element that lists every encoding TextDecoder supports as an option may be an elegant way to do this.

However, decoding each data type is quick and uses few resources, and there is still a lot of free space in the inspector, so it may not be a problem right now.
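
For readers following along, here is a minimal sketch of what decoding the selected bytes with each proposed encoding could look like. It is not the extension's actual code: `decodeAs` and `cjkEncodings` are hypothetical names, and it assumes a runtime whose `TextDecoder` supports these legacy labels (browsers, or Node.js built with full ICU).

```typescript
// Hypothetical helper, not taken from the PR: decode bytes with one encoding label.
const cjkEncodings = ["gb18030", "big5", "euc-kr", "euc-jp"] as const;

function decodeAs(bytes: Uint8Array, encoding: string): string | undefined {
  try {
    // fatal: true makes decode() throw on malformed input
    // instead of silently emitting U+FFFD replacement characters.
    return new TextDecoder(encoding, { fatal: true }).decode(bytes);
  } catch {
    // RangeError for an unknown label, TypeError for invalid byte sequences.
    return undefined;
  }
}

// 0xD6 0xD0 is the GBK/GB18030 encoding of "中".
const sample = new Uint8Array([0xd6, 0xd0]);
for (const encoding of cjkEncodings) {
  // One inspector row per encoding; undefined means "not decodable as this".
  console.log(encoding, decodeAs(sample, encoding));
}
```

Each call constructs a decoder and converts a handful of bytes, which is consistent with the point above that the per-row cost is small.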

@HaxtonFale

Could you please also add Shift-JIS? I believe it's pretty common in a lot of JP executables.

@liudonghua123
Contributor Author

> Could you please also add Shift-JIS? I believe it's pretty common in a lot of JP executables.

@HaxtonFale Hi, do you mean that shift-jis is more popular and more widely used than euc-jp? If only one Japanese encoding is chosen here, would you prefer shift-jis?

@liudonghua123
Contributor Author

liudonghua123 commented Nov 13, 2023

According to the docs on Encodings of Japanese, shift-jis seems more suitable than euc-jp. I will update the code in this PR.

And for Korean, iso-2022-kr seems more suitable than euc-kr, according to https://www.rfc-editor.org/rfc/rfc1557.html.
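
To make the Shift-JIS versus EUC-JP difference concrete, the sketch below decodes the same word, "日本", from its byte sequence in each encoding. The helper name `decodeJapanese` is hypothetical, and the example assumes a `TextDecoder` that supports both labels.

```typescript
// Illustrative only: the word "日本" encoded two different ways.
const shiftJisBytes = new Uint8Array([0x93, 0xfa, 0x96, 0x7b]);
const eucJpBytes = new Uint8Array([0xc6, 0xfc, 0xcb, 0xdc]);

// Hypothetical helper: decode bytes using one of the two Japanese labels.
function decodeJapanese(
  bytes: Uint8Array,
  encoding: "shift_jis" | "euc-jp",
): string {
  // Non-fatal mode: a wrong label does not throw, it just produces garbage.
  return new TextDecoder(encoding).decode(bytes);
}

console.log(decodeJapanese(shiftJisBytes, "shift_jis"));
console.log(decodeJapanese(eucJpBytes, "euc-jp"));
// Same bytes, mismatched label: the result is not the original text.
console.log(decodeJapanese(shiftJisBytes, "euc-jp"));
```

Because the two encodings use different byte values for the same characters, an inspector that offers only one of them cannot read text stored in the other.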

Member

@connor4312 connor4312 left a comment

I'll add in some filtering logic after we merge this

@connor4312 connor4312 enabled auto-merge (squash) February 1, 2024 22:59
@vscodenpa vscodenpa added this to the February 2024 milestone Feb 1, 2024
@Young-Lord

IMHO, it might be better to let users customize which encoding(s) to use in a configuration file.
In other words, would it be better to read the required encodings from the configuration file?
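
One way such a configuration-driven list could work is sketched below, assuming the user supplies an array of encoding labels. The function name `supportedEncodings` is hypothetical and no real setting of the extension is implied. Labels the runtime's `TextDecoder` rejects are dropped, and aliases collapse to one canonical name.

```typescript
// Hypothetical sketch: keep only the user-requested labels that
// the runtime's TextDecoder actually accepts, without duplicates.
function supportedEncodings(requested: readonly string[]): string[] {
  const result: string[] = [];
  for (const label of requested) {
    try {
      // The constructor throws a RangeError for unknown labels;
      // .encoding reports the canonical name for an accepted label.
      const canonical = new TextDecoder(label).encoding;
      if (!result.includes(canonical)) {
        result.push(canonical);
      }
    } catch {
      // Unknown or unsupported label: skip it rather than fail.
    }
  }
  return result;
}

// Example input a user might configure (values are illustrative).
console.log(supportedEncodings(["utf-8", "utf8", "no-such-encoding"]));
```

Validating against the constructor keeps the list in step with whatever the host runtime supports, so the inspector only shows rows it can actually decode.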

@connor4312 connor4312 merged commit 4ce76cb into microsoft:main Feb 2, 2024
2 checks passed
@liudonghua123 liudonghua123 deleted the gbk-support branch February 2, 2024 14:49
Development

Successfully merging this pull request may close these issues.

add some cjk encoding support for Data Inspector.
6 participants