Support for RTL languages (such as Arabic / Hebrew / Persian etc.) #86667

Open
tomerm opened this issue Dec 3, 2016 · 26 comments
Labels
editor-RTL Editor Right-To-Left or Bi-Di issues feature-request Request for new features or functionality
@tomerm

tomerm commented Dec 3, 2016

My name is Tomer Mahlin. I lead a development team at IBM called the Bidi Development Lab. For more than 20 years we have specialized in developing support for languages with bidirectional scripts ("bidi lang." for short).

We recently ran a quick assessment of Monaco's capabilities for displaying bidi languages. We believe several functional areas need improvement (details below).
My team can work on the necessary modifications and propose them via separate pull requests, assuming the community is interested in addressing the requirements detailed below.

Plain text editing

  1. There should be a parameter that communicates the default text direction for content authored in a specific editor instance. It would be similar to the contentsLangDirection parameter in CKEditor ( http://docs.ckeditor.com/#!/api/CKEDITOR.config-cfg-contentsLangDirection ).
    Possible values should be:
  • ltr (left-to-right),
  • rtl (right-to-left),
  • contextual (or auto as used in HTML)
  2. In addition, there should be an explicit way for the end user to interactively change the text direction of the selected text (or of the paragraph containing the cursor when the selection is empty). This can be achieved via:
  • GUI buttons, as in rich text editors (e.g. http://ckeditor.com/addon/bidi)
    AND / OR (in case there are no toolbars for new buttons)
  • Keyboard shortcuts (e.g. in Notepad it is Ctrl - <Left|Right-Shift>)
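
As a rough illustration of the three proposed values, the sketch below resolves a paragraph's direction from such a setting. `TextDirectionSetting`, `firstStrongDirection` and `resolveDirection` are hypothetical names, not Monaco APIs, and the character ranges only approximate the strong character classes of the Unicode Bidi Algorithm (Hebrew and Arabic-script letters versus basic Latin letters).

```typescript
// Hypothetical setting type; the values mirror those proposed above.
type TextDirectionSetting = "ltr" | "rtl" | "contextual";

// Approximation only: Hebrew letters plus the main Arabic-script letter ranges.
const RTL_LETTER = /[\u05D0-\u05EA\u0620-\u064A\u0671-\u06D3]/;
// Approximation only: basic Latin letters stand in for all strong LTR characters.
const LTR_LETTER = /[A-Za-z]/;

// Returns the direction of the first strong character, or null if there is none.
function firstStrongDirection(text: string): "ltr" | "rtl" | null {
  for (let i = 0; i < text.length; i++) {
    const ch = text.charAt(i);
    if (RTL_LETTER.test(ch)) return "rtl";
    if (LTR_LETTER.test(ch)) return "ltr";
  }
  return null;
}

// "contextual" falls back to LTR when the paragraph has no strong character.
function resolveDirection(setting: TextDirectionSetting, paragraph: string): "ltr" | "rtl" {
  if (setting !== "contextual") return setting;
  const detected = firstStrongDirection(paragraph);
  return detected === null ? "ltr" : detected;
}
```

Digits and punctuation are skipped because they are not strong characters, so a paragraph such as `123 שלום` would resolve to rtl under the contextual setting.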

Programming lang. editing

  1. Unlike plain text, a programming language has a well-defined syntax. Part of this syntax is visualized through the color scheme used for different elements of the language (e.g. comments vs. variables). It is critical to preserve the visual appearance associated with the syntax regardless of the natural language used in those elements (comments, variables etc.). Otherwise it becomes virtually impossible to work with code that contains bidi text. A simple English example:
    a = b + c; // hello world
    If bidi characters are used instead (shown here in upper case), you would expect to see:
    A = B + C; // DLROW OLLEH
    Instead, at the moment you see:
    DLROW OLLEH // ;C+B=A
    The more complex the example, the less intuitive the display becomes.

  2. Comments and constants are a special case. They contain bidi characters far more often than, for example, variable names. It is thus preferable to display text in those contexts using the natural text direction for bidi languages (which is RTL). We can't store text direction information with the text: a source code file is still a plain text file, which can't include meta information such as font size, color or direction. Consequently we must make a smart choice when displaying the text, relying on the text alone. The most straightforward approach is to enforce auto (aka contextual or first strong) direction for each paragraph within a comment.
    For example, the sample text is currently displayed as follows:
    res = var1 + var2; // SI EMAN YM tomer !!!
    If we enforce auto text direction on the comment we will see:
    res = var1 + var2; // !!! tomer SI EMAN YM
    That is, the comment text appears with actual RTL direction, which is the natural one for bidi languages.
    Displaying text in its natural direction makes it considerably more readable and should greatly improve the experience for bidi users.
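
One way to picture both requirements is with Unicode directional isolates, applied only to the string that is displayed and never to the file contents. In the sketch below, the `Token` shape and the functions `isolateForDisplay` and `stripIsolates` are assumptions made for illustration, not part of Monaco. Code tokens are forced left-to-right so the statement structure stays intact, while comment and string tokens get first-strong (auto) direction.

```typescript
const LRI = "\u2066"; // LEFT-TO-RIGHT ISOLATE
const FSI = "\u2068"; // FIRST STRONG ISOLATE
const PDI = "\u2069"; // POP DIRECTIONAL ISOLATE

// Hypothetical token shape produced by some tokenizer.
interface Token {
  text: string;
  kind: "code" | "comment" | "string";
}

// Builds a display-only string: the whole line is an LTR isolate, code tokens
// are LTR isolates, and comment/string bodies use first-strong direction.
function isolateForDisplay(tokens: Token[]): string {
  const parts = tokens.map((t) => (t.kind === "code" ? LRI : FSI) + t.text + PDI);
  return LRI + parts.join("") + PDI;
}

// Removes LRI, RLI, FSI and PDI, recovering the text as stored in the file.
function stripIsolates(display: string): string {
  return display.replace(/[\u2066-\u2069]/g, "");
}
```

For this to match the second example above, the comment delimiter `//` would belong to a code token and only the comment body would be a comment token.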

Relevant requests
At some point, support for bidi languages was requested in vscode via #4994

@adir01

adir01 commented Jan 10, 2017

@MicrosoftSam, can you please review this issue?

@alexdima alexdima self-assigned this Jan 13, 2017
@alexdima
Member

@tomerm PR welcome. If anything special needs to be done, it could be done in viewLineRenderer.ts, where the actual HTML for a line is produced.

Our goal with the monaco editor and vscode is to be a code editor, i.e. plain text editing is not a priority.

Regarding programming language editing, have you actually tried these scenarios in the monaco-editor? I am not an expert, but we do support RTL and bi-di to quite some extent:

image

var b = 3;
var c = 2;
var x = "מיותר קודמות צ'ט של, אם לשון העברית שינויים ויש, אם";
var a = b + c; //מיותר קודמות צ'ט של, אם לשון העברית שינויים ויש, אם

@tomerm
Author

tomerm commented Jan 13, 2017

@alexandrudima thanks a lot for your quick reply. The examples you provided are rendered properly. However, this is just one use case: you used Hebrew characters exclusively (no numbers, no English text in the middle of Hebrew text etc.).

In other words, I totally agree that the Monaco editor performs correct basic reordering (taking the text stored in the buffer and transforming it for presentation on the screen).

If you start using mixed (English + Hebrew) text in comments / constants (not to mention Hebrew in variable names, class names etc.), the display becomes considerably less readable. It will be very similar to the display we get in Notepad. However, while Notepad is not aware of any programming language syntax (and thus is not expected to enforce it), Monaco is aware of such syntax and should ensure it is preserved in the display layer.

monaco

@alexdima
Member

alexdima commented Jan 14, 2017

Thank you for explaining the issue. Indeed, it would be better to interpret syntax characters as Left-To-Right instead of Neutral when rendering source code.

Here is a quick test I did (notice how GitHub doesn't get this right either):

var ת = "מיותר קודמות צ'ט של, אם לשון העברית שינויים ויש, אם";

Monaco Editor

image

Ace Editor

image

CodeMirror

image

Word

image

Notepad

image

A browser textarea

image

SublimeText (no support at all)

image


And the only one that appears to get it right?

Visual Studio

image

@tomerm
Author

tomerm commented Jan 15, 2017

@alexandrudima , several examples of editors in which we do get expected behavior:

Orion
This is a web based editor https://orionhub.org/
orioneditor_stt

Eclipse
This is a desktop editor: https://eclipse.org/ . I mention it here since you mentioned SublimeText, which is also desktop based.
eclipseeditor_stt

ACE
ACE currently has much more serious problems with support for RTL languages. My team is contributing the solution, so eventually ACE will behave correctly. By the way, GitHub also uses ACE (I think).

Non code editors
Notepad, browser textarea, Word - all those editors are not meant to edit code written in some programming language. Thus I don't think we should expect them to enforce any syntax associated with any of programming languages.

@alexdima
Member

Ok, here is what it looks like on master:

using JS syntax:

image

using plaintext:

image

@amirbrans

@alexandrudima, I have tested this fix using the latest code.
It is indeed resolved in Chrome and IE. However, it is not resolved in FF.
See the following screenshot.
monaco_ff

@alexdima
Member

@amirbrans Thank you for verifying.

I have FF 50.1.0 on Windows 10 and it appears to work there.

Does your test URL contain editor=dev in the query param (I often mix this up myself)?
e.g. file:///C:/Alex/src/monaco-editor/test/index.html?editor=dev#Y___DefaultJS

image

@amirbrans

@alexandrudima, actually yes, it does contain the editor=dev parameter.
(BTW, if that were not the case, it wouldn't work in the other browsers either...)

I'm using FF ESR 45.6.
Very weird that it doesn't work there.

@amirbrans

Following up, here is a document with the Bidi issues we discovered in the Monaco editor:

Monaco Bidi issues.docx

@tomerm
Author

tomerm commented Jan 30, 2017

@alexandrudima "STT" mentioned in @amirbrans's report above refers to "STructured Text". It is a general term we use for text with an internal structure that the UBA (Unicode Bidi Algorithm) generally does not preserve when the text includes bidi characters.

Please let us know if you need any help with fixing issues in the report above. I just want to make sure we don't duplicate our efforts if you have plans to address those issues yourself. Many thanks in advance for your attention and help.

@philjoseph

We're looking for a great code editor for our content authors (using Jekyll, YAML, HTML, CSS, ...) and I am interested in VS Code, or maybe even just in Monaco. Since the content is in Hebrew, Arabic and English, this RTL issue is critical for us.
Please keep us posted here about the plan to fix the issues in @amirbrans's report. I will be glad to test once they are solved. Thanks!

@alexdima alexdima reopened this Feb 3, 2017
@alexdima
Member

alexdima commented Feb 3, 2017

Thank you for the extra testing. The root cause is a bit silly: the theme uses the same color for identifiers and syntactical text. If the theme gave them even a slightly different color, the text would be split into multiple tokens and the "dir" trick might work...
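
The effect described here can be pictured with a small sketch. It is not Monaco's rendering code: the `ColoredToken` shape and `mergeByColor` are hypothetical, and it only assumes that a renderer joins adjacent runs of the same color into one span. Once an identifier and the following operator share a color, they end up in a single span, so a per-span direction attribute can no longer separate them.

```typescript
// Hypothetical token shape: a run of text plus the color the theme assigns to it.
interface ColoredToken {
  text: string;
  color: string;
}

// Merges adjacent tokens that share a color, as a renderer might do to emit fewer spans.
function mergeByColor(tokens: ColoredToken[]): ColoredToken[] {
  const merged: ColoredToken[] = [];
  for (const token of tokens) {
    const last = merged[merged.length - 1];
    if (last !== undefined && last.color === token.color) {
      last.text += token.text; // boundary between the two tokens is lost here
    } else {
      merged.push({ text: token.text, color: token.color });
    }
  }
  return merged;
}
```

With a theme that colors the identifier even slightly differently from the operator, the same input keeps its boundaries and each span can carry its own direction.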

@kookma

kookma commented Jul 29, 2017

@alexandrudima
Any update on BIDI support? As you know, aside from web development in VSCode, the great extension LaTeX Workshop (https://github.com/James-Yu/LaTeX-Workshop), with more than 132000 downloads, proves that many people around the globe use VSCode for web and document development. Many users work with RTL languages daily and need BIDI support in VSCode.

I would highly appreciate an update on how this is going on the VSCode team's side.

@kookma

kookma commented Jul 29, 2017

@tomerm
Tomer, since you have a strong background in BIDI and RTL language implementation, would it be possible to develop an extension in the short term to remedy the issue while the VSCode development team works on adding this feature?

@amer1616

Unfortunately VS Code does not have this feature. I hope to see it soon. The best editor I have found for bidirectional editing of RTL scripts is Emacs. So the trick for the time being is to edit RTL scripts in Emacs and then bring them over to VS Code.

@alexdima alexdima transferred this issue from microsoft/monaco-editor Dec 10, 2019
@jb6

jb6 commented Dec 10, 2019


@tomerm @alexdima hello, I would be VERY grateful for your replies. I also need to be able to type a line of code that contains Hebrew + English + lots of parentheses without the order of the parentheses, the Hebrew letters, or the words being swapped (as happens in most editors).

This is what I want (written with Textmate):
Screen Shot 2019-12-10 at 22 11 53

This is what one gets now in vscode (not correct); copy-pasting the correct line above into vscode ruins the order of the parentheses:
Screen Shot 2019-12-10 at 22 19 02

This is what is displayed here on GitHub when you copy-paste the same line of code shown previously in Textmate (the correct one, which I want to achieve); it is not correct either:

  (שלום|היי|בוקר טוב) {(מר|גב) מרקוס}ֿ

At the end of the day, @tomerm, could you please tell me which editor/platform (one that enables everything you mentioned in your first post) you would advise me to use, i.e. one that works well and is as nice and powerful as vscode?

תודה רבה / thank you @alexdima

@MSKhodadady

Hello guys
I suggest supporting the Unicode RTL mark character (U+200F) in the editor. It works in some programs such as Telegram and GTK applications. It is very good because:

  1. It avoids the complexity of smart direction detection. Detection may be good, but in my opinion it is more confusing.
  2. It will not break the plain-text nature of files.

You could also put a button in the tab bar or elsewhere that inserts this character. I also suggest showing a marker for the character in the text so that the user knows the character is there.

As the other guys mentioned, it is very useful for LaTeX and related tools.
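
A minimal sketch of the two pieces suggested above, inserting the mark and locating it so the editor could draw a visible marker. The function names `insertRlm` and `rlmOffsets` are hypothetical and not tied to any editor API; how the marker is actually drawn is left open.

```typescript
const RLM = "\u200F"; // RIGHT-TO-LEFT MARK

// Returns a copy of the text with an RLM inserted at the given offset,
// e.g. what a toolbar button might do at the cursor position.
function insertRlm(text: string, offset: number): string {
  return text.slice(0, offset) + RLM + text.slice(offset);
}

// Returns the offsets of every RLM in the text, so that a renderer could
// decorate those positions with a visible marker.
function rlmOffsets(text: string): number[] {
  const offsets: number[] = [];
  for (let i = 0; i < text.length; i++) {
    if (text.charAt(i) === RLM) offsets.push(i);
  }
  return offsets;
}
```

Unlike the display-only approaches discussed earlier in the thread, this one does change the stored file, since the mark is an ordinary character in the text.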

@KL13NT

KL13NT commented Aug 20, 2020

Is this issue anywhere near fixed? We're still seeing broken rendering in RTL languages (mine is Arabic).

image

The problem currently occurs mainly when a line starts with RTL text such as Arabic letters. The image has 5 examples: english and bidiEnglishFirst render correctly, while all the others (which start with an RTL letter) require modifying the <span> with class .mtk11 using one of the following CSS/HTML changes:

1. unicode-bidi: plaintext;
2. unicode-bidi: bidi-override;
   direction: rtl;
3. <span dir="auto"></span>

I suppose the second modification would require some detection, while the first one potentially breaks other stuff? The third option seems too easy to implement.

Comments also seem to be fixed with the HTML solution and the unicode-bidi: plaintext solution:

image
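
For illustration, the HTML option could look like the sketch below, which emits one token as a span with `dir="auto"`. This is not how Monaco builds its line HTML; `renderTokenSpan` and `escapeHtml` are hypothetical helpers, and `mtk11` is simply the class name from the screenshot above.

```typescript
// Escapes the characters that are significant in HTML text and attribute values.
function escapeHtml(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Renders one token as a span whose direction the browser derives from its
// first strong character (the HTML dir="auto" behaviour).
function renderTokenSpan(className: string, text: string): string {
  return `<span class="${escapeHtml(className)}" dir="auto">${escapeHtml(text)}</span>`;
}
```

Because `dir="auto"` also isolates the span from its surroundings, it would need to be applied per token (or per comment body) rather than to the whole line, otherwise code following an RTL-first token could still be reordered.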

@Kagetsuki

I can confirm the issue is not resolved in master for Hebrew either (@KL13NT); so I suspect there is no progress.

@VehpuS

VehpuS commented Jan 17, 2021

Hello guys
I suggest supporting the Unicode RTL mark character (U+200F) in the editor. It works in some programs such as Telegram and GTK applications. It is very good because:

  1. It avoids the complexity of smart direction detection. Detection may be good, but in my opinion it is more confusing.
  2. It will not break the plain-text nature of files.

You could also put a button in the tab bar or elsewhere that inserts this character. I also suggest showing a marker for the character in the text so that the user knows the character is there.

As the other guys mentioned, it is very useful for LaTeX and related tools.

A way I tried to implement this (by adding this style rule in the chrome debugger interface):

/* Add per-line support for RTL without changing alignment */

.view-line>span:before {
    content: "\200f";
}

.view-line>span:after {
    content: "\200f";
}

This has the advantage of not affecting the line's content directly (applied as a style) and can be implemented as a per line solution.

@NotWearingPants
Contributor

NotWearingPants commented Feb 18, 2021

I believe this is a duplicate of #83365, although this has more info

@Goozoon

Goozoon commented Apr 14, 2022

Indeed, lines that combine Arabic and English text are a huge problem for VS Code. Even copy-paste doesn't work as one would expect.

@AvtechScientific

@alexdima , @tomerm ,

any plans to implement this in the near future?

@Jay-o-Way

This is a must for any language/user that can work with RTL (or bi-di)

@movahhedi

Any updates?
