
More Unicode-savvy wordDelimiters #3077

Open
egmontkob opened this issue Oct 5, 2019 · 2 comments
Labels
Area-Settings Issues related to settings and customizability, for console or terminal Help Wanted We encourage anyone to jump in on these. Issue-Task It's a feature request, but it doesn't really need a major design. Priority-3 A description (P3) Product-Terminal The new Windows Terminal.

@egmontkob

Description of the new feature/enhancement

The "wordDelimiters" setting lists a handful of stop characters, such as the ASCII quotation mark ", apostrophe ' and hyphen-minus -.

However, special Unicode quotation marks, apostrophes and dashes, box-drawing characters, non-breaking spaces and so on remain word characters (selected on a double click). This is most likely not the best behavior, and adding such characters to the set one by one as the user encounters them is cumbersome.

Proposed technical implementation details

I think the default behavior should be based on Unicode character categories. On top of this, there could be a way to add or remove individual characters to or from the set as exceptions. (Maybe even a way to add or remove entire character categories at once, although that might be overkill.)
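A minimal Python sketch of the proposal, assuming that characters in the punctuation (P*), symbol (S*) and separator (Z*) general categories count as delimiters by default, with per-character exception lists checked first. The function and parameter names (`is_word_delimiter`, `extra_delimiters`, `extra_word_chars`) are hypothetical and not part of Windows Terminal's settings or code.

```python
import unicodedata

# Assumed default policy (hypothetical, not Windows Terminal's current logic):
# any character in a punctuation (P*), symbol (S*) or separator (Z*) general
# category ends a double-click word selection.
DELIMITER_CATEGORY_PREFIXES = ("P", "S", "Z")


def is_word_delimiter(ch: str,
                      extra_delimiters: str = "",
                      extra_word_chars: str = "") -> bool:
    """Return True if the single character `ch` should stop a word selection.

    The exception lists (hypothetical settings) are checked first, so they
    override the category-based default.
    """
    if ch in extra_word_chars:
        return False
    if ch in extra_delimiters:
        return True
    if ch.isspace():  # also catches tabs, which are control characters (Cc)
        return True
    return unicodedata.category(ch).startswith(DELIMITER_CATEGORY_PREFIXES)


# Some of the character kinds mentioned in the issue, plus two letters and "_".
for ch in ["\u201c", "\u2019", "\u2014", "\u2502", "\u00a0", "a", "\u00e9", "_"]:
    print(f"U+{ord(ch):04X} {unicodedata.category(ch)} {is_word_delimiter(ch)}")
```

Under these assumptions `_` (category Pc, connector punctuation) also becomes a delimiter, which is exactly the kind of case the exception list is for: `is_word_delimiter("_", extra_word_chars="_")` returns `False`.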

@egmontkob egmontkob added the Issue-Feature Complex enough to require an in depth planning process and actual budgeted, scheduled work. label Oct 5, 2019
@ghost ghost added Needs-Triage It's a new issue that the core contributor team needs to triage at the next triage meeting Needs-Tag-Fix Doesn't match tag requirements labels Oct 5, 2019
@DHowett-MSFT DHowett-MSFT added Area-Settings Issues related to settings and customizability, for console or terminal Product-Terminal The new Windows Terminal. labels Oct 7, 2019
@ghost ghost removed the Needs-Tag-Fix Doesn't match tag requirements label Oct 7, 2019
@zadjii-msft zadjii-msft added Help Wanted We encourage anyone to jump in on these. Issue-Task It's a feature request, but it doesn't really need a major design. and removed Issue-Feature Complex enough to require an in depth planning process and actual budgeted, scheduled work. labels Oct 24, 2019
@zadjii-msft zadjii-msft added this to the Terminal v1.0 milestone Oct 24, 2019
@DHowett-MSFT DHowett-MSFT removed the Needs-Triage It's a new issue that the core contributor team needs to triage at the next triage meeting label Oct 29, 2019
@carlos-zamora carlos-zamora self-assigned this May 12, 2020
@zadjii-msft zadjii-msft modified the milestones: Terminal v2.0, 22H2 Jan 4, 2022
@zadjii-msft
Member

Notes from #14374 and #14392:

The variety here is staggering! 😄

Yeah, I was just experimenting with a few terminals I have on my test VM. This is what I established after a lot of double-clicking (there might be a couple of errors, but you get the idea).

Terminal        Word delimiters
Xterm           "$'()*;<>[\]^`{|}
Gnome Terminal  !"$'()*:;<>[]^`{|}
Konsole         !"$'()*,;<>[\]^`{|}
Rxvt            &();<>|
Alacritty       "'(),:<>[]`{|}
Kitty           !"$'()*,:;<>[\]^`{|}

They do seem fairly consistent about not treating the *nix path separator / as a delimiter, though, so that's perhaps worth noting. Obviously they're less likely to care about the Windows path separator \.
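To make the comparison checkable, the table can be encoded as data. This Python sketch transcribes it as-is (so it inherits any errors the reporter mentions) and computes which delimiters all six terminals share and which ones break on each path separator.

```python
# The table above, transcribed as data. Each value lists the characters that
# stopped a double-click selection in that terminal.
DELIMITERS = {
    "Xterm": "\"$'()*;<>[\\]^`{|}",
    "Gnome Terminal": "!\"$'()*:;<>[]^`{|}",
    "Konsole": "!\"$'()*,;<>[\\]^`{|}",
    "Rxvt": "&();<>|",
    "Alacritty": "\"'(),:<>[]`{|}",
    "Kitty": "!\"$'()*,:;<>[\\]^`{|}",
}

# Characters that every listed terminal treats as a delimiter.
common = set.intersection(*(set(chars) for chars in DELIMITERS.values()))
print(sorted(common))  # ['(', ')', '<', '>', '|']

# Terminals that break on the *nix path separator: none of them.
print([name for name, chars in DELIMITERS.items() if "/" in chars])  # []

# Terminals that break on the Windows path separator.
print([name for name, chars in DELIMITERS.items() if "\\" in chars])
# ['Xterm', 'Konsole', 'Kitty']
```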

For the record, I couldn't care less.


I'm sorry for the entirely unrelated comment, but I think I just now realized how "crazy" those ASCII word delimiters are in an international setting. For instance this:

ねこはかわいい。

It consists of 3 words and a delimiter (ねこ "cat", は topic particle, かわいい "cute", 。 full stop; roughly "Cats are cute."), and your browser probably handles this correctly, whereas terminals are perpetually stuck in "What do you mean there are people who don't speak English?" mode. conhost's whitespace-only splitting doesn't feel any better in that regard. I feel like terminals are in dire need of some UAX #29 (Unicode Text Segmentation), Section 4 (Word Boundaries).
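A small Python illustration of why simple rules fall short on this sentence: whitespace-only splitting (as attributed to conhost above) returns the whole sentence as one "word", and a per-character category rule (a hypothetical helper, `split_on_punctuation`) only separates the full stop. Finding ねこ / は / かわいい generally requires dictionary-based segmentation rather than per-character classification.

```python
import unicodedata


def split_on_punctuation(text: str) -> list[str]:
    """Split on whitespace and on any character whose Unicode general
    category is punctuation (P*); the delimiters themselves are dropped."""
    words, current = [], ""
    for ch in text:
        if ch.isspace() or unicodedata.category(ch).startswith("P"):
            if current:
                words.append(current)
            current = ""
        else:
            current += ch
    if current:
        words.append(current)
    return words


sentence = "ねこはかわいい。"

# Whitespace-only splitting: the sentence has no spaces, so it stays one token.
print(sentence.split())                # ['ねこはかわいい。']

# Category-based splitting: the ideographic full stop U+3002 (category Po) is
# recognized, but ねこ, は and かわいい are all Hiragana letters (category Lo),
# so no boundary is found between them.
print(split_on_punctuation(sentence))  # ['ねこはかわいい']
```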


and your browser probably handles this correctly

For the record, my browser (Firefox) does not.

And while I agree that it's worthwhile considering a more international-friendly approach, we also need to bear in mind that command-line text selection is possibly somewhat different from typical document text selection, so sticking strictly to the UAX#29 spec may not be ideal as a default (assuming we're considering changing the default).

For example, in a terminal, the decision as to whether a punctuation character should be treated as a separator is often based on the semantics of that character in the shell (e.g. slash as a path separator, or colon as a drive-letter suffix). I haven't looked at the UAX#29 spec in detail, but I suspect it is unlikely to categorize punctuation symbols in the same way.

And in your example above, if those characters were used in a path and you were trying to select that path, would you really want the selection to stop at the word boundaries? I don't know. That's really a question for the people who speak the language, and for the situations in which they're most likely to be double-clicking on strings of Japanese characters. But my point is that it's not necessarily obvious that UAX#29 would be the best default.

@radekg

radekg commented Mar 13, 2023

For reference:

"wordDelimiters": " ()\"':,;<>~!@#$%^&*|=[]{}?│",

to mimic the setting from iTerm2. It treats /-+\~_. as part of the word, which allows selecting a full path or a word containing dashes, for example a repository name.
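A minimal Python model of how a delimiter list like this drives double-click selection. `select_word` is a hypothetical helper that simply grows the selection left and right until it hits a delimiter; real terminals handle details such as clicking on whitespace or wide characters differently.

```python
# The "wordDelimiters" value above as a Python string: the JSON escape \" is a
# literal double quote, and the box-drawing character │ is written as \u2502.
WORD_DELIMITERS = " ()\"':,;<>~!@#$%^&*|=[]{}?\u2502"


def select_word(line: str, col: int, delimiters: str = WORD_DELIMITERS) -> str:
    """Simplified double-click model: expand from `col` in both directions
    until a delimiter or the edge of the line. Clicking on a delimiter itself
    selects nothing in this sketch."""
    if not (0 <= col < len(line)) or line[col] in delimiters:
        return ""
    start = col
    while start > 0 and line[start - 1] not in delimiters:
        start -= 1
    end = col
    while end + 1 < len(line) and line[end + 1] not in delimiters:
        end += 1
    return line[start:end + 1]


path_line = "ls /usr/local/bin/my-tool.sh"
click = path_line.index("local")

# /, - and . are not delimiters here, so the whole path is selected.
print(select_word(path_line, click))  # /usr/local/bin/my-tool.sh

# Treating /, - and . as delimiters shrinks the selection to one component.
print(select_word(path_line, click, WORD_DELIMITERS + "/-."))  # local

repo_line = "cd windows-terminal-fork && make"
print(select_word(repo_line, repo_line.index("terminal")))  # windows-terminal-fork
```

One thing the model makes visible: `~` is still in this delimiter string, so `select_word("cd ~/src", 5)` returns `/src` rather than `~/src`.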

Projects
None yet
Development

No branches or pull requests

7 participants