More Unicode-savvy wordDelimiters #3077

egmontkob · 2019-10-05T13:20:49Z

Description of the new feature/enhancement

"wordDelimiter" lists a couple of stop characters, such as ASCII quotation mark ", apostrophe ', hyphen/minus - and such.

However, special Unicode quotation marks like “ and ”, apostrophes like ’, dashes like – and —, box drawing characters, non-breaking spaces, and so on and so forth remain word characters (selected on a double click). This is most likely not the best behavior, and adding such characters to the set one by one as the user encounters them is cumbersome.

Proposed technical implementation details

I think the default behavior should be based on Unicode character categories. On top of this there could be a way to add/remove certain characters to/from the set as exceptions. (Maybe even a way to add/remove entire character categories at once, although that might be overkill.)
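A minimal sketch of the category-based idea, in Python for illustration only. The choice of which general categories count as delimiters, and the names `DELIMITER_CLASSES`, `is_delimiter`, `extra_delimiters` and `extra_word_chars`, are assumptions made for this example and are not part of Windows Terminal.

```python
import unicodedata

# Assumption: a character is a delimiter if its general category starts with
# one of these letters (P = punctuation, S = symbol, Z = separator,
# C = control/format/unassigned). Letters, marks and numbers stay word characters.
DELIMITER_CLASSES = frozenset("PSZC")


def is_delimiter(ch, extra_delimiters="", extra_word_chars=""):
    """Return True if a double-click selection should stop at `ch`."""
    # Per-character exceptions take priority over the category-based default.
    if ch in extra_word_chars:
        return False
    if ch in extra_delimiters:
        return True
    return unicodedata.category(ch)[0] in DELIMITER_CLASSES


# Print the category and the resulting decision for a few sample characters.
for ch in "“’—│\u00a0/a":
    print(repr(ch), unicodedata.category(ch), is_delimiter(ch))
```

With this default, `/`, `-`, `_` and `.` would also be delimiters, so a caller that wants whole paths selected could pass something like `extra_word_chars="/-_."` as the exception list.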


zadjii-msft · 2022-11-15T21:11:59Z

Notes from #14374 and #14392:

The variety here is staggering! 😄

Yeah, I was just experimenting with a few terminals I have on my test VM. This is what I established after a lot of double-clicking (might be a couple of errors, but you get the idea).
Xterm		"$'()*;<>[\]^`{|}
Gnome Terminal	!"$'()*:;<>[]^`{|}
Konsole		!"$'()*,;<>[\]^`{|}
Rxvt		&();<>|
Alacritty	"'(),:<>[]`{|}
Kitty		!"$'()*,:;<>[\]^`{|}
They do seem fairly consistent about avoiding the *nix path separator though, so that's perhaps something worth noting. Obviously they're less likely to care about the Windows path separator.
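The observation about path separators can be checked mechanically. The sketch below copies the delimiter sets from the list above verbatim (so it inherits any double-clicking errors in them) and computes which characters every terminal agrees on. The names `DELIMITERS` and `common` are made up for this example.

```python
# Delimiter sets as reported in the list above; raw strings keep the
# backslash literal.
DELIMITERS = {
    "Xterm": r'''"$'()*;<>[\]^`{|}''',
    "Gnome Terminal": r'''!"$'()*:;<>[]^`{|}''',
    "Konsole": r'''!"$'()*,;<>[\]^`{|}''',
    "Rxvt": "&();<>|",
    "Alacritty": r'''"'(),:<>[]`{|}''',
    "Kitty": r'''!"$'()*,:;<>[\]^`{|}''',
}

# Characters that every listed terminal treats as a delimiter.
common = set.intersection(*(set(chars) for chars in DELIMITERS.values()))
print("common to all:", "".join(sorted(common)))

# Which terminals break words on each path separator?
for sep in "/\\":
    users = sorted(name for name, chars in DELIMITERS.items() if sep in chars)
    print(repr(sep), "is a delimiter in:", users)
```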

For the record, I couldn't care less.

I'm sorry for the entirely unrelated comment, but I think I just now realized how "crazy" those ASCII word delimiters are in an international setting. For instance this:

ねこはかわいい。

It consists of 3 words and a delimiter (ねこ , は , かわいい , 。) and your browser probably handles this correctly, whereas terminals are perpetually stuck in the "What do you mean there are people who don't speak English?" mindset. conhost's whitespace-only splitting doesn't feel any better in that regard. I feel like terminals are in dire need of some UAX #29, Section 4.

Proposed technical implementation details (optional)

Use this by default: unicode.org/reports/tr29/#Word_Boundaries

Use character-set splitting, if the user has configured wordDelimiters
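The two rules above amount to a simple dispatch, sketched here in Python. Everything in it is hypothetical: `select_word`, `delimiter_span` and `whitespace_span` are names invented for this example, and the default segmenter is only a whitespace-splitting placeholder marking the spot where a real UAX #29 implementation would be plugged in.

```python
from typing import Callable, Optional, Tuple

Span = Tuple[int, int]  # half-open [start, end) range within a line


def delimiter_span(line: str, index: int, delimiters: str) -> Span:
    """Expand from `index` in both directions until a delimiter is hit."""
    if line[index] in delimiters:
        return index, index + 1  # clicking a delimiter selects only itself
    start = index
    while start > 0 and line[start - 1] not in delimiters:
        start -= 1
    end = index + 1
    while end < len(line) and line[end] not in delimiters:
        end += 1
    return start, end


def whitespace_span(line: str, index: int) -> Span:
    """Placeholder default. This is NOT UAX #29, only whitespace splitting."""
    return delimiter_span(line, index, " \t")


def select_word(
    line: str,
    index: int,
    word_delimiters: Optional[str],
    default_segmenter: Callable[[str, int], Span] = whitespace_span,
) -> str:
    """Use character-set splitting only if the user configured delimiters."""
    if word_delimiters is not None:
        start, end = delimiter_span(line, index, word_delimiters)
    else:
        start, end = default_segmenter(line, index)
    return line[start:end]


print(select_word("foo-bar baz", 1, "- "))  # user-configured delimiters
print(select_word("foo-bar baz", 1, None))  # falls back to the default
```

Passing the segmenter as a parameter keeps the configured path and the default path independent, so the default could be swapped for a proper word-boundary library without touching the `wordDelimiters` handling.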

and your browser probably handles this correctly

For the record, my browser (Firefox) does not.

And while I agree that it's worthwhile considering a more international-friendly approach, we also need to bear in mind that command-line text selection is possibly somewhat different from typical document text selection, so sticking strictly to the UAX#29 spec may not be ideal as a default (assuming we're considering changing the default).

For example, in a terminal, the decision as to whether a punctuation character should be treated as a separator or not is often based on the semantics of that character in the shell (e.g. slash as a path separator, or colon as a drive letter suffix). I haven't looked at the UAX#29 spec in detail, but I suspect it is unlikely to categorize punctuation symbols in the same way.

And in your example above, if those characters were used in a path, and you were trying to select that path, would you really want the selection to stop at the word boundaries? I don't know. That's really a question for the people that speak the language, and in what situations they're most likely to be double clicking on strings of Japanese characters. But my point is that it's not necessarily obvious that UAX#29 would be the best default.

radekg · 2023-03-13T11:19:35Z

For reference:

"wordDelimiters": " ()\"':,;<>~!@#$%^&*|=[]{}?│",

to mimic the setting from iTerm2. It treats /-+\~_. as part of the word allowing a selection of a full path and a word containing dashes, for example a repository name.
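To see what this set does to a line of terminal output, here is a small Python sketch that splits a line on runs of delimiter characters. The delimiter string is copied from the setting above; `ITERM2_LIKE` and `split_words` are names made up for this example, and the splitting is only an approximation of how a terminal might apply the setting.

```python
import re

# Same characters as the "wordDelimiters" value above.
ITERM2_LIKE = " ()\"':,;<>~!@#$%^&*|=[]{}?│"


def split_words(line, delimiters):
    """Split `line` on runs of delimiter characters, dropping empty pieces."""
    # re.escape makes characters such as ], ^ and \ safe inside the class.
    pattern = "[" + re.escape(delimiters) + "]+"
    return [token for token in re.split(pattern, line) if token]


print(split_words("git clone /home/me/my-repo_v1.2 (done)", ITERM2_LIKE))
print(split_words("~/src", ITERM2_LIKE))
```

The path and the dashed repository name stay in one piece because `/`, `-`, `_` and `.` are not in the set. Since `~` does appear in the string as written, this sketch splits at it, so `~/src` yields `/src`.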

egmontkob added the Issue-Feature label Oct 5, 2019

ghost added the Needs-Triage and Needs-Tag-Fix labels Oct 5, 2019

DHowett-MSFT added the Area-Settings and Product-Terminal labels Oct 7, 2019

ghost removed the Needs-Tag-Fix label Oct 7, 2019

zadjii-msft added the Help Wanted and Issue-Task labels and removed the Issue-Feature label Oct 24, 2019

zadjii-msft added this to the Terminal v1.0 milestone Oct 24, 2019

DHowett-MSFT removed the Needs-Triage label Oct 29, 2019

cinnamon-msft modified the milestones: Terminal v1.0, Terminal v1.x Jan 23, 2020

cinnamon-msft added the Priority-3 and v1-Scrubbed labels Jan 23, 2020

carlos-zamora self-assigned this May 12, 2020

carlos-zamora mentioned this issue May 21, 2020

Scenario: TerminalControl Interactivity Improvements #6106

Open

25 tasks

cinnamon-msft modified the milestones: Terminal v1.x, Terminal v2.0 Sep 29, 2020

zadjii-msft modified the milestones: Terminal v2.0, 22H2 Jan 4, 2022

DHowett removed the v1-Scrubbed label Feb 7, 2022

zadjii-msft mentioned this issue Nov 15, 2022

Use UAX#29 word boundary rules for selections by default #14392

Closed

zadjii-msft unassigned carlos-zamora Nov 15, 2022

zadjii-msft modified the milestones: 22H2, Backlog Nov 15, 2022

zadjii-msft mentioned this issue Nov 15, 2022

Can't click select URL #14374

Closed
