add unicode superscripts and subscripts to latex substitutions #6927

Merged
merged 5 commits into from
May 28, 2014

lstagner
Contributor

Added Unicode superscripts and subscripts to latex substitutions.

@lstagner
Contributor Author

I noticed this caused the following weird error:

julia> e¹=2
2

julia> e¹
ERROR: syntax: invalid character "�"

julia> e¹
2

julia> e¹
2

julia> e¹
ERROR: syntax: invalid character "�"

@JeffBezanson
Member

Looks like mystery issue #5712

@stevengj
Member

I'd prefer without the braces, i.e. \^2 and not \^{2}. The backslash should be enough to differentiate from exponentiation, and braces aren't needed in LaTeX either for single-character superscripts and subscripts.

@@ -781,6 +781,51 @@ const latex_symbols = [
"\\openbracketright" => "〛",
"\\overbrace" => "︷",
"\\underbrace" => "︸",
"\\^{0}" => "⁰",
Member

Also, please put a comment or something to separate them from the auto-generated list.
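To make the two review suggestions concrete (brace-free names such as \^2 instead of \^{2}, kept in a block visibly separated from the auto-generated list), here is a rough sketch in current Julia 1.x. The names `superscript_digits`, `subscript_digits`, `digit_script_symbols` and `extra_latex_symbols` are hypothetical and do not come from the PR; the sketch only generates the digit entries, assuming the remaining super/subscripts would be listed by hand.

```julia
# ---- manually added super/subscripts (kept apart from the auto-generated list) ----
# Hypothetical names; this is an illustration, not the PR's actual code.
const superscript_digits = "⁰¹²³⁴⁵⁶⁷⁸⁹"
const subscript_digits   = "₀₁₂₃₄₅₆₇₈₉"

function digit_script_symbols()
    table = Dict{String,String}()
    for (d, sup, sub) in zip("0123456789", superscript_digits, subscript_digits)
        # Brace-free names: \^2 rather than \^{2}
        table["\\^$d"] = string(sup)
        table["\\_$d"] = string(sub)
    end
    return table
end

const extra_latex_symbols = digit_script_symbols()

println(extra_latex_symbols["\\^2"], extra_latex_symbols["\\_0"])  # prints ²₀
```

Generating the digits from two strings keeps the superscript and subscript rows aligned; letters and signs such as \^( would still be added explicitly.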

@lstagner
Contributor Author

@stevengj That was my original plan, but there were conflicts with \^(, \^), \_(, \_), and \^i. The curly braces guaranteed a unique match and were still valid LaTeX.

stevengj added a commit to stevengj/julia that referenced this pull request May 23, 2014
…EPL, and allow a wider range of chars (for things like JuliaLang#6927)
@lstagner
Contributor Author

I suppose we could have the numeric super/subscripts be \^1 and \_1, since they will probably be the most common, and give all the rest curly braces.

Any opinions?

@stevengj
Member

Why do you need braces for single letters like \^h and \_h?

I would just dump the schwa. In general, I wanted to avoid more and more LaTeX code creeping in, which is why I omitted things like \mathbb{A} (U+1D538) from my initial table, even though they were listed in the W3C unicode.xml. I suppose we could put those back in, but I would caution against anything that includes more than one backslash, as that will make parsing much more difficult.

@lstagner
Contributor Author

I think we should try to be as close to LaTeX as possible. If there is a more common name, as in the case of \hbar and \Elzxh, there is no reason not to have both. We could also have \grad map to ∇, like \nabla does.

As for the braces around \^h: on my machine, \^h without braces lists all functions that start with the letter h:

julia> \^h
hankelh1  hash       help       hex        hist       homedir    hvcat
hankelh2  haskey     hessfact   hex2bytes  hist2d     htol       hypot
has       hcat       hessfact!  hex2num    histrange  hton

@stevengj
Member

I agree that we should pick the most common name when there are several to choose from; definitely \hbar and not \Elzxh or \xh, and have a couple of common synonyms. But the \Elz names are especially ugly (and in most cases there seem to be more common names without the Elz prefix)... The autogenerated list is just a starting point.

The completion of \^h should be fixable.

@stevengj
Member

I don't see the problem. If I do:

Base.REPLCompletions.latex_symbols["\\_h"] = "ₕ"
Base.REPLCompletions.latex_symbols["\\^n"] = "ⁿ"

then completion of \_h and \^n works for me. Maybe you had a typo?
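For readers unfamiliar with the mechanism, here is a simplified sketch of exact-match substitution in current Julia 1.x: take the text from the last backslash to the end of the line, look it up in the table, and swap in the symbol if the key exists. `complete_latex` and `demo_symbols` are hypothetical; this is not the REPLCompletions implementation, and it ignores prefix matching, candidate lists and cursor position, which a real completer needs.

```julia
# Hypothetical table and helper; illustration only, not Base.REPLCompletions.
const demo_symbols = Dict("\\_h" => "ₕ", "\\^n" => "ⁿ", "\\hbar" => "ħ")

function complete_latex(line::AbstractString, table::AbstractDict)
    i = findlast(==('\\'), line)           # byte index of the last backslash
    i === nothing && return line           # no backslash: nothing to complete
    key = line[i:end]                      # e.g. "\\_h"
    sym = get(table, key, nothing)
    sym === nothing && return line         # no exact match: leave the text alone
    return line[1:prevind(line, i)] * sym  # keep the prefix, append the symbol
end

println(complete_latex("x\\_h", demo_symbols))  # prints xₕ
```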

@stevengj
Member

It also looks like we are missing most of the IPA symbols (LaTeX wsuipa package), in case someone wants to spell their variables phonetically... ;-)

In general, beware that the W3C's unicode.xml file dates from 2003 (I couldn't find any more recent comprehensive table), so it may have many omissions.

@lstagner
Contributor Author

Hmm, it seems I confused h with n; it is hard to tell the difference when it's a subscript. In any case, the curly braces don't seem necessary. I must have fixed whatever issue I was having with them. However, I still have an issue with "\\^(" => "⁽" not being substituted in:

julia> Base.REPLCompletions.latex_symbols["\\^("] = "⁽"
"⁽"

julia> \^(              ##Doesn't do anything when tab is hit
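On the h/n mix-up: the subscript letters are separate code points, so printing the code point is a quick way to check which character a completion actually produced. A small check in current Julia 1.x (the variable names are only illustrative):

```julia
# Subscript h, subscript n and superscript n look alike in many fonts.
sub_h, sub_n, sup_n = 'ₕ', 'ₙ', 'ⁿ'
for c in (sub_h, sub_n, sup_n)
    # codepoint returns a UInt32; print it in the usual U+XXXX form
    println(c, "  U+", uppercase(string(codepoint(c), base = 16, pad = 4)))
end
# prints:
# ₕ  U+2095
# ₙ  U+2099
# ⁿ  U+207F
```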

@lstagner
Contributor Author

Funnily enough, when I just go through the list, \^( works fine (all without curly braces).
It seems that as long as \^( comes right after another sub/superscript, it works fine:

julia> ⁰¹²³⁴⁵⁶⁷⁸⁹⁺⁻⁼⁽⁾ⁿ₀₁₂₃₄₅₆₇₈₉₊₋₌₍₎ₐₑₒₓₔₕₖₗₘₙₚₛₜ

julia> a\^(        ##Nothing

julia> ω\^(       ##Nothing

julia> ₁⁽ 

@stevengj
Member

Probably something in REPLCompletions that handles parens specially. cc: @loladiro

@Keno
Member

Keno commented May 23, 2014

I believe it's considered a word boundary and thus not completed.
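A toy model of this explanation, assuming the completer finds the text to complete by scanning backwards from the cursor until it reaches a boundary character. `token_before_cursor` and both boundary sets are hypothetical; this does not reproduce the actual REPL code or the later fix, it only shows why a boundary '(' leaves nothing to complete.

```julia
# Hypothetical helper: return the text between the last boundary character
# and the end of the line (the cursor is assumed to be at the end).
function token_before_cursor(line::AbstractString, boundaries)
    i = lastindex(line)
    while i >= firstindex(line) && !(line[i] in boundaries)
        i = prevind(line, i)
    end
    return line[nextind(line, i):end]
end

old_boundaries = (' ', '(', ')')  # '(' treated as a word boundary
new_boundaries = (' ',)           # '(' allowed inside the token

println(repr(token_before_cursor("x = \\^(", old_boundaries)))  # prints ""
println(repr(token_before_cursor("x = \\^(", new_boundaries)))  # prints "\\^("
```

With '(' counted as a boundary, the token is empty, so there is nothing to look up, which matches the "Doesn't do anything when tab is hit" symptom reported above.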

@stevengj
Member

\^( works now (commit d707cb8).

@stevengj
Member

Now, is still not allowed in identifiers; you should probably include a patch to src/flisp/julia_extensions.c.
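The parser side is separate from completion: a character the REPL can insert still has to be accepted as part of an identifier. The real change is C code in src/flisp/julia_extensions.c and is not shown here; the Julia functions below (`is_script_char`, `is_simple_identifier`) are a hypothetical, simplified illustration of that kind of check, written for current Julia 1.x, which allows superscript/subscript characters after the first character of a name.

```julia
# Hypothetical sketch; not the actual julia_extensions.c logic.
# ¹ ² ³ live in Latin-1; the rest are in the Superscripts and Subscripts
# block, U+2070 to U+209F.
is_script_char(c::Char) =
    c in ('\u00b9', '\u00b2', '\u00b3') || ('\u2070' <= c <= '\u209f')

function is_simple_identifier(s::AbstractString)
    isempty(s) && return false
    c1 = first(s)
    (isletter(c1) || c1 == '_') || return false   # must start with a letter or _
    ok(c) = isletter(c) || isdigit(c) || c == '_' || is_script_char(c)
    return all(ok, Iterators.drop(s, 1))          # remaining characters
end

println(is_simple_identifier("e¹"), " ", is_simple_identifier("¹e"))  # prints true false
```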

@JeffBezanson
Copy link
Member

Added.

@stevengj
Member

LGTM.

"\\_p" => "ₚ",
"\\_s" => "ₛ",
"\\_t" => "ₜ",
"\\hbar" => "ħ",
Member

It looks like you are using the wrong codepoint here. This is U+0127, but \hbar should be U+210F.

Member

We also have \hslash for U+210F. U+0127 looks better to me in upright text, to be honest, so it's not completely clear to me what we should use here.

Member

FWIW, Wikipedia always uses U+210F for Planck's constant.

Member

U+0127 is also used in IPA, where it is called \textcrh or \crossh, depending upon the LaTeX package.

@lstagner
Contributor Author

Well, \hbar ħ (U+0127) looks more like a bar, and \hslash ℏ (U+210F) looks more like a slash, so I think the current names at least make logical sense. My preference is to keep it as is. Perhaps when this gets documented we can sort it meaningfully, so it will be easier to find alternatives.

@lstagner
Contributor Author

I am ready for this to be merged if there are no other comments.

@stevengj
Member

Arguments in favor of U+0127 for \hbar:

  • In LaTeX (and Wikipedia), all letters in equations are italic by default, so U+210F makes sense there. Here, all of our letters are generally upright, so it makes sense to use an upright \hbar by default too.
  • We are calling it \hbar, not \planck. The old name for U+0127 in Unicode was, in fact, LATIN SMALL LETTER H BAR.

Argument against: most people using \hbar will be using it for Planck's constant, and U+210F is defined as Planck's constant in Unicode.

On balance, I'm inclined to support U+0127. In this context, typographical consistency (upright vs. italic) is more important than code point definitions.
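The two candidates are distinct characters, which is why the choice matters for anyone typing the symbol later. A quick check in current Julia 1.x of the mapping discussed in this thread (\hbar to U+0127, \hslash to U+210F); the variable names are only illustrative.

```julia
hbar_latin  = '\u0127'  # LATIN SMALL LETTER H WITH STROKE (older name: ... H BAR)
hbar_planck = '\u210f'  # PLANCK CONSTANT OVER TWO PI

# Hypothetical two-entry excerpt of the substitution table
hbar_entries = Dict("\\hbar" => string(hbar_latin), "\\hslash" => string(hbar_planck))

println(hbar_entries["\\hbar"], " ", hbar_entries["\\hslash"])  # prints ħ ℏ
println(hbar_latin == hbar_planck)                              # prints false
```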

@lstagner
Contributor Author

+1 for U+0127 \hbar

stevengj referenced this pull request May 27, 2014
…ad of U+2329/232A (angle bracket), as the former are recommended by Unicode for math & technical usage
stevengj added a commit that referenced this pull request May 28, 2014
add unicode superscripts and subscripts to latex substitutions
@stevengj stevengj merged commit 39bc1bc into JuliaLang:master May 28, 2014
@stevengj
Member

Since there seem to be no further objections (and LaTeX abbreviations are fairly innocuous anyway), I went ahead and merged.

Labels
unicode Related to unicode characters and encodings
4 participants