Quick attempt at fixing multicharacter identifiers #2865

Merged: 10 commits into master from useMathitOnMultiChar on Nov 13, 2021

Conversation

balacij
Collaborator

@balacij balacij commented Oct 17, 2021

A follow-up from a previous discussion (#2856 (comment)). This PR showcases a simple attempt at a fix. It doesn't fix all the problems, but it makes the multi-character identifiers look a bit better in the variable description tables. However, the specific example you noted in #1853 (GlassBR's DD: calOfCapacity) is still problematic (almost no change in rendering). For that, we might need to add a small space (via \ , or, preferably, something smaller) between identifiers whenever a group of identifiers sits directly next to each other (for example, terms with multiplication).

@balacij balacij requested a review from smiths October 17, 2021 21:43
@smiths
Collaborator

smiths commented Oct 18, 2021

@balacij, you are correct that the changes don't do much for the NFLGTFLSF problem, but I do think they are still an incremental improvement in the appearance of the formatting. If @JacquesCarette is okay with how you got it to work, I'm certainly okay with what it looks like. 😄

How hard would it be to change the symbols with dashes in their names to use underscores instead? For instance, change is-safeLoad to is_safeLoad? The dash really does look like a minus sign when the LaTeX is compiled. This isn't something to spend much time on, but it feels like Drasil only has the name in one spot, so changing the name should only involve changing it in one spot? (And updating stable, of course.)

For NFLGTFLSF I don't think it will look right unless we put a multiplication operator between the symbols (like \cdot). @JacquesCarette has correctly pointed out (in the past) that we don't want to "manually" do this in Drasil, and we don't want extraneous symbols. We could look for equation display situations that we don't like, and then add rules that are only there for the rendering of the equation. Does the multiplication of multicharacter variables always look bad? Is this something we would always want to do? I haven't been able to think of a case like this that I don't like better with a \cdot to break up the side-by-side multicharacter symbols, but is this maybe a slippery slope where we are trying to find too many different PDF rendering rules?

@JacquesCarette
Owner

Oops, saw the commit, but not the PR. So what's odd about this fix is that 1-letter identifiers will be treated differently than multi-letter ones. We should be consistent with our use of fonts. It's LaTeX's fault that it does something odd. So we should pick a font for our identifiers, and use it systematically (even though that might produce output that is not fully what would be written by hand).

We can subsequently deal with things like multiplication, which is indeed a completely different issue.

@balacij
Collaborator Author

balacij commented Oct 19, 2021

Thank you both for checking this PR 😄

> @balacij, you are correct that the changes don't do much for the NFLGTFLSF problem, but I do think they are still an incremental improvement in the appearance of the formatting. If @JacquesCarette is okay with how you got it to work, I'm certainly okay with what it looks like. 😄

Sounds good!

> How hard would it be to change the symbols with dashes in their names to use underscores instead? For instance, change is-safeLoad to is_safeLoad? The dash really does look like a minus sign when the LaTeX is compiled. This isn't something to spend much time on, but it feels like Drasil only has the name in one spot, so changing the name should only involve changing it in one spot? (And updating stable, of course.)

I doubt it would take much time at all 😄, I can include it in this PR (or in a new one if that's preferred).

> For NFLGTFLSF I don't think it will look right unless we put a multiplication operator between the symbols (like \cdot). @JacquesCarette has correctly pointed out (in the past) that we don't want to "manually" do this in Drasil, and we don't want extraneous symbols. We could look for equation display situations that we don't like, and then add rules that are only there for the rendering of the equation. Does the multiplication of multicharacter variables always look bad? Is this something we would always want to do? I haven't been able to think of a case like this that I don't like better with a \cdot to break up the side-by-side multicharacter symbols, but is this maybe a slippery slope where we are trying to find too many different PDF rendering rules?

I think the general rule is that multiplication of multicharacter variables looks ambiguous if at least 2 consecutive expressions are just plain identifiers, or if the multiplication of identifiers causes a name collision (a multicharacter identifier colliding with a run of shorter identifiers). For example, if we have variables "ABC", "A", "B", and "C", and then we multiply "A", "B", and "C", we end up with ABC, which is not the variable "ABC". Maybe an alternative solution is to decrease the font size of every character after the first in a variable identifier by ~25%? Is this easy to do in LaTeX? This way, we wouldn't need to add any sort of spacing or multiplication symbols. However, this solution could be problematic with inlined expressions, and is also uncommon (well, I've never seen it done before).
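
The name-collision case above can be checked mechanically. The following is only a rough sketch, not Drasil code: `count_readings` and `is_ambiguous_product` are hypothetical helper names, and the check assumes that a juxtaposed product is rendered by simply concatenating identifier names.

```python
from functools import lru_cache
from typing import FrozenSet, List


def count_readings(rendered: str, names: FrozenSet[str]) -> int:
    """Count the ways `rendered` can be split into a sequence of known identifiers."""

    @lru_cache(maxsize=None)
    def ways(start: int) -> int:
        # Reaching the end of the string means one complete, valid split.
        if start == len(rendered):
            return 1
        # Try every known identifier that matches at this position.
        return sum(
            ways(start + len(name))
            for name in names
            if rendered.startswith(name, start)
        )

    return ways(0)


def is_ambiguous_product(factors: List[str], names: FrozenSet[str]) -> bool:
    """A juxtaposed product is ambiguous if its rendering has more than one reading."""
    return count_readings("".join(factors), names) > 1


names = frozenset({"ABC", "A", "B", "C"})
print(is_ambiguous_product(["A", "B", "C"], names))  # True: "ABC" or "A"*"B"*"C"
print(is_ambiguous_product(["A", "B"], names))       # False: only "A"*"B"
```

A check like this only covers the collision half of the rule; whether two adjacent multicharacter identifiers merely look bad is a typesetting judgment that it does not capture.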

The only ambiguous area I could think of with using either \cdot or spaces to break up multiplication expressions (with at least 1 subexpression being a plain multicharacter identifier) occurs when the whole expression appears as a subexpression in an expression that contains another large multiplication of symbols which doesn't contain any multicharacter identifiers. For example, suppose we had variables "ABC", "D", "E", "F", "G", "H". If we had ABCDE, we would write it as ABC \cdot D \cdot E. But if this expression were part of another expression, ABCDE + FGH, would we want to write it as ABC \cdot D \cdot E + FGH or as ABC \cdot D \cdot E + F \cdot G \cdot H? In other words, should it affect the rendering of expressions that contain it?
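
To make the two options concrete, here is a small sketch of the "local" choice, where only a product that contains a multicharacter identifier gets \cdot. None of this is Drasil's renderer: `render_identifier`, `render_product`, and `render_sum` are hypothetical names, and wrapping multicharacter names in \mathit is an assumption taken from the direction of this PR.

```python
from typing import List


def render_identifier(name: str) -> str:
    """Wrap multicharacter names in \\mathit; leave single letters alone (assumed rule)."""
    return name if len(name) == 1 else r"\mathit{" + name + "}"


def render_product(factors: List[str]) -> str:
    """Insert \\cdot only when at least one factor is a multicharacter identifier."""
    rendered = [render_identifier(f) for f in factors]
    if any(len(f) > 1 for f in factors):
        return r" \cdot ".join(rendered)
    # Plain single-letter products stay juxtaposed.
    return "".join(rendered)


def render_sum(terms: List[List[str]]) -> str:
    """Each term is rendered independently, so one product never affects another."""
    return " + ".join(render_product(t) for t in terms)


print(render_sum([["ABC", "D", "E"], ["F", "G", "H"]]))
# \mathit{ABC} \cdot D \cdot E + FGH
```

The other option (F \cdot G \cdot H as well) would mean deciding at the level of the enclosing expression, for example by checking every product in the sum before rendering any of them.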

> So what's odd about this fix is that 1-letter identifiers will be treated differently than multi-letter ones. We should be consistent with our use of fonts. It's LaTeX's fault that it does something odd. So we should pick a font for our identifiers, and use it systematically (even though that might produce output that is not fully what would be written by hand).

I thought that all text written in a math environment (between $...$ or $$...$$, and likewise \(...\) and \[...\]) would always default to an italicized math font. Does this mean that there are instances where identifiers appear outside of math environments? If so, then I imagine we would also want to wrap single character identifiers with \mathit?

Though, now that I think about single character identifiers placed in math environments, this would mean that we shouldn't need \mathit for the multicharacter names, so this contradicts what I had thought. I'm a bit confused as to why \mathit fixes this problem in the first place.

Aside: If we decide to always place identifiers in \mathit{..}, then we could also place a list of macros at the top of the file for each variable name (so that we save a few bytes and improve readability slightly in the generated TeX documents).
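
For illustration only, such a macro list might look like the following. The macro names are hypothetical (taken from identifiers mentioned in this thread), and since TeX control-word names may contain only letters, identifiers with digits or underscores would need a different naming scheme.

```latex
% Hypothetical preamble: one macro per multicharacter identifier.
\newcommand{\isSafeLoad}{\mathit{isSafeLoad}}
\newcommand{\calOfCapacity}{\mathit{calOfCapacity}}

% Usage in the document body:
% $\isSafeLoad$ instead of $\mathit{isSafeLoad}$
```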

> We can subsequently deal with things like multiplication, which is indeed a completely different issue.

Sounds good.

@balacij balacij marked this pull request as ready for review October 26, 2021 19:44
@balacij
Collaborator Author

balacij commented Oct 26, 2021

The _ looks better. However, I didn't know that the TeX renderer does not already escape special characters. Perhaps a follow-up issue could be to escape all special characters appropriately (I don't know the full list offhand)? I think this could also be a good task for the new students working on Drasil.
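
For reference, LaTeX reserves ten characters: \ & % $ # _ { } ~ ^. A minimal sketch of an escaper follows. It is not Drasil's renderer: `escape_latex` is a hypothetical name, and the replacements for backslash, tilde, and caret are text-mode commands, so names used inside math mode might need different handling.

```python
# Replacements for the ten characters LaTeX treats specially.
_LATEX_SPECIALS = {
    "\\": r"\textbackslash{}",
    "&": r"\&",
    "%": r"\%",
    "$": r"\$",
    "#": r"\#",
    "_": r"\_",
    "{": r"\{",
    "}": r"\}",
    "~": r"\textasciitilde{}",
    "^": r"\textasciicircum{}",
}


def escape_latex(text: str) -> str:
    """Escape character by character, so already-inserted backslashes are never re-escaped."""
    return "".join(_LATEX_SPECIALS.get(ch, ch) for ch in text)


print(escape_latex("is_safeLoad"))  # is\_safeLoad
```

Going one character at a time avoids the classic bug of chained string replacements, where the backslash added for one character gets escaped again by a later pass.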

@smiths
Collaborator

smiths commented Oct 27, 2021

The results look good to me. @balacij, I think you hit on the reason why the underscore wasn't used previously. LaTeX doesn't like that symbol. I'm glad you were able to get it to work. This is certainly an improvement.

I think we should probably postpone looking at how to make the multiplication look better.

With respect to everything being in a math font inside an equation, not everything defaults to an italic font. Standard functions (like \cos, \sin, \ln, etc) are not in italic font.

The use of \mathit is to remove the spacing that LaTeX inserts. By default characters are spaced as if they are being multiplied together. With \mathit they are being spaced as for regular "text".
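
A small test document makes the difference visible. This is just an illustrative sketch; isSafeLoad stands in for any multicharacter identifier.

```latex
\documentclass{article}
\begin{document}

% Default math mode: every letter is its own symbol, spaced like a product of variables.
$isSafeLoad$

% \mathit: the letters are set together as one italic word.
$\mathit{isSafeLoad}$

% Operator names such as \sin are upright rather than italic.
$\sin x$

\end{document}
```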

@balacij
Collaborator Author

balacij commented Oct 27, 2021

> The results look good to me. @balacij, I think you hit on the reason why the underscore wasn't used previously. LaTeX doesn't like that symbol. I'm glad you were able to get it to work. This is certainly an improvement.
>
> I think we should probably postpone looking at how to make the multiplication look better.

Thank you, I agree.

> With respect to everything being in a math font inside an equation, not everything defaults to an italic font. Standard functions (like \cos, \sin, \ln, etc) are not in italic font.
>
> The use of \mathit is to remove the spacing that LaTeX inserts. By default characters are spaced as if they are being multiplied together. With \mathit they are being spaced as for regular "text".

I was not aware of these TeX rules. That's very interesting! Thank you 😄

Owner

@JacquesCarette JacquesCarette left a comment

Not so much changes as explanation.

@JacquesCarette
Owner

We should really ask ourselves whether we want to let people use _ in their names. If we do, then indeed, we need to escape them in latex. I'm undecided.

@smiths
Collaborator

smiths commented Oct 28, 2021

> We should really ask ourselves whether we want to let people use _ in their names. If we do, then indeed, we need to escape them in latex. I'm undecided.

The _ was motivated to replace -. The dash is definitely worse, since it looks like a minus sign. We don't need _ though if we use camel case. For this example we would then have $\mathit{isSafe}$. I'm fine with that name.

@balacij
Collaborator Author

balacij commented Oct 28, 2021

Personally, I have no problems with _ in symbol names. @smiths' suggestion of switching to camelCase also sounds good to me.

Regarding rules, what do we think about disallowing the characters used in the Expr/ModelExpr operators (e.g., \cdot, integrals, etc) to make expressions completely unambiguous? Though, I think I might have seen \sum/big sigma used as a variable name before.

@JacquesCarette
Owner

For GlassBR, I think going to camelCase will be simplest.

Yes, certain operators should be disallowed. And yes, some Greek letters do get overloaded. But that should be done by using their shorthands, rather than by directly hacking in the Unicode.

@balacij
Collaborator Author

balacij commented Oct 29, 2021

Updated now to use camel case.

There is still the related task of creating rules for which characters are allowed in symbol names, but I think that can be done in a separate ticket.

@JacquesCarette JacquesCarette merged commit d035d43 into master Nov 13, 2021
@JacquesCarette JacquesCarette deleted the useMathitOnMultiChar branch November 13, 2021 14:43