Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Re-write ANSI color handling #225

Merged
merged 6 commits into from
Feb 15, 2016
Merged

Re-write ANSI color handling #225

merged 6 commits into from
Feb 15, 2016

Conversation

mgeier
Copy link
Contributor

@mgeier mgeier commented Feb 2, 2016

Initially, I only wanted to have less ugly colors for Python exceptions shown in LaTeX output.
But then I found out that the whole ANSI-handling code is quite a mess (which I already hinted on in #214).

The code handling HTML and LaTeX was completely separate and showed quite different behavior.

I re-implemented the whole thing, now most of the code is shared between HTML and LaTeX output. I also implemented the 256 indexed colors and 24-bit RGB colors which are supported by the live notebook. And I fixed several bugs along the way (e.g. #174).

There are a few caveats, though:

  • The behavior is still different from the live notebook, but I consider my implementation more correct in several points. I'll probably list the bugs in a separate issue there, once I got some feedback here.
  • The 8 (resp. 16) default colors are still different between HTML and LaTeX because I didn't dare to change the color definitions. But probably this should be adjusted?
  • I had to add 2 \newcommands in the LaTeX template because there were errors when I tried to use something like \textcolor[RGB]{255,0,0}{some text} within a Verbatim environment. I think it may have something to do with the commas (I read somewhere that they might be "active", whatever that means ...), but I don't know enough TeX-fu to find, let alone fix the error. Can somebody help with that?

Closes jupyter#214.
As a side effect, this also fixes jupyter#174.
@minrk
Copy link
Member

minrk commented Feb 2, 2016

Since this fixes "several bugs", would you mind adding tests covering the things you found that weren't correct before that are now fixed?

@mgeier
Copy link
Contributor Author

mgeier commented Feb 2, 2016

Oh, I forgot: sure, I'll update the tests, but I've never used them, can you please point me to the documentation how to run the tests?

Also, I wanted to wait for some feedback ... If you don't like my implementation, there's no point in making tests for it ...

@takluyver
Copy link
Member

I don't know if it's explicitly documented anywhere, but running nosetests in the root of the repository should do the trick. If you get errors about missing css, run python setup.py css to fetch it.

I'll try to have a look over the implementation if Min doesn't get round to it first.

@minrk
Copy link
Member

minrk commented Feb 2, 2016

I think the basic principal of the new implementation is sound. ANSI processing is just fiddly enough that any changes make me squeamish without tests (accepts warranted criticism for the poor test coverage in the first place).

text += '</span>'
return text
if isinstance(fg, int):
classes.append(_FG_HTML[fg])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are KeyErrors impossible on these lookups?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, "impossible" is a very strong word, but yes, if fg (or bg) is a single int, it is supposed to be within range(8). AFAICT, there is no code path that allows otherwise (assuming no monkey-patching and stuff).

@mgeier
Copy link
Contributor Author

mgeier commented Feb 3, 2016

@takluyver Thanks, it would probably be helpful to add those instructions to CONTRIBUTING.md. Also how to make a "develop" installation locally ... and how to build the docs ...

@minrk Thanks for having a look! I strongly agree that this ANSI stuff is quite fiddly (and underspecified), I'll update the tests soon.

@mgeier
Copy link
Contributor Author

mgeier commented Feb 3, 2016

While updating the tests, I found a bug in ipython_genutils.text.strip_ansi(). Is there a way to find out if this is used anywhere else?

If not, I guess it would make sense to move it to nbconvert/filters/ansi.py, right?
And fix its bugs, of course.

@minrk
Copy link
Member

minrk commented Feb 3, 2016

@mgeier yes, if there are bugs in something from genutils, it's appropriate to add an implementation here and stop using the genutils version.

@mgeier
Copy link
Contributor Author

mgeier commented Feb 3, 2016

OK, will do. Shall I also file a PR to remove it from genutils or will it just rot there?

Another question:
Which of those regular expressions are more efficient/nicer/cooler/easier to understand...?

\x1b\\[([^@-~]*)([@-~])

\x1b\\[(.*?)([@-~])

They should do the same thing, but I kinda prefer the second one, because it uses less characters in total but more different ones.

@mgeier
Copy link
Contributor Author

mgeier commented Feb 3, 2016

OK, I updated the tests, copied strip_ansi() from ipython_genutils (and hopefully fixed it). As expected, Travis CI doesn't complain anymore.
For now, I changed to the second regex, if you want me to change back to the first one (or an entirely different one), please tell me.

To be quite honest, I don't really want to add tests for my 256 color and 24-bit extension, but if you insist, I'll do it. Also, the test cases are quite repetitious, this should probably be fixed, too.
And why on earth are you using this self._try_*() thing, isn't that utterly useless boilerplate?

BTW, why don't you use py.test, wouldn't that be much cooler?

The third point of my initial message is still open, any TeX gurus out there?

@minrk
Copy link
Member

minrk commented Feb 3, 2016

@mgeier I'd leave it alone. It is my hope to never make another release of genutils, and making a PR removing anything would break things (like current stable nbconvert), so we shouldn't ever do that.

@minrk
Copy link
Member

minrk commented Feb 3, 2016

BTW, why don't you use py.test, wouldn't that be much cooler?

We use pytest in some new projects, but we've been using nose in IPython for years, so that's what the tests have been written against when added to existing projects. And, having used both pytest and nose, pytest doesn't really offer benefits over nose that justify anyone spending the time to port existing, working test code.

@takluyver
Copy link
Member

I just grepped through the repos I have cloned for other uses of strip_ansi(), it appears to be used only in nbconvert and in IPython (which already has its own copy). There are a couple of repos I don't have (ipyparallel, ipywidgets), but I think it's unlikely they use it.

@mgeier
Copy link
Contributor Author

mgeier commented Feb 4, 2016

I've asked on stackexchange about the LaTeX problem mentioned above, and I promptly got an answer with a great (albeit somewhat verbose) solution. I've pushed a new commit with that.

From my side, this is ready to be merged now, do you want me to make further changes?
If not, shall I squeeze and rebase my commits?

@minrk As a side note, I want to make a few comments about py.test, but this doesn't really have anything to do with this PR:

I've never really used nosetests, so I'm probably missing something, but from what I've seen, py.test has huge benefits since it allows much more detailed failure reports with much less boilerplate. It's somewhat magical ...
I might have overlooked that in nosetests, but py.test offers very easy means to create parametrized tests, which would be a perfect fit for the ANSI tests. In contrast to the manual loop that's used now, this would make a separate test case for each individual test condition, showing in a much clearer way (in case of a test failure) what the conditions actually were. And it wouldn't give up at the first failure but continue with the other conditions within the test case.
If you want, I can try to rewrite the ANSI tests to show what I mean ...

In general, py.test has the advantage that one can use simple assert statements instead of the ugly specialized assertSomething functions/methods. The output in case of failure is still great.

... that justify anyone spending the time to port existing, working test code.

And that's the beautiful thing: py.test can run the current nbconvert tests without change! There is nothing to port!

If you change to py.test (which I think would be a good idea), you can gradually change tests to use the cooler assert statements, parametrized tests, ... whenever there is time.

@minrk
Copy link
Member

minrk commented Feb 4, 2016

Please don't change the tests to use py.test as part of this PR. I'm aware it has some benefits. I do typically choose it for new projects. I think they are significantly overstated.

@minrk minrk added this to the 4.2 milestone Feb 4, 2016
@mgeier
Copy link
Contributor Author

mgeier commented Feb 5, 2016

@minrk Don't worry, I wasn't going to do that. If anyone is interested, I can do it in a separate PR, if not, I won't do anything.

@minrk
Copy link
Member

minrk commented Feb 6, 2016

Thanks! At some point, we probably will. The fact that the tests run unmodified definitely makes it easier to start.

@mgeier
Copy link
Contributor Author

mgeier commented Feb 11, 2016

Is this scheduled for merging or do I have to make some changes?

'\x1b[1;33mhello': '<span class="ansiyellow ansibold">hello</span>',
'\x1b[37mh\x1b[0;037me\x1b[;0037ml\x1b[00;37ml\x1b[;;37mo': '<span class="ansigray">h</span><span class="ansigray">e</span><span class="ansigray">l</span><span class="ansigray">l</span><span class="ansigray">o</span>',
'hel\x1b[0;32mlo': 'hel<span class="ansigreen">lo</span>',
'hello': 'hello',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies if I'm missing it, but can we get a test case where two different formatting options overlap, e.g.:

# [BOLD]hello[YELLOW]world[RESET]
\x1b[1mhello\x1b[33mworld\x1b0m

^ Don't trust that I've constructed that example correctly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I don't think you're missing it. This wasn't possible before, so there were no tests for it.

I added a test case, but first I changed to test generators and combined the test cases for ansi2html() and ansi2latex() to avoid the annoying duplication of test strings.

UPDATE: I reverted the de-duplication (1446e2a) because it seemed to be incompatible with Python 2.

@takluyver
Copy link
Member

Thanks! I think the generator tests might be causing the failure on Python 2 on Travis - I vaguely remember having problems with that machinery before and stripping out all uses of it, but I forget the details now.

@mgeier
Copy link
Contributor Author

mgeier commented Feb 14, 2016

OK, then I'm dropping 1446e2a and just add the new test case (twice).

@mgeier mgeier mentioned this pull request Feb 14, 2016
@takluyver
Copy link
Member

Thanks. I think this is looking OK; I'll give it a day or two more for other people to have a look, and we can merge if nothing else is brought up.

@Carreau
Copy link
Member

Carreau commented Feb 15, 2016

No objections.

From a quick look I think most of these might be available in Pygments, if we already depends on it.

@takluyver
Copy link
Member

nbconvert does already depend on pygments for highlighting code in HTML and Latex output. However, I don't immediately see an API in pygments for parsing ANSI escapes. If that machinery is there but not publicly exposed, I'd suggest we merge this for now as it looks like a concrete improvement, and look separately at collaborating with pygments or some other package to reduce code duplication.

@minrk
Copy link
Member

minrk commented Feb 15, 2016

👍 Thanks, @mgeier!

@Carreau
Copy link
Member

Carreau commented Feb 15, 2016

I'd suggest we merge this for now as it looks like a concrete improvement, and look separately at collaborating with pygments or some other package to reduce code duplication.

Seem like a plan. I'm in Pygments codebase these days. I'll see if I can have a look.

@takluyver
Copy link
Member

In ur codebase, killin ur bugz? ;-)

takluyver added a commit that referenced this pull request Feb 15, 2016
Re-write ANSI color handling
@takluyver takluyver merged commit 5646f0b into jupyter:master Feb 15, 2016
@mgeier mgeier deleted the ansi branch February 16, 2016 08:41
@mgeier
Copy link
Contributor Author

mgeier commented Feb 16, 2016

Thanks for merging!

I had a quick look at the Pygments code and I only found ANSI sequences as output. But if there is also code for input, it sure would make sense to re-use this instead of re-implementing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants