Improve HTML-to-text conversion #260

Open
wilx opened this issue Mar 10, 2015 · 4 comments
@wilx

wilx commented Mar 10, 2015

I am seeing conversion artefacts. I have recorded a video of this: https://www.youtube.com/watch?v=bpwFqdszDUI

As you can see from the video, there seem to be two issues. The first is that the conversion round trip seems to add empty lines at the top of the text. The second is that an unwanted paragraph break is inserted in the middle of the text.

The version I am using appears to be v2.11.4.

@adam-p
Owner

adam-p commented Mar 10, 2015

First, the empty lines getting added: #236. You're using a custom font in Gmail, right?

Second, the paragraph break in the middle: super weird. Does that happen to you often? Can you reliably reproduce it?

If so... Can you copy the HTML of the email before rendering (or after rendering and unrendering, if you need to check that it's broken)? Here's a GIF showing how:
https://github.com/adam-p/markdown-here/wiki/Troubleshooting#copying-html

@wilx
Author

wilx commented Mar 10, 2015

> First, the empty lines getting added: #236. You're using a custom font in Gmail, right?

I have it set to Georgia in Gmail settings. I have tried to change that back to just Sans Serif but that does not seem to help. I am using a different font for the same reasons this guy does:

#236 (comment)

@wilx
Author

wilx commented Mar 10, 2015

The line break in the middle of a paragraph is something I have seen multiple times already.

I just copy/pasted the text from the email I sent into a new message. This is the HTML before hitting C-A-m:

<div id=":291" class="Am Al editable LW-avf" hidefocus="true" aria-label="Message Body" g_editable="true" role="textbox" contenteditable="true" tabindex="1" itacorner="6,7:1,1,0,0" style="direction: ltr; min-height: 343px;"><div class="gmail_default" style="font-family: georgia, serif;"><span style="font-size: 12.8000001907349px;">Hello.</span></div><div class="gmail_default" style="font-family: georgia, serif;"><p style="font-family: arial, sans-serif; font-size: 12.8000001907349px;"></p><p dir="ltr" style="font-family: arial, sans-serif; font-size: 12.8000001907349px;"></p><div class="gmail_default" style="font-size: 12.8000001907349px; display: inline;">​I am still having issues with the navigation when it thinks my position is in a tunnel. When I get out of the tunnel anywhere else than the navigation directs me to, like when the tunnel has a fork and the navigation directs me one way and I go another way,&nbsp;</div><div class="gmail_default" style="font-size: 12.8000001907349px; display: inline;">​</div><font face="georgia, serif" style="font-size: 12.8000001907349px;">​</font><div class="gmail_default" style="font-family: arial, sans-serif; font-size: 12.8000001907349px; display: inline;"><font face="georgia, serif">​it seems to keep counting impulses as if I were still in the tunnel; whenever&nbsp;I move, the navigation moves my position _in the tunnel_, despite me being out of the tunnel for a long time. It does not reconsider my position, ignoring the GPS position effectively, until it does counts enough impulses ​to get away from the tunnel. Then&nbsp;suddenly it stops ignoring the GPS position and jumps 500 meters or 1 km towards my actual position.</font></div><p dir="ltr" style="font-family: arial, sans-serif; font-size: 12.8000001907349px;">-- VZ</p></div><div><br></div>
</div>

And after:

<div id=":2iu" class="Am Al editable LW-avf" hidefocus="true" aria-label="Message Body" g_editable="true" role="textbox" contenteditable="true" tabindex="1" itacorner="6,7:1,1,0,0" style="direction: ltr; min-height: 343px;"><div class="markdown-here-wrapper" data-md-url="mail.google.com" style=""><p style="margin: 1.2em 0px !important;margin-top: 0px !important;">​​​​​​​Hello.</p>
<p style="margin: 1.2em 0px !important;">​I am still having issues with the navigation when it thinks my position is in a tunnel. When I get out of the tunnel anywhere else than the navigation directs me to, like when the tunnel has a fork and the navigation directs me one way and I go another way,<br>​<br>​<br>​it seems to keep counting impulses as if I were still in the tunnel; whenever I move, the navigation moves my position <em>in the tunnel</em>, despite me being out of the tunnel for a long time. It does not reconsider my position, ignoring the GPS position effectively, until it does counts enough impulses ​to get away from the tunnel. Then suddenly it stops ignoring the GPS position and jumps 500 meters or 1 km towards my actual position.<br>— VZ</p>
<div title="MDH:PGRpdiBjbGFzcz0iZ21haWxfZGVmYXVsdCIgc3R5bGU9ImZvbnQtZmFtaWx5OiBhcmlhbCwgaGVs
dmV0aWNhLCBzYW5zLXNlcmlmOyI+PGRpdiBjbGFzcz0iZ21haWxfZGVmYXVsdCIgc3R5bGU9ImZv
bnQtc2l6ZTogMTIuODAwMDAwMTkwNzM0OXB4OyBmb250LWZhbWlseTogZ2VvcmdpYSwgc2VyaWY7
Ij7igIvigIvigIvigIvigIvigIvigItIZWxsby48L2Rpdj48cCBzdHlsZT0iZm9udC1mYW1pbHk6
IGFyaWFsLCBzYW5zLXNlcmlmOyBmb250LXNpemU6IDEyLjgwMDAwMDE5MDczNDlweDsiPjwvcD48
cCBkaXI9Imx0ciIgc3R5bGU9ImZvbnQtZmFtaWx5OiBhcmlhbCwgc2Fucy1zZXJpZjsgZm9udC1z
aXplOiAxMi44MDAwMDAxOTA3MzQ5cHg7Ij48L3A+PGRpdiBjbGFzcz0iZ21haWxfZGVmYXVsdCIg
c3R5bGU9ImZvbnQtc2l6ZTogMTIuODAwMDAwMTkwNzM0OXB4OyBmb250LWZhbWlseTogZ2Vvcmdp
YSwgc2VyaWY7IGRpc3BsYXk6IGlubGluZTsiPuKAi0kgYW0gc3RpbGwgaGF2aW5nIGlzc3VlcyB3
aXRoIHRoZSBuYXZpZ2F0aW9uIHdoZW4gaXQgdGhpbmtzIG15IHBvc2l0aW9uIGlzIGluIGEgdHVu
bmVsLiBXaGVuIEkgZ2V0IG91dCBvZiB0aGUgdHVubmVsIGFueXdoZXJlIGVsc2UgdGhhbiB0aGUg
bmF2aWdhdGlvbiBkaXJlY3RzIG1lIHRvLCBsaWtlIHdoZW4gdGhlIHR1bm5lbCBoYXMgYSBmb3Jr
IGFuZCB0aGUgbmF2aWdhdGlvbiBkaXJlY3RzIG1lIG9uZSB3YXkgYW5kIEkgZ28gYW5vdGhlciB3
YXksJm5ic3A7PC9kaXY+PGRpdiBjbGFzcz0iZ21haWxfZGVmYXVsdCIgc3R5bGU9ImZvbnQtc2l6
ZTogMTIuODAwMDAwMTkwNzM0OXB4OyBmb250LWZhbWlseTogZ2VvcmdpYSwgc2VyaWY7IGRpc3Bs
YXk6IGlubGluZTsiPuKAizwvZGl2Pjxmb250IGZhY2U9Imdlb3JnaWEsIHNlcmlmIiBzdHlsZT0i
Zm9udC1zaXplOiAxMi44MDAwMDAxOTA3MzQ5cHg7Ij7igIs8L2ZvbnQ+PGRpdiBjbGFzcz0iZ21h
aWxfZGVmYXVsdCIgc3R5bGU9ImZvbnQtZmFtaWx5OiBhcmlhbCwgc2Fucy1zZXJpZjsgZm9udC1z
aXplOiAxMi44MDAwMDAxOTA3MzQ5cHg7IGRpc3BsYXk6IGlubGluZTsiPjxmb250IGZhY2U9Imdl
b3JnaWEsIHNlcmlmIj7igItpdCBzZWVtcyB0byBrZWVwIGNvdW50aW5nIGltcHVsc2VzIGFzIGlm
IEkgd2VyZSBzdGlsbCBpbiB0aGUgdHVubmVsOyB3aGVuZXZlciZuYnNwO0kgbW92ZSwgdGhlIG5h
dmlnYXRpb24gbW92ZXMgbXkgcG9zaXRpb24gX2luIHRoZSB0dW5uZWxfLCBkZXNwaXRlIG1lIGJl
aW5nIG91dCBvZiB0aGUgdHVubmVsIGZvciBhIGxvbmcgdGltZS4gSXQgZG9lcyBub3QgcmVjb25z
aWRlciBteSBwb3NpdGlvbiwgaWdub3JpbmcgdGhlIEdQUyBwb3NpdGlvbiBlZmZlY3RpdmVseSwg
dW50aWwgaXQgZG9lcyBjb3VudHMgZW5vdWdoIGltcHVsc2VzIOKAi3RvIGdldCBhd2F5IGZyb20g
dGhlIHR1bm5lbC4gVGhlbiZuYnNwO3N1ZGRlbmx5IGl0IHN0b3BzIGlnbm9yaW5nIHRoZSBHUFMg
cG9zaXRpb24gYW5kIGp1bXBzIDUwMCBtZXRlcnMgb3IgMSBrbSB0b3dhcmRzIG15IGFjdHVhbCBw
b3NpdGlvbi48L2ZvbnQ+PC9kaXY+PHAgZGlyPSJsdHIiIHN0eWxlPSJmb250LWZhbWlseTogYXJp
YWwsIHNhbnMtc2VyaWY7IGZvbnQtc2l6ZTogMTIuODAwMDAwMTkwNzM0OXB4OyI+LS0gVlo8L3A+
PC9kaXY+PGRpdj48YnI+PC9kaXY+Cg==" style="height:0;width:0;max-height:0;max-width:0;overflow:hidden;font-size:0em;padding:0;margin:0;">​</div></div></div>

@adam-p adam-p added the bug label Mar 11, 2015
@adam-p
Owner

adam-p commented Mar 11, 2015

I see the problem. Here is your before-rendering HTML, pretty-printed:

<div id=":291" class="Am Al editable LW-avf" hidefocus="true" aria-label="Message Body" 
     g_editable="true" role="textbox" contenteditable="true" tabindex="1" 
     itacorner="6,7:1,1,0,0" style="direction: ltr; min-height: 343px;">
  <div class="gmail_default" style="font-family: georgia, serif;">
    <span style="font-size: 12.8000001907349px;">
      Hello.
    </span>
  </div>
  <div class="gmail_default" style="font-family: georgia, serif;">
    <p style="font-family: arial, sans-serif; font-size: 12.8000001907349px;"></p>
    <p dir="ltr" style="font-family: arial, sans-serif; font-size: 12.8000001907349px;"></p>
    <div class="gmail_default" style="font-size: 12.8000001907349px; display: inline;">​
      I am still having issues with the navigation when it thinks my position is 
      in a tunnel. When I get out of the tunnel anywhere else than the navigation 
      directs me to, like when the tunnel has a fork and the navigation directs 
      me one way and I go another way,&nbsp;
    </div>
    <div class="gmail_default" 
         style="font-size: 12.8000001907349px; display: inline;"></div>
    <font face="georgia, serif" style="font-size: 12.8000001907349px;"></font>
    <div class="gmail_default" 
         style="font-family: arial, sans-serif; font-size: 12.8000001907349px; display: inline;">
      <font face="georgia, serif">​
        it seems to keep counting impulses as if I were still in the tunnel; 
        whenever&nbsp;I move, the navigation moves my position _in the tunnel_, 
        despite me being out of the tunnel for a long time. It does not reconsider 
        my position, ignoring the GPS position effectively, until it does counts 
        enough impulses ​to get away from the tunnel. Then&nbsp;suddenly it stops 
        ignoring the GPS position and jumps 500 meters or 1 km towards my actual 
        position.
      </font>
    </div>
    <p dir="ltr" style="font-family: arial, sans-serif; font-size: 12.8000001907349px;">
      -- VZ
    </p>
  </div>
  <div>
    <br>
  </div>
</div>

See all the <div>s in the middle? MDH's HTML-to-text processing, which extracts the MD you wrote from the email body, naively treats <div>s as display:block, but you can see above the display:inline style applied to them. That style makes all the difference between a single paragraph-looking thing and multiple paragraph-looking things.
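To make the difference concrete, here is a minimal sketch in Python, not MDH's actual JavaScript. It assumes a toy extractor that only knows about `<div>`, `<p>`, and `<br>`, and only looks at the inline `style` attribute (no stylesheets, no computed styles). The names `html_to_text`, `TextExtractor`, and `honor_display` are hypothetical. With `honor_display=False` it behaves like the naive approach described above; with `honor_display=True` it skips the line break for elements styled `display: inline`.

```python
import re
from html.parser import HTMLParser

# Illustrative subset of tags that are block-level by default.
BLOCK_TAGS = {"div", "p"}


def is_inline(style):
    """True if an inline style attribute sets display to exactly 'inline'."""
    pattern = r"(?:^|;)\s*display\s*:\s*inline\s*(?:;|$)"
    return re.search(pattern, style or "", re.IGNORECASE) is not None


class TextExtractor(HTMLParser):
    def __init__(self, honor_display):
        super().__init__(convert_charrefs=True)
        self.honor_display = honor_display
        self.parts = []
        self.stack = []  # (tag, breaks_line) for each open element

    def handle_starttag(self, tag, attrs):
        if tag == "br":
            self.parts.append("\n")
            return
        breaks = tag in BLOCK_TAGS
        if breaks and self.honor_display and is_inline(dict(attrs).get("style")):
            breaks = False  # styled inline: do not start a new line
        self.stack.append((tag, breaks))
        if breaks:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag name, if any.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                if self.stack[i][1]:
                    self.parts.append("\n")
                del self.stack[i:]
                return

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(html, honor_display=True):
    """Return the non-empty text lines extracted from an HTML fragment."""
    parser = TextExtractor(honor_display)
    parser.feed(html)
    parser.close()
    # Drop zero-width spaces and turn non-breaking spaces into plain spaces.
    text = "".join(parser.parts).replace("\u200b", "").replace("\xa0", " ")
    lines = [re.sub(r"[ \t]+", " ", line).strip() for line in text.split("\n")]
    return [line for line in lines if line]


# Shortened version of the structure in the HTML above.
SAMPLE = (
    '<div style="display: inline;">one way,&nbsp;</div>'
    '<div style="display: inline;"></div>'
    '<div style="display: inline;"><font face="georgia, serif">it seems</font></div>'
)

print(html_to_text(SAMPLE, honor_display=False))  # ['one way,', 'it seems']
print(html_to_text(SAMPLE, honor_display=True))   # ['one way, it seems']
```

The naive pass splits the sentence in two at the `<div>` boundary, while the style-aware pass keeps it as one line. A real fix would more likely need the browser's computed style rather than a regex on the `style` attribute, since `display` can also come from a stylesheet.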

This problem is, again, probably related to your use of a custom font. But the real problem is the HTML-to-text processing.

I'm going to turn this issue into a general "make HTML-to-text better" catch-all (so I'm renaming it). #10 is closely related, and here are some old notes about it.

For now I guess the workarounds are:

  1. Turn off the custom font in Gmail.
  2. When the rendering gets weird like that, try to delete some of the characters around the breaking point and re-type them.

Yeah, weak.

@adam-p adam-p changed the title from "Two Markdown Here in Chrome issues" to "Improve HTML-to-text conversion" on Mar 11, 2015