Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix issue with space offsets in TJs #6019

Closed
wants to merge 4 commits into from

Conversation

kalley
Copy link

@kalley kalley commented May 15, 2015

We were having issues in multiple documents with spaces in a table-like text block that used TJs when selecting text.

master:
image

this branch:
image

because of the math involved in calculating the offsets in spaced text (TJ), you can't really separate out that logic into two separate operations.

Reference equation from the spec is on page 252.

I wasn't sure if just adding the pdf to the test_manifest was enough, so let me know if you need more than that.

var font = textState.font;
var offset = (typeof previousChars === 'string' ? 0 :
previousChars / 1000) || 0;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that you can actually simplify this expression by instead doing e.g.

var offset = (isNum(previousChars) ? previousChars / 1000 : 0);

@timvandermeij
Copy link
Contributor

#5813 is also (partially) touching this code, however, in a code block that you removed here. Will this PR also fix the issue mentioned there?

@Snuffleupagus
Copy link
Collaborator

It really great that you've included a reduced test-case!
However the PDF file is unfortunately slightly malformed, since Adobe Reader won't render anything and PDF.js complains about it as well.

Since I know from first hand experiences that it's sometimes difficult to create working test-cases, I took the liberty of providing a cleaned-up version of the PDF file: https://www.dropbox.com/s/efo05mrfo1fa2k4/pr6019.pdf?dl=0. (If you use this file instead, please don't forget to update the md5 entry in the manifest.)

textChunk.str.push(' ');
}
} else if (fakeSpaces > SPACE_FACTOR) {
buildTextGeometry(items[j], textChunk, items[j - 1]);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When j === 0, this will result in j - 1 being out of range of the array. As we've seen before, e.g. in #5663 (comment), that can slow things down in some browsers. Hence I'm wondering if it makes sense to try and avoid that, e.g. do this instead:

buildTextGeometry(items[j], textChunk, (j > 0 ? items[j - 1] : 0));

@kalley
Copy link
Author

kalley commented May 15, 2015

@timvandermeij in regards to #5813, it doesn't completely fix his issue, but they do work in concert when you just take the changes from within buildTextGeometry.

@Snuffleupagus what did you use to make your pdf? I only have a grasp of the obj streams well enough to screw with them.

@timvandermeij
Copy link
Contributor

/botio test

@pdfjsbot
Copy link

From: Bot.io (Linux)


Received

Command cmd_test from @timvandermeij received. Current queue size: 0

Live output at: http://107.21.233.14:8877/2719f1e31e43d89/output.txt

@pdfjsbot
Copy link

From: Bot.io (Windows)


Received

Command cmd_test from @timvandermeij received. Current queue size: 0

Live output at: http://107.22.172.223:8877/eb75e62f2da9bcc/output.txt

@pdfjsbot
Copy link

From: Bot.io (Windows)


Failed

Full output at http://107.22.172.223:8877/eb75e62f2da9bcc/output.txt

Total script time: 17.91 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: FAILED

Image differences available at: http://107.22.172.223:8877/eb75e62f2da9bcc/reftest-analyzer.html#web=eq.log

@pdfjsbot
Copy link

From: Bot.io (Linux)


Failed

Full output at http://107.21.233.14:8877/2719f1e31e43d89/output.txt

Total script time: 18.81 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: FAILED

Image differences available at: http://107.21.233.14:8877/2719f1e31e43d89/reftest-analyzer.html#web=eq.log

@Snuffleupagus
Copy link
Collaborator

The test results seem both good and (perhaps) less good. In e.g. taro-text there are improvements in the horizontal text, but the vertical text (on pages 2 and 4) seem to regress a bit.

@kalley
Copy link
Author

kalley commented May 15, 2015

In the vertical, if I add the offset instead of subtract it, it looks fine. But that doesn't seem to match up with the spec. I'll keep digging.

@kalley
Copy link
Author

kalley commented May 15, 2015

I don't understand the vmetric property on the glyph, which in about every case in the taro-text document seems to be glyph.width * -1. I'm wondering if that could be part of it. I just need some more context around it. It could be as easy and flipping the offset if vmetric exists.

@timvandermeij
Copy link
Contributor

Let's get this PR rolling again. @brendandahl Do you have any idea on how the vmetric property works as asked above?

@brendandahl
Copy link
Contributor

Vmetrics come from DW2 array in the font dictionary. There's alot more info in 9.7.4.3 Glyph Metrics in CIDFonts, but the number you're seeing is negative is because in the PDF coordinate system the origin starts at the bottom left, so a negative moves down.

@Snuffleupagus
Copy link
Collaborator

Oh, it's actually possible that some of the movement, in the vertical text, seen in the earlier testing of this patch exposed an issue which has since been fixed by PR #6391.

/botio test

@pdfjsbot
Copy link

From: Bot.io (Linux)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://107.21.233.14:8877/a366e510b2f5a6d/output.txt

@pdfjsbot
Copy link

From: Bot.io (Windows)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://107.22.172.223:8877/35637fd01871341/output.txt

@@ -1130,7 +1133,8 @@ var PartialEvaluator = (function PartialEvaluatorClosure() {
var offset;
for (var j = 0, jj = items.length; j < jj; j++) {
if (typeof items[j] === 'string') {
buildTextGeometry(items[j], textChunk);
buildTextGeometry(items[j], textChunk,
(j > 0 ? items[j - 1] : 0));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about passing in null instead when j === 0, i.e changing it to
(j > 0 ? items[j - 1] : null)? That way, you'll avoid a pointless division at line 958.

Also, can you please add one more space to the indentation of this line.

@pdfjsbot
Copy link

From: Bot.io (Windows)


Failed

Full output at http://107.22.172.223:8877/35637fd01871341/output.txt

Total script time: 18.40 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: FAILED

Image differences available at: http://107.22.172.223:8877/35637fd01871341/reftest-analyzer.html#web=eq.log

@pdfjsbot
Copy link

From: Bot.io (Linux)


Failed

Full output at http://107.21.233.14:8877/a366e510b2f5a6d/output.txt

Total script time: 20.01 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: FAILED

Image differences available at: http://107.21.233.14:8877/a366e510b2f5a6d/reftest-analyzer.html#web=eq.log

@Snuffleupagus
Copy link
Collaborator

/botio-linux preview

@pdfjsbot
Copy link

From: Bot.io (Linux)


Received

Command cmd_preview from @Snuffleupagus received. Current queue size: 0

Live output at: http://107.21.233.14:8877/fc737bf3c7f97aa/output.txt

@kalley
Copy link
Author

kalley commented Oct 15, 2015

Let me know if you're ready for me to squash this. Or if there are any additional comments.

@@ -953,8 +953,9 @@ var PartialEvaluator = (function PartialEvaluatorClosure() {
});
}

function buildTextGeometry(chars, textChunk) {
function buildTextGeometry(chars, textChunk, previousChars) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way to handle previousChar in the if below? It does not look logical handle that in two places

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could handle it below, but I felt like figuring out the offset made sense here, where as below it's just passing in the previous character. It doesn't have a concept of an offset. Though that is debatable, looking at the else down there. I'll push another commit that changes this. There's still going to end up being a check up here to make a decision on 0 or not.

@SamyCookie
Copy link

@kalley In the generated preview above and with the tracemonkey demo paper, text selection on the first paragrapher (and others) is larger than before and seems buggy. Is it normal ?

@kalley
Copy link
Author

kalley commented Oct 16, 2015

@SamyCookie nope, that's not normal. I may need to go back to the drawing board a little after #6391. I think originally I had completely removed any offset logic from that if block, but when I rebased, I thought that needed to be there. I'll dig in tomorrow.

@kalley
Copy link
Author

kalley commented Oct 16, 2015

Looks like what @SamyCookie found is fixed now, and my initial visual test of the taro-text looks correct as well.

Though, there now seems to be a problem with issue6387.pdf.

@kalley
Copy link
Author

kalley commented Oct 16, 2015

Strangely, changing the - to a + here: https://github.com/mozilla/pdf.js/pull/6019/files#diff-0b94c2e77a5259f7a728122fdbf9f46aR1045 fixes the issues with issue6387.pdf, but breaks page 2 and 4 of taro-text. I'll keep digging around, but if someone has any insight, that would be awesome.

yurydelendik added a commit to yurydelendik/pdf.js that referenced this pull request Nov 2, 2015
yurydelendik added a commit to yurydelendik/pdf.js that referenced this pull request Nov 3, 2015
yurydelendik added a commit to yurydelendik/pdf.js that referenced this pull request Nov 3, 2015
@yurydelendik
Copy link
Contributor

The broken PDF was fixed by #6588. Thanks for looking into the issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants