Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Catch the engine(s) up to TeXLive 2020.0 #666

Merged
merged 24 commits into from
Oct 28, 2020

Conversation

pkgw
Copy link
Collaborator

@pkgw pkgw commented Oct 24, 2020

At long last, let's start the work to catch the Tectonic engine implementations up to TeXLive 2020.0.

  • xvipdfmx
  • bibtex
  • synctex
  • teckit
  • xetex

CC @Mrmaxmeier @crlf0710 tectonic-typesetting/tectonic-staging#8

There is a lot going on here so I hope that I got it all right. There is
new code relating to CMaps that alters the PDF output, so unfortunately
it does not seem practical to try to make it so that the new code can
pass the existing PDF-output test suite.
@pkgw pkgw marked this pull request as draft October 24, 2020 20:52
@pkgw
Copy link
Collaborator Author

pkgw commented Oct 24, 2020

Current status: new xdvipdmx passes the test suite on my Linux laptop. We'll see how it fares in the CI battery ...

@codecov
Copy link

codecov bot commented Oct 24, 2020

Codecov Report

Merging #666 into master will decrease coverage by 0.00%.
The diff coverage is 22.91%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #666      +/-   ##
==========================================
- Coverage   46.30%   46.29%   -0.01%     
==========================================
  Files         132      133       +1     
  Lines       59137    60331    +1194     
==========================================
+ Hits        27386    27933     +547     
- Misses      31751    32398     +647     
Impacted Files Coverage Δ
src/engines/mod.rs 61.75% <0.00%> (-2.22%) ⬇️
src/errors.rs 41.66% <ø> (ø)
src/io/filesystem.rs 52.94% <0.00%> (-7.06%) ⬇️
src/io/mod.rs 72.63% <0.00%> (-2.37%) ⬇️
src/io/stdstreams.rs 84.90% <0.00%> (-5.10%) ⬇️
src/lib.rs 72.41% <0.00%> (ø)
tectonic/core-bridge.c 81.94% <0.00%> (-1.16%) ⬇️
tectonic/dpx-bmpimage.c 1.81% <0.00%> (-0.02%) ⬇️
tectonic/dpx-cid.c 0.31% <0.00%> (-0.92%) ⬇️
tectonic/dpx-cidtype0.c 0.00% <0.00%> (-0.11%) ⬇️
... and 75 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1e4f756...d8dddeb. Read the comment docs.

@Mrmaxmeier
Copy link
Contributor

Mrmaxmeier commented Oct 24, 2020

Hmm, this causes a bunch of SIGABRTs and SIGSEGVs:
https://tt.ente.ninja/#/compare/tectonic@0.1.17/d47454ba362b95bb9191835a27f6e555478e903d

I'll add the tectonic-on-arXiv bot to this repo to simplify tracking down these issues.
EDIT: GitHub doesn't allow me to add the bot. @pkgw You'll need to approve this somewhere 🙂

@pkgw
Copy link
Collaborator Author

pkgw commented Oct 24, 2020

Windows vcpkg builds are current failing due to an unrelated issue: microsoft/vcpkg#13286, addressed by microsoft/vcpkg#13298 .

edit: Looking more closely, the relevant PR seems to be stalled. Not clear what the solution is going to be since it seems that the available versions of Windows just don't support TLS 1.3, which msys2.org is now requiring.

@pkgw
Copy link
Collaborator Author

pkgw commented Oct 24, 2020

@Mrmaxmeier OK, I installed the app on this repo. Do I need to do any further configuration?

Compared to xdvipdfmx this is a *much* smaller delta. Just two small
modifications. Yay.
@Mrmaxmeier
Copy link
Contributor

Mrmaxmeier commented Oct 24, 2020

@Mrmaxmeier OK, I installed the app on this repo. Do I need to do any further configuration?

No. It's supposed to just work ™️ after installing. The whole thing is pretty hacky at the moment though and I'll need to manually add each fork for it to pull the new commits.

EDIT: This doesn't work :/
https://github.com/Mrmaxmeier/tectonic-on-arXiv/blob/4ab2738fb279dcf521b511b8a218bc8b3b5418c9/github-ci/src/index.ts#L19

@pkgw
Copy link
Collaborator Author

pkgw commented Oct 24, 2020

@Mrmaxmeier Should I just rename this branch? That feels like it would be the most TeX-esque solution ...

@Mrmaxmeier
Copy link
Contributor

Mrmaxmeier commented Oct 24, 2020

I'll fix it in the bot. The GitHub API for custom checks has probably improved a bunch since I hacked it together.
It looks like the relevant information is now included in the web-hook payload, so I'll use that instead of searching the git tree locally 👍

Not much new here. Omitted the conversion of various #ifndef/#define
header guards to `#pragma once` due to (entirely hypothetical) concerns
about portability.
Very little new here:

- Instead of exiting when buffers fill up, increase their sizes
  and keep on going.
- Avoid adding redundant filename extensions in some outputs.
@pkgw
Copy link
Collaborator Author

pkgw commented Oct 24, 2020

Looks like it's running now, super cool!

As I recall, the concrete values of the enum options here need to stay
in sync with other parts of the engine. I'm not 100%, but can't hurt to
keep the values consistent with what they were before the catch-up.
@pkgw
Copy link
Collaborator Author

pkgw commented Oct 25, 2020

@Mrmaxmeier There was one big typo that caused the vast majority of the regressions. Now there are only a handful, but I can't reproduce them locally — my build of the CLI program handles them successfully. Do you have a sense of what the might be going on if your tool reports internal errors that don't reproduce in the CLI?

@Mrmaxmeier
Copy link
Contributor

The CI bot runs release builds in a Ubuntu 20.10 docker. Reproducability issues are probably either different system libraries (It'd be nice statically link harfbuzz) or some form of UB.
I tried reproducing 1702.05081 locally and it segfaults due to an uninitialized(?) char-pointer:

gef➤  
#1  0x00005555557675f0 in save_font () at tectonic/dpx-mpost.c:159
159	    next->font_name = NEW(strlen(current->font_name)+1, char);
gef➤  p current->font_name
$5 = 0x4052000000000000 <error: Cannot access memory at address 0x4052000000000000>

- Tweaks to how font design sizes are mapped to TeX values
- Instead of having a separate `prim_eqtb`, merge it into the main eqtb.
  - This forces us to bump the format file serial number.
- New primitives, mostly unimplemented:
  - \creationdate
  - \elapsedtime
  - \expanded
  - \filedump
  - \filemoddate
  - \filesize
  - \normaldeviate
  - \randomseed
  - \resettimer
  - \setrandomseed
  - \uniformdeviate
- Some uc_hyph tweak deep in the linebreaking algorithm
- print_raw_char when using pseudo selector in print_char
- New magic numbers used in PDF last x/y position accounting,
  instead of cur_[hv]_offset.
- Don't print_raw_char in show_context with trick_buf
- Back up cur_cs in scan_keyword
- Handle XETEX_MATH_GIVEN in scan_something_internal
- If encountering an unexpandable primitive in scan_something_internal,
  try to deal with it.
- Ditto for scan_int.
- Do something different with active chars in str_toks_cat
- Rework how file names are scanned.
- Defend against undefined eTeX registers in main_control
- Back up cur_cs in compare_strings.
Note that the `trip` reference outputs do not need updating if the
`print_char` and `show_context` changes are reverte.
@pkgw
Copy link
Collaborator Author

pkgw commented Oct 26, 2020

Almost there! We just need to fix the Arxiv regressions and actually implement the new I/O primitives.

Looking at the diff in the update to TeXLive 2020.0, I think the
upstream changes introduced some bugs.
@pkgw
Copy link
Collaborator Author

pkgw commented Oct 26, 2020

@Mrmaxmeier looks like the crashes are now fixed!

This requires us to add a new feature to the I/O API and so involves
touching a lot of code. In particular, I think that it is probably
important for the memory I/O layer to produce meaningful-ish mtimes,
which adds some complexity there.

The `InputFeatures` trait has added a new method, `get_unix_mtime`, to
support this primitive. It has a default implementation to return
`Ok(None)`, which indicates that the input does not have a well-defined
modification time.

**BREAKING CHANGE**

In order to implement file modtimes for the memory I/O provider, we need
to add more data to its data structures, breaking the APIs for users of
its `files` field. This is a very clear signal that this API should be
cleaned up and made more future-prof. That being said, migration should
be pretty easy: instead of `files` containing a bunch of `Vec<u8>`s, it
now contains a bunch of `MemoryFileInfo` structs that contain a
`Vec<u8>` field named `data`. So you just need to add some `.data` field
accessors to existing code.
@pkgw
Copy link
Collaborator Author

pkgw commented Oct 27, 2020

@Mrmaxmeier The last few builds generated internal errors for the tectonic-on-arXiv bot :-( Did I break your service?

@pkgw
Copy link
Collaborator Author

pkgw commented Oct 27, 2020

I believe that this is now ready to go! Modulo the fact that we should always have more test coverage and the Arxiv tester seems to be broken at the moment.

Discovered in the course of tracking down issue tectonic-typesetting#478. XeLaTeX works
because it generates a "PK" font file on the fly. We can't do that.
More research needed to understand if we should even try.
@pkgw
Copy link
Collaborator Author

pkgw commented Oct 28, 2020

Reran the TeXLive 2020 bundle tests with this version of the engine. Ten more classes compile now, and there are no new failures, so that's a good sign.

@pkgw pkgw marked this pull request as ready for review October 28, 2020 13:51
@pkgw
Copy link
Collaborator Author

pkgw commented Oct 28, 2020

Going to try to retrigger the tectonic-on-arXiv bot since it looks like it may have been fixed a few hours ago.

@pkgw pkgw closed this Oct 28, 2020
@pkgw pkgw reopened this Oct 28, 2020
@pkgw pkgw merged commit 045dbb5 into tectonic-typesetting:master Oct 28, 2020
@pkgw pkgw deleted the catchup-texlive-2020 branch October 28, 2020 19:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants