Fails to parse JMDict which has a large !DOCTYPE section #595

Closed
udoprog opened this issue Apr 24, 2023 · 1 comment

Comments

@udoprog

udoprog commented Apr 24, 2023

The file can be downloaded from here: https://www.edrdg.org/jmdict/j_jmdict.html.

I'm not entirely sure what's going on. A large chunk of the DOCTYPE section is processed, then parsing stops in the middle and emits the following error (note that I've added the byte offset):

Only Comment (`--`), CDATA (`[CDATA[`) and DOCTYPE (`DOCTYPE`) nodes can start with a '!', but symbol `B` found (at 8466)

This happens at the following element (inside the !DOCTYPE section):

<!ELEMENT field (#PCDATA)>
        <!-- Information about the field of application of the entry/sense.
        When absent, general application is implied. Entity coding for
        specific fields of application. -->
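
To check what the input looks like at the reported position, one option is to dump the bytes surrounding the offset from the error message. The following is only a minimal sketch: the helper names `window_around` and `dump` and the local file path are hypothetical and not part of the original report.

```python
def window_around(data: bytes, offset: int, radius: int = 40) -> bytes:
    """Return the bytes within `radius` of `offset`, clamped to the buffer."""
    start = max(0, offset - radius)
    end = min(len(data), offset + radius)
    return data[start:end]


def dump(path: str, offset: int, radius: int = 40) -> None:
    """Print the raw bytes around `offset` in the file at `path`.

    `path` is an assumption: a local, uncompressed copy of the dictionary.
    """
    with open(path, "rb") as f:
        # Read only as far as the window needs, not the whole file.
        data = f.read(offset + radius)
    print(window_around(data, offset, radius))


# Hypothetical usage, with the offset taken from the error message:
# dump("JMdict", 8466)
```

Printing raw bytes rather than decoded text keeps the positions aligned with the byte offset in the error, even if the file contains multi-byte characters.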
@Mingun
Collaborator

Mingun commented Apr 24, 2023

Likely duplicate of #590

@Mingun closed this as not planned on Apr 24, 2023