tld: incorrectly parsed Wikipedia table #1945

dgw · 2020-09-24T05:14:09Z

One table in the Wikipedia article we use for TLD details in the tld plugin is parsed incorrectly, producing output like this:

<dgw> ,tld xn--q7ce6a
<SopelTest> [tld] : Lao | Bulgaria: .ລາວ | Bulgarian: Laos | Cyrillic: Lao | bg: Lao | .bg: Not in use | No: .la
<dgw> ,tld ລາວ
<SopelTest> [tld] : Lao | Bulgaria: .ລາວ | Bulgarian: Laos | Cyrillic: Lao | bg: Lao | .bg: Not in use | No: .la

A quick look at the HTML didn't reveal any obvious structural differences between this table and the others, which parse correctly, but something is clearly tripping up my rudimentary HTMLParser-derived class. I'll probably need to spend some quality time with pdb to figure out where in the parsing routine the data gets mangled.
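For readers unfamiliar with the approach, here is a minimal sketch of an HTMLParser-derived table parser. It is not Sopel's actual tld code: the names `TableParser` and `table_to_records` are hypothetical, and the choice to treat the first row as the header and to skip `<sup>` content (for example, footnote markers) are assumptions made for illustration. The sketch shows why this style of parser is fragile. Each cell is labeled purely by its position relative to whichever row is taken as the header, so if the wrong row is used as the header, every label in the reply comes out wrong. That failure mode would produce output resembling the log above, where the field names look like they come from another row.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Hypothetical sketch: collect <table> rows as lists of cell text.

    Not Sopel's real parser; it only illustrates the HTMLParser-subclass
    approach. Skipping <sup> content is an assumption for this example.
    """

    def __init__(self):
        super().__init__()
        self.rows = []          # completed rows, each a list of cell strings
        self._row = None        # cells of the row currently being read
        self._cell = None       # text fragments of the cell being read
        self._skip_depth = 0    # > 0 while inside <sup> (assumed noise)

    def _flush_cell(self):
        # Close the current cell, tolerating HTML's optional </td> and </th>.
        if self._cell is not None and self._row is not None:
            self._row.append(''.join(self._cell).strip())
        self._cell = None

    def _flush_row(self):
        # Close the current row, tolerating HTML's optional </tr>.
        self._flush_cell()
        if self._row is not None:
            self.rows.append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            self._flush_row()
            self._row = []
        elif tag in ('td', 'th') and self._row is not None:
            self._flush_cell()
            self._cell = []
        elif tag == 'sup':
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == 'sup' and self._skip_depth:
            self._skip_depth -= 1
        elif tag in ('td', 'th'):
            self._flush_cell()
        elif tag in ('tr', 'table'):
            self._flush_row()

    def handle_data(self, data):
        # Only keep text that sits inside a cell and outside any <sup>.
        if self._cell is not None and not self._skip_depth:
            self._cell.append(data)


def table_to_records(html):
    """Pair the first row (assumed to be the header) with each later row."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    # Labels come purely from position: a wrong header row mislabels everything.
    return [dict(zip(header, row)) for row in body]
```

For example, `table_to_records('<table><tr><th>Name</th><th>TLD<sup>[1]</sup></th></tr><tr><td>Laos</td><td>.la</td></tr></table>')` gives `[{'Name': 'Laos', 'TLD': '.la'}]`. Because the labels come from `zip(header, row)`, any change in how the header row is found or how its cells are split shifts every key. That would be a natural place to set pdb breakpoints.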

Follow-up to #1939 (comment)


dgw · 2020-10-23T13:56:43Z

Peppered debug logging through the parser, spun up my test bot, issued a bunch of TLD commands, and… nothing. I can't reproduce this any more. I'll leave it in the 7.1 milestone for historical purposes, but it seems likely this wasn't our problem.

dgw added the Bug label Sep 24, 2020

dgw added this to the 7.1.0 milestone Sep 24, 2020

dgw self-assigned this Sep 24, 2020

dgw added the Declined label Oct 23, 2020

dgw closed this as completed Oct 23, 2020

dgw mentioned this issue Oct 23, 2020 in tld: small tweak to parser's handling of superscript #1968 (Merged)
