This repository has been archived by the owner on Nov 11, 2022. It is now read-only.
IndexError: string index out of range during segmentation #8
Comments
Added a try/except to work around this issue. The resulting output shows that the bug is caused by malformed annotations (see the log below). The fix simply skips the malformed lines, which might be the only appropriate solution.
Segmentation training [###.............................] 35/320: 0:00:24 remaining...
16563.xml: problem parsing <author><surname>Weber <author><given-names>Max </surname></author></given-names></author>(</author><year>1988</year><author>c/ Orig. </author><year>1920</year><author>) <title>Gesammelte Aufsätze zur Religionssoziologie I</author>. <other>Tübingen</title>.</other>
Segmentation training [#######.........................] 71/320: 0:00:23 remaining...
20786.xml: problem parsing <author><surname>Schnell</surname>,<given-names> R.</given-names></author>, <year>1997</year>: <title>Nonresponse in Bevölkerungsumfragen. Ausmaß, Entwicklung und Ursachen</title>. <other>Opladen<other>: <publisher>Leske + Budrich.</publisher></other></other>
Segmentation training [#######.........................] 77/320: 0:00:14 remaining...
21690.xml: problem parsing <source>Working Brief</source> <volume>15</volume>: <author><given-names>Diego</given-names> <surname>Compagna / <author><given-names>Stefan</surname> <surname>Derpmann</surname></author></given-names></author> / <author><given-names>Kathrin</given-names> <surname>Mauz</surname></author> / <author><given-names>Karen</given-names> <surname>Shire</surname></author> (<year>2009</year>): <title>Förderung des Wissenstransfers für eine aktive Mitgestaltung des Pflegesektors durch Mikrosystemtechnik (WiMi-Care)</title>, <source>Working Brief</source> <volume>15</volume>: <title>Die Einstellung von Pflegekräften gegenüber technischen Neuerungen</title>. In: <url>http://www.wimi-care.de/outputs.html#Briefs</url> (letzter Abruf: <other>02.12.2009</other>).
Segmentation training [##################..............] 188/320: 0:00:13 remaining...
36684.xml: problem parsing <title>Stellungnahmen geladener Sachverständiger vor dem Bundestag zum Thema Fiskalpakt und ESM</title>, <other>7.5.</other><year>2012</year>: <url><www. bundestag.de/bundestag/ausschuesse17/a08/anhoerungen/fiskalpakt_und_esm/stellungnahmen/index.html/></url>.
Segmentation training [######################..........] 225/320: 0:00:10 remaining...
40723.xml: problem parsing <author><surname>Koskinas</surname></author>, <author><given-names>Ioannis </given-names></author>(<year>2014</year>),<title> The Only Choice Left for Afghanistan</title>, online: <url>htp://southasia.foreign-policy.com/posts/2014/09/11/the_only_choice_ left_for_afghanistan></url> (<other>27 October 2014</other>).
Segmentation training [##########################......] 260/320: 0:00:05 remaining...
45841.xml: problem parsing <editor>Folha Online</editor> (<year>2012</year>), <url><www1.folha.uol.com.br/fsp/brasil/></url> (<other>12. November 2012</other>).
45841.xml: problem parsing <author><surname>Patarra</surname>, <given-names>Ivo</given-names></author> (<year>2010</year>), <title>O chefe</title>, online: <url><www.escandalodomensalao.com.br></url> (<other>2. November 2012</other>).
45841.xml: problem parsing <editor>Veja</editor> (<year>2012</year>), <title>O Julgamento do Mensalão. A hora da Sentença</title>, online: <url><htp://veja.abril.com.br/o-jul - gamento-do-mensalao/hora-da-sentenca/></url> (<other>13. November 2012</other>).
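For readers who want to see the shape of such a workaround, here is a minimal sketch of the try/except pattern described in this comment. It is not the code from the referenced commit: `toy_parse` is a hypothetical stand-in parser and `parse_lines` is a hypothetical wrapper, both invented for illustration. The stand-in only shows how scanning a string character by character can raise `IndexError: string index out of range` when a tag is never closed, and how the wrapper can report and skip such a line instead of aborting the run.

```python
from typing import Callable, Iterable, List, Tuple


def toy_parse(line: str) -> List[str]:
    """Hypothetical stand-in parser: collect tag names by scanning characters."""
    tags: List[str] = []
    i = 0
    while i < len(line):
        if line[i] == "<":
            j = i + 1
            # Raises IndexError if the line ends before the tag is closed.
            while line[j] != ">":
                j += 1
            tags.append(line[i + 1:j])
            i = j
        i += 1
    return tags


def parse_lines(
    lines: Iterable[str],
    parse_line: Callable[[str], List[str]],
    source: str,
) -> Tuple[List[List[str]], List[str]]:
    """Parse every line, skipping (and reporting) those that raise IndexError."""
    parsed: List[List[str]] = []
    skipped: List[str] = []
    for line in lines:
        try:
            parsed.append(parse_line(line))
        except IndexError:
            # Report the offending line and keep going with the rest.
            print(f"{source}: problem parsing {line}")
            skipped.append(line)
    return parsed, skipped


if __name__ == "__main__":
    sample = ["<year>1997</year>", "<year>1997</ye"]
    parsed, skipped = parse_lines(sample, toy_parse, source="example.xml")
    print(len(parsed), "parsed,", len(skipped), "skipped")
```

Returning the skipped lines alongside the parsed ones keeps the malformed annotations visible, so they can be corrected in the training data later rather than silently lost.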
cboulanger added a commit that referenced this issue on Feb 19, 2022