This repository has been archived by the owner on May 8, 2024. It is now read-only.

Metadata update #344

Merged
merged 6 commits into from
Sep 11, 2023

Conversation

BobBorges
Collaborator

@BobBorges BobBorges commented Sep 7, 2023

This PR includes updated metadata (wikidata_query.py, wikidata_process.py) and redetected speaker introductions (redetect.py). For now, I have pushed only a few protocols that contain the sample of changes introduced by the redetect script. The rest of the protocols and the metadata will be committed and pushed if the sample looks OK. I also updated the unit test files in the cases where wiki_ids changed.

@BobBorges
Collaborator Author

BobBorges commented Sep 7, 2023

Sampled changes

corpus/protocols/1867/prot-1867--fk--0320.xml

Diff starting from line 2291

@@ -2291,7 +2291,7 @@
           <note xml:id="i-6Lgm7zM3ArtEuPp5zKoGQr" type="speaker">
             Friherre af Ugglas:
           </note>
-          <u who="unknown" xml:id="i-M1HkFioo7uaMpKw2pbMEC3" next="i-PR5aXhLyKmyQToEBWE1p4f">
+          <u who="Q6218798" xml:id="i-M1HkFioo7uaMpKw2pbMEC3" next="i-PR5aXhLyKmyQToEBWE1p4f">
             <seg xml:id="i-NXqx8mfbWPZx9N2rdR1MrJ">
               Jag ber om ursäkt, att jag ytterligare begärt ordet; men med
               anledning af den siste värde talarens yttrande, att då latun,
  • Correct
  • Incorrect

corpus/protocols/1867/prot-1867--fk--0328.xml

Diff starting from line 1398

@@ -1398,7 +1398,7 @@
               dem, åtminstone förrän de blifvit klart och öfvertygande vederlagda.
             </seg>
           </u>
-          <u xml:id="i-TyT3Y8YdwS9b58PvBLgNij" who="unknown" prev="i-FmdPeYxuH52vzWDuwq6r8D" next="i-HsMk84MLGHJ1Sr9BP2FT4x">
+          <u xml:id="i-TyT3Y8YdwS9b58PvBLgNij" who="Q6218798" prev="i-FmdPeYxuH52vzWDuwq6r8D" next="i-HsMk84MLGHJ1Sr9BP2FT4x">
             <seg xml:id="i-9LF9P8xMup9vk9RmfXMVcY">
               Samme talare har jemväl sagt, att det syntes honom som om först,
               efter det den nu begärda undersökningen blifvit verkställd, det
  • Correct
  • Incorrect

corpus/protocols/1867/prot-1867--fk--0509.xml

Diff starting from line 1817

@@ -1817,7 +1817,7 @@
             om rustoch rotehållarnes besvär blifvit afgjord. Skilnaden är,
             som hvar och en torde finna, ganska väsendåtlig.
           </note>
-          <u xml:id="i-WanokFqZM8QK8qyTZVJ4cq" next="i-86zjRBXxq3cLympDygGpVS" who="Q6060104">
+          <u xml:id="i-WanokFqZM8QK8qyTZVJ4cq" next="i-86zjRBXxq3cLympDygGpVS" who="unknown">
             <seg xml:id="i-Q5Mwm3ckzF6PNYBocSXE1N">
               Egentligen har jag begärt ordet med anledning af hvad en talare
               här yttrade om hvad som tilldragit sig i Andra Kammaren. Jag vill
  • Correct
  • Incorrect

corpus/protocols/1868/prot-1868--fk--0310.xml

Diff starting from line 2382

@@ -2382,14 +2382,14 @@
           <note xml:id="i-PrF5UBqdjVq1ernC1dsoFP" type="speaker">
             Grefve af Ugglas:
           </note>
-          <u who="unknown" xml:id="i-H3mviftqi7nW1LeK2XVqwc" next="i-S4p1XVZYsvkBFtwarf7n4r">
+          <u who="Q6218798" xml:id="i-H3mviftqi7nW1LeK2XVqwc" next="i-S4p1XVZYsvkBFtwarf7n4r">
             <seg xml:id="i-HGoiTi2dPmpCdF2BVo9u7k">
               På sätt en föregående talare upplyst, har detta anslag ännu icke
               kunnat utgå, emedan det först blifvit beviljadt för detta år,
               och man känner således icke verkningarne af detsamma.
             </seg>
           </u>
-          <u xml:id="i-S4p1XVZYsvkBFtwarf7n4r" who="unknown" prev="i-H3mviftqi7nW1LeK2XVqwc" next="i-2btXy3g1DFpQCpCfk94mdJ">
+          <u xml:id="i-S4p1XVZYsvkBFtwarf7n4r" who="Q6218798" prev="i-H3mviftqi7nW1LeK2XVqwc" next="i-2btXy3g1DFpQCpCfk94mdJ">
             <seg xml:id="i-B7t462s4MVnRmtiK7ZsDbp">
               Jag får dessutom säga, att om det ock kan vara angeläget att
               göra besparingar, der så ske kan, så finnes en omständighet af
  • Correct
  • Incorrect

corpus/protocols/1869/prot-1869--fk--0309.xml

Diff starting from line 3197

@@ -3197,7 +3197,7 @@
           <note xml:id="i-YCLr7GbwBJ7cwQdn9b6tEa" type="speaker">
             Grefve af Ugglas:
           </note>
-          <u who="unknown" xml:id="i-LNZYzhE6o5vwC2j93y5FtJ">
+          <u who="Q6218798" xml:id="i-LNZYzhE6o5vwC2j93y5FtJ">
             <seg xml:id="i-48aSseSFCYeDdxjFLaNukm">
               I afseende å Stats-Utskottets förfarande i denna del får jag
               hänvisa till Kongl. Maj:ts nådiga Proposition angående upphörandet
  • Correct
  • Incorrect

corpus/protocols/1869/prot-1869--fk--0428.xml

Diff starting from line 4090

@@ -4090,7 +4090,7 @@
               maximibeloppet af lönerna och pensionerna inom hvarje grad.
             </seg>
           </u>
-          <u xml:id="i-SasctKVyRyugCrC5E6xtH6" who="unknown" prev="i-QyvTdsxfLLjD3ysc1dutQd" next="i-MGeuCSuBNJC7G1TYPfwwnD">
+          <u xml:id="i-SasctKVyRyugCrC5E6xtH6" who="Q6218798" prev="i-QyvTdsxfLLjD3ysc1dutQd" next="i-MGeuCSuBNJC7G1TYPfwwnD">
             <seg xml:id="i-F9eftt8c5aKuJVzDXBHSX6">
               När dessutom ganska allvarsamma anmärkningar kunna göras emot
               denna normalstat, som, just derigenom att fasta, till siffran
  • Correct
  • Incorrect

corpus/protocols/1869/prot-1869--fk--0503.xml

Diff starting from line 1397

@@ -1397,7 +1397,7 @@
               få något.
             </seg>
           </u>
-          <u xml:id="i-YZN36W45wnidcp6u77CiMn" prev="i-KyURxY4STj8ezjonHXs5zE" who="Q6060104" next="i-JG3yqAVDigZChhnF2Kdo9">
+          <u xml:id="i-YZN36W45wnidcp6u77CiMn" prev="i-KyURxY4STj8ezjonHXs5zE" who="unknown" next="i-JG3yqAVDigZChhnF2Kdo9">
             <seg xml:id="i-ALSMPRFf2sfTSd3ip3xViD">
               . Grefve Hamilton antydde vidare, att det skulle vara bättre,
               om man kunde ge ut millioner under årets statsregleringsperiod,
  • Correct
  • Incorrect

corpus/protocols/1870/prot-1870--fk--0427.xml

Diff starting from line 4171

@@ -4171,7 +4171,7 @@
           <note xml:id="i-G988CBKKuqUzALoD9UBk68" type="speaker">
             Grefve af Ugglas:
           </note>
-          <u who="unknown" xml:id="i-EzNKVULPGFZDY57dZAHXqd">
+          <u who="Q6218798" xml:id="i-EzNKVULPGFZDY57dZAHXqd">
             <seg xml:id="i-D6CNhu5u56tgnpnNehYMXM">
               Jag ber blott få fästa uppmärksamheten derpå, att det i förevarande
               punkt framställda förslag är af temliger stor betydenhet, ty om
  • Correct
  • Incorrect

corpus/protocols/1870/prot-1870--fk--0507.xml

Diff starting from line 424

@@ -424,7 +424,7 @@
               ej gerna med att göra denne förargad.
             </seg>
           </u>
-          <u xml:id="i-BTKf3bir1SfTXxPo4qYiaA" who="unknown" prev="i-YW8stTR1dEXBXVDu9fVhAd" next="i-WNvzPBr7RZBqHjP5UaoSwg">
+          <u xml:id="i-BTKf3bir1SfTXxPo4qYiaA" who="Q6218798" prev="i-YW8stTR1dEXBXVDu9fVhAd" next="i-WNvzPBr7RZBqHjP5UaoSwg">
             <seg xml:id="i-PzRziksJc7YHfjmq9iaiAP">
               Jag erkänner således, att förslaget möjligen kan komma att medföra
               åtskilliga olägenheter; men å andra sidan finner jag det iunebära
  • Correct
  • Incorrect

corpus/protocols/1871/prot-1871--ak--0413.xml

Diff starting from line 5363

@@ -5363,7 +5363,7 @@
           <note xml:id="i-MxtTc6bQEnz91htbTXYi37" type="speaker">
             Herr J. Rundbäck:
           </note>
-          <u who="Q6083441" xml:id="i-6G4jSPJuT25QCrSQ7FAGtP" next="i-Xohc4yZyHdgTT5Hxqn858J">
+          <u who="unknown" xml:id="i-6G4jSPJuT25QCrSQ7FAGtP" next="i-Xohc4yZyHdgTT5Hxqn858J">
             <seg xml:id="i-5ei1K2HwEMoTphbSfHkMSP">
               Det kan sannerligen synas. dristigt att med min svaga farkost
               våga sig ut under en så stark sjögång, som den hvilken nu tyckes
  • Correct
  • Incorrect

corpus/protocols/1871/prot-1871--fk--0304.xml

Diff starting from line 3744

@@ -3744,7 +3744,7 @@
           <note xml:id="i-ADNgXcksQoeyMrmQRq8kuN" type="speaker">
             Grefve af Ugglas:
           </note>
-          <u who="unknown" xml:id="i-WpE4VERKXdpkcUEoJsY4Bf" next="i-EBRA2mMxbWtKpYu6opPHAK">
+          <u who="Q6218798" xml:id="i-WpE4VERKXdpkcUEoJsY4Bf" next="i-EBRA2mMxbWtKpYu6opPHAK">
             <seg xml:id="i-4eGdzsYKyS7zo2jgvn8XsR">
               För min del erkänner jag villigt det stora och trängande behofvet
               för Svea artilleriregemente att kunna sammanföras inom en kasern,
  • Correct
  • Incorrect

corpus/protocols/1876/prot-1876--ak--43.xml

Diff starting from line 405

@@ -405,7 +405,7 @@
               är inne.
             </seg>
           </u>
-          <u xml:id="i-JrQFYzPbYCpDc6ScQ7JpLt" who="unknown" prev="i-P6W3hbaXHeCCWzynhGj9uC" next="i-UfcofkBPjpT68xqGhpndRw">
+          <u xml:id="i-JrQFYzPbYCpDc6ScQ7JpLt" who="Q6014586" prev="i-P6W3hbaXHeCCWzynhGj9uC" next="i-UfcofkBPjpT68xqGhpndRw">
             <seg xml:id="i-FbxCYLJHAkipVdfK8GnzP4">
               Under sådana förhållanden synes mig icke vara klokt att helt
               och hållet afslå den summa, hvarom här är fråga, och jag tror
  • Correct
  • Incorrect

corpus/protocols/1878/prot-1878--ak--35.xml

Diff starting from line 4578

@@ -4578,7 +4578,7 @@
               beslut komme att understiga det belopp, hvarmed den förut utgått.
             </seg>
           </u>
-          <u xml:id="i-5q9GqR8HP7Apyd3sNYHGsp" who="unknown" prev="i-3DHhcquH5nATg99Ara9sRL" next="i-UwEgin6Dgf2YU7UMmSxZEp">
+          <u xml:id="i-5q9GqR8HP7Apyd3sNYHGsp" who="Q6014586" prev="i-3DHhcquH5nATg99Ara9sRL" next="i-UwEgin6Dgf2YU7UMmSxZEp">
             <seg xml:id="i-Cp4AMiE6Txh9YYpCiRtMhU">
               Beträffande femte punkten i Utskottets förslag, mot hvilken jag
               reserverat mig, kan jag instämma med den förste ärade talaren
  • Correct
  • Incorrect

corpus/protocols/1879/prot-1879--ak--59.xml

Diff starting from line 2570

@@ -2570,7 +2570,7 @@
           <note xml:id="i-M6t5Lzmpxhx5zzEnvRMAkS" type="speaker">
             Friherre Nordenfalk, som anförde:
           </note>
-          <u who="unknown" xml:id="i-8RZBHHFKapEAtYRwy98H5J" next="i-EcwHoQrPkPmP4m5r8vEoU6">
+          <u who="Q6014586" xml:id="i-8RZBHHFKapEAtYRwy98H5J" next="i-EcwHoQrPkPmP4m5r8vEoU6">
             <seg xml:id="i-Ah7q7m8mhiQAC2XskU5uiG">
               Då motionären i sitt anförande icke framstält något bestämdt
               yrkande, kan jag fatta mig helt kort; och jag har begärt ordet
  • Correct
  • Incorrect

corpus/protocols/1880/prot-1880--fk--32.xml

Diff starting from line 2266

@@ -2266,13 +2266,13 @@
             att använda be- ” fintliga öfverskottsmedel och har derigenom
             icke blott rättighet,
           </note>
-          <u xml:id="i-J6qK9XaVeHgcmvfniaNE6z" who="Q6228276" prev="i-Ttj14KnsS6FkmTsf13DmDG" next="i-XDUXtwMUAQeJMWsBXhCFbt">
+          <u xml:id="i-J6qK9XaVeHgcmvfniaNE6z" who="unknown" prev="i-Ttj14KnsS6FkmTsf13DmDG" next="i-XDUXtwMUAQeJMWsBXhCFbt">
             <seg xml:id="i-WX56hTyUBD1xj7XH9nP3RL">
               utan också förpligtelse att dertill använda dessa medel. Det
               kan
             </seg>
           </u>
-          <u xml:id="i-XDUXtwMUAQeJMWsBXhCFbt" who="Q6228276" prev="i-J6qK9XaVeHgcmvfniaNE6z" next="i-9jwM5f939s8nQDzPazZWx6">
+          <u xml:id="i-XDUXtwMUAQeJMWsBXhCFbt" who="unknown" prev="i-J6qK9XaVeHgcmvfniaNE6z" next="i-9jwM5f939s8nQDzPazZWx6">
             <seg xml:id="i-E8EeZtf9mtogMQwRwN78fP">
               således nu icke komna i fråga att de användas till andra ända-
             </seg>
  • Correct
  • Incorrect

corpus/protocols/1882/prot-1882--ak--47.xml

Diff starting from line 1065

@@ -1065,7 +1065,7 @@
               sitt yrke.
             </seg>
           </u>
-          <u xml:id="i-5PJaQHbVok3mLKyDoTBz1A" prev="i-FHk43TjzmnFR16jZyEAjLn" who="Q6093486" next="i-61Rfdh7dEhQ2b4vGYh8Thc">
+          <u xml:id="i-5PJaQHbVok3mLKyDoTBz1A" prev="i-FHk43TjzmnFR16jZyEAjLn" who="unknown" next="i-61Rfdh7dEhQ2b4vGYh8Thc">
             <seg xml:id="i-S9XpjHfXG5qLNrjnhNSuDR">
               Talaren på stockholmsbänken ansåg att det icke vore någon olycka,
               om läderberedningen inom landet upphörde; det vore lika bra, menade
  • Correct
  • Incorrect

corpus/protocols/1885/prot-1885--ak--58.xml

Diff starting from line 2180

@@ -2180,7 +2180,7 @@
               kom och sederna försvunno.
             </seg>
           </u>
-          <u xml:id="i-8i8AYsGn3xgM4CAvUQPV1R" prev="i-A88fU6LSHX1ByxuBk4AE3Q" who="Q6083441" next="i-TK8gS7Fv5zaBBHn2RHhu3N">
+          <u xml:id="i-8i8AYsGn3xgM4CAvUQPV1R" prev="i-A88fU6LSHX1ByxuBk4AE3Q" who="unknown" next="i-TK8gS7Fv5zaBBHn2RHhu3N">
             <seg xml:id="i-WqkigfvRdHLrd5HqzJzdVH">
               Herr Wieselgren nämnde, att något ovilkorligt bränvinsförbud
               icke kunde stadgas, så länge städernas hamnar voro öppna för import
  • Correct
  • Incorrect

corpus/protocols/1887/prot-1887-janmar-ak--13.xml

Diff starting from line 1553

@@ -1553,7 +1553,7 @@
           <note xml:id="i-FGChmDrMo77kLk7LY7yoo9" type="date">
             Tisdagen den 1 Mars, f. m. 19 N:o 13.
           </note>
-          <u xml:id="i-FXXawouud7WFgkiKQKF1r4" prev="i-WfmLABtQfyYDENWAUwyt81" who="Q6199107">
+          <u xml:id="i-FXXawouud7WFgkiKQKF1r4" prev="i-WfmLABtQfyYDENWAUwyt81" who="Q6198243">
             <seg xml:id="i-JdwsPa7UtJ1CNApzBDQ8LZ">
               tionen öfver till dem. Vi veta också att La Plata-staterna gifvit
               Ang. införande oss en ny verld så stor, att, om den hade Tysklands
  • Correct
  • Incorrect

corpus/protocols/1888/prot-1888--ak--20.xml

Diff starting from line 3159

@@ -3159,7 +3159,7 @@
           <note xml:id="i-HLvnNskAZQSVj77ZoPwDqZ" type="date">
             Onsdagen den 14 Mars, f. m. 39
           </note>
-          <u xml:id="i-DH6zrhmWjVHNmPkedTF5Df" prev="i-9Y1MhEiN5fFieB8tWmgBeb" who="Q6199107" next="i-24KXHqkiByPpnQdo1oe5mD">
+          <u xml:id="i-DH6zrhmWjVHNmPkedTF5Df" prev="i-9Y1MhEiN5fFieB8tWmgBeb" who="Q6198243" next="i-24KXHqkiByPpnQdo1oe5mD">
             <seg xml:id="i-4pRXJwz3wkknU17nuLzCtu">
               att postverket är ett vidlyftigt verk, som under sin nuvarande
               chef arbetat sig fram på ett förtjenstfullt sätt och såsom få
  • Correct
  • Incorrect

corpus/protocols/1888/prot-1888--ak--38.xml

Diff starting from line 3295

@@ -3295,13 +3295,13 @@
               som man kan bjuda dem på. Det är min uppfattning.
             </seg>
           </u>
-          <u xml:id="i-EzBEzsGgLsfs35EgywRDsp" prev="i-MRqqCkhdsfuvYQHLAhr2ch" who="Q6199107" next="i-VFqriB9hJFRMByfJiURFWQ">
+          <u xml:id="i-EzBEzsGgLsfs35EgywRDsp" prev="i-MRqqCkhdsfuvYQHLAhr2ch" who="Q6198243" next="i-VFqriB9hJFRMByfJiURFWQ">
             <seg xml:id="i-CwFDkq7v3b5AgMAFhFMfoS">
               Jag skall icke uppehålla tiden längre, utan yrkar, som sagdt,
               bifall till utskottets förslag.
             </seg>
           </u>
-          <u xml:id="i-VFqriB9hJFRMByfJiURFWQ" prev="i-EzBEzsGgLsfs35EgywRDsp" who="Q6199107">
+          <u xml:id="i-VFqriB9hJFRMByfJiURFWQ" prev="i-EzBEzsGgLsfs35EgywRDsp" who="Q6198243">
             <seg xml:id="i-DPEwZAk7pnRzCjMHAZkWND">
               Herr Bergendahl: Jag vill endast tillkännagifva, att jag kommer
               att rösta för utskottets förslag. Det är sagdt, att kaffet icke
  • Correct
  • Incorrect

corpus/protocols/1901/prot-1901--fk--22.xml

Diff starting from line 2209

@@ -2209,13 +2209,13 @@
           <note xml:id="i-YDM9NNm6iqL4ircR1BFUsB" type="speaker">
             Herr Lundberg:
           </note>
-          <u who="Q53286" xml:id="i-M7nfz7fSJ1kspsh76UTAkD" next="i-4fHmBqYRAQPTyLYvu3uagY">
+          <u who="Q6150867" xml:id="i-M7nfz7fSJ1kspsh76UTAkD" next="i-4fHmBqYRAQPTyLYvu3uagY">
             <seg xml:id="i-L9hA81fb3apxQ66jfi1Tfz">
               Jag tackar vördsamt för de svar, jag erhållit, men jag kan icke
               annat än beklaga, att svaren äro hvarandra diametralt motsatta.
             </seg>
           </u>
-          <u xml:id="i-4fHmBqYRAQPTyLYvu3uagY" prev="i-M7nfz7fSJ1kspsh76UTAkD" who="Q53286" next="i-R2bcdFtMkjKtJcd4tnRwUe">
+          <u xml:id="i-4fHmBqYRAQPTyLYvu3uagY" prev="i-M7nfz7fSJ1kspsh76UTAkD" who="Q6150867" next="i-R2bcdFtMkjKtJcd4tnRwUe">
             <seg xml:id="i-Rvc9D7BXZbyVFjCAuCGojM">
               I särskilda utskottet framstälde jag såsom min tro, att bristen
               skulle täckas på sätt herr chefen för justitiedepartementet nämnde,
  • Correct
  • Incorrect

corpus/protocols/1903/prot-1903--fk--15.xml

Diff starting from line 1486

@@ -1486,7 +1486,7 @@
           <note xml:id="i-NDhKHS6s2jmy5KcdxaH7mf">
             strömdrag. (Forts.)
           </note>
-          <u xml:id="i-4sj3gHDemMTiHkizfCiLSS" prev="i-KgqvmkdapeiEcEmH7M9k1C" who="Q53286">
+          <u xml:id="i-4sj3gHDemMTiHkizfCiLSS" prev="i-KgqvmkdapeiEcEmH7M9k1C" who="Q6150867">
             <seg xml:id="i-Mq4bZGvM7ec7x4jYpfZLvp">
               stiftning är behöflig. Jag medgifver, att det är mycket svårt
               att afgöra, huru den ersättning, som en hvar vattenverksägare
  • Correct
  • Incorrect

corpus/protocols/1913/prot-1913--ak--55.xml

Diff starting from line 537

@@ -537,7 +537,7 @@
               att stöta på utomordentliga svårigheter.
             </seg>
           </u>
-          <u xml:id="i-WjvTPgSJR1zKfWS2S3KpKV" prev="i-VgYAWnsvoFvemC9AV2eQdo" who="Q111805044" next="i-SGZgrH5BCMd2ikBS3hnyQr">
+          <u xml:id="i-WjvTPgSJR1zKfWS2S3KpKV" prev="i-VgYAWnsvoFvemC9AV2eQdo" who="Q5798813" next="i-SGZgrH5BCMd2ikBS3hnyQr">
             <seg xml:id="i-E7jYVsEZPgAUa6asMHDg7X">
               Alla de svårigheter, som möta vid försöken att reformera apoteksväsendet,
               leda i mycket sitt upphov från den omständig-
  • Correct
  • Incorrect

Diff starting from line 646

@@ -646,7 +646,7 @@
           <note xml:id="i-4qsVB51romkQq3zzsq6tT4">
             Ang. apoteksvarustadga.
           </note>
-          <u xml:id="i-SL84n9f6vSdv8Pe8Fpi7J" prev="i-CLhXqxXaWgv8aboPZ9dCEu" who="Q111805044" next="i-6q9xDpA4thsYyefaEwhgk8">
+          <u xml:id="i-SL84n9f6vSdv8Pe8Fpi7J" prev="i-CLhXqxXaWgv8aboPZ9dCEu" who="Q5798813" next="i-6q9xDpA4thsYyefaEwhgk8">
             <seg xml:id="i-M9Tkkb2Hx3CwfTs3v3CxVG">
               (Forts.)
             </seg>
  • Correct
  • Incorrect

corpus/protocols/1921/prot-1921--fk--20.xml

Diff starting from line 3778

@@ -3778,7 +3778,7 @@
           <note type="date" xml:id="i-PC6ayTajEctK8xxcmrCPYy">
             Lördagen den 2 april e. m. 49 Nr 20.
           </note>
-          <u who="unknown" xml:id="i-080ecbfc8b8ed394-2" prev="i-080ecbfc8b8ed394-1">
+          <u who="Q1253889" xml:id="i-080ecbfc8b8ed394-2" prev="i-080ecbfc8b8ed394-1">
             <seg xml:id="i-DUiVSbSFrqPWZELdY6kaEM">
               teserna, för att inte tala om kolonialfolkens befrielse och arbetarnas
               so-Om Sveriges ciala välfärds säkerställande. Jag undrar, om t.
  • Correct
  • Incorrect

corpus/protocols/1940/prot-1940--ak--24.xml

Diff starting from line 2644

@@ -2644,7 +2644,7 @@
           <note type="speaker" xml:id="i-KAp5JU3kD9yy2yf8t1WRwS">
             Herr Pettersson i Norregård:
           </note>
-          <u who="Q6318945" xml:id="i-d446f18f1079242e-0" next="i-d446f18f1079242e-1">
+          <u who="unknown" xml:id="i-d446f18f1079242e-0" next="i-d446f18f1079242e-1">
             <seg xml:id="i-YYty1wPRSbU8hfqxNpYxoU">
               Herr talman! Då jag tillsammans med några andra ledamöter av
               denna kammare har väckt en motion, nr 207, i vilken yrkas, att
  • Correct
  • Incorrect

corpus/protocols/1943/prot-1943--ak--12.xml

Diff starting from line 7870

@@ -7870,14 +7870,14 @@
           <note type="date" xml:id="i-9PPC1LhqXZd77n7rRUzcs7">
             Onsdagen den 31 mars 1943 e. m. Nr 12. 93
           </note>
-          <u xml:id="i-BwfUcPG6v2mAmypM6UzECH" prev="i-7bb3fc339ddbcc3c-2" who="Q6042992" next="i-7bb3fc339ddbcc3c-3">
+          <u xml:id="i-BwfUcPG6v2mAmypM6UzECH" prev="i-7bb3fc339ddbcc3c-2" who="Q6042858" next="i-7bb3fc339ddbcc3c-3">
             <seg xml:id="i-UaC8QbewhqcQ1Wi1MK13zV">
               Fortsatt giltighet av gällande skattegruppering m. m. (Forts.)
               skulle en lösning av frågan i dess helhet kunna fördröjas, och
               man skulle inte kunna få ett förslag redan till 1945 års riksdag.
             </seg>
           </u>
-          <u who="Q6042992" xml:id="i-7bb3fc339ddbcc3c-3" prev="i-BwfUcPG6v2mAmypM6UzECH">
+          <u who="Q6042858" xml:id="i-7bb3fc339ddbcc3c-3" prev="i-BwfUcPG6v2mAmypM6UzECH">
             <seg xml:id="i-KtnY9bkDeyGQhKKAYJiPZS">
               Vidare vilja vi utesluta sista meningen i första stycket på sid.
               8, vilken är av följande lydelse: »Däremot anser utskottet tveksamt,
  • Correct
  • Incorrect

corpus/protocols/1943/prot-1943--ak--27.xml

Diff starting from line 3729

@@ -3729,14 +3729,14 @@
               exemplet.
             </seg>
           </u>
-          <u xml:id="i-HvhqmRgu6qZgjFkWkVu4a9" who="unknown" prev="i-a6a2334c9ab850ff-2" next="i-a6a2334c9ab850ff-3">
+          <u xml:id="i-HvhqmRgu6qZgjFkWkVu4a9" who="Q5936890" prev="i-a6a2334c9ab850ff-2" next="i-a6a2334c9ab850ff-3">
             <seg xml:id="i-W3iHjNfPbD3fcsc73dTkWy">
               Utskottet har emellertid hemställt, att motionen om rättelse
               av dessa förhållanden icke måtte föranleda till någon riksdagens
               åtgärd.
             </seg>
           </u>
-          <u who="unknown" xml:id="i-a6a2334c9ab850ff-3" prev="i-HvhqmRgu6qZgjFkWkVu4a9" next="i-a6a2334c9ab850ff-5">
+          <u who="Q5936890" xml:id="i-a6a2334c9ab850ff-3" prev="i-HvhqmRgu6qZgjFkWkVu4a9" next="i-a6a2334c9ab850ff-5">
             <seg xml:id="i-En2KAp4cmC4gARm6W6KUAz">
               Tyngdpunkten i utskottets motivering för dess ståndpunkt, angiven
               på sid. 8 i utlåtandet, synes vara dels att de lägre ersättningarna
  • Correct
  • Incorrect

corpus/protocols/1944/prot-1944--ak--24.xml

Diff starting from line 3549

@@ -3549,7 +3549,7 @@
           <note type="speaker" xml:id="i-G7ijoLofo9BtQ1zx9GhzWy">
             Fru Rydh Munck af Rosenschöld:
           </note>
-          <u who="unknown" xml:id="i-eaa4b8e38b6f6470-4">
+          <u who="Q517947" xml:id="i-eaa4b8e38b6f6470-4">
             <seg xml:id="i-FK7y5QL1sJUQJ3qMG92Xha">
               Herr talman! Trots att debatten varat länge nog, kan jag icke
               underlåta att med anledning av motionen nr 305 och med understrykande
  • Correct
  • Incorrect

corpus/protocols/1945/prot-1945--ak--16.xml

Diff starting from line 8119

@@ -8119,7 +8119,7 @@
           <note type="speaker" xml:id="i-QscbdGuSoUkPnNigUnHD8">
             Herr Persson i Stockholm:
           </note>
-          <u who="unknown" xml:id="i-5e559120bdb1d035-10">
+          <u who="Q6042858" xml:id="i-5e559120bdb1d035-10">
             <seg xml:id="i-GHWJHVtxCXtJQwqEaAhA2P">
               Herr talman! På fröken Anderssons fråga, varpå jag stöder min
               förmodan, att någon prisstegring inte skulle behöva, inträda i
  • Correct
  • Incorrect

corpus/protocols/1949/prot-1949--ak--23.xml

Diff starting from line 6409

@@ -6409,12 +6409,12 @@
               kan straff-
             </seg>
           </u>
-          <u xml:id="i-Qgy99hXGwLGqkJ86XstLKq" who="unknown" prev="i-02dcb9a5a941beac-3" next="i-02dcb9a5a941beac-4">
+          <u xml:id="i-Qgy99hXGwLGqkJ86XstLKq" who="Q6236669" prev="i-02dcb9a5a941beac-3" next="i-02dcb9a5a941beac-4">
             <seg xml:id="i-T17Bt4gCxdoyNLMCPuo9R6">
               föreläggandet äga vid en eventuell verkställighet?
             </seg>
           </u>
-          <u who="unknown" xml:id="i-02dcb9a5a941beac-4" prev="i-Qgy99hXGwLGqkJ86XstLKq" next="i-02dcb9a5a941beac-5">
+          <u who="Q6236669" xml:id="i-02dcb9a5a941beac-4" prev="i-Qgy99hXGwLGqkJ86XstLKq" next="i-02dcb9a5a941beac-5">
             <seg xml:id="i-Vw1pAJLjzXMGQGmB5teqA3">
               Jag har liksom många före mig ansett det nödvändigt att bringa
               dessa nu alltmer framträdande, för en demokratisk rättsstat främmande
  • Correct
  • Incorrect

corpus/protocols/1950/prot-1950--ak--17.xml

Diff starting from line 6316

@@ -6316,7 +6316,7 @@
           <note type="speaker" xml:id="i-cz3ekoKtgPHbJ5ndEnr5E">
             Herr LARSSON i Karlstad:
           </note>
-          <u who="Q5619070" xml:id="i-8e5972a606bcc32f-4" next="i-5TSCKeDav6qYvnarpX4A2u">
+          <u who="Q5936890" xml:id="i-8e5972a606bcc32f-4" next="i-5TSCKeDav6qYvnarpX4A2u">
             <seg xml:id="i-GtHHwYZuacEXFWfdhMCYBB">
               Herr talman! Jag förstår att Kungl. Maj:t måste tänka även på
               andra, men det var en del saker som jag har litet svårt att förstå.
  • Correct
  • Incorrect

corpus/protocols/1950/prot-1950--ak--18.xml

Diff starting from line 6414

@@ -6414,7 +6414,7 @@
           <note type="speaker" xml:id="i-9Tm5CL1GPj61keq2T5rYoe">
             Herr PERSSON i Växjö (kort genmäle):
           </note>
-          <u who="unknown" xml:id="i-72e1b21c261253a6-64" next="i-aa3a9c080f317ac4-5">
+          <u who="Q6042858" xml:id="i-72e1b21c261253a6-64" next="i-aa3a9c080f317ac4-5">
             <seg xml:id="i-FjGeee3EmN9My2qnCzUr98">
               Herr talman! Jag vill med mycket stor tillfredsställelse konstatera,
               att man från folkpartihåll framfört de synpunkter, som herr Larsson
  • Correct
  • Incorrect

corpus/protocols/1951/prot-1951--ak--24.xml

Diff starting from line 5336

@@ -5336,7 +5336,7 @@
           <note xml:id="i-NPaC9Atj7q6E4f97CifzQd">
             Vägtrafikförordning m. m.
           </note>
-          <u who="unknown" xml:id="i-7bea252b0996de65-1" prev="i-7bea252b0996de65-0" next="i-7bea252b0996de65-2">
+          <u who="Q5789652" xml:id="i-7bea252b0996de65-1" prev="i-7bea252b0996de65-0" next="i-7bea252b0996de65-2">
             <seg xml:id="i-6Vq1XMLDGNvVn2ec4LMZgD">
               bruk på åker, har blivit ersatt med en växel, som tillåter en
               något högre hastighet, vanligtvis 15—17 km i timmen.
  • Correct
  • Incorrect

corpus/protocols/1952/prot-1952--ak--14.xml

Diff starting from line 17312

@@ -17312,7 +17312,7 @@
           <note type="speaker" xml:id="i-XSCyBjmpHBWTtWeReAy4qV">
             Herr NILSSON i Varuträsk:
           </note>
-          <u who="unknown" xml:id="i-4c80a63f68cc3a2a-14">
+          <u who="Q6011123" xml:id="i-4c80a63f68cc3a2a-14">
             <seg xml:id="i-6cRdTtQb2w3RFUZdmeZB4K">
               Herr talman! Jag tyckte ait herr Malmborg i Skövde anförde mycket
               tilltalande uttryck när han yttrade sig i detta ärende. Det skall
  • Correct
  • Incorrect

corpus/protocols/1958/prot-1958-a-fk--11.xml

Diff starting from line 9107

@@ -9107,7 +9107,7 @@
           <note type="speaker" xml:id="i-CCNBVUPpE8EAXtQqvFcPWT">
             Herr KRÖGEL (s):
           </note>
-          <u who="unknown" xml:id="i-bb41388651146491-17">
+          <u who="Q5924912" xml:id="i-bb41388651146491-17">
             <seg xml:id="i-ECnX1RbicsVtiVqegZQHHp">
               Herr talman! Jag vill för min del gärna instämma i att det anslag,
               som är anvisat för det nu ifrågavarande ändamålet, är mycket litet.
  • Correct
  • Incorrect

corpus/protocols/1959/prot-1959--ak--6.xml

Diff starting from line 6976

@@ -6976,7 +6976,7 @@
               på sin lösning, snart också skall få sin lösning.
             </seg>
           </u>
-          <u xml:id="i-7vemtmUJPLKbfNWMeVVhtZ" prev="i-805568b8fb675308-5" who="Q6012223">
+          <u xml:id="i-7vemtmUJPLKbfNWMeVVhtZ" prev="i-805568b8fb675308-5" who="Q6010461">
             <seg xml:id="i-UYNwsnKvWtL9EYSbMLfvez">
               Jag yrkar bifall till reservationen.
             </seg>
  • Correct
  • Incorrect

corpus/protocols/1960/prot-1960--ak--5.xml

Diff starting from line 3870

@@ -3870,7 +3870,7 @@
               få framställa följande fråga:
             </seg>
           </u>
-          <u xml:id="i-MjWFmqjtTcmWpi79DWQtPM" who="unknown" prev="i-dc305bb5b66b1020-6">
+          <u xml:id="i-MjWFmqjtTcmWpi79DWQtPM" who="Q5653837" prev="i-dc305bb5b66b1020-6">
             <seg xml:id="i-VHD7JTAwYXdRk8HbnmGMYb">
               Är statsrådet beredd att vidtaga sådana åtgärder att betyg från
               folkhögskola, som utfärdats enligt nu gällande folkhögskolestadga,
  • Correct
  • Incorrect

corpus/protocols/1962/prot-1962--ak--13.xml

Diff starting from line 13332

@@ -13332,7 +13332,7 @@
           <note type="speaker" xml:id="i-XqpovbPHQvaYPdeGqkS5gT">
             Herr NILSSON i Gävle (k):
           </note>
-          <u who="Q6012223" xml:id="i-54c1d69b4d84606a-12" next="i-54c1d69b4d84606a-13">
+          <u who="Q6010461" xml:id="i-54c1d69b4d84606a-12" next="i-54c1d69b4d84606a-13">
             <seg xml:id="i-ByAdcFPLgQBZKUgQLugkEo">
               Herr talman! Statsutskottets utlåtande beträffande bostadsförsörjningen
               innehåller mycket av värde. Jag uppfattar detta som en följdverkan
  • Correct
  • Incorrect

corpus/protocols/1962/prot-1962--ak--20.xml

Diff starting from line 7716

@@ -7716,7 +7716,7 @@
           <note type="date" xml:id="i-GHSiJeSBxgagaCW46Fkkh6">
             Onsdagen den 16 maj 1962 fm.
           </note>
-          <u who="Q6012223" xml:id="i-4a7e67be432f64f0-18" prev="i-2041d445a14a597a-0">
+          <u who="unknown" xml:id="i-4a7e67be432f64f0-18" prev="i-2041d445a14a597a-0">
             <seg xml:id="i-7nFxU7powhpgKx7zJkTM5R">
               de här nämnda. Herr Christenson anser att utredningens förslag
               bör antas. Jag vill då läsa upp en passus ur det remissvar över
  • Correct
  • Incorrect

corpus/protocols/1962/prot-1962--fk--15.xml

Diff starting from line 13635

@@ -13635,7 +13635,7 @@
             om förbättrade kommunikationer mellan Jämtland och övriga Norrland,
             erhöll ordet och anförde:
           </note>
-          <u who="unknown" xml:id="i-0d33c7f17574b665-73" next="i-So2yxmk3RbNxJa3CFhwpTP">
+          <u who="Q6240086" xml:id="i-0d33c7f17574b665-73" next="i-So2yxmk3RbNxJa3CFhwpTP">
             <seg xml:id="i-4FqZsv2anSm3p1Rxv4eVyn">
               Herr talman! I en interpellation har herr Widén frågat mig om
               jag är beredd medverka till en förbättring av Jämtlands förbindelser
  • Correct
  • Incorrect

corpus/protocols/1967/prot-1967--ak--13.xml

Diff starting from line 1469

@@ -1469,7 +1469,7 @@
               följande fråga:
             </seg>
           </u>
-          <u xml:id="i-CYNbkHtJH1D7tYMvDzb6Hp" prev="i-f38849d1a302daef-8" who="Q6012223">
+          <u xml:id="i-CYNbkHtJH1D7tYMvDzb6Hp" prev="i-f38849d1a302daef-8" who="Q6010461">
             <seg xml:id="i-SRLLZ5kkpU2iu4DicoMLVS">
               Anser statsrådet gällande bestämmelser och instruktioner för
               handläggning av ärenden rörande vapenfri tjänst vara tillfredsställande
  • Correct
  • Incorrect

corpus/protocols/1967/prot-1967--fk--7.xml

Diff starting from line 1573

@@ -1573,7 +1573,7 @@
           <note xml:id="i-X8Bu9zujEQMhYuykZLD9cK">
             Om åtgärder i syfte att fördjupa och stärka det svenska folkstyret
           </note>
-          <u xml:id="i-Ey972u346QMdyiLrSQ68cF" prev="i-0b628cd00c26be59-8" who="Q116040430">
+          <u xml:id="i-Ey972u346QMdyiLrSQ68cF" prev="i-0b628cd00c26be59-8" who="Q6197962">
             <seg xml:id="i-5v2ThrJwuWcRZZgxaexraJ">
               skapet i en oganisation — vilket utgör ett litet antal personer
               — till ett politiskt parti ansluter ett så stort antal medlemmar
  • Correct
  • Incorrect

corpus/protocols/1968/prot-1968--fk--19.xml

Diff starting from line 740

@@ -740,7 +740,7 @@
           <note xml:id="i-QDAKW2R1bq2NJZMB3tcn9i">
             Ang. arbetsmarknadspolitiken
           </note>
-          <u who="Q116040430" xml:id="i-d1fc164d76975114-12" prev="i-d1fc164d76975114-11" next="i-d1fc164d76975114-13">
+          <u who="Q6197962" xml:id="i-d1fc164d76975114-12" prev="i-d1fc164d76975114-11" next="i-d1fc164d76975114-13">
             <seg xml:id="i-3TsmAqYzbNUqGREcoFGasn">
               som visserligen icke vore handikappade, men vilkas arbetsplacering
               i öppna marknaden ändå på grund av nedsatt arbetsförmåga bedömdes
  • Correct
  • Incorrect

corpus/protocols/197980/prot-197980--165.xml

Diff starting from line 2207

@@ -2207,7 +2207,7 @@
           <note xml:id="i-Kb7YiCKqrATNawgo2bgeFw">
             168
           </note>
-          <u who="Q6011035" xml:id="i-17df5c0e5300e2b6-11" prev="i-17df5c0e5300e2b6-9">
+          <u who="unknown" xml:id="i-17df5c0e5300e2b6-11" prev="i-17df5c0e5300e2b6-9">
             <seg xml:id="i-Lu2iHJjgFbZDMk3GouqKbV">
               är vi centerpartister och framför allt då vår duktige industriminister
               Åsling som har rätt.
  • Correct
  • Incorrect

corpus/protocols/198485/prot-198485--32.xml

Diff starting from line 8476

@@ -8476,7 +8476,7 @@
           <note type="speaker" xml:id="i-7gymCeuKg2MVFvCRZRuzUc">
             Anf. 933 GUNNAR OLSSON (s) replik:
           </note>
-          <u who="unknown" xml:id="i-5bb435e740870b6a-9">
+          <u who="Q97061982" xml:id="i-5bb435e740870b6a-9">
             <seg xml:id="i-KqPEzkR7jpnBXnotnzBc4S">
               Herr talman! Trots vad Sven Eric Lorentzon säger vågar jag ändå
               tro att även många moderater är av den uppfattningen att en modernisering
  • Correct
  • Incorrect

@BobBorges
Collaborator Author

Looking back at the last time we did this, wiki_id --> unknown changes were automatically tagged as incorrect.

@BobBorges
Collaborator Author

FYI about 20% of changes in the whole set of protocols contain 'who="unknown"': 2994 of 14535.
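A minimal sketch of how the `who=` changes in a diff could be tallied by direction. This is not the script used to produce the numbers above; the function name `classify_who_changes` and the pairing rule (each removed `<u>` line is paired with the next added `<u>` line) are assumptions made for illustration.

```python
import re
from collections import Counter

# Captures the value of the who attribute on a <u> element.
WHO_RE = re.compile(r'<u\b[^>]*\bwho="([^"]*)"')


def classify_who_changes(diff_text):
    """Tally who= changes by pairing each removed <u> line with the next added one."""
    counts = Counter()
    pending_old = None  # who value from the most recent unpaired '-' line
    for line in diff_text.splitlines():
        if line.startswith(("---", "+++")):
            continue  # file headers, not content lines
        match = WHO_RE.search(line)
        if line.startswith("-") and match:
            pending_old = match.group(1)
        elif line.startswith("+") and match and pending_old is not None:
            old, new = pending_old, match.group(1)
            pending_old = None
            if old == new:
                continue  # some other attribute changed, not who
            if old == "unknown":
                counts["unknown --> wiki_id"] += 1
            elif new == "unknown":
                counts["wiki_id --> unknown"] += 1
            else:
                counts["wiki_id --> wiki_id"] += 1
    return counts


# Three changed lines taken from the sampled diffs in this PR.
sample_diff = """\
-          <u who="unknown" xml:id="i-5bb435e740870b6a-9">
+          <u who="Q97061982" xml:id="i-5bb435e740870b6a-9">
-          <u who="Q6011035" xml:id="i-17df5c0e5300e2b6-11" prev="i-17df5c0e5300e2b6-9">
+          <u who="unknown" xml:id="i-17df5c0e5300e2b6-11" prev="i-17df5c0e5300e2b6-9">
-          <u who="Q116040430" xml:id="i-d1fc164d76975114-12" prev="i-d1fc164d76975114-11" next="i-d1fc164d76975114-13">
+          <u who="Q6197962" xml:id="i-d1fc164d76975114-12" prev="i-d1fc164d76975114-11" next="i-d1fc164d76975114-13">
"""

print(classify_who_changes(sample_diff))
```

Pairing by adjacency is only safe when every hunk changes a single `<u>` line, as in the samples here; a diff with larger hunks would need a more careful alignment.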

@ninpnin
Collaborator

ninpnin commented Sep 7, 2023

31/46 correct.

LGTM

@ninpnin
Collaborator

ninpnin commented Sep 7, 2023

Once the test is fixed, that is.

@MansMeg
Collaborator

MansMeg commented Sep 7, 2023

31/46 seems poor to me? It's just 67% correct?

@BobBorges
Collaborator Author

@ninpnin Were most of the incorrect ones in the sample due to q_id --> unknown? Mostly pre1900?

@BobBorges
Collaborator Author

The test fails because of an ID that's a redirect. Maybe it was manually added because it shows up only in the alias file. I'll look into this tomorrow.

@BobBorges
Collaborator Author

BobBorges commented Sep 7, 2023

@MansMeg

31/46 seems poor to me? It's just 67% correct?

Seems like a failing grade, but a little perspective:

  • the whole diff from which the sample was drawn was 14,535 changes
  • assuming redetect.py does what we want it to do, it only touches who= attributes
  • there are more than 5 million who attrs in all the protocols, so even if every change is wrong, it amounts to 0.2% of all cases
  • 41% of changes go from unknown --> wiki_id, so these can only be potential improvements (changing wrong to wrong doesn't degrade our quality)
  • 20% of the changes go from wiki_id --> unknown : even if we assume there's no good reason for it, 20% of 0.2% is a pretty good error rate
  • I don't have a good way of measuring how often wiki_id --> wiki_id changes are incorrect (I could draw a sample of these), but in total they account for ca 39% of 0.2% (≈ 0.11%) of the corpus.
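The arithmetic behind the list above can be sketched as follows. The exact number of `who` attributes is not given in the thread, only "more than 5 million", so this sketch assumes 5,000,000 as a lower bound on the count. Every result is therefore an upper bound on the share of the corpus and will not match the rounded figures in the list exactly.

```python
TOTAL_CHANGES = 14_535             # size of the full diff, from the thread
WHO_ATTRS_LOWER_BOUND = 5_000_000  # assumption: "more than 5 million" taken as a floor

# Share of the changes in each direction, as quoted in the thread.
category_share = {
    "unknown --> wiki_id": 0.41,
    "wiki_id --> unknown": 0.20,
    "wiki_id --> wiki_id": 0.39,
}


def corpus_share_upper_bound(changes, who_attrs_lower_bound):
    """Largest possible fraction of who attributes touched by the changes."""
    return changes / who_attrs_lower_bound


touched = corpus_share_upper_bound(TOTAL_CHANGES, WHO_ATTRS_LOWER_BOUND)
print(f"all changes: at most {touched:.2%} of who attributes")
for name, share in category_share.items():
    print(f"{name}: at most {share * touched:.3%} of who attributes")
```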


@MansMeg
Collaborator

MansMeg commented Sep 7, 2023

Yes. Of course. But we need to make a decision if we should merge this revision or not. If all (or just a majority) of the edits are incorrect, it's a net loss of quality and then we should not merge the PR.

Let's discuss this tomorrow. I don't really follow your reasoning.

I think we should start to compute the three different types according to the process paper.
incorrect to correct
incorrect to incorrect
correct to incorrect

Is it possible to get this for the sample?
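One way to get these counts from an annotated sample, as a rough sketch: label each sampled change with whether the old and the new `who` value is correct, then tally the transitions. The annotation list below is made up for illustration and is not the actual QC sample.

```python
from collections import Counter


def transition(was_correct, is_correct):
    """Name the change type from the correctness of the old and new who value."""
    before = "correct" if was_correct else "incorrect"
    after = "correct" if is_correct else "incorrect"
    return f"{before} to {after}"


# Hypothetical annotations: (old who value correct?, new who value correct?)
annotations = [
    (False, True),   # e.g. unknown --> the right wiki_id
    (False, False),  # e.g. unknown --> the wrong wiki_id
    (True, False),   # e.g. the right wiki_id --> unknown
    (False, True),
]

tally = Counter(transition(before, after) for before, after in annotations)
for change_type, count in sorted(tally.items()):
    print(f"{change_type}: {count}")
```

Only "correct to incorrect" changes lower the quality of the corpus, so that count is the one that matters most for the merge decision.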

@BobBorges
Collaborator Author

The unit test fails for a less scary reason than I thought. Redetect updates wiki_id when it changes: wikidata --> our metadata --> protocols. Here I only pushed a sample of protocols, but the unit test still runs on all the protocols -- predictably it fails on protocols involving changed wiki_ids that haven't been committed / pushed yet. I suspect when all the changes are pushed, the unit test won't fail anymore. Here's the diff, showing the problem wiki_id/protocol pointed to in the unit test.
image
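For illustration, the kind of check that fails here could look roughly like the sketch below: collect every `who` value in a protocol that is neither `unknown` nor present in the current metadata. This is not the repository's actual unit test; the function name `stale_who_ids` is hypothetical, and the two IDs are borrowed from the sampled diff purely as placeholders.

```python
import xml.etree.ElementTree as ET


def stale_who_ids(protocol_xml, known_ids):
    """Return who= values on <u> elements that are not 'unknown' and not in known_ids."""
    root = ET.fromstring(protocol_xml)
    stale = set()
    for elem in root.iter():
        # Strip any XML namespace so both 'u' and '{namespace}u' match.
        if elem.tag.rsplit("}", 1)[-1] != "u":
            continue
        who = elem.get("who")
        if who and who != "unknown" and who not in known_ids:
            stale.add(who)
    return stale


# A protocol that has not been re-tagged yet still carries the old ID.
protocol = """
<div>
  <u who="Q6012223"><seg>...</seg></u>
  <u who="unknown"><seg>...</seg></u>
</div>
"""
current_metadata_ids = {"Q6010461"}  # placeholder for the IDs in the new metadata

print(stale_who_ids(protocol, current_metadata_ids))
```

Once every protocol is pushed together with the metadata it was tagged against, a check of this shape should come back empty.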

@BobBorges
Collaborator Author

If I counted the errors in the sample correctly, 9 are unknown --> id; 6 are id --> unknown.

@MansMeg
Collaborator

MansMeg commented Sep 8, 2023

Ok, but what is:
incorrect to incorrect
correct to incorrect
?

@BobBorges
Collaborator Author

incorrect to incorrect
correct to incorrect

same numbers:
incorrect to incorrect: 9
correct to incorrect: 6

@BobBorges
Collaborator Author

... so if I understand our earlier discussion, we should be concerned about 6 of the 46 in the sample. By the sample, we would estimate that 13% of the diffs may be introducing errors.
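A small sketch of that estimate. The counts reported in this thread (31 correct, 9 incorrect to incorrect, 6 correct to incorrect) sum to 46. The Wilson score interval is an addition that was not part of the discussion; it is included only to show how wide the uncertainty is with a sample of this size.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for roughly 95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


# Sample counts as reported in the thread.
sample = {"correct": 31, "incorrect to incorrect": 9, "correct to incorrect": 6}
n = sum(sample.values())
harmful = sample["correct to incorrect"]

low, high = wilson_interval(harmful, n)
print(f"{harmful}/{n} = {harmful / n:.0%} of sampled changes introduce an error")
print(f"approximate 95% interval: {low:.0%} to {high:.0%}")
```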

@MansMeg
Collaborator

MansMeg commented Sep 8, 2023

Yes! What is the gause of these 6 correct to incorrect?

@BobBorges
Collaborator Author

the gause

What's that?

@MansMeg
Collaborator

MansMeg commented Sep 8, 2023

Sorry! The cause. :)

@BobBorges
Collaborator Author

I can't say specifically for those cases, but it seems like the matching algorithm + additions to metadata. Those 6 involved 3 names, so it doesn't seem to be random error.

@MansMeg
Collaborator

MansMeg commented Sep 8, 2023

So my question is:
Why do these errors exist/show up?

They are introduced now compared to the previous version. Is it the two problems Väinö listed on Slack: sequential use in intros, and grefve/friherre being used as identifiers? Or are there other reasons?

@MansMeg
Collaborator

MansMeg commented Sep 8, 2023

Ping @BobBorges

@BobBorges
Collaborator Author

I don't know why they're marked incorrectly.

I'm working on getting the unit tests to pass locally. Wiki IDs changed and this needs to get mapped to all the unit test files. I can also see that some iorter were changed in wikidata and no longer match our manual files.

@BobBorges
Collaborator Author

Spelling changes in wikidata are failing our unittest in two cases:

image

I can't find either place, in either spelling variant, as a convincing city on Google Maps.

@BobBorges
Collaborator Author

BobBorges commented Sep 8, 2023

Semsholm and Säbrå are the spellings in the bio books, but they were edited after I added these places to wikidata.

edit: I can re-edit these on wikidata, but I suggest for right now, I just change them back to the bio-book spelling manually in the metadata files so that the unit tests pass :|

@MansMeg
Collaborator

MansMeg commented Sep 8, 2023

Ok. So these 6 errors come from changes in Wikidata? Did someone change the iorts to an incorrect name? Could you point to a Q number for this?

@BobBorges
Collaborator Author

They were reverted by Magnus.

Q5792849 & Q5990912

@MansMeg
Collaborator

MansMeg commented Sep 8, 2023

So is this the cause of the 6 errors?

Edit: No, these are the unit tests that are failing. This is a very good sign! We actually capture this incorrect edit made by @salgo60 !

Have you checked the cause of these 6 errors in the QC sample?

@BobBorges
Collaborator Author

BobBorges commented Sep 8, 2023

No. It's not.

One ID is not in the diff at all.

The other was changed (not in our sample) to another id from a guy with the same name -- Helmer Molander --> Seth Molander in 1941--ak--37 -- both were alive at the right time, but I'm still looking into which guy it should really be

edit: the algorithm picked and changed to the correct Molander in this case.

@MansMeg
Collaborator

MansMeg commented Sep 8, 2023

Ok. Just so I can follow:

There are two things left in this PR:

  1. The unit tests are failing (partly due to the incorrect updates).
  2. You are working on trying to identify the cause for the 6 errors?

@BobBorges
Collaborator Author

  1. The unit test fails because the spelling was edited on two places. (I proposed editing the metadata files this time around to force it to pass).
  2. I haven't even gotten to this yet -- I need to write better procedures for requery b/c it cannot cost so much labor every time to figure out and fix things that are going wrong. I will still look into it -- I have the diff saved -- do you think it's necessary to find the causes of these specific instances before merging this branch to dev?

@MansMeg
Collaborator

MansMeg commented Sep 8, 2023

There is no real hurry with this. So better to fix it at the core than to stress it.

  1. Just fix it on wikidata and update. Then wikidata becomes more correct. No point in doing hacks.
  2. Yes. We want to understand why we get the errors. Is it a trivial bug that we introduced now, or is it more the problem with the heuristics? When we know the main causes we check if we have it as issues and then merge.

It seems like updating the wikidata will result in more and more of these problems (that's why we have the unit tests). So we should probably expect it to be more work in the future with these updates.

@BobBorges
Collaborator Author

Just fix it on wikidata and update. Then wikidata becomes more correct. No point in doing hacks.

The only point is that it costs a lot of time to start over again.

Yes. We want to understand why we get the errors. Is it a trivial bug that we introduced now, or is it more the problem with the heuristics? When we know the main causes we check if we have it as issues and then merge.

Errors in the sample: Q6060104. Since the last version of the metadata, another guy with a similar name, chamber, time, and location got a separate entry on wikidata, so the computer couldn't decide. I'm suspicious -- they're too similar, but I will check the bio-books... looking at the others

@BobBorges
Collaborator Author

Another error involving Q6083441 -- similarly, another guy with the same surname and chamber w/ mandate period. The new metadata has more rows for these two

@MansMeg
Collaborator

MansMeg commented Sep 8, 2023

  1. Ok. It is hard for me to follow the exact details. If I understand this, the mapping algorithm was run on this incorrect data? And then, if you would update this with the correct values, would you need to rerun the algorithm? So the lesson learned here is that we should have run the iort unit test before we ran the mapping algorithm? I tend to agree that we should think more about the updating process so we avoid this. For now I'm happy to fix that manually in this PR, but then this would need to be fixed before the next update? Otherwise we will end up here again, right?

  2. Great! The first two examples seem to be cases where the same person was added twice and the entries have not been merged on wikidata. So it looks like a wikidata update problem. Looking forward to the last 4.

@BobBorges
Collaborator Author

Looked at all 6 of our problems -- all due to heuristics. Similar names / chamber / mandate periods, and rows added to the metadata.
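To illustrate why added metadata rows can flip a tag from a wiki_id to unknown, here is a simplified sketch of a uniqueness heuristic. It is not the logic in redetect.py; the `Mandate` fields, the function `match_speaker`, and the IDs `Q_A` and `Q_B` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mandate:
    wiki_id: str
    surname: str
    chamber: str
    start: int  # first year of the mandate period
    end: int    # last year of the mandate period


def match_speaker(surname, chamber, year, mandates):
    """Return the wiki_id only if exactly one person fits; otherwise 'unknown'."""
    candidates = {
        m.wiki_id
        for m in mandates
        if m.surname == surname and m.chamber == chamber and m.start <= year <= m.end
    }
    return candidates.pop() if len(candidates) == 1 else "unknown"


old_metadata = [Mandate("Q_A", "Andersson", "ak", 1937, 1944)]
# The new metadata adds a second person with the same surname, chamber, and period.
new_metadata = old_metadata + [Mandate("Q_B", "Andersson", "ak", 1941, 1948)]

print(match_speaker("Andersson", "ak", 1942, old_metadata))  # unique match
print(match_speaker("Andersson", "ak", 1942, new_metadata))  # ambiguous match
```

Under a rule like this, a previously correct tag is lost as soon as a second plausible candidate appears, which fits the pattern seen in the six sampled errors.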

@BobBorges
Collaborator Author

the mapping algorithm was run on this incorrect data?

incorrect in the sense that it doesn't match the bio books and our unit test files (maybe Magnus had a reason for reverting exactly these two). We'll see if my edits get reverted again.

should have run the iort unit test before we ran the mapping algorithm?

I will think about how we can do this in a more logical way so we don't repeat the same set of problems next time.

For now, I'm going to both edit wikidata and our metadata files so the test passes. I'm not going to requery everything and redetect again, but we'll make sure we do it in the right order next time around.

@MansMeg
Collaborator

MansMeg commented Sep 8, 2023

Great! Now I'm happy with both the quality control and the unit test. Solving it by both updating wikidata and our file is a good quick fix.

I can add the issue as I see it, and you can add your perspectives.

@MansMeg
Collaborator

MansMeg commented Sep 8, 2023

Sorry for being the annoying reviewer here.

@BobBorges
Collaborator Author

Sorry for being the annoying reviewer here.

no worries :D

@MansMeg
Collaborator

MansMeg commented Sep 8, 2023

I have now added two issues that describe the problems we have identified. Feel free to check if they are correct.

@BobBorges
Collaborator Author

Did someone change the merge branches? If not, I didn't mean to request a merge directly to main.

@MansMeg
Collaborator

MansMeg commented Sep 10, 2023

Oh. I forgot to check that. I think it's easy to change.

@ninpnin ninpnin changed the base branch from main to dev September 11, 2023 07:54
@ninpnin
Collaborator

ninpnin commented Sep 11, 2023

Did you make manual changes to the metadata, @BobBorges? Will we run into the same problems next time we requery?

Otherwise, it looks like we know exactly what the problems are and I can merge.

@BobBorges
Collaborator Author

I did, but I also made corresponding edits on wikidata, so unless they get reverted again, we won't have the same problem.

In general, I think there's a good chance that we will run into this kind of error (not necessarily with the same MPs), so I will formalize (in the form of a readme) a procedure to run checks in the correct order next time around, so we can find/correct bad edits to wikidata before we run redetect and start committing stuff to the repo.
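As a rough sketch of what "checks in the correct order" could mean in practice: run each step only if the previous one passed, so that a bad wikidata edit stops the process before redetect touches the protocols. The step names and the `run_in_order` helper are hypothetical; the real procedure is still to be written up in the readme.

```python
def run_in_order(steps):
    """Run (name, check) pairs in order; stop at the first check that returns False.

    Returns the names of the completed steps and the name of the failed step
    (None if every step passed).
    """
    completed = []
    for name, check in steps:
        if not check():
            return completed, name
        completed.append(name)
    return completed, None


# Hypothetical steps; each callable stands in for a real script or test run.
steps = [
    ("requery wikidata", lambda: True),
    ("metadata unit tests", lambda: False),  # e.g. an i-ort spelling was edited
    ("redetect", lambda: True),
    ("sample and QC the diff", lambda: True),
]

completed, failed = run_in_order(steps)
print("completed:", completed)
print("stopped at:", failed)
```

With this ordering, a failing metadata test is fixed on wikidata and requeried before any protocol is changed.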

@ninpnin
Collaborator

ninpnin commented Sep 11, 2023

How do you do this without running redetect? That's how we discovered the errors this time anyway..

@BobBorges
Collaborator Author

It concerned the i-ort of two people. I made the edit in the metadata files and on wikidata. Give me 2 minutes, I'll check if these two were involved in any of the protocol edits in this pr.

@BobBorges
Collaborator Author

One of the two was re-tagged (correctly) as someone else, so there's no need to re run redetect.

@ninpnin ninpnin merged commit 253fc8b into dev Sep 11, 2023
3 checks passed