Investigate WikiData and Wikipedia #117

Open
upintheairsheep opened this issue Jan 27, 2024 · 1 comment
Labels
enhancement New feature or request

Comments

@upintheairsheep

Wikidata and Wikipedia are both sources of PoIs, and should be investigated for potential merging with Overture.

@ImreSamu

Wikidata and Wikipedia are both sources of PoIs, and should be investigated for potential merging with Overture.

@upintheairsheep :

Be cautious with Wikidata's geodata.
I believe the Wikidata database has not yet been fully cleaned up after the Cebuano Wikipedia import, which created duplicate Wikidata items for geographic places.

see: https://youtu.be/HaKuKRdJojc?t=161

"Duplicating Everywhere All at Once | Cebuano Wikipedia | Wikimania2023
Alex Lum : 28 Nov 2023
Five years ago, bots created millions of articles on several Wikipedia language editions, notably Cebuano Wikipedia, and corresponding Wikidata items, resulting in thousands of duplicated Wikidata items for geographic places. This session will cover how this happened, use data visualisation to show the scope of the issues, and suggest some novel ways of cleaning up Wikidata, Wikipedia and the original data sources.

Five years ago, Lsjbot created millions of articles on several Wikipedia language editions, for which other bots created corresponding Wikidata items. The result has been hundreds of thousands of duplicated items for geographic places on Wikidata.

This session will look at the history of how this happened, use data visualisation to show the scope and scale of the issue, and propose some ways of cleaning up Wikidata, Wikipedia and even the original data sources. It will concentrate primarily on geographic places in Aotearoa New Zealand and some parts of Australia, but will be relevant to other countries where the issue of bot-created duplicates of geographic entities is significant.
"
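
As a rough illustration of what "being cautious" could mean in practice, here is a minimal sketch that sets aside Wikidata items whose only sitelinks point at a bot-heavy Wikipedia edition, so they can be reviewed before any merge. This heuristic is an assumption and is not taken from the talk. The `sitelinks` key mirrors the Wikidata entity JSON layout, but the function names, the `BOT_HEAVY_SITES` set, and the sample ids are hypothetical.

```python
# Site ids follow the Wikidata sitelink convention ("cebwiki" = Cebuano Wikipedia).
# Assumption: which editions count as bot-heavy is a project decision.
BOT_HEAVY_SITES = frozenset({"cebwiki"})


def needs_review(item, bot_sites=BOT_HEAVY_SITES):
    """Return True when the item has sitelinks and all of them are bot-heavy."""
    sites = set(item.get("sitelinks", {}))
    return bool(sites) and sites <= bot_sites


def split_candidates(items, bot_sites=BOT_HEAVY_SITES):
    """Split item ids into (keep, review) lists, preserving input order."""
    keep, review = [], []
    for item in items:
        target = review if needs_review(item, bot_sites) else keep
        target.append(item["id"])
    return keep, review


if __name__ == "__main__":
    # Made-up records for illustration only; not real Wikidata items.
    sample = [
        {"id": "example-1", "sitelinks": {"enwiki": {}, "cebwiki": {}}},
        {"id": "example-2", "sitelinks": {"cebwiki": {}}},
        {"id": "example-3", "sitelinks": {}},
    ]
    keep, review = split_candidates(sample)
    print("keep:", keep)
    print("review:", review)
```

An item flagged this way is not necessarily a duplicate; the flag only marks it for a closer look (for example, a name and coordinate comparison against nearby items) before it is considered as a PoI source.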

@danabauer danabauer removed the Admins label Jul 26, 2024
@atiannicelli atiannicelli added the enhancement New feature or request label Oct 14, 2024
6 participants