Wikifier

Wikification is the process of labeling input sentences with the Wikipedia concepts (articles) they mention. The repository contains a main script that scrapes text from Wikipedia dumps and parses it into a dataset, the model that annotates sentences, and an asynchronous web scraper that builds the dataset dynamically, starting from a seed Wikipedia page.
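To make the task concrete, here is a toy, dictionary-based sketch of wikification. It is not the repository's model. It assumes a hand-built `anchors` table that maps lowercase mention strings to Wikipedia article titles (both `wikify` and the table are hypothetical) and greedily labels the longest matching mention at each position.

```python
from typing import Dict, List, Tuple


def wikify(sentence: str, anchors: Dict[str, str], max_len: int = 3) -> List[Tuple[str, str]]:
    """Label known mentions in `sentence` with Wikipedia titles, longest match first.

    `anchors` maps lowercase mention strings to article titles. This is a toy lookup,
    not the trained model in this repository, and it cannot disambiguate mentions.
    """
    # Split on whitespace and trim surrounding punctuation from each token.
    tokens = [token.strip(".,;:!?\"'()") for token in sentence.split()]
    labels: List[Tuple[str, str]] = []
    i = 0
    while i < len(tokens):
        # Try the longest n-gram first so "New York City" wins over "New York".
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            mention = " ".join(tokens[i:i + n])
            title = anchors.get(mention.lower())
            if title is not None:
                labels.append((mention, title))
                i += n
                break
        else:
            i += 1  # no known mention starts at this token
    return labels


anchors = {"jazz": "Jazz", "new york": "New_York", "new york city": "New_York_City"}
print(wikify("Jazz thrived in New York City.", anchors))
# [('Jazz', 'Jazz'), ('New York City', 'New_York_City')]
```

A real wikifier must also choose between candidate articles for an ambiguous mention such as "Java" (the island or the programming language), which a fixed lookup table cannot do.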

Prerequisites

You can install the required dependencies using the Python package manager (pip):

pip3 install aiohttp
pip3 install cchardet
pip3 install aiodns
pip3 install wikipedia
pip3 install requests
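To confirm that the packages above are importable from the interpreter you plan to use, a short check along these lines can help. `missing_modules` is a hypothetical helper, not part of the repository, and it assumes each module name matches its pip package name, which holds for these five.

```python
import importlib.util
from typing import Iterable, List

# Module names for the packages installed above.
REQUIRED = ["aiohttp", "cchardet", "aiodns", "wikipedia", "requests"]


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the modules from `names` that this interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All dependencies are installed.")
```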

Getting Started

First, we need to get the data. WikiParser is a scraper that loads Wikipedia dumps from XML files and stores the resulting dataset as a collection of compressed files. You can run the script using the following syntax:

python3 WikiParser.py [OPTION]... URL... [-n NUM]
python3 WikiParser.py [OPTION]... [-n NUM]
python3 WikiParser.py [OPTION]... URL...
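The internals of WikiParser are not documented here. As a rough sketch of the dump-to-dataset step, the following streams `<page>` elements out of a MediaWiki XML dump with the standard library's `ElementTree.iterparse`, extracts (anchor text, target title) pairs from each page's wikitext, and writes one gzip-compressed JSON line per page. The function names and the output layout are assumptions, and the regular expression is a deliberately rough stand-in for a real wikicode parser such as mwparserfromhell.

```python
import gzip
import json
import re
import xml.etree.ElementTree as ET
from typing import IO, Iterator, List, Optional, Tuple, Union

# Matches [[Target]], [[Target|anchor]] and [[Target#Section|anchor]].
# A rough regex stand-in for a real wikicode parser; nested links are not handled.
WIKILINK = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|([^\[\]]*))?\]\]")


def local(tag: str) -> str:
    """Strip the '{namespace}' prefix that MediaWiki export files add to every tag."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(source: Union[str, IO[bytes]]) -> Iterator[Tuple[Optional[str], str]]:
    """Stream (title, wikitext) pairs from a MediaWiki XML dump without loading it whole."""
    title: Optional[str] = None
    text = ""
    for _, elem in ET.iterparse(source, events=("end",)):
        name = local(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            yield title, text
            title, text = None, ""
            elem.clear()  # release the finished <page> subtree


def links(wikitext: str) -> List[Tuple[str, str]]:
    """Return (anchor text, target title) pairs for every wikilink in `wikitext`."""
    pairs = []
    for match in WIKILINK.finditer(wikitext):
        target = match.group(1).strip()
        anchor = (match.group(2) or target).strip()
        pairs.append((anchor, target))
    return pairs


def build_dataset(source: Union[str, IO[bytes]], out_path: str) -> int:
    """Write one gzip-compressed JSON line per page and return the page count."""
    count = 0
    with gzip.open(out_path, "wt", encoding="utf-8") as out:
        for title, text in iter_pages(source):
            out.write(json.dumps({"title": title, "links": links(text)}) + "\n")
            count += 1
    return count
```

With hypothetical file names, `build_dataset("dump.xml", "pages.jsonl.gz")` would produce one record per page; pairs such as `("American", "United States")` are the kind of mention-to-concept labels a wikification model can learn from.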

Built With

AIOHTTP - Asynchronous HTTP client for asyncio
Beautiful Soup - Library for parsing HTML
mwparserfromhell - A parser for MediaWiki wikicode
wikipedia - A wrapper for the MediaWiki API
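The asynchronous scraper described in the introduction starts from a seed page and follows links. Below is a minimal sketch of that pattern, assuming a breadth-first strategy with a cap on concurrent requests; the crawl order, the `max_pages` and `concurrency` parameters, and the `in_memory_fetcher` helper are all hypothetical. The link-fetching step is passed in as a coroutine so the example runs without network access; in the project itself that role would presumably be filled by an AIOHTTP request.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Set

# A coroutine that returns the outgoing links of a page.
FetchLinks = Callable[[str], Awaitable[List[str]]]


async def crawl(seed: str, fetch_links: FetchLinks,
                max_pages: int = 50, concurrency: int = 8) -> List[str]:
    """Breadth-first crawl from `seed`; returns pages in the order they were visited."""
    semaphore = asyncio.Semaphore(concurrency)
    seen: Set[str] = {seed}
    visited: List[str] = []
    frontier = [seed]

    async def visit(page: str) -> List[str]:
        async with semaphore:  # cap the number of in-flight requests
            return await fetch_links(page)

    while frontier and len(visited) < max_pages:
        batch = frontier[: max_pages - len(visited)]
        visited.extend(batch)
        # Fetch the whole level concurrently; gather preserves input order.
        results = await asyncio.gather(*(visit(page) for page in batch))
        frontier = []
        for links in results:
            for link in links:
                if link not in seen:
                    seen.add(link)
                    frontier.append(link)
    return visited


def in_memory_fetcher(graph: Dict[str, List[str]]) -> FetchLinks:
    """Hypothetical stand-in for an HTTP fetcher, backed by a dict of page -> links."""
    async def fetch_links(page: str) -> List[str]:
        await asyncio.sleep(0)  # yield to the event loop like a real request would
        return graph.get(page, [])
    return fetch_links


if __name__ == "__main__":
    graph = {"Jazz": ["Blues", "Swing"], "Blues": ["Jazz", "Gospel"], "Swing": []}
    print(asyncio.run(crawl("Jazz", in_memory_fetcher(graph))))
    # ['Jazz', 'Blues', 'Swing', 'Gospel']
```

To crawl live pages, `fetch_links` could download a page through an `aiohttp.ClientSession` and extract article links from the HTML (for instance with Beautiful Soup); that wiring is left out because it depends on how the project filters links.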

Authors

Leonardo Emili - LeonardoEmili
