GitHub - tinusg/PHP-Wikipedia-Syntax-Parser: Given raw contents and title of a Wikipedia article, this will output highly useful information in an organized fashion.

Jungle PHP Wikipedia Parser

This is an attempt at extracting useful information out of raw Wikipedia page syntax, written as a portable PHP class.

$wikipedia_syntax_parser = new Jungle_WikiSyntax_Parser($raw_wikipedia_syntax, "George Harrison");

$raw_wikipedia_syntax is the raw Wiki syntax from a database dump or from the Edit textarea of a given page. An example of this syntax is provided in sample_input.txt.

"George Harrison" is a string containing the full Wiki page title (e.g. George Harrison, Template:Wikipedia Syntax, File:image.png). It is optional and helps determine the page_type (Main, Template, Special, File, ...).

$parsed_wiki_syntax = $wikipedia_syntax_parser->parse();

Your $parsed_wiki_syntax variable becomes an array with information about the Wiki page itself and useful information extracted from within. An example of this output, after parsing sample_input.txt, can be found in sample_output.txt.
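
One piece of page-level information is the page_type, which the optional title helps determine. The sketch below shows one way a title prefix could be mapped to a page type. It is only an illustration: guess_page_type() and its list of prefixes are hypothetical and are not taken from wiki_parser.php, which may work differently.

```php
<?php
// Hypothetical helper, NOT part of Jungle_WikiSyntax_Parser: guesses a page
// type from the namespace prefix of a full Wiki page title.
function guess_page_type(string $title): string
{
    // Assumed prefixes, limited to the types named in this README.
    $known_types = ['Template', 'Special', 'File'];

    $colon = strpos($title, ':');
    if ($colon === false) {
        return 'Main'; // no prefix at all, e.g. "George Harrison"
    }

    $prefix = trim(substr($title, 0, $colon));
    foreach ($known_types as $type) {
        if (strcasecmp($prefix, $type) === 0) {
            return $type;
        }
    }

    // A colon that does not follow a known prefix is treated as part of the title.
    return 'Main';
}

echo guess_page_type('George Harrison'), "\n";
echo guess_page_type('Template:Wikipedia Syntax'), "\n";
echo guess_page_type('File:image.png'), "\n";
```

Titles that merely contain a colon fall back to Main here, which is a design choice of this sketch and not a documented behaviour of the parser.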

When reading Wiki syntax files from disk, make sure they are encoded in UTF-8. To read them, please use implode(file('WIKI_RAW_SYNTAX.TXT')) rather than file_get_contents('WIKI_RAW_SYNTAX.TXT'), which seems to mess up language-specific characters.
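
As a rough sketch of the reading step, the helper below loads a file with file() and implode() and then checks that the result is valid UTF-8 before it is handed to the parser. The function name read_wiki_syntax_file() is hypothetical, and the byte order mark stripping and the validity check are assumptions added for illustration; none of this is part of wiki_parser.php.

```php
<?php
// Hypothetical helper, NOT part of wiki_parser.php: reads a raw Wiki syntax
// file line by line and verifies that the content is valid UTF-8.
function read_wiki_syntax_file(string $path): string
{
    $lines = file($path); // array of lines, line endings preserved
    if ($lines === false) {
        throw new RuntimeException("Could not read $path");
    }

    $raw = implode('', $lines);

    // Assumption: drop a leading UTF-8 byte order mark so it does not end up
    // in the Wiki syntax.
    if (strncmp($raw, "\xEF\xBB\xBF", 3) === 0) {
        $raw = substr($raw, 3);
    }

    // With the /u modifier, preg_match() returns false on invalid UTF-8 input.
    if (preg_match('//u', $raw) !== 1) {
        throw new RuntimeException("$path is not valid UTF-8");
    }

    return $raw;
}
```

The returned string could then be passed as $raw_wikipedia_syntax to the constructor, for example with read_wiki_syntax_file('WIKI_RAW_SYNTAX.TXT').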
