GitHub - donwilson/PHP-Wikipedia-Syntax-Parser: Given raw contents and title of a Wikipedia article, this will output highly useful information in an organized fashion.

JungleDB PHP Wikipedia Parser

This is an attempt at extracting useful information out of raw Wikipedia page syntax, written as a portable PHP class. It was originally written for JungleDB. This repository contains the most recently updated (2015-02-13) version of the wiki_parser.php script, which is a significant improvement over the previous copy.

I don't expect to update this repository in the foreseeable future.

How to use

$wikipedia_syntax_parser = new Jungle_WikiSyntax_Parser($raw_wikipedia_syntax, "George Harrison");

$raw_wikipedia_syntax is the raw Wiki syntax from a database dump or from the Edit textarea of a given page. An example of this syntax is provided in sample_input.txt.

"George Harrison" is a string containing the full Wiki page title (e.g.: George Harrison, Template:Wikipedia Syntax, File:image.png). It is optional, but it helps determine the page_type (Main, Template, Special, File, ...).

$parsed_wiki_syntax = $wikipedia_syntax_parser->parse();

After parse() returns, $parsed_wiki_syntax is an array with information about the Wiki page itself and useful information extracted from its content. An example of this output, produced by old_version/wiki_parser.php from sample_input.txt, can be found in sample_output.txt. No sample output is available for the latest revision, but it is vastly improved and worth the effort to get working on your end.
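To illustrate why the page title matters, here is a minimal sketch of how a page type could be guessed from a title's namespace prefix. This is not code from wiki_parser.php: the function name guess_page_type and the namespace list are assumptions made for illustration, and the real parser may apply different rules.

```php
<?php
// Hypothetical helper, NOT taken from wiki_parser.php: guesses a page type
// from the namespace prefix of a full Wiki page title.
function guess_page_type(string $title): string
{
    // Assumed subset of namespaces; the real parser may recognise others.
    $namespaces = ['Template', 'Special', 'File', 'Category', 'Help', 'User'];

    $colon = strpos($title, ':');
    if ($colon === false) {
        return 'Main'; // no prefix at all, e.g. "George Harrison"
    }

    $prefix = trim(substr($title, 0, $colon));
    foreach ($namespaces as $namespace) {
        if (strcasecmp($prefix, $namespace) === 0) {
            return $namespace;
        }
    }

    // A colon that is not a known prefix (e.g. "Star Wars: Episode I")
    // is treated as part of an ordinary article title.
    return 'Main';
}
```

With the example titles above, 'George Harrison' maps to Main, 'Template:Wikipedia Syntax' to Template, and 'File:image.png' to File.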

Notes

When reading Wiki syntax files from disk, make sure they are encoded in UTF-8. To read these files, use implode(file('WIKI_RAW_SYNTAX.TXT')), as file_get_contents('WIKI_RAW_SYNTAX.TXT') seems to mangle language-specific characters.
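The note above can be wrapped in a small helper. The sketch below reads the file with implode(file(...)) as suggested and then verifies the result is valid UTF-8. The function name read_wiki_syntax is an assumption, and the encoding check relies on the mbstring extension being enabled; neither is part of this repository.

```php
<?php
// Hypothetical helper: reads a raw Wiki syntax file as the note suggests
// and rejects content that is not valid UTF-8 (requires ext-mbstring).
function read_wiki_syntax(string $path): string
{
    // file() returns the file as an array of lines, newlines preserved.
    $lines = file($path);
    if ($lines === false) {
        throw new RuntimeException("Could not read $path");
    }

    // Joining with an empty separator restores the original contents.
    $raw = implode('', $lines);

    if (!mb_check_encoding($raw, 'UTF-8')) {
        throw new UnexpectedValueException("$path is not valid UTF-8");
    }

    return $raw;
}
```

The returned string can then be passed as the first argument to the Jungle_WikiSyntax_Parser constructor shown under "How to use".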

Usage

If you make use of all or any portion of this code, please add an attribution linking to this GitHub repo.

Folders and files (17 commits)

old_version
README.md
sample_input.txt
utils.php
wiki_parser.201207301142CST.php
wiki_parser.201208081058CST.php
wiki_parser.201208081059CST.php
wiki_parser.201208150011CST.php
wiki_parser.201208211159CST.php
wiki_parser.201304201348CST.php
wiki_parser.201308131405CST.php
wiki_parser.201308231749CST.php
wiki_parser.php
wiki_table_parser.php
wikisyntax_sample_table.txt
