If you're like me, you enjoy playing with lots of textual data and scouring the internet for sources of it.
MediaWiki's dumps are a pretty awesome chunk of it that's fun to work with.
go get github.com/dustin/go-wikiparse
The parser takes any io.Reader as a source, assuming it's a complete XML dump, and lets you pull wikiparse.Page objects out of it. The dumps typically arrive as bzip2 files, so I have my program open the file and set up a bzip2 reader over it, as in the sketch below.
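Here's a minimal sketch of that file-based setup, using the standard library's compress/bzip2 (the filename is the same dump used in the invocation further down):

package main

import (
    "compress/bzip2"
    "fmt"
    "os"

    "github.com/dustin/go-wikiparse"
)

func main() {
    // Open the compressed dump directly.
    f, err := os.Open("enwiki-20120211-pages-articles.xml.bz2")
    if err != nil {
        fmt.Fprintf(os.Stderr, "Error opening dump: %v\n", err)
        os.Exit(1)
    }
    defer f.Close()

    // Wrap the file in a decompressing reader and feed that to the parser.
    p, err := wikiparse.NewParser(bzip2.NewReader(f))
    if err != nil {
        fmt.Fprintf(os.Stderr, "Error setting up parser: %v\n", err)
        os.Exit(1)
    }

    // Pull the first page just to prove the plumbing works.
    page, err := p.Next()
    if err != nil {
        fmt.Fprintf(os.Stderr, "Error reading page: %v\n", err)
        os.Exit(1)
    }
    fmt.Println(page.Title)
}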
You don't need any of that if you'd rather read from stdin, though. Here's a complete example that emits page titles from a decompressing stream on stdin:
package main

import (
    "fmt"
    "os"

    "github.com/dustin/go-wikiparse"
)

func main() {
    p, err := wikiparse.NewParser(os.Stdin)
    if err != nil {
        fmt.Fprintf(os.Stderr, "Error setting up parser: %v\n", err)
        os.Exit(1)
    }

    // Keep pulling pages until the parser runs out (or errors).
    for err == nil {
        var page *wikiparse.Page
        page, err = p.Next()
        if err == nil {
            fmt.Println(page.Title)
        }
    }
}
Example invocation:
bzcat enwiki-20120211-pages-articles.xml.bz2 | ./sample
Because it's interesting to me, I also wrote a parser for the WikiProject Geographical coordinates found on many pages. Run it over a page's content to find out whether it's a place or not. Then go there.
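Here's a rough sketch of how that might plug into the loop above. It assumes the package's wikiparse.ParseCoords helper and that the page body lives in the first revision; check the package docs for the exact signature:

package main

import (
    "fmt"
    "os"

    "github.com/dustin/go-wikiparse"
)

func main() {
    p, err := wikiparse.NewParser(os.Stdin)
    if err != nil {
        fmt.Fprintf(os.Stderr, "Error setting up parser: %v\n", err)
        os.Exit(1)
    }

    for {
        page, err := p.Next()
        if err != nil {
            break
        }
        // Assumes the page text is in the first revision.
        if len(page.Revisions) == 0 {
            continue
        }
        // ParseCoords digs the coordinate template out of the wiki
        // markup; it returns an error for pages without one.
        coord, err := wikiparse.ParseCoords(page.Revisions[0].Text)
        if err == nil {
            fmt.Printf("%s: %v,%v\n", page.Title, coord.Lat, coord.Lon)
        }
    }
}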