scrapingTool

I wanted to build something like Python's BeautifulSoup in Go. A minimal usage example:

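package main

import (
    "./Tool"
    "fmt"
)

func main() {
    s := `<div><p id = "welcome">ようこそ</p></div>`
    // Parse the HTML string into an element tree.
    e := Tool.ParseHTML(s)
    // Find the first <p> element; no attribute filter is given here.
    p := Tool.SearchFirst(e, "p", "", []string{})
    // Print the element's text, then the value of its id attribute.
    fmt.Print(Tool.GetTextNoneTab(p))
    fmt.Println(p.Option["id"])
}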

Output:

ようこそ
welcome

func ParseHTML(s string) (elem *HTMLParser.Element)

Parses HTML and returns a pointer to the root element of the resulting HTML tree.

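func SearchFirst(elem *HTMLParser.Element, tag string, optionName string, optionValue []string) *HTMLParser.Element

Returns a pointer to the first element that matches the given conditions. The search is tag AND (option OR option): the tag must match, and the attribute named optionName must match any one of the values in optionValue.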
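The "tag AND (option OR option)" rule can be sketched as below. This is not the library's implementation: the `Element` type here is a hypothetical stand-in (only the `Option` map appears in the README example; `Tag` and `Children` are assumed), and treating an empty `optionName` as "no attribute filter" is also an assumption based on the usage example above.

```go
package main

import "fmt"

// Element is a hypothetical stand-in for HTMLParser.Element.
// Only Option is confirmed by the README; Tag and Children are assumed.
type Element struct {
	Tag      string
	Option   map[string]string
	Children []*Element
}

// matches reports whether e satisfies "tag AND (value OR value ...)".
// Assumption: an empty optionName disables the attribute filter.
func matches(e *Element, tag, optionName string, optionValue []string) bool {
	if e.Tag != tag {
		return false
	}
	if optionName == "" {
		return true
	}
	got, ok := e.Option[optionName]
	if !ok {
		return false
	}
	for _, want := range optionValue {
		if got == want {
			return true
		}
	}
	return false
}

// searchFirst walks the tree depth-first and returns the first match, or nil.
func searchFirst(e *Element, tag, optionName string, optionValue []string) *Element {
	if e == nil {
		return nil
	}
	if matches(e, tag, optionName, optionValue) {
		return e
	}
	for _, c := range e.Children {
		if found := searchFirst(c, tag, optionName, optionValue); found != nil {
			return found
		}
	}
	return nil
}

func main() {
	root := &Element{Tag: "div", Children: []*Element{
		{Tag: "p", Option: map[string]string{"id": "intro"}},
		{Tag: "p", Option: map[string]string{"id": "welcome"}},
	}}
	// The first <p> is skipped because its id is not in the value list.
	p := searchFirst(root, "p", "id", []string{"welcome", "greeting"})
	fmt.Println(p.Option["id"]) // welcome
}
```

In this sketch a search that finds nothing returns nil, so callers would check the result before dereferencing it; whether the real SearchFirst behaves the same way is not stated in this README.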

func SearchAll(elem *HTMLParser.Element, tag string, optionName string, optionValue []string) []*HTMLParser.Element

Finds every element that matches the given conditions and returns a slice of pointers to the matching elements.
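Collecting every match instead of stopping at the first one might look like the following sketch. As before, `Element` is a hypothetical stand-in rather than the real HTMLParser.Element, and returning matches in document (depth-first) order is an assumption, not something the README states.

```go
package main

import "fmt"

// Element is a hypothetical stand-in for HTMLParser.Element.
type Element struct {
	Tag      string
	Option   map[string]string
	Children []*Element
}

// hasAnyValue reports whether the attribute name holds one of the values.
// Assumption: an empty name means "do not filter by attribute".
func hasAnyValue(e *Element, name string, values []string) bool {
	if name == "" {
		return true
	}
	for _, v := range values {
		if got, ok := e.Option[name]; ok && got == v {
			return true
		}
	}
	return false
}

// searchAll returns pointers to all matching elements in depth-first order.
func searchAll(root *Element, tag, name string, values []string) []*Element {
	var out []*Element
	var walk func(n *Element)
	walk = func(n *Element) {
		if n == nil {
			return
		}
		if n.Tag == tag && hasAnyValue(n, name, values) {
			out = append(out, n)
		}
		for _, c := range n.Children {
			walk(c)
		}
	}
	walk(root)
	return out
}

func main() {
	root := &Element{Tag: "ul", Children: []*Element{
		{Tag: "li", Option: map[string]string{"class": "a"}},
		{Tag: "li", Option: map[string]string{"class": "b"}},
		{Tag: "li", Option: map[string]string{"class": "c"}},
	}}
	for _, li := range searchAll(root, "li", "class", []string{"a", "c"}) {
		fmt.Println(li.Option["class"])
	}
}
```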

func GetText(elem *HTMLParser.Element) string

Converts the given element and everything beneath it to text, inserting indentation where appropriate, and returns the result.

func GetTextNoneTab(elem *HTMLParser.Element) string

Converts the given element and everything beneath it to text and returns the result, without the indentation that GetText adds.
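The difference between the two text functions can be illustrated with a small sketch. The exact output format of the real GetText and GetTextNoneTab is not documented here, so everything below is an assumption: a hypothetical `Element` with a `Text` field, one line per text fragment, and one tab per nesting level in the indented variant.

```go
package main

import (
	"fmt"
	"strings"
)

// Element is a hypothetical stand-in; the Text field is assumed.
type Element struct {
	Text     string
	Children []*Element
}

// writeText appends the text of e and its descendants, one fragment per line.
// When indent is true, each line is prefixed with one tab per nesting level.
func writeText(b *strings.Builder, e *Element, depth int, indent bool) {
	if e == nil {
		return
	}
	if e.Text != "" {
		if indent {
			b.WriteString(strings.Repeat("\t", depth))
		}
		b.WriteString(e.Text)
		b.WriteByte('\n')
	}
	for _, c := range e.Children {
		writeText(b, c, depth+1, indent)
	}
}

// getText is the indented variant (assumed analogue of GetText).
func getText(e *Element) string {
	var b strings.Builder
	writeText(&b, e, 0, true)
	return b.String()
}

// getTextNoneTab is the flat variant (assumed analogue of GetTextNoneTab).
func getTextNoneTab(e *Element) string {
	var b strings.Builder
	writeText(&b, e, 0, false)
	return b.String()
}

func main() {
	root := &Element{Children: []*Element{
		{Text: "Title", Children: []*Element{{Text: "Subtitle"}}},
	}}
	fmt.Printf("%q\n", getText(root))        // "\tTitle\n\t\tSubtitle\n"
	fmt.Printf("%q\n", getTextNoneTab(root)) // "Title\nSubtitle\n"
}
```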

HTMLParser

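Some of the regular expressions used here are based on those in BeautifulSoup. At this stage, the parser may behave unexpectedly when the html, body, head, colgroup, or caption tags are omitted from the input.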
