WebPalm

What is WebPalm?

WebPalm is a command-line tool that traverses a website and generates a tree of all its web pages and their links. It works recursively: it follows each link found on a page, then each link found on those pages, and so on until the requested depth (set with -l) has been reached. Besides generating a site map, WebPalm can extract data from the body of each page using regular expressions and save the results to a file, which is useful for web scraping or for pulling out specific information.
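This README does not show WebPalm's internals. As a rough illustration of a depth-limited recursive traversal, here is a minimal Go sketch. The `Node` type, the `palm` function and the in-memory `site` map (which stands in for fetching pages and parsing their links) are all hypothetical.

```go
package main

import "fmt"

// Node is a hypothetical tree node: a page URL and the pages it links to.
type Node struct {
	URL      string
	Children []*Node
}

// site stands in for the web: each URL maps to the links found on that page.
// A real crawler would fetch each page and parse its links instead.
var site = map[string][]string{
	"https://example.com":     {"https://example.com/a", "https://example.com/b"},
	"https://example.com/a":   {"https://example.com/a/1", "https://example.com"},
	"https://example.com/b":   {},
	"https://example.com/a/1": {},
}

// palm builds the link tree rooted at url, descending at most level more levels.
// visited stops a page from being expanded twice, so back-links (such as /a
// linking to the home page) appear as leaves instead of causing a loop.
func palm(url string, level int, visited map[string]bool) *Node {
	node := &Node{URL: url}
	if level == 0 || visited[url] {
		return node
	}
	visited[url] = true
	for _, link := range site[url] {
		node.Children = append(node.Children, palm(link, level-1, visited))
	}
	return node
}

// printTree prints one URL per line, indented two spaces per level.
func printTree(n *Node, depth int) {
	fmt.Printf("%*s%s\n", depth*2, "", n.URL)
	for _, c := range n.Children {
		printTree(c, depth+1)
	}
}

func main() {
	// Roughly what -l2 means here: the start page plus two levels of links.
	printTree(palm("https://example.com", 2, map[string]bool{}), 0)
}
```

With a level of 2, /a/1 is reached through /a, while the home page reappears under /a only as an unexpanded leaf.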

⚠️ DISCLAIMER ⚠️:

This tool is intended for legal purposes only. You are responsible for your actions.

Features

Generate a palm tree structure of a website's URLs
Dump data from body pages using regular expressions
Multi-threading and parallelism
Export the web tree to JSON, XML, or TXT
Fast and easy to use
Colorized output and error handling

Installation

From source

git clone https://github.com/Malwarize/webpalm.git
cd webpalm
go build -o webpalm && ./webpalm

From binary

You can download a prebuilt binary from the Releases page:

wget https://github.com/Malwarize/webpalm/releases/download/vx.x.x/webpalm_x.x.x_os_arch.tar.gz
tar -xvf webpalm_x.x.x_os_arch.tar.gz
cd webpalm
./webpalm

If you have Go installed:

go install github.com/Malwarize/webpalm/v2@latest

Usage

webpalm -h

Flags:
  -d, --delay int                delay (ms) between each request / ex: -d 200
  -x, --exclude-code ints        status codes to exclude / ex : -x 404,500
  -h, --help                     help for webpalm
  -i, --include strings          include only domains / ex : -i google.com,facebook.com
  -l, --level int                level of palming / ex: -l2
  -o, --output string            file to export the result (f.json, f.xml, f.txt) / ex: -o result.json
  -p, --proxy string             proxy to use / ex: -p http://proxy.com:8080
      --regexes stringToString   regexes to match in each page / ex: --regexes comments="\<\!--.*?-->" (default [])
  -t, --timeout int              timeout in seconds / ex: -t 10 (default 10)
  -u, --url string               target url / ex: -u https://google.com
  -a, --user-agent string        user agent to use / ex: -a chrome, firefox, safari, ie, edge, opera, android, ios, custom
  -v, --version                  version for webpalm
  -w, --worker int               number of workers for multi-threading  / ex: -w 10

Examples

Get the palm tree of a website:

webpalm -u https://google.com -l1
# or
webpalm -u https://google.com -l1 -w 3 # 3 workers (multi-threading)

Get the palm tree of a website and exclude some status codes:

webpalm -u https://google.com -l1 -x 404,500
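Conceptually, excluding status codes is a set lookup on each response. In WebPalm the CLI already parses -x into integers (the help text lists its type as ints), so `parseCodes` below is only a hypothetical stand-in for that step, and `page` and `keep` are likewise illustrative names.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// page is a hypothetical crawl result: a URL and the status code it returned.
type page struct {
	URL    string
	Status int
}

// parseCodes turns a value such as "404,500" into a set of status codes.
func parseCodes(s string) (map[int]bool, error) {
	set := make(map[int]bool)
	for _, part := range strings.Split(s, ",") {
		code, err := strconv.Atoi(strings.TrimSpace(part))
		if err != nil {
			return nil, fmt.Errorf("invalid status code %q: %w", part, err)
		}
		set[code] = true
	}
	return set, nil
}

// keep drops every page whose status code is in the excluded set.
func keep(pages []page, excluded map[int]bool) []page {
	var kept []page
	for _, p := range pages {
		if !excluded[p.Status] {
			kept = append(kept, p)
		}
	}
	return kept
}

func main() {
	excluded, err := parseCodes("404,500")
	if err != nil {
		panic(err)
	}
	pages := []page{
		{"https://example.com/", 200},
		{"https://example.com/old", 404},
		{"https://example.com/broken", 500},
	}
	for _, p := range keep(pages, excluded) {
		fmt.Println(p.Status, p.URL) // 200 https://example.com/
	}
}
```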

Get the palm tree of a website and dump data from the body of each page:

webpalm -u https://google.com -l1 --regexes comments="\<\!--.*?-->" -o result.json

This dumps the HTML comments found in the body of each page and saves the results to result.json.

webpalm -u https://google.com -l1 --regexes comments="\<\!--.*?-->",emails="([a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+)"

This dumps both the HTML comments and the email addresses found in the body of each page.
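Applying named regexes to a page body can be sketched with Go's `regexp` package, which uses RE2 syntax. This is an illustrative sketch rather than WebPalm's own code; `extract` and the sample body are hypothetical. Note that RE2 accepts `\<` and `\!` as literal `<` and `!`, and that the email pattern here escapes the dot before the top-level domain so it matches a literal `.` rather than any character.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
)

// extract applies each named pattern to body and returns all matches per name.
// A pattern that fails to compile is reported with its name.
func extract(body string, patterns map[string]string) (map[string][]string, error) {
	results := make(map[string][]string)
	for name, pat := range patterns {
		re, err := regexp.Compile(pat)
		if err != nil {
			return nil, fmt.Errorf("regex %q: %w", name, err)
		}
		results[name] = re.FindAllString(body, -1)
	}
	return results, nil
}

func main() {
	body := `<html><!-- TODO: remove debug --><p>Contact admin@example.com</p><!-- v2 --></html>`
	patterns := map[string]string{
		"comments": `\<\!--.*?-->`,
		"emails":   `([a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+)`,
	}
	results, err := extract(body, patterns)
	if err != nil {
		panic(err)
	}
	names := make([]string, 0, len(results))
	for n := range results {
		names = append(names, n)
	}
	sort.Strings(names) // map iteration order is random; sort for stable output
	for _, n := range names {
		fmt.Println(n, results[n])
	}
}
```

By default `.` does not match a newline, so a comment spanning several lines is missed by this pattern; prefixing it with the `(?s)` flag would change that.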

Get the palm tree of a website and export it to XML or TXT:

webpalm -u https://google.com -l3 -o result.xml

webpalm -u https://google.com -l2 -o result.txt

Get the palm tree of a website and include only some domains:

webpalm -u https://google.com -l2 -i google.com,facebook.com

This crawls only the URLs that contain google.com or facebook.com.
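The README describes -i as a "contains" match. A plain substring test would also accept look-alike hosts such as notgoogle.com, or any URL that merely mentions google.com in its query string, so the hypothetical `allowed` function below swaps in a stricter rule: the host must equal the domain or be a subdomain of it. This is an illustrative alternative, not necessarily how WebPalm implements the flag.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// allowed reports whether rawURL's host is one of domains or a subdomain of one.
// An empty domain list means no filtering, as when -i is not given.
func allowed(rawURL string, domains []string) bool {
	if len(domains) == 0 {
		return true
	}
	u, err := url.Parse(rawURL)
	if err != nil {
		return false
	}
	host := strings.ToLower(u.Hostname())
	for _, d := range domains {
		d = strings.ToLower(d)
		if host == d || strings.HasSuffix(host, "."+d) {
			return true
		}
	}
	return false
}

func main() {
	domains := []string{"google.com", "facebook.com"} // like -i google.com,facebook.com
	for _, u := range []string{
		"https://www.google.com/search",
		"https://facebook.com/login",
		"https://notgoogle.com/",
		"https://example.com/?q=google.com",
	} {
		fmt.Println(allowed(u, domains), u)
	}
}
```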

Threading and concurrency

Get the palm tree of a website using 100 workers:

webpalm -u https://google.com -l2 -w 100

Regexes Examples

Name      Pattern
emails    ([a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+)
comments  \<\!--.*?-->
tokens    [a-zA-Z0-9]{32}
password  \bpassword\b.{0,10}

Don't forget to escape or quote the regexes for your shell where needed.
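On the command line, single quotes in POSIX shells keep characters such as `!`, `*` and `\` from being interpreted before the pattern reaches the program. Inside Go code, raw string literals (backticks) avoid a second layer of escaping. The following sketch compiles the four example patterns with Go's RE2-based `regexp` package and runs them on a made-up sample; `patterns`, `scan` and `sample` are hypothetical names, and the email pattern escapes the dot before the top-level domain.

```go
package main

import (
	"fmt"
	"regexp"
)

// The four example patterns as raw strings. MustCompile panics at program
// start if any pattern is invalid, so a typo surfaces immediately.
var patterns = map[string]*regexp.Regexp{
	"emails":   regexp.MustCompile(`([a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+)`),
	"comments": regexp.MustCompile(`\<\!--.*?-->`),
	"tokens":   regexp.MustCompile(`[a-zA-Z0-9]{32}`),
	"password": regexp.MustCompile(`\bpassword\b.{0,10}`),
}

// scan returns every match of every pattern in text, keyed by pattern name.
func scan(text string) map[string][]string {
	out := make(map[string][]string, len(patterns))
	for name, re := range patterns {
		out[name] = re.FindAllString(text, -1)
	}
	return out
}

const sample = "<!-- build 7 -->\n" +
	"mail: dev@example.org\n" +
	"api_key=0123456789abcdef0123456789abcdef\n" +
	"password = hunter2!!\n"

func main() {
	res := scan(sample)
	for _, name := range []string{"emails", "comments", "tokens", "password"} {
		fmt.Printf("%-8s %q\n", name, res[name])
	}
}
```

Two behaviours worth knowing: `.{0,10}` stops after ten characters, so the password line yields `password = hunter2` without the trailing `!!`; and `[a-zA-Z0-9]{32}` has no word boundaries, so it also matches the first 32 characters of any longer alphanumeric run.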

Tests

To gain more confidence in your enhancements or changes to the code, run the unit tests:

go test -v ./...

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change. You can also contact me on Discord: xorbit.

Powered By Malwarize

Join us on Discord
