confluenceScraper

Overview

ConfluenceScraper is a Python script designed to search and extract information from Confluence pages and attachments. It supports searching for keywords in page content, binary attachments (like PDFs, DOCX, XLSX, TXT files), and images using Optical Character Recognition (OCR). The script allows you to specify search terms, control the number of threads, limit results, and more.

This can come in handy during penetration tests where you've obtained access to an organization's Confluence instance. Finding passwords and other sensitive information in Confluence is a frequent occurence.

Disclaimer

Use at your own risk
You may need to alter code for optimal usage.
Code is possibly unfinished in places.

Features

Search for keywords in Confluence page content.
Extract information from binary attachments (PDF, DOCX, XLSX, TXT).
Perform OCR on image attachments to extract text (PNG, JPG, JPEG, TIFF, BMP, GIF).
Limit the number of images processed.
Multithreaded for faster processing.
Save extracted information to a file.

Requirements

Python 3.7+
Confluence API Access Token with appropriate permissions.

Python Packages

All required Python packages are listed in requirements.txt.

External Dependencies

Tesseract OCR Engine (for OCR functionality)
- macOS: Install via Homebrew
```
brew install tesseract
```
- Ubuntu/Debian:
```
sudo apt-get install tesseract-ocr
```
- Windows:
  - Download and install from Tesseract OCR GitHub.

Installation

Required Arguments

-c, --url <TARGET_URL>: Mandatory. The base URL of the Confluence instance (e.g., https://confluence.company.com).

Optional Arguments

-p, --accesstoken <API_ACCESS_TOKEN>: API access token for authentication, if needed.
-w, --wordlist <WORDLIST_FILE_PATH>: Path to a file containing keywords to search for. One keyword per line.
-k, --keyword <SINGLE_KEYWORD>: Specify a single keyword for the search.
-s, --search-content: If set, the script will extract and save content around the found keyword on each page.
-n, --num-chars <NUM_CHARS>: Number of characters to extract after the found keyword. Default is 10 characters.
-t, --threads <THREADS>: Number of threads to use. Default is 10.
-l, --limit <LIMIT>: Limit the number of search results to process. By default, all results are processed.

And more!

Some Example Commands

Search using a wordlist:

python3 ConfluenceScraper.py -c https://confluence.company.com -p myApiToken123 -w keywords.txt -t 10 -s -n 200

Search using a single keyword:

python3 ConfluenceScraper.py -c https://confluence.company.com -p myApiToken123 -k "search-term" -s

Limit the number of search results:

python3 ConfluenceScraper.py -c https://confluence.company.com -p myApiToken123 -w keywords.txt -l 50

Run with default settings but extract surrounding content:

python3 ConfluenceScraper.py -c https://confluence.company.com -p myApiToken123 -k "search-term" -s -n 150

Run with OCR in images and max image search number:

python3 ConfluenceScraper.py -c https://confluence.company.com -p myApiToken123 -k "search-term" -s -i -m 5

Output

Results will be saved in the ./loot directory.
- If -s is used, the script will save both the URLs and extracted content surrounding the found keyword.
- If -s is not used, only the page URLs and titles will be saved.

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
.gitignore		.gitignore
README.md		README.md
confluenceScraper.py		confluenceScraper.py
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

confluenceScraper

Overview

Disclaimer

Features

Requirements

Python Packages

External Dependencies

Installation

Required Arguments

Optional Arguments

Some Example Commands

Output

About

Releases

Packages

Languages

pizza-power/confluenceScraper

Folders and files

Latest commit

History

Repository files navigation

confluenceScraper

Overview

Disclaimer

Features

Requirements

Python Packages

External Dependencies

Installation

Required Arguments

Optional Arguments

Some Example Commands

Output

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages