Merge pull request #32 from MikeMeliz/coveragereport
AddCoverageReports: Tox & SonarQube Cloud
MikeMeliz authored Nov 3, 2024
2 parents 568a859 + ca78258 commit 5f9ed77
Showing 5 changed files with 79 additions and 24 deletions.
27 changes: 27 additions & 0 deletions .github/workflows/build.yml
@@ -0,0 +1,27 @@
name: Build
on:
  push:
    branches:
      - master
  pull_request:
    types: [opened, synchronize, reopened]
jobs:
  sonarcloud:
    name: SonarCloud
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
        with:
          fetch-depth: 0
      - name: Setup Python
        uses: actions/setup-python@v2
        with:
          python-version: ${{ matrix.python }}
      - name: Install tox and any other packages
        run: pip install tox
      - name: Run tox
        run: tox -e py
      - name: SonarCloud Scan
        uses: SonarSource/sonarcloud-github-action@master
        env:
          SONAR_TOKEN: ${{ secrets.SONAR_TOKEN }}
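The job runs tox before the SonarCloud scan, so `coverage.xml` already exists when the scanner uploads it. One caveat visible in the hunk: `python-version: ${{ matrix.python }}` references a strategy matrix that the workflow never defines. A rough local equivalent of the test steps, assuming Python 3 and the repository root as the working directory:

```shell
# Reproduce the workflow's test steps locally (a sketch, not part of the commit):
pip install tox
tox -e py    # runs the [testenv] commands from tox.ini (shown below), writing coverage.xml
```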
49 changes: 26 additions & 23 deletions README.md
@@ -14,8 +14,9 @@
[![Release][release-version-shield]][releases-link]
[![Last Commit][last-commit-shield]][commit-link]
![Python][python-version-shield]
+[![Quality Gate Status][quality-gate-shield]][quality-gate-link]
[![license][license-shield]][license-link]

</div>

### What makes it simple and easy to use?
@@ -43,7 +44,7 @@ $ torcrawl -v -u http://www.github.com/ -c -d 2 -p 2
```

> [!TIP]
-> Crawling is not illegal, but violating copyright *is*. It’s always best to double check a website’s T&C before start crawling them. Some websites set up what’s called `robots.txt` to tell crawlers not to visit those pages.
+> Crawling is not illegal, but violating copyright *is*. It’s always best to double-check a website’s T&C before start crawling them. Some websites set up what’s called `robots.txt` to tell crawlers not to visit those pages.
> <br>This crawler *will* allow you to go around this, but we always *recommend* respecting robots.txt.
<hr>
@@ -62,34 +63,34 @@ $ torcrawl -v -u http://www.github.com/ -c -d 2 -p 2
1. **Debian/Ubuntu**: <br>
`apt-get install tor`<br>
`service tor start`
-3. **Windows**: Download [`tor.exe`][tor-download], and:<br>
+2. **Windows**: Download [`tor.exe`][tor-download], and:<br>
`tor.exe --service install`<br>
`tor.exe --service start`
-5. **MacOS**: <br>
+3. **MacOS**: <br>
`brew install tor`<br>
`brew services start tor`
-6. For different distros, visit:<br>
+4. For different distros, visit:<br>
[TOR Setup Documentation][tor-docs]

## Arguments
-**arg** | **Long** | **Description**
-----|------|------------
-**General**: | |
--h |--help| Help message
--v |--verbose| Show more information about the progress
--u |--url *.onion| URL of Webpage to crawl or extract
--w |--without| Without using TOR Network
--f |--folder| The directory which will contain the generated files
-**Extract**: | |
--e |--extract| Extract page's code to terminal or file (Default: Terminal)
--i |--input filename| Input file with URL(s) (separated by line)
--o |--output [filename]| Output page(s) to file(s) (for one page)
--y |--yara | Perform yara keyword search:<br>h = search entire html object,<br>t = search only text
-**Crawl**: | |
--c |--crawl| Crawl website (Default output on website/links.txt)
--d |--depth| Set depth of crawler's travel (Default: 1)
--p |--pause| Seconds of pause between requests (Default: 0)
--l |--log| Log file with visited URLs and their response code
+| **arg** | **Long** | **Description** |
+|--------------|---------------------|----------------------------------------------------------------------------------------|
+| **General**: | | |
+| -h | --help | Help message |
+| -v | --verbose | Show more information about the progress |
+| -u | --url *.onion | URL of Webpage to crawl or extract |
+| -w | --without | Without using TOR Network |
+| -f | --folder | The directory which will contain the generated files |
+| **Extract**: | | |
+| -e | --extract | Extract page's code to terminal or file (Default: Terminal) |
+| -i | --input filename | Input file with URL(s) (separated by line) |
+| -o | --output [filename] | Output page(s) to file(s) (for one page) |
+| -y | --yara | Perform yara keyword search:<br>h = search entire html object,<br>t = search only text |
+| **Crawl**: | | |
+| -c | --crawl | Crawl website (Default output on website/links.txt) |
+| -d | --depth | Set depth of crawler's travel (Default: 1) |
+| -p | --pause | Seconds of pause between requests (Default: 0) |
+| -l | --log | Log file with visited URLs and their response code |

## Usage & Examples

@@ -240,6 +241,8 @@ v1.2:
[last-commit-shield]: https://img.shields.io/github/last-commit/MikeMeliz/TorCrawl.py?logo=github&label=Last%20Commit&style=plastic
[release-version-shield]: https://img.shields.io/github/v/release/MikeMeliz/TorCrawl.py?logo=github&label=Release&style=plastic
[python-version-shield]: https://img.shields.io/badge/Python-v3-green.svg?style=plastic&logo=python&label=Python
+[quality-gate-shield]: https://sonarcloud.io/api/project_badges/measure?project=MikeMeliz_TorCrawl.py&metric=alert_status
+[quality-gate-link]: https://sonarcloud.io/summary/new_code?id=MikeMeliz_TorCrawl.py
[license-shield]: https://img.shields.io/github/license/MikeMeliz/TorCrawl.py.svg?style=plastic&logo=gnu&label=License
[commit-link]: https://github.com/MikeMeliz/TorCrawl.py/commits/main
[releases-link]: https://github.com/MikeMeliz/TorCrawl.py/releases
2 changes: 1 addition & 1 deletion modules/tests/test_checker.py
@@ -15,7 +15,7 @@ def setUp(cls) -> None:
    def tearDownClass(cls):
        """ Test Suite Teardown. """
        # Remove test folder.
-        os.rmdir('torcrawl')
+        os.rmdir('output/torcrawl')

    def test_url_canon_001(self):
        """ url_canon unit test.
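Per the new path, the suite's generated folder now lives under `output/`, so the teardown must remove `output/torcrawl` rather than the old top-level directory. To exercise just this module with the same coverage flags the CI uses (a sketch; paths are taken from the diff above):

```shell
pytest modules/tests/test_checker.py --cov=. --cov-report=xml --cov-branch
```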
15 changes: 15 additions & 0 deletions sonar-project.properties
@@ -0,0 +1,15 @@
sonar.projectKey=MikeMeliz_TorCrawl.py
sonar.organization=mikemeliz

# This is the name and version displayed in the SonarCloud UI.
sonar.projectName=TorCrawl.py
sonar.projectVersion=1.0

# Path is relative to the sonar-project.properties file. Replace "\" by "/" on Windows.
sonar.sources=.

# Encoding of the source code. Default is default system encoding
sonar.sourceEncoding=UTF-8

# Adding the coverage analysis path
sonar.python.coverage.reportPaths=coverage.xml
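With these properties in place, the same analysis can also be run outside CI using SonarSource's standalone `sonar-scanner` CLI, which picks up `sonar-project.properties` from the working directory. A sketch, assuming the CLI is installed and that a token has been generated in SonarCloud (the placeholder below is hypothetical):

```shell
# Assumes coverage.xml already exists, e.g. from a prior tox run
export SONAR_TOKEN="<your-sonarcloud-token>"   # hypothetical placeholder
sonar-scanner
```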
10 changes: 10 additions & 0 deletions tox.ini
@@ -0,0 +1,10 @@
[tox]
envlist = py39
skipsdist = True

[testenv]
deps =
    -r{toxinidir}/requirements.txt
    pytest
    pytest-cov
commands = pytest --cov=. --cov-report=xml --cov-config=tox.ini --cov-branch
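With `skipsdist = True`, tox never builds the package itself; it just installs `requirements.txt` plus pytest and runs the coverage command. After a run, the branch-coverage XML lands in the project root, which is exactly where `sonar.python.coverage.reportPaths` points. A quick local check (assuming Python 3.9 is available to satisfy the envlist):

```shell
tox -e py39               # run the tests with coverage
head -n 3 coverage.xml    # sanity check that the report was written
```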
