My URL tools written in Python, BASH and Go

WARNING This library is still under development and intended for experimental purposes only.

My URL tools written in Python, BASH and Go

BASH functions `url_extract`, `url_deref` and `url_unescape`

First, don't forget to run

source ./bash_functions.sh

BASH function `url_extract`

uses a regex pattern to extract URLs from text passed either as STDIN stream or arguments:

# search for URL patterns in text passed as STDIN stream:
curl -sS "https://suricrasia.online/iceberg/" | url_extract

# or search for URL patterns in text passed as arguments:
url_extract $(curl -sS "https://suricrasia.online/iceberg/")

Result: extracted 200+ URLs, including weird ones, such as https://opensource.apple.com/source/cctools/cctools-822/misc/strip.c#:~:text=/%0A%09%20%20If%20there%20is,it%20would%20save.%0A%09%20*/.

💡 Tip: to sort the output and remove any duplicates, pipe the above command to sort -u

BASH function `url_deref`

is basically a cURL-based, off-line version of https://deref.link. It follows URL redirects and prints only the target URL. The URL can be given either as an argument or as STDIN stream:

# process a URL as an argument:
url_deref "https://stackoverflow.com/a/70819429/4883320"

# or process a URL as STDIN stream:
echo "https://stackoverflow.com/a/70819429/4883320" | url_deref

Result:

https://stackoverflow.com/questions/70817657/cant-get-firefox-extension-logs-to-show-up/70819429#70819429

BASH function `url_unescape` (with Python under the hood)

💡 Also available as REST API: check https://github.com/kirisakow/api-py

This function makes a URL or any string prettier by resolving HTML entities (e.g. & -> &) with Python's html.unescape, non-ASCII escaped characters (%uXXXX) with a regex pattern, and percent-encoded characters (%XX) with urllib.parse.unquote:

url_unescape "https%3A//be.wikipedia.org/wiki/%u0416%u044B%u0432%u0435_%u0411%u0435%u043B%u0430%u0440%u0443%u0441%u044C%21"

Result: https://be.wikipedia.org/wiki/Жыве_Беларусь!

url_unescape "https%3A%2F%2Fuk.wikipedia.org%2Fwiki%2F%D0%A1%D0%BB%D0%B0%D0%B2%D0%B0_%D0%A3%D0%BA%D1%80%D0%B0%D1%97%D0%BD%D1%96!"

Result: https://uk.wikipedia.org/wiki/Слава_Україні!

Also,

the URL can be given either as an argument or as STDIN stream,
url_unescape is meant to work in pair with url_deref:

long_url_with_redirect="https://france24.nlfrancemm.com/m/surl/200243/517183/yD0Vqr_mEaDTwJcBJSIuyA==/link_13/HztCd5MALBSiwyWcdZpQvGZuP+L2dlD0fqSjv4DZVsqW+MUvK7a2X8uUILOWdBCiVjMwqEsKsY+9dh7nVfSCzxyxWHUs7tbSQxU3Ok5bOrTyAvRPCKsURxr+LisJ58BR28mFkT2aLLItU7iBkLrHfB5MoWOY3+x0YHcH5Z66LNg-L0J2ND8pSiAw4qzu0Dz19Meq-zbPfN7-MLR6V9LeeQGpxifPQCKMU5nmaVyQUXRZDgDLx+sLPRlzIr--Oc3bzV0X+jgm6SfsBYhxruKPQz70kvNSgAGeNQPgEtBR0AC-m92X8EDJI2th4UFqBvwNeU-rRJx1wgsydqUjrVsLi6-0og9XJILZ3hSboC3S85wB3AW2D6PP7SDuZkDhaTGLG03mmkCipwsPwW2-8UhTLniSzKA054euZqG9vo+Ve3gJrO9QYwQ64EjKTplSScUZVZMok0OhhCg9C3dW1M-tQ1Hd19YpdgWP8U9Tl0xyPmJmOZUAamPUyZJR569tdI+hW-g7tMx9T90eAAstFzj86hQISpD7cKeV3PvMJj+MV8K2668OTZULlrocfGSXTyMbDc0ZaSroLe0nrpbHSjmRWgUisF-z2Rq2+7XzUGmrtcS3sYgpMag2QemK68TzVlqu2CaK2B97jIyZNOyuHpbBKPNYRM58mu+D7-9KTnysI-YcH93Fmh33mRv1fyVlxCpmm0PoZXmZd7x7klL6-JStwhei33DpD-qRUAlmo93xOlzO9xJQxjUpZaG1qM2xn9e+WAfwVIA3ouw8slY0W5PjCRmqOjtB4bSIWANjsLrKkAAwzHm-BCcfeWFjzA+PlQXJ3jV4WNaTkek91lEF0aPbWoxUplU0xV+610tu3sKnjM4="

# process a URL as STDIN stream:
url_deref "$long_url_with_redirect" | url_unescape

# or process a URL as an argument:
url_unescape $(url_deref "$long_url_with_redirect")

Result:

https://www.france24.com/fr/amériques/20221227-le-blizzard-du-siècle-fait-au-moins-50-morts-aux-états-unis-le-bilan-risque-de-s-alourdir?xtor=EPR-300&_ope=eyJndWlkIjoiN2ZiZTFiYWI1YWRiMTI1ZGJmMzRkMDdhNWQzNGQ2ZWIifQ==

Go function `url_clean`

The url_clean function checks a URL against a 120+ long list of garbage query parameters (unwanted_query_params.txt) which it removes from the URL. Originally written in BASH, this function has been refactored in Go to be faster.

This function can process a URL either as an argument or as STDIN stream, and can be combined with the aforementioned BASH functions:

long_url_with_junk_query_params="https://www.france24.com/fr/amériques/20221227-le-blizzard-du-siècle-fait-au-moins-50-morts-aux-états-unis-le-bilan-risque-de-s-alourdir?xtor=EPR-300&_ope=eyJndWlkIjoiN2ZiZTFiYWI1YWRiMTI1ZGJmMzRkMDdhNWQzNGQ2ZWIifQ=="

# process a URL as STDIN stream:
echo "$long_url_with_junk_query_params" | url_clean

# or process a URL as an argument:
url_clean "$long_url_with_junk_query_params"

Result:

https://www.france24.com/fr/amériques/20221227-le-blizzard-du-siècle-fait-au-moins-50-morts-aux-états-unis-le-bilan-risque-de-s-alourdir

Test `url_clean` live as a REST API

💡 This piece of software is also available live as a REST API: check https://github.com/kirisakow/api-go

Unlike the former functions url_deref and url_unescape which only process one URL at a time, the url_clean function can process multiple URLs either as multiple arguments or as a multiline STDIN stream.

Installation

# 0. install go with snap or apt

# 1. download url_tools project and cd into the project dir
git clone ...
cd ./url_tools/url_clean

# 1. build the binary
go build

# 2. add to $PATH
export PATH="/path/to/url_tools/url_clean:$PATH"

Other tools to help you keep the URLs clean of junk as you browse

Browser extension `uBlock Origin` (my #1 favorite)

The powerful browser extension uBlock Origin adblocker is also a powerful URL cleaner. To keep your URLs clean of junk query parameters, install uBlock Origin, go to the parameters, then to the My filters tab, and use the removeparam modifier with either plain-text or regex values:

Here, you can copy my filters as text:

||*$removeparam=_ope
||*$removeparam=/^(at|ul|utm)_/
||*$removeparam=/^act(CampaignType|Id|Source)/
||*$removeparam=/^hash(a|b|c)/
||*$removeparam=bxid
||*$removeparam=CMP
||*$removeparam=cndid
||*$removeparam=esrc
||*$removeparam=etear
||linkedin.com*$removeparam=/^(ref|tracking|reference)Id/
||twitter.com*$removeparam=/^(t|s)/
||*$removeparam=ueid
||*$removeparam=xtor

Browser extension `neat-url`

Firefox add-on neat-url allows for blacklist customization. However, it's not as powerful as uBlock Origin.

Browser extension `clearurls`

Firefox add-on clearurls has a “Recommended” badge. However, it doesn't allow for blacklist customization at all.

Name		Name	Last commit message	Last commit date
Latest commit History 64 Commits
img		img
url_clean		url_clean
url_unescape		url_unescape
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
bash_functions.sh		bash_functions.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

My URL tools written in Python, BASH and Go

BASH functions `url_extract`, `url_deref` and `url_unescape`

BASH function `url_extract`

BASH function `url_deref`

BASH function `url_unescape` (with Python under the hood)

Go function `url_clean`

Test `url_clean` live as a REST API

Installation

Other tools to help you keep the URLs clean of junk as you browse

Browser extension `uBlock Origin` (my #1 favorite)

Browser extension `neat-url`

Browser extension `clearurls`

About

Releases

Packages

Languages

License

kirisakow/url_tools

Folders and files

Latest commit

History

Repository files navigation

My URL tools written in Python, BASH and Go

BASH functions url_extract, url_deref and url_unescape

BASH function url_extract

BASH function url_deref

BASH function url_unescape (with Python under the hood)

Go function url_clean

Test url_clean live as a REST API

Installation

Other tools to help you keep the URLs clean of junk as you browse

Browser extension uBlock Origin (my #1 favorite)

Browser extension neat-url

Browser extension clearurls

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

BASH functions `url_extract`, `url_deref` and `url_unescape`

BASH function `url_extract`

BASH function `url_deref`

BASH function `url_unescape` (with Python under the hood)

Go function `url_clean`

Test `url_clean` live as a REST API

Browser extension `uBlock Origin` (my #1 favorite)

Browser extension `neat-url`

Browser extension `clearurls`

Packages