This script downloads the daily puzzle prompt and the input data files for Eric Wastl's Advent of Code (AOC) challenge. It converts each puzzle's raw HTML into a Markdown file and, if configured with an authentication token, also downloads your input data as a text file. If an AOC event is ongoing, you can run the script each day to download that day's challenge. Otherwise, you can select past AOC events (starting from 2015) and download an entire year's challenges.
python aoc_scrape.py
- Download the current year's challenges. If the current year's AOC has not been released yet, the script will default to downloading the entire previous year's challenges.
- In the case of connection errors, the script will terminate.
- By default, the script generates a folder for each year in the directory above the one where the aoc_scrape.py file is located.
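The fallback to the previous year described above could be sketched as follows. This is a minimal illustration, not the script's actual code: the function name `default_year` is hypothetical, and it assumes an event counts as "released" once it is 1 December at UTC-5, the time zone in which AOC puzzles unlock.

```python
from datetime import datetime, timedelta, timezone

# Assumption: AOC puzzles unlock at midnight US Eastern Standard Time (UTC-5).
AOC_TZ = timezone(timedelta(hours=-5))


def default_year(now: datetime) -> int:
    """Return the most recent AOC year whose event has started at `now`.

    `now` should be timezone-aware; it is converted to UTC-5 before checking.
    """
    local = now.astimezone(AOC_TZ)
    # The event starts on 1 December; in any earlier month, use last year.
    return local.year if local.month == 12 else local.year - 1
```

For example, `default_year(datetime.now(timezone.utc))` picks the year to download when no years are configured.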
python aoc_scrape.py -c CONFIG_FILE
- Specify a configuration file. Details below.
- This argument must be specified in order to retrieve input data files.
python aoc_scrape.py -o
- Overwrite any previously downloaded files (challenge and/or data) with newly downloaded ones.
target_directory
├── 2022
│   ├── day1
│   │   ├── challenge.md
│   │   └── data.txt
│   ├── day2
│   │   ├── challenge.md
│   │   └── data.txt
│   ├── day3
│   │   ├── challenge.md
│   │   └── data.txt
│   └── ...
└── AdventOfCodeScraper
    ├── aoc_scraper.py
    ├── config.json
    ├── LICENSE
    ├── README.md
    └── scraper_tools.py
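The layout above, together with the -o overwrite option, can be sketched roughly as below. The helper `save_file` and its parameters are hypothetical names chosen for illustration; the script's own functions may be organized differently.

```python
from pathlib import Path


def save_file(target_dir: Path, year: int, day: int, name: str,
              text: str, overwrite: bool = False) -> bool:
    """Write `text` to <target_dir>/<year>/day<day>/<name>.

    Returns True if the file was written, False if an existing file was kept.
    """
    path = Path(target_dir) / str(year) / f"day{day}" / name
    if path.exists() and not overwrite:
        # Keep the existing download unless overwriting was requested.
        return False
    # Create the year and day folders on demand.
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text, encoding="utf-8")
    return True
```

Calling `save_file(Path("target_directory"), 2022, 1, "challenge.md", markdown_text)` would create `target_directory/2022/day1/challenge.md` and leave it untouched on later runs unless `overwrite=True` is passed.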
This script can be configured using a .json file. A template file is provided with default values.
target_directory is an optional string value that stores the directory where you intend to place the folder containing all AOC years.
- Default value is null.
years is an optional array value that stores the years you wish to download.
- Default value is an empty array, [], which will be interpreted as the current year.
- You can specify multiple years at a time ([2015,2016,...]). Invalid years and duplicates will be ignored.
auth_token is intended to be a string value; the default value is null.
- In order to get your input data for AOC, you need to authenticate with your own session cookie. Follow the instructions here to obtain it.
- Add the session ID cookie, e.g. 0123456789abcdef..., to this field.
- Default behavior if no token is provided is to not retrieve input data.
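To show how the session cookie is used, here is a minimal sketch that builds an authenticated request for one day's input. It uses the standard library's urllib so that it stays self-contained, whereas the script itself depends on Requests; the function name `build_input_request` is hypothetical.

```python
import urllib.request


def build_input_request(year: int, day: int,
                        session_token: str) -> urllib.request.Request:
    """Build (but do not send) a request for one day's puzzle input."""
    url = f"https://adventofcode.com/{year}/day/{day}/input"
    # AOC identifies the logged-in user through the "session" cookie.
    headers = {"Cookie": f"session={session_token}"}
    return urllib.request.Request(url, headers=headers)
```

Passing the result to `urllib.request.urlopen` would perform the download. Without a valid token the site does not return your input, which matches the default behavior of skipping input data.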
{
    "config": {
        "years": [],
        "targetDir": null
    },
    "auth": {
        "authToken": null
    }
}
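One way to read this template and apply the defaults described above is sketched below. The key names come from the template, but `parse_config`, the returned dictionary keys, and the rule that a list containing only invalid years also falls back to the current year are assumptions made for illustration.

```python
import json

FIRST_YEAR = 2015  # the first AOC event


def parse_config(text: str, current_year: int) -> dict:
    """Parse the JSON config text and normalize its values."""
    raw = json.loads(text)
    cfg = raw.get("config", {})
    auth = raw.get("auth", {})

    years = []
    for y in cfg.get("years") or []:
        is_int = isinstance(y, int) and not isinstance(y, bool)
        # Skip invalid years and duplicates, keeping the original order.
        if is_int and FIRST_YEAR <= y <= current_year and y not in years:
            years.append(y)

    return {
        # An empty list is interpreted as the current year.
        "years": years or [current_year],
        "target_dir": cfg.get("targetDir"),   # None means the default location
        "auth_token": auth.get("authToken"),  # None means skip input data
    }
```

With the template above, `parse_config(template_text, 2023)` yields the single year 2023, no target directory, and no token.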
- Python Standard Library
- Requests, pip install requests
- pytz, pip install pytz
- BeautifulSoup4, pip install beautifulsoup4
- Markdownify, pip install markdownify