A collection of scripts and programs that extract Dota 2's game files and build an SQLite database from them. See the output of this builder at my Dotabase repository.
The main library/tool that this builder leans on is ValveResourceFormat. This lovely project is what allows me to extract the data from Dota's vpk files and decompile some of the more obscure file formats, like vsnd_c, into friendlier ones, like mp3. I run the Decompiler from this project with a call that looks roughly like this:
./Decompiler.exe -i "<vpk_location>" --vpk_cache -d -e "txt,dat,vsndevts_c,vxml_c,vjs_c,vcss_c,png,cfg,res,vsnd_c,vtex_c" -o "<vpk_out_location>" --threads 16
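For scripting, the same call can be assembled in Python. This is a minimal sketch, not the builder's actual code: the function names `build_decompiler_command` and `run_decompiler` and the default executable path are hypothetical, and the only flags used are the ones shown in the command above.

```python
import subprocess

# File extensions to extract, copied from the command above.
EXTENSIONS = [
    "txt", "dat", "vsndevts_c", "vxml_c", "vjs_c", "vcss_c",
    "png", "cfg", "res", "vsnd_c", "vtex_c",
]


def build_decompiler_command(vpk_location, vpk_out_location, threads=16,
                             exe="./Decompiler.exe"):
    """Return the Decompiler argument list (hypothetical helper)."""
    return [
        exe,
        "-i", vpk_location,
        "--vpk_cache",
        "-d",
        "-e", ",".join(EXTENSIONS),
        "-o", vpk_out_location,
        "--threads", str(threads),
    ]


def run_decompiler(vpk_location, vpk_out_location, threads=16):
    """Run the Decompiler and raise CalledProcessError on a non-zero exit."""
    command = build_decompiler_command(vpk_location, vpk_out_location, threads)
    subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting problems when the vpk path contains spaces.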
A major focus of dotabase is extracting the Hero Responses data, so I wanted the subtitles/captions for each response. Unfortunately, these are stored in .dat files, and when this project was created I had not found a reliable way to decompile them back into their original .txt format. Instead, I originally scraped this information from the Dota 2 Wiki. A bit of a hackish method, but it worked.
That is no longer the case: the voice line text is now taken directly from the subtitles.dat files in the vpk. See vccd_reader.py for more info on how that works.
Although the ValveResourceFormat decompiler does a good job of turning the game files into human-readable text, the results are still not in a format that programs can parse easily. To that end, I convert every file containing information I need into a .json file, using a series of regex substitutions.
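As an illustration of the regex approach, here is a minimal sketch that turns a small Valve KeyValues-style text file into JSON. It is not the builder's actual conversion code: the function name `keyvalues_to_json` and the sample data are hypothetical, and it assumes every key and value is double-quoted, that strings contain no braces, and that any backslash escapes in them are also valid in JSON.

```python
import json
import re

# Hypothetical sample in the quoted key/value style used by many game .txt files.
SAMPLE = '''
// hypothetical sample data
"DOTAHeroes"
{
    "npc_dota_hero_axe"
    {
        "HeroID"    "2"   // trailing comment
        "Url"       "http://example.com/axe"
    }
}
'''


def keyvalues_to_json(text):
    """Convert quoted KeyValues text to a dict via regex substitutions."""
    # 1. Drop // comments, but leave quoted strings (e.g. URLs) untouched.
    text = re.sub(r'("(?:[^"\\]|\\.)*")|//[^\n]*',
                  lambda m: m.group(1) or "", text)
    # 2. "key" "value"  ->  "key": "value",
    text = re.sub(r'"((?:[^"\\]|\\.)*)"[ \t]+"((?:[^"\\]|\\.)*)"',
                  r'"\1": "\2",', text)
    # 3. "key" {  ->  "key": {
    text = re.sub(r'"((?:[^"\\]|\\.)*)"\s*\{', r'"\1": {', text)
    # 4. Put a comma after every closing brace...
    text = re.sub(r'\}', '},', text)
    # 5. ...then remove any comma that directly precedes a closing brace.
    text = re.sub(r',(\s*\})', r'\1', text)
    # 6. Strip the final trailing comma and wrap everything in one object.
    return json.loads("{" + text.strip().rstrip(",") + "}")
```

Calling `keyvalues_to_json(SAMPLE)` yields a nested dict keyed by `"DOTAHeroes"`. Real game files have more edge cases (unquoted tokens, duplicate keys, conditionals), so a sketch like this would need extra substitutions before it could handle them.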