Scrape data on MtG decks.
This is a hobby project.
It started as card data scraping from MTG Goldfish. Then scraping of JumpIn! packet info was added. Then there was some play with Limited data from 17lands, back when I thought I had to bear with the utter boringness of that format (before the dawn of Golden Packs on Arena). [This part has been deprecated and moved to the archive package.] Then I discovered I don't need to scrape anything, because Scryfall. Then I quit (Arena).
Now the main focus is the decks package and the yt module (parsing data on YouTubers' decks from YT video descriptions).
- Scryfall data management via downloading bulk data with scrython and wrapping it in convenient abstractions
- Scraping YT channels for videos with decklists in descriptions (or comments), using no fewer than four Python libraries to avoid bothering with Google APIs
- Scraping YT videos' descriptions (or comments) for decks:
- Text decklists in Arena/MTGO format pasted into video descriptions are parsed into Deck objects
- Links to decklist services are scraped into Deck objects. 36 services are supported so far:
- 17Lands
- Aetherhub
- Archidekt
- CardBoard Live
- Cardhoarder
- Cardsrealm
- ChannelFireball
- Deckbox
- Deckstats
- Flexslot
- Goldfish
- Hareruya
- LigaMagic (with caveats)
- MagicVille
- ManaBox
- ManaStack
- Manatraders
- Melee.gg
- Moxfield
- MTGArena.Pro
- MTGAZone
- MTGDecks.net
- MTGSearch.it
- MTGStocks
- MTGOTraders
- MTGTop8
- PauperMTG
- PennyDreadfulMagic
- Scryfall
- StarCityGames
- Streamdecker
- TappedOut
- TCDecks
- TCGPlayer
- TopDecked
- Untapped
- Other decklist services are planned (though it does seem like I've pretty much exhausted the possibilities already :))
- Both Aetherhub decklist types featured in YT videos are supported: regular deck and write-up deck
- Both Untapped decklist types featured in YT videos are supported: regular deck and profile deck
- Both old TCGPlayer site and TCGPlayer Infinite are supported
- Both international and native Hareruya sites are supported
- LigaMagic is the only sore spot: full support would require me to invest in scraping APIs to bypass their Cloudflare protection (the logic to scrape them is already in place anyway)
- All those mentioned above work even if they are behind shortener links and need unshortening first
- Sites that need it are scraped using Selenium
- Link trees posted in descriptions are expanded
- Links to pastebin-like services (like the ones Amazonian links to), Patreon posts and Google Docs documents are expanded too and further parsed for decks
- If nothing is found in the video's description, then the author's comments are parsed
- Deck's name and format are derived (from a video's title, description and keywords) if not readily available
- Foreign cards and any others that cannot be found in the downloaded Scryfall bulk data are looked up with queries to the Scryfall API
- Individual decklists are extracted from container pages and further processed for decks. These include:
- Aetherhub users, events and articles
- Archidekt folders and users
- Cardsrealm profiles, folders, tournaments and articles
- ChannelFireball players, articles and authors
- CyclesGaming articles
- Deckbox users and events
- Deckstats users
- EDHTop16 tournaments and commanders
- Flexslot users
- Goldfish tournaments, players and articles
- Hareruya events and players
- LigaMagic events (with caveats)
- MagicVille events and users
- ManaStack users
- Manatraders users
- Magic.gg events
- MagicBlogs.de articles
- Melee.gg tournaments
- Moxfield bookmarks, users and search results
- MTGAZone articles and authors
- MTGDecks.net tournaments
- MTGO events
- MTGStocks articles
- MTGTop8 events
- Pauperwave articles
- PennyDreadfulMagic competitions and users
- StarCityGames events, players, articles and authors' deck databases
- Streamdecker users
- TappedOut users, folders, and user folders
- TCDecks events
- TCGPlayer (old-site) players
- TCGPlayer Infinite players, authors, author searches, author deck panes, events and articles
- TopDeck.gg brackets and profiles
- Untapped profiles
- Assessing the meta:
- Scraping Goldfish and MTGAZone for meta-decks (others are planned)
- Scraping a single Untapped meta-deck decklist page
- Exporting decks to the Forge MTG .dck format or to an Arena decklist saved as a .txt file, with autogenerated, descriptive names based on the scraped deck's metadata
- Importing back into a Deck from those formats
- Export/import to other formats is planned
- Dumping decks, YT videos and channels to .json
- I compiled a list of almost 1.9k YT channels that feature decks in their descriptions and successfully scraped them (at least 25 videos deep), so this data is now just waiting to be put to creative use!
No | Format | Count | Percentage |
---|---|---|---|
1 | commander | 38460 | 38.29 % |
2 | standard | 21776 | 21.68 % |
3 | modern | 9539 | 9.50 % |
4 | pauper | 6693 | 6.66 % |
5 | pioneer | 5721 | 5.70 % |
6 | legacy | 3438 | 3.42 % |
7 | brawl | 2094 | 2.08 % |
8 | historic | 1913 | 1.90 % |
9 | explorer | 1730 | 1.72 % |
10 | undefined | 1559 | 1.55 % |
11 | timeless | 1412 | 1.41 % |
12 | duel | 1389 | 1.38 % |
13 | paupercommander | 1290 | 1.28 % |
14 | premodern | 764 | 0.76 % |
15 | irregular | 703 | 0.70 % |
16 | vintage | 685 | 0.68 % |
17 | alchemy | 480 | 0.48 % |
18 | penny | 268 | 0.27 % |
19 | standardbrawl | 208 | 0.21 % |
20 | oathbreaker | 188 | 0.19 % |
21 | gladiator | 77 | 0.08 % |
22 | oldschool | 34 | 0.03 % |
23 | future | 22 | 0.02 % |
24 | predh | 1 | 0.00 % |
| | TOTAL | 100444 | 100.00 % |
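
The Percentage column in the table above is each format's share of all scraped decks. Here is a minimal sketch of how such a breakdown could be tallied, assuming each deck's format is available as a plain string. The `format_breakdown` helper and the sample input are hypothetical and not part of this project's code.

```python
from collections import Counter


def format_breakdown(formats):
    """Return (rank, name, count, percentage) rows, most common first."""
    counts = Counter(formats)
    total = sum(counts.values())
    rows = []
    for rank, (name, count) in enumerate(counts.most_common(), start=1):
        # Percentage is rounded to two decimals, as in the table.
        rows.append((rank, name, count, round(100 * count / total, 2)))
    return rows


# Hypothetical input: one format string per scraped deck.
sample = ["commander"] * 3 + ["standard"] * 2 + ["modern"]
for rank, name, count, pct in format_breakdown(sample):
    print(f"{rank} | {name} | {count} | {pct:.2f} %")
```

The same tally works for the per-source breakdown if the input is a list of source names instead of formats.
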
No | Source | Count | Percentage |
---|---|---|---|
1 | moxfield.com | 45823 | 45.62 % |
2 | arena.decklist | 9410 | 9.37 % |
3 | mtggoldfish.com | 7761 | 7.73 % |
4 | aetherhub.com | 7758 | 7.72 % |
5 | mtgo.com | 6744 | 6.71 % |
6 | archidekt.com | 4143 | 4.12 % |
7 | mtgdecks.net | 2993 | 2.98 % |
8 | tcgplayer.com | 2174 | 2.16 % |
9 | melee.gg | 1940 | 1.93 % |
10 | mtga.untapped.gg | 1920 | 1.91 % |
11 | tappedout.net | 1562 | 1.56 % |
12 | mtg.cardsrealm.com | 1343 | 1.34 % |
13 | streamdecker.com | 1226 | 1.22 % |
14 | mtgtop8.com | 1160 | 1.15 % |
15 | magic.gg | 1010 | 1.01 % |
16 | deckstats.net | 532 | 0.53 % |
17 | mtgazone.com | 431 | 0.43 % |
18 | hareruyamtg.com | 351 | 0.35 % |
19 | pennydreadfulmagic.com | 250 | 0.25 % |
20 | scryfall.com | 247 | 0.25 % |
21 | flexslot.gg | 246 | 0.24 % |
22 | pauperwave.com | 219 | 0.22 % |
23 | magic-ville.com | 217 | 0.22 % |
24 | channelfireball.com | 170 | 0.17 % |
25 | topdecked.com | 157 | 0.16 % |
26 | old.starcitygames.com | 151 | 0.15 % |
27 | manabox.app | 121 | 0.12 % |
28 | manatraders.com | 56 | 0.06 % |
29 | tcdecks.net | 55 | 0.05 % |
30 | mtgstocks.com | 44 | 0.04 % |
31 | manastack.com | 39 | 0.04 % |
32 | mtgsearch.it | 37 | 0.04 % |
33 | deckbox.org | 28 | 0.03 % |
34 | paupermtg.com | 26 | 0.03 % |
35 | cardhoarder.com | 24 | 0.02 % |
36 | mtgarena.pro | 22 | 0.02 % |
37 | cyclesgaming.com | 20 | 0.02 % |
38 | app.cardboard.live | 18 | 0.02 % |
39 | 17lands.com | 11 | 0.01 % |
40 | magicblogs.de | 4 | 0.00 % |
41 | mtgotraders.com | 1 | 0.00 % |
| | TOTAL | 100444 | 100.00 % |
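
The `arena.decklist` source above presumably corresponds to text decklists pasted straight into video descriptions. To illustrate what parsing such text involves, here is a rough sketch that reads Arena-style lines into plain dictionaries. It is not this project's parser (which builds Deck objects backed by Scryfall data); `parse_arena_decklist`, the regex and the set of section headers are assumptions made for the example.

```python
import re
from collections import defaultdict

# Matches "4 Lightning Bolt" as well as "4 Lightning Bolt (M10) 146".
LINE = re.compile(
    r"^(?P<qty>\d+)\s+(?P<name>.+?)(?:\s+\((?P<set>\w+)\)\s+(?P<num>\S+))?$"
)
# Assumed section headers; anything else is treated as a card line.
SECTIONS = {"deck", "sideboard", "commander", "companion"}


def parse_arena_decklist(text):
    """Map section name -> {card name: quantity} for an Arena-style decklist."""
    sections = defaultdict(dict)
    current = "deck"  # lines before any header count as maindeck
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.lower() in SECTIONS:
            current = line.lower()
            continue
        match = LINE.match(line)
        if match:  # lines that do not look like card entries are skipped
            name = match["name"]
            qty = int(match["qty"])
            sections[current][name] = sections[current].get(name, 0) + qty
    return dict(sections)


sample_text = """Deck
4 Lightning Bolt (M10) 146
20 Mountain

Sideboard
2 Smash to Smithereens
"""
parsed = parse_arena_decklist(sample_text)
print(parsed["deck"]["Lightning Bolt"])
```

A real parser also has to cope with surrounding description text, MTGO-style lists that separate the sideboard with a blank line, and card names that need resolving against card data. None of that is attempted here.
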