z33kz33k/mtg

mtg

Scrape data on MtG decks.

Description

This is a hobby project.

It started as card data scraping from MTG Goldfish. Then scraping of JumpIn! packet info was added. After that, I played around with Limited data from 17lands, back when I thought I had to put up with the utter boredom of that format (before the dawn of Golden Packs on Arena). [This part has been deprecated and moved to the archive package.] Then I discovered that I don't need to scrape anything, because Scryfall exists.

Then, I quit (Arena).

Now the main focus is the decks package and the yt module (parsing data on YouTubers' decks from YT video descriptions).

What works

  • Scryfall data management via downloading bulk data with scrython and wrapping it in convenient abstractions
  • Scraping YT channels for videos with decklists in descriptions (or comments), using no fewer than four Python libraries to avoid bothering with Google APIs
  • Scraping YT videos' descriptions (or comments) for decks:
    • Text decklists in Arena/MTGO format pasted into video descriptions are parsed into Deck objects
    • Links to decklist services are scraped into Deck objects. 36 services are supported so far
    • Other decklist services are planned (though it does seem like I've pretty much exhausted the possibilities already :))
    • Both Aetherhub decklist types featured in YT videos are supported: regular deck and write-up deck
    • Both Untapped decklist types featured in YT videos are supported: regular deck and profile deck
    • Both old TCGPlayer site and TCGPlayer Infinite are supported
    • Both international and native Hareruya sites are supported
    • LigaMagic is the only sore spot: full support would require me to invest in scraping APIs to bypass their Cloudflare protection (the logic to scrape them is already in place anyway)
    • All of the above work even when they are hidden behind shortener links that need unshortening first
    • Sites that need it are scraped using Selenium
    • Link trees posted in descriptions are expanded
    • Links to pastebin-like services (the kind Amazonian posts), Patreon posts and Google Docs documents are expanded too and further parsed for decks
    • If nothing is found in the video's description, then the author's comments are parsed
    • Deck's name and format are derived (from a video's title, description and keywords) if not readily available
    • Foreign cards and any others that cannot be found in the downloaded Scryfall bulk data are looked up with queries to the Scryfall API
    • Individual decklists are extracted from container pages and further processed for decks. These include:
      • Aetherhub users, events and articles
      • Archidekt folders and users
      • Cardsrealm profiles, folders, tournaments and articles
      • ChannelFireball players, articles and authors
      • CyclesGaming articles
      • Deckbox users and events
      • Deckstats users
      • EDHTop16 tournaments and commanders
      • Flexslot users
      • Goldfish tournaments, players and articles
      • Hareruya events and players
      • LigaMagic events (with caveats)
      • MagicVille events and users
      • ManaStack users
      • Manatraders users
      • Magic.gg events
      • MagicBlogs.de articles
      • Melee.gg tournaments
      • Moxfield bookmarks, users and search results
      • MTGAZone articles and authors
      • MTGDecks.net tournaments
      • MTGO events
      • MTGStocks articles
      • MTGTop8 events
      • Pauperwave articles
      • PennyDreadfulMagic competitions and users
      • StarCityGames events, players, articles and authors' deck databases
      • Streamdecker users
      • TappedOut users, folders, and user folders
      • TCDecks events
      • TCGPlayer (old-site) players
      • TCGPlayer Infinite players, authors, author searches, author deck panes, events and articles
      • TopDeck.gg brackets and profiles
      • Untapped profiles
  • Assessing the meta:
    • Scraping Goldfish and MTGAZone for meta-decks (others are planned)
    • Scraping a singular Untapped meta-deck decklist page
  • Exporting decks to the Forge MTG .dck format or to an Arena decklist saved as a .txt file, with autogenerated, descriptive names based on the scraped deck's metadata
  • Importing back into a Deck from those formats
  • Export/import for other formats is planned
  • Dumping decks, YT videos and channels to .json
  • I compiled a list of almost 1.9k YT channels that feature decks in their descriptions and successfully scraped them (at least 25 videos deep), so this data is now just waiting to be put to creative use!
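
To make the text decklist parsing and the Arena export/import round trip mentioned above more concrete, here is a minimal sketch. It is not the project's actual code: `SimpleDeck`, `parse_arena` and `to_arena` are hypothetical names made up for illustration, and the sketch assumes the common Arena text layout (optional "Deck" and "Sideboard" headers, one "quantity name" entry per line, optionally followed by a set code in parentheses and a collector number).

```python
import re
from dataclasses import dataclass, field

# Matches "4 Lightning Bolt" as well as "4 Lightning Bolt (M10) 146".
CARD_LINE = re.compile(
    r"^(?P<qty>\d+)\s+(?P<name>.+?)(?:\s+\((?P<set>\w+)\)\s+(?P<num>\S+))?$"
)


@dataclass
class SimpleDeck:
    """Hypothetical stand-in for the project's Deck class."""
    main: dict[str, int] = field(default_factory=dict)
    side: dict[str, int] = field(default_factory=dict)


def parse_arena(text: str) -> SimpleDeck:
    """Parse an Arena-style text decklist into a SimpleDeck."""
    deck = SimpleDeck()
    target = deck.main
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            # A blank line after maindeck cards starts the sideboard.
            if deck.main:
                target = deck.side
            continue
        if line.lower() == "deck":
            target = deck.main
            continue
        if line.lower() == "sideboard":
            target = deck.side
            continue
        match = CARD_LINE.match(line)
        if match:  # lines that do not look like card entries are skipped
            name = match["name"]
            target[name] = target.get(name, 0) + int(match["qty"])
    return deck


def to_arena(deck: SimpleDeck) -> str:
    """Export a SimpleDeck back to Arena-style text (card names only)."""
    lines = ["Deck"] + [f"{qty} {name}" for name, qty in deck.main.items()]
    if deck.side:
        lines += ["", "Sideboard"]
        lines += [f"{qty} {name}" for name, qty in deck.side.items()]
    return "\n".join(lines)
```

Parsing the output of `to_arena` with `parse_arena` yields an equal `SimpleDeck`, which is the property an export/import pair should keep. Set codes and collector numbers are matched but deliberately dropped here to keep the sketch short.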

How it looks in a Google Sheet

Most popular channels

Scraped decks breakdown

No Format Count Percentage
1 commander 38460 38.29 %
2 standard 21776 21.68 %
3 modern 9539 9.50 %
4 pauper 6693 6.66 %
5 pioneer 5721 5.70 %
6 legacy 3438 3.42 %
7 brawl 2094 2.08 %
8 historic 1913 1.90 %
9 explorer 1730 1.72 %
10 undefined 1559 1.55 %
11 timeless 1412 1.41 %
12 duel 1389 1.38 %
13 paupercommander 1290 1.28 %
14 premodern 764 0.76 %
15 irregular 703 0.70 %
16 vintage 685 0.68 %
17 alchemy 480 0.48 %
18 penny 268 0.27 %
19 standardbrawl 208 0.21 %
20 oathbreaker 188 0.19 %
21 gladiator 77 0.08 %
22 oldschool 34 0.03 %
23 future 22 0.02 %
24 predh 1 0.00 %
TOTAL 100444 100.00 %

No Source Count Percentage
1 moxfield.com 45823 45.62 %
2 arena.decklist 9410 9.37 %
3 mtggoldfish.com 7761 7.73 %
4 aetherhub.com 7758 7.72 %
5 mtgo.com 6744 6.71 %
6 archidekt.com 4143 4.12 %
7 mtgdecks.net 2993 2.98 %
8 tcgplayer.com 2174 2.16 %
9 melee.gg 1940 1.93 %
10 mtga.untapped.gg 1920 1.91 %
11 tappedout.net 1562 1.56 %
12 mtg.cardsrealm.com 1343 1.34 %
13 streamdecker.com 1226 1.22 %
14 mtgtop8.com 1160 1.15 %
15 magic.gg 1010 1.01 %
16 deckstats.net 532 0.53 %
17 mtgazone.com 431 0.43 %
18 hareruyamtg.com 351 0.35 %
19 pennydreadfulmagic.com 250 0.25 %
20 scryfall.com 247 0.25 %
21 flexslot.gg 246 0.24 %
22 pauperwave.com 219 0.22 %
23 magic-ville.com 217 0.22 %
24 channelfireball.com 170 0.17 %
25 topdecked.com 157 0.16 %
26 old.starcitygames.com 151 0.15 %
27 manabox.app 121 0.12 %
28 manatraders.com 56 0.06 %
29 tcdecks.net 55 0.05 %
30 mtgstocks.com 44 0.04 %
31 manastack.com 39 0.04 %
32 mtgsearch.it 37 0.04 %
33 deckbox.org 28 0.03 %
34 paupermtg.com 26 0.03 %
35 cardhoarder.com 24 0.02 %
36 mtgarena.pro 22 0.02 %
37 cyclesgaming.com 20 0.02 %
38 app.cardboard.live 18 0.02 %
39 17lands.com 11 0.01 %
40 magicblogs.de 4 0.00 %
41 mtgotraders.com 1 0.00 %
TOTAL 100444 100.00 %
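
For reference, a breakdown like the ones above can be computed from a flat list of per-deck labels. This is only a sketch under that assumption: the `breakdown` function and its input shape are hypothetical and do not reflect how the project actually produces its sheets.

```python
from collections import Counter


def breakdown(labels):
    """Return (rows, total) where each row is (rank, label, count, percentage)."""
    counts = Counter(labels)
    total = sum(counts.values())
    rows = [
        (rank, label, count, f"{count / total * 100:.2f} %")
        for rank, (label, count) in enumerate(counts.most_common(), start=1)
    ]
    return rows, total


# Tiny made-up input, just to show the shape of the result.
rows, total = breakdown(["commander", "commander", "commander", "standard"])
for row in rows:
    print(*row)
print("TOTAL", total, "100.00 %")
```

The same formatting applied to the real figures gives matching values, e.g. 38460 out of 100444 decks formats as "38.29 %".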