The data this week comes from Steam by way of Kaggle; it was originally scraped from SteamCharts and then uploaded to Kaggle.
Note that a different video game dataset appeared in a 2019 TidyTuesday. Some of its data could possibly be joined to this week's data on "name".
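As a rough illustration of what joining on "name" might look like, here is a minimal base R sketch. It uses two tiny made-up data frames rather than the real files. The `game` column in the second frame, the `price` column, and the `name_key` helper column are assumptions for illustration only; check the actual column names in the 2019 data before adapting this. Names are lowercased and trimmed first, since exact string matches between independently scraped sources are unlikely to line up otherwise.

```r
# Toy stand-in for this week's data (values are made up)
steam <- data.frame(
  gamename = c("Game A", "Game B "),
  avg = c(1200, 300),
  stringsAsFactors = FALSE
)

# Toy stand-in for the other dataset; the `game` and `price`
# columns are assumed here, not taken from the real file
other <- data.frame(
  game = c("game a", "Game C"),
  price = c(9.99, 19.99),
  stringsAsFactors = FALSE
)

# Build a normalized join key: trim whitespace, lowercase
steam$name_key <- tolower(trimws(steam$gamename))
other$name_key <- tolower(trimws(other$game))

# Left join: keep every Steam row, attach matches where they exist
joined <- merge(steam, other, by = "name_key", all.x = TRUE)
print(joined)
```

Rows without a match keep `NA` in the columns that come from the second frame, which makes it easy to see how much of the data actually joined.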
Additionally we are doing a crossover with the "Sliced" data science challenge this week!
Make sure to tune in to "Sliced" on Nick Wan's Twitch stream, Tuesday March 16th at 8:30 pm ET!
What is Sliced? It's like Chopped but for Data Science!
Data scientists get data they have never seen and have 2 hours to make a predictive model. Create the best data science or be sliced!
This is in line with the TidyTuesday efforts, and I look forward to seeing what they do on the stream.
# Get the Data
```r
# Read in with tidytuesdayR package
# Install from CRAN via: install.packages("tidytuesdayR")
# This loads the readme and all the datasets for the week of interest

# Either ISO-8601 date or year/week works!

tuesdata <- tidytuesdayR::tt_load('2021-03-16')
tuesdata <- tidytuesdayR::tt_load(2021, week = 12)

games <- tuesdata$games

# Or read in the data manually

games <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2021/2021-03-16/games.csv')
```
variable | class | description |
---|---|---|
gamename | character | Name of the video game |
year | double | Year of measure |
month | character | Month of measure |
avg | double | Average number of players at the same time |
gain | double | Gain (or loss): difference in average compared to the previous month (NA = first month) |
peak | double | Highest number of players at the same time |
avg_peak_perc | character | Average as a share of the peak (avg / peak), in % |
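
Because `month` and `avg_peak_perc` are stored as character, a little conversion is usually needed before plotting or modeling. The base R sketch below shows one way to do it on a few made-up rows shaped like `games.csv`. It assumes that `month` holds full English month names and that `avg_peak_perc` looks like `"50%"`; the new column names (`avg_peak_pct`, `month_num`, `date`, `gain_check`) are illustrative, not part of the dataset.

```r
# Toy rows shaped like games.csv (values are made up for illustration)
games_toy <- data.frame(
  gamename = c("Game A", "Game A", "Game B"),
  year = c(2021, 2021, 2021),
  month = c("January", "February", "February"),
  avg = c(100, 150, 40),
  peak = c(200, 250, 100),
  avg_peak_perc = c("50%", "60%", "40%"),
  stringsAsFactors = FALSE
)

# Strip the trailing "%" and convert the share to a number
games_toy$avg_peak_pct <- as.numeric(
  sub("%", "", games_toy$avg_peak_perc, fixed = TRUE)
)

# Map month names to numbers (assumes full English names),
# then build a first-of-month date for sorting and plotting
games_toy$month_num <- match(trimws(games_toy$month), month.name)
games_toy$date <- as.Date(
  sprintf("%d-%02d-01", as.integer(games_toy$year), games_toy$month_num)
)

# Recompute gain as the month-over-month change in avg within each game;
# the first month of each game has no previous value, so it is NA
games_toy <- games_toy[order(games_toy$gamename, games_toy$date), ]
games_toy$gain_check <- ave(
  games_toy$avg, games_toy$gamename,
  FUN = function(x) c(NA, diff(x))
)

print(games_toy)
```

Comparing a recomputed column like `gain_check` against the provided `gain` can be a quick sanity check on the real data, though small differences from rounding would not be surprising.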
No cleaning this week!