Web Scraping in the Statistics and Data Science Curriculum: Challenges and Opportunities

Mine Dogucu, University of California Irvine
Mine Çetinkaya-Rundel, University of Edinburgh, RStudio, and Duke University

To cite this article:

Mine Dogucu & Mine Çetinkaya-Rundel (2021) Web Scraping in the Statistics and Data Science Curriculum: Challenges and Opportunities, Journal of Statistics and Data Science Education, 29:sup1, S112-S122, DOI: 10.1080/10691898.2020.1787116

The code for web scraping examples can be found in this repo and on RStudio Cloud.

Preprint of the paper.

This paper is now published online in the Journal of Statistics Education.

Abstract

Best practices in statistics and data science courses include the use of real and relevant data as well as teaching the entire data science cycle, starting with importing data. A rich source of real and current data is the web, where data are often presented and stored in a structure that needs some wrangling and transforming before they are ready for analysis. The web is a resource students naturally turn to when finding data for data analysis projects, but without formal instruction on how to get those data into a structured format, they often resort to copy-pasting or manual entry into a spreadsheet, both of which are time-consuming and error-prone. Teaching web scraping provides an opportunity to bring such data into the curriculum in an effective and efficient way. In this paper, we explain how web scraping works and how it can be implemented in a pedagogically sound and technically executable way at various levels of statistics and data science curricula. We provide classroom activities where we connect this modern computing technique with traditional statistical topics. Lastly, we share the opportunities web scraping brings to classrooms, as well as the challenges instructors face and tips for avoiding them.
