-
Notifications
You must be signed in to change notification settings - Fork 1.1k
/
index.Rmd
78 lines (53 loc) · 6.77 KB
/
index.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
---
title: "Introduction to Data Science"
subtitle: "Data Analysis and Prediction Algorithms with R"
author: "Rafael A. Irizarry"
documentclass: krantz
bibliography: [book.bib, packages.bib]
biblio-style: apalike
link-citations: yes
colorlinks: yes
lot: no
lof: no
graphics: yes
urlcolor: blue
geometry: "left=1.5in, right=1.5in, top=1.25in, bottom=1.25in"
description: This book introduces concepts and skills that can help you tackle real-world
data analysis challenges. It covers concepts from probability, statistical inference,
linear regression and machine learning and helps you develop skills such as R programming,
data wrangling with dplyr, data visualization with ggplot2, file organization with
UNIX/Linux shell, version control with GitHub, and reproducible document preparation
with R markdown.
#documentclass: book
#classoption: openany
site: bookdown::bookdown_site
always_allow_html: yes
---
```{r include=FALSE}
# automatically create a bib database for R packages
knitr::write_bib(c(
.packages(), 'bookdown', 'knitr', 'rmarkdown'), 'packages.bib')
```
<hr>
# New edition available {-}
This is the website for the first edition of Introduction to Data Science. **This book is now out-of-date**. We recommend using the second edition which is now divided into two parts:
* [Data Wrangling and Visualization with R](https://rafalab.dfci.harvard.edu/dsbook-part-1/).
* [Statistics and Prediction Algorithms Through Case Studies](https://rafalab.dfci.harvard.edu/dsbook-part-2/).
# Preface for first edition {-}
<div style= "float:right;position: relative; middle: -80px;box-shadow: 0 .5rem 1rem rgba(0,0,0,.15);"><a href="https://leanpub.com/datasciencebook"><img src = "logo.png" width = "250"></a></div>
This book started out as the class notes used in the
HarvardX [Data Science Series](https://www.edx.org/professional-certificate/harvardx-data-science)<!--^[https://www.edx.org/professional-certificate/harvardx-data-science]. -->
A **hardcopy** version of the book is available from [CRC Press]( https://www.routledge.com/Introduction-to-Data-Science-Data-Analysis-and-Prediction-Algorithms-with/Irizarry/p/book/9780367357986?utm_source=author&utm_medium=shared_link&utm_campaign=B043135_jm1_5ll_6rm_t081_1al_introductiontodatascienceauthorshare)<!--^[https://www.crcpress.com/Introduction-to-Data-Science-Data-Analysis-and-Prediction-Algorithms-with/Irizarry/p/book/9780367357986]. -->
A **free PDF** of the October 24, 2019 version of the book is available from [Leanpub](https://leanpub.com/datasciencebook)<!--^[https://leanpub.com/datasciencebook].-->
A version in **Spanish** is available from [https://rafalab.github.io/dslibro](https://rafalab.github.io/dslibro/).
This book was published with [bookdown](https://github.com/rstudio/bookdown). The **R markdown code** used to generate the book is available on [GitHub](https://github.com/rafalab/dsbook)<!--^[https://github.com/rafalab/dsbook]-->. Note that, the graphical theme used for plots throughout the book can be recreated using the `ds_theme_set()` function from __dslabs__ package.
This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International [CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0).
We make announcements related to the book on Twitter. For updates follow [\@rafalab](https://twitter.com/rafalab).
Last update: `r lubridate::now()`
# Acknowledgments {-}
This book is dedicated to all the people involved in building and maintaining R and the R packages we use in this book. A special thanks to the developers and maintainers of R base, the tidyverse, and the caret package.
A special thanks to my tidyverse guru David Robinson and Amy Gill for dozens of comments, edits, and suggestions. Also, many thanks to Stephanie Hicks who twice served as a co-instructor in my data science classes and Yihui Xie who patiently put up with my many questions about bookdown. Thanks also to Karl Broman, from whom I borrowed ideas for the Data Visualization and Productivity Tools parts, and to Héctor Corrada-Bravo, for advice on how to best teach machine learning. Thanks to Peter Aldhous from whom I borrowed ideas for the principles of data visualization section and Jenny Bryan for writing _Happy Git and GitHub for the useR_, which influenced our Git chapters. Thanks to
Alyssa Frazee for helping create the homework problem that became the Recommendation Systems chapter and to Amanda Cox for providing the New York Regents exams data. Also, many thanks to Jeff Leek, Roger Peng, and Brian Caffo, whose class inspired the way this book is divided and to Garrett Grolemund and Hadley Wickham for making the bookdown code for their R for Data Science book open. Finally, thanks to Alex Nones for proofreading the manuscript during its various stages.
This book was conceived during the teaching of several applied statistics courses, starting over fifteen years ago. The teaching assistants working with me throughout the years made important indirect contributions to this book. The latest iteration of this course is a HarvardX series coordinated by Heather Sternshein and Zofia Gajdos. We thank them for their contributions. We are also grateful to all the students whose questions and comments helped us improve the book. The courses were partially funded by NIH grant R25GM114818. We are very grateful to the National Institutes of Health for its support.
A special thanks goes to all those who edited the book via GitHub pull requests or made suggestions by creating an _issue_ or sending an email: `nickyfoto` (Huang Qiang), `desautm` (Marc-André Désautels), `michaschwab` (Michail Schwab), `alvarolarreategui` (Alvaro Larreategui), `jakevc` (Jake VanCampen), `omerta` (Guillermo Lengemann), `espinielli` (Enrico Spinielli), `asimumba`(Aaron Simumba), `braunschweig` (Maldewar), `gwierzchowski` (Grzegorz Wierzchowski), `technocrat` (Richard Careaga), `atzakas`, `defeit` (David Emerson Feit), `shiraamitchell` (Shira Mitchell), `Nathalie-S`, `andreashandel` (Andreas Handel), `berkowitze` (Elias Berkowitz), `Dean-Webb` (Dean Webber), `mohayusuf`, `jimrothstein`, `mPloenzke` (Matthew Ploenzke), `NicholasDowand` (Nicholas Dow), `kant` (Darío Hereñú), `debbieyuster` (Debbie Yuster), `tuanchauict` (Tuan Chau), `phzeller`, `BTJ01` (BradJ), `glsnow` (Greg Snow), `mberlanda` (Mauro Berlanda), `wfan9`, `larswestvang` (Lars Westvang), `jj999` (Jan Andrejkovic), `Kriegslustig` (Luca Nils Schmid), `odahhani`, `aidanhorn` (Aidan Horn), `atraxler` (Adrienne Traxler), `alvegorova`,`wycheong` (Won Young Cheong),
`med-hat` (Medhat Khalil), `kengustafson`, `Yowza63`, `ryan-heslin` (Ryan Heslin), `raffaem`, `tim8west`, `lgatto` (Laurent Gatto), `jooleer`, `pauluhn` (Paul), `pete-murphy` (Pete Murphy), David D. Kane, El Mustapha El Abbassi, Vadim Zipunnikov, Anna Quaglieri, Chris Dong, Bowen Gu, and Rick Schoenberg.