Skip to content

pdcherry/statistics

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

17 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

statistics readme

Patrick Cherry

emmeans: Estimated marginal means

Link to output

Estimated marginal means (EMMs, previously known as least-squares means in the context of traditional regression models) are derived by using a model to make predictions over a regular grid of predictor combinations (called a reference grid).

I use estimated marginal means to estimate the effect sizes when:

  • interaction effects are present,
  • when multiple effects are present—but are not scaled the same way (e.g. one effect is linear, one effect is reciprocal (1/x) ),
  • when variability is not homoscedastic (e.g. as in a glm) and I could use the confidence intervals offered by emmeans that I don’t get with ordinary means,
  • or when the experiment’s sampling is not balanced, causing some conditions to be over-weighted in the ordinary means.

DoE: Design of Experiment

Link to output

Design of Experiment principles seek to configure the samples, variables, and controls in a scientific experimental plan to answer the question posed or test the hypothesis while controlling for known sources of variability and confounding due to the methods and materials used to carry out the experiment.

Here, I use the and the standard R function gen.factorial to make a full factorial design and then use the package AlgDesign to add blocking for two operators who will be carrying out the experiment (with n = 3 replicates for each unique sample condition).

Random forest classification

Link to output

Random forest model classification of legal status of trees in San Francisco Department of Public Works data from a Tidy Tuesday project.

Here, I used ranger engine to train a random forest model to classify the legal status of the trees using all relevant observations, use the tune package to run hyperperamater optimization on mtry and min_n, and then evaluate the accuracy of the model using AUC and by plotting the correct and incorrect predictions in a map-like format.

Anscombe’s Quartet

Link to output

Anscombe’s quartet is a set of four x : y value pairs published by F J Anscombe in American Statistician in 1793. The sets have nearly identical descriptive statistics, like mean, standard deviation, R^2 correlation, and least-squares regression slopes, (to ~ 3 decimal places), but are clearly very different data sets when visualized by plotting.

I use unpivotr to tidy the data upon import, ggplot to make plots, and purrr’s map for functional programming on nested dataframes with broom for model object manipulation, included nested in dataframes.

About

Exercises & Explorations of Statistics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published