Analysing whether the color of a waitress's or waiter's T-shirt affects whether she or he gets tipped.

edkry/isTippedBasedOnTShirtColor

isTippedBasedOnTShirtColor

Task and data

It is thought that customers tip more often when the waitress is wearing a red T-shirt. Let's dive into the data and see whether that's true. Our dataset has these variables:

  • Color - indicates the color of the T-shirt the waitress is wearing.
  • Tipnum - indicates whether the customer tipped. When tipnum = 1, the customer tipped.
  • Male - indicates the customer's sex. When male = 1, the customer is male.
  • Black - when black = 1, the waitress is wearing a black T-shirt; when black = 0, she is not.
  • White - same coding as Black, for a white T-shirt.
  • Yellow - same coding as Black, for a yellow T-shirt.
  • Blue - same coding as Black, for a blue T-shirt.
  • Green - same coding as Black, for a green T-shirt.

Note: when the variables black, white, yellow, blue and green are all equal to zero, the waitress is wearing a red T-shirt. Dataset report:

image

Here the sampsz variable indicates how many observations we have in each group.
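The dummy coding described above can be sketched in a few lines of Python. This is only an illustration: the helper name `shirt_color` and the dict-based row layout are assumptions, not part of the original analysis.

```python
# Hypothetical helper: the name and the dict-based row layout are assumptions.
DUMMY_COLORS = ("black", "white", "yellow", "blue", "green")


def shirt_color(row):
    """Return the T-shirt color encoded by the 0/1 dummy variables in `row`."""
    active = [c for c in DUMMY_COLORS if row.get(c, 0) == 1]
    if len(active) > 1:
        raise ValueError(f"more than one color dummy is set: {active}")
    # All dummies equal to zero means the reference category, red.
    return active[0] if active else "red"


print(shirt_color({"black": 0, "white": 0, "yellow": 0, "blue": 0, "green": 0}))  # red
print(shirt_color({"black": 1, "white": 0, "yellow": 0, "blue": 0, "green": 0}))  # black
```

Red has no variable of its own, so it acts as the baseline that every other color is compared against in the regression.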

Analysis results and conclusions

I'm going to work only with the observations where the customer is a man. I'm analysing this logistic regression model:

image

First of all, let's see if we have enough data in each color group to work with:

image

It seems that we have enough data, since the frequency of each group is over 5.
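The same kind of check can be sketched in Python. The function name `small_groups` and the sample list are made up for illustration; the cutoff of 5 is the rule of thumb used above.

```python
from collections import Counter


def small_groups(colors, min_count=5):
    """Return the groups whose frequency is NOT over `min_count`."""
    counts = Counter(colors)
    return {color: n for color, n in counts.items() if n <= min_count}


# Illustrative data only, not the real dataset.
sample = ["red"] * 12 + ["black"] * 9 + ["white"] * 3
print(small_groups(sample))
```

An empty result means every group passes the check.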

Next, let's see if every independent variable is statistically significant:

image

Since our confidence level is 0.95 (that is, the significance level is 0.05), we can see that the yellow variable is statistically insignificant, so we should remove it from the model.

Once it's removed, let's check the Analysis of Maximum Likelihood Estimates table again:

image

Now we see that every variable is statistically significant, so we can continue working with this model.

Our convergence criterion is satisfied:

image

The AIC criterion shows that our model is suitable as well:

image

Let's see how well our model fits the data:

image

From this table, we can see that the c statistic is only 0.593. This means the model is better than predicting the outcome at random (which would give c = 0.5), but its discriminating power is still low.
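To make the c statistic concrete, here is a minimal Python sketch of how it can be computed from observed outcomes and predicted probabilities. It takes every pair of one tipped and one non-tipped observation and counts how often the tipped one received the higher predicted probability, with ties counting as half. The function name `c_statistic` is an assumption, and this is not the code that produced the table above.

```python
def c_statistic(y_true, y_prob):
    """Concordance (c) statistic for a binary outcome; ties count as 0.5."""
    events = [p for y, p in zip(y_true, y_prob) if y == 1]
    non_events = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = sum(1 for e in events for n in non_events if e > n)
    tied = sum(1 for e in events for n in non_events if e == n)
    return (concordant + 0.5 * tied) / (len(events) * len(non_events))
```

A model that assigns the same probability to everyone gets c = 0.5, and a model that ranks every tipped observation above every non-tipped one gets c = 1.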

The classification table shows practically the same:

image

Also, from this table we can see that the classification threshold works best when it is between 0.4 and 0.56.
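As a rough sketch of what a classification table does at a single threshold: each observation is predicted as "tipped" when its predicted probability is at or above the threshold, and the predictions are compared with the observed outcomes. The function names are hypothetical, and the "at or above" rule is an assumption about how the cutoff is applied.

```python
def classification_counts(y_true, y_prob, threshold):
    """Count correct and incorrect predictions at one probability threshold."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for y, p in zip(y_true, y_prob):
        predicted = 1 if p >= threshold else 0  # assumed cutoff rule
        if predicted == 1:
            counts["tp" if y == 1 else "fp"] += 1
        else:
            counts["tn" if y == 0 else "fn"] += 1
    return counts


def accuracy(counts):
    """Share of observations classified correctly."""
    total = sum(counts.values())
    return (counts["tp"] + counts["tn"]) / total
```

Repeating this for a grid of thresholds and comparing the accuracy is one way to see which threshold range works best.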

Now, let's see if we have any outliers:

image

image

image

We can see that we don't have any outliers, since the Pearson residual and DFBETAS values do not exceed their limits.

Thus, our model is suitable. We have this model:

image

Model for the population:

image

Model for our data sample:

image

image

In order to calculate the probability of getting a tip, P(tipnum = 1), we need to use this formula:

image
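For reference, the standard logistic regression formula for the four color variables that remain in the model can be written as follows. The coefficient symbols b_0, ..., b_4 are assumed notation; the estimated values are the ones shown in the sample model above.

```latex
P(\text{tipnum} = 1) = \frac{1}{1 + e^{-z}},
\qquad
z = b_0 + b_1\,\text{black} + b_2\,\text{white} + b_3\,\text{blue} + b_4\,\text{green}
```

For a red T-shirt all four color variables are zero, so z reduces to the intercept b_0.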

Plugging in the specific values gives these probabilities of getting tipped:

  • Red - 0.1846
  • Black - 0.2841
  • White - 0.2586
  • Green - 0.2513
  • Blue - 0.2563

We can see that a waitress wearing a red T-shirt has the lowest chance of getting tipped, which does not support the initial assumption. A waitress wearing a black T-shirt has the highest chance of getting tipped, but the value isn't very different from the other color groups, which means that the color of the T-shirt doesn't make a big difference.
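One way to compare the colors on the model's own scale is to convert the probabilities above back into log-odds and odds ratios relative to red. The sketch below uses only the rounded probabilities reported here, so the implied values are approximate and are not the coefficients from the output tables. The names `logit`, `inv_logit` and `odds_ratios` are illustrative.

```python
import math


def logit(p):
    """Log-odds of a probability."""
    return math.log(p / (1 - p))


def inv_logit(z):
    """Logistic function: turns log-odds back into a probability."""
    return 1 / (1 + math.exp(-z))


# Rounded probabilities reported above.
probs = {"red": 0.1846, "black": 0.2841, "white": 0.2586,
         "green": 0.2513, "blue": 0.2563}

# Red is the reference group, so its log-odds is the implied intercept.
intercept = logit(probs["red"])

odds_ratios = {}
for color, p in probs.items():
    if color == "red":
        continue
    implied_coef = logit(p) - intercept  # implied coefficient for this color
    odds_ratios[color] = math.exp(implied_coef)

for color, ratio in odds_ratios.items():
    print(f"{color}: odds of a tip are {ratio:.2f} times the odds for red")
```

An odds ratio above 1 means the odds of a tip are higher for that color than for red.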
