Linear regression options and insights using Python and PyShiny

denisecase/python-ml-linear-regression

Why Do We Square Residuals in Linear Regression?

Try it here: Comparison App Running in PyShiny

This repo (python-ml-linear-regression) is for exploring simple linear regression using Python (and PyShiny).

Explore the PyShiny Playground

We use the Shiny for Python (PyShiny) online playground to explore.

Go to the Plotly Example at the link above. Notice there are two code files:

  • app.py (visible tab)
  • requirements.txt (background tab)

Click the run arrow to see the current chart.

We can use PyShiny to explore linear regression using Python.

Example 1: Random Linear Regression

Go to the PyShiny Playground. Run the Plotly Example.

For this example, we need to edit one file:

  • We'll replace the code in app.py in the Playground with our app_random.py code.

app.py

This Repo: Click on app_random.py above. Select all the code (CTRL A if Windows or CMD A if Mac). Copy the code to your clipboard (CTRL C if Windows, or CMD C if Mac).

Browser: Click in the Playground app.py tab. Select all the code in the app.py playground example (CTRL A if Windows or CMD A if Mac). Paste the code from your clipboard (app_random.py as shown in this repo) into app.py in the Playground (CTRL V if Windows, or CMD V if Mac). Verify the code copied correctly.

Click the run arrow (⏵) to see the updated app.
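The app_random.py code lives in the repo and is not reproduced here. To give a rough idea of what a random linear regression involves, here is a minimal sketch that uses only the Python standard library. The function names, the default slope, intercept, noise level, and seed are all illustrative assumptions and are not taken from app_random.py.

```python
import random


def make_random_line_data(n=50, slope=2.0, intercept=1.0, noise=1.0, seed=0):
    """Generate n noisy points scattered around y = intercept + slope * x.

    All defaults are hypothetical values chosen for illustration.
    """
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [intercept + slope * x + rng.gauss(0, noise) for x in xs]
    return xs, ys


def fit_least_squares(xs, ys):
    """Return (slope, intercept) of the line minimizing the sum of squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


xs, ys = make_random_line_data()
slope, intercept = fit_least_squares(xs, ys)
print(f"fitted slope={slope:.3f}, intercept={intercept:.3f}")
```

The fitted values should land near the slope and intercept used to generate the data, with the gap depending on the noise level and the number of points.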

Example 2: Comparing Linear Regression Options

Go to the PyShiny Playground. Run the Plotly Example.

For this example, we need three files:

  • We'll replace the code in app.py in the Playground with our app.py code.
  • We'll edit the requirements.txt file to add more packages.
  • We'll add another file named utils.py that has some linear regression code.

app.py

This Repo: Click on app.py above. Select all the code (CTRL A if Windows or CMD A if Mac). Copy the code to your clipboard (CTRL C if Windows, or CMD C if Mac).

Browser: Click in the Playground app.py tab. Select all the code in the app.py playground example (CTRL A if Windows or CMD A if Mac). Paste the code from your clipboard (app.py as shown in this repo) into app.py in the Playground (CTRL V if Windows, or CMD V if Mac). Verify the code copied correctly.

requirements.txt

This Repo: Click on requirements.txt above. Select all the code. Copy the code to your clipboard.

Browser: Click in the Playground requirements.txt tab. Select all the requirements.txt code in the playground example. Paste the code from your clipboard (requirements.txt as shown in this repo) into requirements.txt in the Playground. Verify the code copied correctly.

utils.py

This Repo: Click on utils.py above. Select all the code. Copy the code to your clipboard.

Browser: Click in the Playground code window. Add a new code file. Name the file exactly utils.py. Spelling and capitalization should be exact. Paste the code from your clipboard (utils.py as shown in this repo) into your new utils.py file in the Playground. Verify the code copied correctly.

Click the run arrow (⏵) to see the updated app.
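The comparison in this example rests on how residuals are scored. The repo's utils.py is not shown here, so the following is a hypothetical sketch (the function names and the data are assumptions for illustration) of the two scoring rules, squared and absolute, applied to the same candidate line.

```python
def residuals(xs, ys, slope, intercept):
    """Vertical distance from each point to the line y = intercept + slope * x."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


def sum_squared_error(res):
    """Score used by ordinary least squares."""
    return sum(r * r for r in res)


def sum_absolute_error(res):
    """Score used by the least absolute deviations ("least distance") line."""
    return sum(abs(r) for r in res)


# Made-up data: four points exactly on y = 2x, plus one outlier at x = 5.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 30]

res = residuals(xs, ys, slope=2.0, intercept=0.0)
print("residuals:", res)
print("sum of squares:", sum_squared_error(res))        # 400.0
print("sum of absolute values:", sum_absolute_error(res))  # 20.0
```

The single outlier misses the line by 20, which contributes 20 to the absolute score but 400 to the squared score. That difference in weighting is what the app lets you explore.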

Links to the Apps Running in Shinylive

Example 1: Random Linear Regression

Example 2: Comparing Linear Regression Options

Acknowledgements

Many thanks to the creators, contributors, and maintainers for making these powerful tools available for free:

  • Python
  • PyShiny
  • GitHub
  • pandas
  • plotly
  • numpy
  • scipy
  • statsmodels
  • scikit-learn

Screenshot Example 1

[Image: Example 1]

Screenshot Example 2 (Dataset 1)

[Image: Example 2, Dataset 1]

Screenshot Example 2 (Dataset 2)

[Image: Example 2, Dataset 2]

Historical Note

Back in his 1970 book, "Statistical Problems and How To Solve Them", L.H. Longley-Cook noted (as a footnote on page 153):

"The least squares line is not completely satisfactory because it gives too great a weight to extreme values. Some writers have proposed the least distance line, when |D1| + |D2| + |D3| + |D4| + ... is a minimum. |D| is the distance taken as positive in each case. In the past, it has been difficult to calculate this line because of the sign problem, but this can be overcome with modern computers."

That was 50 years ago. Why are we still avoiding "difficult" absolute value calculations by squaring?
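To make the "least distance line" concrete, here is a minimal sketch that fits it by brute force and compares it with the least squares line. It relies on a known property of least absolute deviations in simple regression: some optimal line passes through at least two of the data points, so trying every pair of points is enough. The function names and data are illustrative assumptions, this is not the method used in the repo, and the approach is far too slow for large datasets.

```python
from itertools import combinations


def fit_least_squares(xs, ys):
    """Return (slope, intercept) minimizing the sum of squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_least_absolute(xs, ys):
    """Return (slope, intercept) minimizing |D1| + |D2| + ... by trying every pair of points.

    Returns None if no two points have different x values.
    """
    best_line = None
    best_cost = float("inf")
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        if x1 == x2:
            continue  # a vertical line cannot be written as y = intercept + slope * x
        slope = (y2 - y1) / (x2 - x1)
        intercept = y1 - slope * x1
        cost = sum(abs(y - (intercept + slope * x)) for x, y in zip(xs, ys))
        if cost < best_cost:
            best_cost = cost
            best_line = (slope, intercept)
    return best_line


# Made-up data: four points exactly on y = 2x, plus one extreme value.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 30]

print("least squares line (slope, intercept):", fit_least_squares(xs, ys))
print("least distance line (slope, intercept):", fit_least_absolute(xs, ys))
```

On this data the least squares line is pulled to a slope of 6 by the one extreme value, while the least distance line stays at slope 2 through the other four points, which is the behavior the footnote describes.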
