Skip to content
This repository was archived by the owner on Jun 29, 2024. It is now read-only.

Commit 5b75e41

Browse files
authoredMay 8, 2024
Advanced level Task3 (#8)
* Create Task3 * Create README.md
1 parent afbb648 commit 5b75e41

File tree

2 files changed

+50
-0
lines changed

2 files changed

+50
-0
lines changed
 

‎Lakshmiprasanna8008/README.md

+5
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,5 @@
1+
Linear Regression with Scikit-learn:
2+
3+
Completed a program by applying linear regression to predict house prices from the Boston housing dataset using scikit-learn and compared train and test scores and plot residuals.
4+
5+
This Python script utilizes the Boston housing dataset to perform linear regression. After splitting the data into training and testing sets, it trains a linear regression model on the training data and evaluates its performance on both sets. It calculates the model's scores for training and testing data. Additionally, it visualizes the residuals, highlighting discrepancies between predicted and actual values. This analysis aids in assessing the model's fit and identifying potential areas for improvement, crucial for understanding predictive accuracy and model effectiveness in real-world applications.

‎Lakshmiprasanna8008/Task3

+45
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,45 @@
1+
import numpy as np
2+
import matplotlib.pyplot as plt
3+
from sklearn.model_selection import train_test_split
4+
from sklearn.linear_model import LinearRegression
5+
from sklearn.metrics import mean_squared_error
6+
import pandas as pd
7+
data_url = "http://lib.stat.cmu.edu/datasets/boston"
8+
raw_df = pd.read_csv(data_url, sep="\s+", skiprows=22, header=None)
9+
data = np.hstack([raw_df.values[::2, :], raw_df.values[1::2, :2]])
10+
target = raw_df.values[1::2, 2]
11+
12+
# Load the Boston housing dataset
13+
14+
X = data
15+
y = target
16+
17+
# Split the data into training and testing sets
18+
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
19+
20+
# Initialize the linear regression model
21+
model = LinearRegression()
22+
23+
# Fit the model on the training data
24+
model.fit(X_train, y_train)
25+
26+
# Predict on the training and testing data
27+
y_train_pred = model.predict(X_train)
28+
y_test_pred = model.predict(X_test)
29+
30+
# Calculate the scores
31+
train_score = model.score(X_train, y_train)
32+
test_score = model.score(X_test, y_test)
33+
34+
print("Training score:", train_score)
35+
print("Testing score:", test_score)
36+
37+
# Plot residuals
38+
plt.scatter(y_train_pred, y_train_pred - y_train, c='blue', marker='o', label='Training data')
39+
plt.scatter(y_test_pred, y_test_pred - y_test, c='lightgreen', marker='s', label='Testing data')
40+
plt.xlabel('Predicted values')
41+
plt.ylabel('Residuals')
42+
plt.legend(loc='upper left')
43+
plt.hlines(y=0, xmin=0, xmax=50, lw=2, color='red')
44+
plt.title('Residual plot')
45+
plt.show()

0 commit comments

Comments
 (0)
This repository has been archived.