This repository was archived by the owner on Jun 29, 2024. It is now read-only.

Advanced level Task3 #8

Merged
merged 2 commits into from
May 8, 2024
5 changes: 5 additions & 0 deletions Lakshmiprasanna8008/README.md
@@ -0,0 +1,5 @@
Linear Regression with Scikit-learn:

Applies linear regression with scikit-learn to predict house prices from the Boston housing dataset, compares the training and test scores, and plots the residuals.

This Python script performs linear regression on the Boston housing dataset. It splits the data into training and test sets (80/20), fits a linear regression model on the training set, and reports the model's score (the coefficient of determination, R², as returned by `LinearRegression.score`) on both sets. It then plots the residuals (predicted minus actual values) against the predicted values. Comparing the two scores and inspecting the residual plot helps assess how well the model fits and where it could be improved.
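
As a rough illustration of what the reported score and the residuals mean, here is a minimal, dependency-free sketch. The function names `r2_score_manual` and `residuals` and the sample values are hypothetical and not part of the script; the formula is the standard coefficient of determination, which is what `LinearRegression.score` returns.

```python
def r2_score_manual(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    # Sum of squared prediction errors
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares around the mean of the actual values
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def residuals(y_true, y_pred):
    """Residuals with the script's sign convention: predicted minus actual."""
    return [p - t for t, p in zip(y_true, y_pred)]


# Made-up values, for illustration only (not taken from the Boston dataset)
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.5, 7.0, 9.5]

print("R^2:", r2_score_manual(y_true, y_pred))
print("Residuals:", residuals(y_true, y_pred))
```

A score near 1 means the predictions explain most of the variance in the target. A training score much higher than the test score suggests overfitting.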
45 changes: 45 additions & 0 deletions Lakshmiprasanna8008/Task3
@@ -0,0 +1,45 @@
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
import pandas as pd
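# Load the Boston housing dataset from the original CMU source.
# Each record spans two physical lines in the file, so the rows are re-joined below.
data_url = "http://lib.stat.cmu.edu/datasets/boston"
raw_df = pd.read_csv(data_url, sep=r"\s+", skiprows=22, header=None)
data = np.hstack([raw_df.values[::2, :], raw_df.values[1::2, :2]])
target = raw_df.values[1::2, 2]

# Features (13 columns) and target (median home value, MEDV)
X = data
y = target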

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize the linear regression model
model = LinearRegression()

# Fit the model on the training data
model.fit(X_train, y_train)

# Predict on the training and testing data
y_train_pred = model.predict(X_train)
y_test_pred = model.predict(X_test)

# Calculate the scores
train_score = model.score(X_train, y_train)
test_score = model.score(X_test, y_test)

print("Training score:", train_score)
print("Testing score:", test_score)

# Plot residuals
plt.scatter(y_train_pred, y_train_pred - y_train, c='blue', marker='o', label='Training data')
plt.scatter(y_test_pred, y_test_pred - y_test, c='lightgreen', marker='s', label='Testing data')
plt.xlabel('Predicted values')
plt.ylabel('Residuals')
plt.legend(loc='upper left')
# Zero-residual reference line spanning the full x-axis (no hard-coded limits)
plt.axhline(y=0, lw=2, color='red')
plt.title('Residual plot')
plt.show()