Advanced level Task3 (#8)

Lakshmiprasanna8008 · web-flow · commit 5b75e413593e · 2024-05-08T12:48:14.000+05:30
* Create Task3

* Create README.md
diff --git a/Lakshmiprasanna8008/README.md b/Lakshmiprasanna8008/README.md
@@ -0,0 +1,5 @@
+Linear Regression with Scikit-learn:
+
+This program applies linear regression to predict house prices from the Boston housing dataset using scikit-learn, compares the training and test scores, and plots the residuals.
+
+The Python script loads the Boston housing dataset, splits it into training (80%) and testing (20%) sets, fits a linear regression model on the training data, and reports the model's score (the coefficient of determination, R²) on both sets. It also plots the residuals, computed here as predicted minus actual values, against the predicted values. Comparing the two scores indicates how well the model generalizes, and a residual plot without visible structure suggests that a linear fit is adequate.
diff --git a/Lakshmiprasanna8008/Task3 b/Lakshmiprasanna8008/Task3
@@ -0,0 +1,45 @@
+import matplotlib.pyplot as plt
+import numpy as np
+import pandas as pd
+from sklearn.linear_model import LinearRegression
+from sklearn.metrics import mean_squared_error
+from sklearn.model_selection import train_test_split
+
+# Load the Boston housing dataset from its original source.
+# Each record spans two physical lines in the file, so the rows are re-joined below.
+data_url = "http://lib.stat.cmu.edu/datasets/boston"
+raw_df = pd.read_csv(data_url, sep=r"\s+", skiprows=22, header=None)
+data = np.hstack([raw_df.values[::2, :], raw_df.values[1::2, :2]])  # 13 features
+target = raw_df.values[1::2, 2]  # median home value (MEDV)
+X = data
+y = target
+
+# Split the data into training and testing sets
+X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
+
+# Initialize the linear regression model
+model = LinearRegression()
+
+# Fit the model on the training data
+model.fit(X_train, y_train)
+
+# Predict on the training and testing data
+y_train_pred = model.predict(X_train)
+y_test_pred = model.predict(X_test)
+
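+# Calculate the R^2 scores and the test-set mean squared error
+train_score = model.score(X_train, y_train)
+test_score = model.score(X_test, y_test)
+print("Training R^2 score:", train_score)
+print("Testing R^2 score:", test_score)
+print("Testing MSE:", mean_squared_error(y_test, y_test_pred))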
+
+# Plot residuals
+plt.scatter(y_train_pred, y_train_pred - y_train, c='blue', marker='o', label='Training data')
+plt.scatter(y_test_pred, y_test_pred - y_test, c='lightgreen', marker='s', label='Testing data')
+plt.xlabel('Predicted values')
+plt.ylabel('Residuals')
+plt.legend(loc='upper left')
+plt.axhline(y=0, lw=2, color='red')  # zero-residual reference line across the full axis
+plt.title('Residual plot')
+plt.show()
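
Note on the metrics: the value returned by `model.score` in the script above is the coefficient of determination, R² = 1 - SS_res / SS_tot, and `mean_squared_error` (imported by the script) averages the squared residuals. The sketch below reproduces both formulas in plain Python so the arithmetic is visible. The helper names `mse_plain` and `r2_plain` and the sample prices are hypothetical and are not part of the commit.

```python
def mse_plain(y_true, y_pred):
    """Mean of the squared differences between actual and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_plain(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)          # total sum of squares
    return 1.0 - ss_res / ss_tot


# Hypothetical actual and predicted prices, for illustration only.
y_true = [24.0, 21.6, 34.7, 33.4]
y_pred = [25.0, 22.6, 33.7, 32.4]

mse = mse_plain(y_true, y_pred)
r2 = r2_plain(y_true, y_pred)
print("MSE:", mse)
print("R^2:", r2)
```

An R² of 1 means the predictions match the targets exactly, while a model that always predicts the mean of the targets scores 0. A training score far above the testing score is a common sign of overfitting.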