Before you begin:
- "README.txt" - Contains the project write-up with visualizations. Readers are expected to start with this file.
- "DataFrames" - Contains all data sets.
- "Images" - Contains all the images used in "README.txt".
- "Code_Visual" - Contains the code for all the visualizations.
- "Code_Regression" - Contains the regression algorithm itself.
P.S. Comments have been added throughout the code to make it easier to read and understand.
- Introduction
- Task
- Exploratory Data Analysis (more details can be viewed in Code_Visuals.py)
- Search for correlations
- Visualization
- Linear Regression (More details can be viewed in Code_Regression.py)
- Model training
- Prediction
- Conclusion
- Sources
This project uses a data set of used cars and their prices from the Kaggle website.
Analyze the data set and make predictions for a random sample.
After cleaning the data, the next step was to identify the main dependencies, briefly analyze the relationships, and check that they make sense. To start the analysis, I chose a chart that visualizes correlations.
This chart shows several notable correlations with "current price":
- "km"
- "on road now"
- "on road old"
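
The correlation search can be sketched in a few lines of pandas. This is only an illustration, not the code from Code_Visual: the data frame below is synthetic, its values and the assumed relationship between the columns are invented, and only the column names are taken from the list above.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the cleaned data set (all values are invented).
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "km": rng.uniform(50_000, 150_000, n),
    "on road old": rng.uniform(500_000, 700_000, n),
    "on road now": rng.uniform(700_000, 900_000, n),
})

# Assumed relationship: price falls with mileage and rises slightly
# with both on-road prices, plus some random noise.
df["current price"] = (
    0.5 * df["on road old"]
    + 0.5 * df["on road now"]
    - 4.0 * df["km"]
    + rng.normal(0, 10_000, n)
)

# Pearson correlation of every column with the target,
# ordered by absolute strength (strongest first).
corr_with_price = (
    df.corr()["current price"]
    .drop("current price")
    .sort_values(key=lambda s: s.abs(), ascending=False)
)
print(corr_with_price)
```

The full matrix returned by df.corr() can also be passed to seaborn.heatmap (for example with annot=True) to draw a correlation chart.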
Here we check for a negative correlation. The chart confirms that "current price" depends inversely on "km": the higher the mileage, the lower the price.
Next, there is a less obvious correlation between "current price" and "on road now". It is weak, but it exists.
As in the previous graph, there is a correlation with "on road old", but it is minimal.
After splitting the sample into training and test sets, we train the model on the training data. Then we predict the target values for the test sample ("Y test").
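
A minimal sketch of this split, train, and predict workflow with scikit-learn is shown here. It is not the code from Code_Regression: the data is synthetic, and the 80/20 split, the random seeds, and the use of R^2 as a quality check are assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic features and target (all values are invented for illustration).
rng = np.random.default_rng(42)
n = 500
X = pd.DataFrame({
    "km": rng.uniform(50_000, 150_000, n),
    "on road old": rng.uniform(500_000, 700_000, n),
    "on road now": rng.uniform(700_000, 900_000, n),
})
y = (
    0.5 * X["on road old"]
    + 0.5 * X["on road now"]
    - 4.0 * X["km"]
    + rng.normal(0, 10_000, n)
)

# Hold out 20% of the rows as the test sample (assumed ratio).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit ordinary least squares on the training rows only.
model = LinearRegression()
model.fit(X_train, y_train)

# Predict prices for the held-out rows and compare with the true values.
y_pred = model.predict(X_test)
score = r2_score(y_test, y_pred)
print(f"R^2 on the test split: {score:.3f}")
```

Scoring on rows the model has never seen gives a fairer picture of how it would handle a new random sample than scoring on the training data.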
Our linear regression model can make predictions for a random sample of used car data. We also found that the distance a car has traveled has the greatest influence on its price.
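
One possible way to compare the influence of the features is to fit the regression on standardized inputs, so that the coefficients share a common scale. The sketch below assumes this approach and uses synthetic data with invented values. It does not reproduce the project's results.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Synthetic data (invented values); columns follow the order in `features`.
rng = np.random.default_rng(1)
n = 500
features = ["km", "on road old", "on road now"]
X = np.column_stack([
    rng.uniform(50_000, 150_000, n),
    rng.uniform(500_000, 700_000, n),
    rng.uniform(700_000, 900_000, n),
])
y = -4.0 * X[:, 0] + 0.5 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(0, 10_000, n)

# Scale every feature to zero mean and unit variance so that each
# coefficient measures the price change per standard deviation.
X_scaled = StandardScaler().fit_transform(X)
model = LinearRegression().fit(X_scaled, y)

# Map each feature to its coefficient and pick the largest in magnitude.
influence = dict(zip(features, model.coef_))
strongest = max(influence, key=lambda name: abs(influence[name]))
print(influence)
print("Strongest influence:", strongest)
```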
- Dataset: https://www.kaggle.com/mayankpatel14/second-hand-used-cars-data-set-linear-regression
- Seaborn documentation: https://seaborn.pydata.org/introduction.html
- Sklearn documentation: https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html