This document provides instructions for setting up the Python environment on a Linux system using Anaconda and Conda, and it outlines the steps for running a detailed data analysis and visualization of a movie dataset, which is available on Kaggle.
Creating a New Conda Environment:
To create a new isolated environment for the project:
conda create --name myproject python=3.12
This command creates a new environment named
myproject
with Python 3.12.Activating the Environment:
Activate the created environment with the following command:
conda activate myproject
This ensures that any Python operations or package installations are confined to this environment.
Installing Jupyter Lab:
Install Jupyter Lab to work with notebooks and code interactively:
conda install -c conda-forge jupyterlab
Jupyter Lab is installed from the Conda-Forge channel.
Starting Jupyter Lab:
To launch Jupyter Lab:
jupyter lab
This command opens Jupyter Lab in the default web browser.
Additional Package Installations:
Install additional packages required for data manipulation and visualization:
conda install pandas matplotlib seaborn
The core of this project involves analyzing and visualizing data from the movies.csv file, which can be downloaded from Kaggle at the following link:
- Kaggle Dataset: https://www.kaggle.com/datasets/danielgrijalvas/movies
The Python script processes this data to:
- Load and clean data: Handles missing data and corrects data types.
- Visualize relationships: Creates scatter plots, strip plots, and correlation matrices.
- Analyze trends: Identifies trends and high correlations within the movie industry data.
To run the analysis, navigate to the script in Jupyter Lab and execute the cells sequentially. Ensure the movies.csv file is correctly placed in your project directory.
Contributions to this project are welcome. Please ensure to maintain the environment specifications and follow the coding standards used in this project.
This project is licensed under the MIT License - see the LICENSE file for details.