The Statistical Data Analyzer is a Python command-line application for statistical analysis of CSV datasets. Through an interactive menu it provides descriptive statistics, data visualization, hypothesis testing, and correlation analysis.
Dataset Summary:
- Provides an overview of the dataset structure, including column types and missing values.
- Generates descriptive statistics (mean, median, standard deviation, etc.).
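A minimal sketch of how such a summary could be produced with pandas. The function name `summarize` and the demo data are hypothetical and are not taken from the project's script; only standard pandas calls (`DataFrame.info`, `isna`, `describe`) are used.

```python
import io

import pandas as pd


def summarize(df):
    """Collect structure, missing-value counts, and descriptive statistics.

    Hypothetical helper for illustration; the actual script may differ.
    """
    buffer = io.StringIO()
    df.info(buf=buffer)  # column dtypes and non-null counts, captured as text
    return {
        "info": buffer.getvalue(),
        "missing": df.isna().sum(),  # number of missing values per column
        "stats": df.describe(),      # count, mean, std, min, quartiles, max
    }


if __name__ == "__main__":
    # Small made-up dataset, for illustration only.
    demo = pd.DataFrame(
        {"height": [1.6, 1.7, None, 1.8], "label": ["a", "b", "c", "d"]}
    )
    report = summarize(demo)
    print(report["info"])
    print(report["missing"])
    print(report["stats"])
```

In the `describe()` output, the `50%` row is the median, and only numeric columns are included by default.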
Data Visualization:
- Visualize the distribution of data in individual columns using histograms and density plots.
Hypothesis Testing:
- Perform one-sample t-tests to evaluate if the mean of a column differs from a specified value.
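A one-sample t-test of this kind can be run with `scipy.stats.ttest_1samp`. The sketch below assumes a two-sided test and a significance level of 0.05; the function name `one_sample_ttest`, the `alpha` parameter, and the sample data are hypothetical.

```python
import pandas as pd
from scipy import stats


def one_sample_ttest(df, column, popmean, alpha=0.05):
    """Test whether the mean of `column` differs from `popmean` (two-sided).

    Hypothetical helper for illustration; the actual script may differ.
    """
    sample = df[column].dropna()  # the test does not accept missing values
    t_stat, p_value = stats.ttest_1samp(sample, popmean)
    return {
        "t": float(t_stat),
        "p": float(p_value),
        # Reject the null hypothesis (mean == popmean) when p < alpha.
        "reject_null": bool(p_value < alpha),
    }


if __name__ == "__main__":
    # Made-up measurements, for illustration only.
    demo = pd.DataFrame({"weight": [5.1, 4.9, 5.0, 5.2, 4.8]})
    print(one_sample_ttest(demo, "weight", popmean=5.0))
```

A small p-value suggests that the column mean differs from the specified value; a large one means the data give no evidence of a difference, not that the means are equal.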
Correlation Analysis:
- Generate a heatmap to visualize correlations between numerical variables.
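A correlation heatmap can be sketched by computing the pairwise correlation matrix with pandas and passing it to seaborn's `heatmap`. The example assumes Pearson correlation (the pandas default); `plot_correlation_heatmap`, the colour map, and the demo data are hypothetical choices.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns


def plot_correlation_heatmap(df):
    """Plot pairwise correlations between the numeric columns of `df`.

    Hypothetical helper for illustration; the actual script may differ.
    """
    numeric = df.select_dtypes(include="number")  # ignore text columns
    corr = numeric.corr()  # Pearson correlation by default
    fig, ax = plt.subplots(figsize=(6, 5))
    # Fix the colour scale to [-1, 1] so colours are comparable across datasets.
    sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm",
                vmin=-1, vmax=1, ax=ax)
    ax.set_title("Correlation matrix")
    return fig, corr


if __name__ == "__main__":
    # Made-up values, for illustration only.
    demo = pd.DataFrame({
        "a": [1, 2, 3, 4],
        "b": [2, 4, 6, 8],
        "c": [4, 3, 2, 1],
        "name": ["w", "x", "y", "z"],
    })
    plot_correlation_heatmap(demo)
    plt.show()
```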
- Python 3.6 or higher
- Required libraries: pandas, numpy, scipy, matplotlib, seaborn
- Clone this repository:
git clone https://github.com/your-username/StatisticalDataAnalyzer.git
- Navigate to the project directory:
cd StatisticalDataAnalyzer
- Install the required dependencies:
pip install -r requirements.txt
- Run the script:
python statistical_data_analyzer.py
- Follow the prompts to:
  - Load your dataset (CSV format).
  - Perform desired analyses (e.g., summary, visualizations, tests).
Enter the path to your CSV file: data.csv
Options:
1. Display dataset summary
2. Visualize data distribution
3. Perform hypothesis test
4. Visualize correlation matrix
5. Exit
Enter your choice: 1
--- Dataset Summary ---
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100 entries, 0 to 99
Data columns (total 5 columns):
...
--- Descriptive Statistics ---
Column1 Column2
mean ... ...
std ... ...
...
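The menu-driven session above could be driven by a loop along the following lines. This is only a sketch of the general pattern, not the project's code: `run_menu` and its `read` and `write` parameters are hypothetical, and the placeholder actions stand in for the real analyses.

```python
def run_menu(actions, read=input, write=print):
    """Show a numbered menu and dispatch to the chosen action until Exit.

    `actions` is a list of (label, callable) pairs; Exit is appended as the
    last option. Hypothetical helper for illustration only.
    """
    exit_choice = len(actions) + 1
    while True:
        write("Options:")
        for number, (label, _) in enumerate(actions, start=1):
            write(f"{number}. {label}")
        write(f"{exit_choice}. Exit")
        raw = read("Enter your choice: ").strip()
        if raw == str(exit_choice):
            return
        if raw.isdecimal() and 1 <= int(raw) <= len(actions):
            actions[int(raw) - 1][1]()  # run the selected analysis
        else:
            write("Invalid choice, please try again.")


if __name__ == "__main__":
    # Placeholder actions; a real script would call its analysis functions.
    run_menu([
        ("Display dataset summary", lambda: print("--- Dataset Summary ---")),
        ("Visualize data distribution", lambda: print("(plot)")),
        ("Perform hypothesis test", lambda: print("(t-test)")),
        ("Visualize correlation matrix", lambda: print("(heatmap)")),
    ])
```

Passing `read` and `write` as parameters keeps the loop testable without a terminal.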
Contributions are welcome! To contribute:
- Fork the repository.
- Create a feature branch:
git checkout -b feature-name
- Commit your changes and push the branch:
git push origin feature-name
- Open a pull request.
This project is licensed under the MIT License. See the LICENSE file for details.
- Author: [Your Name]
- Email: [Your Email]
- GitHub: https://github.com/your-username
- Add support for additional statistical tests (e.g., ANOVA, chi-square).
- Include time series analysis features.
- Integrate with Jupyter Notebook for enhanced interactivity.