AnalytiQ is a data-centric application built with Streamlit. It offers a wide range of functionality, including applying data quality rules, performing data analysis, manipulating and preprocessing datasets, and using those datasets to build machine learning models with the help of AutoML and generative AI.
AnalytiQ provides the following features:
- Dataset Management: Upload, manage, and version datasets with ease.
- Data Quality Rules: Define and apply customizable rules to ensure the quality of your datasets.
- Data Analysis: Perform detailed univariate, bivariate, multivariate, and correlation analyses on your data.
- Data Manipulation: Modify your datasets by renaming columns, handling missing values, performing transformations, and applying complex formulas.
- Preprocessing: Preprocess your data for machine learning tasks using one-hot encoding, scaling, and other techniques.
- Machine Learning: Utilize the power of AutoML and generative AI to train models directly within the application.
To run AnalytiQ on your local machine, follow these steps:
- Clone the Repository:

  ```bash
  git clone https://github.com/Data-Quotient/analytiq.git
  cd analytiq
  ```

- Install the Required Packages: install the dependencies using pip:

  ```bash
  pip install -r requirements.txt
  ```

- Run the Application:

  ```bash
  streamlit run app.py
  ```
AnalytiQ will be available at http://localhost:8501 in your browser.
AnalytiQ uses the OpenAI API for its generative AI functionalities. To configure the OpenAI API key:
- Create the `.streamlit` folder:

  ```bash
  mkdir -p .streamlit
  ```

- Create the `secrets.toml` file in the `.streamlit` folder:

  ```bash
  touch .streamlit/secrets.toml
  ```

- Add your OpenAI API key to the `secrets.toml` file:

  ```toml
  openai_api_key = "your_openai_api_key_here"
  ```

Make sure you replace `"your_openai_api_key_here"` with your actual OpenAI API key.
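Inside the app, the key defined above can then be read through Streamlit's secrets mechanism. The snippet below is only a minimal sketch of that pattern, assuming the `openai_api_key` name configured above; AnalytiQ's actual generative AI wiring in `app.py` may differ.

```python
import streamlit as st
from openai import OpenAI

# Read the key configured in .streamlit/secrets.toml (name as set above).
api_key = st.secrets["openai_api_key"]

# Hypothetical client setup -- the application's own code may structure this differently.
client = OpenAI(api_key=api_key)
```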
- View a summary of your datasets.
- Get insights such as the number of rows, columns, missing values, and duplicates.
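These insights can be reproduced outside the app with plain pandas; the snippet below is an illustrative sketch with a hypothetical file name, not AnalytiQ's internal summary code.

```python
import pandas as pd

df = pd.read_csv("your_dataset.csv")  # hypothetical file name

summary = {
    "rows": df.shape[0],
    "columns": df.shape[1],
    "missing_values": int(df.isna().sum().sum()),
    "duplicate_rows": int(df.duplicated().sum()),
}
print(summary)
```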
- Upload CSV files as datasets.
- Create multiple versions of a dataset with options to apply different manipulations.
- Merge datasets or work with specific versions for detailed analysis.
- Define and apply rules to your datasets to ensure consistency and accuracy.
- Examples include null checks, unique value constraints, and custom lambda rules.
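For a sense of what such rules check, the sketch below expresses a null check, a unique value constraint, and a custom lambda rule in plain pandas. It is only an illustration with hypothetical column names, not AnalytiQ's rule engine.

```python
import pandas as pd

df = pd.read_csv("customers.csv")  # hypothetical dataset

# Null check: a key column must not contain missing values.
null_check = df["customer_id"].notna().all()

# Unique value constraint: IDs must not repeat.
unique_check = df["customer_id"].is_unique

# Custom lambda rule: ages must fall within a plausible range.
age_rule = lambda s: s.between(0, 120).all()
age_check = age_rule(df["age"])

print({"null_check": null_check, "unique_check": unique_check, "age_check": age_check})
```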
- Perform various types of analyses, such as:
- Univariate Analysis: Analyze individual variables.
- Bivariate and Multivariate Analysis: Understand relationships between multiple variables.
- Correlation Analysis: Discover correlations between features.
- View summaries of your datasets and generate visualizations.
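As a rough illustration of what these analyses cover, the pandas sketch below runs a univariate summary, a bivariate group comparison, and a correlation matrix. The dataset and column names are hypothetical; the app performs the equivalent steps interactively.

```python
import pandas as pd

df = pd.read_csv("sales.csv")  # hypothetical dataset

# Univariate analysis: distribution of a single numeric column.
print(df["revenue"].describe())

# Bivariate analysis: average revenue per category.
print(df.groupby("region")["revenue"].mean())

# Correlation analysis: pairwise correlations between numeric features.
print(df.corr(numeric_only=True))
```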
- Perform transformations on your dataset, including:
- Renaming columns.
- Handling missing data.
- Applying complex formulas.
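A minimal pandas sketch of these manipulations, with hypothetical column names (AnalytiQ applies them through its UI rather than code):

```python
import pandas as pd

df = pd.read_csv("orders.csv")  # hypothetical dataset

# Rename columns.
df = df.rename(columns={"qty": "quantity", "amt": "amount"})

# Handle missing data: fill numeric gaps with the column median.
df["amount"] = df["amount"].fillna(df["amount"].median())

# Apply a formula to derive a new column.
df["unit_price"] = df["amount"] / df["quantity"]
```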
- Apply preprocessing techniques such as encoding, scaling, and more to prepare data for machine learning tasks.
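For illustration, the same kind of preprocessing can be done with pandas and scikit-learn; this is a sketch under those assumptions, not the application's internal pipeline.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("orders.csv")  # hypothetical dataset

# One-hot encode a categorical column.
df = pd.get_dummies(df, columns=["region"])

# Scale numeric features to zero mean and unit variance.
scaler = StandardScaler()
df[["amount", "quantity"]] = scaler.fit_transform(df[["amount", "quantity"]])
```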
- Use the integrated AutoML feature to train models with minimal manual effort.
- Build, train, and evaluate machine learning models using generative AI.
- Save the trained models for future use and download them as pickle files.
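To show what the train-and-download workflow produces, the sketch below trains a simple scikit-learn model and saves it as a pickle file. The model choice, dataset, and column names are hypothetical; AnalytiQ's AutoML and generative AI flows handle these steps for you.

```python
import pickle

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

df = pd.read_csv("churn.csv")  # hypothetical dataset
X, y = df.drop(columns=["churned"]), df["churned"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))

# Save the trained model as a pickle file, matching the download format described above.
with open("model.pkl", "wb") as f:
    pickle.dump(model, f)
```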
You can add your own datasets or use the provided sample datasets to experiment with AnalytiQ. To add a dataset:
- Navigate to the `Manage Datasets` tab.
- Upload a CSV file.
- Apply versioning, manipulations, and analyses as needed.
We welcome contributions! To contribute:
- Fork the repository.
- Create a new feature branch: `git checkout -b feature-name`.
- Commit your changes: `git commit -m 'Add some feature'`.
- Push to the branch: `git push origin feature-name`.
- Open a pull request.
Please make sure to update tests as appropriate.
Distributed under the MIT License. See `LICENSE` for more information.