This Streamlit app allows you to upload two Excel files containing database metadata and automatically maps columns based on the similarity of their descriptions. It also generates a function to perform the data mapping between the columns.
- Upload Excel Files: Upload two Excel files with metadata (Table Name, Column Name, Description).
- Automatic Column Mapping: Maps columns based on the similarity of their descriptions using TF-IDF vectorization and cosine similarity.
- Visualization: Displays the column mappings and a heatmap of the similarity scores.
- Generated Function: Provides a Python function to map data from one DataFrame to another based on the column mappings.
- Python 3.6 or higher
- pip package manager
-
Clone the Repository
git clone https://github.com/your-username/database-column-mapper.git
-
Navigate to the Project Directory
cd database-column-mapper
-
Install Dependencies
Install the required Python packages using:
pip install -r requirements.txt
-
Run the Streamlit App
streamlit run app.py
This will open the app in your default web browser.
-
Prepare Your Excel Files
- Each Excel file should contain three columns:
Table Name
Column Name
Description
- You can use the provided
transaction_metadata.xlsx
anddeal_metadata.xlsx
files as examples.
- Each Excel file should contain three columns:
-
Upload the Files
- In the app interface, use the file uploaders to upload your two Excel files.
-
View Column Mappings
- After uploading, the app will display the column mappings based on the similarity of the descriptions.
-
Visualize Similarity Scores
- A heatmap will show the similarity scores between each pair of columns.
-
Get the Mapping Function
- The app will generate and display a Python function that can be used to map data from one DataFrame to another based on the column mappings.
Sample Excel files are provided to test the app:
transaction_metadata.xlsx
deal_metadata.xlsx
- Python Packages:
streamlit
pandas
scikit-learn
matplotlib
seaborn
openpyxl
Install them using:
pip install -r requirements.txt
database-column-mapper/
│
├── app.py # Main Streamlit app code
├── requirements.txt # List of required Python packages
├── transaction_metadata.xlsx # Sample metadata Excel file for the transaction table
├── deal_metadata.xlsx # Sample metadata Excel file for the deal table
└── README.md # Project documentation
This project is licensed under the MIT License.
For any questions or suggestions, please open an issue or contact me.