An ASP.NET Core Web API application for multi-parameter data analysis via custom-implemented similarity calculation and clustering algorithms. It processes datasets that mix numerical and categorical parameters.
- Framework: ASP.NET Core Web API (.NET 9.0)
- Authentication: ASP.NET Core Identity with JWT tokens
- Database: PostgreSQL with Entity Framework Core
- Cache: Redis for distributed caching
- Containerization: Docker & Docker Compose
- AutoMapper: Object-to-object mapping
- Entity Framework Core: ORM and database migrations
- JWT Authentication: Token-based security
- Custom Implementation: All clustering algorithms and similarity calculations
- Unit Testing: xUnit, Moq, AutoFixture, AutoMoq
- Integration Testing: TestContainers with PostgreSQL and Redis
- CI/CD: GitHub Actions with automated testing and deployment
- JSON-based dataset creation via REST API
- Support for numerical and categorical parameters
- Automatic parameter type detection during import
- Data validation ensuring type consistency across parameters
- Parameter weighting (0.1-10) and activation/deactivation
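The exact detection logic isn't documented here; a minimal sketch of how import-time type detection could work (the class name and the all-values-must-parse rule are assumptions, not the project's actual code):

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public enum ParameterType { Numerical, Categorical }

public static class ParameterTypeDetector
{
    // Hypothetical rule: a parameter is numerical only if every raw value
    // parses as a number; otherwise it falls back to categorical.
    public static ParameterType Detect(IEnumerable<string> rawValues) =>
        rawValues.All(v => double.TryParse(
                v, NumberStyles.Float, CultureInfo.InvariantCulture, out _))
            ? ParameterType.Numerical
            : ParameterType.Categorical;
}
```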
A full pairwise comparison algorithm calculates similarity scores (0-1) between all objects in a dataset. It supports:
- Mixed data types (numerical and categorical)
- Comma-separated values in categorical parameters
- Configurable parameter weights and activation
- Optional parameter inclusion in results
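Because categorical cells may hold comma-separated values, comparison treats each cell as a set. A minimal sketch of such a Jaccard comparison (hypothetical helper; trimming and case-insensitive matching are assumptions):

```csharp
using System;
using System.Linq;

public static class CategoricalSimilarity
{
    // Jaccard coefficient on comma-separated values treated as sets:
    // |A ∩ B| / |A ∪ B|; defined as 1.0 when both cells are empty.
    public static double Jaccard(string a, string b)
    {
        var options = StringSplitOptions.RemoveEmptyEntries | StringSplitOptions.TrimEntries;
        var setA = a.Split(',', options).ToHashSet(StringComparer.OrdinalIgnoreCase);
        var setB = b.Split(',', options).ToHashSet(StringComparer.OrdinalIgnoreCase);

        if (setA.Count == 0 && setB.Count == 0) return 1.0;

        var intersection = setA.Intersect(setB, StringComparer.OrdinalIgnoreCase).Count();
        var union = setA.Union(setB, StringComparer.OrdinalIgnoreCase).Count();
        return (double)intersection / union;
    }
}
```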
Three custom-implemented clustering methods with:
- MinMax normalization for numerical data
- One-Hot encoding for categorical data
- Multiple distance metrics (Euclidean, Manhattan, Cosine for numerical; Hamming, Jaccard for categorical)
- PCA dimensionality reduction for 2D visualization coordinates
- K-Means: Configurable clusters (2-100), iterations (10-1000)
- DBSCAN: Epsilon (0.01-1.0), minimum points (1-20), noise detection
- Hierarchical Agglomerative: Merge threshold (0.01-1.0), bottom-up approach
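For orientation, a heavily simplified sketch of the K-Means step on already-normalized numeric vectors (illustrative only; the real implementation also supports the configurable metrics and categorical encodings listed above):

```csharp
using System;
using System.Linq;

public static class KMeansSketch
{
    // One simplified K-Means run: assign each point to its nearest centroid,
    // recompute centroids as cluster means, and repeat until assignments
    // stabilize or maxIterations is reached.
    public static int[] Cluster(double[][] points, int k, int maxIterations, Random rng)
    {
        var centroids = points.OrderBy(_ => rng.Next()).Take(k)
                              .Select(p => (double[])p.Clone()).ToArray();
        var labels = new int[points.Length];

        for (var iter = 0; iter < maxIterations; iter++)
        {
            var changed = false;
            for (var i = 0; i < points.Length; i++)
            {
                var best = Enumerable.Range(0, k)
                    .OrderBy(c => SquaredDistance(points[i], centroids[c]))
                    .First();
                if (best != labels[i]) { labels[i] = best; changed = true; }
            }
            if (!changed) break;

            for (var c = 0; c < k; c++)
            {
                var members = points.Where((_, i) => labels[i] == c).ToArray();
                if (members.Length == 0) continue; // keep old centroid for empty clusters
                for (var d = 0; d < centroids[c].Length; d++)
                    centroids[c][d] = members.Average(p => p[d]);
            }
        }
        return labels;
    }

    private static double SquaredDistance(double[] a, double[] b) =>
        a.Zip(b, (x, y) => (x - y) * (x - y)).Sum();
}
```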
- Hash-based caching using MD5 of request parameters
- Multi-layer storage: Redis cache + PostgreSQL persistence
- Result retrieval by dataset, algorithm type, or globally
- Automatic cache invalidation and cleanup
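A minimal sketch of deriving an MD5 cache key from a request object (the helper, the prefix scheme, and JSON serialization are assumptions):

```csharp
using System;
using System.Security.Cryptography;
using System.Text;
using System.Text.Json;

public static class CacheKeys
{
    // Serialize the request deterministically, hash it with MD5, and use the
    // hex digest as the key so identical requests hit the same cache entry.
    public static string ForRequest<TRequest>(string prefix, TRequest request)
    {
        var json = JsonSerializer.Serialize(request);
        var hash = MD5.HashData(Encoding.UTF8.GetBytes(json));
        return $"{prefix}:{Convert.ToHexString(hash)}";
    }
}
```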
Base Route: /api/auth
| Method | Endpoint | Description | Auth Required |
|---|---|---|---|
| POST | /login | User authentication with credentials | No |
| POST | /register | User registration | No |
| POST | /logout | User logout | Yes |
Base Route: /api/datasets (requires authentication)
| Method | Endpoint | Description | Permission |
|---|---|---|---|
| GET | / | Retrieve all datasets | User/Admin |
| GET | /{id} | Get dataset by ID | User/Admin |
| POST | / | Create new dataset from JSON | User/Admin |
| DELETE | /{id} | Delete dataset | Admin only |
Base Routes: /api/analysis/similarity, /api/analysis/clustering (authentication required)
POST /api/analysis/similarity/{datasetId}
Request Body (optional):

```json
{
  "parameterSettings": [
    {
      "parameterId": 1,
      "weight": 1.5,
      "isActive": true
    }
  ],
  "includeParameters": false
}
```

Response: List of similarity pairs with scores (0-1 range)
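For illustration, invoking the endpoint from C# could look like this (the base address, dataset ID 1, and the token are placeholders):

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Net.Http.Json;

var client = new HttpClient { BaseAddress = new Uri("http://localhost:8080") };
client.DefaultRequestHeaders.Authorization =
    new AuthenticationHeaderValue("Bearer", "<jwt-token>");

// The body mirrors the optional request shape shown above.
var response = await client.PostAsJsonAsync("/api/analysis/similarity/1", new
{
    parameterSettings = new[] { new { parameterId = 1, weight = 1.5, isActive = true } },
    includeParameters = false
});
response.EnsureSuccessStatusCode();
```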
POST /api/analysis/clustering/kmeans/{datasetId}
POST /api/analysis/clustering/dbscan/{datasetId}
POST /api/analysis/clustering/agglomerative/{datasetId}
K-Means Request:

```json
{
  "numberOfClusters": 5,
  "maxIterations": 200,
  "numericMetric": "Euclidean",
  "categoricalMetric": "Hamming",
  "parameterSettings": [...]
}
```

DBSCAN Request:

```json
{
  "epsilon": 0.2,
  "minPoints": 2,
  "numericMetric": "Euclidean",
  "categoricalMetric": "Hamming",
  "parameterSettings": [...]
}
```

Agglomerative Request:

```json
{
  "threshold": 0.2,
  "numericMetric": "Euclidean",
  "categoricalMetric": "Hamming",
  "parameterSettings": [...]
}
```

Clustering Response: Clusters with object assignments and 2D coordinates for visualization
Base Routes: /api/results/similarity, /api/results/clustering (authentication required)
| Method | Endpoint | Description |
|---|---|---|
| GET | / | All analysis results of specified type |
| GET | /dataset/{datasetId} | Results by dataset |
| GET | /dataset/{datasetId}/algorithm/{algorithm} | Results by dataset and algorithm (clustering only) |
- Data Preprocessing: Filter active parameters and calculate numerical ranges
- Pair Generation: Create all unique object pairs (no self-comparison)
- Parameter Comparison:
  - Numerical: `similarity = 1 - |value1 - value2| / parameter_range`
  - Categorical: Jaccard coefficient on comma-separated values treated as sets
- Weighted Aggregation: Sum of weighted similarities divided by the total weight
- Result: Final similarity score (0-1) per object pair
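The last three steps could be combined as in the following sketch (hypothetical types; a categorical branch would reuse a Jaccard set comparison like the one sketched earlier):

```csharp
using System;
using System.Collections.Generic;

public record ParameterComparison(double Similarity, double Weight);

public static class SimilarityAggregation
{
    // Numerical similarity: 1 - |value1 - value2| / parameter_range
    // (range precomputed per parameter during preprocessing).
    public static double NumericalSimilarity(double v1, double v2, double range) =>
        range == 0 ? 1.0 : 1.0 - Math.Abs(v1 - v2) / range;

    // Weighted aggregation: sum of weighted similarities / total weight.
    public static double Aggregate(IEnumerable<ParameterComparison> comparisons)
    {
        double weighted = 0, totalWeight = 0;
        foreach (var c in comparisons)
        {
            weighted += c.Similarity * c.Weight;
            totalWeight += c.Weight;
        }
        return totalWeight == 0 ? 0.0 : weighted / totalWeight;
    }
}
```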
- Data Normalization: MinMax for numerical, One-Hot for categorical parameters
- Distance Calculation: Apply selected metrics to normalized data
- Cluster Formation: Execute chosen algorithm with specified parameters
- Dimensionality Reduction: Apply PCA for 2D coordinate generation
- Result Assembly: Package clusters with object coordinates
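A minimal sketch of the normalization step, assuming per-column MinMax scaling and per-value One-Hot indicators (illustrative only):

```csharp
using System.Linq;

public static class Preprocessing
{
    // MinMax: scale each numerical value into [0, 1] using the column's range.
    public static double[] MinMax(double[] values)
    {
        double min = values.Min(), max = values.Max();
        var range = max - min;
        return values.Select(v => range == 0 ? 0.0 : (v - min) / range).ToArray();
    }

    // One-Hot: map each distinct category to a 0/1 indicator vector.
    public static double[][] OneHot(string[] values)
    {
        var categories = values.Distinct().OrderBy(c => c).ToArray();
        return values
            .Select(v => categories.Select(c => c == v ? 1.0 : 0.0).ToArray())
            .ToArray();
    }
}
```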
Numerical Metrics:
- Euclidean: `sqrt(sum((a - b)²)) / max_possible_distance`
- Manhattan: `sum(|a - b|) / vector_length`
- Cosine: `1 - (dot_product / (|A| * |B|))`

Categorical Metrics:
- Hamming: `different_elements / total_elements`
- Jaccard: `1 - (intersection_size / union_size)`
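These formulas translate directly to code; a sketch of four of the metrics, assuming vectors already normalized into [0, 1] (Jaccard was sketched earlier in the similarity section):

```csharp
using System;
using System.Linq;

public static class Metrics
{
    // Euclidean distance scaled by the maximum possible distance,
    // which is sqrt(n) for vectors normalized into [0, 1].
    public static double Euclidean(double[] a, double[] b) =>
        Math.Sqrt(a.Zip(b, (x, y) => (x - y) * (x - y)).Sum()) / Math.Sqrt(a.Length);

    // Manhattan distance averaged over the vector length.
    public static double Manhattan(double[] a, double[] b) =>
        a.Zip(b, (x, y) => Math.Abs(x - y)).Sum() / a.Length;

    // Cosine distance: 1 - cosine similarity.
    public static double Cosine(double[] a, double[] b)
    {
        var dot = a.Zip(b, (x, y) => x * y).Sum();
        var normA = Math.Sqrt(a.Sum(x => x * x));
        var normB = Math.Sqrt(b.Sum(x => x * x));
        return 1 - dot / (normA * normB);
    }

    // Hamming distance: share of positions whose elements differ.
    public static double Hamming(string[] a, string[] b) =>
        (double)a.Zip(b, (x, y) => x == y ? 0 : 1).Sum() / a.Length;
}
```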
- ApplicationUser: Extends IdentityUser with FirstName, LastName, RegisteredDate
- Roles: Default user role and admin role with policy-based authorization
- JWT: Token-based authentication with configurable expiration
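A typical, not project-specific, wiring of JWT bearer authentication plus a policy-based admin rule in ASP.NET Core (the configuration key names are assumptions):

```csharp
using System.Text;
using Microsoft.AspNetCore.Authentication.JwtBearer;
using Microsoft.IdentityModel.Tokens;

var builder = WebApplication.CreateBuilder(args);

builder.Services
    .AddAuthentication(JwtBearerDefaults.AuthenticationScheme)
    .AddJwtBearer(options =>
    {
        options.TokenValidationParameters = new TokenValidationParameters
        {
            ValidateIssuer = true,
            ValidateAudience = true,
            ValidateLifetime = true, // enforces the configurable expiration
            ValidIssuer = builder.Configuration["Jwt:Issuer"],
            ValidAudience = builder.Configuration["Jwt:Audience"],
            IssuerSigningKey = new SymmetricSecurityKey(
                Encoding.UTF8.GetBytes(builder.Configuration["Jwt:Key"]!))
        };
    });

// Policy-based authorization, e.g. an admin-only policy for dataset deletion.
builder.Services.AddAuthorization(options =>
    options.AddPolicy("AdminOnly", policy => policy.RequireRole("Admin")));
```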
The application processes datasets similar to the Fragile States Index (https://fragilestatesindex.org/excel/). An example dataset in JSON format is available in the DataAnalyzeApi.Integration/Data/ directory, demonstrating the supported structure for multi-parameter country data.
Testing Stage:
- Runs on Ubuntu with PostgreSQL and Redis services
- Executes unit and integration tests
- Validates build in Release configuration
Deployment Stage (master branch only):
- Builds Docker images on self-hosted runner
- Stops existing containers
- Deploys updated stack
- Cleans unused Docker images
Docker Stack (7 containers):
```yaml
services:
  data-analyze-api:       # Main application (port 8080)
  data-analyze-frontend:  # Web interface
  postgresql:             # Database with persistent volumes
  redis:                  # Distributed cache
  pgadmin:                # Database management interface
  nginx-proxy:            # Reverse proxy and API gateway
  portainer:              # Container management
```

```bash
# Ensure Docker images are available on the server,
# then start the complete stack
docker-compose up -d

# API available at: http://your-server/api/
# Swagger docs at: http://your-server/swagger/
```

Nginx Configuration:
- Routes `/api/*` to the backend application
- Routes `/swagger*` to the API documentation
- Routes all other traffic to the frontend
- Handles CORS headers for cross-domain requests
- Comprehensive coverage of services, mappers, and business logic
- Technologies: xUnit, Moq, AutoFixture, AutoMoq
- Focus on mathematical algorithms and data processing logic
- End-to-end API workflow testing with TestContainers
- Isolated PostgreSQL and Redis instances per test
- Real dataset processing with Fragile States Index data
- Complete authentication and authorization flow validation
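An isolated-database fixture with Testcontainers for .NET might be shaped like this sketch (package Testcontainers.PostgreSql; the image tag and names are illustrative):

```csharp
using System.Threading.Tasks;
using Testcontainers.PostgreSql;
using Xunit;

// Spins up an isolated PostgreSQL container for the lifetime of a test class.
public sealed class PostgresFixture : IAsyncLifetime
{
    private readonly PostgreSqlContainer _container = new PostgreSqlBuilder()
        .WithImage("postgres:16-alpine")
        .Build();

    public string ConnectionString => _container.GetConnectionString();

    public Task InitializeAsync() => _container.StartAsync();
    public Task DisposeAsync() => _container.DisposeAsync().AsTask();
}
```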