Skip to content

Commit

Permalink
Release 1.0.6
Browse files Browse the repository at this point in the history
  • Loading branch information
Samuel Abramov committed Nov 2, 2023
1 parent 95cd40d commit 25d382d
Show file tree
Hide file tree
Showing 65 changed files with 1,950 additions and 823 deletions.
118 changes: 101 additions & 17 deletions README.MD
Original file line number Diff line number Diff line change
@@ -1,5 +1,6 @@
[![Build](https://github.com/Samyssmile/edux/actions/workflows/gradle.yml/badge.svg?branch=main)](https://github.com/Samyssmile/edux/actions/workflows/gradle.yml)
[![CodeQL](https://github.com/Samyssmile/edux/actions/workflows/codeql-analysis.yml/badge.svg?branch=main)](https://github.com/Samyssmile/edux/actions/workflows/codeql-analysis.yml)

# EDUX - Java Machine Learning Library

EDUX is a user-friendly library for solving problems with a machine learning approach.
Expand All @@ -8,18 +9,22 @@ EDUX is a user-friendly library for solving problems with a machine learning app

EDUX supports a variety of machine learning algorithms including:

- **Multilayer Perceptron (Neural Network):** Suitable for regression and classification problems, MLPs can approximate non-linear functions.
- **Multilayer Perceptron (Neural Network):** Suitable for regression and classification problems, MLPs can approximate
non-linear functions.
- **K Nearest Neighbors:** A simple, instance-based learning algorithm used for classification and regression.
- **Decision Tree:** Offers visual and explicitly laid out decision making based on input features.
- **Support Vector Machine:** Effective for binary classification, and can be adapted for multi-class problems.
- **RandomForest:** An ensemble method providing high accuracy through building multiple decision trees.

### Battle Royale - Which algorithm is the best?

We run all algorithms on the same dataset and compare the results.
[Benchmark](https://github.com/Samyssmile/edux/discussions/42)

## Goal
The main goal of this project is to create a user-friendly library for solving problems using a machine learning approach. The library is designed to be easy to use, enabling the solution of problems with just a few lines of code.

The main goal of this project is to create a user-friendly library for solving problems using a machine learning
approach. The library is designed to be easy to use, enabling the solution of problems with just a few lines of code.

## Features

Expand All @@ -31,43 +36,122 @@ The library currently supports:
- Support Vector Machine
- RandomForest

## Installation
## Get started

Include the library as a dependency in your Java project file.

### Gradle

```
implementation 'io.github.samyssmile:edux:1.0.5'
implementation 'io.github.samyssmile:edux:1.0.6'
```

### Maven

```
<dependency>
<groupId>io.github.samyssmile</groupId>
<artifactId>edux</artifactId>
<version>1.0.5</version>
<version>1.0.6</version>
</dependency>
```

## How to use this library
## Getting started tutorial

This section guides you through using EDUX to process your dataset, configure a multilayer perceptron (Multilayer Neural
Network), perform training and evaluation.

A multi-layer perceptron (MLP) is a feedforward artificial neural network that generates a set of outputs from a set of
input features. An MLP is characterized by several layers of input nodes connected as a directed graph between the input
and output layers.

![Neural Network](https://hc-linux.eu/github/iris-nn.png)

### Step 1: Data Processing

Firstly, we will load and prepare the IRIS dataset:

| sepal.length | sepal.width | petal.length | petal.width | variety |
|--------------|-------------|--------------|-------------|---------|
| 5.1 | 3.5 | 1.4 | 0.2 | Setosa |

```java
var featureColumnIndices=new int[]{0,1,2,3}; // Specify your feature columns
var targetColumnIndex=4; // Specify your target column

var dataProcessor=new DataProcessor(new CSVIDataReader());
var dataset=dataProcessor.loadDataSetFromCSV(
"path/to/your/data.csv", // Replace with your CSV file path
',', // CSV delimiter
true, // Whether to skip the header
featureColumnIndices,
targetColumnIndex
);
dataset.shuffle();
dataset.normalize();
dataProcessor.split(0.8); // Replace with your train-test split ratio
```

### Step 2: Preparing Training and Test Sets:

NetworkConfiguration networkConfiguration = new NetworkConfiguration(...ActivationFunction.LEAKY_RELU, ActivationFunction.SOFTMAX, LossFunction.CATEGORICAL_CROSS_ENTROPY, Initialization.XAVIER, Initialization.XAVIER);
MultilayerPerceptron multilayerPerceptron = new MultilayerPerceptron(features, labels, testFeatures, testLabels, networkConfiguration);
multilayerPerceptron.train();
Extract the features and labels for both training and test sets:

multilayerPerceptron.predict(...);
```java
var trainFeatures=dataProcessor.getTrainFeatures(featureColumnIndices);
var trainLabels=dataProcessor.getTrainLabels(targetColumnIndex);
var testFeatures=dataProcessor.getTestFeatures(featureColumnIndices);
var testLabels=dataProcessor.getTestLabels(targetColumnIndex);
```

### Step 3: Network Configuration

```java
var networkConfiguration=new NetworkConfiguration(
trainFeatures[0].length, // Number of input neurons
List.of(128,256,512), // Number of neurons in each hidden layer
3, // Number of output neurons
0.01, // Learning rate
300, // Number of epochs
ActivationFunction.LEAKY_RELU, // Activation function for hidden layers
ActivationFunction.SOFTMAX, // Activation function for output layer
LossFunction.CATEGORICAL_CROSS_ENTROPY, // Loss function
Initialization.XAVIER, // Weight initialization for hidden layers
Initialization.XAVIER // Weight initialization for output layer
);
```

### Step 4: Training and Evaluation

```java
MultilayerPerceptron multilayerPerceptron=new MultilayerPerceptron(
networkConfiguration,
testFeatures,
testLabels
);
multilayerPerceptron.train(trainFeatures,trainLabels);
multilayerPerceptron.evaluate(testFeatures,testLabels);
```

### Results

```output
...
MultilayerPerceptron - Best accuracy after restoring best MLP model: 98,56%
```

### Working examples
You can find working examples for all algorithms in the [examples](https://github.com/Samyssmile/edux/tree/main/example/src/main/java/de/example) folder.

In all examples the IRIS or Seaborn Pinguins datasets are used.
You can find more fully working examples for all algorithms in
the [examples](https://github.com/Samyssmile/edux/tree/main/example/src/main/java/de/example) folder.

#### Iris Dataset
The IRIS dataset is a multivariate dataset introduced by the British statistician and biologist Ronald Fisher in his 1936 paper *The use of multiple measurements in taxonomic problems*
For examples we use the

![Neural Network](https://hc-linux.eu/github/iris-nn.png)
## Contributions
* [IRIS dataset](https://archive.ics.uci.edu/ml/datasets/iris).
* [SEABORNE PENGUINS dataset](https://seaborn.pydata.org/archive/0.11/tutorial/function_overview.html).

## Contributions

Contributions are warmly welcomed! If you find a bug, please create an issue with a detailed description of the problem. If you wish to suggest an improvement or fix a bug, please make a pull request. Also checkout the [Rules and Guidelines](https://github.com/Samyssmile/edux/wiki/Rules-&-Guidelines-for-New-Developers) page for more information.
Contributions are warmly welcomed! If you find a bug, please create an issue with a detailed description of the problem.
If you wish to suggest an improvement or fix a bug, please make a pull request. Also checkout
the [Rules and Guidelines](https://github.com/Samyssmile/edux/wiki/Rules-&-Guidelines-for-New-Developers) page for more
information.
Loading

0 comments on commit 25d382d

Please sign in to comment.