KNNCollectionModel added
JohannGebhardt committed Sep 13, 2013
1 parent 297752a commit f981026
Showing 14 changed files with 271 additions and 44 deletions.
4 changes: 2 additions & 2 deletions build.properties
@@ -1,3 +1,3 @@
extension.version=2
extension.revision=0
extension.update=004
extension.revision=1
extension.update=000
@@ -12,7 +12,16 @@
algorithm proposed by Ramaswamy et al. (2000) by setting the
corresponding parameter. The outlier score is calculated according
to the measure type selected. The higher the outlier score, the more
anomalous the instance is.
anomalous the instance is.

The operator can also read and write a model containing the k
nearest neighbors set. Typically, 99% of the execution time is spent
computing the neighbors, so it is a good idea to store the model, for
example, when looping over a parameter. The operator checks whether the
model and the ExampleSet fit together. The model can be used with any of
the nearest-neighbor based algorithms. The parameter k used to create
the model must be equal to or larger than the parameter k specified in
the operator. Otherwise, the model is re-computed.
</p>
</help>
</operator>
@@ -43,7 +52,16 @@
taken as the final LOF score.

A normal instance has an outlier value of approximately 1, while
outliers have values greater than 1.
outliers have values greater than 1.

The operator can also read and write a model containing the k
nearest neighbors set. Typically, 99% of the execution time is spent
computing the neighbors, so it is a good idea to store the model, for
example, when looping over a parameter. The operator checks whether the
model and the ExampleSet fit together. The model can be used with any of
the nearest-neighbor based algorithms. The parameter k used to create
the model must be equal to or larger than the parameter k specified in
the operator. Otherwise, the model is re-computed.
</p>
</help>
</operator>
@@ -61,6 +79,15 @@
comparisons. LoOP is also based on the nearest neighbors set.
The definition of the k-distance used is the same as the one
proposed by Breunig et al [1999; 2000] to handle duplicates.

The operator can also read and write a model containing the k
nearest neighbors set. Typically, 99% of the execution time is spent
computing the neighbors, so it is a good idea to store the model, for
example, when looping over a parameter. The operator checks whether the
model and the ExampleSet fit together. The model can be used with any of
the nearest-neighbor based algorithms. The parameter k used to create
the model must be equal to or larger than the parameter k specified in
the operator. Otherwise, the model is re-computed.
</p>
</help>
</operator>
@@ -98,6 +125,15 @@
one proposed by Breunig et al [1999; 2000] to handle duplicates. The
normal instances will have an outlier score of approximately 1,
while outliers have a value greater than 1.

The operator can also read and write a model containing the k
nearest neighbors set. Typically, 99% of the execution time is spent
computing the neighbors, so it is a good idea to store the model, for
example, when looping over a parameter. The operator checks whether the
model and the ExampleSet fit together. The model can be used with any of
the nearest-neighbor based algorithms. The parameter k used to create
the model must be equal to or larger than the parameter k specified in
the operator. Otherwise, the model is re-computed.
</p>
</help>
</operator>
@@ -114,6 +150,15 @@
the one proposed by Breunig et al [1999; 2000] to handle duplicates.
The normal instances will have an outlier score of approximately 1,
while outliers have a value greater than 1.

The operator can also read and write a model containing the k
nearest neighbors set. Typically, 99% of the execution time is spent
computing the neighbors, so it is a good idea to store the model, for
example, when looping over a parameter. The operator checks whether the
model and the ExampleSet fit together. The model can be used with any of
the nearest-neighbor based algorithms. The parameter k used to create
the model must be equal to or larger than the parameter k specified in
the operator. Otherwise, the model is re-computed.
</p>
</help>
</operator>
@@ -31,12 +31,20 @@
*
*/
public class COFEvaluator extends KNNEvaluator {

private int n;
private int k;
private boolean newCollection;
public COFEvaluator(KNNCollection knnCollection,
DistanceMeasure measure, boolean parallel, int numberOfThreads, Operator logger) {
super(knnCollection, false, measure, parallel, numberOfThreads, logger);
}

public COFEvaluator(KNNCollection knnCollection,
DistanceMeasure measure, boolean parallel, int numberOfThreads, Operator logger,int n, int k,boolean newCollection) {
super(knnCollection, false, measure, parallel, numberOfThreads, logger, n, k , newCollection);
this.n = n;
this.k = k;
this.newCollection = newCollection;
}
/**
*
The method implements the COF algorithm.
@@ -32,10 +32,16 @@
public class INFLOEvaluator extends KNNEvaluator {


private boolean newCollection;
public INFLOEvaluator(KNNCollection knnCollection,
DistanceMeasure measure, boolean parallel, int numberOfThreads, Operator logger) {
super(knnCollection, false, measure, parallel, numberOfThreads, logger);
}
public INFLOEvaluator(KNNCollection knnCollection,
DistanceMeasure measure, boolean parallel, int numberOfThreads, Operator logger,int n, int k , boolean newCollection) {
super(knnCollection, false, measure, parallel, numberOfThreads, logger,n,k,newCollection);
this.newCollection = newCollection;
}

public double [] evaluate() {

@@ -19,6 +19,7 @@
*/
package de.dfki.madm.anomalydetection.evaluator.nearest_neighbor_based;

import java.io.Serializable;
import java.util.LinkedList;

/**
@@ -30,7 +31,12 @@
* @author Mennatallah Amer
*
*/
public class KNNCollection {
public class KNNCollection implements Serializable {
/**
* Change this if the object changes
*/
private static final long serialVersionUID = 123456L;

/** The size of the data **/
int n;

@@ -215,5 +221,12 @@ public void updateNearestNeighbors(int point1, int point2,
}
}
}

public static KNNCollection clone(KNNCollection a){
KNNCollection ret = new KNNCollection(a.n,a.k,a.points,a.weight);
ret.neighborIndicies = a.neighborIndicies.clone();
ret.neighborDistances = a.neighborDistances.clone();
ret.numberOfNeighbors = a.numberOfNeighbors.clone();
ret.kdistNeighbors = a.kdistNeighbors.clone();
return ret;
}
}
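A note on the `clone` helper above: in Java, calling `clone()` on a two-dimensional array copies only the outer array, so the rows stay shared with the original. If `neighborIndicies` and `neighborDistances` are two-dimensional (an assumption, since their declarations are not shown in this diff), a row-by-row copy would be needed to get a fully independent collection. A minimal sketch, using a hypothetical helper class `ArrayCopies` that is not part of the extension:

```java
import java.util.Arrays;

/** Hypothetical helper for illustration; not part of the extension. */
public class ArrayCopies {

    /** Copies every row so the result shares no storage with the source. */
    public static double[][] deepCopy(double[][] source) {
        double[][] copy = new double[source.length][];
        for (int i = 0; i < source.length; i++) {
            // Assumes no row is null.
            copy[i] = Arrays.copyOf(source[i], source[i].length);
        }
        return copy;
    }

    public static void main(String[] args) {
        double[][] original = { { 1.0, 2.0 }, { 3.0, 4.0 } };
        double[][] shallow = original.clone(); // outer array copied, rows shared
        double[][] deep = deepCopy(original);  // rows copied as well

        original[0][0] = 99.0;
        System.out.println(shallow[0][0]); // 99.0, the change is visible
        System.out.println(deep[0][0]);    // 1.0, the copy is unaffected
    }
}
```

Whether sharing rows matters here depends on whether a cloned collection is later modified, for example by `updateNearestNeighbors`.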
@@ -0,0 +1,59 @@
/*
* RapidMiner Anomaly Detection Extension
*
* Copyright (C) 2009-2013 by Deutsches Forschungszentrum fuer
* Kuenstliche Intelligenz GmbH or its licensors, as applicable.
*
* This is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* You should have received a copy of the GNU Affero General Public License
* along with this software. If not, see <http://www.gnu.org/licenses/>.
*
* Author: Johann Gebhardt
* Responsible: Markus Goldstein (Markus.Goldstein@dfki.de)
*
* URL: http://madm.dfki.de/rapidminer/anomalydetection
*/
package de.dfki.madm.anomalydetection.evaluator.nearest_neighbor_based;

import com.rapidminer.operator.AbstractModel;
import com.rapidminer.example.ExampleSet;
import com.rapidminer.tools.math.similarity.DistanceMeasure;
/**
*
* This class is used to save the knnCollection as a RapidMiner model.
*
* @author Johann Gebhardt
*
*/
public class KNNCollectionModel extends AbstractModel{
/**
* Change this if the object changes
*/
private static final long serialVersionUID = -695692136502022L;
/** the saved knnCollection*/
KNNCollection knnCollection;
/** the distanceMeasure used to create the model*/
public DistanceMeasure measure;
/** returns the knnCollection*/
public KNNCollection get(){
return this.knnCollection;
}
public KNNCollectionModel(ExampleSet trainingExampleSet, KNNCollection col,DistanceMeasure measure){
super(trainingExampleSet);
this.knnCollection = col;
this.measure = measure;
}
public ExampleSet apply(ExampleSet tmp){
// The model only stores the neighbor set, so applying it returns the input unchanged.
return tmp;
}
@Override
public String toString() {
return getName() + " model with k = " +knnCollection.getK();
}

}
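The reuse rule described in the operator help (a stored model is usable only if it fits the ExampleSet and was built with a k at least as large as the requested one) could be sketched as follows. The class `ModelReuseCheck` and the method `canReuse` are hypothetical names for illustration, and comparing the number of points is only an assumed reading of "fit together"; the operator code that performs the actual check is not shown in this diff.

```java
/** Hypothetical illustration of the reuse rule; not the extension's code. */
public class ModelReuseCheck {

    /**
     * @param modelN     number of points the stored neighbor set was built from
     * @param modelK     k used to build the stored neighbor set
     * @param dataN      number of examples in the current ExampleSet
     * @param requestedK k requested in the operator
     * @return true if the stored neighbors can be used, false if they must be re-computed
     */
    public static boolean canReuse(int modelN, int modelK, int dataN, int requestedK) {
        boolean fitsData = modelN == dataN;             // assumed meaning of "fit together"
        boolean enoughNeighbors = modelK >= requestedK; // rule stated in the help text
        return fitsData && enoughNeighbors;
    }

    public static void main(String[] args) {
        System.out.println(canReuse(1000, 20, 1000, 10)); // true
        System.out.println(canReuse(1000, 5, 1000, 10));  // false, k too small
        System.out.println(canReuse(1000, 20, 900, 10));  // false, sizes differ
    }
}
```

Under this reading, the `newCollection` flag passed to the evaluator constructors would be the negation of such a check: distances are computed only when no usable stored collection is available.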
@@ -62,11 +62,12 @@ public void run() {
for (int j = 0; j < n; j++) {
if (i == j)
continue;
if(newCollection){
double currentDistance = measure.calculateDistance(
knnCollection.getPoints()[i], knnCollection
.getPoints()[j]);
knnCollection.updateNearestNeighbors(i, j, currentDistance);

}
}
}
if (logger != null)
@@ -107,8 +108,8 @@ public void run() {
if (logger != null)
logger.logNote("Thread " + start + " " + end + " started!");
for (int i = start; i <= end; i++) {
for (int j = 0; j < i; j++) {

for (int j = 0; j <= i; j++) {
if(newCollection){
double currentDistance = measure.calculateDistance(
knnCollection.getPoints()[i], knnCollection
.getPoints()[j]);
@@ -120,6 +121,7 @@ public void run() {
knnCollection.updateNearestNeighbors(j, i,
currentDistance);
}
}
}

}
@@ -145,7 +147,7 @@ public void run() {
private Operator logger;
protected boolean parallel;
protected int numberOfThreads;

boolean newCollection = false;
public KNNEvaluator(KNNCollection knnCollection, boolean kth,
DistanceMeasure measure, boolean parallel, int numberOfThreads, Operator logger) {
this.knnCollection = knnCollection;
@@ -158,6 +160,19 @@ public KNNEvaluator(KNNCollection knnCollection, boolean kth,
this.logger = logger;
res = new double[n];
}
public KNNEvaluator(KNNCollection knnCollection, boolean kth,
DistanceMeasure measure, boolean parallel, int numberOfThreads, Operator logger, int n, int k, boolean newCollection) {
this.knnCollection = knnCollection;
this.measure = measure;
this.kth = kth;
// The n and k arguments are not used here; both values are read from the collection.
this.n = knnCollection.getN();
this.k = knnCollection.getK();
this.parallel = parallel;
this.numberOfThreads = numberOfThreads;
this.logger = logger;
res = new double[n];
// When false, the neighbor search is skipped and the supplied collection is used as is.
this.newCollection = newCollection;
}

/**
* To start the evaluation process.
@@ -270,14 +285,14 @@ public void run() {

private void KNNSeq() {
for (int i = 0; i < n; i++) {
for (int j = i + 1; j < n; j++) {

double currentDistance = measure.calculateDistance(
knnCollection.getPoints()[i],
knnCollection.getPoints()[j]);
knnCollection.updateNearestNeighbors(i, j, currentDistance);
knnCollection.updateNearestNeighbors(j, i, currentDistance);

for (int j = i + 1; j <n; j++) {
if(newCollection) {
double currentDistance = measure.calculateDistance(
knnCollection.getPoints()[i],
knnCollection.getPoints()[j]);
knnCollection.updateNearestNeighbors(i, j, currentDistance);
knnCollection.updateNearestNeighbors(j, i, currentDistance);
}
}
setAnomalyScore(i, knnCollection.getNeighBorDistanceSoFar()[i],
knnCollection.getNeighBorIndiciesSoFar()[i], knnCollection
@@ -21,7 +21,9 @@

import java.util.LinkedList;

import com.rapidminer.example.ExampleSet;
import com.rapidminer.operator.Operator;
import com.rapidminer.operator.ports.OutputPort;
import com.rapidminer.tools.math.similarity.DistanceMeasure;

/**
@@ -31,14 +33,20 @@
*
*/
public class LOFEvaluator extends KNNEvaluator {

public KNNCollection savedCollection;
private int minK;

public LOFEvaluator(int minK, KNNCollection knnCollection,
DistanceMeasure measure, boolean parallel, int numberOfthreads, Operator logger) {
super(knnCollection, false, measure, parallel, numberOfthreads, logger);
this.minK = minK;

}
public LOFEvaluator(int minK, KNNCollection knnCollection,
DistanceMeasure measure, boolean parallel, int numberOfthreads, Operator logger,int n , int k , boolean newCollection
) {
super(knnCollection, false, measure, parallel, numberOfthreads, logger,n,k,newCollection);
this.minK = minK;
this.newCollection = newCollection;

}
/**
* The method is overridden to avoid the extra unnecessary work done
@@ -52,6 +60,7 @@ protected void setAnomalyScore(int i, double[] neighBorDistanceSoFar,
@Override
public double[] evaluate() {
super.evaluate();

double[] lof = lof();
return lof;
}
@@ -65,6 +74,7 @@ public double[] reEvaluate(int step) {
return lof;
}


private double [] lof(){
double [] lof = new double[getN()];
double [] lrd= new double[getN()];
@@ -34,13 +34,22 @@
public class LoOPEvaluator extends KNNEvaluator {

private double lambda;

private int n;
private int k;
private boolean newCollection;
public LoOPEvaluator(KNNCollection knnCollection,
DistanceMeasure measure, double lambda, boolean parallel, int numberOfThreads, Operator logger) {
super(knnCollection, false, measure, parallel, numberOfThreads, logger);
this.lambda = lambda;
}

public LoOPEvaluator(KNNCollection knnCollection,
DistanceMeasure measure, double lambda, boolean parallel, int numberOfThreads, Operator logger,int n , int k , boolean newCollection) {
super(knnCollection, false, measure, parallel, numberOfThreads, logger,n ,k, newCollection);
this.lambda = lambda;
this.n = n;
this.k = k;
this.newCollection = newCollection;
}
/** The method is overridden to avoid doing extra computation **/
@Override
protected void setAnomalyScore(int i, double[] neighBorDistanceSoFar,