Feat(#23) Replace filterIncompleteRecords boolean with Imputation Enum for Enhanced Data Handling #72
This commit introduces the 'handler' package under 'de.edux.data' to handle the cleaning and processing of incomplete records within a dataset, in correspondence with issue Samyssmile#23.
Changes made:
- Added the 'handler' package under 'de.edux.data'
- Introduced the 'EIncompleteRecordsHandlerStratetgy' enum with the constants DO_NOT_HANDLE, DROP_RECORDS, FILL_RECORDS_WITH_AVERAGE
- Introduced the 'IIncompleteRecordsHandler' interface
- Implemented the 'DropIncompleteRecordsHandler' class
- Implemented the 'AverageFillIncompleteRecordsHandler' class
- Implemented the 'DoNotHandleIncompleteRecords' class because of DataProcessorTest.testLoadDataWithoutNormalizationAndShuffling()
- Started writing tests for the classes mentioned above and their methods
- Revised the existing codebase to be compatible with the new enum
…m for Enhanced Data Handling Implemented the ImputationStrategy enum with constants of average and mode and the corresponding classes.
@Override
public double[][] getTestFeatures(int[] inputColumns) {
return getInputs(testData, inputColumns);
public void drop_incomplete_records() {
In Java we don't use snake case. Use camelCase.
Use (upper) snake case only for constants, e.g. private static final String MY_CONSTANT
I had written a few lines of Python before this, and it seems I forgot how to name variables in Java 😅
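For reference, a minimal sketch of the naming convention discussed in this thread. The class, constant, and method names are hypothetical and are not taken from the PR:

```java
// Hypothetical class; it only illustrates the naming convention.
final class NamingExample {
    // Constants use upper snake case.
    private static final String MISSING_VALUE = "";

    // Methods, parameters, and local variables use camelCase.
    static boolean isIncompleteRecord(String[] record) {
        for (String fieldValue : record) {
            if (fieldValue == null || fieldValue.equals(MISSING_VALUE)) {
                return true;
            }
        }
        return false;
    }
}
```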
public double[][] getTestFeatures(int[] inputColumns) {
return getInputs(testData, inputColumns);
public void drop_incomplete_records() {
dataset = dataset.stream().filter((record) -> {
Do not mix the functional style with the classic (imperative) style. I suggest a purely functional approach here.
What do you mean?
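One possible reading of the suggestion is to express the whole filter as a single stream pipeline, without a block lambda containing an imperative loop. The sketch below assumes the dataset is a List<String[]> and that a record is incomplete when any field is null or blank; the class and method names are hypothetical, and the PR's actual body of drop_incomplete_records() is not shown in this conversation:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper; not the PR's actual implementation.
final class DropIncompleteRecords {
    // Keeps only the records in which every field is non-null and non-blank.
    static List<String[]> dropIncompleteRecords(List<String[]> dataset) {
        return dataset.stream()
                .filter(record -> Arrays.stream(record)
                        .noneMatch(value -> value == null || value.isBlank()))
                .toList(); // Stream.toList() requires Java 16 or later
    }
}
```

The inner `noneMatch` replaces a hand-written loop over the fields, so the lambda stays a single expression.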
@@ -1,5 +1,17 @@
package de.edux.functions.imputation;

public enum ImputationStrategy {
DUMMY, MEAN, AVERAGE, MODE;
//TODO: DUMMY, MEAN
remove
@Test
void performImputationWithNumericalValuesTest() {
String[] numerical_features_with_missing_values = {"1", "","2", "3", "", "4"};
camelCase...
skipHead();
}

List<String> uniqueClasses = new ArrayList<>();
Use List<String> uniqueClasses = dataset.stream().map(row -> row[targetColumn]).distinct().toList();
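A self-contained sketch of the suggested one-liner, assuming the dataset is a List<String[]> and targetColumn is the index of the class label (the wrapper class and method are hypothetical):

```java
import java.util.List;

// Hypothetical wrapper around the one-liner suggested in the review.
final class UniqueClasses {
    // Collects the distinct values of the target column, in first-seen order.
    static List<String> uniqueClasses(List<String[]> dataset, int targetColumn) {
        return dataset.stream()
                .map(row -> row[targetColumn])
                .distinct()
                .toList(); // Stream.toList() requires Java 16 or later
    }
}
```

Because the list is an ordered source, `distinct()` keeps the first occurrence of each label, so the result order matches the order in which classes first appear in the dataset.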
Average imputation threw a RuntimeException because the condition in the isDigit() method returned false for blank values. Fixed with an OR operator.
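A minimal sketch of the kind of check described here, under stated assumptions: the PR's actual isDigit() is not shown, so the method names, the regular expression, and the String[] column representation below are all hypothetical. The idea is that a blank value counts as "missing" rather than "invalid", so the validation accepts it via an OR and the average is computed over the non-blank values only:

```java
import java.util.Arrays;

// Hypothetical sketch; not the PR's actual implementation.
final class AverageImputationSketch {
    // Accepts blank (missing) values OR values that look like decimal numbers.
    static boolean isBlankOrNumeric(String value) {
        return value.isBlank() || value.matches("-?\\d+(\\.\\d+)?");
    }

    // Replaces every blank entry with the average of the non-blank entries.
    static String[] imputeWithAverage(String[] column) {
        for (String value : column) {
            if (!isBlankOrNumeric(value)) {
                throw new RuntimeException("Non-numeric value: " + value);
            }
        }
        double average = Arrays.stream(column)
                .filter(value -> !value.isBlank())
                .mapToDouble(Double::parseDouble)
                .average()
                .orElseThrow(() -> new RuntimeException("Column has no numeric values"));
        return Arrays.stream(column)
                .map(value -> value.isBlank() ? String.valueOf(average) : value)
                .toArray(String[]::new);
    }
}
```

Without the `value.isBlank() ||` part, the validation loop would reject the very entries the imputation is meant to fill.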