Feat(#23) Replace filterIncompleteRecords boolean with Imputation Enum for Enhanced Data Handling #72

Merged
merged 6 commits into from
Oct 31, 2023

acsolle66
Collaborator

No description provided.

acsolle66 and others added 5 commits October 30, 2023 21:36
This commit introduces the 'handler' package under 'de.edux.data' to handle the cleaning and processing
of incomplete records within a dataset, as requested in issue Samyssmile#23.

Changes Made:
- Added 'handler' package under 'de.edux.data'
- Introduced 'EIncompleteRecordsHandlerStratetgy' enum with the constants:
  DO_NOT_HANDLE, DROP_RECORDS, FILL_RECORDS_WITH_AVERAGE
- Introduced 'IIncompleteRecordsHandler' interface
- Implemented 'DropIncompleteRecordsHandler' class
- Implemented 'AverageFillIncompleteRecordsHandler' class
- Implemented 'DoNotHandleIncompleteRecords' class because
  of DataProcessorTest.testLoadDataWithoutNormalizationAndShuffling()
- Started writing tests for the above-mentioned classes and their methods
- Revised the existing codebase to be compatible with the new enum
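
The idea in the list above (an enum constant selects how incomplete records are treated) can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the names `IncompleteRecordsStrategy`, `IncompleteRecordsHandler`, `IncompleteRecordsHandlers`, and the `handle` method are hypothetical, and a record is assumed to be a `String[]` in which a blank entry marks a missing value. Only the three constant names come from the commit message.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical enum name; only the three constants come from the commit message.
enum IncompleteRecordsStrategy {
    DO_NOT_HANDLE, DROP_RECORDS, FILL_RECORDS_WITH_AVERAGE
}

// Hypothetical single-method interface: takes the raw rows, returns the cleaned rows.
interface IncompleteRecordsHandler {
    List<String[]> handle(List<String[]> dataset);
}

final class IncompleteRecordsHandlers {
    private IncompleteRecordsHandlers() {}

    // Maps each strategy constant to a handler (assumption: blank string = missing value).
    static IncompleteRecordsHandler forStrategy(IncompleteRecordsStrategy strategy) {
        return switch (strategy) {
            // Pass the dataset through unchanged.
            case DO_NOT_HANDLE -> dataset -> dataset;
            // Keep only rows in which no value is blank.
            case DROP_RECORDS -> dataset -> dataset.stream()
                    .filter(row -> Arrays.stream(row).noneMatch(String::isBlank))
                    .toList();
            // Average filling is left out to keep the sketch short.
            case FILL_RECORDS_WITH_AVERAGE ->
                    throw new UnsupportedOperationException("omitted in this sketch");
        };
    }
}
```

Compared with a boolean flag, an enum of this kind leaves room for further strategies without changing the method signatures that accept it.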
…m for Enhanced Data Handling

Implemented the ImputationStrategy enum with the constants AVERAGE and MODE, along with the corresponding classes.

@Override
public double[][] getTestFeatures(int[] inputColumns) {
    return getInputs(testData, inputColumns);
public void drop_incomplete_records() {
Owner

In Java we don't use snake case for method names. Use camelCase.

Owner

Use snake case only for constants, and in upper case: private static final String MY_CONSTANT

Collaborator Author

I wrote a few lines of Python before this, and it seems I forgot how to name things in Java 😅
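
For reference, the convention discussed in this thread looks roughly like the sketch below. The class, constant, and method names are made up for illustration and do not come from the pull request.

```java
final class NamingConventionSketch {
    // Constant: upper snake case, declared static and final.
    private static final String MISSING_VALUE = "";

    private NamingConventionSketch() {}

    // Method, parameter, and local variable names: camelCase.
    static boolean hasMissingValue(String[] recordValues) {
        for (String recordValue : recordValues) {
            if (MISSING_VALUE.equals(recordValue)) {
                return true;
            }
        }
        return false;
    }
}
```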

public double[][] getTestFeatures(int[] inputColumns) {
    return getInputs(testData, inputColumns);
public void drop_incomplete_records() {
    dataset = dataset.stream().filter((record) -> {
Owner

Do not mix the functional style with the classic (imperative) style. I suggest a purely functional approach here.

Collaborator Author

What do you mean?
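
One possible reading of the suggestion, sketched under assumptions: the body of the lambda in the diff is not shown, so the "mixed" version below only guesses that it contained an imperative loop. The class and method names are hypothetical, and a blank string is assumed to mark a missing value.

```java
import java.util.Arrays;
import java.util.List;

final class DropIncompleteRecordsSketch {
    private DropIncompleteRecordsSketch() {}

    // Mixed style: a stream pipeline whose lambda contains a classic for loop.
    static List<String[]> dropMixed(List<String[]> dataset) {
        return dataset.stream().filter(record -> {
            for (String value : record) {
                if (value.isBlank()) {
                    return false;
                }
            }
            return true;
        }).toList();
    }

    // Purely functional style: the inner check is itself a stream operation.
    static List<String[]> dropFunctional(List<String[]> dataset) {
        return dataset.stream()
                .filter(record -> Arrays.stream(record).noneMatch(String::isBlank))
                .toList();
    }
}
```

Both methods return the same rows; the second one simply avoids switching styles in the middle of the pipeline.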

@@ -1,5 +1,17 @@
package de.edux.functions.imputation;

public enum ImputationStrategy {
DUMMY, MEAN, AVERAGE, MODE;
//TODO: DUMMY, MEAN
Owner

remove


@Test
void performImputationWithNumericalValuesTest() {
    String[] numerical_features_with_missing_values = {"1", "", "2", "3", "", "4"};
Owner

camelCase...

skipHead();
}

List<String> uniqueClasses = new ArrayList<>();
Owner

Use List<String> uniqueClasses = dataset.stream().map(row -> row[targetColumn]).distinct().toList();
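
A small self-contained version of that one-liner, for illustration only. The wrapper class, the method signature, and the sample rows are assumptions; `Stream.toList()` requires Java 16 or later.

```java
import java.util.List;

final class UniqueClassesSketch {
    private UniqueClassesSketch() {}

    // Collects the distinct values of the target column, in order of first appearance.
    static List<String> uniqueClasses(List<String[]> dataset, int targetColumn) {
        return dataset.stream()
                .map(row -> row[targetColumn])
                .distinct()
                .toList();
    }

    public static void main(String[] args) {
        // Made-up sample rows: one feature column followed by the class label.
        List<String[]> dataset = List.of(
                new String[]{"5.1", "setosa"},
                new String[]{"7.0", "versicolor"},
                new String[]{"4.9", "setosa"});
        System.out.println(uniqueClasses(dataset, 1)); // prints [setosa, versicolor]
    }
}
```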

Average imputation threw a RuntimeException because the condition in the isDigit() method
returned false for blank values. Fixed with an OR operator.
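
The kind of fix described in this commit message might look like the sketch below. It is not the project's implementation: `isNumericOrBlank` and `impute` are hypothetical names, the numeric check uses a simple regular expression as a stand-in for whatever isDigit() does, and blank strings are assumed to mark missing values.

```java
import java.util.Arrays;

final class AverageImputationSketch {
    private AverageImputationSketch() {}

    // Blank values are accepted explicitly (the OR), so a numeric column with gaps passes the check.
    static boolean isNumericOrBlank(String value) {
        return value.isBlank() || value.matches("-?\\d+(\\.\\d+)?");
    }

    // Replaces every blank entry with the average of the non-blank entries.
    static String[] impute(String[] column) {
        if (!Arrays.stream(column).allMatch(AverageImputationSketch::isNumericOrBlank)) {
            throw new RuntimeException("Average imputation requires numerical values");
        }
        double average = Arrays.stream(column)
                .filter(value -> !value.isBlank())
                .mapToDouble(Double::parseDouble)
                .average()
                .orElse(0.0); // assumption: fall back to 0.0 when every value is missing
        return Arrays.stream(column)
                .map(value -> value.isBlank() ? String.valueOf(average) : value)
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        String[] column = {"1", "", "2", "3", "", "4"};
        System.out.println(Arrays.toString(impute(column))); // prints [1, 2.5, 2, 3, 2.5, 4]
    }
}
```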
@Samyssmile Samyssmile merged commit e6dd0c3 into Samyssmile:main Oct 31, 2023
3 checks passed
@acsolle66 acsolle66 deleted the feat#23 branch October 31, 2023 22:36
2 participants