Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fear(#23): Replace filterIncompleteRecords boolean with Imputation Enum for Enhanced Data Handling #61

Closed
wants to merge 4 commits into from

Conversation

acsolle66
Copy link
Collaborator

This PR introduces the 'handler' package under 'de.eudx.data' to handle the cleaning and processing
of incomplete records within a dataset, in correspondence with issue #23.

Changes Made:

  • Added 'handler' package under 'de.edux.data'
  • Introduced 'EIncompleteRecordsHandlerStratetgy' enum with the constans:
    DO_NOT_HANDLE, DROP_RECORDS, FILL_RECORDS_WITH_AVERAGE
  • Introduced 'IIncompleteRecordsHandler' interface
  • Implemented 'DropIncompleteRecordsHandler' class
  • Implemented 'AverageFillIncompleteRecordsHandler' class
  • Implemented 'DoNotHandleIncompleteRecords' class because
    of DataProcessorTest.testLoadDataWithoutNormalizationAndShuffling()
  • Implemented tests to ensure the functionality of handlers
  • Revired the existing codebase to be compatible with the new enum

acsolle66 and others added 3 commits October 19, 2023 00:06
This commit introduces the 'handler' package under 'de.eudx.data' to handle the cleaning and processing
of incomplete records within a dataset, in correspondence with issue Samyssmile#23.

Changes Made:
- Added 'handler' package under 'de.edux.data'
- Introduced 'EIncompleteRecordsHandlerStratetgy' enum with the constans:
  DO_NOT_HANDLE, DROP_RECORDS, FILL_RECORDS_WITH_AVERAGE
- Introduced 'IIncompleteRecordsHandler' interface
- Implemented 'DropIncompleteRecordsHandler' class
- Implemented 'AverageFillIncompleteRecordsHandler' class
- Implemented 'DoNotHandleIncompleteRecords' class because
  of DataProcessorTest.testLoadDataWithoutNormalizationAndShuffling()
- Started to write the tests for the above mentioned classes and their methods
- Revired the existing codebase to be compatible with the new enum

return dataset.stream().filter(this::containsOnlyCompletedFeatures).toList();
if (cleanedDataset.size() < dataset.size() * 0.5) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the reason for 0.5?

return dataset.stream().filter(this::containsOnlyCompletedFeatures).toList();
if (cleanedDataset.size() < dataset.size() * 0.5) {
throw new RuntimeException(
"More than 50% of the records will be dropped with this IncompleteRecordsHandlerStrategy. "
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why?

@@ -26,41 +26,51 @@ void initializeList() {
void dropRecordsWithIncompleteCategoricalFeature() {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like drop test in AverageFill Test class. Do Drop test in Drop Test Class and AverageFill Tests in AverageFill Test Classes...

}

@Test
void testDropThreeIncompleteResults() {
void testDropTwoIncompleteResult() {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls rename the tests with this pattern "should*What Test should do" e.g. shouldDropTwoIncompleResults

@@ -0,0 +1,17 @@
package de.edux.data.handler;

public enum EIncompleteRecordsHandlerStrategy {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In Java World wie never prefix enums with 'E'. As in isssue#23 described you need name it "Imputation" here.

Imputation .DROP_RECORDS....


import java.util.List;

public interface IIncompleteRecordsHandler {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IImputationHandler

public List<T> loadDataSetFromCSV(File csvFile, char csvSeparator, boolean normalize, boolean shuffle, boolean filterIncompleteRecords) {
List<String[]> x = csvDataReader.readFile(csvFile, csvSeparator);
List<T> unmodifiableDataset = csvDataReader.readFile(csvFile, csvSeparator)
public List<T> loadDataSetFromCSV(File csvFile, char csvSeparator, boolean normalize, boolean shuffle, EIncompleteRecordsHandlerStrategy incompleteRecordHandlerStrategy) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need one more method here

public List<T> loadDataSetFromCSV(File csvFile, char csvSeparator, boolean normalize, boolean shuffle){ this.loadDataSetFromCSV(...., Imputation.Do Nothing)

@Samyssmile Samyssmile closed this Oct 27, 2023
@acsolle66 acsolle66 deleted the issue23-feat branch October 31, 2023 22:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants