feat(#23): Replace filterIncompleteRecords boolean with Imputation Enum for Enhanced Data Handling #61

acsolle66 · 2023-10-19T10:53:22Z

This PR introduces the 'handler' package under 'de.edux.data' to clean and process
incomplete records in a dataset, as described in issue #23.

Changes Made:

- Added the 'handler' package under 'de.edux.data'
- Introduced the 'EIncompleteRecordsHandlerStrategy' enum with the constants
  DO_NOT_HANDLE, DROP_RECORDS and FILL_RECORDS_WITH_AVERAGE
- Introduced the 'IIncompleteRecordsHandler' interface
- Implemented the 'DropIncompleteRecordsHandler' class
- Implemented the 'AverageFillIncompleteRecordsHandler' class
- Implemented the 'DoNotHandleIncompleteRecords' class, which is needed by
  DataProcessorTest.testLoadDataWithoutNormalizationAndShuffling()
- Implemented tests that verify the handlers
- Reworked the existing codebase to be compatible with the new enum
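The no-op DoNotHandleIncompleteRecords class behaves like a null object. The caller always receives a handler, so it never needs a null check, and the data passes through untouched. The sketch below makes three assumptions: each record is a `String[]`, the method name `handle` is used, and the wrapper class is `DoNotHandleSketch`. The last two names are hypothetical and do not come from the PR.

```java
import java.util.ArrayList;
import java.util.List;

public class DoNotHandleSketch {

  // Interface name from the PR; the method name `handle` is a hypothetical assumption.
  interface IIncompleteRecordsHandler {
    List<String[]> handle(List<String[]> dataset);
  }

  // No-op handler: returns the dataset unchanged, so callers never need a null check.
  static class DoNotHandleIncompleteRecords implements IIncompleteRecordsHandler {
    @Override
    public List<String[]> handle(List<String[]> dataset) {
      return dataset;
    }
  }

  public static void main(String[] args) {
    List<String[]> data = new ArrayList<>();
    data.add(new String[] {"1.0", "", "a"}); // incomplete row (empty second feature)
    data.add(new String[] {"2.0", "3.0", "b"});

    IIncompleteRecordsHandler handler = new DoNotHandleIncompleteRecords();
    System.out.println(handler.handle(data).size()); // prints 2: nothing was dropped
  }
}
```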


Samyssmile · 2023-10-19T11:48:59Z

lib/src/main/java/de/edux/data/handler/DropIncompleteRecordsHandler.java


-    return dataset.stream().filter(this::containsOnlyCompletedFeatures).toList();
+    if (cleanedDataset.size() < dataset.size() * 0.5) {


What is the reason for 0.5?
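The PR does not explain where 0.5 comes from. One way to make the intent explicit is to give the threshold a name and let callers override it. This minimal sketch assumes that rows are `String[]` and that a missing value is an empty or blank string. `DEFAULT_MIN_RETAINED_RATIO`, `handle` and the wrapper class are hypothetical names. `containsOnlyCompletedFeatures` is taken from the diff.

```java
import java.util.Arrays;
import java.util.List;

public class DropHandlerSketch {

  // Drops every row with a missing value (assumption: empty or blank string = missing).
  static class DropIncompleteRecordsHandler {
    // Hypothetical name for the PR's 0.5: minimum share of rows that must survive.
    static final double DEFAULT_MIN_RETAINED_RATIO = 0.5;

    private final double minRetainedRatio;

    DropIncompleteRecordsHandler(double minRetainedRatio) {
      if (minRetainedRatio < 0.0 || minRetainedRatio > 1.0) {
        throw new IllegalArgumentException("minRetainedRatio must be in [0, 1]");
      }
      this.minRetainedRatio = minRetainedRatio;
    }

    DropIncompleteRecordsHandler() {
      this(DEFAULT_MIN_RETAINED_RATIO);
    }

    List<String[]> handle(List<String[]> dataset) {
      List<String[]> cleanedDataset =
          dataset.stream().filter(DropIncompleteRecordsHandler::containsOnlyCompletedFeatures).toList();
      // Same condition as in the PR: fail if fewer than minRetainedRatio of the rows survive.
      if (cleanedDataset.size() < dataset.size() * minRetainedRatio) {
        throw new IllegalStateException(
            "More than " + Math.round((1 - minRetainedRatio) * 100)
                + "% of the records would be dropped by this strategy.");
      }
      return cleanedDataset;
    }

    private static boolean containsOnlyCompletedFeatures(String[] row) {
      return Arrays.stream(row).noneMatch(value -> value == null || value.isBlank());
    }
  }

  public static void main(String[] args) {
    List<String[]> data = List.of(
        new String[] {"1.0", "2.0"},
        new String[] {"3.0", ""},
        new String[] {"5.0", "6.0"});
    System.out.println(new DropIncompleteRecordsHandler().handle(data).size()); // prints 2
  }
}
```

The check keeps the PR's semantics: it fails when fewer than half of the records survive. `IllegalStateException` is used here as a more specific `RuntimeException`.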

Samyssmile · 2023-10-19T11:49:12Z

lib/src/main/java/de/edux/data/handler/DropIncompleteRecordsHandler.java

-    return dataset.stream().filter(this::containsOnlyCompletedFeatures).toList();
+    if (cleanedDataset.size() < dataset.size() * 0.5) {
+      throw new RuntimeException(
+          "More than 50% of the records will be dropped with this IncompleteRecordsHandlerStrategy. "


Samyssmile · 2023-10-19T11:50:23Z

lib/src/test/java/de/edux/data/handler/AverageFillIncompleteRecordHandlerTest.java

@@ -26,41 +26,51 @@ void initializeList() {
  void dropRecordsWithIncompleteCategoricalFeature() {


This looks like a drop test inside the AverageFill test class. Please put drop tests in the Drop test class and AverageFill tests in the AverageFill test class.
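The sketch below illustrates what belongs in the AverageFill test class. It pairs average filling with a check that exercises only filling, named after the reviewer's `should...` pattern. It assumes that every column is numeric, that rows are `String[]` of equal length, and that a missing value is an empty or blank string. `fillWithColumnAverage` and `shouldFillMissingValueWithColumnAverage` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

public class AverageFillSketch {

  // Assumption: all columns are numeric; empty or blank strings are missing values.
  static List<String[]> fillWithColumnAverage(List<String[]> dataset) {
    if (dataset.isEmpty()) return dataset;
    int columns = dataset.get(0).length;
    double[] sums = new double[columns];
    int[] counts = new int[columns];

    // First pass: sum and count the present values per column.
    for (String[] row : dataset) {
      for (int c = 0; c < columns; c++) {
        if (!isMissing(row[c])) {
          sums[c] += Double.parseDouble(row[c]);
          counts[c]++;
        }
      }
    }

    // Second pass: copy each row and replace missing values with the column mean.
    List<String[]> filled = new ArrayList<>(dataset.size());
    for (String[] row : dataset) {
      String[] copy = row.clone(); // leave the input rows unmodified
      for (int c = 0; c < columns; c++) {
        if (isMissing(copy[c])) {
          if (counts[c] == 0) {
            throw new IllegalStateException("Column " + c + " has no values to average");
          }
          copy[c] = String.valueOf(sums[c] / counts[c]);
        }
      }
      filled.add(copy);
    }
    return filled;
  }

  private static boolean isMissing(String value) {
    return value == null || value.isBlank();
  }

  // Follows the "should..." naming; a real test would be a JUnit method in the AverageFill test class.
  static void shouldFillMissingValueWithColumnAverage() {
    List<String[]> data = List.of(new String[] {"1.0", "10.0"}, new String[] {"3.0", ""});
    String[] second = fillWithColumnAverage(data).get(1);
    if (!second[1].equals("10.0")) throw new AssertionError("expected 10.0 but got " + second[1]);
  }

  public static void main(String[] args) {
    shouldFillMissingValueWithColumnAverage();
    System.out.println("ok");
  }
}
```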

Samyssmile · 2023-10-19T11:51:49Z

lib/src/test/java/de/edux/data/handler/DropIncompleteRecordHandlerTest.java

  }

  @Test
-  void testDropThreeIncompleteResults() {
+  void testDropTwoIncompleteResult() {


Please rename the tests following the pattern "should<WhatTheTestShouldDo>", e.g. shouldDropTwoIncompleteResults.

Samyssmile · 2023-10-19T11:55:05Z

lib/src/main/java/de/edux/data/handler/EIncompleteRecordsHandlerStrategy.java

@@ -0,0 +1,17 @@
+package de.edux.data.handler;
+
+public enum EIncompleteRecordsHandlerStrategy {


In the Java world we never prefix enums with 'E'. As described in issue #23, you need to name it "Imputation" here.

Imputation.DROP_RECORDS ...
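Here is a minimal sketch of the renamed enum. It assumes that each constant carries its own handler, so that DataProcessor needs no switch. `IImputationHandler` follows the interface name suggested in this review. The `handle` and `handler()` methods and the wrapper class are hypothetical. FILL_RECORDS_WITH_AVERAGE is left out for brevity.

```java
import java.util.Arrays;
import java.util.List;

public class ImputationSketch {

  // Interface name as suggested in review; the method name `handle` is hypothetical.
  interface IImputationHandler {
    List<String[]> handle(List<String[]> dataset);
  }

  // Enum named per issue #23; each constant is bound to the handler that implements it.
  enum Imputation {
    DO_NOT_HANDLE(dataset -> dataset),
    DROP_RECORDS(dataset -> dataset.stream()
        // Assumption: a null, empty or blank string marks a missing value.
        .filter(row -> Arrays.stream(row).noneMatch(v -> v == null || v.isBlank()))
        .toList());

    private final IImputationHandler handler;

    Imputation(IImputationHandler handler) {
      this.handler = handler;
    }

    IImputationHandler handler() {
      return handler;
    }
  }

  public static void main(String[] args) {
    List<String[]> data = List.of(new String[] {"1", "2"}, new String[] {"", "4"});
    System.out.println(Imputation.DROP_RECORDS.handler().handle(data).size()); // prints 1
  }
}
```

Binding the handler to the constant keeps each strategy next to its implementation. The alternative is a switch in DataProcessor that maps each constant to one of the separate handler classes from this PR.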

Samyssmile · 2023-10-19T11:56:17Z

lib/src/main/java/de/edux/data/handler/IIncompleteRecordsHandler.java

+
+import java.util.List;
+
+public interface IIncompleteRecordsHandler {


IImputationHandler

Samyssmile · 2023-10-19T11:59:55Z

lib/src/main/java/de/edux/data/provider/DataProcessor.java

-    public List<T> loadDataSetFromCSV(File csvFile, char csvSeparator, boolean normalize, boolean shuffle, boolean filterIncompleteRecords) {
-        List<String[]> x = csvDataReader.readFile(csvFile, csvSeparator);
-        List<T> unmodifiableDataset = csvDataReader.readFile(csvFile, csvSeparator)
+    public List<T> loadDataSetFromCSV(File csvFile, char csvSeparator, boolean normalize, boolean shuffle, EIncompleteRecordsHandlerStrategy incompleteRecordHandlerStrategy) {


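We need one more method here: an overload without the strategy parameter that delegates with the do-nothing default.

public List<T> loadDataSetFromCSV(File csvFile, char csvSeparator, boolean normalize, boolean shuffle) { return this.loadDataSetFromCSV(...., Imputation.DO_NOT_HANDLE); }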
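Here is a minimal sketch of the requested overload. It assumes the enum is renamed to `Imputation` and keeps a DO_NOT_HANDLE constant. The `DataProcessor` below is a stand-in that only records which strategy it received. Reading, imputing, normalizing and shuffling are omitted. `lastStrategy` and `data.csv` are hypothetical.

```java
import java.io.File;
import java.util.List;

public class OverloadSketch {

  enum Imputation { DO_NOT_HANDLE, DROP_RECORDS, FILL_RECORDS_WITH_AVERAGE }

  // Stand-in for DataProcessor<T>; only the overload structure matters here.
  static class DataProcessor<T> {
    Imputation lastStrategy; // recorded only so the delegation is observable in this sketch

    // Convenience overload: delegates with the do-nothing default.
    public List<T> loadDataSetFromCSV(File csvFile, char csvSeparator, boolean normalize, boolean shuffle) {
      return loadDataSetFromCSV(csvFile, csvSeparator, normalize, shuffle, Imputation.DO_NOT_HANDLE);
    }

    public List<T> loadDataSetFromCSV(
        File csvFile, char csvSeparator, boolean normalize, boolean shuffle, Imputation imputation) {
      lastStrategy = imputation;
      // A real implementation would read the CSV, apply the imputation handler,
      // then normalize and shuffle as requested. Omitted in this sketch.
      return List.of();
    }
  }

  public static void main(String[] args) {
    DataProcessor<String[]> processor = new DataProcessor<>();
    processor.loadDataSetFromCSV(new File("data.csv"), ',', false, false);
    System.out.println(processor.lastStrategy); // prints DO_NOT_HANDLE
  }
}
```

Callers that do not care about incomplete records can use the four-argument form, which defaults to no imputation.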

acsolle66 and others added 3 commits October 19, 2023 00:06

Merge branch 'Samyssmile:main' into issue23-feat

b72612f

feat(Samyssmile#23): add exception handling and tests for incompleteRecordsHandlers

af01c5a

Samyssmile requested changes Oct 19, 2023

feat(Samyssmile#23): Prepare

8c969fd

Samyssmile closed this Oct 27, 2023

acsolle66 deleted the issue23-feat branch October 31, 2023 22:36
