Enhance Numeric Data Inspection and Introduce Positive/Negative Filtering #217
Merged
Commits
19 commits
bfed010 add new py files (MooooCat)
67daba2 update numeric inspector to support pos and neg (MooooCat)
5b01193 Delete positive.py (MooooCat)
16d55c5 Create positive_negative.py (MooooCat)
ff334d0 Update positive_negative.py (MooooCat)
d05cc0b add test cases in test_filters_pos_neg.py (MooooCat)
14df577 Update manager.py (MooooCat)
ce5b82a translate comments (MooooCat)
974367e translate comments (MooooCat)
d438011 [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
6d8eaf6 fix fitted flat in PositiveNegativeFilter (MooooCat)
8273c1b Merge branch 'main' into feature-intro-rule-processor (MooooCat)
76781fa Update numeric.py (MooooCat)
06eecdd [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
cff7cc3 fix error in testcase (MooooCat)
63dcafd [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
d45b2a3 Update NumericInspector according to Wayland's Review (MooooCat)
bf7f6f7 [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
b455f3f Merge branch 'main' into feature-intro-rule-processor (MooooCat)
@@ -0,0 +1,11 @@
from __future__ import annotations

from sdgx.data_processors.base import DataProcessor


class Filter(DataProcessor):
    """
    Base class for all data filters.

    Filter is a module used to apply rules and remove sampled data that does not conform to the rules.
    """
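To make the contract described in the `Filter` docstring concrete, here is a minimal sketch that does not import sdgx. `StandInProcessor` and `MaxValueFilter` are hypothetical names invented for illustration; the method names `fit`, `convert`, `reverse_convert` and the `fitted` flag are assumed from how the filter added in this pull request uses them, not from the real `DataProcessor` API.

```python
import pandas as pd


class StandInProcessor:
    """Hypothetical stand-in for sdgx's DataProcessor (assumed interface only)."""

    fitted = False

    def fit(self, metadata=None, **kwargs):
        self.fitted = True

    def convert(self, raw_data: pd.DataFrame) -> pd.DataFrame:
        # Filters leave training data untouched in this sketch.
        return raw_data

    def reverse_convert(self, processed_data: pd.DataFrame) -> pd.DataFrame:
        return processed_data


class MaxValueFilter(StandInProcessor):
    """Hypothetical rule: drop sampled rows whose value exceeds a per-column cap."""

    def __init__(self, caps: dict):
        self.caps = dict(caps)

    def reverse_convert(self, processed_data: pd.DataFrame) -> pd.DataFrame:
        # Start by keeping every row, then narrow the mask rule by rule.
        mask = pd.Series(True, index=processed_data.index)
        for col, cap in self.caps.items():
            if col in processed_data.columns:
                mask &= processed_data[col] <= cap
        return processed_data[mask]


rule = MaxValueFilter({"score": 100})
rule.fit()

training = pd.DataFrame({"score": [50, 120]})
sampled = pd.DataFrame({"score": [10, 150, 100]})

converted = rule.convert(training)    # pass-through on the way in
kept = rule.reverse_convert(sampled)  # rule applied on the way out
print(kept)
```

The point of the split is that a rule only constrains generated samples: the training data passes through `convert` unchanged, and rows are dropped only in `reverse_convert`.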
@@ -0,0 +1,109 @@
from __future__ import annotations

from typing import Any

import pandas as pd

from sdgx.data_models.metadata import Metadata
from sdgx.data_processors.extension import hookimpl
from sdgx.data_processors.filter.base import Filter
from sdgx.utils import logger


class PositiveNegativeFilter(Filter):
    """
    A data processor for filtering positive and negative values.

    This filter is used to ensure that values in specific columns remain positive or negative.
    During the reverse conversion process, rows that do not meet the expected positivity or
    negativity will be removed.

    Attributes:
        int_columns (set): A set of column names containing integer values.
        float_columns (set): A set of column names containing float values.
        positive_columns (set): A set of column names that should contain positive values.
        negative_columns (set): A set of column names that should contain negative values.
    """

    int_columns: set = set()
    """
    A set of column names that contain integer values.
    """

    float_columns: set = set()
    """
    A set of column names that contain float values.
    """

    positive_columns: set = set()
    """
    A set of column names that are identified as containing positive numeric values.
    """

    negative_columns: set = set()
    """
    A set of column names that are identified as containing negative numeric values.
    """

    def fit(self, metadata: Metadata | None = None, **kwargs: dict[str, Any]):
        """
        Fit method for the data filter.
        """
        logger.info("PositiveNegativeFilter Fitted.")

        # record int and float data
        self.int_columns = metadata.int_columns
        self.float_columns = metadata.float_columns

        # record pos and neg
        self.positive_columns = set(metadata.numeric_format["positive"])
        self.negative_columns = set(metadata.numeric_format["negative"])

        self.fitted = True

    def convert(self, raw_data: pd.DataFrame) -> pd.DataFrame:
        """
        Convert method for data filter (No Action).
        """

        logger.info("Converting data using PositiveNegativeFilter... Finished (No Action)")

        return raw_data

    def reverse_convert(self, processed_data: pd.DataFrame) -> pd.DataFrame:
        """
        Reverse_convert method for the pos_neg data filter.

        Iterate through each row of data, check if there are negative values in positive_columns,
        or positive values in negative_columns. If the conditions are not met, discard the row.
        """
        logger.info(
            f"Data reverse-converted by PositiveNegativeFilter Start with Shape: {processed_data.shape}."
        )

        # Create a boolean mask to mark the rows that need to be retained
        mask = pd.Series(True, index=processed_data.index)

        # Check positive_columns
        for col in self.positive_columns:
            if col in processed_data.columns:
                mask &= processed_data[col] >= 0

        # Check negative_columns
        for col in self.negative_columns:
            if col in processed_data.columns:
                mask &= processed_data[col] <= 0

        # Apply the mask to filter the data
        filtered_data = processed_data[mask]

        logger.info(
            f"Data reverse-converted by PositiveNegativeFilter with Output Shape: {filtered_data.shape}."
        )

        return filtered_data


@hookimpl
def register(manager):
    manager.register("PositiveNegativeFilter", PositiveNegativeFilter)
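The masking logic in `reverse_convert` can be tried on its own, without sdgx or a fitted `Metadata` object. The sketch below copies only that logic into a hypothetical helper, `keep_rows_matching_sign`; the column names and sample values are made up for illustration.

```python
import pandas as pd


def keep_rows_matching_sign(df, positive_columns, negative_columns):
    """Keep rows that are >= 0 in positive columns and <= 0 in negative columns."""
    mask = pd.Series(True, index=df.index)
    for col in positive_columns:
        if col in df.columns:  # columns missing from the frame are skipped
            mask &= df[col] >= 0
    for col in negative_columns:
        if col in df.columns:
            mask &= df[col] <= 0
    return df[mask]


# Illustrative sampled data (not taken from the PR's test cases).
sampled = pd.DataFrame(
    {
        "age": [25, -3, 40, 0, float("nan")],
        "loss": [-1.5, -2.0, 3.0, 0.0, -1.0],
        "label": ["a", "b", "c", "d", "e"],
    }
)

filtered = keep_rows_matching_sign(sampled, {"age"}, {"loss", "not_in_frame"})
print(filtered)
```

Two behaviours follow from the non-strict comparisons and are worth knowing when reading the diff: rows containing exactly zero are retained in both kinds of column, and rows with a missing value in a checked column are dropped, because comparing NaN with `>=` or `<=` evaluates to False in pandas.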
Perhaps we should indicate the PR and release version here, rather than the date?
Good idea. I also have another branch in development, and I'll release after merging another PR. For various reasons, we haven't released a new version for a long time :(
Never mind, thanks for your work!