FIX: dist_plot only displaying first numerical column #242
Reviewer's Guide by Sourcery
This pull request addresses the issue of dist_plot only displaying the first numerical column by adding a check for empty numeric columns. It also includes several refactorings for readability, updates to the pre-commit hooks, and improvements to the test cases.
Hey @akanz1 - I've reviewed your changes and they look great!
Here's what I looked at during the review
- 🟢 General issues: all looks good
- 🟢 Security: all looks good
- 🟡 Testing: 3 issues found
- 🟢 Complexity: all looks good
- 🟢 Documentation: all looks good
-                clean_column_names(self.df_clean_column_names).columns[i]
-                == expected_results[i]
-            )
+            assert clean_column_names(self.df_clean_column_names).columns[i] == expected_results[i]
suggestion (testing): Add tests for edge cases in clean_column_names
Consider adding tests for edge cases, such as a DataFrame with no columns, column names containing special characters, or very long column names. This helps ensure that clean_column_names handles these inputs correctly.
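A minimal sketch of such edge-case tests using pytest.mark.parametrize. This thread does not show what clean_column_names returns for these inputs, so the example tests a hypothetical stand-in, to_snake_case_columns, which lowercases names and collapses runs of non-alphanumeric characters into underscores. The expected values below belong to that stand-in only; a real test would use clean_column_names and its actual outputs.

```python
import re

import pandas as pd
import pytest


def to_snake_case_columns(data: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in for clean_column_names (not klib's implementation)."""
    cleaned = [
        # Collapse each run of non-alphanumeric characters into "_", trim, lowercase.
        re.sub(r"[^0-9a-zA-Z]+", "_", str(col)).strip("_").lower()
        for col in data.columns
    ]
    return data.set_axis(cleaned, axis="columns")


@pytest.mark.parametrize(
    ("columns", "expected"),
    [
        ([], []),  # DataFrame without columns
        (["Price ($)", "Net-Weight kg"], ["price", "net_weight_kg"]),  # special characters
        (["A" * 300], ["a" * 300]),  # very long column name
    ],
)
def test_clean_column_names_edge_cases(columns, expected):
    df = pd.DataFrame(columns=columns)
    assert list(to_snake_case_columns(df).columns) == expected
```

Each tuple becomes its own test case, so a failure names the edge case that broke.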
-            assert (
-                convert_datatypes(self.df_data_convert).dtypes[i] == expected_results[i]
-            )
+            assert convert_datatypes(self.df_data_convert).dtypes[i] == expected_results[i]
suggestion (testing): Add tests for edge cases in convert_datatypes
It would be useful to add tests for edge cases, such as a DataFrame with mixed data types, missing values, or very large numbers. This helps ensure that convert_datatypes is robust to these inputs.
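A sketch of two of these cases (missing values and mixed types) as a parametrized test. Because klib's convert_datatypes rules are not shown here, the example uses pandas' own DataFrame.convert_dtypes as a stand-in conversion; the expected dtypes are pandas' nullable-dtype results, not claims about convert_datatypes.

```python
import numpy as np
import pandas as pd
import pytest


@pytest.mark.parametrize(
    ("values", "expected_dtype"),
    [
        ([1, np.nan, 3], "Int64"),  # whole numbers with a missing value
        ([0.5, np.nan, 2.0], "Float64"),  # non-integral floats with a missing value
        (["a", 1, 2.5], "object"),  # mixed types are left as object
    ],
)
def test_convert_dtypes_edge_cases(values, expected_dtype):
    # Stand-in: pandas' DataFrame.convert_dtypes, not klib's convert_datatypes.
    df = pd.DataFrame({"col": values})
    assert str(df.convert_dtypes().dtypes["col"]) == expected_dtype
```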
@@ -93,8 +93,7 @@ def test_output_type(self):
     def test_output_shape(self):
         # Test for output dimensions
         assert (
-            corr_mat(self.data_corr_df).data.shape[0]
-            == corr_mat(self.data_corr_df).data.shape[1]
+            corr_mat(self.data_corr_df).data.shape[0] == corr_mat(self.data_corr_df).data.shape[1]
suggestion (testing): Add tests for edge cases in corr_mat
Consider adding tests for edge cases, such as empty input data, data containing NaNs, or data with only one column. This helps ensure that corr_mat handles these inputs correctly.
    corr_mat(self.data_corr_df).data.shape[0] == corr_mat(self.data_corr_df).data.shape[1]
assert (
    corr_mat(pd.DataFrame()).data.shape[0] == corr_mat(pd.DataFrame()).data.shape[1]
)
assert (
    corr_mat(pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, None]})).data.shape[0] == corr_mat(pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, None]})).data.shape[1]
)
assert (
    corr_mat(pd.DataFrame({"A": [1, 2, 3]})).data.shape[0] == corr_mat(pd.DataFrame({"A": [1, 2, 3]})).data.shape[1]
)
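The NaN and single-column cases can also be written as a parametrized test that computes the matrix once per case instead of twice per assertion. In this sketch pandas' DataFrame.corr stands in for corr_mat, so no .data attribute is needed; the empty-DataFrame case is omitted because this thread does not show how corr_mat behaves on it.

```python
import numpy as np
import pandas as pd
import pytest


@pytest.mark.parametrize(
    "data",
    [
        pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, np.nan]}),  # contains a NaN
        pd.DataFrame({"A": [1, 2, 3]}),  # single column
    ],
)
def test_corr_matrix_is_square(data):
    # Stand-in: pandas' DataFrame.corr instead of klib's corr_mat(...).data.
    n_rows, n_cols = data.corr().shape  # compute once, then unpack
    # Square, with one row and one column per input column.
    assert n_rows == n_cols == data.shape[1]
```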
         for i, _ in enumerate(expected_results):
-            assert (
-                clean_column_names(self.df_clean_column_names).columns[i]
-                == expected_results[i]
-            )
+            assert clean_column_names(self.df_clean_column_names).columns[i] == expected_results[i]
issue (code-quality): Avoid loops in tests. (no-loop-in-tests)
Explanation
Avoid complex code, like loops, in test functions. Google's software engineering guidelines say:
"Clear tests are trivially correct upon inspection"
To achieve that, avoid complex code in tests, such as:
- loops
- conditionals
Some ways to fix this:
- Use parametrized tests to get rid of the loop.
- Move the complex logic into helpers.
- Move the complex part into pytest fixtures.
Complexity is most often introduced in the form of logic. Logic is defined via the imperative parts of programming languages such as operators, loops, and conditionals. When a piece of code contains logic, you need to do a bit of mental computation to determine its result instead of just reading it off of the screen. It doesn't take much logic to make a test more difficult to reason about.
Software Engineering at Google / Don't Put Logic in Tests
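A sketch of the first and third fixes applied to the clean_column_names loop: a fixture runs the function under test once, and parametrize turns each (position, expected name) pair into its own test case. The input frame, expected names, and the clean function are hypothetical stand-ins; the real test class defines df_clean_column_names and expected_results itself.

```python
import pandas as pd
import pytest

# Hypothetical input and expectations standing in for the test class attributes.
RAW = pd.DataFrame(columns=["Name", "AGE", "City"])
EXPECTED = ["name", "age", "city"]


def clean(data: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in for clean_column_names: lowercases every name."""
    return data.rename(columns=str.lower)


@pytest.fixture
def cleaned() -> pd.DataFrame:
    # Run the function under test once per test instead of once per loop step.
    return clean(RAW)


@pytest.mark.parametrize(("position", "expected"), list(enumerate(EXPECTED)))
def test_column_name(cleaned, position, expected):
    # One assertion per generated test case; no loop in the test body.
    assert cleaned.columns[position] == expected
```

pytest reports every position as a separate case, so a failing column is identified directly rather than by the loop stopping at the first mismatch.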
         for i, _ in enumerate(expected_results):
-            assert (
-                convert_datatypes(self.df_data_convert).dtypes[i] == expected_results[i]
-            )
+            assert convert_datatypes(self.df_data_convert).dtypes[i] == expected_results[i]
issue (code-quality): Avoid loops in tests. (no-loop-in-tests)
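For the convert_datatypes loop, a simpler loop-free option is to compare the whole sequence of dtypes in one assertion; when a list comparison fails, pytest's assertion rewriting reports the first differing index. In this sketch the DataFrame and expected dtypes are hypothetical, and df.dtypes stands in for convert_datatypes(self.df_data_convert).dtypes.

```python
import pandas as pd


def test_dtypes_without_loop() -> None:
    # Hypothetical input and expectations; the real test uses self.df_data_convert,
    # convert_datatypes(...), and its own expected_results.
    df = pd.DataFrame({"a": [1, 2], "b": [0.5, 1.5], "c": [True, False]})
    expected_results = ["int64", "float64", "bool"]

    # One assertion checks every column's dtype at once.
    assert df.dtypes.astype(str).tolist() == expected_results
```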
@@ -41,7 +42,7 @@ def _optimize_floats(data: pd.Series | pd.DataFrame) -> pd.DataFrame:
     return data


-def clean_column_names(data: pd.DataFrame, hints: bool = True) -> pd.DataFrame:
+def clean_column_names(data: pd.DataFrame, *, hints: bool = True) -> pd.DataFrame:
issue (code-quality): Low code quality found in clean_column_names - 24% (low-code-quality)
Explanation
The quality score for this function is below the quality threshold of 25%. This score is a combination of the method length, cognitive complexity, and working memory.
How can you solve this?
It might be worth refactoring this function to make it shorter and more readable.
- Reduce the function length by extracting pieces of functionality into their own functions. This is the most important thing you can do; ideally, a function should be fewer than 10 lines.
- Reduce nesting, perhaps by introducing guard clauses to return early.
- Ensure that variables are tightly scoped, so that code using related concepts sits together within the function rather than being scattered.
Summary by Sourcery
This pull request addresses a bug in the dist_plot function where only the first numerical column was displayed. It also includes several code refactorings for improved readability, updates to pre-commit hooks, enhanced test coverage, and the removal of an obsolete configuration file.
- Fixed the bug in dist_plot where only the first numerical column was displayed, by adding a check for numeric columns and returning None if none are found.
- Updated pre-commit-hooks to v4.6.0 and ruff-pre-commit to v0.5.5.
- Enhanced test coverage for _corr_selector by adding new assertions.
- Removed the readthedocs.yml file, as it is no longer needed.
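The guard described in the first item can be sketched as a small helper. The name numeric_columns_to_plot is hypothetical and the dist_plot body is not shown in this thread: the helper selects every numeric column with select_dtypes and returns None when there are none, so a caller can stop early or draw one distribution per returned column.

```python
from __future__ import annotations

import pandas as pd


def numeric_columns_to_plot(data: pd.DataFrame) -> list[str] | None:
    """Hypothetical helper: all numeric column names, or None if there are none."""
    numeric = data.select_dtypes(include="number")
    if numeric.columns.empty:  # guard clause: nothing to plot
        return None
    return list(numeric.columns)
```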