FIX: dist_plot only displaying first numerical column #242

akanz1 · 2024-07-28T13:06:07Z

Summary by Sourcery

This pull request addresses a bug in the dist_plot function where only the first numerical column was displayed. It also includes several code refactorings for improved readability, updates to pre-commit hooks, enhanced test coverage, and the removal of an obsolete configuration file.

Bug Fixes:
- Fixed issue in dist_plot where only the first numerical column was being displayed by adding a check for numeric columns and returning None if none are found.
Enhancements:
- Refactored multiple functions to improve readability by consolidating multi-line statements into single lines where appropriate.
- Added asterisks to function parameters to enforce keyword-only arguments in several functions for better clarity and usage.
Build:
- Updated pre-commit hooks to newer versions: pre-commit-hooks to v4.6.0 and ruff-pre-commit to v0.5.5.
Tests:
- Enhanced test coverage for _corr_selector by adding new assertions.
- Refactored test cases to improve readability and maintainability by consolidating multi-line assertions into single lines.
Chores:
- Removed the readthedocs.yml file as it is no longer needed.
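
For readers unfamiliar with the keyword-only change listed under Enhancements, here is a minimal sketch of what a bare `*` in a signature does. The function `clean_names` is a hypothetical stand-in written for illustration; it is not klib's `clean_column_names` and does not reproduce its behavior.

```python
from __future__ import annotations


def clean_names(names: list[str], *, hints: bool = True) -> list[str]:
    """Hypothetical stand-in (not klib code): normalize names to snake_case."""
    cleaned = [name.strip().lower().replace(" ", "_") for name in names]
    if hints:
        changed = sum(old != new for old, new in zip(names, cleaned))
        print(f"{changed} name(s) changed")
    return cleaned


# Passing the flag by keyword works as before.
result = clean_names(["First Name", "age"], hints=False)

# Passing it positionally is rejected, because everything after the bare
# `*` must be given by name.
try:
    clean_names(["First Name", "age"], False)
except TypeError as exc:
    positional_error = str(exc)
```

Callers that previously passed such flags positionally would need to switch to the keyword form, so a change like this can break existing call sites.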

sourcery-ai · 2024-07-28T13:06:18Z

Reviewer's Guide by Sourcery

This pull request addresses the issue of dist_plot only displaying the first numerical column by adding a check for empty numeric columns. Additionally, it includes several refactorings for better readability, updates to pre-commit hooks, and improvements to test cases.
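
The pattern described here (handle every numeric column, and return `None` when there are none) can be sketched without pandas or plotting. Everything below is an assumption made for illustration: the table is a plain dict of lists, and `numeric_columns` and `summarize_numeric` are hypothetical names, not klib functions.

```python
from __future__ import annotations

from numbers import Number


def numeric_columns(table: dict[str, list]) -> list[str]:
    """Return the names of columns whose values are all numeric (bools excluded)."""
    return [
        name
        for name, values in table.items()
        if values
        and all(isinstance(v, Number) and not isinstance(v, bool) for v in values)
    ]


def summarize_numeric(table: dict[str, list]) -> dict[str, tuple] | None:
    """Summarize every numeric column, or return None if there are none."""
    columns = numeric_columns(table)
    if not columns:  # guard: nothing numeric to process
        return None
    # Iterate over all numeric columns, not only the first one.
    return {name: (min(table[name]), max(table[name])) for name in columns}


table = {"a": [1, 2], "b": ["x", "y"], "c": [0.5, 1.5]}
summary = summarize_numeric(table)
empty_summary = summarize_numeric({"b": ["x", "y"]})
```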

File-Level Changes

| Files | Changes |
| --- | --- |
| `src/klib/clean.py`, `src/klib/describe.py`, `src/klib/utils.py` | Refactored code for better readability and added keyword-only argument markers (`*`) in several functions. |
| `tests/test_util.py`, `tests/test_clean.py`, `tests/test_describe.py` | Refactored test assertions for better readability and added new test cases. |
| `.pre-commit-config.yaml` | Updated pre-commit hook versions and added new arguments for the ruff hook. |

sourcery-ai

Hey @akanz1 - I've reviewed your changes and they look great!

Here's what I looked at during the review

🟢 General issues: all looks good
🟢 Security: all looks good
🟡 Testing: 3 issues found
🟢 Complexity: all looks good
🟢 Documentation: all looks good

sourcery-ai · 2024-07-28T13:07:38Z

tests/test_clean.py

- clean_column_names(self.df_clean_column_names).columns[i]
- == expected_results[i]
- )
+ assert clean_column_names(self.df_clean_column_names).columns[i] == expected_results[i]


suggestion (testing): Add tests for edge cases in clean_column_names

Consider adding tests for edge cases such as when the DataFrame has no columns, columns with special characters, or very long column names. This will help ensure that the clean_column_names function handles all possible scenarios.

sourcery-ai · 2024-07-28T13:07:38Z

tests/test_clean.py

- assert (
- convert_datatypes(self.df_data_convert).dtypes[i] == expected_results[i]
- )
+ assert convert_datatypes(self.df_data_convert).dtypes[i] == expected_results[i]


suggestion (testing): Add tests for edge cases in convert_datatypes

It would be useful to add tests for edge cases such as when the DataFrame has mixed data types, missing values, or very large numbers. This will ensure that the convert_datatypes function is robust and handles all possible scenarios.

sourcery-ai · 2024-07-28T13:07:38Z

tests/test_describe.py

@@ -93,8 +93,7 @@ def test_output_type(self):
 def test_output_shape(self):
 # Test for output dimensions
 assert (
- corr_mat(self.data_corr_df).data.shape[0]
- == corr_mat(self.data_corr_df).data.shape[1]
+ corr_mat(self.data_corr_df).data.shape[0] == corr_mat(self.data_corr_df).data.shape[1]


suggestion (testing): Add tests for edge cases in corr_mat

Consider adding tests for edge cases such as when the input data is empty, contains NaNs, or has only one column. This will help ensure that the corr_mat function handles all possible scenarios.

Suggested change

corr_mat(self.data_corr_df).data.shape[0] == corr_mat(self.data_corr_df).data.shape[1]
assert (
    corr_mat(pd.DataFrame()).data.shape[0] == corr_mat(pd.DataFrame()).data.shape[1]
)
assert (
    corr_mat(pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, None]})).data.shape[0] == corr_mat(pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, None]})).data.shape[1]
)
assert (
    corr_mat(pd.DataFrame({"A": [1, 2, 3]})).data.shape[0] == corr_mat(pd.DataFrame({"A": [1, 2, 3]})).data.shape[1]
)

sourcery-ai · 2024-07-28T13:07:38Z

tests/test_clean.py

 for i, _ in enumerate(expected_results):
- assert (
- clean_column_names(self.df_clean_column_names).columns[i]
- == expected_results[i]
- )
+ assert clean_column_names(self.df_clean_column_names).columns[i] == expected_results[i]


issue (code-quality): Avoid loops in tests. (no-loop-in-tests)

Explanation
Avoid complex code, like loops, in test functions.
Google's software engineering guidelines say:
"Clear tests are trivially correct upon inspection"
To reach that, avoid complex code in tests:
- loops
- conditionals

Some ways to fix this:
- Use parametrized tests to get rid of the loop.
- Move the complex logic into helpers.
- Move the complex part into pytest fixtures.

Complexity is most often introduced in the form of logic. Logic is defined via the imperative parts of programming languages such as operators, loops, and conditionals. When a piece of code contains logic, you need to do a bit of mental computation to determine its result instead of just reading it off of the screen. It doesn't take much logic to make a test more difficult to reason about.

Software Engineering at Google / Don't Put Logic in Tests
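
As one way to act on this advice without any extra tooling, the per-index loop can be replaced by a single comparison over the whole sequence. This is a different technique from the parametrized tests the comment mentions (which in pytest would use `pytest.mark.parametrize`). The function `to_snake_case` below is a hypothetical stand-in for the function under test, not klib code.

```python
def to_snake_case(names):
    """Hypothetical stand-in for the function under test."""
    return [name.strip().lower().replace(" ", "_") for name in names]


def test_to_snake_case_all_columns():
    expected = ["first_name", "last_name", "age"]
    # One assertion over the full list: no loop, and a failure reports
    # both sequences so the mismatching entry is visible at a glance.
    assert to_snake_case(["First Name", " Last Name", "AGE"]) == expected


# Run the test directly so the sketch is self-checking without a test runner.
test_to_snake_case_all_columns()
```

With a DataFrame, the analogous assertion would compare `list(df.columns)` against the expected list, assuming order matters for the test.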

sourcery-ai · 2024-07-28T13:07:38Z

tests/test_clean.py

 for i, _ in enumerate(expected_results):
- assert (
- convert_datatypes(self.df_data_convert).dtypes[i] == expected_results[i]
- )
+ assert convert_datatypes(self.df_data_convert).dtypes[i] == expected_results[i]


issue (code-quality): Avoid loops in tests. (no-loop-in-tests)

Explanation
Avoid complex code, like loops, in test functions.
Google's software engineering guidelines say:
"Clear tests are trivially correct upon inspection"
To reach that, avoid complex code in tests:
- loops
- conditionals

Some ways to fix this:
- Use parametrized tests to get rid of the loop.
- Move the complex logic into helpers.
- Move the complex part into pytest fixtures.

Complexity is most often introduced in the form of logic. Logic is defined via the imperative parts of programming languages such as operators, loops, and conditionals. When a piece of code contains logic, you need to do a bit of mental computation to determine its result instead of just reading it off of the screen. It doesn't take much logic to make a test more difficult to reason about.

Software Engineering at Google / Don't Put Logic in Tests

sourcery-ai · 2024-07-28T13:07:39Z

src/klib/clean.py

@@ -41,7 +42,7 @@ def _optimize_floats(data: pd.Series | pd.DataFrame) -> pd.DataFrame:
 return data


-def clean_column_names(data: pd.DataFrame, hints: bool = True) -> pd.DataFrame:
+def clean_column_names(data: pd.DataFrame, *, hints: bool = True) -> pd.DataFrame:


issue (code-quality): Low code quality found in clean_column_names - 24% (low-code-quality)

Explanation
The quality score for this function is below the quality threshold of 25%.
This score is a combination of the method length, cognitive complexity and working memory.

How can you solve this?

It might be worth refactoring this function to make it shorter and more readable.
- Reduce the function length by extracting pieces of functionality out into their own functions. This is the most important thing you can do - ideally a function should be less than 10 lines.
- Reduce nesting, perhaps by introducing guard clauses to return early.
- Ensure that variables are tightly scoped, so that code using related concepts sits together within the function rather than being scattered.
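
A minimal sketch of the first two suggestions, extraction into small helpers plus an early-return guard clause, might look like the following. The names `_normalize`, `_deduplicate`, and `tidy_names` are hypothetical and chosen for illustration; this is not a refactoring of klib's actual `clean_column_names`.

```python
from __future__ import annotations


def _normalize(name: str) -> str:
    """Single-purpose helper: trim, lowercase, and snake_case one name."""
    return name.strip().lower().replace(" ", "_")


def _deduplicate(names: list[str]) -> list[str]:
    """Single-purpose helper: suffix repeated names with a running count."""
    seen: dict[str, int] = {}
    result = []
    for name in names:
        count = seen.get(name, 0)
        seen[name] = count + 1
        result.append(name if count == 0 else f"{name}_{count}")
    return result


def tidy_names(names: list[str]) -> list[str]:
    if not names:  # guard clause: return early instead of nesting the main path
        return []
    return _deduplicate([_normalize(name) for name in names])


tidied = tidy_names(["A b", "a B", "c"])
```

Each helper can then be tested on its own, and the top-level function stays short enough to read in one pass.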

sonarcloud · 2024-07-28T13:21:34Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarCloud

-

cdd03c1

akanz1 added the bug Something isn't working label Jul 28, 2024

akanz1 self-assigned this Jul 28, 2024

akanz1 linked an issue Jul 28, 2024 that may be closed by this pull request

[BUG] - ... The command -- klib.dist_plot(df) does not plot the distribution for all the numeric features of a Dataframa #241

Closed

misc

cc54f08

sourcery-ai bot reviewed Jul 28, 2024

akanz1 added 4 commits July 28, 2024 15:14

-

8534972

CI for 3.12

2f98417

update poetry lockfile

a1f7694

revert and do 3.12. support in a separate release

ca2c5cf

akanz1 merged commit 10c2af5 into main Jul 28, 2024
16 checks passed

akanz1 deleted the 241-bug-the-command-klibdist_plotdf-does-not-plot-the-distribution-for-all-the-numeric-features-of-a-dataframa branch July 28, 2024 13:24
