[INF] Infra upgrades (#1294)

* chore: update pre-commit hooks

- Removed unused hooks: isort, flake8
- Added new hook: ruff
- Updated black and interrogate hooks configuration
- Commented out darglint due to timeout issues on pre-commit CI

This update aims to improve code quality checks and ensure consistency across the codebase.

* feat: Add ruff tool configuration to pyproject.toml

This commit adds a configuration for the ruff linter to the pyproject.toml file.
The configuration enables the pycodestyle and Pyflakes rule codes,
allows fixes for all enabled rules,
excludes commonly ignored directories,
sets the line length to 88, allows unused variables when they are underscore-prefixed,
and assumes Python 3.8 as the target version.
It also sets the maximum complexity for ruff's mccabe rule to 10.
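The underscore exemption is governed by ruff's `dummy-variable-rgx` setting. The exact value used in pyproject.toml is not shown on this page, so the sketch below assumes ruff's documented default pattern and applies it to a few names to show which unused variables would be tolerated. `is_dummy` is a hypothetical helper for illustration only.

```python
import re

# Assumption: this is ruff's documented default for `dummy-variable-rgx`.
DUMMY_VARIABLE_RGX = re.compile(r"^(_+|(_+[a-zA-Z0-9_]*[a-zA-Z0-9]+?))$")


def is_dummy(name: str) -> bool:
    """Return True if an unused variable called `name` would be tolerated."""
    return DUMMY_VARIABLE_RGX.match(name) is not None


for name in ["_", "__", "_unused", "_tmp2", "unused", "__dunder__"]:
    print(f"{name!r}: {is_dummy(name)}")
```

Under this pattern `_`, `__`, `_unused` and `_tmp2` count as intentionally unused, while `unused` (no leading underscore) and `__dunder__` (trailing underscore) do not.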

* chore(pyproject.toml): Update target Python version to 3.10

This commit updates the target Python version in the pyproject.toml file from 3.8 to 3.10.

* chore(environment-dev): update Python and rdkit versions

- Python version updated from 3.9 to 3.10
- rdkit version constraint removed

* chore: Comment out darglint and add --fix arg to ruff

In this commit, the darglint pre-commit hook has been commented out. Additionally, the --fix argument has been added to the ruff pre-commit hook.

* infra: satisfy the shiny new ruff linter

* chore: disable darglint in pre-commit config

Due to performance issues, darglint has been commented out in the pre-commit configuration.
It may be replaced by ruff in the future. See astral-sh/ruff#458 for more details.

* feat: update pre-commit hooks and add pydoclint

- Updated the version of pre-commit-hooks from v4.4.0 to v4.5.0.
- Added pydoclint as an interim replacement for darglint with configuration in pyproject.toml.

* refactor(janitor/utils): use isinstance for type checking

Changed the type check in the skipna function from type() to isinstance(),
which is the more idiomatic way to test types in Python.
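The practical difference is how subclasses are treated: an exact type check rejects instances of a subclass (NumPy's `float64`, for example, subclasses Python's `float`), while `isinstance` accepts them. This generic sketch does not reproduce the actual `skipna` code; `Meters` and both helper functions are hypothetical.

```python
class Meters(float):
    """Hypothetical float subclass, standing in for e.g. numpy.float64."""


def is_float_exact(value) -> bool:
    # Exact type check: instances of float subclasses are rejected.
    return type(value) is float


def is_float(value) -> bool:
    # isinstance() honours inheritance, so subclass instances are accepted.
    return isinstance(value, float)


distance = Meters(3.5)
print(is_float_exact(distance))  # False
print(is_float(distance))        # True
print(is_float(2.0))             # True
```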

* refactor: update docstrings and remove redundant comments

In this commit, we updated the docstring of the `_get_data_df` method
in the `DataDescription` class to describe what it returns in more detail.
We also removed redundant comments from the `__init__` method of the `col` class in `utils.py`,
as they added no information beyond the code itself.

* chore: remove darglint checks workflow

This commit removes the darglint checks workflow from the GitHub actions.
The workflow was initially added to run darglint checks manually due to the pre-commit CI timing out.
Now that the issue has been resolved, the workflow is no longer needed.

* feat(janitor): add 'col' utility to functions

This commit exposes the 'col' utility from the utils module at the package level,
so it can now be accessed directly from the janitor package.

* refactor(janitor): update import statements and function usage

- Updated import statement in __init__.py to include DropLabel from functions.utils
- Modified usage of expand_grid function in expand_grid.py to be directly called instead of through the janitor module

* test: remove redundant dataframe method registration tests

This commit removes the test_df_registration.py file, which contained redundant tests for dataframe method registration.
These tests were not necessary as the registration of these methods is guaranteed by the pandas-flavor library.

* feat(utils): add dynamic_import function and import janitor.chemistry in test

- Added a new function `dynamic_import` in `janitor/utils.py` that allows for dynamic importing of all modules in a directory.
- Imported `janitor.chemistry` in `tests/chemistry/test_maccs_keys_fingerprint.py` to ensure it's available during testing.
- Also added `importlib` and `pathlib.Path` to `janitor/utils.py` to support the new function.
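The commit message describes `dynamic_import` only in outline. A minimal sketch of such a helper, built on `importlib` and `pathlib.Path` as the message mentions, might look like this; the signature (dotted package name in, list of imported module names out) and skipping `__init__.py` are assumptions, not pyjanitor's actual code.

```python
import importlib
from pathlib import Path
from types import ModuleType


def dynamic_import(package_name: str) -> list[str]:
    """Import every top-level module in a package's directory.

    Hypothetical sketch: returns the names of the modules it imported.
    """
    package: ModuleType = importlib.import_module(package_name)
    package_dir = Path(package.__file__).parent
    imported = []
    # Sort so the import order is deterministic.
    for path in sorted(package_dir.glob("*.py")):
        if path.stem == "__init__":
            continue  # the package itself is already imported
        importlib.import_module(f"{package_name}.{path.stem}")
        imported.append(path.stem)
    return imported
```

Passing a dotted name such as `__name__` from a package's `__init__.py` matches how the next commit calls it.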

* feat(janitor/functions): add dynamic import functionality

- Imported dynamic_import from janitor.utils
- Called dynamic_import function with __name__ as argument

* refactor: update dynamic_import argument and limit test examples

- In `janitor/functions/__init__.py`, the argument passed to `dynamic_import` has been updated from `__name__` to `Path(__name__)` to leverage the pathlib library for more robust path handling.
- In `tests/functions/test_conditional_join.py`, the number of examples for several tests has been limited to improve test performance and reduce runtime.

* refactor(janitor/functions): remove unused imports and dynamic import function

This commit removes the unused imports of 'Path' from 'pathlib' and 'dynamic_import' from 'janitor.utils'.
It also removes the call to the 'dynamic_import' function, which is no longer needed.

* refactor(tests): import janitor module in test files

- Modified the import statements in test_expand_grid.py and test_factorize_columns.py to include the janitor module.
- This change ensures that the janitor module is explicitly imported in the test files.

* test: import janitor in test_fill_direction.py

This commit adds an import statement for the janitor module to the test_fill_direction.py file.
The import is required because janitor's DataFrame methods are only registered with pandas (via pandas-flavor) once the janitor package has been imported.
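pandas-flavor attaches a decorated function to `pandas.DataFrame` when the decorator runs, and the decorator only runs when its defining module is imported. The toy analogue below shows that import-time behavior with the standard library alone; `Frame` and `register_frame_method` are hypothetical stand-ins, not how pandas-flavor is implemented.

```python
class Frame:
    """Hypothetical stand-in for pandas.DataFrame."""

    def __init__(self, rows):
        self.rows = list(rows)


def register_frame_method(func):
    """Toy registry: attach `func` to Frame as a method when decorated."""
    setattr(Frame, func.__name__, func)
    return func


frame = Frame([3, 1, 2])
registered_before = hasattr(frame, "sorted_rows")
print(registered_before)  # False: nothing registered yet


# In a real plugin this definition would live in its own module, so the
# registration would only happen once that module is imported.
@register_frame_method
def sorted_rows(self):
    return sorted(self.rows)


registered_after = hasattr(frame, "sorted_rows")
print(registered_after)     # True
print(frame.sorted_rows())  # [1, 2, 3]
```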

* refactor(janitor): reorganize function imports and remove unused imports

This commit reorganizes the function imports in the janitor package to improve code readability and maintainability.
It also removes an unused import from the main __init__.py file.
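Part of this reorganization, visible in the `janitor/functions/__init__.py` diff on this page, is an explicit `__all__` list. `__all__` declares what a module re-exports: `from package import *` binds only the listed names, and Pyflakes-style unused-import checks treat listed names as intentional re-exports. The snippet demonstrates the star-import behavior with a throwaway in-memory module; `demo_exports` and its functions are hypothetical.

```python
import sys
import types

# Source for a hypothetical module that re-exports only one public name.
SOURCE = '''
def public_helper():
    return "public"


def internal_helper():
    return "not listed in __all__"


__all__ = ["public_helper"]
'''

module = types.ModuleType("demo_exports")
exec(SOURCE, module.__dict__)
sys.modules["demo_exports"] = module  # make it importable by name

namespace: dict = {}
exec("from demo_exports import *", namespace)

# Ignore dunder entries such as __builtins__ that exec() adds.
exported = sorted(name for name in namespace if not name.startswith("__"))
print(exported)  # ['public_helper']

# Names outside __all__ stay reachable through the module object itself.
print(module.internal_helper())  # not listed in __all__
```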

* test: limit max examples in pytest settings to 10

This commit reduces the maximum number of examples that Hypothesis generates for each test case to 10.
This change is intended to speed up test execution without significantly reducing test coverage.

* test: limit max examples in pytest settings to 10

To reduce test run time,
the maximum number of Hypothesis examples for each test has been reduced to 10.
This change affects multiple test functions in the 'test_conditional_join.py' file.

* test: limit max examples in pytest settings to 10 for multiple test functions

* test: limit max examples in pytest to improve test performance

* feat(devguide): expand section on writing code

This commit expands the "Write the Code" section in the developer guide.
It provides more detailed instructions on best practices for writing code,
including committing early and often, staying updated with the dev branch, and writing tests.
It also updates the "Check your code" section to include information about pre-commit hooks.

* chore(github-actions): update checkout action and remove test matrix

This commit updates the version of the checkout action used in the GitHub Actions workflow from v3 to v4.
It also removes the matrix strategy for running tests, which previously included "turtle" and "not turtle" subsets.
Now, all tests will be run without any subset specification.

* test: Add execution test for conditional_join function

This commit introduces a new test for the conditional_join function in the test_conditional_join.py file.
The test uses an example directly from the conditional_join docstring to verify the function's correct operation.
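A copied docstring example stays correct only while both copies are updated together. A related standard-library option is to execute the docstring's examples directly with `doctest`, similar in spirit to the workflow's `--doctest-only janitor` step. The sketch uses a hypothetical `clamp` function instead of `conditional_join` so that it runs without pandas.

```python
import doctest


def clamp(value: int, low: int, high: int) -> int:
    """Clamp ``value`` into the closed interval [low, high].

    Examples:
        >>> clamp(5, 0, 10)
        5
        >>> clamp(-3, 0, 10)
        0
        >>> clamp(42, 0, 10)
        10
    """
    return max(low, min(value, high))


def run_docstring_examples(func) -> doctest.TestResults:
    """Execute the examples in ``func``'s docstring and return the tally."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    for test in finder.find(func, func.__name__, globs={func.__name__: func}):
        runner.run(test)
    return runner.summarize(verbose=False)


def test_clamp_docstring_examples():
    results = run_docstring_examples(clamp)
    assert results.attempted == 3
    assert results.failed == 0
```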
ericmjl authored Oct 14, 2023
1 parent 4ea22dc commit a4f1c0a
Showing 101 changed files with 615 additions and 619 deletions.
42 changes: 0 additions & 42 deletions .github/workflows/darglint-checks.yml

This file was deleted.

6 changes: 2 additions & 4 deletions .github/workflows/tests.yml
@@ -27,8 +27,6 @@ jobs:
   run-tests:
     strategy:
       fail-fast: false
-      matrix:
-        test-subset: ["turtle", "not turtle"]
     runs-on: ubuntu-latest
     name: Run pyjanitor test suite

@@ -39,7 +37,7 @@ jobs:

     steps:
       - name: Checkout repository
-        uses: actions/checkout@v3
+        uses: actions/checkout@v4

       # See: https://github.com/marketplace/actions/setup-miniconda
       - name: Setup miniconda
@@ -58,7 +56,7 @@ jobs:
         run: pytest -v -r a -n auto --color=yes --durations=0 --cov=janitor --cov-append --cov-report term-missing --cov-report xml --doctest-only janitor

       - name: Run unit tests
-        run: pytest -v -r a -n auto --color=yes --durations=0 --cov=janitor --cov-append --cov-report term-missing --cov-report xml tests -m "${{ matrix.test-subset }}"
+        run: pytest -v -r a -n auto --color=yes --durations=0 --cov=janitor --cov-append --cov-report term-missing --cov-report xml tests

       # https://github.com/codecov/codecov-action
       - name: Upload code coverage
40 changes: 18 additions & 22 deletions .pre-commit-config.yaml
@@ -8,38 +8,34 @@ repos:
       - id: end-of-file-fixer
       - id: check-yaml
       - id: check-added-large-files
-
   - repo: https://github.com/psf/black
     rev: 23.9.1
     hooks:
       - id: black
         args: [--config, pyproject.toml]
-
-  # - repo: https://github.com/pycqa/isort
-  # rev: 5.11.2
-  # hooks:
-  # - id: isort
-  # name: isort (python)
-
   - repo: https://github.com/econchick/interrogate
     rev: 1.5.0
     hooks:
       - id: interrogate
         args: [-c, pyproject.toml]
+  # Taking out darglint because it takes too long to run.
+  # It may be superseded by ruff: https://github.com/astral-sh/ruff/issues/458
+  # - repo: https://github.com/terrencepreilly/darglint
+  # rev: v1.8.1
+  # hooks:
+  # - id: darglint
+  # args: [-v 2] # this config makes the error messages a bit less cryptic.

-  - repo: https://github.com/terrencepreilly/darglint
-    rev: v1.8.1
+  # The interim replacement for darglint is pydoclint.
+  - repo: https://github.com/jsh9/pydoclint
+    rev: 0.3.3
     hooks:
-      - id: darglint
-        args: [-v 2] # this config makes the error messages a bit less cryptic.
-
-  - repo: https://github.com/PyCQA/flake8
-    rev: 6.1.0
+      - id: pydoclint
+        args:
+          - "--config=pyproject.toml"
+  - repo: https://github.com/astral-sh/ruff-pre-commit
+    # Ruff version.
+    rev: v0.0.292
     hooks:
-      - id: flake8
-        args: [--exclude, nbconvert_config.py]
-
-ci:
-  skip:
-    # FIXME: darglint is timing out on pre-commit CI (cf. #1236, #1246)
-    - darglint
+      - id: ruff
+        args: [--fix]
4 changes: 2 additions & 2 deletions environment-dev.yml
@@ -2,7 +2,7 @@ name: pyjanitor-dev
 channels:
   - conda-forge
 dependencies:
-  - python=3.9
+  - python=3.10
   - biopython
   - black=22.12.0 # keep this in sync with `.pre-commit-config.yaml`
   - bump2version=1.0.1
@@ -40,7 +40,7 @@ dependencies:
   - pytest-xdist
   - pytest-doctestplus
   - python-language-server
-  - rdkit=2021.09.3
+  - rdkit
   - recommonmark
   - seaborn
   - twine
5 changes: 4 additions & 1 deletion janitor/accessors/data_description.py
@@ -13,11 +13,14 @@ class DataDescription:
     """

     def __init__(self, data):
+        """Initialize DataDescription class."""
         self._data = data
         self._desc = {}

     def _get_data_df(self) -> pd.DataFrame:
+        """Get a table of descriptive information in a DataFrame format.
+        :returns: A DataFrame containing the descriptive information.
+        """
         df = self._data

         data_dict = {}
1 change: 0 additions & 1 deletion janitor/engineering.py
@@ -7,7 +7,6 @@

 from .utils import check, import_message

-
 try:
     import unyt
 except ImportError:
2 changes: 1 addition & 1 deletion janitor/finance.py
@@ -9,8 +9,8 @@
 import requests

 from janitor.errors import JanitorError
-from .utils import check, deprecated_alias, is_connected

+from .utils import check, deprecated_alias, is_connected

 currency_set = {
     "AUD",
85 changes: 80 additions & 5 deletions janitor/functions/__init__.py
@@ -43,7 +43,7 @@
 from .expand_grid import expand_grid
 from .factorize_columns import factorize_columns
 from .fill import fill_direction, fill_empty
-from .filter import filter_date, filter_column_isin, filter_on, filter_string
+from .filter import filter_column_isin, filter_date, filter_on, filter_string
 from .find_replace import find_replace
 from .flag_nulls import flag_nulls
 from .get_dupes import get_dupes
@@ -64,7 +64,7 @@
 from .reorder_columns import reorder_columns
 from .round_to_fraction import round_to_fraction
 from .row_to_names import row_to_names
-from .select import select_columns, select_rows, select
+from .select import select, select_columns, select_rows
 from .shuffle import shuffle
 from .sort_column_value_order import sort_column_value_order
 from .sort_naturally import sort_naturally
@@ -76,10 +76,85 @@
 from .truncate_datetime import truncate_datetime_dataframe
 from .update_where import update_where
 from .utils import (
-    patterns,
-    unionize_dataframe_categories,
     DropLabel,
-    get_index_labels,
     col,
     get_columns,
+    get_index_labels,
+    patterns,
+    unionize_dataframe_categories,
 )
+
+__all__ = [
+    "add_columns",
+    "also",
+    "bin_numeric",
+    "case_when",
+    "change_type",
+    "clean_names",
+    "coalesce",
+    "collapse_levels",
+    "complete",
+    "concatenate_columns",
+    "conditional_join",
+    "convert_excel_date",
+    "convert_matlab_date",
+    "convert_unix_date",
+    "count_cumulative_unique",
+    "currency_column_to_numeric",
+    "deconcatenate_column",
+    "drop_constant_columns",
+    "drop_duplicate_columns",
+    "dropnotnull",
+    "encode_categorical",
+    "expand_column",
+    "expand_grid",
+    "factorize_columns",
+    "fill_direction",
+    "fill_empty",
+    "filter_date",
+    "filter_column_isin",
+    "filter_on",
+    "filter_string",
+    "find_replace",
+    "flag_nulls",
+    "get_dupes",
+    "groupby_agg",
+    "groupby_topk",
+    "impute",
+    "jitter",
+    "join_apply",
+    "label_encode",
+    "limit_column_characters",
+    "min_max_scale",
+    "move",
+    "pivot_longer",
+    "pivot_wider",
+    "process_text",
+    "remove_columns",
+    "remove_empty",
+    "rename_column",
+    "rename_columns",
+    "reorder_columns",
+    "round_to_fraction",
+    "row_to_names",
+    "select_columns",
+    "select_rows",
+    "select",
+    "shuffle",
+    "sort_column_value_order",
+    "sort_naturally",
+    "take_first",
+    "then",
+    "to_datetime",
+    "toset",
+    "transform_column",
+    "transform_columns",
+    "truncate_datetime_dataframe",
+    "update_where",
+    "patterns",
+    "unionize_dataframe_categories",
+    "DropLabel",
+    "get_index_labels",
+    "col",
+    "get_columns",
+]
7 changes: 4 additions & 3 deletions janitor/functions/_numba.py
@@ -2,14 +2,15 @@

 import numpy as np
 import pandas as pd
+from numba import njit, prange
+from pandas.api.types import is_datetime64_dtype, is_extension_array_dtype
+
 from janitor.functions.utils import (
     _generic_func_cond_join,
     _JoinOperator,
-    less_than_join_types,
     greater_than_join_types,
+    less_than_join_types,
 )
-from numba import njit, prange
-from pandas.api.types import is_extension_array_dtype, is_datetime64_dtype


 def _numba_equi_join(df, right, eqs, ge_gt, le_lt):
7 changes: 4 additions & 3 deletions janitor/functions/add_columns.py
@@ -1,9 +1,10 @@
+from typing import Any, List, Tuple, Union
+
+import numpy as np
+import pandas as pd
 import pandas_flavor as pf

 from janitor.utils import check, deprecated_alias, refactored_function
-import pandas as pd
-from typing import Union, List, Any, Tuple
-import numpy as np


 @pf.register_dataframe_method
3 changes: 2 additions & 1 deletion janitor/functions/also.py
@@ -1,7 +1,8 @@
 """Implementation source for chainable function `also`."""
 from typing import Any, Callable
-import pandas_flavor as pf
+
 import pandas as pd
+import pandas_flavor as pf


 @pf.register_dataframe_method
6 changes: 3 additions & 3 deletions janitor/functions/bin_numeric.py
@@ -1,11 +1,11 @@
 """Implementation source for `bin_numeric`."""
-from typing import Any, Optional, Union, Sequence
-import pandas_flavor as pf
+from typing import Any, Optional, Sequence, Union

 import pandas as pd
+import pandas_flavor as pf

 from janitor.utils import check, check_column, deprecated_alias


 ScalarSequence = Sequence[float]
8 changes: 5 additions & 3 deletions janitor/functions/case_when.py
@@ -1,10 +1,12 @@
 """Implementation source for `case_when`."""
-from pandas.core.common import apply_if_callable
+import warnings
 from typing import Any
-import pandas_flavor as pf
+
 import pandas as pd
+import pandas_flavor as pf
 from pandas.api.types import is_scalar
-import warnings
+from pandas.core.common import apply_if_callable
+
 from janitor.utils import check, find_stack_level

 warnings.simplefilter("always", DeprecationWarning)
9 changes: 5 additions & 4 deletions janitor/functions/clean_names.py
@@ -1,13 +1,14 @@
 """Functions for cleaning columns names."""
-from janitor.utils import deprecated_alias
-from janitor.functions.utils import get_index_labels, _is_str_or_cat
-from pandas.api.types import is_scalar
+import unicodedata
 from typing import Hashable, Optional, Union

 import pandas as pd
 import pandas_flavor as pf
+from pandas.api.types import is_scalar

 from janitor.errors import JanitorError
-import unicodedata
+from janitor.functions.utils import _is_str_or_cat, get_index_labels
+from janitor.utils import deprecated_alias


 @pf.register_dataframe_method
3 changes: 2 additions & 1 deletion janitor/functions/coalesce.py
@@ -1,10 +1,11 @@
 """Function for performing coalesce."""
 from typing import Any, Optional, Union

 import pandas as pd
 import pandas_flavor as pf
+
-from janitor.utils import check, deprecated_alias
 from janitor.functions.utils import get_index_labels
+from janitor.utils import check, deprecated_alias


 @pf.register_dataframe_method