future warning in test: tests/tests_preprocessing/test_label_binarizer.py::LabelBinarizerTest::test_inverse_transform #758

Closed
a-szulc opened this issue Aug 23, 2024 · 2 comments

Comments


a-szulc commented Aug 23, 2024

============================= test session starts ==============================
platform linux -- Python 3.12.3, pytest-8.3.2, pluggy-1.5.0 -- /home/adas/mljar/mljar-supervised/venv/bin/python3
cachedir: .pytest_cache
rootdir: /home/adas/mljar/mljar-supervised
configfile: pytest.ini
plugins: cov-5.0.0
collecting ... collected 1 item

tests/tests_preprocessing/test_label_binarizer.py::LabelBinarizerTest::test_inverse_transform FAILED

=================================== FAILURES ===================================
__________________ LabelBinarizerTest.test_inverse_transform ___________________

self = NumpyBlock: 3 dtype: int64, indexer = slice(None, None, None)
value = 'a', using_cow = False

    def setitem(self, indexer, value, using_cow: bool = False) -> Block:
        """
        Attempt self.values[indexer] = value, possibly creating a new array.
    
        Parameters
        ----------
        indexer : tuple, list-like, array-like, slice, int
            The subset of self.values to set
        value : object
            The value being set
        using_cow: bool, default False
            Signaling if CoW is used.
    
        Returns
        -------
        Block
    
        Notes
        -----
        `indexer` is a direct slice/positional indexer. `value` must
        be a compatible shape.
        """
    
        value = self._standardize_fill_value(value)
    
        values = cast(np.ndarray, self.values)
        if self.ndim == 2:
            values = values.T
    
        # length checking
        check_setitem_lengths(indexer, value, values)
    
        if self.dtype != _dtype_obj:
            # GH48933: extract_array would convert a pd.Series value to np.ndarray
            value = extract_array(value, extract_numpy=True)
        try:
>           casted = np_can_hold_element(values.dtype, value)

venv/lib/python3.12/site-packages/pandas/core/internals/blocks.py:1409: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

dtype = dtype('int64'), element = 'a'

    def np_can_hold_element(dtype: np.dtype, element: Any) -> Any:
        """
        Raise if we cannot losslessly set this element into an ndarray with this dtype.
    
        Specifically about places where we disagree with numpy.  i.e. there are
        cases where numpy will raise in doing the setitem that we do not check
        for here, e.g. setting str "X" into a numeric ndarray.
    
        Returns
        -------
        Any
            The element, potentially cast to the dtype.
    
        Raises
        ------
        ValueError : If we cannot losslessly store this element with this dtype.
        """
        if dtype == _dtype_obj:
            return element
    
        tipo = _maybe_infer_dtype_type(element)
    
        if dtype.kind in "iu":
            if isinstance(element, range):
                if _dtype_can_hold_range(element, dtype):
                    return element
                raise LossySetitemError
    
            if is_integer(element) or (is_float(element) and element.is_integer()):
                # e.g. test_setitem_series_int8 if we have a python int 1
                #  tipo may be np.int32, despite the fact that it will fit
                #  in smaller int dtypes.
                info = np.iinfo(dtype)
                if info.min <= element <= info.max:
                    return dtype.type(element)
                raise LossySetitemError
    
            if tipo is not None:
                if tipo.kind not in "iu":
                    if isinstance(element, np.ndarray) and element.dtype.kind == "f":
                        # If all can be losslessly cast to integers, then we can hold them
                        with np.errstate(invalid="ignore"):
                            # We check afterwards if cast was losslessly, so no need to show
                            # the warning
                            casted = element.astype(dtype)
                        comp = casted == element
                        if comp.all():
                            # Return the casted values bc they can be passed to
                            #  np.putmask, whereas the raw values cannot.
                            #  see TestSetitemFloatNDarrayIntoIntegerSeries
                            return casted
                        raise LossySetitemError
    
                    elif isinstance(element, ABCExtensionArray) and isinstance(
                        element.dtype, CategoricalDtype
                    ):
                        # GH#52927 setting Categorical value into non-EA frame
                        # TODO: general-case for EAs?
                        try:
                            casted = element.astype(dtype)
                        except (ValueError, TypeError):
                            raise LossySetitemError
                        # Check for cases of either
                        #  a) lossy overflow/rounding or
                        #  b) semantic changes like dt64->int64
                        comp = casted == element
                        if not comp.all():
                            raise LossySetitemError
                        return casted
    
                    # Anything other than integer we cannot hold
                    raise LossySetitemError
                if (
                    dtype.kind == "u"
                    and isinstance(element, np.ndarray)
                    and element.dtype.kind == "i"
                ):
                    # see test_where_uint64
                    casted = element.astype(dtype)
                    if (casted == element).all():
                        # TODO: faster to check (element >=0).all()?  potential
                        #  itemsize issues there?
                        return casted
                    raise LossySetitemError
                if dtype.itemsize < tipo.itemsize:
                    raise LossySetitemError
                if not isinstance(tipo, np.dtype):
                    # i.e. nullable IntegerDtype; we can put this into an ndarray
                    #  losslessly iff it has no NAs
                    arr = element._values if isinstance(element, ABCSeries) else element
                    if arr._hasna:
                        raise LossySetitemError
                    return element
    
                return element
    
>           raise LossySetitemError
E           pandas.errors.LossySetitemError

venv/lib/python3.12/site-packages/pandas/core/dtypes/cast.py:1859: LossySetitemError

During handling of the above exception, another exception occurred:

self = <tests.tests_preprocessing.test_label_binarizer.LabelBinarizerTest testMethod=test_inverse_transform>

    def test_inverse_transform(self):
        d = {"col1": ["a", "a", "c"], "col2": ["w", "e", "d"]}
        df = pd.DataFrame(data=d)
        lb = LabelBinarizer()
        # check first column
        lb.fit(df, "col1")
        bb = lb.transform(df, "col1")
        self.assertTrue("col1_c" in bb.columns)
        self.assertTrue(np.sum(bb["col1_c"]) == 1)
>       bb = lb.inverse_transform(bb)

tests/tests_preprocessing/test_label_binarizer.py:184: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
supervised/preprocessing/label_binarizer.py:42: in inverse_transform
    old_col[:] = unique_value
venv/lib/python3.12/site-packages/pandas/core/series.py:1295: in __setitem__
    return self._set_values(indexer, value, warn=warn)
venv/lib/python3.12/site-packages/pandas/core/series.py:1419: in _set_values
    self._mgr = self._mgr.setitem(indexer=key, value=value, warn=warn)
venv/lib/python3.12/site-packages/pandas/core/internals/managers.py:415: in setitem
    return self.apply("setitem", indexer=indexer, value=value)
venv/lib/python3.12/site-packages/pandas/core/internals/managers.py:363: in apply
    applied = getattr(b, f)(**kwargs)
venv/lib/python3.12/site-packages/pandas/core/internals/blocks.py:1412: in setitem
    nb = self.coerce_to_target_dtype(value, warn_on_upcast=True)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = NumpyBlock: 3 dtype: int64, other = 'a', warn_on_upcast = True

    @final
    def coerce_to_target_dtype(self, other, warn_on_upcast: bool = False) -> Block:
        """
        coerce the current block to a dtype compat for other
        we will return a block, possibly object, and not raise
    
        we can also safely try to coerce to the same dtype
        and will receive the same block
        """
        new_dtype = find_result_type(self.values.dtype, other)
        if new_dtype == self.dtype:
            # GH#52927 avoid RecursionError
            raise AssertionError(
                "Something has gone wrong, please report a bug at "
                "https://github.com/pandas-dev/pandas/issues"
            )
    
        # In a future version of pandas, the default will be that
        # setting `nan` into an integer series won't raise.
        if (
            is_scalar(other)
            and is_integer_dtype(self.values.dtype)
            and isna(other)
            and other is not NaT
            and not (
                isinstance(other, (np.datetime64, np.timedelta64)) and np.isnat(other)
            )
        ):
            warn_on_upcast = False
        elif (
            isinstance(other, np.ndarray)
            and other.ndim == 1
            and is_integer_dtype(self.values.dtype)
            and is_float_dtype(other.dtype)
            and lib.has_only_ints_or_nan(other)
        ):
            warn_on_upcast = False
    
        if warn_on_upcast:
>           warnings.warn(
                f"Setting an item of incompatible dtype is deprecated "
                "and will raise an error in a future version of pandas. "
                f"Value '{other}' has dtype incompatible with {self.values.dtype}, "
                "please explicitly cast to a compatible dtype first.",
                FutureWarning,
                stacklevel=find_stack_level(),
            )
E           FutureWarning: Setting an item of incompatible dtype is deprecated and will raise an error in a future version of pandas. Value 'a' has dtype incompatible with int64, please explicitly cast to a compatible dtype first.

venv/lib/python3.12/site-packages/pandas/core/internals/blocks.py:517: FutureWarning
=========================== short test summary info ============================
FAILED tests/tests_preprocessing/test_label_binarizer.py::LabelBinarizerTest::test_inverse_transform
============================== 1 failed in 2.04s ===============================
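
The traceback above shows `old_col[:] = unique_value` writing the string 'a' into an int64 Series, which is what triggers the FutureWarning. A minimal sketch of one way to avoid that upcast follows, assuming the column can simply be cast to object before the assignment. The helper name `fill_with_label` and the sample Series are hypothetical, and this is not necessarily the change made in #764.

```python
import warnings

import pandas as pd


def fill_with_label(col: pd.Series, label: str) -> pd.Series:
    """Return a copy of `col` with every entry replaced by `label`.

    Hypothetical helper for illustration only; not part of mljar-supervised.
    """
    # Cast to object first, so storing a string needs no implicit upcast.
    out = col.astype(object)
    out[:] = label
    return out


# An int64 column, standing in for a binarized column such as "col1_a".
one_hot = pd.Series([1, 1, 0], name="col1_a", dtype="int64")

with warnings.catch_warnings():
    # Turn FutureWarning into an error to confirm none is emitted.
    warnings.simplefilter("error", FutureWarning)
    restored = fill_with_label(one_hot, "a")

print(restored.tolist())
```

Casting explicitly keeps the dtype change visible at the call site, which is what the warning message asks for ("please explicitly cast to a compatible dtype first").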

a-szulc commented Aug 27, 2024

fixed in #764

a-szulc closed this as completed Aug 27, 2024
@pplonski

Great job, @a-szulc!
