Categorical in GroupBy with aggregations raise error under specific conditions

```python
ticks = pd.DataFrame.from_dict({
    'cid':   [1, 1, 2, 2, 3],
    'date':  ['2019-01-01' , '2020-01-02' , '2020-01-03' , '2019-01-04' , '2020-01-05'],
    'tid':   [1, 2, 3, 4, 5],
    'amount':[1, 1, 2, 2, 3],
})
ticks['date'] = pd.to_datetime(ticks['date'])
ticks['year'] = ticks['date'].dt.year
ticks['year'] = ticks['year'].astype('category')

ticks.groupby(['cid', 'year'], as_index=False, observed=False).agg({'amount': sum})

```
Outputs a: `ValueError: Length of values (5) does not match length of index (6)`

#### Full traceback
<details>

```python
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-15-502fa0f135e2> in <module>
      9 
     10 
---> 11         ticks.groupby(['cid', 'year'], as_index=False, observed=False).agg({'amount': sum})
     12 )

c:\users\a.jouanjean\htdocs\factbook-py\.venv\lib\site-packages\pandas\core\groupby\generic.py in aggregate(self, func, engine, engine_kwargs, *args, **kwargs)
    992 
    993         if not self.as_index:
--> 994             self._insert_inaxis_grouper_inplace(result)
    995             result.index = np.arange(len(result))
    996 

c:\users\a.jouanjean\htdocs\factbook-py\.venv\lib\site-packages\pandas\core\groupby\generic.py in _insert_inaxis_grouper_inplace(self, result)
   1716             # When using .apply(-), name will be in columns already
   1717             if in_axis and name not in columns:
-> 1718                 result.insert(0, name, lev)
   1719 
   1720     def _wrap_aggregated_output(

c:\users\a.jouanjean\htdocs\factbook-py\.venv\lib\site-packages\pandas\core\frame.py in insert(self, loc, column, value, allow_duplicates)
   3620         """
   3621         self._ensure_valid_index(value)
-> 3622         value = self._sanitize_column(column, value, broadcast=False)
   3623         self._mgr.insert(loc, column, value, allow_duplicates=allow_duplicates)
   3624 

c:\users\a.jouanjean\htdocs\factbook-py\.venv\lib\site-packages\pandas\core\frame.py in _sanitize_column(self, key, value, broadcast)
   3761 
   3762             # turn me into an ndarray
-> 3763             value = sanitize_index(value, self.index)
   3764             if not isinstance(value, (np.ndarray, Index)):
   3765                 if isinstance(value, list) and len(value) > 0:

c:\users\a.jouanjean\htdocs\factbook-py\.venv\lib\site-packages\pandas\core\internals\construction.py in sanitize_index(data, index)
    745     """
    746     if len(data) != len(index):
--> 747         raise ValueError(
    748             "Length of values "
    749             f"({len(data)}) "

ValueError: Length of values (5) does not match length of index (6)
```

</details>

#### Problem description

After quite some time trying to narrow down the origin of a `ValueError: Length of values (N) does not match length of index (M)`, it seems to occur, only when these conditions are met:
* `groupby()` done using a categorical variables in the `by` list
* `as_index=False`, but `as_index=True` is OK
* `observed=False`, but `observed=True` is OK
* `aggregate()` is performed, while applying directly a `sum()` on the DataFrameGroupBy is OK

see the different combinations in details below.

#### Expected Output

Would it be possible to issue an early check and error/exception raise when such conditions are met?

It would definitely help user to understand clearly where the problem comes from and how to correct it.

I am aware that some parts of the issue are being addressed (see PR #35967), but this will not help the user to understand what is actually going  on when all conditions are met.


#### Conditions under which Error is not raised

<details>

```python
# with observed=True
ticks.groupby(['cid', 'year'], as_index=False, observed=True).agg({'amount': sum}),

# with as_index=True
ticks.groupby(['cid', 'year'], as_index=True, observed=False).agg({'amount': sum}),  

# without using aggregate() but sum() directly [will also sum tid, but still no error]
ticks.groupby(['cid', 'year'], as_index=False, observed=False).sum(),

```

</details>


#### Output of ``pd.show_versions()``

<details>

INSTALLED VERSIONS
------------------
commit           : 2a7d3326dee660824a8433ffd01065f8ac37f7d6
python           : 3.8.3.final.0
python-bits      : 64
OS               : Windows
OS-release       : 10
Version          : 10.0.18362
machine          : AMD64
processor        : Intel64 Family 6 Model 142 Stepping 10, GenuineIntel
byteorder        : little
LC_ALL           : None
LANG             : None
LOCALE           : fr_FR.cp1252

pandas           : 1.1.2
numpy            : 1.19.2
pytz             : 2020.1
dateutil         : 2.8.1
pip              : 20.2.3
setuptools       : 41.2.0
Cython           : None
pytest           : None
hypothesis       : None
sphinx           : None
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : None
html5lib         : None
pymysql          : None
psycopg2         : None
jinja2           : None
IPython          : 7.18.1
pandas_datareader: None
bs4              : None
bottleneck       : None
fsspec           : None
fastparquet      : None
gcsfs            : None
matplotlib       : None
numexpr          : None
odfpy            : None
openpyxl         : None
pandas_gbq       : None
pyarrow          : 1.0.1
pytables         : None
pyxlsb           : None
s3fs             : None
scipy            : None
sqlalchemy       : None
tables           : None
tabulate         : None
xarray           : None
xlrd             : None
xlwt             : None
numba            : None
</details>


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Categorical in GroupBy with aggregations raise error under specific conditions #36698

Full traceback

Problem description

Expected Output

Conditions under which Error is not raised

Output of `pd.show_versions()`

INSTALLED VERSIONS

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uh oh!

Categorical in GroupBy with aggregations raise error under specific conditions #36698

Description

Full traceback

Problem description

Expected Output

Conditions under which Error is not raised

Output of pd.show_versions()

INSTALLED VERSIONS

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

Output of `pd.show_versions()`