@@ -90,6 +90,7 @@ By using some special functions:
9090 df[' group' ] = pd.cut(df.value, range (0 , 105 , 10 ), right = False , labels = labels)
9191 df.head(10 )
9292
93+ See :ref:`documentation <reshaping.tile.cut>` for :func:`~pandas.cut`.
9394
9495`Categoricals ` have a specific ``category `` :ref: `dtype <basics.dtypes >`:
9596
@@ -331,6 +332,57 @@ Operations
331332
332333The following operations are possible with categorical data:
333334
335+ Comparing `Categoricals ` with other objects is possible in two cases:
336+
337+ * comparing a `Categorical` to another `Categorical`, when `levels` and `ordered` are the same, or
338+ * comparing a `Categorical ` to a scalar.
339+
340+ All other comparisons will raise a TypeError.
341+
342+ .. ipython :: python
343+
344+ cat = pd.Series(pd.Categorical([1 ,2 ,3 ], levels = [3 ,2 ,1 ]))
345+ cat_base = pd.Series(pd.Categorical([2 ,2 ,2 ], levels = [3 ,2 ,1 ]))
346+ cat_base2 = pd.Series(pd.Categorical([2 ,2 ,2 ]))
347+
348+ cat
349+ cat_base
350+ cat_base2
351+
352+ Comparing to a categorical with the same levels and ordering or to a scalar works:
353+
354+ .. ipython :: python
355+
356+ cat > cat_base
357+ cat > 2
358+
359+ This doesn't work because the levels are not the same:
360+
361+ .. ipython:: python
362+
363+     try:
364+         cat > cat_base2
365+     except TypeError as e:
366+         print("TypeError: " + str(e))
367+
368+ .. note ::
369+
370+ Comparisons with `Series`, `np.array` or a `Categorical` with different levels or ordering
371+ will raise a `TypeError` because a custom level ordering allows two valid results:
372+ one that takes the ordering into account and one that does not. If you want to compare a `Categorical`
373+ with such a type, you need to be explicit and convert the `Categorical` to values:
374+
375+ .. ipython:: python
376+
377+     base = np.array([1,2,3])
378+
379+     try:
380+         cat > base
381+     except TypeError as e:
382+         print("TypeError: " + str(e))
383+
384+     np.asarray(cat) > base
385+
334386 Getting the minimum and maximum, if the categorical is ordered:
335387
336388.. ipython :: python
@@ -489,34 +541,38 @@ but the levels of these `Categoricals` need to be the same:
489541
490542.. ipython :: python
491543
492- cat = pd.Categorical([" a" ," b" ], levels = [" a" ," b" ])
493- vals = [1 ,2 ]
494- df = pd.DataFrame({" cats" :cat, " vals" :vals})
495- res = pd.concat([df,df])
496- res
497- res.dtypes
544+ cat = pd.Categorical([" a" ," b" ], levels = [" a" ," b" ])
545+ vals = [1 ,2 ]
546+ df = pd.DataFrame({" cats" :cat, " vals" :vals})
547+ res = pd.concat([df,df])
548+ res
549+ res.dtypes
498550
499- df_different = df.copy()
500- df_different[" cats" ].cat.levels = [" a" ," b" ," c" ]
551+ In this case the levels are not the same and so an error is raised:
501552
502- try :
503- pd.concat([df,df])
504- except ValueError as e:
505- print (" ValueError: " + str (e))
553+ .. ipython:: python
554+
555+     df_different = df.copy()
556+     df_different["cats"].cat.levels = ["a","b","c"]
557+     try:
558+         pd.concat([df,df_different])
559+     except ValueError as e:
560+         print("ValueError: " + str(e))
506561
507562 The same applies to ``df.append(df) ``.
508563
509564Getting Data In/Out
510565-------------------
511566
512- Writing data (`Series `, `Frames `) to a HDF store that contains a ``category `` dtype will currently raise ``NotImplementedError ``.
567+ Writing data (`Series`, `Frames`) to an HDF store that contains a ``category`` dtype will currently
568+ raise ``NotImplementedError``.
513569
514570Writing to a CSV file will convert the data, effectively removing any information about the
515571`Categorical ` (levels and ordering). So if you read back the CSV file you have to convert the
516572relevant columns back to `category ` and assign the right levels and level ordering.
517573
518574.. ipython :: python
519- :suppress:
575+ :suppress:
520576
521577 from pandas.compat import StringIO
522578
@@ -548,7 +604,7 @@ default not included in computations. See the :ref:`Missing Data section
548604<missing_data>`
549605
550606There are two ways a `np.nan ` can be represented in `Categorical `: either the value is not
551- available or `np.nan ` is a valid level.
607+ available ("missing value") or `np.nan ` is a valid level.
552608
553609.. ipython :: python
554610
@@ -560,9 +616,25 @@ available or `np.nan` is a valid level.
560616 s2.cat.levels = [1 ,2 ,np.nan]
561617 s2
562618 # three levels, np.nan included
563- # Note: as int arrays can't hold NaN the levels were converted to float
619+ # Note: as int arrays can't hold NaN the levels were converted to object
564620 s2.cat.levels
565621
622+ .. note ::
623+ Missing value methods like ``isnull`` and ``fillna`` will take both missing values and
624+ `np.nan` levels into account:
625+
626+ .. ipython :: python
627+
628+ c = pd.Categorical([" a" ," b" ,np.nan])
629+ c.levels = [" a" ," b" ,np.nan]
630+ # will be inserted as an NA level:
631+ c[0 ] = np.nan
632+ s = pd.Series(c)
633+ s
634+ pd.isnull(s)
635+ s.fillna(" a" )
636+
637+
566638 Gotchas
567639-------
568640
@@ -579,15 +651,18 @@ object and not as a low level `numpy` array dtype. This leads to some problems.
579651 try :
580652 np.dtype(" category" )
581653 except TypeError as e:
582- print (" TypeError: " + str (e))
654+ print (" TypeError: " + str (e))
583655
584656 dtype = pd.Categorical([" a" ]).dtype
585657 try :
586658 np.dtype(dtype)
587659 except TypeError as e:
588660 print (" TypeError: " + str (e))
589661
590- # dtype comparisons work:
662+ Dtype comparisons work:
663+
664+ .. ipython :: python
665+
591666 dtype == np.str_
592667 np.str_ == dtype
593668