@@ -89,12 +89,22 @@ By passing a :class:`pandas.Categorical` object to a `Series` or assigning it to
     df["B"] = raw_cat
     df
 
-You can also specify differently ordered categories or make the resulting data ordered, by passing these arguments to ``astype()``:
+Everywhere above that we passed ``dtype='category'``, we used the default behavior:
+
+1. categories are inferred from the data.
+2. categories are unordered.
+
+To control those behaviors, instead of passing ``'category'``, use an instance
+of :class:`~pandas.api.types.CategoricalDtype`.
 
 .. ipython:: python
 
-    s = pd.Series(["a","b","c","a"])
-    s_cat = s.astype("category", categories=["b","c","d"], ordered=False)
+    from pandas.api.types import CategoricalDtype
+
+    s = pd.Series(["a", "b", "c", "a"])
+    cat_type = CategoricalDtype(categories=["b", "c", "d"],
+                                ordered=True)
+    s_cat = s.astype(cat_type)
     s_cat
 
 Categorical data has a specific ``category`` :ref:`dtype <basics.dtypes>`:
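A minimal standalone sketch of what the replacement ``astype`` call changes, assuming a pandas release that ships ``pandas.api.types.CategoricalDtype`` (0.21 or later). It contrasts the default ``astype("category")`` with an explicit dtype; ``default_cat`` is a hypothetical name used only for illustration. Note that ``"a"`` is not among the declared categories, so it becomes missing.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

s = pd.Series(["a", "b", "c", "a"])

# Default behavior: categories inferred from the data, unordered.
default_cat = s.astype("category")

# Explicit categories and ordering through a CategoricalDtype instance.
cat_type = CategoricalDtype(categories=["b", "c", "d"], ordered=True)
s_cat = s.astype(cat_type)

print(list(default_cat.cat.categories))  # ['a', 'b', 'c']
print(default_cat.cat.ordered)           # False
print(list(s_cat.cat.categories))        # ['b', 'c', 'd']
print(s_cat.cat.ordered)                 # True

# "a" is not one of the declared categories, so it becomes NaN.
print(s_cat.isna().tolist())             # [True, False, False, True]
```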
@@ -133,6 +143,75 @@ constructor to save the factorize step during normal constructor mode:
     splitter = np.random.choice([0,1], 5, p=[0.5,0.5])
     s = pd.Series(pd.Categorical.from_codes(splitter, categories=["train", "test"]))
 
+.. _categorical.categoricaldtype:
+
+CategoricalDtype
+----------------
+
+.. versionchanged:: 0.21.0
+
+A categorical's type is fully described by:
+
+1. ``categories``: a sequence of unique values and no missing values
+2. ``ordered``: a boolean
+
+This information can be stored in a :class:`~pandas.api.types.CategoricalDtype`.
+The ``categories`` argument is optional; when it is omitted, the categories
+are inferred from whatever values are present in the data when the
+:class:`pandas.Categorical` is created. The categories are assumed to be unordered
+by default.
+
+.. ipython:: python
+
+    from pandas.api.types import CategoricalDtype
+
+    CategoricalDtype(['a', 'b', 'c'])
+    CategoricalDtype(['a', 'b', 'c'], ordered=True)
+    CategoricalDtype()
+
+A :class:`~pandas.api.types.CategoricalDtype` can be used in any place pandas
+expects a ``dtype``, for example in :func:`pandas.read_csv`,
+:meth:`pandas.DataFrame.astype`, or the Series constructor.
+
+.. note::
+
+    As a convenience, you can use the string ``'category'`` in place of a
+    :class:`~pandas.api.types.CategoricalDtype` when you want the default behavior of
+    the categories being unordered and equal to the set of values present in the
+    array. In other words, ``dtype='category'`` is equivalent to
+    ``dtype=CategoricalDtype()``.
+
+Equality Semantics
+~~~~~~~~~~~~~~~~~~
+
+Two instances of :class:`~pandas.api.types.CategoricalDtype` compare equal
+whenever they have the same categories and orderedness. When comparing two
+unordered categoricals, the order of the ``categories`` is not considered.
+
+.. ipython:: python
+
+    c1 = CategoricalDtype(['a', 'b', 'c'], ordered=False)
+
+    # Equal, since order is not considered when ordered=False
+    c1 == CategoricalDtype(['b', 'c', 'a'], ordered=False)
+
+    # Unequal, since the second CategoricalDtype is ordered
+    c1 == CategoricalDtype(['a', 'b', 'c'], ordered=True)
+
+All instances of ``CategoricalDtype`` compare equal to the string ``'category'``.
+
+.. ipython:: python
+
+    c1 == 'category'
+
+.. warning::
+
+    Since ``dtype='category'`` is essentially ``CategoricalDtype(None, False)``,
+    and since all instances of ``CategoricalDtype`` compare equal to ``'category'``,
+    all instances of ``CategoricalDtype`` compare equal to a
+    ``CategoricalDtype(None, False)``, regardless of ``categories`` or
+    ``ordered``.
+
 Description
 -----------
 
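The two claims in this new section, that a ``CategoricalDtype`` works wherever pandas expects a dtype and that category order matters for equality only when ``ordered=True``, can be checked with a short script. This is a sketch assuming a current pandas; ``size_type``, ``converted`` and the inline CSV text are hypothetical illustration data. The final equality checks avoid ``CategoricalDtype()`` with no categories, since the behavior described in the warning may differ between pandas releases.

```python
import io

import pandas as pd
from pandas.api.types import CategoricalDtype

# Hypothetical ordered dtype used only for illustration.
size_type = CategoricalDtype(["small", "medium", "large"], ordered=True)

# 1. Series constructor
s = pd.Series(["large", "small", "medium"], dtype=size_type)

# 2. read_csv, mapping a column name to the dtype
csv_text = io.StringIO("size,count\nsmall,3\nlarge,1\n")
df = pd.read_csv(csv_text, dtype={"size": size_type})

# 3. astype on an existing object column
converted = pd.Series(["medium", "small"]).astype(size_type)

print(s.dtype == size_type, df["size"].dtype == size_type,
      converted.dtype == size_type)         # True True True

# Sorting follows the declared category order, not alphabetical order.
print(s.sort_values().tolist())             # ['small', 'medium', 'large']

# Equality: category order is ignored only for unordered dtypes.
unordered = CategoricalDtype(["a", "b", "c"])
print(unordered == CategoricalDtype(["c", "b", "a"]))               # True
ordered = CategoricalDtype(["a", "b", "c"], ordered=True)
print(ordered == CategoricalDtype(["c", "b", "a"], ordered=True))   # False
print(ordered == "category")                                        # True
```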
@@ -184,7 +263,7 @@ It's also possible to pass in the categories in a specific order:
 
 .. ipython:: python
 
-    s = pd.Series(list('babc')).astype('category', categories=list('abcd'))
+    s = pd.Series(list('babc')).astype(CategoricalDtype(list('abcd')))
     s
 
     # categories
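To see what passing the categories explicitly does in this hunk, the codes and counts can be inspected directly. A minimal sketch assuming a current pandas, using only the public ``.cat`` accessor and ``value_counts``; ``counts`` is a hypothetical name. The unused category ``'d'`` is kept and reported with a count of zero.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

s = pd.Series(list("babc")).astype(CategoricalDtype(list("abcd")))

# Categories come from the dtype, in the order given, even if unused.
print(list(s.cat.categories))    # ['a', 'b', 'c', 'd']

# Each value is stored as an integer code indexing into the categories.
print(s.cat.codes.tolist())      # [1, 0, 1, 2]

# Unused categories still appear in value_counts, with a count of 0.
counts = s.value_counts()
print(int(counts["d"]))          # 0
```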
@@ -301,7 +380,9 @@ meaning and certain operations are possible. If the categorical is unordered, ``
 
     s = pd.Series(pd.Categorical(["a","b","c","a"], ordered=False))
     s.sort_values(inplace=True)
-    s = pd.Series(["a","b","c","a"]).astype('category', ordered=True)
+    s = pd.Series(["a","b","c","a"]).astype(
+        CategoricalDtype(ordered=True)
+    )
     s.sort_values(inplace=True)
     s
     s.min(), s.max()
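With ``CategoricalDtype(ordered=True)`` the categories are still inferred (sorted as ``a < b < c``), so the ordered result happens to match alphabetical order. Declaring the categories explicitly makes the effect of ordering visible. A sketch assuming a current pandas; the reversed category list and the names ``inferred``, ``reverse`` and ``raised`` are illustrative choices. The last part relies on pandas raising ``TypeError`` for ``min`` on an unordered categorical.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

values = ["a", "b", "c", "a"]

# Inferred categories, ordered: a < b < c
inferred = pd.Series(values).astype(CategoricalDtype(ordered=True))
print(inferred.min(), inferred.max())          # a c

# Explicit categories in reverse order: c < b < a
reverse = pd.Series(values).astype(
    CategoricalDtype(["c", "b", "a"], ordered=True)
)
print(reverse.sort_values().tolist())          # ['c', 'b', 'a', 'a']
print(reverse.min(), reverse.max())            # c a

# min/max on an unordered categorical raise TypeError.
unordered = pd.Series(values).astype("category")
try:
    unordered.min()
    raised = False
except TypeError as exc:
    raised = True
    print("TypeError:", exc)
```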
@@ -401,9 +482,15 @@ categories or a categorical with any list-like object, will raise a TypeError.
 
 .. ipython:: python
 
-    cat = pd.Series([1,2,3]).astype("category", categories=[3,2,1], ordered=True)
-    cat_base = pd.Series([2,2,2]).astype("category", categories=[3,2,1], ordered=True)
-    cat_base2 = pd.Series([2,2,2]).astype("category", ordered=True)
+    cat = pd.Series([1,2,3]).astype(
+        CategoricalDtype([3, 2, 1], ordered=True)
+    )
+    cat_base = pd.Series([2,2,2]).astype(
+        CategoricalDtype([3, 2, 1], ordered=True)
+    )
+    cat_base2 = pd.Series([2,2,2]).astype(
+        CategoricalDtype(ordered=True)
+    )
 
     cat
     cat_base
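The three series in this hunk differ only in their dtypes, and that is what governs comparisons. ``cat`` and ``cat_base`` share a dtype, so comparing them works elementwise using the declared order ``3 < 2 < 1``. ``cat_base2`` has the inferred categories ``[2]``, so comparing it with ``cat`` raises ``TypeError``. A standalone sketch assuming a current pandas; ``order`` and ``raised`` are hypothetical names, and the error message text is deliberately not checked.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

order = CategoricalDtype([3, 2, 1], ordered=True)   # 3 < 2 < 1

cat = pd.Series([1, 2, 3]).astype(order)
cat_base = pd.Series([2, 2, 2]).astype(order)
cat_base2 = pd.Series([2, 2, 2]).astype(CategoricalDtype(ordered=True))

# Same dtype: elementwise comparison follows the category order.
print((cat > cat_base).tolist())   # [True, False, False]

# Comparing with a scalar that is one of the categories also works.
print((cat > 2).tolist())          # [True, False, False]

# Different categories ([3, 2, 1] vs. the inferred [2]): TypeError.
try:
    cat > cat_base2
    raised = False
except TypeError:
    raised = True
print(raised)                      # True
```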