Doc: Adds example of categorical data for efficient storage and consistency across DataFrames #19245

Closed
wants to merge 1 commit into from
42 changes: 30 additions & 12 deletions doc/source/merging.rst
@@ -152,7 +152,7 @@ functionality below.
Set logic on the other axes
~~~~~~~~~~~~~~~~~~~~~~~~~~~

-When gluing together multiple DataFrames, you have a choice of how to handle
+When gluing together multiple ``DataFrame``s, you have a choice of how to handle
the other axes (other than the one being concatenated). This can be done in
the following three ways:

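A minimal sketch of the set-logic choice described in this hunk, assuming the three options are the union of the other axis (``join='outer'``), the intersection (``join='inner'``), and reuse of one frame's index. The frames ``df1`` and ``df2`` are hypothetical, and the ``reindex`` call is only one possible way to pin the result to a specific index.

```python
import pandas as pd

# Hypothetical frames whose row labels only partly overlap.
df1 = pd.DataFrame({"A": ["A0", "A1", "A2"]}, index=[0, 1, 2])
df2 = pd.DataFrame({"C": ["C1", "C2", "C3"]}, index=[1, 2, 3])

# Union of the row labels: rows missing from one frame are filled with NaN.
outer = pd.concat([df1, df2], axis=1, join="outer")

# Intersection of the row labels: only rows present in both frames survive.
inner = pd.concat([df1, df2], axis=1, join="inner")

# Keep exactly df1's row labels by reindexing df2 before concatenating.
left_index = pd.concat([df1, df2.reindex(df1.index)], axis=1)

print(outer, inner, left_index, sep="\n\n")
```
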
@@ -323,13 +323,6 @@ the name of the ``Series``.
labels=['df1', 's1'], vertical=False);
plt.close('all');

-.. note::
-
-   Since we're concatenating a ``Series`` to a ``DataFrame``, we could have
-   achieved the same result with :meth:`DataFrame.assign`. To concatenate an
-   arbitrary number of pandas objects (``DataFrame`` or ``Series``), use
-   ``concat``.
-
If unnamed ``Series`` are passed they will be numbered consecutively.

.. ipython:: python
@@ -583,7 +576,7 @@ and ``right`` is a subclass of DataFrame, the return type will still be

``merge`` is a function in the pandas namespace, and it is also available as a
``DataFrame`` instance method :meth:`~DataFrame.merge`, with the calling
-``DataFrame`` being implicitly considered the left object in the join.
+``DataFrame `` being implicitly considered the left object in the join.

The related :meth:`~DataFrame.join` method, uses ``merge`` internally for the
index-on-index (by default) and column(s)-on-index join. If you are joining on
@@ -636,7 +629,7 @@ key combination:

Here is a more complicated example with multiple join keys. Only the keys
appearing in ``left`` and ``right`` are present (the intersection), since
-``how='inner'`` by default.
+``how='inner'```by default.

.. ipython:: python

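A small sketch of the multi-key inner join this hunk refers to. The frames and key values are hypothetical and are not the ones used in the documentation; the point is only that the default ``how='inner'`` keeps the key combinations found in both inputs.

```python
import pandas as pd

# Hypothetical inputs with two join keys each.
left = pd.DataFrame({
    "key1": ["K0", "K0", "K1", "K2"],
    "key2": ["K0", "K1", "K0", "K1"],
    "A": ["A0", "A1", "A2", "A3"],
})
right = pd.DataFrame({
    "key1": ["K0", "K1", "K1", "K2"],
    "key2": ["K0", "K0", "K0", "K0"],
    "C": ["C0", "C1", "C2", "C3"],
})

# No ``how`` argument: merge falls back to an inner join.
result = pd.merge(left, right, on=["key1", "key2"])

# Only (K0, K0) and (K1, K0) occur in both frames; (K1, K0) matches twice.
print(result)
```
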
@@ -721,7 +714,32 @@ either the left or right tables, the values in the joined table will be
labels=['left', 'right'], vertical=False);
plt.close('all');

-Here is another example with duplicate join keys in DataFrames:
+To join a Series and a DataFrame, the Series has to be transformed into a DataFrame first:
+
+.. ipython:: python
+
+   df = pd.DataFrame({"Let": ["A", "B", "C"], "Num": [1, 2, 3]})
+   df
+
+   # The series has a multi-index with levels corresponding to columns in the DataFrame we want to merge with
+   ser = pd.Series(
+       ['a', 'b', 'c', 'd', 'e', 'f'],
+       index=pd.MultiIndex.from_arrays([["A", "B", "C"]*2, [1, 2, 3, 4, 5, 6]])
+   )
+   ser
+
+   # Name the row index levels
+   ser.index.names=['Let','Num']
+   ser
+
+   # reset_index turns the multi-level row index into columns, which requires a DataFrame
+   df2 = ser.reset_index()
+   type(df2)
+
+   # Now we merge the DataFrames
+   pd.merge(df, df2, on=['Let','Num'])
+
+Here is another example with duplicate join keys in ``DataFrame``s:

.. ipython:: python

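The ``Series`` example added in this hunk can be condensed. The following is a sketch rather than the PR's code: it names the index levels when the ``MultiIndex`` is built and gives the value column an explicit name through ``reset_index(name=...)``. The column name ``val`` is an assumption chosen for illustration; in the hunk's version the value column would be labelled ``0``.

```python
import pandas as pd

df = pd.DataFrame({"Let": ["A", "B", "C"], "Num": [1, 2, 3]})

# Name the index levels up front so they line up with df's columns.
ser = pd.Series(
    ["a", "b", "c", "d", "e", "f"],
    index=pd.MultiIndex.from_arrays(
        [["A", "B", "C"] * 2, [1, 2, 3, 4, 5, 6]],
        names=["Let", "Num"],
    ),
)

# reset_index moves both levels into columns and returns a DataFrame;
# ``name`` labels the column that holds the Series values.
df2 = ser.reset_index(name="val")

# Inner merge on the shared key columns.
merged = pd.merge(df, df2, on=["Let", "Num"])
print(merged)
```
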
@@ -1202,7 +1220,7 @@ Overlapping value columns
~~~~~~~~~~~~~~~~~~~~~~~~~

The merge ``suffixes`` argument takes a tuple of list of strings to append to
-overlapping column names in the input ``DataFrame``\ s to disambiguate the result
+overlapping column names in the input ``DataFrame``s to disambiguate the result
columns:

.. ipython:: python
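
A short sketch of the ``suffixes`` behaviour described in the last hunk. The frames and the suffix strings ``_l`` and ``_r`` are hypothetical; the default suffixes are ``_x`` and ``_y``.

```python
import pandas as pd

# Both hypothetical frames carry a non-key column named "v".
left = pd.DataFrame({"k": ["K0", "K1", "K2"], "v": [1, 2, 3]})
right = pd.DataFrame({"k": ["K0", "K0", "K3"], "v": [4, 5, 6]})

# Default suffixes: the overlapping column becomes v_x and v_y.
default = pd.merge(left, right, on="k")

# Custom suffixes passed as a tuple of two strings.
custom = pd.merge(left, right, on="k", suffixes=("_l", "_r"))

print(default.columns.tolist())
print(custom.columns.tolist())
```
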