ENH/VIS: Pass DataFrame column to size argument in df.scatter #8244
Comments
The reason I didn't do this in #7780 is that, unlike coloring by column, you need to have "size" in the right units for the result to look reasonable. So we would need to invent another argument (e.g.,
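A minimal sketch of the units problem (my own illustration with made-up column names, not code from this thread): matplotlib's s argument is a marker area in points^2, so a raw data column usually has to be rescaled before it gives sensible marker sizes.

import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({'x': [1, 2, 3], 'y': [4, 5, 6],
                   'population': [1.2e6, 3.4e6, 0.8e6]})

# Passing the raw column would ask for markers millions of points^2 in area:
# df.plot(kind='scatter', x='x', y='y', s=df['population'])

# Rescaling by hand keeps the marker areas in a plausible range.
df.plot(kind='scatter', x='x', y='y',
        s=df['population'] / df['population'].max() * 200)
plt.show()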
@TomAugspurger Something else: which matplotlib style did you use in the plot above? I think the plots in our docs should look like that! If it's a style you can express in rcParams, we could update https://github.com/pydata/pandas/blob/master/pandas/tools/plotting.py#L34 (e.g. the grid lines -> white lines).
@jorisvandenbossche This is the style you get from importing seaborn. By the way, if you haven't tried Seaborn, you should definitely check it out. It has a very well thought out design (both the API and the graphics style).
Ah, OK. Yes, I know seaborn, but I haven't really used it yet. In any case, we could maybe copy some of the rcParams to update the style of the plots in our docs.
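For what it's worth, here is a minimal sketch (assumed colours, not the actual seaborn defaults) of the kind of rcParams update being discussed, e.g. white grid lines on a light background:

import matplotlib as mpl

mpl.rcParams.update({
    'axes.facecolor': '#EAEAF2',  # light grey plot background (assumed value)
    'axes.edgecolor': 'white',
    'axes.grid': True,
    'grid.color': 'white',        # grid lines -> white lines
    'grid.linestyle': '-',
    'axes.axisbelow': True,       # draw the grid underneath the data
})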
The seaborn style looks like it's just:

import matplotlib.style
matplotlib.style.use('ggplot')
Also, at first glance the way ggplot handles this doesn't seem super complicated; it seems like it's all done here. Basically, it sets up a range between 1 and 6 (the units are arbitrary; we'll just have to pick a range that looks good, I guess) and normalizes the values to that range. The main difference is that I think ggplot scales based on the radius, whereas matplotlib's markersize sets the area, so we might need to transform. There's a bit of discussion on SO here; the scaling in the second example looks quite good.
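To make that radius-vs-area point concrete, here is a small sketch (illustrative range and helper name, not ggplot's actual code): normalize the values onto a radius-like range, then square the result before handing it to matplotlib, whose s argument is an area in points^2.

import numpy as np

def radius_to_area(vals, radius_range=(1.0, 6.0)):
    # Linear map onto a ggplot-style radius range...
    vals = np.asarray(vals, dtype=float)
    lo, hi = radius_range
    norm = (vals - vals.min()) / (vals.max() - vals.min())
    radii = lo + norm * (hi - lo)
    # ...then square, so that the area matplotlib draws grows with the
    # radius we chose rather than with the raw value.
    return radii ** 2

sizes = radius_to_area([10, 20, 40, 80])  # areas suitable for s=sizes

In practice the squared 1-6 range still gives fairly small areas, so some overall scale factor would be needed on top of this.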
To me, the sizes seem pretty good if we just pick sensible defaults for the min and max point size and then normalize the values to that range, e.g.:

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

def convert_to_points(vals, size_range=(50, 1000)):
    # Normalize the values to [0, 1], then map them onto the requested
    # range of marker areas (the units matplotlib's s argument expects).
    min_size, max_size = size_range
    val_range = vals.max() - vals.min()
    normalized_vals = (vals - vals.min()) / val_range
    point_sizes = min_size + normalized_vals * (max_size - min_size)
    return point_sizes

df2 = pd.DataFrame({
    'x': np.linspace(0, 50, 6),
    'y': np.linspace(0, 20, 6)
})

df2.plot(kind='scatter', x='x', y='y', s=convert_to_points(df2.x.values))

I can't claim to have the best eye for visual design, though, so if anyone can suggest scaling methods that work better than a straight linear transform, I'm happy to hear them. If the aim is to provide an argument that lets people adjust the min and max size up and down, it might also be nice to present the user with more sensible numbers, like ggplot does with its default
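If that range were exposed to users, one hypothetical way to use the helper above (reusing convert_to_points and df2 from the snippet) would be:

df2.plot(kind='scatter', x='x', y='y',
         s=convert_to_points(df2.x.values, size_range=(20, 400)))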
Dupe of #16827
You can already kind of do this by passing in the underlying numpy array.
But when I merge #7780 (coloring by column), it would be natural (and awesome) to be able to do
df.plot(kind='scatter', x='x', y='y', c='color', s='size')
Shouldn't be too hard if we're willing.
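For reference, a short sketch of both forms (column names made up for illustration): the workaround that already works today, passing the column's values to s by hand, and the syntax proposed in this issue.

import numpy as np
import pandas as pd

df = pd.DataFrame({'x': np.random.rand(20),
                   'y': np.random.rand(20),
                   'size': np.random.rand(20)})

# Works today: pass the values (scaled to taste) straight through to matplotlib.
df.plot(kind='scatter', x='x', y='y', s=df['size'] * 200)

# Proposed here: let s accept a column name, like c would after #7780.
# df.plot(kind='scatter', x='x', y='y', c='color', s='size')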