Naming Conventions: "_data", "data", "values", "_values" #19294

Closed
@jbrockmendel

A bunch of different classes have one or more of the attributes data, _data, values, _values, plus an assortment of external_values, internal_values, formatting_values, get_values. These mean different things in different places.

Maintenance would be easier if the naming conventions were more uniform. Index has all four of these attributes, and I'm not sure there is a nice backwards-compatible way to reconcile them with the naming in Series/DataFrame. Any thoughts? Does anyone else think this matters?

(Motivating example: "Where are all the places in the code that touch a BlockManager? Let's just grep for \.data...")

The lowest-hanging fruit for cleanup here is in the Accessor classes. StringAccessor, SeriesPlotMethods, and FramePlotMethods all define _data to point back to their parent Series/Index, Series, and DataFrame, respectively. I suggest that _data be replaced with just _parent. The other two existing accessors, CategoricalAccessor and CombinedDatetimelikeProperties, use categories and values for this, respectively. Ideally these would get standardized to _parent in the process.
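
To make the accessor suggestion concrete, here is a minimal sketch of what a uniform _parent convention could look like. ToyContainer and ToyStringAccessor are hypothetical stand-ins invented for illustration; they are not pandas classes and do not reflect how the real accessors are implemented.

```python
class ToyContainer:
    """Hypothetical stand-in for a Series-like object (not a pandas class)."""

    def __init__(self, values):
        self._values = list(values)

    @property
    def str(self):
        # Each access builds an accessor that points back at this object.
        return ToyStringAccessor(self)


class ToyStringAccessor:
    """Hypothetical accessor that names its back-reference _parent, not _data."""

    def __init__(self, parent):
        # _parent always means "the object this accessor was created from",
        # so it cannot be confused with a block manager or a raw array.
        self._parent = parent

    def upper(self):
        return ToyContainer(v.upper() for v in self._parent._values)


result = ToyContainer(["a", "b"]).str.upper()
```

With this naming, a grep for _data would no longer match accessor back-references, which is the kind of overlap the motivating example complains about.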

Another option would be to change NDFrame._data to something like NDFrame._mgr so there is little risk of name overlap. I expect this would meet more resistance than the accessor cleanup idea.
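
One way such a rename could stay backwards compatible is to keep the old name as a deprecated alias. The sketch below is only an illustration under that assumption: ToyFrame is a hypothetical class, the manager is a plain dict, and the deprecation path is my own suggestion rather than something proposed in this issue.

```python
import warnings


class ToyFrame:
    """Hypothetical NDFrame-like class (not pandas code)."""

    def __init__(self, mgr):
        # New, unambiguous name for the internal manager object.
        self._mgr = mgr

    @property
    def _data(self):
        # Assumed compatibility shim: the old name still works but warns.
        warnings.warn(
            "_data is deprecated; use _mgr instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return self._mgr


frame = ToyFrame({"a": [1, 2, 3]})  # a dict stands in for a BlockManager

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    same_object = frame._data is frame._mgr
```

Existing code that reads _data would keep working while emitting a warning, and new code could grep for _mgr without hitting the accessor attributes.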

Labels

Deprecate: Functionality to remove in pandas
Internals: Related to non-user accessible pandas implementation
