rename features._Feature to datapoints._Datapoint #7002

pmeier · 2022-12-02T12:38:37Z

After some extensive offline discussions, we decided to rename the class from _Feature to Datapoint and the namespace from features to datapoints.

This is the result of trying to strike a balance between something too generic and something too narrow. Here are a few examples:

Our current naming scheme of "feature" is quite ambiguous and will be hard to google. At the same time, the phrase might also be too limiting if we consider the feature extraction context.
Naming schemes that prefix "Vision" like VisionTensor somewhat exclude things like labels, which are very much part of our API, but go far beyond vision tasks.
Naming schemes that use phrases like "trait", "element", "structure", are generic enough to bundle everything, but have little connection to the data they are trying to represent.

Apart from Datapoint we also had

DataTensor, torchvision.tensors
SampleTensor, torchvision.tensors

as candidates.

DataTensor was rejected since it is very generic (every tensor carries data) and is fairly close to the Tensor.data attribute from PyTorch core.

SampleTensor was rejected due to the "sample" phrase. Although it is technically correct that we are sampling something from a dataset here, it mostly carries the "random sampling" connotation that is to be avoided.

Finally, Datapoint is not perfect either. After this PR, it basically describes the following scenario:

dataset = ...
sample = dataset[0]
datapoint = sample[0]

although one could argue that

dataset = ...
datapoint = dataset[0]

is also a valid interpretation. SampleTensor has this issue as well. That is something that we need to handle through documentation.

Per title. Refactoring was done through the IDE in most cases. The ten extra affected lines (897 additions and 887 deletions) stem from the fact that datapoints._Datapoint is four characters longer than features._Feature, and in a very few cases this was enough to force an extra line break.

cc @vfdev-5 @datumbox @bjuncek

datumbox

@pmeier LGTM, thanks! I had a quick look, even though I understand you made the change with the IDE, just to be safe. Let's wait for green tests before merging.

torchvision/prototype/datapoints/_datapoint.py

torchvision/prototype/transforms/_augment.py

pmeier · 2022-12-05T13:33:38Z

d675ff4 also added quite a few extra lines: we now either need two lines if we import Datapoint and something else like Image, or we use datapoints._datapoint.Datapoint instead of datapoints.Datapoint, which also adds some extra line breaks.

datumbox · 2022-12-05T13:50:30Z

LGTM still. The failing test seems flaky. cc @toni057

vadimkantorov · 2022-12-08T21:47:17Z

I also proposed VisionModality in the past: #5045 (comment)

Datapoint IMO mostly makes one think of input data points, e.g. input images or some abstract input data.

Also, I saw that you put a bunch of transforms as instance methods on the _Datapoint class. I wonder what the guidance is for selecting the subset (especially for solarize/equalize which seemed quite arbitrary and not very special compared to tons of other image transforms). In theory, this list would only be growing. And if not very needed, maybe some other dispatch mechanism is better?

pmeier · 2022-12-09T07:01:05Z

I also proposed VisionModality in the past:

Yup, and we acknowledged so in #6753 (comment). The list above is far from exhaustive; we had many more names in our pool. In this particular case, "VisionModality" has the "vision" restriction explained above, while "modality" is also too disconnected from what it would be representing.

Also, I saw that you put a bunch of transforms as instance methods on the _Datapoint class. I wonder what the guidance is for selecting the subset (especially for solarize/equalize which seemed quite arbitrary and not very special compared to tons of other image transforms).

Right now, the rule of thumb is that all dispatchers should have an associated method on the class, but of course there are some exceptions for now. We agreed to flesh out this part in later iterations of the API.

In theory, this list would only be growing. And if not very needed, maybe some other dispatch mechanism is better?

Not sure what you are saying here? Yes, adding new kernels or dispatchers also entails new methods on the datapoint subclasses. This architecture leaves open the option for users to inject custom datapoint subclasses into our API, as long as they implement these methods, as proposed in #6753 (comment). We agreed that this is a really neat feature to have and will go for it in the next iterations: #6753 (comment)

In general, please post your feedback in the dedicated thread: #6753. Otherwise it gets lost easily.
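
The "method per dispatcher" idea can be sketched roughly as follows. This is a toy illustration and not torchvision's actual implementation: nested lists stand in for tensors, the classes are simplified stand-ins, and the Keypoints subclass is a hypothetical user-defined type.

```python
class Datapoint:
    """Toy stand-in for the base class; not torchvision's implementation."""

    def __init__(self, data):
        self.data = data

    def horizontal_flip(self):
        # Default behaviour: a datapoint that does not care is left untouched.
        return self


class Image(Datapoint):
    def horizontal_flip(self):
        # Reverse every row of a nested-list "image".
        return Image([row[::-1] for row in self.data])


def horizontal_flip(inpt):
    """Dispatcher: forwards to the method of whatever datapoint it receives."""
    if isinstance(inpt, Datapoint):
        return inpt.horizontal_flip()
    raise TypeError(f"unsupported input type {type(inpt).__name__}")


class Keypoints(Datapoint):
    """Hypothetical user-defined datapoint holding (x, y) pairs."""

    def __init__(self, data, width):
        super().__init__(data)
        self.width = width

    def horizontal_flip(self):
        # Mirror the x coordinate inside an image of the given width.
        flipped = [(self.width - 1 - x, y) for x, y in self.data]
        return Keypoints(flipped, self.width)


flipped_image = horizontal_flip(Image([[1, 2, 3], [4, 5, 6]]))
flipped_points = horizontal_flip(Keypoints([(0, 0), (2, 1)], width=3))
print(flipped_image.data)   # [[3, 2, 1], [6, 5, 4]]
print(flipped_points.data)  # [(2, 0), (0, 1)]
```

The dispatcher never needs to know about Keypoints; implementing the method on the subclass is enough for it to take part.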

vadimkantorov · 2022-12-09T07:45:11Z

Right now, the rule of thumb is that all dispatchers should have an associated method on the class, but of course there are some exceptions for now. We agreed to flesh out this part in later iterations of the API.

IMO solarize/equalize and others are mostly related only to images, not to boxes or masks, so it's strange to see them on the base class _Datapoint.

vadimkantorov · 2022-12-09T10:35:55Z

Yup, and we acknowledged so in #6753 (comment).

I guess I missed it because I unsubscribed from that thread a long time ago, as I'm not a believer in this proposed object-oriented design and in transforms automatically (?) treating all sorts of "modalities". I have a feeling that it will lead to over-engineered code and sometimes surprising and hard to work around behavior regarding what exactly gets transformed and how to prevent it (sometimes we tag along with an input example some masks/boxes that we do not want to be transformed in any way, as they correspond to ground truth and are for some visualization).

I will be happy to be wrong about this though.

In the meantime, I only hope that the purely functional "plumbing" functions that do not require any object-oriented wrappers and that accept/return plain old tensors are also available. However, I think you said earlier that this will be the case. These functions are useful for all users (including use in libraries such as detectron2, mmdetection, albumentations, kornia, augly and others) even without buying into the huge redesign that's more than one year in the making. Whether the proposed design of object-orientation/transforms is significantly better than other attempts in other libraries remains to be proven by the future.

My feedback on this is not new, so nothing new to add :(

pmeier · 2022-12-09T11:25:04Z

I have a feeling that it will lead to over-engineered code and sometimes surprising and hard to work around behavior regarding what exactly gets transformed and how to prevent it (sometimes we tag along with an input example some masks/boxes that we do not want to be transformed in any way, as they correspond to ground truth and are for some visualization).

So far the feedback from others that have tried the new API has been positive on all accounts. Please have a look at the feedback thread and if you have concerns, please post them there.

In the meantime, I only hope that the purely functional "plumbing" functions that do not require any object-oriented wrappers and that accept/return plain old tensors are also available.

The new API has three layers, with the lowest level being the functional kernels. They are namespaced by attaching the supported type to the name, e.g. F.resize (dispatcher) and F.resize_image_tensor, F.resize_mask, F.resize_bounding_box, ... (kernels). The kernels work on plain tensors only.

The dispatchers and transforms themselves continue to work with plain tensors, which will be treated as images or videos where applicable. However, neither of them works with plain tensors for any other type like bounding boxes.
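
A minimal sketch of how the three layers relate, under loud assumptions: nested lists stand in for tensors, the kernel signatures are simplified, the BoundingBox wrapper is a hypothetical stand-in, and routing is done with a plain isinstance check to keep it short. None of this is torchvision's real code, and the mask kernel is left out.

```python
# Layer 1: kernels, one per supported type, operating on plain data only.
def resize_image_tensor(image, size):
    new_h, new_w = size
    old_h, old_w = len(image), len(image[0])
    # Nearest-neighbour lookup into the original grid.
    return [
        [image[r * old_h // new_h][c * old_w // new_w] for c in range(new_w)]
        for r in range(new_h)
    ]


def resize_bounding_box(xyxy, spatial_size, size):
    old_h, old_w = spatial_size
    new_h, new_w = size
    x1, y1, x2, y2 = xyxy
    # Scale x coordinates by the width ratio and y coordinates by the height ratio.
    return [x1 * new_w / old_w, y1 * new_h / old_h,
            x2 * new_w / old_w, y2 * new_h / old_h]


class BoundingBox:
    """Hypothetical wrapper that tags plain data as a bounding box."""

    def __init__(self, xyxy, spatial_size):
        self.xyxy = xyxy
        self.spatial_size = spatial_size


# Layer 2: dispatcher, which picks the kernel based on the input type.
def resize(inpt, size):
    if isinstance(inpt, BoundingBox):
        new_xyxy = resize_bounding_box(inpt.xyxy, inpt.spatial_size, size)
        return BoundingBox(new_xyxy, size)
    # Unwrapped data is assumed to be an image; it can never be read as a box.
    return resize_image_tensor(inpt, size)


# Layer 3: transform, which applies the dispatcher to every part of a sample.
class Resize:
    def __init__(self, size):
        self.size = size

    def __call__(self, *inputs):
        return tuple(resize(inpt, self.size) for inpt in inputs)


image = [[1, 2], [3, 4]]
box = BoundingBox([0, 0, 1, 2], spatial_size=(2, 2))
resized_image, resized_box = Resize((4, 4))(image, box)
print(resized_image)     # [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
print(resized_box.xyxy)  # [0.0, 0.0, 2.0, 4.0]
```

Calling resize_bounding_box directly on a plain list works, since kernels take plain data. Passing the same plain list to resize or Resize would treat it as an image, which is why boxes have to be wrapped at those two layers.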

vadimkantorov · 2022-12-09T11:40:46Z

The new API has three layers, with the lowest level being the functional kernels. They are namespaced by attaching the supported type to the name, e.g. F.resize (dispatcher) and F.resize_image_tensor, F.resize_mask, F.resize_bounding_box, ... (kernels). The kernels work on plain tensors only.

Great! These are all I need: image kernels, various box/mask format conversion/aug kernels :) Hope the parts of transforms that are sampling the random params are also available as staticmethods.

So far the feedback from others that have tried the new API has been positive on all accounts

We will see how it plays out in the future. I hope it's successful! My personal experience so far from reading code that uses detectron2/mmdetection has been that it's overcomplicated: too many wrappers and layers of indirection, and fragile outside of the standard task. But of course it's also a matter of personal taste and of the specific task at hand.

Summary:
* rename features._Feature to datapoints.Datapoint
* _Datapoint to Datapoint
* move is_simple_tensor to transforms.utils
* fix CI
* move Datapoint out of public namespace

Reviewed By: datumbox

Differential Revision: D41836898

fbshipit-source-id: ff11dcb220346d98d07c807a12f3b3e59fba6146

rename features._Feature to datapoints.Datapoint

64cbedd

pmeier added module: transforms code quality prototype labels Dec 2, 2022

pmeier requested review from vfdev-5 and datumbox December 2, 2022 12:38

facebook-github-bot added the cla signed label Dec 2, 2022

datumbox approved these changes Dec 2, 2022

pmeier added 5 commits December 2, 2022 14:58

_Datapoint to Datapoint

8395bf8

move is_simple_tensor to transforms.utils

074d71f

fix CI

5dc222b

move Datapoint out of public namespace

d675ff4

Merge branch 'main' into rename-features

a62022c

pmeier marked this pull request as ready for review December 5, 2022 13:16

pmeier mentioned this pull request Dec 5, 2022

[FEEDBACK] Transforms V2 API #6753

Closed

pmeier merged commit a8007dc into pytorch:main Dec 5, 2022

pmeier deleted the rename-features branch December 5, 2022 14:48

pmeier mentioned this pull request Jan 20, 2023

update naming feature -> datapoint in prototype test suite #7117

Merged

pmeier mentioned this pull request Feb 2, 2023

Current way to use torchvision.prototype.transforms #7168

Closed
