Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The plot methods in Datasets should be flexible #116

Closed
calebrob6 opened this issue Sep 9, 2021 · 13 comments
Closed

The plot methods in Datasets should be flexible #116

calebrob6 opened this issue Sep 9, 2021 · 13 comments
Assignees
Labels
datasets Geospatial or benchmark datasets
Milestone

Comments

@calebrob6
Copy link
Member

As a user I will frequently want to plot images, predictions, masks, etc. The current definition of plot in RasterDataset is limited.

@adamjstewart adamjstewart added the datasets Geospatial or benchmark datasets label Sep 9, 2021
@adamjstewart adamjstewart self-assigned this Nov 3, 2021
@calebrob6
Copy link
Member Author

As a user I'll want to call plot either with a single sample from the Dataset object (e.g. in a Jupyter Notebook while doing EDA), or with a single sample + predictions (e.g. from a val_step in a LightningModule). In my mind the signature looks something like:

@staticmethod
def plot(sample: Dict[str, Tensor], titles=False, suptitle=None) --> plt.Figure??? (not sure what this type is):
    """Plots a single data sample from the dataset.

     Uses the "image", "mask", and optionally, "prediction" keys in sample
     to create a matplotlib Figure. The "prediction" key is optional, and, if included,
     will add another subplot to the Figure with a rendering of the predictions.

     Args:
         sample: a sample returned by `__getitem__` with an optional additional "prediction" key
         titles: flag indicating whether to render the titles above each subplot
         suptitle: optional super title

    Returns:
         matplotlib Figure object that can be immediately rendered or further customized
    """

@isaaccorley @adamjstewart thoughts?

@adamjstewart
Copy link
Collaborator

I agree with making the (optional) prediction be an entry in the sample dict.

As for titles and suptitle, I'm not sure if these are necessary. Since the figure is returned, users can always override these themselves.

What I would add is a **kwargs entry that gets passed to imshow (or whatever plotting function is used) that allows users to control exactly how it gets plotted, since that part can't be customized after the fact. Not sure if we need to have a different kwargs for each call to imshow or whether we can use the same kwargs for all.

Originally I was thinking about how to make plot handle both individual samples (returned by Dataset) and batches of data (returned by DataLoader), but I think we should just stick to the first for now. The latter is too complicated, and we can always add a function that reverses the collation to get just the first sample.

@calebrob6
Copy link
Member Author

Agree on the batching -- we could just run plot(...) in a loop for batch. Another thing to consider, if we are calling this method from val_step(...) then the samples will already be transformed (e.g. per channel standardization). How do we best deal with this? (I think an inverse preprocess makes sense)

Mild disagree on removing the titles/suptitle -- those are nice convenience features. Users could do those by themselves, but the point of a plot is to make it easy to get a quick rendering of the data.

What sort of **kwargs to imshow do you think would need to be passed. All that I can think of are the ones that should be hardcoded to properly display the imagery or labels (e.g. cmap, interpolation, vmin, vmax).

@adamjstewart
Copy link
Collaborator

To clarify, I think we should always add titles/suptitles, I don't think we need a parameter to control whether or not they get added.

For kwargs, I could see someone wanting a custom cmap maybe. imshow might not be the best example, but I'm not sure if there are any other plotting methods we might need.

@calebrob6
Copy link
Member Author

calebrob6 commented Nov 11, 2021

(I'm not understanding you) So you think title and suptitle should be non optional?

For kwargs, I can see where your coming from -- but now, how do they get routed to the correct imshow(...). E.g. for a plot(...) for a land cover dataset you will have an imshow(imagery) and also an imshow(mask, cmap=default_cmap, vmin=0, vmax=7, interpolation="none"). You would need to have separate kwargs routed to both of them. Then, for other datasets maybe you only have a single imshow(...).

@adamjstewart
Copy link
Collaborator

adamjstewart commented Nov 11, 2021

Yes, titles should be non-optional. I wonder if there is a way to remove them after they've been added...

We could require that kwargs has the same dict structure as sample such that **kwargs['mask'] returns the kwargs for mask plotting. Or we could just ignore kwargs for now until someone actually needs it.

@calebrob6
Copy link
Member Author

calebrob6 commented Nov 11, 2021

Gotcha, would renaming what I have as titles to show_titles that defaults to True (instead of False) work in this case?

I'm still a fan of having an optional suptitle argument that is just a shortcut for writing out a super title in the correct place/size.

Or we could just ignore kwargs for now until someone actually needs it.

Seems good!

@isaaccorley
Copy link
Collaborator

I don't really have too many opinions, but a couple thoughts:

  • kornia has a denormalize which can perform the opposite of Normalize given channel means and stds
  • torchvision.utils.make_grid is nice when you have a batch of tensors and want to create an array to plot them in a grid fashion. The output of this is a RGB tensor but can be used in plt.imshow.

@adamjstewart
Copy link
Collaborator

@calebrob6 my only complaint about that is that we'll need to add this arg to every plot method, copy the docs, and add an extra if-statement for it. It might be simpler to tell users to clear the title themselves if they really don't like it.

@calebrob6
Copy link
Member Author

Yep makes sense! I just see that as part of the normal work (and not like an extra burden).

I think we roughly agree (that plotting should happen in the dataset, that it should assume the values it gets are the same values that are returned by a call to getitem, and that we should return a figure).

The only tricky part I see is "denormalizing" samples in val_step(...) (or anywhere in a LightningModule). If a LightningModule knows about its DataModule then this should be fixed!

@adamjstewart
Copy link
Collaborator

I propose that we skip title, suptitle, kwargs, etc. and just have a single way to plot a sample from each dataset. If someone asks for more customization in the future, we can always add it later. Since these values are all optional, it won't be a breaking API change to add it later. I think it's better to keep it simple and add complexity only when we need to.

@calebrob6
Copy link
Member Author

Skipping titles is something I use often (which is why I added it -- e.g. I often create figures in draw.io where I stack several samples and add the titles / other annotation manually). suptitles is especially useful for adding the sample id of an image when debugging (e.g. if validation dataloader is on shuffle, it is useful to do this so you can find an interesting sample again).

@adamjstewart
Copy link
Collaborator

I think we've decided on a basic API for now, tracking of which datasets still need a new/updated plot method can happen in #253.

@adamjstewart adamjstewart added this to the 0.2.0 milestone Nov 20, 2021
@adamjstewart adamjstewart added utilities Utilities for working with geospatial data and removed utilities Utilities for working with geospatial data labels Jan 2, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
datasets Geospatial or benchmark datasets
Projects
None yet
Development

No branches or pull requests

3 participants