Tracker for Debuggability Improvement #284
Comments
I made a similar Google Doc here. I'm not sure whether it is easier to brainstorm in a GitHub issue or in the doc to flesh out what we want to do for each point.
We should keep the design in the Google Doc. I'm adding this issue so that OSS users can give their feedback.
Regarding graph visualization: I needed this today and hacked something together:

```python
from __future__ import annotations

import dataclasses
from typing import Optional, Any

import matplotlib.pyplot as plt
import networkx as nx
from torch.utils.data.graph import traverse


@dataclasses.dataclass(repr=False)
class Node:
    obj: Any
    child: Optional[Node] = None

    def __repr__(self):
        return type(self.obj).__name__

    def __hash__(self):
        return hash(self.obj)


def scan(graph, child=None):
    for node, parents in graph.items():
        current = Node(node, child)
        yield current
        yield from scan(parents, child=current)


def visualize_graph(dp):
    G = nx.DiGraph()
    for node in set(scan(traverse(dp))):
        if node.child is not None:
            G.add_edge(node, node.child)
    nx.draw_networkx(G)
    plt.show()
```

Simple example:

```python
from torchdata.datapipes.iter import FileLister, FileOpener

dp = FileLister()
dp = FileOpener(dp).filter(bool).map(list)
visualize_graph(dp)
```

Complex example:

```python
from torchvision.prototype import datasets

dp = datasets.load("coco")
visualize_graph(dp)
```

I've used
Thanks @pmeier, this looks great. Code reference from tensorboard about how to transform it into
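For anyone who wants a quick look without matplotlib, the same nested mapping can be flattened into a list of edges and printed. This is only a minimal sketch: it assumes the graph has the `{node: {parent: {...}}}` shape that the `scan` helper above iterates over, plain strings stand in for DataPipe objects, and `iter_edges` is a hypothetical helper name, not something provided by torch.

```python
def iter_edges(graph, child=None):
    """Yield (parent, child) pairs from a nested {node: {parent: ...}} mapping."""
    for node, parents in graph.items():
        if child is not None:
            # `node` feeds into `child`, so the edge points from node to child.
            yield (node, child)
        yield from iter_edges(parents, child=node)


# Strings stand in for DataPipe instances (an assumption for illustration).
graph = {"Mapper": {"Filter": {"FileOpener": {"FileLister": {}}}}}
edges = list(iter_edges(graph))
for parent, child in edges:
    print(f"{parent} -> {child}")
```

The traversal order follows the nesting, so the last pipe in the chain appears first and the source appears last.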
One thing that would also simplify debugging is for each datapipe to have a

```python
class IterDataPipe:
    def extra_repr(self) -> str:
        # Subclasses override this to describe their own arguments.
        return ""

    def __repr__(self) -> str:
        return f"{type(self).__name__.replace('IterDataPipe', '')}({self.extra_repr()})"


class MinimalIterDataPipe(IterDataPipe):
    pass


def my_map(x):
    return x


class MapperIterDataPipe(IterDataPipe):
    def __init__(self, fn):
        self.fn = fn

    def extra_repr(self):
        return self.fn.__name__


print(MinimalIterDataPipe())  # Minimal()
print(MapperIterDataPipe(my_map))  # Mapper(my_map)
```
That would also improve the graph visualization from #299, since each node could contain the |
I would propose to change it to |
Not sure what you mean. Could you give an example? Why does the functional interface need special handling? For example, calling |
Yeah. I mean
|
Maybe just color the nodes according to their type? Since it seems this is only relevant for the graph visualization, let's move the discussion to #299 if there is more to it. |
…ceptions are raised within IterDataPipe"

This PR aims to improve the error message coming from `IterDataPipe`. Fixes the "differentiate DataPipe instances" part of pytorch/data#284.

For example, for this code snippet:

```python
from torchdata.datapipes.iter import IterableWrapper, Mapper

def no_op(x):
    return x

def exception_when_one(x):
    if x == 1:
        raise RuntimeError("x cannot equal to 1.")
    return x

map_dp = IterableWrapper(range(10))
map_dp = Mapper(map_dp, no_op)
map_dp = Mapper(map_dp, exception_when_one)
map_dp = Mapper(map_dp, no_op)
print(list(map_dp))
```

Here is the trace prior to this PR. It doesn't clearly state that a DataPipe has encountered an error and which DataPipe is responsible.

```
Traceback (most recent call last):
  File "/Users/ktse/scratch/debugability.py", line 63, in <module>
    print(list(map_dp))
  File "/Users/ktse/pytorch/torch/utils/data/datapipes/_typing.py", line 369, in wrap_generator
    response = gen.send(request)
  File "/Users/ktse/pytorch/torch/utils/data/datapipes/iter/callable.py", line 110, in __iter__
    for data in self.datapipe:
  File "/Users/ktse/pytorch/torch/utils/data/datapipes/_typing.py", line 369, in wrap_generator
    response = gen.send(request)
  File "/Users/ktse/pytorch/torch/utils/data/datapipes/iter/callable.py", line 111, in __iter__
    yield self._apply_fn(data)
  File "/Users/ktse/pytorch/torch/utils/data/datapipes/iter/callable.py", line 76, in _apply_fn
    return self.fn(data)
  File "/Users/ktse/scratch/debugability.py", line 55, in exception_when_one
    raise RuntimeError("x cannot equal to 1.")
RuntimeError: x cannot equal to 1.
```

Here is the trace with this PR. In this version, it more clearly shows that a `MapperIterDataPipe` is throwing the exception and what the input arguments to the DataPipe are:

```
Traceback (most recent call last):
  File "/Users/ktse/pytorch/torch/utils/data/datapipes/_typing.py", line 395, in wrap_generator
    response = gen.send(request)
  File "/Users/ktse/pytorch/torch/utils/data/datapipes/iter/callable.py", line 111, in __iter__
    yield self._apply_fn(data)
  File "/Users/ktse/pytorch/torch/utils/data/datapipes/iter/callable.py", line 76, in _apply_fn
    return self.fn(data)
  File "/Users/ktse/scratch/debugability.py", line 55, in exception_when_one
    raise RuntimeError("x cannot equal to 1.")
RuntimeError: x cannot equal to 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/ktse/scratch/debugability.py", line 63, in <module>
    print(list(map_dp))
  File "/Users/ktse/pytorch/torch/utils/data/datapipes/_typing.py", line 395, in wrap_generator
    response = gen.send(request)
  File "/Users/ktse/pytorch/torch/utils/data/datapipes/iter/callable.py", line 110, in __iter__
    for data in self.datapipe:
  File "/Users/ktse/pytorch/torch/utils/data/datapipes/_typing.py", line 406, in wrap_generator
    raise DataPipeException(msg, e) from e
torch.utils.data.datapipes._typing.DataPipeException: thrown by __iter__ of MapperIterDataPipe(datapipe=MapperIterDataPipe, fn=exception_when_one, input_col=NoneType, output_col=NoneType)
```

Improvements to make: make the traceback shorter, such as by skipping over parts related to `response = gen.send(request)` or `for data in self.datapipe`.
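The chaining visible in the second trace (`raise ... from e`) can be tried out without torch. The following is a rough sketch of the idea only, not the PR's implementation: `Stage` and `StageError` are hypothetical names invented here, and the message format merely imitates the one in the trace above. Only the call to the stage's own function is wrapped, so an error coming from an upstream stage passes through unchanged and the message names the stage that actually failed.

```python
class StageError(Exception):
    """Hypothetical wrapper exception (the PR's real class is DataPipeException)."""


class Stage:
    """Hypothetical stand-in for a mapper-style pipe; not a torch class."""

    def __init__(self, source, fn):
        self.source = source
        self.fn = fn

    def __repr__(self):
        return f"{type(self).__name__}(fn={self.fn.__name__})"

    def __iter__(self):
        # Errors raised while pulling from self.source are not caught here,
        # so an upstream StageError is not wrapped a second time.
        for item in self.source:
            try:
                result = self.fn(item)
            except Exception as e:
                # Chain the original error so both tracebacks are shown.
                raise StageError(f"thrown by __iter__ of {self!r}") from e
            yield result


def no_op(x):
    return x


def exception_when_one(x):
    if x == 1:
        raise RuntimeError("x cannot equal to 1.")
    return x


pipe = Stage(Stage(Stage(range(10), no_op), exception_when_one), no_op)
try:
    list(pipe)
except StageError as err:
    message = str(err)
    cause = err.__cause__
    print(message)
    print(repr(cause))
```

Keeping the `yield` outside the `try` block is deliberate: exceptions thrown into the generator by a consumer are then not mistaken for failures of this stage.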
🚀 Key Features
This is a tracker issue for debuggability improvements.
- `traverse` function #299 can be printed out as the following
- [DataPipe] Improving debug message when exceptions are raised within IterDataPipe pytorch#75618
- Provide unique name using incremental number for the instance of the same DataPipe class
- `__len__` is invoked.
- `.fork`, `.shuffle`, `.demux` and similar pipes #210

Nice to Haves
This section is tracking potential features that we may want
- `IterDataPipe`, `MapDataPipe`, torcharrow `DataFrame`
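The "unique name using incremental number" item could look roughly like the sketch below. It assumes one counter per concrete class and a `ClassName_index` naming scheme; both are illustrative choices, and `NamedPipe`, `Mapper`, and `Filter` here are hypothetical stand-ins rather than torchdata classes.

```python
import itertools
from collections import defaultdict


class NamedPipe:
    """Hypothetical base class used only to illustrate per-class numbering."""

    # One independent counter per concrete class, created on first use.
    _counters = defaultdict(itertools.count)

    def __init__(self):
        index = next(NamedPipe._counters[type(self)])
        self.name = f"{type(self).__name__}_{index}"

    def __repr__(self):
        return self.name


class Mapper(NamedPipe):
    pass


class Filter(NamedPipe):
    pass


names = [repr(p) for p in (Mapper(), Mapper(), Filter(), Mapper())]
print(names)  # ['Mapper_0', 'Mapper_1', 'Filter_0', 'Mapper_2']
```

With names like these, an error message or a graph node can point at one specific instance when the same DataPipe class appears several times in a pipeline.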
Motivation, pitch
This would help our users and developers to easily understand what's going on with the pipeline.
Feel free to post more requests for debuggability.
Alternatives
No response
Additional context
No response
cc: @NivekT @VitalyFedyunin