Tracker for Debuggability Improvement #284

Open
3 of 7 tasks
ejguan opened this issue Mar 8, 2022 · 9 comments
Contributor

ejguan commented Mar 8, 2022

🚀 Key Features

This is a tracker issue for debuggability improvements.

graph TD;
DP1-->DP2;
DP2-->DP3;
DP2-->DP4;
DP3-->DP5;
DP4-->DP6;
DP5-->DP6;
DP6-->output;

The graph above can be printed as follows:

>>> print_graph(traverse(dp6))
DP1 -> DP2 -> DP3 -> DP5
         \             \
         DP4 --------> DP6 ->
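As a rough illustration of the first step such a printer would need, here is a minimal sketch that flattens a nested graph into a sorted edge list. It assumes the graph is a nested dict of the form `{node: {parent: {...}}}` and uses plain strings in place of real DataPipes; `iter_edges` is a hypothetical helper, not an existing API.

```python
def iter_edges(graph, child=None):
    """Yield (parent, child) pairs from a nested {node: {parent: {...}}} dict."""
    for node, parents in graph.items():
        if child is not None:
            yield (node, child)
        # Recurse into the parents of the current node.
        yield from iter_edges(parents, child=node)


# Hypothetical stand-in for the graph of dp6, with strings instead of DataPipes.
graph = {
    "DP6": {
        "DP4": {"DP2": {"DP1": {}}},
        "DP5": {"DP3": {"DP2": {"DP1": {}}}},
    }
}

# Shared ancestors (DP1, DP2) are reached twice, so deduplicate with a set.
edges = sorted(set(iter_edges(graph)))
for parent, child in edges:
    print(f"{parent} -> {child}")
```

Laying the edges out as ASCII art, as in the mock-up above, would be a separate step on top of this edge list.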

Nice to Haves

This section tracks potential features that we may want.

  • Handling mixed usage of IterDataPipe, MapDataPipe, torcharrow DataFrame
    • Are users able to clearly differentiate these when they are using a mixture of these classes?
  • Connect profiling result with graph
    • Different colors for nodes based on their performance (similar to TensorBoard)

Motivation, pitch

This would help our users and developers easily understand what's going on in the pipeline.
Feel free to post more requests for debuggability.

Alternatives

No response

Additional context

No response

cc: @NivekT @VitalyFedyunin

Contributor

NivekT commented Mar 8, 2022

I made a similar Google Doc here. Not sure if it is easier to brainstorm using a GitHub issue or the doc to flesh out what we want to do for each point.

Contributor Author

ejguan commented Mar 8, 2022

We should keep the design in the Google Doc. I'm adding this issue to gather feedback from OSS users.

Contributor

pmeier commented Mar 16, 2022

Regarding graph visualization: I needed this today and hacked something together:

from __future__ import annotations

import dataclasses
from typing import Optional, Any

import matplotlib.pyplot as plt
import networkx as nx
from torch.utils.data.graph import traverse


@dataclasses.dataclass(repr=False)
class Node:
    obj: Any
    child: Optional[Node] = None

    def __repr__(self):
        return type(self.obj).__name__

    def __hash__(self):
        return hash(self.obj)


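def scan(graph, child=None):
    # `graph` is the nested dict returned by `traverse`: each key is a datapipe
    # and each value is the sub-graph of that datapipe's parents.
    for node, parents in graph.items():
        current = Node(node, child)
        yield current
        yield from scan(parents, child=current)


def visualize_graph(dp):
    G = nx.DiGraph()
    # A datapipe reachable via several paths is yielded more than once;
    # `set` drops duplicates that share the same datapipe and child.
    for node in set(scan(traverse(dp))):
        if node.child is not None:
            G.add_edge(node, node.child)
    nx.draw_networkx(G)
    plt.show()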

Simple example:

from torchdata.datapipes.iter import FileLister, FileOpener

dp = FileLister()
dp = FileOpener(dp).filter(bool).map(list)

visualize_graph(dp)

[Image: rendered graph for the simple example]

Complex example:

from torchvision.prototype import datasets

dp = datasets.load("coco")

visualize_graph(dp)

[Image: rendered graph for the COCO example]


I've used networkx as the backend here, but we can use any graph visualization library. Let me know if I should prettify the plots and send a PR.

Contributor Author

ejguan commented Mar 16, 2022

Thanks @pmeier, this looks great.
For the backend lib, I would personally prefer TensorBoard to visualize the graph, because PyTorch already uses it to visualize model graphs: https://pytorch.org/tutorials/intermediate/tensorboard_tutorial.html#inspect-the-model-using-tensorboard

Code reference from TensorBoard showing how to build a GraphDef proto: https://github.com/tensorflow/tensorboard/blob/b2b50a328f9bf6755a72cc6f9fd25bdd792b2bbe/tensorboard/plugins/graph/keras_util.py#L193-L203

Contributor

pmeier commented Mar 18, 2022

One thing that would also simplify debugging is for each datapipe to have a __repr__. We can follow the approach nn.Module takes:

class IterDataPipe:
    def extra_repr(self) -> str:
        return ""

    def __repr__(self) -> str:
        return f"{type(self).__name__.replace('IterDataPipe', '')}({self.extra_repr()})"


class MinimalIterDataPipe(IterDataPipe):
    pass


def my_map(x):
    return x


class MapperIterDataPipe(IterDataPipe):
    def __init__(self, fn):
        self.fn = fn

    def extra_repr(self):
        return self.fn.__name__


print(MinimalIterDataPipe())
print(MapperIterDataPipe(my_map))
Minimal()
Mapper(my_map)

That would also improve the graph visualization from #299, since each node could contain the __repr__ of the datapipe rather than just its name.

Contributor Author

ejguan commented Mar 21, 2022

> One thing that would also simplify debugging is for each datapipe to have a __repr__. We can follow the approach nn.Module takes:

I would propose rendering ABCIterDataPipe as iter.ABC (rather than just ABC), because the same name can be shared by both iter and map DataPipes.

Contributor

pmeier commented Mar 21, 2022

Not sure what you mean. Could you give an example? Why does the functional interface need special handling? For example, calling dp.map(...) internally simply does Mapper(dp, ...), right? If so, the returned datapipe will have a __repr__ if Mapper has one.

Contributor Author

ejguan commented Mar 21, 2022

Yeah. I mean MapperMapDataPipe vs MapperIterDataPipe. If the whole pipeline contains both IterDataPipe and MapDataPipe (I totally understand it's confusing), we need to differentiate them in the visualized graph/name.

iter_dp = IterableWrapper(range(10))  # IterDataPipe
iter_dp = iter_dp.map(fn)  # iter.Mapper
map_dp = IterToMapConverter(iter_dp)  # MapDataPipe
map_dp = map_dp.map(fn)  # map.Mapper
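To make the naming idea concrete, here is a small sketch of how a display name such as `iter.Mapper` or `map.Mapper` could be derived from the class name. The classes below are empty stand-ins rather than the real DataPipes, and `short_name` is a hypothetical helper, not part of torchdata.

```python
# Empty stand-in classes; only their names matter for this sketch.
class MapperIterDataPipe:
    pass


class MapperMapDataPipe:
    pass


def short_name(dp):
    """Return 'iter.X' or 'map.X' for classes named XIterDataPipe / XMapDataPipe."""
    name = type(dp).__name__
    for suffix, prefix in (("IterDataPipe", "iter"), ("MapDataPipe", "map")):
        # Skip the bare base-class names, which have nothing before the suffix.
        if name.endswith(suffix) and name != suffix:
            return f"{prefix}.{name[:-len(suffix)]}"
    # Fall back to the plain class name when no known suffix matches.
    return name


print(short_name(MapperIterDataPipe()))
print(short_name(MapperMapDataPipe()))
```

With this scheme the two `Mapper` variants stay distinguishable in a graph label even though they share the same short name.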

Contributor

pmeier commented Mar 21, 2022

Maybe just color the nodes according to their type? Since it seems this is only relevant for the graph visualization, let's move the discussion to #299 if there is more to it.

NivekT added a commit to pytorch/pytorch that referenced this issue Apr 13, 2022
…ceptions are raised within IterDataPipe"


This PR aims to improve the error message coming from `IterDataPipe`. Fixes the "differentiate DataPipe instances" part of pytorch/data#284.

For example, for this code snippet:
```python
from torchdata.datapipes.iter import IterableWrapper, Mapper

def no_op(x):
    return x

def exception_when_one(x):
    if x == 1:
        raise RuntimeError("x cannot equal to 1.")
    return x

map_dp = IterableWrapper(range(10))
map_dp = Mapper(map_dp, no_op)
map_dp = Mapper(map_dp, exception_when_one)
map_dp = Mapper(map_dp, no_op)
print(list(map_dp))
```

Here is the trace prior to this PR. It doesn't clearly state that a DataPipe has encountered an error and which DataPipe is responsible.
```
Traceback (most recent call last):
  File "/Users/ktse/scratch/debugability.py", line 63, in <module>
    print(list(map_dp))
  File "/Users/ktse/pytorch/torch/utils/data/datapipes/_typing.py", line 369, in wrap_generator
    response = gen.send(request)
  File "/Users/ktse/pytorch/torch/utils/data/datapipes/iter/callable.py", line 110, in __iter__
    for data in self.datapipe:
  File "/Users/ktse/pytorch/torch/utils/data/datapipes/_typing.py", line 369, in wrap_generator
    response = gen.send(request)
  File "/Users/ktse/pytorch/torch/utils/data/datapipes/iter/callable.py", line 111, in __iter__
    yield self._apply_fn(data)
  File "/Users/ktse/pytorch/torch/utils/data/datapipes/iter/callable.py", line 76, in _apply_fn
    return self.fn(data)
  File "/Users/ktse/scratch/debugability.py", line 55, in exception_when_one
    raise RuntimeError("x cannot equal to 1.")
RuntimeError: x cannot equal to 1.
```

Here is the trace with this PR. In this version, it more clearly shows that a `MapperIterDataPipe` is throwing the exception and what the input arguments to the DataPipe are:
```
Traceback (most recent call last):
  File "/Users/ktse/pytorch/torch/utils/data/datapipes/_typing.py", line 395, in wrap_generator
    response = gen.send(request)
  File "/Users/ktse/pytorch/torch/utils/data/datapipes/iter/callable.py", line 111, in __iter__
    yield self._apply_fn(data)
  File "/Users/ktse/pytorch/torch/utils/data/datapipes/iter/callable.py", line 76, in _apply_fn
    return self.fn(data)
  File "/Users/ktse/scratch/debugability.py", line 55, in exception_when_one
    raise RuntimeError("x cannot equal to 1.")
RuntimeError: x cannot equal to 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/ktse/scratch/debugability.py", line 63, in <module>
    print(list(map_dp))
  File "/Users/ktse/pytorch/torch/utils/data/datapipes/_typing.py", line 395, in wrap_generator
    response = gen.send(request)
  File "/Users/ktse/pytorch/torch/utils/data/datapipes/iter/callable.py", line 110, in __iter__
    for data in self.datapipe:
  File "/Users/ktse/pytorch/torch/utils/data/datapipes/_typing.py", line 406, in wrap_generator
    raise DataPipeException(msg, e) from e
torch.utils.data.datapipes._typing.DataPipeException: thrown by __iter__ of MapperIterDataPipe(datapipe=MapperIterDataPipe, fn=exception_when_one, input_col=NoneType, output_col=NoneType)
```

Improvements to make: make the traceback shorter, such as by skipping over parts related to `response = gen.send(request)` or `for data in self.datapipe`.
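The chaining pattern visible in the second trace (`raise ... from e`) can be illustrated with a plain-Python sketch. This is not the PR's implementation, which hooks into `wrap_generator`; `PipeError` and this toy `Mapper` are hypothetical stand-ins used only to show how the responsible stage ends up named in the message.

```python
class PipeError(Exception):
    """Hypothetical stand-in for the DataPipeException shown above."""


class Mapper:
    """Toy mapping stage; not the real torchdata Mapper."""

    def __init__(self, source, fn):
        self.source = source
        self.fn = fn

    def __repr__(self):
        return f"Mapper(fn={self.fn.__name__})"

    def __iter__(self):
        for item in self.source:
            # Only the call to fn is wrapped, so errors raised by upstream
            # stages pass through without being wrapped a second time.
            try:
                result = self.fn(item)
            except Exception as e:
                raise PipeError(f"thrown by __iter__ of {self!r}") from e
            yield result


def no_op(x):
    return x


def exception_when_one(x):
    if x == 1:
        raise RuntimeError("x cannot equal to 1.")
    return x


dp = Mapper(Mapper(Mapper(range(10), no_op), exception_when_one), no_op)
try:
    list(dp)
except PipeError as err:
    message = str(err)
    cause = err.__cause__  # the original RuntimeError, kept by "from e"
    print(message)
```

Because the original exception is preserved as `__cause__`, the user-level error stays available while the outer message identifies the failing stage.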

NivekT added a commit to pytorch/pytorch that referenced this issue Apr 13, 2022
…sed within IterDataPipe"


NivekT added a commit to pytorch/pytorch that referenced this issue Apr 19, 2022
…ceptions are raised within IterDataPipe"


NivekT added a commit to pytorch/pytorch that referenced this issue Apr 19, 2022
…sed within IterDataPipe"


NivekT added a commit to pytorch/pytorch that referenced this issue Apr 19, 2022
…ceptions are raised within IterDataPipe"


NivekT added a commit to pytorch/pytorch that referenced this issue Apr 19, 2022
…sed within IterDataPipe"


NivekT added a commit to pytorch/pytorch that referenced this issue Apr 19, 2022
…ceptions are raised within IterDataPipe"


NivekT added a commit to pytorch/pytorch that referenced this issue Apr 19, 2022
…sed within IterDataPipe"

