At 0.10.1, torchdata.nodes.Mapper is a function, not a class:
def Mapper(source: BaseNode[X], map_fn: Callable[[X], T]) -> "ParallelMapper[T]":
    """Returns a :class:`ParallelMapper` node with num_workers=0, which will execute
    map_fn in the current process/thread.

    Args:
        source (BaseNode[X]): The source node to map over.
        map_fn (Callable[[X], T]): The function to apply to each item from the source node.
    """
    return ParallelMapper(
        source=source,
        map_fn=map_fn,
        num_workers=0,
    )
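For context, here is a minimal usage sketch of what that means at runtime (the IterableWrapper source below is only an assumption for illustration; the point is that the returned object is a ParallelMapper, so Mapper itself never shows up as a type):

from torchdata.nodes import IterableWrapper, Mapper, ParallelMapper

# Mapper(...) is a function call, not a constructor; the object it returns
# is a ParallelMapper configured with num_workers=0.
node = Mapper(IterableWrapper(range(5)), map_fn=lambda x: x * 2)

print(type(node).__name__)               # "ParallelMapper", not "Mapper"
print(isinstance(node, ParallelMapper))  # True
# isinstance(node, Mapper) would raise TypeError, since Mapper is not a type.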
I'm sure this was discussed before the decision was made, but I'd like to ask / discuss it. An alternative would be something like this:
class Mapper(ParallelMapper[T]):
    """Mapper applies map_fn to each item from source sequentially.

    This is a simplified version of ParallelMapper that operates in a single thread,
    equivalent to ParallelMapper with num_workers=0.

    Args:
        source (BaseNode[X]): The source node to map over.
        map_fn (Callable[[X], T]): The function to apply to each item from the source node.
        snapshot_frequency (int): The frequency at which to snapshot the state of the
            source node. Default is 1.
        prebatch (Optional[int]): Optionally perform pre-batching of items from source
            before mapping. For small items, this may improve throughput at the expense
            of peak memory.
    """

    def __init__(
        self,
        source: BaseNode[X],
        map_fn: Callable[[X], T],
        snapshot_frequency: int = 1,
        prebatch: Optional[int] = None,
    ):
        # Call the parent constructor with num_workers=0 and the other params set to
        # their simplest form, since we're doing sequential processing.
        super().__init__(
            source=source,
            map_fn=map_fn,
            num_workers=0,  # Key difference - forces sequential processing
            in_order=True,  # Always in order for sequential processing
            method="thread",  # Method doesn't matter since num_workers=0
            multiprocessing_context=None,  # Not used since no parallel processing
            max_concurrent=None,  # Not used since no parallel processing
            snapshot_frequency=snapshot_frequency,
            prebatch=prebatch,
        )

    def reset(self, initial_state: Optional[Dict[str, Any]] = None):
        super().reset(initial_state)
        if initial_state is not None:
            self._it.reset(initial_state[self.IT_STATE_KEY])
        else:
            self._it.reset()

    def next(self) -> T:
        return next(self._it)  # type: ignore[arg-type, union-attr]

    def get_state(self) -> Dict[str, Any]:
        return {self.IT_STATE_KEY: self._it.state_dict()}  # type: ignore[union-attr]
At a quick glance, this seems pretty good. What am I missing from the decision-making process? While I'm not certain about the alternative, the downside of the current approach is pretty clear: Mapper looks like a class, but it is not. It confused me, and it would confuse other users too. For example, I tried to subclass Mapper, and of course it didn't work.
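Concretely, here is a small sketch of that failure mode (LoggingMapper is a made-up name for illustration); class creation itself raises a TypeError because a plain function cannot be used as a base class:

from torchdata.nodes import Mapper

try:
    # Fails at class-creation time: Mapper is a plain function, not a type,
    # so Python cannot use it as a base class.
    class LoggingMapper(Mapper):
        pass
except TypeError as err:
    print(f"Cannot subclass Mapper: {err}")

With the class-based definition proposed above, the same subclass would be created without issue.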
Hi @keunwoochoi, I think this makes a lot of sense. I generally try to avoid inheritance whenever possible, but IMO this is a reasonable use and we could land this change.
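For comparison, a composition-based variant that avoids inheriting from ParallelMapper is also conceivable. The following is only a hypothetical sketch, not existing torchdata code: SequentialMapper is a made-up name, and it assumes the reset/next/get_state contract shown in the snippets above.

from typing import Any, Callable, Dict, Optional, TypeVar

from torchdata.nodes import BaseNode, ParallelMapper

X = TypeVar("X")  # mirrors the type variables used in the snippets above
T = TypeVar("T")


class SequentialMapper(BaseNode[T]):
    """Hypothetical sketch: wrap a ParallelMapper(num_workers=0) instead of inheriting from it."""

    def __init__(self, source: BaseNode[X], map_fn: Callable[[X], T]):
        super().__init__()
        # Composition: sequential behaviour is delegated to an internal node.
        self._inner = ParallelMapper(source=source, map_fn=map_fn, num_workers=0)

    def reset(self, initial_state: Optional[Dict[str, Any]] = None):
        super().reset(initial_state)
        # Delegate state handling entirely to the wrapped node.
        self._inner.reset(initial_state)

    def next(self) -> T:
        return self._inner.next()

    def get_state(self) -> Dict[str, Any]:
        return self._inner.get_state()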