Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add validation between input_col and fn for Callable DataPipes #362

Closed
ejguan opened this issue Apr 18, 2022 · 10 comments
Closed

Add validation between input_col and fn for Callable DataPipes #362

ejguan opened this issue Apr 18, 2022 · 10 comments
Labels
good first issue Good for newcomers

Comments

@ejguan
Copy link
Contributor

ejguan commented Apr 18, 2022

🚀 The feature

Using inspect to validate the input_col matching fn input for Mapper, Flatmapper and other DataPipe using both input_col and fn as the arguments.

Motivation, pitch

This would help users to identify the wrong arguments at the construction time for such IterDataPipe rather than at iteration time.

Alternatives

No response

Additional context

No response

@ejguan ejguan added the good first issue Good for newcomers label Apr 18, 2022
@davidegavio
Copy link

I'd like to work on this if possible

@ejguan
Copy link
Contributor Author

ejguan commented Apr 19, 2022

@davidegavio
Thanks for taking interest. Go for it. For Mapper and Filter, you may need to create a PR in PyTorch Core. https://github.com/pytorch/pytorch

@bushshrub
Copy link
Contributor

bushshrub commented May 27, 2022

Was this ever worked on? If not I could take a stab at this.

@ejguan
Copy link
Contributor Author

ejguan commented May 31, 2022

Hi @davidegavio, do you still have plan to work on this issue?

@bushshrub
Copy link
Contributor

I'll give this a try when I get home

@bushshrub
Copy link
Contributor

@ejguan just to check, is this to check that fn has type annotations for returning a Sequence?

@ejguan
Copy link
Contributor Author

ejguan commented Jun 9, 2022

If the provided fn has type annotations, we should validate the number of function's arguments matches input_col.

Let's say there is only 1 argument in fn, but input_col is specified as 1. We should raise Error.

@bushshrub
Copy link
Contributor

What does input_col refer to in this scenario? Is it the dimension?

@ejguan
Copy link
Contributor Author

ejguan commented Jun 9, 2022

Yeah. Let's say you have a DataPipe yielding tuples like (1, 2), (3, 4), ....

dp = IterableWrapper([(1, 2), (3, 4), ...])
dp = dp.map(lambda x: x + 1, input_col=1)

And, input_col would specify which element in each tuple should be applied by the fn. In this case, as input_col=1, the function will apply to 2, 4, ....

We should add inline doc to explain the behavior of input_col

@bushshrub
Copy link
Contributor

Alright, I will work on this and open a PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue Good for newcomers
Projects
None yet
3 participants