fix #579 #580 (Merged)

merged 1 commit on Dec 9, 2022
Conversation

@bung87 (Contributor) commented Dec 8, 2022

fix #579

@bung87 (Contributor, Author) commented Dec 9, 2022

@Vindaar @mratsim please merge this and bump a new version.

@Vindaar merged commit bef0fd6 into mratsim:master on Dec 9, 2022
@bung87 (Contributor, Author) commented Dec 10, 2022

Sorry, this is wrong; it only results in an unused import.

@Vindaar (Collaborator) commented Dec 10, 2022

Ok, I only saw this after commenting on the draft PR. I'll check out that PR and see if I understand the issue.

@bung87 (Contributor, Author) commented Dec 10, 2022

It can be fixed by changing proc numerical_gradient*[T](input: T, to proc numerical_gradient*[T: SomeNumber](input: T, but that's not what Araq expected. matchesAux makes it match the non-tensor version of numerical_gradient; it only considers the parameters, not the return type, so it throws an error during the semantic check.

@Vindaar (Collaborator) commented Dec 10, 2022

It doesn't require that. There are two overloads of numerical_gradient. The first one is intended for the scalar case, so it shouldn't be just T anyway, but rather T: not Tensor (well, in the past the compiler deduced the types correctly, so I guess it may still be considered a bug?).

edit: yeah, thinking about this it should be clarified somewhere if overloads of the kind:

proc foo[T](x: T): T = discard
proc foo[T](x: Bar[T]): Bar[T] = discard

should be "valid" in general, i.e. the compiler should always correctly deduce to use the Bar[T] case when an argument is Bar[T]. What seems to be happening now is that the compiler sees the sum overload and decides "ah, the lambda return type must be a float", then picks the sum for Tensor[T] -> T and thus numerical_gradient cannot be satisfied anymore.

@bung87 (Contributor, Author) commented Dec 10, 2022

As you can see, it's a little ambiguous: Tensor[P] -> T. I'm not sure whether T: not Tensor works, since Tensor appears there without its generic parameter.

@Vindaar (Collaborator) commented Dec 10, 2022

Yeah, the following does work:

proc numerical_gradient*[T: not Tensor](input: T, f: (proc(x: T): T), h = T(1e-5)): T {.inline.} =
  ...
proc numerical_gradient*[T](input: Tensor[T], f: (proc(x: Tensor[T]): T), h = T(1e-5)): Tensor[T] {.noinit.} =
  ...

and no lost type information. But as I mention in my edit in my last message, I'm not sure if the compiler shouldn't be smart enough to figure out the correct overload itself?

Why does it pick the sum with signature Tensor[T] -> T instead of Tensor[T] -> Tensor[T] ?

edit again: Ah, of course. We don't have a direct sum for Tensor[T] -> Tensor[T] without an additional axis argument!
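
For reference, the two sum shapes being contrasted look roughly like this (parameter names are assumed, not copied from Arraymancer):

proc sum*[T](arg: Tensor[T]): T                       # full reduction: Tensor[T] -> T
proc sum*[T](arg: Tensor[T], axis: int): Tensor[T]    # axis reduction: Tensor[T] -> Tensor[T]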

@Vindaar (Collaborator) commented Dec 10, 2022

Give me a few minutes to understand this...

@bung87 (Contributor, Author) commented Dec 10, 2022

It should resolve to proc sum*[T](arg: Tensor[T]): T and proc numerical_gradient*[T](input: Tensor[T], f: (proc(x: Tensor[T]): T), h = T(1e-5)): Tensor[T] {.noinit.}.

@bung87 (Contributor, Author) commented Dec 10, 2022

Yeah, that was hard for me to understand: all the test calls use => lambdas, whose return types are auto, and both of these procs are overloaded. That's how I ended up making this wrong PR.

@Vindaar (Collaborator) commented Dec 10, 2022

So: numerical_gradient receives a Tensor[T] argument, which implies the correct overload to choose is

proc numerical_gradient*[T](input: Tensor[T], f: (proc(x: Tensor[T]): T), h = T(1e-5)): Tensor[T] {.noinit.}

However, at this point the other overload using only T still matches.

The lambda generated by => has auto input and output types. The body of the numerical_gradient procedure calls the given lambda using the input argument directly, so in the test case dinput is used as lambda(dinput). As we know from the signature of the expected lambda, the return type must be T for Tensor[T], in this case float.

The call conv2d takes Tensor[T] arguments and returns a Tensor[T] itself (even if of another shape). This implies the conv2d return type is Tensor[T] and sum therefore consumes a Tensor[T].

As the only sum overload defined is Tensor[T] -> T, the lambda therefore returns T. This implies the existing code satisfies the required condition.

If however the compiler picks the overload

proc numerical_gradient*[T](input: T, f: (proc(x: T): T), h = T(1e-5)): T {.inline.}

first, the type signatures cannot be satisfied anymore, as the lambda does not return T given T (where T represents a Tensor[U]).

So the breakage in the Nim compiler seems to be the inability to choose the correct overload of the outer procedure (numerical_gradient) based on the (still correctly deduced) type of the lambda, hence the compilation error message:

Error: type mismatch: got 'float' for 'sum(conv2d(x, dkernel, dbias, padding, stride, Im2ColGEMM))' but expected 'Tensor[system.float]'

To me it seems like the issue is that previously the compiler picked the overload in such generic cases after deducing the types of the lambda, whereas now it first picks the "easiest" overload, then deduces the type of the lambda, and finally figures out that the types don't match (due to too eager overload resolution).

Does this make any sense?
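
To make the resolution chain concrete, here is a minimal, self-contained sketch of the pattern (Tensor, sum, and numerical_gradient are simplified stand-ins, not the actual Arraymancer definitions; on the affected compiler versions, the last line is where resolution reportedly goes wrong):

import std/sugar

type Tensor[T] = object
  data: seq[T]

# Only a full reduction exists: Tensor[T] -> T.
proc sum[T](t: Tensor[T]): T =
  for x in t.data: result += x

# Scalar overload: T -> T.
proc numerical_gradient[T](input: T, f: proc (x: T): T): T =
  f(input)

# Tensor overload: the lambda must map Tensor[T] -> T.
proc numerical_gradient[T](input: Tensor[T], f: proc (x: Tensor[T]): T): Tensor[T] =
  discard f(input)

let t = Tensor[float](data: @[1.0, 2.0, 3.0])
# The => lambda has auto types; resolving it requires first committing to the
# Tensor overload, which fixes the lambda's return type to float via sum.
discard numerical_gradient(t, x => sum(x))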

@Vindaar (Collaborator) commented Dec 10, 2022

In either case, clarifying the signature of the numerical_gradient overload taking only T to be T: not Tensor is better anyway, even just for readers of the code. I'm still not sure the compiler shouldn't be smart enough to figure this out itself, though (as it did in the past).

@bung87 (Contributor, Author) commented Dec 10, 2022

Yes, exactly, your description is correct. I just tried T: not Tensor and it works; it's better to change the signature in the project itself. The only reason to keep the old one is that it's a good test case for the compiler.

@Vindaar (Collaborator) commented Dec 10, 2022

"The only reason to keep the old one is that it's a good test case for the compiler."

Yeah, I agree. Will you add a test case for the behavior?

@bung87 (Contributor, Author) commented Dec 10, 2022

Yeah, I will. It should be a test in the Nim repo that doesn't rely on the package.
