Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Look into whether parameter creation from a tensor leads to incorrect dispose scope statistics. #1397

Open
NiklasGustafsson opened this issue Oct 28, 2024 · 14 comments · May be fixed by #1398

Comments

@NiklasGustafsson
Copy link
Contributor

I think there's a problem handling the statistics for this situation (in Parameter.cs):

        public class Parameter : Tensor
        {
            /// <summary>
            /// Constructor
            /// </summary>
            /// <param name="data">A tensor, which will become empty.</param>
            /// <param name="requires_grad"></param>
            public Parameter(Tensor data, bool requires_grad = true) :
                base(data.with_requires_grad(requires_grad).MoveHandle())
            {
                var scope = data.OwningDisposeScope;
                if (scope is not null) {
                    this.OwningDisposeScope = scope;
                    scope.Attach(this);
                    scope.Detach(data);
                }
            }

            /// <summary>
            /// Constructor
            /// </summary>
            /// <param name="handle">A tensor handle.</param>
            internal Parameter(System.IntPtr handle) : base(handle)
            {
            }
        };

I don't have a small repo, but I don't think the stats are collected properly when a parameter takes over the native handle from a non-Parameter tensor and registers itself with its dispose scope.

@NiklasGustafsson
Copy link
Contributor Author

@mvphelps -- I can't assign this to you, but could you take a look at the issue?

@NiklasGustafsson
Copy link
Contributor Author

Maybe there should be a ReplaceWith(old,new) method on scopes for this purpose. It's a bit like move semantics in C++.

@mvphelps
Copy link
Contributor

mvphelps commented Oct 28, 2024

@NiklasGustafsson Sure happy to look at it. To clarify, the code you show above is not present in the main branch. So is this a proposed implementation for Parameter?

I do think this implementation makes perfect sense - much like PackedSequence, it should start owning the data tensor, and should also dispose it properly, all while managing the statistics correctly. Additionally the second ctor that accepts just the IntPtr handle for a Tensor, I'm thinking we just have the create the Tensor from that handle and then delegate to the other ctor. I'd also add another test fixture to deal specifically with stats for parameters.

Edit: Also I do not mean adding a complete new set of statistics for Parameters, these are still Tensors.

Is that what you are thinking? Just want to make sure I understand fully, before I go making the PR. Thank you.

Also, could you explain the MoveHandle method on Tensor, since it is being used here? I see the code and it is simple but I don't understand why it's necessary and it has no comments. My guess on what it's doing is something unobvious with trying to avoid having multiple references to the same handle? Essentially taking the handle for the original tensor, and now making it the handle for the Parameter instead? And this method ensures only one reference on the IntPtr so GC will be able to do it's thing?

@mvphelps
Copy link
Contributor

I like the ReplaceWith idea - so it would just move the scope from the old object to the scope on the new object, and null out the scope on the old. The nice thing with that is it wouldn't actually increment anything. Seems like a pretty clean way to do it.

@NiklasGustafsson
Copy link
Contributor Author

This code is already in the main branch. Where it is failing is in a branch I'm working on, where many of the Modules are reimplemented in C# and using functional APIs to get to the native code. That means that the code needs to create Parameter instances, which are tensors used as module parameters.

Parameter is a sub-class of Tensor, but you can't just downcast, so the construction of a Parameter takes a Tensor and then takes over its native tensor handle. This means that there needs to be a special treatment of the handle for this case.

Anyway, it all works, including the Attach/Detach pair, except that it seems to mess with the new, more precise statistics gathering that you introduced.

Here is, for example, the new constructor for Linear:

internal Linear(long inputSize, long outputSize, bool hasBias = true, Device? device = null, ScalarType? dtype = null) : base(nameof(Linear))
{
    this.in_features = inputSize;
    this.out_features = outputSize;

    weight = torch.empty(outputSize, inputSize, device: device, dtype: dtype).AsParameter();
    init.kaiming_uniform_(weight, a: _sqrt5);

    if (hasBias) {
        bias = torch.empty(outputSize, device: device, dtype: dtype).AsParameter();
        var (fanIn, _) = init.CalculateFanInAndFanOut(weight);
        var bound = fanIn > 0 ? 1 / Math.Sqrt(fanIn) : 0;
        init.uniform_(_bias, -bound, bound);
    }
    //NOTE: it's important not to call 'RegisterComponents' here.
}

and the bias and weight properties:

public Parameter? bias {
    get => _bias;
    set {
        _bias?.Dispose();
        _bias = value?.DetachFromDisposeScope() as Parameter;
        ConditionallyRegisterParameter(BiasComponentName, _bias);
    }
}

public Parameter weight {
    get => _weight!;
    set {
        if (value is null) throw new ArgumentNullException(nameof(weight));
        if (value.Handle != _weight?.Handle) {
            _weight?.Dispose();
            _weight = (value.DetachFromDisposeScope() as Parameter)!;
            ConditionallyRegisterParameter(WeightComponentName, _weight);
        }
    }
}

Note that parameters inside modules must not be in a dispose scope.

@mvphelps
Copy link
Contributor

I must be missing something. I do not see this on main at all, and I'm looking directly here on GitHub:
https://github.com/dotnet/TorchSharp/blob/main/src/TorchSharp/NN/Parameter.cs

What am I missing?

@NiklasGustafsson
Copy link
Contributor Author

You're right (again :-)). It's on my soon-to-be created PR. Forgot that it wasn't on main. Come back in an hour...

Sorry!

@mvphelps
Copy link
Contributor

No worries, it happens. I'm on Central timezone and it is near EOD for me, I'll check it out in the morning. Should be able to have a PR for you tomorrow.

@NiklasGustafsson
Copy link
Contributor Author

I'll try the 'ReplaceWith' approach and see if that works. Maybe that's all it takes.

@mvphelps
Copy link
Contributor

Ok great. I think the only thing missing from what you have is that the OwningDisposeScope is not being cleared on the tensor. So even if the Parameter is detached, the Tensor still is and will get wiped when the scope disposes.

Once you do that I think it should work.

@NiklasGustafsson
Copy link
Contributor Author

This is what it looks like (and it seems to work:

        /// <summary>
        /// Replaces registration of one tensor with another.
        /// </summary>
        /// <param name="original">The original tensor, possibly registered under a dispose scope.</param>
        /// <param name="replacement">The replacement tensor.</param>
        internal static void ReplaceWith(torch.Tensor original, torch.Tensor replacement)
        {
            DisposeScope? scope = original.OwningDisposeScope;

            if (scope != null && scope.Disposables.Remove(original)) {
                original.OwningDisposeScope = null;
                AddToOther(scope, replacement);
            }
        }

@NiklasGustafsson
Copy link
Contributor Author

See:

#1398

@mvphelps
Copy link
Contributor

mvphelps commented Oct 29, 2024

@NiklasGustafsson Looks good. I suggest adding the following test to TestDisposeScopesStatisticsTensor.cs. We don't need anything more comprehensive as this captures the unique behavior. Figured it would be easiest for you if I just pasted it here:

        [Fact]
        public void ParameterCreatedFromScopedTensorOnlyCountsDisposeForParameter()
        {
            var scope = torch.NewDisposeScope();
            var t = torch.tensor(3.0f);
            AssertTensorCounts(0, 0, 1, 0, 0, 0, 1);
            var p = new TorchSharp.Modules.Parameter(t);
            //Stats should not change when converting the tensor to a parameter
            AssertTensorCounts(0, 0, 1, 0, 0, 0, 1);
            t.Dispose();
            //The tensor doesn't own the lifetime now, the parameter does, so again no change.
            AssertTensorCounts(0, 0, 1, 0, 0, 0, 1);
            Assert.True(t.IsInvalid);
            Assert.False(p.IsInvalid);
            Assert.Equal(3.0f, p.ToSingle());
            scope.Dispose();
            //We can count the dispose when the parameter goes away.
            AssertTensorCounts(0, 0, 1, 1, 0, 0, 0);
            Assert.True(p.IsInvalid);
        }

@NiklasGustafsson
Copy link
Contributor Author

Done.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants