Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Load entry point models #1951

Merged
merged 11 commits into from
Jan 28, 2019

Conversation

Ivanidzo4ka
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka commented Dec 21, 2018

Address #1104
As long as NimbusML uses old version ml.net (pre 0.9) I don't have other way around other than this hacky way of loading it.

@Ivanidzo4ka Ivanidzo4ka changed the title WIP Load entry point models Load entry point models Dec 31, 2018
@Ivanidzo4ka Ivanidzo4ka requested a review from TomFinley January 2, 2019 18:04
}
catch
{
var chain = ModelFileUtils.LoadPipeline(env, stream, new MultiFileSource(null), extractInnerPipe: false);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Usually when you do something that has a bad code smell (e.g., some sort of global try/catch of literally any type of exception whatsoever) it is either best to either (1) explain why it must be so or (2) try not to do it. :) Either is fine with me.

/// <summary>
/// Returns parent transfomer which uses this mapper.
/// </summary>
ITransformer GetTransformer();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, this one strikes me as a little mysterious. The reason why IRowMapper exists at all is to make constructing a single ITransformer more convenient -- it is analogous in some respects, in this way, to a more serious IValueMapper, and a more specific mapper than IRowToRowMapper. So why does this specific thing need it? I suspect that it doesn't really belong here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm just trying to achieve my goal of loading models in old format in less invasive way.
Right now I can load CompositeLoader which contains list of RowToRowMapperTransforms, (I don't even know is it Normalize, Pca, or any other transform, just RowToRowMapperTransform) if transform was converted, or just instance of IDataTransform.
If it's RowToRowMapperTransform, I need a way to get real ITransformer out of it, and mapper (which is IRowMapper) is only thing which I can use.

So basically since Mapper is pretty much all the time know it's parent (ITransformer) I'm just adding this method to it in order to get it back during walk through CompositeLoader transforms.


In reply to: 244916610 [](ancestors = 244916610)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry @Ivanidzo4ka if I have missed something -- that's entirely possible -- but what does that have to do with this interface? All that tells me is that by using CompositeDataLoader we used the wrong class, or it is using the wrong class, or something. I mean, if I had a structure that took IEnumerable, but then I realized I wanted in fact the IList or whatever, I would think, "ok, this structure should take an IList instead!" I wouldn't think, "gee, can't possibly change the structure, I should change IEnumerable to produce the IList which it really is, since practically that's going to be the source of the IEnumerable all the time." I mean, that just seems backwards.

In addition to that, it seems contrary to our plans for how we've previously discussed how to load old models. The plan, as I recall, was the following: things that had been saved using the old (now disfavored) IDataTransform interface models, we will attempt to load as the new, shiny, ITransformer implementations. So: a normalizer (for instance) saved as IDataTransform in old version, we will read that form as an ITransformer. If it is something we have deliberately chosen not to convert to an ITransform (e.g., a filter), then it is dropped. (Pursuant to #993.) We would load them as ITransformer or nothing. We wouldn't have some "liminal" state where we use the old structures (which are being retired!) before converting them to the "real" structures we actually want. We would just use the new structures directly.

Now, from what I see, we are not doing that. Instead, I see we are keeping the IDataTransform instances and all the other baggage associated with them, and continue to rely on their presence. I am in fact going to keep using not only IDataTransform, but this old IDataLoader interface implementation CompositeDataLoader. But, because what we really wanted to do was load ITransform instances, we are now providing a bunch of backdoors through these created objects to recover the original ITransform instances upon which they are now based, instead of just using ITransformer directly and being done with it. I mean, why is IDataTransform or IDataLoader entering the picture at all? Doesn't that just complicate our ultimate desire to do away with them? (Pursuant to #1995.)

I get that you wanted to get away with just re-using the old CompositeDataLoader, and so had to make a bunch of old compromises because of that choice, but the correct thing to do was not use it, is it not so?

This seems architecturally a little backwards to me.


In reply to: 245146290 [](ancestors = 245146290,244916610)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TomFinley , this code was written by me and Ivan together, so I can probably answer.
There are three kinds of models that we are facing now:

  1. 'Clean TLC' models. These are saved via the old IDataLoader as CompositeDataLoader's.
  2. 'Clean ML.NET' models. These are transformer chains, saved as TransformerChain, and every serialized transformer there is bitwise equal to the representation in the first type.
  3. 'Poisoned ML.NET' models. These are transformer chains, but some (or all) of the transformers are actually WrappedTransformer's. So they INSIDE contain IDataTransform, and load these IDataTransform's.

I think it is totally feasible to take the type-1 model file and load it into ML.NET. This is a matter of replicating the load code of CompositeDataLoader, except load the chunks directly as ITransformers.

Unfortunately, the type-3 models cannot be treated the same way: they contain CompositeDataLoader inside themselves. These type-3 models do exist: in fact, NimbusML produces them right now.

With the above change, we are trying to achieve the following:

  1. We wrote the 'unpack transforms of CompositeDataLoader into an ITransformer here. This allows us to load BOTH type-1 and type-3 models, and then subsequently save them as type-2 models.
  2. After we release this version, it can be used as a 'transition version' that can convert all legacy model formats into the blessed one that can be loaded without any knowledge of IDataTransform.
  3. After the transition version is out, we remove all the code to load legacy models, and only load the type-2 models now and into the eternity.

Given that this is an intermediate step, I believe it is acceptable to have this 'backwards' logic of 'get a transformer from a row mapper'. It is the code that 'unwraps' all the 'previously wrapped' transformers into 'normal' ones.

Let's say we remove the last WrappedTransformer in version 0.11. Then we can do the following.

  1. In version 0.11, write a command 'maml UpgradeModel' that takes any model file and saves it back in the new ML.NET format.
  2. In version 0.12, remove all the legacy loading code from TransformerChain.
  3. At the same time we can remove this offending GetTransformer from IRowMapper.

For now, I guess we can comment on this method that it is only present to enable legacy loading and 'unwrapping'. Maybe even rename somehow to indicate this.

I hope I was clear :)

Copy link
Contributor

@TomFinley TomFinley Jan 4, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah OK, so this is intended to be a temporary situation. Could we somehow clarify this on IRowMapper interface? It just seems so odd, and the comment is not describing its "temporary nature" too well at the moment. Right now I see a fairly central interface with a method on it, I naturally think, "oh this must be important for this interface," but it actually isn't, it's just this sort of ugly thing we bolted on as a temporary situation, but that was not actually explained anywhere adequately.

The issue just says, "introduce polymorphic behavior in MLContext.Model.Load`" or words to that effect. Nothing about that polymorphic behavior being temporary, in exactly one "transitional" version of ML.NET.

So, something needs to change, either the description of the issue, or the documentation of this code, or something, to make this actually clear, because until I read your reply I found this situation entirely confusing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I change description and modified comment to state what this is temporary solution.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ivmatan@microsoft.com the world makes more sense to me after reading this comment. Shall we save this explanation as code comments in the TransformerChain.LoadFrom?


In reply to: 245194301 [](ancestors = 245194301)

@Zruty0
Copy link
Contributor

Zruty0 commented Jan 19, 2019

Is this work being abandoned? I viewed this as quite important, since Nimbus models were not loadable with ML.NET. Has that been updated?

@Ivanidzo4ka Ivanidzo4ka requested a review from yaeldekel January 24, 2019 23:45
@codecov
Copy link

codecov bot commented Jan 25, 2019

Codecov Report

Merging #1951 into master will increase coverage by 0.06%.
The diff coverage is 74.76%.

@@            Coverage Diff             @@
##           master    #1951      +/-   ##
==========================================
+ Coverage   69.82%   69.89%   +0.06%     
==========================================
  Files         786      786              
  Lines      144185   144268      +83     
  Branches    16617    16635      +18     
==========================================
+ Hits       100684   100841     +157     
+ Misses      38954    38877      -77     
- Partials     4547     4550       +3
Flag Coverage Δ
#Debug 69.89% <74.76%> (+0.06%) ⬆️
#production 66.16% <71.87%> (+0.08%) ⬆️
#test 85% <100%> (ø) ⬆️

Copy link
Contributor

@TomFinley TomFinley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@sfilipi
Copy link
Member

sfilipi commented Jan 28, 2019

@ivmatan@microsoft.com i see that you added a bunch of models, in the test/data/backcompat . Did you intend to use them in tests?

Copy link
Member

@sfilipi sfilipi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@Ivanidzo4ka Ivanidzo4ka merged commit e78c255 into dotnet:master Jan 28, 2019
@ghost ghost locked as resolved and limited conversation to collaborators Mar 25, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants