Load entry point models #1951

Ivanidzo4ka · 2018-12-21T16:18:28Z

Addresses #1104.
As long as NimbusML uses an old version of ML.NET (pre-0.9), I have no way around this other than this hacky way of loading it.

TomFinley · 2019-01-02T18:07:15Z

src/Microsoft.ML.Data/DataLoadSave/TransformerChain.cs

+                }
+                catch
+                {
+                    var chain = ModelFileUtils.LoadPipeline(env, stream, new MultiFileSource(null), extractInnerPipe: false);


Usually when you do something that has a bad code smell (e.g., some sort of global try/catch of literally any type of exception whatsoever), it is best to either (1) explain why it must be so or (2) try not to do it. :) Either is fine with me.
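As an illustration of option (2), here is a minimal sketch of a fallback that catches only one named exception type and documents why. Everything in it is hypothetical: `ModelLoaderSketch`, `LoadNewFormat`, `LoadLegacyFormat`, and the one-byte "header" are stand-ins, and the thread does not say which exception type the real loader throws.

```csharp
using System;
using System.IO;

// Hypothetical stand-ins; none of these names come from ML.NET.
public static class ModelLoaderSketch
{
    public static string Load(Stream stream)
    {
        long start = stream.Position;
        try
        {
            return LoadNewFormat(stream);
        }
        // Catch only the failure that means "this is not the new format".
        // Any other exception (I/O errors, bugs) still propagates to the caller.
        catch (FormatException)
        {
            // Legacy files lack the new header, so rewind and try the old loader.
            stream.Position = start;
            return LoadLegacyFormat(stream);
        }
    }

    // Assumption for the sketch: a new-format file starts with the byte 'N'.
    private static string LoadNewFormat(Stream stream)
    {
        if (stream.ReadByte() != 'N')
            throw new FormatException("Not a new-format model.");
        return "new";
    }

    // Assumption for the sketch: a legacy file starts with the byte 'L'.
    private static string LoadLegacyFormat(Stream stream)
    {
        if (stream.ReadByte() != 'L')
            throw new FormatException("Not a legacy model either.");
        return "legacy";
    }
}
```

The point of the narrow `catch` is that the fallback runs only for the one condition it was written for, and the comment next to it covers option (1) as well.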

TomFinley · 2019-01-03T05:59:53Z

src/Microsoft.ML.Data/DataView/RowToRowMapperTransform.cs

+        /// <summary>
+        /// Returns parent transformer which uses this mapper.
+        /// </summary>
+        ITransformer GetTransformer();


So, this one strikes me as a little mysterious. The reason why IRowMapper exists at all is to make constructing a single ITransformer more convenient -- it is analogous in some respects, in this way, to a more serious IValueMapper, and a more specific mapper than IRowToRowMapper. So why does this specific thing need it? I suspect that it doesn't really belong here.

Ivanidzo4ka · 2019-01-03

I'm just trying to achieve my goal of loading models in the old format in a less invasive way.
Right now I can load a CompositeLoader, which contains a list of transforms. Each one is either a RowToRowMapperTransform, if the transform was converted (I don't even know whether it is Normalize, Pca, or any other transform, just that it is a RowToRowMapperTransform), or just an instance of IDataTransform.
If it's a RowToRowMapperTransform, I need a way to get the real ITransformer out of it, and the mapper (which is an IRowMapper) is the only thing I can use.

So basically, since the mapper pretty much always knows its parent (the ITransformer), I'm just adding this method to it in order to get the parent back while walking through the CompositeLoader transforms.

TomFinley · 2019-01-03

Sorry @Ivanidzo4ka if I have missed something -- that's entirely possible -- but what does that have to do with this interface? All that tells me is that by using CompositeDataLoader we used the wrong class, or it is using the wrong class, or something. I mean, if I had a structure that took IEnumerable, but then I realized I wanted in fact the IList or whatever, I would think, "ok, this structure should take an IList instead!" I wouldn't think, "gee, can't possibly change the structure, I should change IEnumerable to produce the IList which it really is, since practically that's going to be the source of the IEnumerable all the time." I mean, that just seems backwards.

In addition to that, it seems contrary to how we previously discussed loading old models. The plan, as I recall, was the following: things that had been saved using the old (now disfavored) IDataTransform interface, we will attempt to load as the new, shiny ITransformer implementations. So: a normalizer (for instance) saved as an IDataTransform in an old version, we will read as an ITransformer. If it is something we have deliberately chosen not to convert to an ITransformer (e.g., a filter), then it is dropped. (Pursuant to #993.) We would load them as ITransformer or nothing. We wouldn't have some "liminal" state where we use the old structures (which are being retired!) before converting them to the "real" structures we actually want. We would just use the new structures directly.

Now, from what I see, we are not doing that. Instead, I see we are keeping the IDataTransform instances and all the other baggage associated with them, and continuing to rely on their presence. We are in fact going to keep using not only IDataTransform, but also this old IDataLoader implementation, CompositeDataLoader. But, because what we really wanted to do was load ITransformer instances, we are now providing a bunch of backdoors through these created objects to recover the original ITransformer instances upon which they are now based, instead of just using ITransformer directly and being done with it. I mean, why is IDataTransform or IDataLoader entering the picture at all? Doesn't that just complicate our ultimate desire to do away with them? (Pursuant to #1995.)

I get that you wanted to get away with just re-using the old CompositeDataLoader, and so had to make a bunch of old compromises because of that choice, but the correct thing to do was not use it, is it not so?

This seems architecturally a little backwards to me.

Zruty0 · 2019-01-04

@TomFinley, this code was written by me and Ivan together, so I can probably answer.
There are three kinds of models that we are facing now:

1. 'Clean TLC' models. These are saved via the old IDataLoader as CompositeDataLoader's.

2. 'Clean ML.NET' models. These are transformer chains, saved as TransformerChain, and every serialized transformer there is bitwise equal to the representation in the first type.

3. 'Poisoned ML.NET' models. These are transformer chains, but some (or all) of the transformers are actually WrappedTransformer's. So they contain IDataTransform's INSIDE, and load these IDataTransform's.

I think it is totally feasible to take the type-1 model file and load it into ML.NET. This is a matter of replicating the load code of CompositeDataLoader, except load the chunks directly as ITransformers.

Unfortunately, the type-3 models cannot be treated the same way: they contain CompositeDataLoader inside themselves. These type-3 models do exist: in fact, NimbusML produces them right now.

With the above change, we are trying to achieve the following:

1. We wrote the 'unpack transforms of CompositeDataLoader into an ITransformer' logic here. This allows us to load BOTH type-1 and type-3 models, and then subsequently save them as type-2 models.

2. After we release this version, it can be used as a 'transition version' that can convert all legacy model formats into the blessed one that can be loaded without any knowledge of IDataTransform.

3. After the transition version is out, we remove all the code to load legacy models, and only load the type-2 models from now into eternity.

Given that this is an intermediate step, I believe it is acceptable to have this 'backwards' logic of 'get a transformer from a row mapper'. It is the code that 'unwraps' all the 'previously wrapped' transformers into 'normal' ones.
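A minimal sketch of that 'unwrapping' walk, under stated assumptions: every type below is a hypothetical stand-in (suffixed `Sketch`) and not the ML.NET type of similar name, and skipping unconverted transforms is an assumption taken from the "it is dropped" plan described earlier in the thread, not from the PR's code.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins for the interfaces discussed in this thread.
public interface ITransformerSketch { string Name { get; } }
public interface IRowMapperSketch { ITransformerSketch GetTransformer(); }
public interface IDataTransformSketch { }

// A "converted" transformer whose mapper remembers its parent.
public sealed class NormalizerSketch : ITransformerSketch
{
    public string Name => "Normalizer";
    public IRowMapperSketch MakeMapper() => new Mapper(this);

    private sealed class Mapper : IRowMapperSketch
    {
        private readonly NormalizerSketch _parent;
        public Mapper(NormalizerSketch parent) { _parent = parent; }
        // The "backwards" method: hand the parent transformer back.
        public ITransformerSketch GetTransformer() => _parent;
    }
}

// Legacy wrapper that only exposes the mapper it was built from.
public sealed class RowToRowMapperTransformSketch : IDataTransformSketch
{
    public IRowMapperSketch Mapper { get; }
    public RowToRowMapperTransformSketch(IRowMapperSketch mapper) { Mapper = mapper; }
}

// A legacy transform that was never converted (for example, a filter).
public sealed class FilterSketch : IDataTransformSketch { }

public static class LegacyUnwrapSketch
{
    public static List<ITransformerSketch> Unwrap(IEnumerable<IDataTransformSketch> legacy)
    {
        var chain = new List<ITransformerSketch>();
        foreach (var transform in legacy)
        {
            // Converted transforms give back their parent through the mapper.
            if (transform is RowToRowMapperTransformSketch wrapped)
                chain.Add(wrapped.Mapper.GetTransformer());
            // Assumption: anything else is skipped in this sketch.
        }
        return chain;
    }
}
```

The resulting list contains only new-style transformers, which is what would allow a type-1 or type-3 model to be saved back as a type-2 model.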

Let's say we remove the last WrappedTransformer in version 0.11. Then we can do the following:

1. In version 0.11, write a command 'maml UpgradeModel' that takes any model file and saves it back in the new ML.NET format.

2. In version 0.12, remove all the legacy loading code from TransformerChain.

3. At the same time, remove this offending GetTransformer from IRowMapper.

For now, I guess we can add a comment on this method saying that it is only present to enable legacy loading and 'unwrapping'. Maybe even rename it somehow to indicate this.

I hope I was clear :)

TomFinley · 2019-01-04

Ah OK, so this is intended to be a temporary situation. Could we somehow clarify this on the IRowMapper interface? It just seems so odd, and the comment is not describing its "temporary nature" too well at the moment. Right now, when I see a fairly central interface with a method on it, I naturally think, "oh, this must be important for this interface," but it actually isn't. It's just this sort of ugly thing we bolted on as a temporary measure, and that was not actually explained anywhere adequately.

The issue just says, "introduce polymorphic behavior in `MLContext.Model.Load`" or words to that effect. Nothing about that polymorphic behavior being temporary, in exactly one "transitional" version of ML.NET.

So, something needs to change, either the description of the issue, or the documentation of this code, or something, to make this actually clear, because until I read your reply I found this situation entirely confusing.

Ivanidzo4ka · 2019-01-24

I changed the description and modified the comment to state that this is a temporary solution.

sfilipi · 2019-01-28

@ivmatan@microsoft.com the world makes more sense to me after reading this comment. Shall we save this explanation as code comments in TransformerChain.LoadFrom?

Zruty0 · 2019-01-19T03:56:56Z

Is this work being abandoned? I viewed this as quite important, since Nimbus models were not loadable with ML.NET. Has that been updated?

codecov · 2019-01-25T00:13:55Z

Codecov Report

Merging #1951 into master will increase coverage by 0.06%.
The diff coverage is 74.76%.

@@            Coverage Diff             @@
##           master    #1951      +/-   ##
==========================================
+ Coverage   69.82%   69.89%   +0.06%     
==========================================
  Files         786      786              
  Lines      144185   144268      +83     
  Branches    16617    16635      +18     
==========================================
+ Hits       100684   100841     +157     
+ Misses      38954    38877      -77     
- Partials     4547     4550       +3

| Flag | Coverage Δ |
|---|---|
| #Debug | `69.89% <74.76%> (+0.06%)` ⬆️ |
| #production | `66.16% <71.87%> (+0.08%)` ⬆️ |
| #test | `85% <100%> (ø)` ⬆️ |

sfilipi · 2019-01-28T22:22:22Z

@ivmatan@microsoft.com I see that you added a bunch of models in test/data/backcompat. Did you intend to use them in tests?

Ivan Matantsev added 6 commits December 19, 2018 17:04

First iteration. load models but need to work on not converted transf…

2a0f957

…orms.

don't save wrapped transforms

ccf8bd9

sync two machines

bc84d46

no multiple lines

c66a2eb

less doulbe lines

d626e93

merge with master

ad91963

Ivanidzo4ka changed the title ~~WIP Load entry point models~~ Load entry point models Dec 31, 2018

small cleanup

04c726f

Ivanidzo4ka requested a review from TomFinley January 2, 2019 18:04

TomFinley reviewed Jan 2, 2019

TomFinley reviewed Jan 3, 2019

Merge branch 'master' into Ivanidze/LoadEntryPointModels

6140d8b

Ivan Matantsev added 2 commits January 24, 2019 15:24

merge with master

bf69216

modify comment to be more clear regarding purpose of GetTransformer

e3bc457

Ivanidzo4ka requested a review from yaeldekel January 24, 2019 23:45

TomFinley approved these changes Jan 26, 2019

Ivanidzo4ka requested review from sfilipi, artidoro, abgoswam and zeahmed January 28, 2019 21:40

sfilipi approved these changes Jan 28, 2019

Merge with master

8507a1e

Ivanidzo4ka merged commit e78c255 into dotnet:master Jan 28, 2019

ghost locked as resolved and limited conversation to collaborators Mar 25, 2022
