-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Load entry point models #1951
Load entry point models #1951
Conversation
} | ||
catch | ||
{ | ||
var chain = ModelFileUtils.LoadPipeline(env, stream, new MultiFileSource(null), extractInnerPipe: false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Usually when you do something that has a bad code smell (e.g., some sort of global try/catch
of literally any type of exception whatsoever) it is either best to either (1) explain why it must be so or (2) try not to do it. :) Either is fine with me.
/// <summary> | ||
/// Returns parent transfomer which uses this mapper. | ||
/// </summary> | ||
ITransformer GetTransformer(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, this one strikes me as a little mysterious. The reason why IRowMapper
exists at all is to make constructing a single ITransformer
more convenient -- it is analogous in some respects, in this way, to a more serious IValueMapper
, and a more specific mapper than IRowToRowMapper
. So why does this specific thing need it? I suspect that it doesn't really belong here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm just trying to achieve my goal of loading models in old format in less invasive way.
Right now I can load CompositeLoader which contains list of RowToRowMapperTransforms, (I don't even know is it Normalize, Pca, or any other transform, just RowToRowMapperTransform) if transform was converted, or just instance of IDataTransform.
If it's RowToRowMapperTransform, I need a way to get real ITransformer out of it, and mapper (which is IRowMapper) is only thing which I can use.
So basically since Mapper is pretty much all the time know it's parent (ITransformer) I'm just adding this method to it in order to get it back during walk through CompositeLoader transforms.
In reply to: 244916610 [](ancestors = 244916610)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry @Ivanidzo4ka if I have missed something -- that's entirely possible -- but what does that have to do with this interface? All that tells me is that by using CompositeDataLoader
we used the wrong class, or it is using the wrong class, or something. I mean, if I had a structure that took IEnumerable
, but then I realized I wanted in fact the IList
or whatever, I would think, "ok, this structure should take an IList
instead!" I wouldn't think, "gee, can't possibly change the structure, I should change IEnumerable
to produce the IList
which it really is, since practically that's going to be the source of the IEnumerable
all the time." I mean, that just seems backwards.
In addition to that, it seems contrary to our plans for how we've previously discussed how to load old models. The plan, as I recall, was the following: things that had been saved using the old (now disfavored) IDataTransform
interface models, we will attempt to load as the new, shiny, ITransformer
implementations. So: a normalizer (for instance) saved as IDataTransform
in old version, we will read that form as an ITransformer
. If it is something we have deliberately chosen not to convert to an ITransform
(e.g., a filter), then it is dropped. (Pursuant to #993.) We would load them as ITransformer
or nothing. We wouldn't have some "liminal" state where we use the old structures (which are being retired!) before converting them to the "real" structures we actually want. We would just use the new structures directly.
Now, from what I see, we are not doing that. Instead, I see we are keeping the IDataTransform
instances and all the other baggage associated with them, and continue to rely on their presence. I am in fact going to keep using not only IDataTransform
, but this old IDataLoader
interface implementation CompositeDataLoader
. But, because what we really wanted to do was load ITransform
instances, we are now providing a bunch of backdoors through these created objects to recover the original ITransform
instances upon which they are now based, instead of just using ITransformer
directly and being done with it. I mean, why is IDataTransform
or IDataLoader
entering the picture at all? Doesn't that just complicate our ultimate desire to do away with them? (Pursuant to #1995.)
I get that you wanted to get away with just re-using the old CompositeDataLoader
, and so had to make a bunch of old compromises because of that choice, but the correct thing to do was not use it, is it not so?
This seems architecturally a little backwards to me.
In reply to: 245146290 [](ancestors = 245146290,244916610)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@TomFinley , this code was written by me and Ivan together, so I can probably answer.
There are three kinds of models that we are facing now:
- 'Clean TLC' models. These are saved via the old
IDataLoader
asCompositeDataLoader
's. - 'Clean ML.NET' models. These are transformer chains, saved as
TransformerChain
, and every serialized transformer there is bitwise equal to the representation in the first type. - 'Poisoned ML.NET' models. These are transformer chains, but some (or all) of the transformers are actually
WrappedTransformer
's. So they INSIDE containIDataTransform
, and load theseIDataTransform
's.
I think it is totally feasible to take the type-1 model file and load it into ML.NET. This is a matter of replicating the load code of CompositeDataLoader
, except load the chunks directly as ITransformer
s.
Unfortunately, the type-3 models cannot be treated the same way: they contain CompositeDataLoader
inside themselves. These type-3 models do exist: in fact, NimbusML produces them right now.
With the above change, we are trying to achieve the following:
- We wrote the 'unpack transforms of
CompositeDataLoader
into anITransformer
here. This allows us to load BOTH type-1 and type-3 models, and then subsequently save them as type-2 models. - After we release this version, it can be used as a 'transition version' that can convert all legacy model formats into the blessed one that can be loaded without any knowledge of
IDataTransform
. - After the transition version is out, we remove all the code to load legacy models, and only load the type-2 models now and into the eternity.
Given that this is an intermediate step, I believe it is acceptable to have this 'backwards' logic of 'get a transformer from a row mapper'. It is the code that 'unwraps' all the 'previously wrapped' transformers into 'normal' ones.
Let's say we remove the last WrappedTransformer
in version 0.11. Then we can do the following.
- In version 0.11, write a command 'maml UpgradeModel' that takes any model file and saves it back in the new ML.NET format.
- In version 0.12, remove all the legacy loading code from
TransformerChain
. - At the same time we can remove this offending
GetTransformer
fromIRowMapper
.
For now, I guess we can comment on this method that it is only present to enable legacy loading and 'unwrapping'. Maybe even rename somehow to indicate this.
I hope I was clear :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah OK, so this is intended to be a temporary situation. Could we somehow clarify this on IRowMapper
interface? It just seems so odd, and the comment is not describing its "temporary nature" too well at the moment. Right now I see a fairly central interface with a method on it, I naturally think, "oh this must be important for this interface," but it actually isn't, it's just this sort of ugly thing we bolted on as a temporary situation, but that was not actually explained anywhere adequately.
The issue just says, "introduce polymorphic behavior in MLContext.Model.Load`" or words to that effect. Nothing about that polymorphic behavior being temporary, in exactly one "transitional" version of ML.NET.
So, something needs to change, either the description of the issue, or the documentation of this code, or something, to make this actually clear, because until I read your reply I found this situation entirely confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I change description and modified comment to state what this is temporary solution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ivmatan@microsoft.com the world makes more sense to me after reading this comment. Shall we save this explanation as code comments in the TransformerChain.LoadFrom?
In reply to: 245194301 [](ancestors = 245194301)
Is this work being abandoned? I viewed this as quite important, since Nimbus models were not loadable with ML.NET. Has that been updated? |
Codecov Report
@@ Coverage Diff @@
## master #1951 +/- ##
==========================================
+ Coverage 69.82% 69.89% +0.06%
==========================================
Files 786 786
Lines 144185 144268 +83
Branches 16617 16635 +18
==========================================
+ Hits 100684 100841 +157
+ Misses 38954 38877 -77
- Partials 4547 4550 +3
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ivmatan@microsoft.com i see that you added a bunch of models, in the test/data/backcompat . Did you intend to use them in tests? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Address #1104
As long as NimbusML uses old version ml.net (pre 0.9) I don't have other way around other than this hacky way of loading it.