First draft of returnn setup #55
All these things look like they actually belong to the returnn_common repo, which was intended exactly for this, i.e. all common RETURNN helper code (independent of Sisyphus or any job management; except maybe that you use

Also, in returnn_common, there are actually already some very similar classes. E.g. look at

We should maybe sync that effort.

However, one aspect I'm not really sure about, which I somehow don't like that much: This always was incomplete, and it felt like I would just unnecessarily add another wrapper around all the classes in RETURNN. For the layer and net dict creation, this was a similar case. A compromise was to at least create the wrappers automatically so it's easy to keep them in sync and complete. Although after working with this a bit, this also has its downsides. I don't really have a good solution here but I just wanted to raise this point.
This code here is, as noted, derived from my and Mohammad's helper code. So this is also targeted at older setups that do not make use of RETURNN common. Exactly replicating old models with

If you say having helpers independent of
I personally would not mind keeping this kind of setup under my username. I also currently do not have any hiwis that could be affected by this PR. So we could instead implement these helpers (if not all are implemented yet) in
I don't exactly understand. This PR here adds this code under this new location. So currently no setup exists which uses it this way. Also, what exactly would be the difference for any setup whether you have this code in
I don't understand. I mean to just put this code which you have here in this PR to

How does that have any other influence? Why would that take a long time in the second case but not in the first case? This is totally independent of
I don't exactly understand. Why? Why is a dependency to

Also, then later we would end up with two copies, or even worse two variants of the same code, in
I'm speaking specifically about RETURNN helpers, i.e. exactly the purpose of returnn_common. Sure, we could also replicate
I don't exactly understand the comment. How could anybody be affected by this PR? This PR adds code in a new location. So it does not change any existing code.
No, this is not the case. It was supposed to contain all RETURNN related helpers (which can be treated independently of Sisyphus). In fact,
The idea was then to point the setups here instead of the private location. Of course you can move the same code to

I just talked to Benedikt, and I will just update the code at the old location for us (and Timur), so under my user folder. We will then continue to figure out how to integrate
So what do we do about the non-independent helpers? Only keeping those here and moving the rest to
No, why? I already argued above that the reason why I see this especially as a much better fit for
No, why do you think so? Why does the code location (i6_exp vs returnn_common) have an influence on other dependencies (RETURNN, TF)?
Such as?
I think we should maybe individually discuss this for each helper where it is not clear. But currently I don't really see cases where it is not already clear.
And the
Yes, this is what I explained before. This is unnecessary and should be replaced by something like
And this can (should) be separated. This is an example for i6_experiment.
Why is this unnecessary?
It is unnecessary for these dataset helpers. The exact same code of these dataset helpers can be used outside of Sisyphus. It is unnecessary that it must be a

That is the whole point I explained before. returnn_common is about generic RETURNN common code, which is independent of Sisyphus. Again, as I explained before, RETURNN common pretty much already has very similar (but incomplete) dataset helpers. It is unnecessary in those helpers that we have
But I do not care at all about any stuff outside Sisyphus. Yes, we do not want redundant code, but I want helpers that provide the correct interface for Sisyphus usage, not anything else. If this is then wrapping something from
But there is no contradiction here. As explained above, you can simply have
This is your code. If you don't introduce the TF/RETURNN dependency, why should it be there?
Because it might simplify things in the future if we want the helpers to create the correct
Ok, I am not super happy with splitting the helpers into two parts (with Sisyphus dependency and without), but we can do it. I will update my user code instead to continue working, and you can decide what from my code here should be moved into
I don't really understand. First you say, you don't want the RETURNN dependency. Now you say, you want it. So, what do you mean?
This is your code, so maybe you can better decide that? But as said, isn't it basically just everything? Except Sisyphus pipeline logic, like

Just everything which is independent of Sisyphus.
Wanting and needing are two separate things. The helpers were originally designed completely independent of

So I will go with a different approach, and first add only things to

I will close this, as then we discuss this code either in the context of
I don't really understand why you mention

I thought this PR here is just about the dataset helpers? But if not, then I misunderstood that. Although, I think it would be good if we separate these things.
I think what Nick means is that even though in theory one could probably split some helpers and make them general, they come from the idea of being used in Sisyphus. So the "dependency" is rather implicit in my opinion. For me it makes sense to have the helpers be tailored to Sisyphus and not be too general, but this is of course open to discussion. The question might be how much this will be used without Sisyphus.
While dataset helpers are a bigger part of this PR, they are not the only thing (cf. Datastreams), and there are possibly other similar helpers which might be added in the future (which would then be merged based on the decision here).
I think the discussion is a bit too abstract. I was specifically talking about what the PR introduces, for example returnn-common also already has such code, which is probably even older (March 2021, see here) than this code here, which does pretty much the same thing. Only
This is the code I am currently using for constructing some of the test cases for both RETURNN-based ASR systems (in contrast to RASR-based) as well as TTS systems. The code is somewhat inspired by the existing helper functions of @mmz33 and older setups I used in the past.
This is nowhere near done or sufficiently documented, but please feel free to discuss this here while I am on vacation.
To get an idea how this code may be used, have a look at one of the prototype pipelines.
EDIT:
Additional information: All these helpers are independent of returnn_common and can be used for both legacy as well as returnn_common based networks. The returnn_common specific extensions of this will follow in another update, and will not be under common/setups/returnn but common/setups/returnn_common.