Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

First draft of returnn setup #55

Closed
wants to merge 1 commit into from
Closed

Conversation

JackTemaki
Copy link
Contributor

@JackTemaki JackTemaki commented Apr 7, 2022

This is the code I am currently using for constructing some of the test cases for both RETURNN-based ASR systems (in contrast to RASR-based) as well as TTS systems. The code is somewhat inspired by the existing helper functions of @mmz33 and older setups I used in the past.

This is nowhere near done or sufficiently documented, but please feel free to discuss this here while I am on vacation.

To get an idea how this code may be used, have a look at one of the prototype pipelines.

EDIT:
Additional information: All these helpers are independent of returnn_common and can be used for both legacy as well as returnn_common based networks. The returnn_common specific extensions of this will follow in another update, and will not be under common/setups/returnn but common/setups/returnn_common.

@albertz
Copy link
Member

albertz commented Apr 7, 2022

All these things look like they actually belong to the returnn_common repo, which was intended exactly for this, i.e. all common RETURNN helper code (independent of Sisyphus or any job management; except maybe that you use tk.Path instead of pathlib.Path which is not really relevant, we can just make it a Pathlike type).

Also, in returnn_common, there are actually already some very similar classes. E.g. look at returnn_common.datasets.asr.librispeech. And I have used them in some setups. E.g. look here for a full config (it still refers to the old location returnn-experiments/common, which has become returnn_common).

We should maybe sync that effort.

However, one aspect I'm not really sure about, which I somehow don't like that much: This always was incomplete, and it felt like I would just unnecessarily add another wrapper around all the classes in RETURNN. For the layer and net dict creation, this was a similar case. A compromise was to at least create the wrappers automatically so it's easy to keep them in sync and complete. Although after working with this a bit, this also has its downsides. I don't really have a good solution here but I just wanted to raise this point.

@JackTemaki
Copy link
Contributor Author

all common RETURNN helper code

This code here is, as noted, derived from my and Mohammads helper code. So this is also targeted for older setups that do not make use of RETURNN common. Exactly replicating old models with returnn_common is a task that will take a long time (So far I could never get closer than 1% WER difference to the old "hand written" model). I definitely want to keep the option to run some experiments without returnn_common dependency at least for the next months (also this is needed for some of my hiwis), so I will keep this code in my user folder anyway.

If you say having helpers independent of returnn_common is misleading/redundant, and I (including my hiwis) keep being the only persons using this, then I would discard this PR. ( I am not sure if @mmz33 is currently interested in looking at this)

@mmz33
Copy link
Member

mmz33 commented Apr 30, 2022

I personally would not mind to keep this kind of setup under my username. I also currently do not have any hiwis that could be affected by this PR. So we could instead implement these helpers (if not all are implemented yet) in returnn-common instead of here to not have redundant helpers. I actually thought first that returnn-common will be used only for network construction but now i see the point of @albertz.

@albertz
Copy link
Member

albertz commented May 2, 2022

This code here is, as noted, derived from my and Mohammads helper code. So this is also targeted for older setups that do not make use of RETURNN common.

I don't exactly understand. This PR here adds this code under this new location. So currently no setup exists which uses it this way.

Also, what exactly would be the difference for any setup whether you have this code in i6_experiments.common.setups.returnn.datasets or in returnn_common.datasets?

Exactly replicating old models with returnn_common is a task that will take a long time

I don't understand. I mean to just put this code which you have here in this PR to returnn_common.datasets instead of putting it to i6_experiments.common.setups.returnn.datasets.

How does that have any other influence? Why would that take a long time in the second case but not in the first case?

This is totally independent of returnn_common.nn. Maybe you confuse that now?

I definitely want to keep the option to run some experiments without returnn_common dependency at least for the next months (also this is needed for some of my hiwis), so I will keep this code in my user folder anyway.

I don't exactly understand. Why? Why is a dependency to i6_experiments.common.setups.returnn.datasets allowed but not to return_common.datasets? This seems like a pretty artificial constraints you are making up here?

Also, then later we would end up with two copies, or even worse two variants of the same code, in i6_experiments.common.setups.returnn.datasets and in returnn_common.datasets. I don't think this is a good idea.

If you say having helpers independent of returnn_common is misleading/redundant, and I (including my hiwis) keep being the only persons using this, then I would discard this PR. ( I am not sure if @mmz33 is currently interested in looking at this)

I'm speaking specifically about RETURNN helpers, i.e. exactly the purpose of returnn_common. Sure, we could also replicate returnn_common.nn in i6_experiments.common.setups.returnn.nn or so (and there is similar code already in some users dirs) but I don't really see the point of having it all duplicated. Then we could also just not use returnn_common at all and move all the code over to i6_experiments. But I don't think this would be a good idea.

I also currently do not have any hiwis that could be affected by this PR

I don't exactly understand the comment. How could anybody be affected by this PR? This PR adds code in a new location. So it does not change any existing code.

I actually thought first that returnn-common will be used only for network construction

No, this is not the case. It was supposed to contain all RETURNN related helpers (which can be treated independently of Sisyphus). In fact, returnn_common.nn came only later. We already had such dataset stuff in there before. It actually looks a bit like the code from this PR is even derived from it?

@JackTemaki
Copy link
Contributor Author

So currently no setup exists which uses it this way.

The idea was then to point the setups here instead of the private location. Of course you can move the same code to returnn_common (but maybe you do not want to have i6_core/sisyphus dependency there which you would get then, but this is another topic). The thing is that then those setups would get new dependencies for the manager (Returnn, Tensorflow) which are not needed right now.

I just talked to Benedikt, and I will just update the code at the old location for us (and Timur), so under my user folder. We will then continue to figure out how to integrate returnn_common into our existing setups, and as soon as we have something usable we can switch also the datasets dependency to returnn_common.

It was supposed to contain all RETURNN related helpers (which can be treated independently of Sisyphus)

So what do we do about the non-independent helpers? Only keeping those here and moving the rest to returnn_common?

@albertz
Copy link
Member

albertz commented May 2, 2022

but maybe you do not want to have i6_core/sisyphus dependency there which you would get then

No, why? I already argued above that the reason why I see this esp a much better fit for returnn_common is because this is independent from i6_core/sisyphus.

The thing is that then those setups would get new dependencies for the manager

No, why do you think so? Why does the code location (i6_exp vs returnn_common) has an influence on other dependencies (RETURNN, TF)?

So what do we do about the non-independent helpers?

Such as? ReturnnTrainingJob and others are already in i6_core. A Sisyphus pipeline based on RETURNN, this would stay in i6_experiments (obviously).

Only keeping those here and moving the rest to returnn_common?

I think we should maybe individually discuss this for each helper where it is not clear. But currently I don't really see cases where it is not already clear.

@JackTemaki
Copy link
Contributor Author

JackTemaki commented May 2, 2022

Such as?

tk.Path so that people to not accidentally use strings where it would cause issues.

And the add_global_statistics_to_audio_feature_datastream function, which uses the ExtractDatasetMeanStddevJob.

@albertz
Copy link
Member

albertz commented May 2, 2022

tk.Path so that people to not accidentally use strings where it would cause issues.

Yes, this is what I explained before. This is unnecessary and should be replaced by sth like Pathlike. returnn_common should not have a dependency on Sisyphus.

And the add_global_statistics_to_audio_feature_datastream function, which uses the ExtractDatasetMeanStddevJob.

And this can (should) be separated. This is an example for i6_experiment.

@JackTemaki
Copy link
Contributor Author

This is unnecessary

Why is this unnecessary? tk.Path is the only accepted and correct type for any file from another job (e.g. bpe, ogg.zip) you pass to the returnn config in the scope of a Sisyphus setup. Everything else will cause broken dependencies. Sisyphus uses the creator attribute of tk.Path to determine which jobs to wait on.

@albertz
Copy link
Member

albertz commented May 2, 2022

This is unnecessary

Why is this unnecessary? tk.Path is the only accepted and correct type for any file from another job (e.g. bpe, ogg.zip) you pass to the returnn config in the scope of a Sisyphus setup. Everything else will cause broken dependencies. Sisyphus uses the creator attribute of tk.Path to determine which jobs to wait on.

It is unnecessary for these dataset helpers. The exact same code of these dataset helpers can be used outside of Sisyphus. It is unnecessary that it must be a tk.Path. It would be totally valid (outside of Sisyphus) that it is a str or Pathlike or whatever.

That is the whole point I explained before. returnn_common is about generic RETURNN common code, which is independent of Sisyphus.

Again, as I explained before, RETURNN common pretty much already has very similar (but incomplete) dataset helpers. It is unnecessary in those helpers that we have tk.Path as type. This is only relevant when you use them for i6_experiments. But when you use them outside of i6_experiment, you don't want to restrict it to that type.

@JackTemaki
Copy link
Contributor Author

This is only relevant when you use them for i6_experiments. But when you use them outside of i6_experiment, you don't want to restrict it to that type.

But I do not care at all about any stuff outside Sisyphus. Yes, we do not want redundant code, but I want helpers that provide the correct interface for Sisyphus usage, not anything else. If this is then wrapping something from returnn_common, it can be fine, as long as there is no Tensorflow or Returnn dependency.

@albertz
Copy link
Member

albertz commented May 2, 2022

This is only relevant when you use them for i6_experiments. But when you use them outside of i6_experiment, you don't want to restrict it to that type.

But I do not care at all about any stuff outside Sisyphus. Yes, we do not want redundant code, but I want helpers that provide the correct interface for Sisyphus usage, not anything else.

But there is no contradiction here.

As explained above, you can simply have Pathlike. You simply would set Pathlike = tk.Path or so. But other users might want to set/use Pathlike differently. I don't see why you want to restrict it for other people. You can restrict it for you if you want by setting Pathlike = tk.Path. But the code it fully valid for other types as well.

If this is then wrapping something from returnn_common, it can be fine, as long as there is no Tensorflow or Returnn dependency.

This is your code. If you don't introduce the TF/RETURNN dependency, why should it be there?

@JackTemaki
Copy link
Contributor Author

If you don't introduce the TF/RETURNN dependency, why should it be there?

Because it might simplify things in the future if we want the helpers to create the correct Dims and Data directly. (There is one example for that here, but I am not really happy with that, this is why this is a draft).

But there is no contradiction here.

Ok, I am not super happy with splitting the helpers into two parts (with Sisyphus dependency and without), but we can do it. I will update my user code instead to continue working, and you can decide what from my code here should be moved into returnn_common.

@albertz
Copy link
Member

albertz commented May 2, 2022

If you don't introduce the TF/RETURNN dependency, why should it be there?

Because it might simplify things in the future if we want the helpers to create the correct Dims and Data directly.

I don't really understand. First you say, you don't want the RETURNN dependency. Now you say, you want it. So, what do you mean?

But there is no contradiction here.

Ok, I am not super happy with splitting the helpers into two parts (with Sisyphus dependency and without), but we can do it. I will update my user code instead to continue working, and you can decide what from my code here should be moved into returnn_common.

This is your code, so maybe you can better decide that?

But as said, isn't ist basically just everything? Except Sisyphus pipeline logic, like add_global_statistics_to_audio_feature_datastream.

Just everything which is independent of Sisyphus.

@JackTemaki
Copy link
Contributor Author

First you say, you don't want the RETURNN dependency

Wanting and needing are two separate things. The helpers were originally designed completely independent of returnn_common / returnn_common.nn (the later one did not even exist at that time). The dataset helpers are independent of both sisyphus and returnn_common.nn, but the datastream/data helpers (I noticed I pushed this incomplete, but this will be a separate PR) have siyphus dependencies in some parts, but if one want to use them together with returnn_common.nn, they would also have dependency on that, as you would want to get the correct Dims and Data object to your datastream. I added this for testing also in the code here. But now that it is not clear at all how we will design the interaction between Sisyphus and returnn_common.nn I am not sure if this code is not removed anyway.

So I will go with a different approach, and first add only things to i6_experiments that have a clear Sisyphus dependency, and figure out together with @Atticus1806 how we can actually work with returnn_common.nn in our setups. Because I do not like to push any code that does not come somehow (or directly) out of a running setup.

I will close this, as then we discuss this code either in the context of returnn_common or in the context of a PR that only has Sisyphus things.

@JackTemaki JackTemaki closed this May 3, 2022
@albertz
Copy link
Member

albertz commented May 3, 2022

returnn_common.nn

I don't really understand why you mention returnn_common.nn. This PR here, and also the dataset helpers in general are and will be totally independent of returnn_common.nn.

I thought this PR here is just about the dataset helpers? But if not, then I misunderstood that. Although, I think it would be good if we separate these things.

@Atticus1806
Copy link
Contributor

I think what Nick means is that even though in theory one could probably split some helpers and make them general, they come from the idea of being used in sisyphus. So the "dependency" is rather implicit in my opinion. For me it makes sense to have the helpers be tailored to sispyphus and not be too general, but this ofc. is discussable. The question might be how much this will be used without sisyphus.

I thought this PR here is just about the dataset helpers?

While dataset helpers are a bigger part of this PR its not the only thing (c.f. Datastreams) and possibly other similar helpers which might be added in the future (which would then be merged based on the decision here).

@albertz
Copy link
Member

albertz commented May 3, 2022

I think the discussion is a bit too abstract. I was specifically talking about what the PR introduces, for example OggZipDataset, ControlDataset, GenericDataset, ReturnnAudioFeatureOptions, Datastream, AudioFeatureDatastream, LabelDatastream. Despite the tk.Path there is no Sisyphus dependency and there doesn't need to be. This is generic code which can also be used outside, e.g. when you write a config directly.

returnn-common also already has such code, which is probably even older (March 2021, see here) than this code here, which does pretty much the same thing.

Only add_global_statistics_to_audio_feature_datastream and similar pipeline related helpers need Sisyphus.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants