Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sisyphus is too slow #90

Open
albertz opened this issue Jun 21, 2022 · 13 comments
Open

Sisyphus is too slow #90

albertz opened this issue Jun 21, 2022 · 13 comments

Comments

@albertz
Copy link
Member

albertz commented Jun 21, 2022

What I measure (what is most relevant for me): startup time of the sis manager, up to the prompt.

My current benchmark: i6_experiments.users.zeyer.experiments.chris_hybrid_2021.test_run. This creates a graph of 526 jobs.

I run this benchmark on a local computer with extremely fast FS, to separate the influence of slow FS at this point.

I would expect that the startup time takes a few millisecs, not more.

It took around 21 seconds the first time I tried this.
Now, via #84, #85, #87, it takes around 14 seconds for me.
For #85, I use GRAPH_WORKER = 1.

So, this issue here is to discuss why this specific case is still so slow, and what we can potentially do to improve it.
I guess other cases will be different, so maybe we should better discuss them separately, to not mix-up things. When the FS is slow, I think we also can improve a lot. I would still expect that this runs in maybe 2-3 seconds max. But let's discuss that separately.


For profiling, I current use austin, and then I visualize the output in VSCode. Not sure what you would recommend to use instead. I run:

echo n | sudo austin ./sis m config/exp2022_06_15_hybrid.py > sample7.austin

Or just timing:

time echo n | ./sis m config/exp2022_06_15_hybrid.py
@albertz
Copy link
Member Author

albertz commented Jun 21, 2022

With current master (after #84, #85, #87), I get this flame graph:
Screenshot 2022-06-21 at 11 26 12

There is run_monophone_step, and below meta.System.train, which results in JobSingleton.__call__. Zoomed in on this:
Screenshot 2022-06-21 at 11 29 54
Note that JobSingleton.__call__ is called everywhere, however, of course with different arguments, and different job classes, so different __init__.

So there is sis_hash_helper which takes some time. (#86)

deepcopy also takes some time, and it's not really clear whether this is really needed.

But then also _sis_init takes time. Here specifically AlignmentJob.__init__, and there mostly log_file_output_path. Although I don't really see why this is slow. Zoomed in:
Screenshot 2022-06-21 at 11 36 12

@albertz
Copy link
Member Author

albertz commented Jun 21, 2022

deepcopy also takes some time, and it's not really clear whether this is really needed.

Ok, removing this gets me from 14 secs to 8 secs, so this has quite some impact. Also, the hash does not change for me. But I'm not sure if it might be dangerous in other cases. (@critias ?)

@JackTemaki
Copy link
Contributor

the deepcopy is just a security measure that people do not mess with the args passed to jobs. The problem is if people relied on this, this can break setups in a very ugly way.

@JackTemaki
Copy link
Contributor

JackTemaki commented Jun 21, 2022

Although I don't really see why this is slow.

But you plot shows why it is slow? Due to os calls. Or do you mean you do not know why the os calls are slow?

What wonders me much more is why log_file_output_path is called twice, this should not be.

@albertz
Copy link
Member Author

albertz commented Jun 21, 2022

Although I don't really see why this is slow.

But you plot shows why it is slow? Due to os calls. Or do you mean you do not know why the os calls are slow?

Yes, why are they slow? They just do some simple string manipulation. This should be very fast.
But also, it does not cover the full runtime. There might be some time lost elsewhere which is not shown.
I just don't really understand it.

@JackTemaki
Copy link
Contributor

JackTemaki commented Jun 21, 2022

This is join for python3.8 on posix-related systems:

def join(a, *p):
    """Join two or more pathname components, inserting '/' as needed.
    If any component is an absolute path, all previous path components
    will be discarded.  An empty last part will result in a path that
    ends with a separator."""
    a = os.fspath(a)
    sep = _get_sep(a)
    path = a
    try:
        if not p:
            path[:0] + sep  #23780: Ensure compatible data type even if p is null.
        for b in map(os.fspath, p):
            if b.startswith(sep):
                path = b
            elif not path or path.endswith(sep):
                path += b
            else:
                path += sep + b
    except (TypeError, AttributeError, BytesWarning):
        genericpath._check_arg_types('join', a, *p)
        raise
    return path

@albertz
Copy link
Member Author

albertz commented Jun 21, 2022

Yes I looked at os.path.join already. There is some overhead due to the fspath calls, and also the map, also the for loop, also the try-except logic around it. But I would actually not expect that this makes such a big difference. For testing, we could replace all os.path.join(a, b) by simply a + "/" + b and see how much faster we get. But as said, I don't fully understand it.

@JackTemaki
Copy link
Contributor

For me it brings about 1 second if I replace it in get_path and _sis_path

@critias
Copy link
Contributor

critias commented Jun 21, 2022

deepcopy also takes some time, and it's not really clear whether this is really needed.

Ok, removing this gets me from 14 secs to 8 secs, so this has quite some impact. Also, the hash does not change for me. But I'm not sure if it might be dangerous in other cases. (@critias ?)

As @JackTemaki mentioned this can have problematic side effects. e.g.:

Add(Job):
  def __init__(self, params):
    self.params = params
    self.out = self.output_var('out')
  def run(self):
    self.out.set(self.params['foo'] + self.params['bar'])
params = {'foo': 7, 'bar': 42}
a = Add(params).out
# Expected content of a is 49
params['foo'] = -21
# Now: a == 21

I would not expect the results to change after a Job was created. So if we want to support this as a speed up option we have to put a big warning sign next to it...

@critias
Copy link
Contributor

critias commented Jun 21, 2022

Yes I looked at os.path.join already. There is some overhead due to the fspath calls, and also the map, also the for loop, also the try-except logic around it. But I would actually not expect that this makes such a big difference. For testing, we could replace all os.path.join(a, b) by simply a + "/" + b and see how much faster we get. But as said, I don't fully understand it.

I didn't expect os.path.join to be that expensive! In any case, it would be nicer to use os.path.sep instead of "/" (not that I would expect Sisyphus to be runnable right now on a non Unix system, but anyway...)

@albertz
Copy link
Member Author

albertz commented Jun 21, 2022

I didn't expect os.path.join to be that expensive! In any case, it would be nicer to use os.path.sep instead of "/" (not that I would expect Sisyphus to be runnable right now on a non Unix system, but anyway...)

Note that Windows also supports "/". So I'm not sure what other exotic system we would really support here. I usually use "/". Also it makes it simpler/cleaner in f-strings.

@albertz
Copy link
Member Author

albertz commented Jun 21, 2022

deepcopy also takes some time, and it's not really clear whether this is really needed.

Ok, removing this gets me from 14 secs to 8 secs, so this has quite some impact. Also, the hash does not change for me. But I'm not sure if it might be dangerous in other cases. (@critias ?)

As @JackTemaki mentioned this can have problematic side effects. e.g.:
...
I would not expect the results to change after a Job was created. So if we want to support this as a speed up option we have to put a big warning sign next to it...

Maybe we can still get a lot of the speedup by doing a more clever deepcopy.

Some of the big objects I have are i6_core.util.MultiOutputPath and i6_core.rasr.config.RasrConfig.
I wonder if I can have a more clever __deepcopy__... Maybe by some copy-on-write logic. But not sure. The nested structure is problematic.

@critias
Copy link
Contributor

critias commented Jun 22, 2022

Note that Windows also supports "/". So I'm not sure what other exotic system we would really support here. I usually use "/". Also it makes it simpler/cleaner in f-strings.

True, this would be annoying with f-strings. Using os.path.sep was more of a principle thing, but it's probably not worth the effort at this point.

Side note: A while ago when I tried to introduced f-strings there where complains of some people still using Python 3.5. I'm considering to bump the minimal Python version to 3.6 to be able to use f-strings 🤔

Maybe we can still get a lot of the speedup by doing a more clever deepcopy.

You could consider implementing a __deepcopy__ method for problematic objects https://docs.python.org/3/library/copy.html#copy.deepcopy

Some of the big objects I have are i6_core.util.MultiOutputPath and i6_core.rasr.config.RasrConfig. I wonder if I can have a more clever __deepcopy__... Maybe by some copy-on-write logic. But not sure. The nested structure is problematic.

A copy on write logic would be tough to implement, you would have to catch all write access to the original object and everything nested inside of it. I don't think I would like to rely on it. In any case if you want to try, I found this attempt to implement CoW: https://github.com/bannsec/pyCoW
It should be working with Python 3.5, but the last commit is over 4 years old.

I'll take a look at the state update strategy, I already have a few ideas how to clean and speed up that code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants