Speed up import #1948

Open
1 of 2 tasks
huard opened this issue Oct 8, 2024 · 5 comments
Labels: dependencies, enhancement, priority

Comments

@huard
Collaborator

huard commented Oct 8, 2024

Addressing a Problem?

Importing xclim takes 2.5 s on my laptop.

Benchmarked with
python -X importtime test.py
where test.py contains only the line import xclim.
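To rank modules by cumulative import time without a GUI, the -X importtime output (written to stderr) can be parsed directly. A minimal sketch, assuming CPython's "import time: self [us] | cumulative | imported package" line format; parse_importtime and slowest_imports are hypothetical helpers, not part of xclim or the standard library:

```python
from __future__ import annotations

import subprocess
import sys


def parse_importtime(stderr: str) -> list[tuple[int, int, str]]:
    """Parse `python -X importtime` output into (self_us, cumulative_us, module) rows."""
    rows = []
    for line in stderr.splitlines():
        if not line.startswith("import time:"):
            continue
        fields = line[len("import time:"):].split("|")
        if len(fields) != 3:
            continue
        # strip() also removes the indentation used to show nested imports.
        self_us, cumulative_us, module = (field.strip() for field in fields)
        if not self_us.isdigit():
            continue  # header row: "self [us] | cumulative | imported package"
        rows.append((int(self_us), int(cumulative_us), module))
    return rows


def slowest_imports(module: str, n: int = 10) -> list[tuple[int, int, str]]:
    """Import `module` in a fresh interpreter; return the n largest cumulative times."""
    result = subprocess.run(
        [sys.executable, "-X", "importtime", "-c", f"import {module}"],
        capture_output=True,
        text=True,
        check=True,
    )
    rows = parse_importtime(result.stderr)
    return sorted(rows, key=lambda row: row[1], reverse=True)[:n]
```

For example, slowest_imports("xclim") would list the ten largest cumulative times, in microseconds. Cumulative time includes everything a module pulls in, so a shared dependency is charged to whichever module imports it first.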

Potential Solution

  • Lazy import of the virtual modules (cf, anuclim, icclim): saves 0.2 s
  • Lazy import of xclim.indicators: saves 0.1 s
  • Lazy import of xclim.indices: saves 0.7 s
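One standard way to make a submodule lazy is a module-level __getattr__ (PEP 562, Python 3.7+), which Python calls only when normal attribute lookup on the module fails. The sketch below is a hypothetical helper, not existing xclim code. It only helps if nothing else imports the submodule eagerly (for instance the star import of indices in __init__), and for the virtual modules the hook would have to call whatever builds them instead of importlib.import_module:

```python
from __future__ import annotations

import importlib
from types import ModuleType
from typing import Any, Callable


def lazy_submodules(
    package: str, submodules: set[str], namespace: dict[str, Any]
) -> Callable[[str], ModuleType]:
    """Build a PEP 562 module-level __getattr__ that imports submodules on first access."""

    def __getattr__(name: str) -> ModuleType:
        if name in submodules:
            module = importlib.import_module(f"{package}.{name}")
            # Cache the submodule in the package namespace so later lookups
            # are plain attribute hits and no longer reach __getattr__.
            namespace[name] = module
            return module
        raise AttributeError(f"module {package!r} has no attribute {name!r}")

    return __getattr__
```

In a package's __init__.py this would be used as __getattr__ = lazy_submodules(__name__, {"indicators", "indices"}, globals()), so that the submodule is only imported the first time it is accessed.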

For reference, here are the import times for some of our dependencies. Note that these numbers are only valid in the xclim context; you'd get different results by timing them individually, since they import each other.

  • xarray: 0.4 s
  • pint: 0.4 s
  • cf_xarray.units: 0.3 s
  • numba: 0.2 s
  • scipy.stats: 0.2 s
  • numpy: 0.1 s

Additional context

Code for lazy import (https://docs.python.org/3/library/importlib.html#implementing-lazy-imports)

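import importlib.util
import sys


def lazy_import(name):
    # Find the module's spec; for dotted names this imports the parent package.
    spec = importlib.util.find_spec(name)
    if spec is None:
        raise ModuleNotFoundError(f"No module named {name!r}")
    # Wrap the real loader: the module body only runs on first attribute access.
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    # Register the not-yet-executed module so later imports return this object.
    sys.modules[name] = module
    loader.exec_module(module)
    return module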

Note that if we lazily import the indicators, they won't be in the xclim registry until they are actually loaded. Virtual module creation, which relies on the registry, would therefore need to trigger their import.
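The registry problem can be reproduced in isolation: a module that registers itself as a side effect of being imported does nothing under LazyLoader until one of its attributes is accessed. A minimal sketch reusing the importlib recipe; demo_registry, demo_indicators and my_index are hypothetical stand-ins for xclim's registry and an indicator module:

```python
import importlib
import importlib.util
import sys
import tempfile
import types
from pathlib import Path


def lazy_import(name):
    """importlib docs recipe: the module body runs on first attribute access."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        raise ModuleNotFoundError(f"No module named {name!r}")
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)
    return module


# Hypothetical registry that is filled as a side effect of importing modules.
registry = types.ModuleType("demo_registry")
registry.REGISTRY = []
sys.modules["demo_registry"] = registry

# Hypothetical indicator module that registers itself when its body runs.
tmpdir = tempfile.mkdtemp()
Path(tmpdir, "demo_indicators.py").write_text(
    "import demo_registry\n"
    "demo_registry.REGISTRY.append('my_index')\n"
    "def my_index():\n"
    "    return 'computed'\n"
)
sys.path.insert(0, tmpdir)
importlib.invalidate_caches()

indicators = lazy_import("demo_indicators")
before = list(registry.REGISTRY)  # module body has not run yet, so nothing registered
result = indicators.my_index()  # first attribute access executes the module body
after = list(registry.REGISTRY)  # registration happened during that access
```

So before building the virtual modules from the registry, the code would have to force the pending imports, for example by touching an attribute of each lazily imported module.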

Contribution

  • I would be willing/able to open a Pull Request to contribute this feature.

Code of Conduct

  • I agree to follow this project's Code of Conduct
@huard added the enhancement label Oct 8, 2024
@aulemahal
Collaborator

I ran the same tests and piped the output through tuna, as I did in #1135. Here's a snapshot:

[image: tuna snapshot of the import xclim profile]

I suspect that most of the time is not spent loading indicators. xclim.indices shows up at the top only because of the order in which things are imported. The slowest-loading submodule seems to be in the fire indicators, possibly because numba compiles functions eagerly rather than lazily. Some gain could be made there.

@huard
Collaborator Author

huard commented Oct 9, 2024

Regarding the load time of indices: I commented out from indices import * in the __init__, as well as another import of indices elsewhere in indicators.py, then computed the difference between the import time in this scenario and the base scenario.

@SarahG-579462
Contributor

This would certainly help with #1955, since the main slowdown for command-line tools is xclim's import time (followed by Python's start-up time).

Would it be possible to have the registry of indices (needed for the CLI) created during pip install xclim or conda install xclim? The numba functions could also be compiled at that point if needed, couldn't they? We don't use ufuncs for them, and ufuncs are one of the limitations of ahead-of-time compilation.

@aulemahal
Collaborator

We could export a JSON file of the indicators and their parameters at install time. That would break the idea that "virtual submodules" are loaded live, but maybe that's OK for the CLI. However, as long as we don't change how the indicator modules are structured, I don't think it would improve anything other than xclim info, no? Once you've found the indicator you want to run, "loading" it will import the rest of the package anyway.
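A sketch of the install-time export idea, assuming a hypothetical JSON file that holds the package version and each indicator's parameters. write_catalog, read_catalog and the file layout are illustrative, not an existing xclim feature. The version check lets a listing command fall back to a normal import when the cached file is stale:

```python
from __future__ import annotations

import json
from pathlib import Path


def write_catalog(indicators: dict[str, dict], version: str, path: Path) -> None:
    """Dump indicator metadata (e.g. at install time) so listing needs no heavy imports."""
    path.write_text(json.dumps({"version": version, "indicators": indicators}, indent=2))


def read_catalog(version: str, path: Path) -> dict[str, dict] | None:
    """Return the cached metadata, or None if it is missing, unreadable or stale."""
    try:
        data = json.loads(path.read_text())
    except (FileNotFoundError, json.JSONDecodeError):
        return None
    if data.get("version") != version:
        return None  # written by another version: caller should import the package
    return data["indicators"]
```

A listing command could call read_catalog and import xclim only on a miss. Running an indicator would still import the rest of the package.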

Another issue with ahead-of-time compilation is that we would need to specify all possible signatures in advance, no? Not impossible, but it seems sub-optimal.

@SarahG-579462
Contributor

I think the command is xclim indices?

Indeed, and with numba.pycc's deprecation coming, this doesn't seem like the best approach.

@Zeitsperre added this to the v0.55.0 milestone Dec 10, 2024
@Zeitsperre added the priority and dependencies labels Dec 10, 2024