Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use plugins to extend functionalities in Cylc #2959

Closed
4 tasks
kinow opened this issue Feb 19, 2019 · 14 comments
Closed
4 tasks

Use plugins to extend functionalities in Cylc #2959

kinow opened this issue Feb 19, 2019 · 14 comments
Assignees
Labels
speculative blue-skies ideas
Milestone

Comments

@kinow
Copy link
Member

kinow commented Feb 19, 2019

This has been on my mind for a while, since I started working on #2834 . This is a broader issue than #2835. #2835 would actually be a sub-issue of this one in Jira.

Current situation

In Cylc we have a few parts of the code where a functionality has a different behaviour depending on the user configuration. Some times it is referred as extension points, which Cylc has a few.

Users are able to make Cylc talk to a PBS server, or copy the handler for PBS and create their own for another existing scheduler (or even for a internally-patched scheduler if necessary).

Users have the suite.rc file, which is parsed by Jinja by default (i.e. jinja is a hard dependency of Cylc, and users must install it. Jinja is also the vanilla flavour for suites), but can extend it and use EmPy. Or they can look at how it was done, copy one of the .py files and extend more this feature, allowing other formats.

The problem

This approach has is a few issues for both users and developers. As a user, if I have a PBS implementation that behaves differently from what Cylc provides, I have a few ways to make it work. I can hack the PBS handler provided in Cylc. Which is problematic when upgrading (think that users may not version it in git/svn/etc, and eventually update and lose it).

A user could also create a new file, but then it means installing Cylc via pip, zip, etc, the user would have to remember to put the file in the folder expected by Cylc. And if we changed the folder for any reason, it would break their environment.

For a developer it can be problematic too. See #2958 for a recent example. Lately we have been discussing other formats for the suite (or workflow) file. If we consider that it could be possible to have 10 formats supported in Cylc, possibly contributed by users in different sites.

It would mean that now we will have a lot more of dependencies (regardless of optional) and the code for those dependencies in our code repository. Even though these dependencies are treated as optional, see how unit tests have to be written with the try { import } catch { skip test } pattern. And we also have to handle these dependencies in Travis, and in setup.py.

If One of these dependencies break with Python (the case for #2958) this could lead to other issues with our build in Travis and local developer workstations.

A possible solution

As the Perl motto goes, there's more than one way to do it. We can solve this in many different ways.

My favourite solution is using what is provided by Python out of the box, via setuptools and its entry points. This approach is used in PyLint, pytest, flask, and other frameworks.

See the example from the Flask documentation, which talks about plugins for commands executed in the command line.

image

The entry points in setuptools' setup.py of Cylc would define which entry points (or extension points) we support. We could have something like:

setup(
    name='cylc',
    ...,
    entry_points={
        'cylc.batch_sys_handlers': [],
        'cylc.suite_parsers': [], # we would define Jinja here by default
        ...
    }
)

This has a few benefits over our current approach. Users would have to create a small python project to support it. Without ever touching Cylc. In the example of EmPy, it could be done completely in a user repository, or in a repository under the cylc organisation in GitHub.

Then other users could install it via pip. Once installed, the entry point is automatically detected by setuptools. And then we are able to let the user enable/disable this via the user settings (as it is done right now).

So for a user, Cylc would come with less parts, and it would be easier to customize Cylc. An important advantage here is that it would reduce security risks too, as users would have only what's necessary for Cylc initially. Then build up on top of that making sure no extra code is ever included, unless necessary.

For developers, it's easier to maintain less code. And it's also hard to have code contamination. Which means that we wouldn't accidentally use some of the code in the EmPy files in other parts.

Of course for this solution, #2834 would have to be done first.

What can be a plugin

(Edit: Bruno 20190401, not a lie)

List of possible places to apply entry points in Cylc:

  • cylc.subprocpool.get_func
    • function used by the SubProcPool to find xtriggers (tangent, could reside in `cylc.xtrigger_mgr or cylc.xtriggers?)
    • tries to import files from lib/python or dynamically import cylc.xtriggers.$FUNCTION
  • Jinja2Filters (Make Jinja filters entry points #2835)
  • empy support
    • as well as other libraries
  • cylc.batch_sys_handlers
    • Batch system handlers do not share a common object parent, but implement certain methods. The scripts that contain the BATCH_SYS_HANDLER constant are used to import the supported systems. We can use entrypoints and remove this constant, making it easier to support more environments later.

Cheers
Bruno

@kinow kinow added the speculative blue-skies ideas label Feb 19, 2019
@kinow kinow self-assigned this Feb 19, 2019
@hjoliver
Copy link
Member

Sounds good to me @kinow !

@sadielbartholomew
Copy link
Collaborator

That's an important issue to register, Bruno, & I like your solution as it seems clean and good for maintenance.

As the Perl motto goes, there's more than one way to do it

I will add (off topic really, apologies) that this tickled me, as the Zen of Python says the exact opposite: "There should be one—and preferably only one—obvious way to do it." What is the poor multilingual programmer to endorse?

I think in terms of general problem solving, as here, the Perl statement is correct but I am on Team Python in terms of the implementation details, at least for what a general language should aim to establish.

@kinow
Copy link
Member Author

kinow commented Feb 20, 2019

Thanks @sadielbartholomew ! And on the Perl vs. Python, I will try to be diplomatic, and go with a language must find its balance in what its provides, its use cases, and users. I think Python has been changing lately because of the number of users now. Who would imagine that a language for Amoeba operational system would be taking the world and claiming some space even in the web development world! :-)

But we can move this discussion to Riot. Quite an interesting topic!

@TomekTrzeciak
Copy link
Contributor

@kinow, this seems like a nice solution to enable cylc customisation (never heard of setuptools entry points before). In regards to EmPy, it requires support in Rose as well, not just Cylc, so it would be good to wait for #1885 to happen first before getting this issue.

@matthewrmshin matthewrmshin added this to the cylc-8.0.0 milestone Feb 25, 2019
@oliver-sanders
Copy link
Member

+1

We have been talking about extension interfaces for other functionality as well (e.g. possibly for the future rose-suite.conf type file).

Beyond making installation a little more involved there is one major disadvantage which needs to be raised, it would become very easy for Cylc developers to break extensions that people are relying on. Especially as if we partition off batch schedulers, etc, Cylc would become pretty useless by itself.

So long as we keep these extensions close to hand and preferably include them in the Travis-CI tests? We should be OK.

@kinow
Copy link
Member Author

kinow commented May 3, 2019

From experimentation with a test cylc-jinja2fiters and some testing branches, found that things are still not working as expected, so a bit more of investigation/experimentation is necessary.

We cannot use native namespaces (most recent implementation, Py3.3+) with cylc, as for that we cannot have a __init__.py file in cylc package.

So other options are pkgutil implementation (Py2.3+), which I'm trying today, or pkg_resources (Setuptools), which doesn't seem to work with init.py either:

You must NOT include any other code and data in a namespace package’s init.py. Even though it may appear to work during development, or when projects are installed as .egg files, it will not work when the projects are installed using “system” packaging tools – in such cases the init.py files will not be installed, let alone executed.

@hjoliver
Copy link
Member

hjoliver commented May 5, 2019

@kinow - from our discussion on Friday, I think we can use native namespaces (which are the "modern" way to go, and as such likely the best bet for sustainability) ... if we use cylc/cylc-flow for the workflow service component instead of just cylc ... right? (and that has the advantage of being consistent with our treatment of the other components now too).

@kinow
Copy link
Member Author

kinow commented May 5, 2019

@hjoliver we would need to move the code in cylc-flow which is in the package cylc, to under cylc.flow. This way removing cylc.__init__.py and fulfilling the requirement for native namespaces, which is no __init__.py in the namespace package (we can still have __init__.py in other normal packages).

The major issue with this is that this would be a pull request touching several files, causing existing pull requests to be invalid, and have to be rebased. But being the most recent approach, and the only approach that does not require users to modify their __init__.py (pkg-util and pkg_resources both demand that you call some code in the init.py file), this would be my preferred option.

@hjoliver
Copy link
Member

hjoliver commented May 5, 2019

I agree, and touching the import statements in many files is fine (and easy to adapt to, for extant PRs).

@matthewrmshin
Copy link
Contributor

👍 to re-package of cylc-flow library to under cylc.flow.

@kinow
Copy link
Member Author

kinow commented May 7, 2019

👍 99% done in this branch https://github.com/kinow/cylc-flow/tree/incorporate-jinja2filters-2

I think I understood how native namespaces work. Only downside I found so far is that find_packages from setuptools does not work with that (they mention it in the documentation).

So you must specify all packages you want with packages=[...]. I finished some code to look for directories and it passed the tests, but it missed the jinja filters and needs to make sure __pycache__ is not included. Will fix it tomorrow, and then update the pull request for jinja2 (i.e. backup that branch in my fork, re-write the branch from this new branch, and push-force).

@hjoliver
Copy link
Member

hjoliver commented May 7, 2019

Great, and that seems a small downside ... we won't have a large number of packages.

@kinow
Copy link
Member Author

kinow commented Feb 19, 2020

Iris is an example of a Python module used by others here at NIWA, with plugins: SciTools/iris#3657. Might be useful reference to see what they did, why, issues they had, etc 👍

@kinow
Copy link
Member Author

kinow commented Apr 8, 2020

Closed by #3492 too, and #3560 has the part I had in mind for loading external plug-ins. 🚀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
speculative blue-skies ideas
Projects
None yet
Development

No branches or pull requests

6 participants