-
Notifications
You must be signed in to change notification settings - Fork 276
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Annotations - separate out plugins into top level folder #339
Conversation
plugins/hive/setup.py
Outdated
|
||
microlib_name = f"flytekitplugins-{PLUGIN_NAME}" | ||
|
||
plugin_requires = ["flytekit==0.16.0a2", "hmsclient>=0.0.1,<1.0.0"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should not be pinned right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new pip resolver has stopped accepting conflicting versions, and that means if plugin foo
requires ==0.17.0
while plugin bar
requires ==0.17.1
, pip will not install both foo
and bar
.
I think having lower and upper bounds instead of pinned version is what we should do.
plugins/sagemaker/setup.py
Outdated
@@ -0,0 +1,32 @@ | |||
from setuptools import setup | |||
|
|||
PLUGIN_NAME = "sagemaker" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what about putting all aws related plugins in one place? this the plugin is aws
and we can have aws.sagemaker
as the first thing in there?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah sure, but I think this will get pretty big. In the future we should do aws[sagemaker]
? but for now install everything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i guess you are right, just keep sagemaker separate - then?
@wild-endeavor - WOW! Thank you for doing this. A few comments. |
Some thoughts around downsides:
We can add a boilerplate script to bootstrap a new plugin, and another script to bump versions.
I guess as long as we well document |
Regarding repositories, I think keeping all plugins in one repo is sufficient, given most cases the plugins being relatively small. Having too many repos may require a lot more maintenance, e.g. versioning, CI. I think it is also OK to postpone repo split at this moment, because that still seems like a luxury problem that we don't have to deal with immediately. Consider this as my Luigi maintenance experience, but of course it has very different setup. :) |
plugins/hive/setup.py
Outdated
version="0.1.0", | ||
author="flyteorg", | ||
author_email="admin@flyte.org", | ||
description="Your microlib descriton", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hive blabla
plugins/hive/setup.py
Outdated
|
||
microlib_name = f"flytekitplugins-{PLUGIN_NAME}" | ||
|
||
plugin_requires = ["flytekit==0.16.0a2", "hmsclient>=0.0.1,<1.0.0"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new pip resolver has stopped accepting conflicting versions, and that means if plugin foo
requires ==0.17.0
while plugin bar
requires ==0.17.1
, pip will not install both foo
and bar
.
I think having lower and upper bounds instead of pinned version is what we should do.
plugins/hive/tests/test_hive_task.py
Outdated
@@ -1,13 +1,13 @@ | |||
import pandas | |||
import pytest | |||
from plugins.hive.flytekitplugins.hive.task import HiveConfig, HiveSelectTask, HiveTask |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can understand this absolute import but still a bit strange.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh no this is wrong, i'll fix.
plugins/setup.py
Outdated
install_requires=[], | ||
cmdclass={ | ||
'install': InstallCmd, | ||
'develop': DevelopCmd, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried running pip install -e .
under plugins
but that didn't seem to install any package. Am I supposed to run python setup.py develop
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
python setup.py develop
doesn't work for me either. It logs everything being OK, but nothing gets installed, apart from flytekitplugins-parent
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess we need this step? https://github.com/vaexio/vaex/blob/master/setup.py#L33
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NVM I recreated my virtualenv and it is now working.
Co-authored-by: Honnix <honnix@users.noreply.github.com>
Co-authored-by: Honnix <honnix@users.noreply.github.com>
So @honnix it sounds like you don't think this PR is a horrible idea? I'll work on cleaning it up and addressing all the other comments today if that's the case. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how does this work with the lazy loader stuff? can we refactor that too?
|
||
microlib_name = f"flytekitplugins-{PLUGIN_NAME}" | ||
|
||
plugin_requires = ["flytekit>=0.16.0a2"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
less than 1.0.0?
@@ -28,6 +28,13 @@ class Spark(object): | |||
spark_conf: Optional[Dict[str, str]] = None | |||
hadoop_conf: Optional[Dict[str, str]] = None | |||
|
|||
def __post_init__(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thank you
@@ -4,6 +4,7 @@ | |||
|
|||
from flyteidl.plugins.sagemaker import hyperparameter_tuning_job_pb2 as _pb2_hpo_job | |||
from flyteidl.plugins.sagemaker import parameter_ranges_pb2 as _pb2_params | |||
from flytekitplugins.aws.training import SagemakerBuiltinAlgorithmsTask, SagemakerCustomTrainingTask |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did not mean rename sagemaker
-> aws
, I meant move sagemaker
-> aws/sagemaker
This is the last remaining item we hope in the beta release, the flytekit plugin system. This is because this will affect how users install and import plugin tasks (like Spark or Pytorch, or plugin types like Pandera, etc.)
This PR attempts one possible solution, using namespace packages. For those unfamiliar, please take a look at this blog post first https://medium.com/@jherreras/python-microlibs-5be9461ad979 as this PR is heavily based off of that. Also see PEP 420 for more info. Also see the existing vaex setup which is where I found the blog posting.
We rethought the plugin code structure with the following in mind.
pip install flytekit[spark]
to continue to install the existing master-branchspark
functionality. But we don't have to install the new spark stuff, based on theannotations
branch.flytekitplugins-spark
.This PR
annotations
branchtaskplugins
code into separateflytekitplugins
namespace package folders, under the new top-level folder (outside of the highest levelflytekit
folder). For all intents and purposes, you can think of this folder as a separate repo.setup.py
file for each pluginsetup.py
for the top levelflytekitplugins
folder.Spark
task config dataclass to instantiate empty Spark and Hadoop configs if missing.Upsides of doing things this way.
Downsides of doing things this way.
setup.py
, requirements file, etc. If you want to bump the minimumflytekit
version (which now is an explicit dependency), you have to go in to each plugin's setup and bump the version.export PYTHONPATH=:$PYTHONPATH
to basically add the current directory to the list of places where Python searches for code. This is no longer possible for the plugins. This is because the location of the pytorch plugin is nowplugins/pytorch/flytekitplugins/pytorch/blah.py
. When using something defined there, if you just writefrom flytekitplugins.pytorch.blah import MyBlah
it won't find it. It'd only work if you writefrom plugins.pytorch.flytekitplugins.pytorch.blah import MyBlah
but of course you can't do that, because that won't work for after it's been deployed.pip install -e .
each plugin. The-e
means development mode install, and it basically adds an egg link from yoursite-packages
to the directory where you ran that command. This is fine, just more steps.papermilltests
under theplugins/tests
folder. The reason for this is because the tests were failing when the folder was just calledplugins/tests/papermill
. The reason it failed is because when pytest runs, thepapermill
package that is found ends up being thepapermill
folder in the test folder itself. When in the real code we writeimport papermill as pm
we end up getting this test package instead of the real papermill package.Additional notes.
Schema
type should be included when you dopip install flytekit
. I don't think we should make userspip install flytekitplugins-schema
-- it's just too central.