-
-
Notifications
You must be signed in to change notification settings - Fork 382
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Move optional dependencies to extras #454
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good work @Amertz08 !!
I'm not sure about this but I think the imports in the code should now be in try except ? Like if someone installs the smart_open[gcs] the smart_open.s3 will throw an import error |
Test coverage keeps failing in 3.8. @Tyrannas technically not yet because this doesn't not install the AWS deps when you do |
We can still add it just so it's not forgotten though. |
I was thinking something like: try:
import boto3
except ImportError:
logging.getLogger('smart_open').info("Boto3 dependency is not installed, smart_open s3 will therefore not be working.") Using info instead of warning to avoid regular loggers to be spammed since logging level is by default warning. What do you think? Something like that |
@Tyrannas I think we already do this: The next step is to get the backends to automatically tell the users which packages they need, e.g. "to enable S3 support, run |
@mpenkov yeah but for example you are importing .s3 in smart_open So it's not totally out of scope since it will prevent smart_open to crash |
We can just use deprecation warnings. It's built into python. |
smart_open/s3.py
Outdated
import botocore.client | ||
import botocore.exceptions | ||
except ImportError: | ||
sys.stderr.write("Install via smart_open[aws] or smart_open[all] to use this module") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this instead of a logger.info() ? Otherwise the message will be displayed all the time for every library that is not installed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or a least the s3 import needs to be taken out from init.py otherwise the actual version will sys.exit(1) without boto3 installed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logger will only print if the root logger is configured which is no guarantee.
As far as that import path in __init__.py
goes it would seem that it would need to be removed. Once those underlying dependencies are dropped from being installed I believe the import will cause issues. If sys.exit is dropped and the shortcut path maintained it would cause the import error to be suppressed here and thrown elsewhere. Whereas by maintaining the sys.exit the program exits exactly where the message to fix the issue is printed vs exiting with an import error elsewhere that doesn't immediately tell you what the issue is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We would probably want a deprecation warning for the import path in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was unaware of that import path so thank you for pointing that out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah so didn't see those imports in transport.py
. sys.exit
will cause an issue every time. So yeah this is a bigger issue than just the short import path in __init__.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe move register_transport('smart_open.s3')
to after the try/except
block in each of the cloud modules?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
^ moving the register after try/except did not work so I moved it back. Also I modified the try/except block to reraise the ImportError
as the register_transport
would never see it w/o it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have two separate problems.
- We import s3.iter_bucket in init.py. This will explode if boto3 is not installed. Currently, we always install boto3 (for backwards compatibility) so it's not an acute problem.
- When we try to register all the transports, we log a warning for each submodule we couldn't import. This will be annoying for people who intentionally didn't install dependencies for a particular submodule.
I think the solution for (1) is simple. When we stop installing boto3 by default, we can try-catch around the culprit import. We can also remove the import altogether, and direct people to use smart_open.s3.iter_bucket
via a warning.
I think the solution for (2) is to group the submodules that failed to register together, and then log a single warning like:
The following submodules could not be imported due to missing dependencies: abs, s3, gcs. To use those submodules, you must install their dependencies. See <URL> for more info.
Once people get used to the new extra handling (a few versions later), we can reduce the verbosity of the log message.
@Amertz08 I'd like to include your PR in a soon-to-be bugfix release of smart_open (we need to do one because the Google Cloud deps are messing things up for some people). Can you please roll it back to the state it was in when I approved it? I will then merge. We can continue our discussion and work in a later PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Please roll back all commits after 0d86f9f. Don't force push, it will axe our discussions).
Yeah so I'm rethinking how to do the deprecation warnings. I was unaware of the |
IMPORTANT: this PR requires a migration path
We should think about how and when we will expose users to this change.
Motivation
Checklist
Before you create the PR, please make sure you have:
Workflow
Please avoid rebasing and force-pushing to the branch of the PR once a review is in progress.
Rebasing can make your commits look a bit cleaner, but it also makes life more difficult from the reviewer, because they are no longer able to distinguish between code that has already been reviewed, and unreviewed code.