Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve virtualenv support & egg-link resolution #562

Closed
wants to merge 2 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
10 changes: 8 additions & 2 deletions jedi/api/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -34,6 +34,7 @@
from jedi.evaluate.helpers import FakeName, get_module_names
from jedi.evaluate.finder import global_names_dict_generator, filter_definition_names
from jedi.evaluate import analysis
from jedi.evaluate.sys_path import get_venv_path

# Jedi uses lots and lots of recursion. By setting this a little bit higher, we
# can remove some "maximum recursion depth" errors.
Expand Down Expand Up @@ -75,7 +76,8 @@ class Script(object):
:type encoding: str
"""
def __init__(self, source=None, line=None, column=None, path=None,
encoding='utf-8', source_path=None, source_encoding=None):
encoding='utf-8', source_path=None, source_encoding=None,
sys_path=None):
if source_path is not None:
warnings.warn("Use path instead of source_path.", DeprecationWarning)
path = source_path
Expand Down Expand Up @@ -109,7 +111,11 @@ def __init__(self, source=None, line=None, column=None, path=None,
self._parser = UserContextParser(self._grammar, self.source, path,
self._pos, self._user_context,
self._parsed_callback)
self._evaluator = Evaluator(self._grammar)
if sys_path is None:
venv = os.getenv('VIRTUAL_ENV')
if venv:
sys_path = list(get_venv_path(venv)) + sys.path
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if VIRTUAL_ENV is set, why is it not enough to just take the current sys.path? I have never really understood this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea is to have jedi installed somewhere once per system (or per user) and to have it look into various virtual environments without installing jedi in all of these environments. In Emacs one of the easiest ways to accomplish this (without pulling virtualenv names through all the APIs) is to launch jedi process with this extra environment variable whenever I change the venv I'm working on.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But if you change the virtualenv variable, wouldn't the sys.path also change automatically?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so. AFAIR, sys.path is generated once at the startup and the virtualenv magic really relies on looking up candidate directories relative to the python interpreter location. Here's an article that probably can explain it better than I can (especially, within the scope of a comment): http://mikeboers.com/blog/2014/05/23/where-does-the-sys-path-start

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm I've read it, very good article! I think I finally understand why site.py exists and what it does.

If you start a new process with Jedi in an activated virtualenv, wouldn't we see the correct sys.path? Are you talking about a changing just the VIRTUAL_ENV variable without restarting the process?

In general, have you read the idea of #385, to make virtualenv support easier?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I gave this some thought at the time and I couldn't find a way around the client being able to supply the requested sys.path to Jedi in public API. Now, to my liking, full sys_path is too long for a proper interface parameter and most of the time it will refer to one or more virtualenvs, so probably specifying virtualenv paths will look better. For example, we could add venvs=[path1, path2] to use those venvs and extra_sys_path=[path1, path2] just in case someone wants, say, to additionally use some source code checkouts without installing them into the said venvs.

If this API addition goes through, I'm not sure if it is worth keeping the VIRTUAL_ENV logic:
it boils down to having both VIRTUAL_ENV's and jedi's sys.path in one interpreter and this is better accomplished the other way around. I mean, that logic starts with jedi on sys.path and then tries to add venv path afterwards, but as long as jedi has no external dependencies, we can just start up PATH/TO/VENV/bin/python PATH/TO/JEDI/portable.py with portable.py adding its location to sys.path and then running jedi as usual. That way it will have more binary compatibility with code in the venv.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

most of the time it will refer to one or more virtualenvs, so probably specifying virtualenv paths will look better.

Why would it refer to multiple virtualenvs? That's not something that happens in Python itself?

If this API addition goes through, I'm not sure if it is worth keeping the VIRTUAL_ENV logic: it boils down to having both VIRTUAL_ENV's and jedi's sys.path in one interpreter and this is better accomplished the other way around. I mean, that logic starts with jedi on sys.path and then tries to add venv path afterwards,

Hmm I know what you mean. I personally think that adding sys_path as an API is really valuable, because some people really want to play with it. Most of the time this is not the case and we should focus on what happens when it is None. This still includes virtual envs. However, I agree totally with the portable.py idea. I think Jedi should be started with just the virtualenv as its path. This is what I'm talking about when I talk about Jedi's future async server/client structure. You name a certain virtualenv and Jedi starts a new process that includes that virtualenv. This way, we don't have segfaults and all other crazy stuff...

What do you think? This would however mean that we don't include this pull request, we would rather just add the sys_path logic. And BTW this is not to say that your work is bad. I'm really thankful for it. It taught me a lot! :)

~ Dave

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would it refer to multiple virtualenvs? That's not something that happens in Python itself?

Yes, that was a mistake. Some time ago I tried conserving precious SSD space using a virtualenv for project dependencies and another virtualenv for common tools, like ipython, and pudb, and profilers, and whatnot. But then I realized that with conda you don't need that as it hard-links packages where possible.

What do you think? This would however mean that we don't include this pull request, we would rather just add the sys_path logic. And BTW this is not to say that your work is bad. I'm really thankful for it. It taught me a lot! :)

I am not completely against ditching this PR altogether, although I do think that this PR offers better support of the current Jedi workflow. It also doesn't seem to me that the new, arguably more correct, workflow involving portable.py and dropping all that sys_path magic is coming soon. But if it is and there's no time window during which this PR can be useful then feel free to close it.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It also doesn't seem to me that the new, arguably more correct, workflow involving portable.py and dropping all that sys_path magic is coming soon.

Unfortunately and probably true :( Can you quote yourself from earlier how this PR improves virtualenvs? I still don't see it :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you quote yourself from earlier how this PR improves virtualenvs?

  • fetching sys_path from a live interpreter in the specified virtualenv, if possible
  • if online fetching fails, offline detector now additionally follows .pth links (cannot seem to find the addsitedir invocation in the old code)
  • egg-link files are consulted in all new path directories, not just the site-packages one

Ok. I don't see the advantage from fetching it from a different interpreter (ecept for pth stuff). However, I feel that pth is something we shouldn't really execute like this, because it can contain arbitrary Python code and I'm not sure if I want to execute that (addsitedir does this).

What are "new path directories"? I'm strongly for splitting up this PR, because it tries to change too many things at once. We also needs tests. It would be way less messy, then.

self._evaluator = Evaluator(self._grammar, sys_path=sys_path)
debug.speed('init')

def _parsed_callback(self, parser):
Expand Down
10 changes: 9 additions & 1 deletion jedi/evaluate/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -61,6 +61,7 @@
"""

import copy
import sys
from itertools import chain

from jedi.parser import tree
Expand All @@ -79,7 +80,7 @@


class Evaluator(object):
def __init__(self, grammar):
def __init__(self, grammar, sys_path=None):
self.grammar = grammar
self.memoize_cache = {} # for memoize decorators
# To memorize modules -> equals `sys.modules`.
Expand All @@ -88,6 +89,13 @@ def __init__(self, grammar):
self.recursion_detector = recursion.RecursionDetector()
self.execution_recursion_detector = recursion.ExecutionRecursionDetector()
self.analysis = []
if sys_path is None:
sys_path = sys.path
self.sys_path = copy.copy(sys_path)
try:
self.sys_path.remove('')
except ValueError:
pass

def wrap(self, element):
if isinstance(element, tree.Class):
Expand Down
12 changes: 4 additions & 8 deletions jedi/evaluate/compiled/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,6 @@
from jedi._compatibility import builtins as _builtins, unicode
from jedi import debug
from jedi.cache import underscore_memoization, memoize_method
from jedi.evaluate.sys_path import get_sys_path
from jedi.parser.tree import Param, Base, Operator, zero_position_modifier
from jedi.evaluate.helpers import FakeName
from . import fake
Expand Down Expand Up @@ -309,15 +308,12 @@ def parent(self, value):
pass # Just ignore this, FakeName tries to overwrite the parent attribute.


def dotted_from_fs_path(fs_path, sys_path=None):
def dotted_from_fs_path(fs_path, sys_path):
"""
Changes `/usr/lib/python3.4/email/utils.py` to `email.utils`. I.e.
compares the path with sys.path and then returns the dotted_path. If the
path is not in the sys.path, just returns None.
"""
if sys_path is None:
sys_path = get_sys_path()

if os.path.basename(fs_path).startswith('__init__.'):
# We are calculating the path. __init__ files are not interesting.
fs_path = os.path.dirname(fs_path)
Expand All @@ -341,13 +337,13 @@ def dotted_from_fs_path(fs_path, sys_path=None):
return _path_re.sub('', fs_path[len(path):].lstrip(os.path.sep)).replace(os.path.sep, '.')


def load_module(path=None, name=None):
def load_module(evaluator, path=None, name=None):
sys_path = evaluator.sys_path
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm actually worried now that this line might cause some issues. It should probably be a sys_path_with_modifications call, but that's not your problem (or at least not a problem of this PR).

I just realized this, but even if it causes issues, there's not a big implication for all normal use cases, so don't care about it too much. It's much more that I should remember this.

if path is not None:
dotted_path = dotted_from_fs_path(path)
dotted_path = dotted_from_fs_path(path, sys_path=sys_path)
else:
dotted_path = name

sys_path = get_sys_path()
if dotted_path is None:
p, _, dotted_path = path.partition(os.path.sep)
sys_path.insert(0, p)
Expand Down
7 changes: 5 additions & 2 deletions jedi/evaluate/imports.py
Original file line number Diff line number Diff line change
Expand Up @@ -342,7 +342,7 @@ def _do_import(self, import_path, sys_path):
module_file.close()

if module_file is None and not module_path.endswith('.py'):
module = compiled.load_module(module_path)
module = compiled.load_module(self._evaluator, module_path)
else:
module = _load_module(self._evaluator, module_path, source, sys_path)

Expand Down Expand Up @@ -440,12 +440,15 @@ def load(source):
with open(path, 'rb') as f:
source = f.read()
else:
return compiled.load_module(path)
return compiled.load_module(evaluator, path)
p = path
p = fast.FastParser(evaluator.grammar, common.source_to_unicode(source), p)
cache.save_parser(path, p)
return p.module

if sys_path is None:
sys_path = evaluator.sys_path

cached = cache.load_parser(path)
module = load(source) if cached is None else cached.module
module = evaluator.wrap(module)
Expand Down
85 changes: 67 additions & 18 deletions jedi/evaluate/sys_path.py
Original file line number Diff line number Diff line change
@@ -1,6 +1,9 @@
import glob
import os
import sys
from subprocess import check_output
from ast import literal_eval
from site import addsitedir

from jedi._compatibility import exec_function, unicode
from jedi.parser import tree
Expand All @@ -11,24 +14,66 @@
from jedi import cache


def get_sys_path():
def check_virtual_env(sys_path):
""" Add virtualenv's site-packages to the `sys.path`."""
venv = os.getenv('VIRTUAL_ENV')
if not venv:
return
venv = os.path.abspath(venv)
p = _get_venv_sitepackages(venv)
if p not in sys_path:
sys_path.insert(0, p)

# Add all egg-links from the virtualenv.
def get_venv_path(venv):
"""Get sys.path for specified virtual environment."""
try:
sys_path = _get_venv_path_online(venv)
except Exception as e:
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like this. Exceptions should not be catched like that. Only catch what you really know can go wrong.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea behind this catch-all expression is that whatever prevents this sys.path detection heuristic from working should not prevent normal jedi functioning and only report the error as a warning (which is done one line below that) as "An indication that something unexpected happened... The software is still working as expected."

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Problem is: Whenever you do something like this, it's really hard to tell what the issue is from the outside. That's why I never do this.

debug.warning("Error when getting venv path: %s" % e)
sys_path = _get_venv_path_offline(venv)
with common.ignored(ValueError):
sys_path.remove('')
return _get_sys_path_with_egglinks(sys_path)


def _get_sys_path_with_egglinks(sys_path):
"""Find all paths including those referenced by egg-links.

Egg-link-referenced directories are inserted into path immediately after
the directory on which their links were found. Such directories are not
taken into consideration by normal import mechanism, but they are traversed
when doing pkg_resources.require.
"""
result = []
for p in sys_path:
result.append(p)
for egg_link in glob.glob(os.path.join(p, '*.egg-link')):
with open(egg_link) as fd:
sys_path.insert(0, fd.readline().rstrip())
for line in fd:
line = line.strip()
if line:
result.append(os.path.join(p, line))
# pkg_resources package only interprets the first
# non-empty line in egg-link files.
break
return result


def _get_venv_path_offline(venv):
"""Get sys.path for venv without starting up the interpreter."""
venv = os.path.abspath(venv)
sitedir = _get_venv_sitepackages(venv)
sys.path, old_sys_path = [], sys.path
try:
addsitedir(sitedir)
return sys.path
finally:
sys.path = old_sys_path


def _get_venv_path_online(venv):
"""Get sys.path for venv by running its python interpreter."""
venv = os.path.abspath(os.path.expanduser(venv))
for python_binary in ('python', 'python3', 'python.exe',
'python3.exe'):
python_path = os.path.join(venv, 'bin', python_binary)
if os.path.isfile(python_path):
break
else:
raise RuntimeError("Cannot find python executable in venv: %s" % venv)
command = [python_path, '-E', '-c', 'import sys; print(sys.path)']
return literal_eval(check_output(command))

check_virtual_env(sys.path)
return [p for p in sys.path if p != ""]


def _get_venv_sitepackages(venv):
Expand Down Expand Up @@ -109,14 +154,16 @@ def _paths_from_list_modifications(module_path, trailer1, trailer2):
name = trailer1.children[1].value
if name not in ['insert', 'append']:
return []

arg = trailer2.children[1]
if name == 'insert' and len(arg.children) in (3, 4): # Possible trailing comma.
arg = arg.children[2]
return _execute_code(module_path, arg.get_code())


def _check_module(evaluator, module):
"""
Detect sys.path modifications within module.
"""
def get_sys_path_powers(names):
for name in names:
power = name.parent.parent
Expand All @@ -128,10 +175,12 @@ def get_sys_path_powers(names):
if isinstance(n, tree.Name) and n.value == 'path':
yield name, power

sys_path = list(get_sys_path()) # copy
sys_path = list(evaluator.sys_path) # copy
try:
possible_names = module.used_names['path']
except KeyError:
# module.used_names is MergedNamesDict whose getitem never throws
# keyerror, this is superfluous.
pass
else:
for name, power in get_sys_path_powers(possible_names):
Expand All @@ -148,7 +197,7 @@ def sys_path_with_modifications(evaluator, module):
if module.path is None:
# Support for modules without a path is bad, therefore return the
# normal path.
return list(get_sys_path())
return list(evaluator.sys_path)

curdir = os.path.abspath(os.curdir)
with common.ignored(OSError):
Expand Down
5 changes: 3 additions & 2 deletions test/test_regression.py
Original file line number Diff line number Diff line change
Expand Up @@ -66,8 +66,9 @@ def test_add_dynamic_mods(self):
src1 = "def r(a): return a"
# Other fictional modules in another place in the fs.
src2 = 'from .. import setup; setup.r(1)'
imports.load_module(os.path.abspath(fname), src2)
result = Script(src1, path='../setup.py').goto_definitions()
script = Script(src1, path='../setup.py')
imports.load_module(script._evaluator, os.path.abspath(fname), src2)
result = script.goto_definitions()
assert len(result) == 1
assert result[0].description == 'class int'

Expand Down