This repository has been archived by the owner on Mar 24, 2021. It is now read-only.

use pykafka in pyspark error: AssertionError: is not a subpath of * #836

Closed
whwby opened this issue Jul 17, 2018 · 2 comments

Comments

@whwby

whwby commented Jul 17, 2018

Versions:

[root@node-61 ~]# pip list|grep py
pykafka                      2.7.0  
pyspark                      2.3.1

[root@node-61 ~]# pyspark
Python 2.7.5 (default, Nov 20 2015, 02:00:19) 
[GCC 4.8.5 20150623 (Red Hat 4.8.5-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
2018-07-17 10:24:42 WARN  NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.3.1
      /_/

Using Python version 2.7.5 (default, Nov 20 2015 02:00:19)
SparkSession available as 'spark'.
>>> from pykafka import KafkaClient
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/site-packages/pykafka/__init__.py", line 1, in <module>
    from .broker import Broker
  File "/usr/lib/python2.7/site-packages/pykafka/broker.py", line 23, in <module>
    from .connection import BrokerConnection
  File "/usr/lib/python2.7/site-packages/pykafka/connection.py", line 27, in <module>
    from .utils.socket import recvall_into
  File "/usr/lib/python2.7/site-packages/pykafka/utils/__init__.py", line 16, in <module>
    from pkg_resources import parse_version
  File "/usr/lib/python2.7/site-packages/pkg_resources.py", line 2998, in <module>
    _declare_state('object', working_set = WorkingSet())
  File "/usr/lib/python2.7/site-packages/pkg_resources.py", line 476, in __init__
    self.add_entry(entry)
  File "/usr/lib/python2.7/site-packages/pkg_resources.py", line 491, in add_entry
    for dist in find_distributions(entry, True):
  File "/usr/lib/python2.7/site-packages/pkg_resources.py", line 1964, in find_in_zip
    if metadata.has_metadata('PKG-INFO'):
  File "/usr/lib/python2.7/site-packages/pkg_resources.py", line 1415, in has_metadata
    return self.egg_info and self._has(self._fn(self.egg_info,name))
  File "/usr/lib/python2.7/site-packages/pkg_resources.py", line 1737, in _has
    zip_path = self._zipinfo_name(fspath)
  File "/usr/lib/python2.7/site-packages/pkg_resources.py", line 1611, in _zipinfo_name
    "%s is not a subpath of %s" % (fspath,self.zip_pre)
AssertionError: /usr/lib/python2.7/site-packages/pyspark-2.3.1-py2.7.egg/EGG-INFO/PKG-INFO is not a subpath of /usr/lib/python2.7/site-packages/pyspark-2.3.1-py2.7.egg/pyspark/python/lib/py4j-0.10.7-src.zip/
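The traceback shows pkg_resources building its `WorkingSet` from every `sys.path` entry at import time. The failing entry appears to be the py4j zip that sits *inside* the pyspark egg, while the metadata path it checks (`EGG-INFO/PKG-INFO`) belongs to the enclosing egg. The following is a minimal diagnostic sketch for listing `sys.path` entries of that shape. The helper name `nested_zip_entries` and the "`.zip` under an `.egg` directory" heuristic are assumptions made for illustration; this is not pkg_resources' own logic.

```python
import os
import sys


def nested_zip_entries(paths):
    """Return path entries that are .zip archives located under an .egg directory.

    Hypothetical heuristic: it flags entries shaped like
    .../pyspark-2.3.1-py2.7.egg/pyspark/python/lib/py4j-0.10.7-src.zip,
    i.e. the zip named in the AssertionError above.
    """
    suspects = []
    for entry in paths:
        norm = os.path.normpath(entry)
        parts = norm.split(os.sep)
        # A .zip entry with some parent directory component ending in ".egg".
        if norm.endswith(".zip") and any(p.endswith(".egg") for p in parts[:-1]):
            suspects.append(entry)
    return suspects


if __name__ == "__main__":
    # Run inside the pyspark shell to inspect the interpreter's actual sys.path.
    for entry in nested_zip_entries(sys.path):
        print(entry)
```

If this prints the py4j zip when run inside the pyspark shell but nothing in a plain `python` session, the failure likely depends on how the pyspark shell extends `sys.path`, not on pykafka.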
@emmettbutler
Contributor

Thanks for the report. I suspect this is an issue with pkg_resources rather than with pykafka itself. What happens when you open a pyspark shell and run `from pkg_resources import parse_version`?
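The suggested check can be wrapped so that it reports the outcome instead of stopping the shell. This is a minimal sketch; the function name `probe_pkg_resources` is hypothetical, and treating `ImportError` as "setuptools not installed" is an assumption about the environment.

```python
def probe_pkg_resources():
    """Import pkg_resources.parse_version in isolation and report the outcome."""
    try:
        from pkg_resources import parse_version
    except AssertionError as exc:
        # Same failure mode as the traceback above: WorkingSet() rejecting a sys.path entry.
        return "AssertionError: %s" % exc
    except ImportError as exc:
        # pkg_resources (shipped with setuptools) is not available to this interpreter.
        return "ImportError: %s" % exc
    return "ok: parse_version imported (%r)" % (parse_version,)


if __name__ == "__main__":
    print(probe_pkg_resources())
```

An `AssertionError` from this probe, with pykafka never imported, would point at pkg_resources and the pyspark shell's `sys.path` rather than at pykafka.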

@whwby
Author

whwby commented Jul 18, 2018 via email


2 participants