
Can't import pykafka.rdkafka #280

Closed
tdhopper opened this issue Jul 25, 2016 · 12 comments

@tdhopper
Contributor

tdhopper commented Jul 25, 2016

I'm trying to use librdkafka with Streamparse. I have librdkafka installed with pykafka on my supervisor machines. I can run the Python binary in my topology virtualenv and successfully call from pykafka import rdkafka. However, when I try to import it from my spout initialization routine, I get:

2016-07-25 20:54:31.142 kafka_spout [INFO] Traceback (most recent call last):
  File "/data/virtualenvs/topo/bin/streamparse_run", line 9, in <module>
    load_entry_point('streamparse==3.0.0.dev3', 'console_scripts', 'streamparse_run')()
  File "/data/virtualenvs/topo/local/lib/python2.7/site-packages/streamparse/run.py", line 37, in main
    cls(serializer=args.serializer).run()
  File "/data/virtualenvs/topo/local/lib/python2.7/site-packages/pystorm/component.py", line 476, in run
    self.initialize(storm_conf, context)
  File "/var/storm/supervisor/stormdist/topo-88-1469480020/resources/spouts/kafka_spout.py", line 36, in initialize
    from pykafka import rdkafka
  File "/data/virtualenvs/topo/local/lib/python2.7/site-packages/pykafka/rdkafka/__init__.py", line 1, in <module>
    from .producer import RdKafkaProducer
  File "/data/virtualenvs/topo/local/lib/python2.7/site-packages/pykafka/rdkafka/producer.py", line 6, in <module>
    from . import _rd_kafka
ImportError: cannot import name _rd_kafka

Any idea how to resolve this? Maybe something with setting my path.

Pinging @emmett9001 too.
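
For reference, here's a rough diagnostic I've been using (a sketch only; paths depend on the virtualenv in question) to check whether pykafka's compiled _rd_kafka extension actually exists in a given environment:

# Sketch: list pykafka's rdkafka package directory and look for a compiled
# _rd_kafka shared object; if none is there, the extension never got built.
import os
import pykafka

pkg_dir = os.path.join(os.path.dirname(pykafka.__file__), "rdkafka")
print(pkg_dir)
print([name for name in os.listdir(pkg_dir) if name.startswith("_rd_kafka")])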

@emmettbutler
Contributor

I'm not super familiar with how dependencies get packaged for Storm, but I think this might require adding the rdkafka binary and pykafka's C extension binary to the JAR that Storm is running. @dan-blanchard will probably be able to verify.

@dan-blanchard
Member

> I have librdkafka installed with pykafka on my supervisor machines.

Just to clarify, do you have it installed on your worker nodes?

> Maybe something with setting my path.

You probably need LD_LIBRARY_PATH set to include the location where you have librdkafka installed. You can set topology-specific environment variables with topology.environment.
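
Roughly, the value Storm needs to see is a map (a sketch only; /usr/local/lib is just an example location for librdkafka):

# Sketch only: topology.environment must map environment variable names to
# values, and Storm applies them to the worker processes.
option_value = {
    "topology.environment": {"LD_LIBRARY_PATH": "/usr/local/lib"},
}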

> this might require adding the rdkafka binary and pykafka's C extension binary to the JAR that Storm is running.

This shouldn't require that, because we don't actually package the virtualenv in the JAR. The virtualenvs just get created on the workers.

@tdhopper
Contributor Author

@dan-blanchard: I have it on my workers. If I manually use the virtualenv created by streamparse, I can use rdkafka.

Setting LD_LIBRARY_PATH sounds right. I'm not sure how to set it though. If I do -o 'topology.environment={"LD_LIBRARY_PATH": "/usr/local/lib"}' (and all the similar syntax I can think of), I get storm_thrift.InvalidTopologyException: InvalidTopologyException(msg=u'Field TOPOLOGY_ENVIRONMENT must be a Map'). Can you help?

@dan-blanchard
Member

dan-blanchard commented Jul 26, 2016

Hmm... looks like we need to get even fancier with our option parsing. We should probably just support YAML for those values.

That said, when I address #276, you won't have to pass them in as command-line options anyway.

@tdhopper
Contributor Author

@dan-blanchard: Can you think of any way for me to bypass this in the short term?

@dan-blanchard
Member

Without modifying your copy of streamparse, no. You could probably just add YAML parsing here and things would work as expected. I say YAML and not JSON because JSON requires quotes around string literals, whereas quotes are optional in YAML, so that would work for all of the options we get passed in with --option.
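
For illustration, parsing the option values with YAML would look roughly like this (a sketch; the function name is made up and this isn't the actual streamparse code):

# Hypothetical sketch: yaml.safe_load accepts unquoted scalars as well as
# JSON-style maps, so map-valued options come through as real dicts.
import yaml  # PyYAML

def parse_option_value(raw_value):
    return yaml.safe_load(raw_value)

print(parse_option_value('{"LD_LIBRARY_PATH": "/usr/local/lib"}'))
# -> {'LD_LIBRARY_PATH': '/usr/local/lib'}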

@dan-blanchard
Member

@tdhopper Actually, you can just set the config option for all of your components separately to bypass this issue.

For example:

class FancyKafkaTopology(Topology):
    some_spout = SomeSpout.spec(config={'LD_LIBRARY_PATH': '/path/to/librdkafka'})
    some_bolt = SomeBolt.spec(inputs=[some_spout],
                              config={'LD_LIBRARY_PATH': '/path/to/librdkafka'})

@tdhopper
Contributor Author

@dan-blanchard @emmett9001 Unfortunately, setting LD_LIBRARY_PATH doesn't help. I can run from pykafka import rdkafka from Python in the environment I use for my topology, but the topology still barfs when I try that inside a Component. I have no idea what's happening.

@tdhopper
Contributor Author

Actually, it looks like I can import it on one of my supervisors but not the other. 🙄 No idea why.

@tdhopper
Contributor Author

Part of the issue is that I had installed librdkafka via apt-get install librdkafka1 on both machines, and that version of librdkafka doesn't have rd_kafka_queue_t in its header file.

Once I removed that, I built librdkafka using these instructions. After that, LD_LIBRARY_PATH=/usr/local/lib python -c "from pykafka import rdkafka" works on both machines. Hoping it'll work in the topo... About to try.
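
For reference, the build was roughly this (a sketch from memory; the repo URL and install prefix may differ from the linked instructions):

# Build librdkafka from source; the default prefix installs into /usr/local/lib.
git clone https://github.com/edenhill/librdkafka.git
cd librdkafka
./configure
make
sudo make install
# Refresh the shared library cache so the new library is found at runtime.
sudo ldconfig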

@tdhopper
Contributor Author

Running sudo ldconfig also seems to help things.

@sumeetsarkar

sumeetsarkar commented Dec 1, 2018

I faced this problem recently. I was getting the error below:

raise ImportError("use_rdkafka requires rdkafka to be installed")

My setup when I was facing the problem was:

(I believed the order below is important and that it should ideally work, but for me it did not.)

brew install librdkafka
pip install pykafka

Note: even with LD_LIBRARY_PATH set to the correct librdkafka location, it did not work.

Solution

# Uninstall pykafka if installed in your env
pip uninstall pykafka

# Clone pykafka in your project
git clone git@github.com:Parsely/pykafka.git && cd pykafka

# Build pykafka from source (this also builds the rdkafka extension)
python setup.py develop

# Test if rdkafka is available
python -c "from pykafka import rdkafka"

If all the steps above are followed, from pykafka import rdkafka works without any errors.
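
To double-check that the extension is actually usable, something like this works (a sketch; the broker address and topic name are placeholders for your own setup):

# Placeholder broker/topic. get_simple_consumer(use_rdkafka=True) raises
# ImportError("use_rdkafka requires rdkafka to be installed") if the
# extension is missing.
from pykafka import KafkaClient

client = KafkaClient(hosts="127.0.0.1:9092")
topic = client.topics[b"test"]
consumer = topic.get_simple_consumer(use_rdkafka=True)
print(consumer)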
