
Can't import pykafka.rdkafka #280

Closed
tdhopper opened this issue Jul 25, 2016 · 12 comments

@tdhopper
Contributor

tdhopper commented Jul 25, 2016

I'm trying to use librdkafka with Streamparse. I have librdkafka installed with pykafka on my supervisor machines. I can run the Python binary in my topology virtualenv and successfully call from pykafka import rdkafka. However, when I try to import it from my spout initialization routine, I get:

2016-07-25 20:54:31.142 kafka_spout [INFO] Traceback (most recent call last):
  File "/data/virtualenvs/topo/bin/streamparse_run", line 9, in <module>
    load_entry_point('streamparse==3.0.0.dev3', 'console_scripts', 'streamparse_run')()
  File "/data/virtualenvs/topo/local/lib/python2.7/site-packages/streamparse/run.py", line 37, in main
    cls(serializer=args.serializer).run()
  File "/data/virtualenvs/topo/local/lib/python2.7/site-packages/pystorm/component.py", line 476, in run
    self.initialize(storm_conf, context)
  File "/var/storm/supervisor/stormdist/topo-88-1469480020/resources/spouts/kafka_spout.py", line 36, in initialize
    from pykafka import rdkafka
  File "/data/virtualenvs/topo/local/lib/python2.7/site-packages/pykafka/rdkafka/__init__.py", line 1, in <module>
    from .producer import RdKafkaProducer
  File "/data/virtualenvs/topo/local/lib/python2.7/site-packages/pykafka/rdkafka/producer.py", line 6, in <module>
    from . import _rd_kafka
ImportError: cannot import name _rd_kafka

Any idea how to resolve this? Maybe something with setting my path.

Pinging @emmett9001 too.
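
For reference, here's a rough diagnostic I've been using (a sketch only; paths depend on the virtualenv in question) to check whether pykafka's compiled _rd_kafka extension actually exists in a given environment:

# Sketch: list pykafka's rdkafka package directory and look for a compiled
# _rd_kafka shared object; if none is there, the extension never got built.
import os
import pykafka

pkg_dir = os.path.join(os.path.dirname(pykafka.__file__), "rdkafka")
print(pkg_dir)
print([name for name in os.listdir(pkg_dir) if name.startswith("_rd_kafka")])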

@emmettbutler
Contributor

I'm not super familiar with how dependencies get packaged for Storm, but I think this might require adding the rdkafka binary and pykafka's C extension binary to the JAR that Storm is running. @dan-blanchard will probably be able to verify.

@dan-blanchard
Member

> I have librdkafka installed with pykafka on my supervisor machines.

Just to clarify, do you have it installed on your worker nodes?

> Maybe something with setting my path.

You probably need LD_LIBRARY_PATH set to include the location where you have librdkafka installed. You can set topology-specific environment variables with topology.environment.
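
Roughly, the value Storm needs to see is a map (a sketch only; /usr/local/lib is just an example location for librdkafka):

# Sketch only: topology.environment must map environment variable names to
# values, and Storm applies them to the worker processes.
option_value = {
    "topology.environment": {"LD_LIBRARY_PATH": "/usr/local/lib"},
}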

> this might require adding the rdkafka binary and pykafka's C extension binary to the JAR that Storm is running.

This shouldn't require that, because we don't actually package the virtualenv in the JAR. The virtualenvs just get created on the workers.

@tdhopper
Contributor Author

@dan-blanchard: I have it on my workers. If I manually use the virtualenv created by streamparse, I can use rdkafka.

Setting LD_LIBRARY_PATH sounds right. I'm not sure how to set it though. If I do -o 'topology.environment={"LD_LIBRARY_PATH": "/usr/local/lib"}' (and all the similar syntax I can think of), I get storm_thrift.InvalidTopologyException: InvalidTopologyException(msg=u'Field TOPOLOGY_ENVIRONMENT must be a Map'). Can you help?

@dan-blanchard
Member

dan-blanchard commented Jul 26, 2016

Hmm... looks like we need to get even fancier with our option parsing. We should probably just support YAML for those values.

That said, when I address #276, you won't have to pass them in as command-line options anyway.

@tdhopper
Contributor Author

@dan-blanchard: Can you think of any way for me to bypass this in the short term?

@dan-blanchard
Member

Without modifying your copy of streamparse, no. You could probably just add YAML parsing here and things would work as expected. I say YAML and not JSON because JSON requires quotes around string literals, whereas quotes are optional in YAML, so that would work for all of the options we get passed in with --option.
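
For illustration, parsing the option values with YAML would look roughly like this (a sketch; the function name is made up and this isn't the actual streamparse code):

# Hypothetical sketch: yaml.safe_load accepts unquoted scalars as well as
# JSON-style maps, so map-valued options come through as real dicts.
import yaml  # PyYAML

def parse_option_value(raw_value):
    return yaml.safe_load(raw_value)

print(parse_option_value('{"LD_LIBRARY_PATH": "/usr/local/lib"}'))
# -> {'LD_LIBRARY_PATH': '/usr/local/lib'}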

@dan-blanchard
Member

@tdhopper Actually, you can just set the config option for all of your components separately to bypass this issue.

For example:

class FancyKafkaTopology(Topology):
    some_spout = SomeSpout.spec(config={'LD_LIBRARY_PATH': '/path/to/librdkafka'})
    some_bolt = SomeBolt.spec(inputs=[some_spout],
                              config={'LD_LIBRARY_PATH': '/path/to/librdkafka'})

@tdhopper
Contributor Author

@dan-blanchard @emmett9001 Unfortunately, setting LD_LIBRARY_PATH doesn't help. I can run from pykafka import rdkafka from Python in the environment I use for my topology, but the topology still barfs when I try that inside a Component. I have no idea what's happening.

@tdhopper
Contributor Author

Actually, it looks like I can import it on one of my supervisors but not the other. 🙄 No idea why.

@tdhopper
Contributor Author

Part of the issue is that I had installed librdkafka via apt-get install librdkafka1 on both machines, and that version of librdkafka doesn't have rd_kafka_queue_t in its header file.

Once I removed that, I built librdkafka using these instructions. After that, LD_LIBRARY_PATH=/usr/local/lib python -c "from pykafka import rdkafka" works on both machines. Hoping it'll work in the topo... About to try.
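
For reference, the build was roughly this (a sketch from memory; the repo URL and install prefix may differ from the linked instructions):

# Build librdkafka from source; the default prefix installs into /usr/local/lib.
git clone https://github.com/edenhill/librdkafka.git
cd librdkafka
./configure
make
sudo make install
# Refresh the shared library cache so the new library is found at runtime.
sudo ldconfig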

@tdhopper
Contributor Author

Running sudo ldconfig also seems to help things.

@sumeetsarkar

sumeetsarkar commented Dec 1, 2018

I faced this problem recently. I was getting the error below:

raise ImportError("use_rdkafka requires rdkafka to be installed")

My setup when I was facing the problem was:

(I believed the order below is important and that it should ideally work, but for me it did not.)

brew install librdkafka
pip install pykafka

Note: even with LD_LIBRARY_PATH set to the correct librdkafka location, it did not work.

Solution

# Uninstall pykafka if installed in your env
pip uninstall pykafka

# Clone pykafka in your project
git clone git@github.com:Parsely/pykafka.git && cd pykafka

# Build pykafka from source (this also builds the rdkafka extension)
python setup.py develop

# Test if rdkafka is available
python -c "from pykafka import rdkafka"

If all the steps above are followed, from pykafka import rdkafka works without any errors.
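
To double-check that the extension is actually usable, something like this works (a sketch; the broker address and topic name are placeholders for your own setup):

# Placeholder broker/topic. get_simple_consumer(use_rdkafka=True) raises
# ImportError("use_rdkafka requires rdkafka to be installed") if the
# extension is missing.
from pykafka import KafkaClient

client = KafkaClient(hosts="127.0.0.1:9092")
topic = client.topics[b"test"]
consumer = topic.get_simple_consumer(use_rdkafka=True)
print(consumer)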
