Releases: pystorm/streamparse

streamparse 3.4.0

26 Jan 21:38
1de07de

This release fixes a few bugs and adds a few new features that require pystorm 3.1.0 or greater.

Features

  • Added a ReliableSpout implementation that can be used to have spouts that will automatically replay failed tuples up to a specified number of times before giving up on them. (pystorm/pystorm#39)
  • Added Spout.activate and Spout.deactivate methods that will be called in Storm 1.1.0 and above when a spout is activated or deactivated. This is handy if you want to close database connections on deactivation and reconnect on activation. (Issue #351, PR pystorm/pystorm#42)
  • Can now override config.json Nimbus host and port with the STREAMPARSE_NIMBUS environment variable (PR #347)
  • Original topology name will now be sent to Storm as topology.original_name even when you're using sparse --override_name. (PR #354)
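The replay-up-to-N-times behavior of a reliable spout can be sketched in plain Python. This is a minimal illustration with hypothetical names (`ReplayTracker`, `on_fail`, `on_ack`); the real implementation lives in pystorm's ReliableSpout and its API differs:

```python
# Hypothetical sketch of reliable-spout bookkeeping: replay a failed
# tuple up to max_fails times, then give up on it.

class ReplayTracker:
    def __init__(self, max_fails=3):
        self.max_fails = max_fails
        self.fail_counts = {}  # tup_id -> number of times it has failed

    def on_fail(self, tup_id):
        """Return True if the tuple should be replayed, False if abandoned."""
        count = self.fail_counts.get(tup_id, 0) + 1
        if count < self.max_fails:
            self.fail_counts[tup_id] = count
            return True  # replay it
        self.fail_counts.pop(tup_id, None)
        return False  # hit the limit; give up on this tuple

    def on_ack(self, tup_id):
        # A successful ack clears any failure history for the tuple.
        self.fail_counts.pop(tup_id, None)
```

A spout using this pattern would re-emit the original tuple whenever `on_fail` returns True.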

Fixes

  • Fixed an issue where batching bolts would fail all batches they had received when they encountered an exception, even when exit_on_exception was False. Now they will only fail the current batch when exit_on_exception is False; if it is True, all batches are still failed. (PR pystorm/pystorm#43)
  • No longer call lein jar twice when creating jars. (PR #348)
  • We now use yaml.safe_load instead of yaml.load when parsing command line options. (commit 6e8c4d8)

streamparse 3.3.0

23 Nov 20:59
e51390c

This release fixes a few bugs and adds the ability to pre-build JARs for submission to Storm/Nimbus.

Features

  • Added --local_jar_path and --remote_jar_path options to submit to allow the re-use of pre-built JARs. This should make deploying topologies that are all within the same Python project much faster. (Issue #332)
  • Added help subcommand, since it's not immediately obvious to users that sparse -h submit and sparse submit -h will return different help messages. (Issue #334)
  • We now provide a universal wheel on PyPI (commit f600c98)
  • sparse kill can now kill any topology and not just those that have a definition in your topologies folder. (commit 66b3a70)

Fixes

  • Fixed Python 3 compatibility issue in sparse stats (Issue #333)
  • Fixed an issue where name was being used instead of override_name when calling pre- and post-submit hooks. (10e8ce3)
  • sparse will no longer hang without any indication of why when you run it as root. (Issue #324)
  • RedisWordCount example topology works again (PR #336)
  • Fix an issue where updating virtualenvs could be slow because certain versions of fabric would choke on the pip output (commit 9b1978f)

streamparse 3.2.0

03 Nov 19:14
c942cb3

This release adds tools to simplify some common deployment scenarios where you need to deploy the same topology to different environments.

Features

  • The par parameter for the Component.spec() method used to set options for components within your topology can now take dictionaries in addition to integers. The keys must be names of environments in your config.json, and the values are integers as before. This allows you to specify different parallelism hints for components depending on the environment they are deployed to. This is very helpful when one of your environments has more processing power than the other. (PR #326)
  • Added --requirements options to sparse update_virtualenv and sparse submit commands so that you can customize the requirements files that are used for your virtualenv, instead of relying on files in your virtualenv_specs directory. (PR #328)
  • pip is now automatically upgraded to 9.x on the worker nodes and is now run with the flags --upgrade --upgrade-strategy only-if-needed to ensure that requirements specified as ranges are upgraded to the same version on all machines, without needlessly upgrading all recursive dependencies. (PR #327)
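The environment-keyed `par` behavior can be illustrated with a small helper. The function name `resolve_par` and the default value are hypothetical; this is a sketch of the resolution rule described above, not streamparse's actual code:

```python
# Hypothetical sketch: resolve a ``par`` value (int, or dict keyed by
# environment name from config.json) into a concrete parallelism hint.

def resolve_par(par, env_name, default=1):
    """Return the parallelism hint to use for ``env_name``.

    ``par`` may be a plain integer (applies to every environment) or a
    dict mapping environment names to integers.
    """
    if isinstance(par, dict):
        return par.get(env_name, default)
    return par
```

With this rule, a component declared with `par={"prod": 8, "beta": 2}` would run with 8 executors when submitted to prod and 2 when submitted to beta.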

Fixes

  • Fixed an issue where name was being used instead of override_name when calling pre- and post-submit hooks. (10e8ce3)
  • Docs have been updated to fix some RST rendering issues (issue #321)
  • Updated quickstart to clarify which version of Storm is required (PR #315)
  • Added information about flux-core dependency to help string for sparse run (PR #316)

streamparse 3.1.1

02 Sep 13:01
b965843

This bugfix release fixes an issue where not having graphviz installed in your virtualenv would cause every command to crash, not just sparse visualize. (Issue #311)

streamparse 3.1.0

01 Sep 20:43
30c6d70

Implemented enhancements:

  • Added sparse visualize command that will use graphviz to generate a visualization of your topology (PR #308)
  • Can now set ssh port in config.json (Issue #229, PR #309)
  • Use latest Storm for quickstart (PR #306)
  • Re-enable support for bolts without outputs in sparse run (PR #292)

Fixed bugs:

  • sparse run error if more than one environment in config.json (Issue #304, PR #305)
  • Switch from invoke to fabric for kafka-jvm to fix TypeError (Issue #301, PR #310)
  • Rely on pystorm 3.0.3 to fix nested exception issue
  • Updated bootstrap filter so that generated project.clj will work fine with both sparse run and sparse submit

streamparse 3.0.1

29 Jul 17:41
6192b66

Fixes an issue where sparse submit would crash if log.path was not set in config.json (Issue #293)

streamparse 3.0.0

27 Jul 20:46
302aa05

This is the final release of streamparse 3.0.0. The developer preview versions of this release have been used extensively by many people for months, so we are quite confident in this release, but please let us know if you encounter any issues.

You can install this release via pip with pip install streamparse==3.0.0.

Highlights

  • Topologies are now specified via a Python Topology DSL instead of the Clojure Topology DSL. This means you can/must now write your topologies in Python! Components can still be written in any language supported by Storm, of course. (Issues #84 and #136, PR #199, #226)
  • When log.path is not set in your config.json, pystorm will no longer issue a warning about how you should set it; instead, it will automatically set up a StormHandler and log everything directly to your Storm logs. This is really handy because Storm 1.0 adds support for searching logs through the UI.
  • The --ackers and --workers settings now default to the number of worker nodes in your Storm environment instead of 2.
  • Added sparse slot_usage command that can show you how balanced your topologies are across nodes. This is something that isn't currently possible with the Storm UI on its own. (PR #218)
  • Now fully Python 3 compatible (and tested on up to 3.5), because we rely on fabric3 instead of plain old fabric now. (4acfa2f)
  • Now rely on pystorm package for handling Multi-Lang IPC between Storm and Python. This library is essentially the same as our old storm subpackage with a few enhancements (e.g., the ability to use MessagePack instead of JSON to serialize messages). (Issue #174, Commits aaeb3e9 and 1347ded)

⚠️ API Breaking Changes ⚠️

  • Topologies are now specified via a Python Topology DSL instead of the Clojure Topology DSL. This means you can/must now write your topologies in Python! Components can still be written in any language supported by Storm, of course. (Issues #84 and #136, PR #199, #226)
  • The deprecated Spout.emit_many method has been removed. (pystorm/pystorm@004dc27)
  • As a consequence of using the new Python Topology DSL, all Bolts and Spouts that emit anything are expected to have the outputs attribute declared. It must either be a list of str or Stream objects, as described in the docs.
  • We temporarily removed the sparse run command, as we've removed all of our Clojure code, and sparse run was the only piece that still had to be implemented in Clojure. (Watch issue #213 for future developments)
  • ssh_tunnel has moved from streamparse.contextmanagers to streamparse.util. The streamparse.contextmanagers module has been removed.
  • The ssh_tunnel context manager now returns the hostname and port that should be used for connecting nimbus (e.g., ('localhost', 1234) when use_ssh_for_nimbus is True or unspecified, and ('nimbus.foo.com', 6627) when use_ssh_for_nimbus is False).
  • need_task_ids defaults to False instead of True in all emit() method calls. If you were previously storing the task IDs that your tuples were emitted to (which is pretty rare), then you must pass need_task_ids=True in your emit() calls. This should provide a little speed boost to most users, because we do not need to wait on a return message from Storm for every emitted tuple.
  • Instead of having the log.level setting in your config.json influence the root logger's level, only the levels of your component's logger (and of its StormHandler, if you haven't set log.path) will be set.
  • When log.path is not set in your config.json, pystorm will no longer issue a warning about how you should set it; instead, it will automatically set up a StormHandler and log everything directly to your Storm logs. This is really handy because Storm 1.0 adds support for searching logs through the UI.
  • The --par option to sparse submit has been removed. Please use --ackers and --workers instead.
  • The --ackers and --workers settings now default to the number of worker nodes in your Storm environment instead of 2.
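The ssh_tunnel return contract described above can be illustrated with a small helper. The function name and the example tunnel port are hypothetical (streamparse chooses the local tunnel port itself); this only sketches the (host, port) selection rule:

```python
# Hypothetical sketch of the (host, port) pair a Nimbus client should
# connect to, depending on whether an SSH tunnel is in use.

def nimbus_endpoint(nimbus_host, nimbus_port, use_ssh_for_nimbus=True,
                    local_tunnel_port=1234):
    """Return the (host, port) to connect to for Nimbus."""
    if use_ssh_for_nimbus:
        # Connect to the tunnel's local end, which forwards to Nimbus.
        return ("localhost", local_tunnel_port)
    # No tunnel: connect straight to Nimbus.
    return (nimbus_host, nimbus_port)
```

This matches the examples in the notes: ('localhost', 1234) when use_ssh_for_nimbus is True or unspecified, and ('nimbus.foo.com', 6627) when it is False.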

Features

  • Added sparse slot_usage command that can show you how balanced your topologies are across nodes. This is something that isn't currently possible with the Storm UI on its own. (PR #218)
  • Can now specify ssh_password in config.json if you don't have SSH keys set up. Storing your password in plaintext is not recommended, but nice to have for local VMs. (PR #224, thanks @motazreda)
  • Now fully Python 3 compatible (and tested on up to 3.5), because we rely on fabric3 instead of plain old fabric now. (4acfa2f)
  • Now remove _resources directory after JAR has been created.
  • Added serializer setting to config.json that can be used to switch between JSON and msgpack serializers (PR #238). Note that you cannot use the msgpack serializer unless you also include a Java implementation in your topology's JAR such as the one provided by Pyleus, or the one being added to Storm in apache/storm#1136. (PR #238)
  • Added support for custom log filenames (PR #234 — thanks @kalmanolah)
  • Can now set environment-specific options, acker_count, and worker_count settings to avoid constantly passing all those pesky options to sparse submit. (PR #265)
  • Added an install_virtualenv option to disable the installation of virtualenvs while still allowing their use. (PR #264)
  • The Python Topology DSL now allows topology-level config options to be set via the config attribute of the Topology class. (Issue #276, PRs #284 and #289)
  • Can now pass any valid YAML as a value for sparse submit --option (Issue #280, PR #285)
  • Added --override_name option to kill, submit, and update_virtualenv commands so that you can deploy the same topology file multiple times with different overridden names. (Issue #207, PR #286)
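Several of the features above are driven by config.json settings. The fragment below gathers the keys named in these notes into one place; the nesting and surrounding keys are assumptions for illustration, not a verified schema — consult the streamparse docs for exact placement:

```json
{
  "serializer": "json",
  "envs": {
    "prod": {
      "user": "storm",
      "nimbus": "nimbus.example.com",
      "ssh_password": "change-me-or-use-keys",
      "install_virtualenv": true,
      "acker_count": 4,
      "worker_count": 8,
      "options": {"topology.message.timeout.secs": 60}
    }
  }
}
```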

Fixes

  • sparse slot_usage, sparse stats, and sparse worker_uptime are much faster as we've fixed an issue where they were creating many SSH subprocesses.
  • All commands that must connect to the Nimbus server now properly use SSH tunnels again.
  • The output from running pip install is now displayed when submitting your topology, so you can see if things get stuck.
  • sparse submit should no longer sporadically raise exceptions about failing to create SSH tunnels (PR #242).
  • sparse submit will no longer crash when you provide a value for --ackers (PR #241).
  • pin pystorm version to >=2.0.1 (PR #230)
  • sparse tail now looks for pystorm named filenames (@9339908)
  • Fixed typo that caused crash in sparse worker_uptime (@7085804)
  • Added back sparse run (PR #244)
  • sparse run should no longer crash when searching for the version number on some versions of Storm. (Issue #254, PR #255)
  • sparse run will no longer crash due to PyYAML dumping out !!python/unicode garbage into the YAML files. (Issue #256, PR #257)
  • A sparse run TypeError with Python 3 has been fixed. (@e232224)
  • sparse update_virtualenv will no longer ignore the virtualenv_flags setting in config.json. (Issue #281, PR #282)
  • sparse run now supports named streams on Storm 1.0.1+ (PR #260)
  • No longer remove non-topology-specific logs with sparse remove_logs (@45bd005)
  • sparse tail will now find logs in subdirectories for Storm 1.0+ compatibility (Issue #268, PR #271)

Other Changes

  • Now rely on pystorm package for handling Multi-Lang IPC between Storm and Python. This library is essentially the same as our old storm subpackage with a few enhancements (e.g., the ability to use MessagePack instead of JSON to serialize messages). (Issue #174, Commits aaeb3e9 and 1347ded)
  • All Bolt, Spout, and Topology-related classes are all available directly at the streamparse package level (i.e., you can just do from streamparse import Bolt now) (Commit b9bf4ae).
  • sparse kill will now kill inactive topologies. (Issue #156)
  • All examples now use the Python DSL
  • The Kafka-JVM example has been cleaned up a bit, so now you can click on Storm UI log links and they'll work.
  • Docs have been updated to reflect latest Leiningen installation instructions. (PR #261)
  • A broken link in our docs was fixed. (PR #273)
  • JARs are now uploaded before killing the running topology to reduce downtime during deployments (PR #277)
  • Switched from PyYAML to ruamel.yaml (@18fd2e9)
  • Added docs for handling multiple streams and groupings (Issue #252, @344ce8c)
  • Added VPC deployment docs (Issue #134, @d2bd1ac)

streamparse 3.0.0.dev3

21 Apr 02:23
4780718
Pre-release

This is the fourth developer preview release of streamparse 3.0. In addition to having been extensively tested in production, this version is also the first in the 3.0 line that has sparse run back in it. However, it is only supported on Storm 0.10.0+ and requires you to add [org.apache.storm/flux-core "0.10.0"] to the dependencies in your project.clj, because it uses Storm's new Flux library to start the local cluster.

You can install this release via pip with pip install --pre streamparse==3.0.0.dev3. It will not automatically install because it's a pre-release.

⚠️ API Breaking Changes ⚠️

In addition to those outlined in the 3.0.0dev0 and 3.0.0dev1 release notes, this release introduces the following backwards incompatible changes from pinning our pystorm version to 3.0+:

  • need_task_ids defaults to False instead of True in all emit() method calls. If you were previously storing the task IDs that your tuples were emitted to (which is pretty rare), then you must pass need_task_ids=True in your emit() calls. This should provide a little speed boost to most users, because we do not need to wait on a return message from Storm for every emitted tuple.
  • Instead of having the log.level setting in your config.json influence the root logger's level, only the levels of your component's logger (and of its StormHandler, if you haven't set log.path) will be set.
  • When log.path is not set in your config.json, pystorm will no longer issue a warning about how you should set it; instead, it will automatically set up a StormHandler and log everything directly to your Storm logs. This is really handy because Storm 1.0 adds support for searching logs through the UI.

Features

  • Added back sparse run (PR #244)

streamparse 3.0.0.dev2

13 Apr 15:55
Pre-release

This is the third developer preview release of streamparse 3.0. Unlike the previous two, this one has been tested extensively in production, so users should feel more confident using it. It is still missing sparse run, which we will try to fix before the final release.

You can install this release via pip with pip install --pre streamparse==3.0.0.dev2. It will not automatically install because it's a pre-release.

⚠️ API Breaking Changes ⚠️

These are outlined in the 3.0.0dev0 and 3.0.0dev1 release notes.

Features

  • Added serializer setting to config.json that can be used to switch between JSON and msgpack serializers (PR #238). Note that you cannot use the msgpack serializer unless you also include a Java implementation in your topology's JAR such as the one provided by Pyleus, or the one being added to Storm in apache/storm#1136. (PR #238)
  • Added support for custom log filenames (PR #234 — thanks @kalmanolah)

Fixes

  • sparse submit should no longer sporadically raise exceptions about failing to create SSH tunnels (PR #242).
  • sparse submit will no longer crash when you provide a value for --ackers (PR #241).
  • pin pystorm version to >=2.0.1 (PR #230)
  • sparse tail now looks for pystorm named filenames (@9339908)
  • Fixed typo that caused crash in sparse worker_uptime (@7085804)

streamparse 3.0.0.dev1

17 Mar 18:40
Pre-release

This is the second developer preview release of streamparse 3.0. It has not been tested extensively in production yet, so we are looking for as much feedback as we can get from users who are willing to test it out.

You can install this release via pip with pip install --pre streamparse==3.0.0.dev1. It will not automatically install because it's a pre-release.

⚠️ API Breaking Changes ⚠️

In addition to those outlined in the 3.0.0dev0 release notes, we've made a few more changes.

  • ssh_tunnel has moved from streamparse.contextmanagers to streamparse.util. The streamparse.contextmanagers module has been removed.
  • The ssh_tunnel context manager now returns the hostname and port that should be used for connecting nimbus (e.g., ('localhost', 1234) when use_ssh_for_nimbus is True or unspecified, and ('nimbus.foo.com', 6627) when use_ssh_for_nimbus is False).

Fixes

  • sparse slot_usage, sparse stats, and sparse worker_uptime are much faster as we've fixed an issue where they were creating many SSH subprocesses.
  • All commands that must connect to the Nimbus server now properly use SSH tunnels again.
  • The output from running pip install is now displayed when submitting your topology, so you can see if things get stuck.