Skip to content

Conversation

@liurenjie1024
Copy link
Contributor

This patch aims to fix bug STORM-1642. When we failed to deserialize a tuple, we should skip this tuple and print a log rather than just shutdown the work.

@liurenjie1024 liurenjie1024 changed the title STORM-1642 [STORM-1642] Catch Exception when deserialization failed. Apr 7, 2016
@abhishekagarwal87
Copy link
Contributor

How was deserialization failure resulting in NPE? Other than that, deserialization failure is the problem with an application itself which application should correct by registering correct serializer/deserializer. storm framework should not skip such tuples on its own.

@liurenjie1024
Copy link
Contributor Author

As I have mentioned in bug report, the reason why NPE happens is that storm is vulnerable to fraud message from processes outside the cluster. To reproduce the NPE, you just need to send a message [taskid 0] to [host]:[port] from anywhere, where taskid is id of one of the tasks running on [host]:[port]. In this case, storm will return a TaskMessage with payload set to null to the deserializer.
Storm does not check whether the task message is from processes within the cluster, so deserialization may fail. I think storm should skip the fruad task message rather than shutdowning the worker when deserialization failed.

@abhishekagarwal87
Copy link
Contributor

I can't think of a reason of why would such a message be sent to storm from outside the cluster. Ideally only cluster machines and daemons should have access to the worker ports. Or if it indeeds needs to be solved, then a better method would be to ignore the zero length payload and not add new TaskMessage(task, null). Before you make any change, let's wait for comments from someone who is more familiar with this part of code.

@liurenjie1024
Copy link
Contributor Author

The reason why such a message is sent is that our security team is scanning all the ports listened on each server and tries to detect potential weakness.
At first thought I just wanted to ignore messages with zero payload, but I think ignoring messages failed to be deserialized would be more robust. I think this also matches storm's design since it throws away messages whose target task id is not in the worker and this patch is just a complement.

;; null task ids are broadcast tuples
(fast-list-iter [task-id task-ids]
(tuple-action-fn task-id tuple)
))))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lots of parenthetical and formatting problems here.

@knusbaum
Copy link
Contributor

knusbaum commented Apr 8, 2016

Why not no both? I don't see any reason to propagate TaskMessages with null payloads, but I agree we should be dropping tuples that failed to deserialize in the iterator.

This is very much on the critical path, so someone else might want to weigh in.

@liurenjie1024
Copy link
Contributor Author

NPE is just a special case of deserialization failure, right?

@knusbaum
Copy link
Contributor

knusbaum commented Apr 9, 2016

What do you mean?

If you mean null payloads are a special case of failure, I disagree. In fact, it's not a failure at all. We received a null message, so we can perform a NOP as soon as possible.

@liurenjie1024
Copy link
Contributor Author

Yes, I agree with you that we should drop a task message with null payload as soon as possible, and I'll push another commit to drop task messages with null payload.


bout.writeShort((short)task_id);
if (payload_len == 0) {
LOG.warn("Zero length payload to task {}.", (short)task_id);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this a debugging log? should remove them?

@knusbaum
Copy link
Contributor

knusbaum commented Oct 4, 2016

@liurenjie1024
If you want to clean up the log messages, this still looks good. We'd want this instead to go into master and 1.x, though.

HeartSaVioR and others added 18 commits January 16, 2017 23:35
* Expose OpenTsdbClient.Builder's constructor to public
** This allows Flux to initialize Builder instance which is needed for OpenTSDBBolt
* Add constructor for having single mapper instance to OpenTsdbBolt
** Flux doesn't support array of reference for now
* Also fix a bug on OpenTsdbBolt: it should refer the mapper interface for flexibility, not implemented one
* use bounded wildcard type to fix invariant issue
* introduce reflist
* introduce BeanListReference which stores list of id of references
* handle BeanListReference properly
* modify unit test for testing reflist
…ith predictable ordering

KafkaSpoutStreamsNamedTopics.getOutputFields() uses HashSet causes output fields with predictable ordering. So replaced with LinkedHashSet
… when a topology is killed

* increase supervisor.worker.shutdown.sleep.secs to let workers kill themselves first even stuck
* disable shutdown hook for log4j2 to make sure logs are written after shutdown is started
alexlehm and others added 26 commits April 4, 2017 22:05
…ly by default

the dir from the default config is log4j2 and that should be relative to STORM_HOME, however this is check was missing

Signed-off-by: alexlehm <alexlehm@gmail.com>
… newline before the code quotes)

Signed-off-by: alexlehm <alexlehm@gmail.com>
…e when ZK nodes have already been created/deleted
We had the case that this broke encoding for us because it used the default
system locale. Given the fact that this code uses an explicit encoding for the
other case and that it evolved from something that always used an explicit
encoding I believe this is more correct.
Code blocks should always follow an empty line; otherwise, jekyll will
fail to properly format the code block.

Also, github's fenced code blocks (with triple backticks) in the middle
of item lists cause incorrect list numbering.
bipinprasad pushed a commit to bipinprasad/storm that referenced this pull request Oct 17, 2019
[YSTORM-6088] Update mvn arguments to speed up builds
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.