SPARK-2201 Improve FlumeInputDStream's stability and make it scalable #1310
Can one of the admins verify this patch?
Hey @joyyoj, PS: Apologies for not having commented on this earlier; it fell through the cracks, I guess.
@tdas, thanks for noticing the PR. I'm happy to share my design idea; I'll update it this weekend.
@harishreedharan Can you take a look? This looks really interesting for Flume.
Hmm, I don't see any code; the diff shows +0, -0 lines. Did something go wrong in the last merge?
@joyyoj Something went wrong in your last merge. It's an empty patch now!
Sorry, I'll send a PR soon.
@joyyoj Thanks for the explanation. This makes quite a lot of sense. I recently added a new DStream and an associated Flume sink to fix the issue of receivers being hard-coded in the Flume config. It basically solves the same issue by telling the Spark receiver where the Flume agents are running, so even if the executors die, they can come back and simply poll the same Flume agents for data. In my experience, the hosts on which the agents run rarely change, so this solution works nicely. PR #807; let me know what you think.
@harishreedharan When I ran into this problem, PR #807 had not yet been merged to trunk. I think PR #807 is another way to solve the same problem, and quite a good one.
Regarding PR #807: if a Flume agent crashes and is restarted on another host, does Spark need to be restarted to reload the configuration?
@joyyoj I will take a look at it in the next couple of days. As far as #807 is concerned, yes: if the Flume agent's location changes, the config needs to change. In my experience (I work for a company that has a large number of Flume customers), Flume agents are usually deployed on specific nodes, and if they crash, they are restarted on the same node. Since Flume has no concept of workers (every agent is a worker), that was not a concern in my design. The ZK-based config seems interesting. I will take a look at it soon. Thanks!
I still don't see any code. Did a merge fail somewhere?
Hi @joyyoj, since this pull request doesn't show any code or changes, would you mind closing it? Feel free to update or re-open it if you have code that you'd like us to review. Thanks!
Let's close this issue |