-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replace jar distribution strategy with bittorrent #629
base: master
Are you sure you want to change the base?
Conversation
This overlaps some with the HA Nimbus work, but the two can work together. For example, torrent files can list multiple trackers (I.e. nimbus). I'll take a closer look at the ha stuff when I'm not on a phone. ;) |
We should have all topology files (conf etc.) managed by this system. This will make HA Nimbus much easier and remove the need for the storage abstraction and dependencies on external systems like HDFS. We can set a policy that at least N Nimbus's need to have the full topology files before the topology gets launched (but files will eventually be copied to all Nimbus's if more than N). Is it possible to set max download and upload throughputs on each node? We would want to set these limits to prevent supervisors from taking on too much load from this. As for the Bittorrent library, I can just upload that to the same Maven repo used for other Storm jars once this pull request is done. |
Then the way to go is to package everything in one multi-file torrent (topo.jar, topo.ser, topo-conf.ser), and distribute that.. That's a relatively easy change... Do you want me to add that? As for throughput, the way I have it now is nimbus will take the most load, depending on the size of the .jar. If it is massive, then there will be more sharing between supervisor clients. Otherwise, the supervisors may get the parts from nimbus before they discover the other supervisor seeds. I'll also look into what throttling options are available, with the ttorrent lib. |
OK. The throttling is really important, so I won't merge this in until that's in place. We can make the throttle amounts configurable via the storm.yaml. |
I delved into the tttorrent code and found that it does not support throttling dl/ul rates, but I will look into what it would take to add it. Can you elaborate on your concern with the supervisors taking on too much load? In the current approach, only nimbus acts as a seeder. The supervisors act as leechers and stop sharing as soon as they complete the download. The supervisors never seed. For the most part, the supervisors will prefer downloading parts from nimbus (since nimbus has the complete file), and only share amongst themselves while downloading if a supervisor peer could offer a part with better performance than the nimbus peer. So the supervisors are "download heavy" and the nimbus's are "upload heavy". I'll move forward with including all topology files as part of the torrent download. Let me know if you still think lack of throttling is a showstopper, and I'll look into adding/requesting that feature for ttorrent. |
I love the idea of using bit-torrent, but bit-torrent does not support authentication or authorization, except through extensions like http://www.rasterbar.com/products/libtorrent/auth.html that ttorrent does not support. This is a bit of a regression as the current thrift APIs do have the beginnings of auth. Seeing how this code does not yet remove the thrift upload/download APIs would it be possible to have the distribution mechanism configurable until we can work out the auth model? |
This is a case where making it pluggable increases complexity. The Bittorrent based distribution greatly simplifies the construction of HA Nimbus (completely removes reliance on external systems), and it is clearly the optimal approach for distributing static files around the cluster. So I'd like us to figure out how to support our auth needs (or at least have a plan for it) within the context of a Bittorrent based approach. Perhaps this will require modifications to ttorrent, and that's fine. @ptgoetz So the reason for the importance of throttling is it provides a guarantee of load on a supervisor. First of all, I think supervisors should be allowed to seed files and participate in sharing after they've finished downloading. This ensures jar distribution is fast and scalable to large clusters. But we need to make sure that the distribution process can only have a limited effect on active topologies – hence the throttling. Resource usage should always be explicit and not rely on implicit effects of the design. Things change, and my experience has repeatedly shown that if you want a guarantee, you must be explicit about it. |
I understand the desire to not clutter things up too much with plug-ability. I am fine with just having a plan for auth initially. I am also happy to help implement it. I just want to be sure that it is not missed. |
Absolutely, you brought up a great point regarding auth. |
Ok. I will update the code so all the topology files are distributed via BitTorrent. Are you willing to accept BitTorrent throttling (and auth) as separate pull requests? For those I will need to fork the ttorrent project and make the necessary modifications, which may take a while. IMHO, the switch to BitTorrent is a good first step toward HA, and the choking mechanism in the BitTorrent protocol will help alleviate overload on supervisors. When the BitTorrent feature is in place, the planned HA features can move forward. When throttling is available, it would be a simple addition (I could even put the hooks in place now). I can also add a configurable seed duration for supervisors (e.g. Don't seed, seed indefinitely, or seed for X seconds). |
Throttling will have to be part of this pull request, auth can be in a separate pull request. |
I dove into the ttorrent code last night and have a crude, but functional means of throttling throughput. Now I have a question and a caveat. The question is, do you want to throttle at the peer level, or the torrent level? I'm assuming the torrent level so I'm headed in that direction. The caveat is that the throttling is not exact, nor is the logging/reporting of throuhput. So setting max ul/dl thresholds is more like a "hint." For example if I set a threshold of 50 kb/sec., the actual throughput will fluctuate between ~45-55 kb/sec. but average very close to 50. The ttorrent logs will also report inaccurate throughputs. But testing with several bittorrent clients, throughput was very close to the requested threshold. Let me know if that's okay and I'll proceed. Thanks, |
Well we'd want to throttle the overall rate for the supervisor. I think that's what you mean by "peer". If the easiest way to do that is to restrict the supervisor to downloading / sharing one torrent at a time, that could be an acceptable first step. And then we'll open up an issue to fix that. The trick if taking that implementation strategy is choosing which torrent that supervisor should share. We'd also want to add a special case for when the supervisor has no active workers in which there would be no ul/dl limits. I think it's ok for there to be minor fluctuation in the ul/dl rate. |
Mods to ttorrent are here: https://github.com/ptgoetz/ttorrent/tree/rate-limits |
@ptgoetz So that implements peer-level throttling? |
Throttling has been added to ttorrent at the torrent level: https://github.com/turn/ttorrent/pull/49 But a potential bug was introduced in a separate pull request that got merged around the same time mine did: https://github.com/turn/ttorrent/issues/51 So if you want to see it action, I'd suggest pulling from my fork/branch until that gets sorted out. The throttling in ttorrent works like so:
How I see this playing out in storm: (I'm totally open to suggestions, etc.)
Seeding:
Is that an acceptable feature set for this pull request? I'd like to nail down those features and move on to sorting the ramifications it will have on I'm still in a WIP state at this point, but largely functional. I can deploy a topology such that all its files are distributed (and rate limited) via bittorrent, the topology functions, and supervisor will recover from (I should also note that I haven't pushed many of the above changes yet). -Taylor |
OK, sounds good. Let me know when all these changes are in there. |
OK, the changes are in. Like I said, I'm not sure I got everything right in Until this pull request gets merged, you'll probably want to use my branch of ttorrent: https://github.com/ptgoetz/ttorrent/tree/rate-limits |
Fixes are in to ttorrent... So we're good to use the master branch of that dependency now. |
@nathanmarz Have you had any free cycles to take a look at this? (No problem if you haven't, I'm just wondering if it is still under consideration.) |
+1 |
@@ -1151,12 +1157,21 @@ | |||
(FileUtils/copyFile src-file (File. (master-stormjar-path stormroot))) | |||
)) | |||
|
|||
(defmethod mk-bt-tracker :distributed [conf] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we require the use of bit-torrent distribution? It will be better if we could make it optional. For simple topology, one may just prefer not to use BT at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes it is required. There was some discussion around making the distribution mechanism pluggable, but Nathan argued against it, as making it pluggable would increase complexity (#629 (comment)).
@anfeng Thanks for the input. I pushed a commit to address your concerns. Let me know if you feel strongly about the "Tracker" class names. |
@ptgoetz I could live with NimbusTracker, but don't like SupervisorTracker. How about that we rename it to SupervisorPeer? We may want to rename BaseTracker to BasePeer. |
(extract-dir-from-jar (supervisor-stormjar-path tmproot) RESOURCES-SUBDIR tmproot) | ||
(FileUtils/moveDirectory (File. tmproot) (File. stormroot)) | ||
(FileUtils/forceMkdir (File. (supervisor-stormdist-root conf))) | ||
(Utils/downloadFromMaster conf (master-stormtorrent-path master-code-dir storm-id) (supervisor-stormtorrent-path (supervisor-stormdist-root conf) storm-id)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we still use Utils/downloadFromMaster?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To download the .torrent file.
Nimbus will generate the .torrent file, then supervisors need to download it.
With the introduction of BT, we don't need the following interface of Nimbus. Why are we still keeping them?
We should also remove Utils::downloadFromMaster(). |
@anfeng I will change the classnames per your suggestion. We still need Utils::downloadFromMaster() because that's how supervisors get the .torrent file that allows them to download the content. In the future, we could possibly eliminate it entirely by using magnet links (no .torrent files), but that's not an option yet. |
@anfeng Classes renamed. All unit tests are passing. |
@ptgoetz Why don't we store torrent files in ZK? That will help us to make Nimbus stateless. Torrent files are very small, and don't see ZK will face challenges to deal with torrent files. |
I also tested this merged with the 0.9.0-rc2 code on a 5 node cluster with a 70MB topology jar and topology startup was pretty fast. |
@anfeng I thought about storing the .torrent in ZK, but ultimately decided against it. I was afraid of hitting some edge case where the size of the .torrent file exceeded the 1MB limit. My gut instinct is to keep it that way for now. In the future, however, ttorrent will likely support magnet links. Then the ZK route would definitely be the way to go since all we would have to store in ZK would be a URI. |
@ptgoetz I guess this is due to the limitation of bittorrent client lib. In a more advanced client lib, you could specify chunk size for torrent creation. At Yahoo, we have been using such features to ensure torrent size <= 40KB. |
Out of curiosity what client lib are you using at yahoo? The ttorrent maintainer is great about contributions (I contributed bandwidth throttling to support this pull request). I'm sure he'd be open to implementing configurable piece size -- and most of the groundwork is already there. Nonetheless, I would still prefer the magnet link approach over storing files in ZK. I generally avoid putting files ZK unless there is a definite guarantee that the size limit won't be exceeded. Eventually someone will include blueray rip of the Star Wars collection in their topology jar. ;) Anyway, would you be willing to move forward with this if we add an issue for completely eliminating the file transfer code from the nimbus interface? This pull request greases the wheels for HA nimbus. That work will likely involve a lot of ZK work (leader election, etc.), and might be a good time to address this. I can also look into what it would take to get magnet link support added to ttorrent. Some work has already been done in that are, but I'm not sure where it stands. Thoughts? |
If bittorrent solution makes Nimbus stateless, Nimbus HA will be very simple. For that reason, I think that we should explore the possibility on solutions getting there. Would you investigate the possibility on (1) magnet link, or (2) controling chunk size. At Yahoo, we have a file distribution solution based on libtorrent (http://libtorrent.org/). Please take a look at http://libtorrent.org/reference-Create_Torrents.html#create_torrent. |
I agree that having Nimbus be completely stateless would be ideal, but I do think that it's worth getting this pull request merged in even if Nimbus has to distribute .torrent files. The shift to Bittorrent distribution from the current approach is a much bigger shift than moving to magnet links. It's important that we get this into people's hands so that it can be tested and ironed out. That said, we should target this for 0.9.1 (from Apache release) rather than 0.9.0, since RC's shouldn't add major new features. Perhaps we can get the full Nimbus HA implementation done for that release as well. Also, I'm not sure if you were suggesting this but using libtorrent is a non-starter. Let's keep all dependencies Java-based. We had enough headaches already with 0mq native dependency issues. |
Nathan, we should not use libtorrent. I mentioned it as a reference point on how bt client could be enhanced to reduce size of torrent. Andy Feng Sent from my iPhone
|
Addresses:
https://github.com/nathanmarz/storm/issues/435
This just uses bittorrent for moving the topology jar files around, the serialized conf and topology still go through the thrift API.
When nimbus starts up, it will create a tracker on port 6969. When a topology is deployed, nimbus will create a torrent file, announce it to the tracker, and start seeding (it will seed until the topology is killed). Supervisors will grab the .torrent and use it to download the topology jar. Supervisors will share pieces during the download, but will not continue to seed once the download completes.
I'm open to any change in this behavior and/or class naming, etc.
I didn't update the logback config, but the ttorrent client is pretty chatty at the INFO level. It might be a good idea to turn that off.
The bittorrent client (https://github.com/turn/ttorrent) is not available in a public maven repo, as far as I can tell, but it is simple to build.