Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace jar distribution strategy with bittorrent #629

Open
wants to merge 17 commits into
base: master
Choose a base branch
from
Open

Replace jar distribution strategy with bittorrent #629

wants to merge 17 commits into from

Conversation

ptgoetz
Copy link
Collaborator

@ptgoetz ptgoetz commented Jul 17, 2013

Addresses:
https://github.com/nathanmarz/storm/issues/435

This just uses bittorrent for moving the topology jar files around, the serialized conf and topology still go through the thrift API.

When nimbus starts up, it will create a tracker on port 6969. When a topology is deployed, nimbus will create a torrent file, announce it to the tracker, and start seeding (it will seed until the topology is killed). Supervisors will grab the .torrent and use it to download the topology jar. Supervisors will share pieces during the download, but will not continue to seed once the download completes.

I'm open to any change in this behavior and/or class naming, etc.

I didn't update the logback config, but the ttorrent client is pretty chatty at the INFO level. It might be a good idea to turn that off.

The bittorrent client (https://github.com/turn/ttorrent) is not available in a public maven repo, as far as I can tell, but it is simple to build.

@ptgoetz
Copy link
Collaborator Author

ptgoetz commented Jul 17, 2013

This overlaps some with the HA Nimbus work, but the two can work together.

For example, torrent files can list multiple trackers (I.e. nimbus). I'll take a closer look at the ha stuff when I'm not on a phone. ;)

@nathanmarz
Copy link
Owner

We should have all topology files (conf etc.) managed by this system. This will make HA Nimbus much easier and remove the need for the storage abstraction and dependencies on external systems like HDFS. We can set a policy that at least N Nimbus's need to have the full topology files before the topology gets launched (but files will eventually be copied to all Nimbus's if more than N).

Is it possible to set max download and upload throughputs on each node? We would want to set these limits to prevent supervisors from taking on too much load from this.

As for the Bittorrent library, I can just upload that to the same Maven repo used for other Storm jars once this pull request is done.

@ptgoetz ptgoetz closed this Jul 18, 2013
@ptgoetz ptgoetz reopened this Jul 18, 2013
@ptgoetz
Copy link
Collaborator Author

ptgoetz commented Jul 18, 2013

Then the way to go is to package everything in one multi-file torrent (topo.jar, topo.ser, topo-conf.ser), and distribute that.. That's a relatively easy change... Do you want me to add that?

As for throughput, the way I have it now is nimbus will take the most load, depending on the size of the .jar. If it is massive, then there will be more sharing between supervisor clients. Otherwise, the supervisors may get the parts from nimbus before they discover the other supervisor seeds.

I'll also look into what throttling options are available, with the ttorrent lib.

@nathanmarz
Copy link
Owner

OK. The throttling is really important, so I won't merge this in until that's in place. We can make the throttle amounts configurable via the storm.yaml.

@ptgoetz
Copy link
Collaborator Author

ptgoetz commented Jul 18, 2013

I delved into the tttorrent code and found that it does not support throttling dl/ul rates, but I will look into what it would take to add it.

Can you elaborate on your concern with the supervisors taking on too much load? In the current approach, only nimbus acts as a seeder. The supervisors act as leechers and stop sharing as soon as they complete the download. The supervisors never seed. For the most part, the supervisors will prefer downloading parts from nimbus (since nimbus has the complete file), and only share amongst themselves while downloading if a supervisor peer could offer a part with better performance than the nimbus peer.

So the supervisors are "download heavy" and the nimbus's are "upload heavy".

I'll move forward with including all topology files as part of the torrent download. Let me know if you still think lack of throttling is a showstopper, and I'll look into adding/requesting that feature for ttorrent.

@revans2
Copy link
Contributor

revans2 commented Jul 18, 2013

I love the idea of using bit-torrent, but bit-torrent does not support authentication or authorization, except through extensions like http://www.rasterbar.com/products/libtorrent/auth.html that ttorrent does not support. This is a bit of a regression as the current thrift APIs do have the beginnings of auth. Seeing how this code does not yet remove the thrift upload/download APIs would it be possible to have the distribution mechanism configurable until we can work out the auth model?

@nathanmarz
Copy link
Owner

This is a case where making it pluggable increases complexity. The Bittorrent based distribution greatly simplifies the construction of HA Nimbus (completely removes reliance on external systems), and it is clearly the optimal approach for distributing static files around the cluster. So I'd like us to figure out how to support our auth needs (or at least have a plan for it) within the context of a Bittorrent based approach. Perhaps this will require modifications to ttorrent, and that's fine.

@ptgoetz So the reason for the importance of throttling is it provides a guarantee of load on a supervisor. First of all, I think supervisors should be allowed to seed files and participate in sharing after they've finished downloading. This ensures jar distribution is fast and scalable to large clusters. But we need to make sure that the distribution process can only have a limited effect on active topologies – hence the throttling. Resource usage should always be explicit and not rely on implicit effects of the design. Things change, and my experience has repeatedly shown that if you want a guarantee, you must be explicit about it.

@revans2
Copy link
Contributor

revans2 commented Jul 18, 2013

I understand the desire to not clutter things up too much with plug-ability. I am fine with just having a plan for auth initially. I am also happy to help implement it. I just want to be sure that it is not missed.

@nathanmarz
Copy link
Owner

Absolutely, you brought up a great point regarding auth.

@ptgoetz
Copy link
Collaborator Author

ptgoetz commented Jul 18, 2013

Ok. I will update the code so all the topology files are distributed via BitTorrent.

Are you willing to accept BitTorrent throttling (and auth) as separate pull requests? For those I will need to fork the ttorrent project and make the necessary modifications, which may take a while.

IMHO, the switch to BitTorrent is a good first step toward HA, and the choking mechanism in the BitTorrent protocol will help alleviate overload on supervisors. When the BitTorrent feature is in place, the planned HA features can move forward. When throttling is available, it would be a simple addition (I could even put the hooks in place now).

I can also add a configurable seed duration for supervisors (e.g. Don't seed, seed indefinitely, or seed for X seconds).

@nathanmarz
Copy link
Owner

Throttling will have to be part of this pull request, auth can be in a separate pull request.

@ptgoetz
Copy link
Collaborator Author

ptgoetz commented Jul 19, 2013

I dove into the ttorrent code last night and have a crude, but functional means of throttling throughput. Now I have a question and a caveat.

The question is, do you want to throttle at the peer level, or the torrent level? I'm assuming the torrent level so I'm headed in that direction.

The caveat is that the throttling is not exact, nor is the logging/reporting of throuhput. So setting max ul/dl thresholds is more like a "hint." For example if I set a threshold of 50 kb/sec., the actual throughput will fluctuate between ~45-55 kb/sec. but average very close to 50. The ttorrent logs will also report inaccurate throughputs. But testing with several bittorrent clients, throughput was very close to the requested threshold.

Let me know if that's okay and I'll proceed.

Thanks,
Taylor

@nathanmarz
Copy link
Owner

Well we'd want to throttle the overall rate for the supervisor. I think that's what you mean by "peer". If the easiest way to do that is to restrict the supervisor to downloading / sharing one torrent at a time, that could be an acceptable first step. And then we'll open up an issue to fix that. The trick if taking that implementation strategy is choosing which torrent that supervisor should share.

We'd also want to add a special case for when the supervisor has no active workers in which there would be no ul/dl limits.

I think it's ok for there to be minor fluctuation in the ul/dl rate.

@ptgoetz
Copy link
Collaborator Author

ptgoetz commented Jul 22, 2013

@nathanmarz
Copy link
Owner

@ptgoetz So that implements peer-level throttling?

@ptgoetz
Copy link
Collaborator Author

ptgoetz commented Jul 24, 2013

@nathanmarz

Throttling has been added to ttorrent at the torrent level: https://github.com/turn/ttorrent/pull/49

But a potential bug was introduced in a separate pull request that got merged around the same time mine did: https://github.com/turn/ttorrent/issues/51

So if you want to see it action, I'd suggest pulling from my fork/branch until that gets sorted out.

The throttling in ttorrent works like so:

  • A ttorrent client instance has exactly one torrent.
  • The client can share (upload/download) with n peers (a peer is just another client that participates in downloading/uploading).
  • upload/download rate limits are set at the client/torrent level, the amount each peer is throttled will depend on how many peers are connected. For example, if you have a client sharing a torrent with 5 leeching (downloading) peers, and you set the max upload rate to 50 kB/sec., each downloading peer will only get 10 kB/sec. If 4 of the 5 leechers stop downloading (or disconnect), the remaining downloading peer will be able to download at 50 kB/sec.

How I see this playing out in storm: (I'm totally open to suggestions, etc.)

  • Nimbus('s)/Supervisors will have separate global upload/download rate limits (a limit <= 0.0 == unlimited).
  • "Toplogy Torrents" will contain: stormjar.jar, stormconf.ser, stormcode.ser (let me know if I'm missing anything).
  • Each Nimbus/Supervisor will have 1 topology torrent for every deployed topology. (right now I'm naming the file "${storm-id}.torrent")
  • If a rate limit is set for Nimbus/Supervisor, the amount of bandwidth allocated to each torrent/client will depend on the number of deployed topologies. For example, if I set nimbus.bittorrent.max.upload.rate: 50.0 and there are two topologies deployed, the 50 kB/sec. bandwidth will be split between the two (25 kB/sec each). Killing one of the two toplogies will rebalance the bandwidth allocation, so the remaining toplogy will then get the full 50 kB/sec.

Seeding:

  • By default, Nimbus will seed a topology torrent until the topology is killed.
  • Supervisors can be configured with: download-only, seed-for-duration, seed-indefinitely

Is that an acceptable feature set for this pull request?

I'd like to nail down those features and move on to sorting the ramifications it will have on supervisor.clj, et. al. There's some logic in there that I have yet to fully grok (sync, download cleanup, etc.), and I may need to lean on you for some clarification/assistance to make sure everything's right.

I'm still in a WIP state at this point, but largely functional. I can deploy a topology such that all its files are distributed (and rate limited) via bittorrent, the topology functions, and supervisor will recover from kill -9-ing a worker.

(I should also note that I haven't pushed many of the above changes yet).

-Taylor

@nathanmarz
Copy link
Owner

OK, sounds good. Let me know when all these changes are in there.

@ptgoetz
Copy link
Collaborator Author

ptgoetz commented Jul 24, 2013

OK, the changes are in.

Like I said, I'm not sure I got everything right in nimbus.clj and supervisor.clj -- specifically the sync-processes function. That process deletes the .torrent file, but that doesn't affect sharing since the torrent is already active (or complete) at that point.

Until this pull request gets merged, you'll probably want to use my branch of ttorrent: https://github.com/ptgoetz/ttorrent/tree/rate-limits

@ptgoetz
Copy link
Collaborator Author

ptgoetz commented Jul 25, 2013

Fixes are in to ttorrent... So we're good to use the master branch of that dependency now.

@ptgoetz
Copy link
Collaborator Author

ptgoetz commented Aug 2, 2013

@nathanmarz Have you had any free cycles to take a look at this? (No problem if you haven't, I'm just wondering if it is still under consideration.)

@xumingming
Copy link
Collaborator

+1

@@ -1151,12 +1157,21 @@
(FileUtils/copyFile src-file (File. (master-stormjar-path stormroot)))
))

(defmethod mk-bt-tracker :distributed [conf]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we require the use of bit-torrent distribution? It will be better if we could make it optional. For simple topology, one may just prefer not to use BT at all.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it is required. There was some discussion around making the distribution mechanism pluggable, but Nathan argued against it, as making it pluggable would increase complexity (#629 (comment)).

@ptgoetz
Copy link
Collaborator Author

ptgoetz commented Sep 26, 2013

@anfeng Thanks for the input. I pushed a commit to address your concerns.

Let me know if you feel strongly about the "Tracker" class names.

@anfeng
Copy link
Collaborator

anfeng commented Sep 27, 2013

@ptgoetz I could live with NimbusTracker, but don't like SupervisorTracker. How about that we rename it to SupervisorPeer? We may want to rename BaseTracker to BasePeer.

(extract-dir-from-jar (supervisor-stormjar-path tmproot) RESOURCES-SUBDIR tmproot)
(FileUtils/moveDirectory (File. tmproot) (File. stormroot))
(FileUtils/forceMkdir (File. (supervisor-stormdist-root conf)))
(Utils/downloadFromMaster conf (master-stormtorrent-path master-code-dir storm-id) (supervisor-stormtorrent-path (supervisor-stormdist-root conf) storm-id))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we still use Utils/downloadFromMaster?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To download the .torrent file.

Nimbus will generate the .torrent file, then supervisors need to download it.

@anfeng
Copy link
Collaborator

anfeng commented Sep 27, 2013

With the introduction of BT, we don't need the following interface of Nimbus. Why are we still keeping them?

  • beginFileDownload
  • downloadChunk

We should also remove Utils::downloadFromMaster().

@ptgoetz
Copy link
Collaborator Author

ptgoetz commented Sep 27, 2013

@anfeng I will change the classnames per your suggestion.

We still need Utils::downloadFromMaster() because that's how supervisors get the .torrent file that allows them to download the content.

In the future, we could possibly eliminate it entirely by using magnet links (no .torrent files), but that's not an option yet.

@ptgoetz
Copy link
Collaborator Author

ptgoetz commented Sep 27, 2013

@anfeng Classes renamed. All unit tests are passing.

@anfeng
Copy link
Collaborator

anfeng commented Sep 27, 2013

@ptgoetz Why don't we store torrent files in ZK? That will help us to make Nimbus stateless. Torrent files are very small, and don't see ZK will face challenges to deal with torrent files.

@ptgoetz
Copy link
Collaborator Author

ptgoetz commented Sep 27, 2013

I also tested this merged with the 0.9.0-rc2 code on a 5 node cluster with a 70MB topology jar and topology startup was pretty fast.

@ptgoetz
Copy link
Collaborator Author

ptgoetz commented Sep 27, 2013

@anfeng I thought about storing the .torrent in ZK, but ultimately decided against it. I was afraid of hitting some edge case where the size of the .torrent file exceeded the 1MB limit. My gut instinct is to keep it that way for now.

In the future, however, ttorrent will likely support magnet links. Then the ZK route would definitely be the way to go since all we would have to store in ZK would be a URI.

@anfeng
Copy link
Collaborator

anfeng commented Sep 27, 2013

@ptgoetz I guess this is due to the limitation of bittorrent client lib. In a more advanced client lib, you could specify chunk size for torrent creation. At Yahoo, we have been using such features to ensure torrent size <= 40KB.

@ptgoetz
Copy link
Collaborator Author

ptgoetz commented Sep 27, 2013

@anfeng

Out of curiosity what client lib are you using at yahoo?

The ttorrent maintainer is great about contributions (I contributed bandwidth throttling to support this pull request). I'm sure he'd be open to implementing configurable piece size -- and most of the groundwork is already there.

Nonetheless, I would still prefer the magnet link approach over storing files in ZK. I generally avoid putting files ZK unless there is a definite guarantee that the size limit won't be exceeded. Eventually someone will include blueray rip of the Star Wars collection in their topology jar. ;)

Anyway, would you be willing to move forward with this if we add an issue for completely eliminating the file transfer code from the nimbus interface?

This pull request greases the wheels for HA nimbus. That work will likely involve a lot of ZK work (leader election, etc.), and might be a good time to address this.

I can also look into what it would take to get magnet link support added to ttorrent. Some work has already been done in that are, but I'm not sure where it stands.

Thoughts?

@anfeng
Copy link
Collaborator

anfeng commented Sep 28, 2013

@ptgoetz

If bittorrent solution makes Nimbus stateless, Nimbus HA will be very simple. For that reason, I think that we should explore the possibility on solutions getting there. Would you investigate the possibility on (1) magnet link, or (2) controling chunk size.

At Yahoo, we have a file distribution solution based on libtorrent (http://libtorrent.org/). Please take a look at http://libtorrent.org/reference-Create_Torrents.html#create_torrent.
* create_torrent (file_storage& fs, int piece_size, ...)
where piece_size is the size of each piece in bytes. If a piece size of 0 is specified, a piece_size will be calculated such that the torrent file is roughly 40 KB.

@nathanmarz
Copy link
Owner

I agree that having Nimbus be completely stateless would be ideal, but I do think that it's worth getting this pull request merged in even if Nimbus has to distribute .torrent files. The shift to Bittorrent distribution from the current approach is a much bigger shift than moving to magnet links. It's important that we get this into people's hands so that it can be tested and ironed out.

That said, we should target this for 0.9.1 (from Apache release) rather than 0.9.0, since RC's shouldn't add major new features. Perhaps we can get the full Nimbus HA implementation done for that release as well.

Also, I'm not sure if you were suggesting this but using libtorrent is a non-starter. Let's keep all dependencies Java-based. We had enough headaches already with 0mq native dependency issues.

@anfeng
Copy link
Collaborator

anfeng commented Sep 28, 2013

Nathan, we should not use libtorrent. I mentioned it as a reference point on how bt client could be enhanced to reduce size of torrent.

Andy Feng

Sent from my iPhone

On Sep 28, 2013, at 1:16 PM, Nathan Marz notifications@github.com wrote:

I agree that having Nimbus be completely stateless would be ideal, but I do think that it's worth getting this pull request merged in even if Nimbus has to distribute .torrent files. The shift to Bittorrent distribution from the current approach is a much bigger shift than moving to magnet links. It's important that we get this into people's hands so that it can be tested and ironed out.

That said, we should target this for 0.9.1 (from Apache release) rather than 0.9.0, since RC's shouldn't add major new features. Perhaps we can get the full Nimbus HA implementation done for that release as well.

Also, I'm not sure if you were suggesting this but using libtorrent is a non-starter. Let's keep all dependencies Java-based. We had enough headaches already with 0mq native dependency issues.


Reply to this email directly or view it on GitHub.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants