Please bring back Torus #452

SkinyMonkey · 2018-09-06T15:56:09Z

While working on a kubernetes cluster running scientific programs I stumbled upon a small problem.
I was using NFS to share temporary files between the nodes and noticed that the NFS server was a single point of failure.

So, there I go, trying to find a solution to this small bump on my happy kubernetes road.
But the more I look, the more I read, the more I understand that the distributed, redundant storage problem was never adressed correctly.

The systems are old, big, hard to deploy and hard to maintain.

As it was pinpoint in the origin torus blog, they were not designed for the cloud era.

Torus seemed like a real clean solution and I hope it will be back someday ...

Thank you for taking the time to read this and have a nice day!

philips · 2018-09-06T15:59:38Z

Have you checked out https://github.com/rook/rook?

SkinyMonkey · 2018-09-06T16:22:18Z

I did, it deploys a lot of components, making it hard to debug by simply reading the logs. Plus it needs an up to date kernel, which is complex in my case : the cloud provider (azure) does not allow me to choose which OS version is running on their AKS kubernetes nodes. In a case of horizontal autoscaling I would have to reboot the machine after a kernel update ... far from fast, far from guaranteed stability. Right now I didn't find any clean alternative to the simple NFS solution. We might end up using rook + minio or something like scality's product, but still, seems overkill for a temporary file system ...

…

On Thu, Sep 6, 2018 at 12:00 PM Brandon Philips ***@***.***> wrote: Have you checked out https://github.com/rook/rook? — You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub <#452 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AAO2RDCqE-9dmaINRnGb83lD2ag1-DUGks5uYUaqgaJpZM4WdPIr> .

-- Adrien Candiotti login : candio_a epitech 2013 - astek - koala

Ulexus · 2018-09-06T17:37:23Z

@SkinyMonkey If you are in a single cloud provider, why are you not simply using the provided storage services? Azure offers both NFS-like storage and volume-like storage. AKS already has storage classes for azure-disk (volume) types out of the box.

Torus' use cases (in my opinion) are for multi-cloud architectures and baremetal clusters, where you are not already provided an economical storage infrastructure. Rook is serving those needs well, and since it is just Ceph under the hood, it is pretty easily debuggable. By and large, the Go codebase of Rook's operators and wrappers is pretty comprehensible and well-architected.

SkinyMonkey · 2018-09-06T17:53:14Z

Sadly the NFS like storage, AzureFile, is extremely slow, to the point of not being usable in a realistic environment. This has been reported multiple times since 2017 and still hasn't been adressed.
About the volume like storage, as far as I know you cannot attach the same azure disk to multiple pods, making it useless in a shared file system context.
If it does that would be a good temporary solution, but ..

the product I'm working on aims at being deployable on premise anyway. We need an abstract solution to this problem.

Rook is nice, and maybe i'm looking at it from the wrong angle, but again, it's an overkill solution for a simple temporary shared file system.
It also has prerequisites that makes it difficult to use in my case, as I described earlier (kernel updates etc)

I might use Rook or a solution like Scality's or Minio in the end, it's just that torus seemed like a really good project that could benefit a lot of people.
Right now I'm stuck between overkill solutions that need more and more infrastructure/maintenance/development to support it and simple ones that wont survive a node crash.

Ulexus · 2018-09-06T18:13:05Z

There exist other storage solutions for baremetal clusters, but among them, I would only use Rook/Ceph. Kubernetes does support NFS natively, though I think only a madman would actually use NFS in production (though I understand there are many people who will vehemently disagree).

Personally, I would stay away from any shared-filesystem type approach. You will either trade speed or reliability, and there is almost always a better way to achieve the actual requirements than sharing POSIX-like filesystems amongst disparate machines.

If you really must have a shared filesystem, GlusterFS is another option. If offers a native shared filesystem as well as an NFS compatibility later. It is supported by Kubernetes out-of-the-box. Since it is simpler than Ceph and concentrates just on a shared filesystem, that is probably the most appropriate answer to your needs. Keep in mind that, like any distributed storage system, you must install n+1 (i.e. minimum 3 nodes) for redundancy, and it is easy to overlook this fact in the GlusterFS documentation.

SkinyMonkey · 2018-09-06T18:47:55Z

I've checked/considered a lot. Rook/Ceph is in the top 5 with Glusterfs.
I agree That NFS offers no guarantee of reliability.

The product I'm working on demands that I share files between the pods i'm executing. Big files too, hundred of gigs.
Another solution would be to execute all the pods on the same node .. but it would be losing the real plus of kubernetes: distribute the computing charge over several machines.

Glusterfs was considered but I met the same problem as I met with rook : kernel updates or modules are needed.
Any distributed system needs at least 3 nodes to run yes, Mongodb, Rabbitmq, Etcd, any clustered program needs 3 nodes fo redundancy. K8s is another example.

Thank you, I'll recheck Glusterfs, maybe i'll have two separate clusters : one for the file system, one for the computation. That way the filesystem will be less subjects to autoscaling needs and problems.

Ulexus · 2018-09-06T18:58:07Z

Separate clusters would be precisely my advice, in that case, yes.

Also to note, though, since your ultimate goal is to deploy to baremetal, you might as well be using bare VM instances running your own kernel, rather than the cloud provider's prefabricated kubernetes, which would give you exactly the same freedom of deployment, kernel-wise, as you would have with baremetal. You can easily also easily follow a progressive build-out, to reduce the size of each hurdle:

Deploy to AKS using offered (however slow) shared filesystem there.
Deploy to Azure bare VMs and construct your storage layer.
Deploy to baremetal.

The storage interfaces for Kubernetes are nicely abstracted, so you don't need to change your applications just because you change your underlying storage infrastructure. Your YAML files may change, but your application interfaces will be the same throughout that progression.

I know I keep harping, but I would have very serious reservations about an architecture which calls for shared filesystems, much less with individual files of that size. In my opinion, you are just asking for trouble. I'm not about to dive into an architectural analysis, but I would strongly urge you to take a step back and really analyze your storage parameter needs.

SkinyMonkey · 2018-09-06T19:18:25Z

Thank you for your time and analysis, we work on both hosted and on premise solutions.

I'm not sure to understand the problems tied to a shared file system. Maybe because I never worked with one in production?
In a case where a chain of interdependant programs produce heavy files, I would be curious to see what solution, that avoid using a shared file system, would allow the system to work in a reasonably fast way.

Edit: by juggling with a 'per job' azure disk for example, I could build something in a hosted situation i guess. But i'm not sure how that would work on premise.

Ulexus · 2018-09-06T21:13:27Z

The core problem is always locking, and POSIX-oriented filesystem semantics just don't provide sufficient locking guarantees or semantics to facilitate safe operation. Various tricks are used to work around this limitation, and the trade-off is always either speed or safety.

If a 'per job' azure disk is something you are considering, consider using an object store instead. Every major cloud provider supplies one, and there are a few options for self-hosted object stores (including Rook/Ceph). You can have any number of readers, and they handle locking automatically. For extremely large chunks of data, too, most object store APIs (the de facto standard is S3) allow for server-side seeking, to efficiently process smaller chunks at a time. In general, the APIs are very simple, so it is nearly as easy to use as a filesystem.

zhengxiaochuan-3 · 2018-09-07T05:13:58Z

https://github.com/tiglabs/containerfs
https://github.com/tiglabs/containerfs-csi-driver

SkinyMonkey · 2018-09-07T13:01:58Z

@Ulexus Ok! That makes sense. I did see some wait happening on read operation with NFS. 'Could have been caused by the locking.

I'm going to check object stores on Azure and the on premise solutions.
Thanks again :)

@zhengxiaochuan-3 , I did found this repo but the lack of documentation or orientation made me consider other solutions. Thank you

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Please bring back Torus #452

Please bring back Torus #452

SkinyMonkey commented Sep 6, 2018

philips commented Sep 6, 2018 •

edited

Loading

SkinyMonkey commented Sep 6, 2018 via email

Ulexus commented Sep 6, 2018

SkinyMonkey commented Sep 6, 2018 •

edited

Loading

Ulexus commented Sep 6, 2018

SkinyMonkey commented Sep 6, 2018 •

edited

Loading

Ulexus commented Sep 6, 2018

SkinyMonkey commented Sep 6, 2018 •

edited

Loading

Ulexus commented Sep 6, 2018

zhengxiaochuan-3 commented Sep 7, 2018

SkinyMonkey commented Sep 7, 2018

Please bring back Torus #452

Please bring back Torus #452

Comments

SkinyMonkey commented Sep 6, 2018

philips commented Sep 6, 2018 • edited Loading

SkinyMonkey commented Sep 6, 2018 via email

Ulexus commented Sep 6, 2018

SkinyMonkey commented Sep 6, 2018 • edited Loading

Ulexus commented Sep 6, 2018

SkinyMonkey commented Sep 6, 2018 • edited Loading

Ulexus commented Sep 6, 2018

SkinyMonkey commented Sep 6, 2018 • edited Loading

Ulexus commented Sep 6, 2018

zhengxiaochuan-3 commented Sep 7, 2018

SkinyMonkey commented Sep 7, 2018

philips commented Sep 6, 2018 •

edited

Loading

SkinyMonkey commented Sep 6, 2018 •

edited

Loading

SkinyMonkey commented Sep 6, 2018 •

edited

Loading

SkinyMonkey commented Sep 6, 2018 •

edited

Loading