Using plasma on Kubernetes #1315
I am trying to use plasma to store datasets in memory and to share them between pods. I find that this does not work well; in particular, plasma.get/plasma.put tends to hang with no specific error message. I am sure other people have tried this setup, and I would love to hear about their experience.

The setup is: the plasma store running in a pod, with hugepages either enabled (-h) or not. shm hangs when get()ing an object submitted by another client; hugepages complains about missing huge pages when mmapping. Note that I could get this running using Docker containers just fine. I understand that some of those issues are due to Kubernetes more than plasma, but I would love some pointers.

cc @mitar

Comments
Hey @remram44, thanks for bringing this up! Do you have Kubernetes scripts and instructions for setting this up on EC2, so we can reproduce the issue? Any pointers are welcome.
@remram44 how did you get it working between Docker containers? Did you have to do anything special?
On Docker, I didn't have to do anything special; I ran with native Docker on macOS. However, on trying this again, it seems to work as long as I don't pass an explicit ObjectID to put(); if I do, get() hangs. My test had a server, a sender, a getter, and a sender using an explicit ObjectID (see the sketch below).
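A minimal sketch of the flow being described, assuming a store started with something like plasma_store -m 1000000000 -s /tmp/plasma; the socket path and sizes are placeholders, not necessarily the values used:

```python
import pyarrow.plasma as plasma

# Sender: let plasma pick the ObjectID; this variant worked across containers.
# (Older pyarrow releases: connect() also took manager-socket and release-delay
# arguments, e.g. plasma.connect("/tmp/plasma", "", 0).)
client = plasma.connect("/tmp/plasma")
object_id = client.put("hello")        # returns a fresh 20-byte ObjectID
print(object_id)

# Getter, possibly a second client connected to the same socket:
print(client.get(object_id))           # -> 'hello'

# Explicit sender: pass an ObjectID chosen by hand. With a malformed,
# non-20-byte ID such as b"hi", the subsequent get() hangs, as discussed below.
client.put("hello again", plasma.ObjectID(20 * b"h"))
print(client.get(plasma.ObjectID(20 * b"h")))
```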
I ran this on Kubernetes on Google Cloud with this configuration (see the sketch below). Then I ran the commands on plasma1 and plasma2.
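As a sketch of what such a configuration can look like (an assumption on my part, not the exact manifest used): plasma1 and plasma2 as two containers in one pod, sharing /dev/shm and the store's socket directory through emptyDir volumes. Image names and paths are placeholders:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: plasma-test
spec:
  volumes:
    - name: shm
      emptyDir:
        medium: Memory        # tmpfs-backed, visible to both containers
    - name: plasma-socket
      emptyDir: {}            # holds the plasma store's Unix socket
  containers:
    - name: plasma1
      image: my-plasma-image  # placeholder
      command: ["plasma_store", "-m", "50000000", "-s", "/socket/plasma"]
      volumeMounts:
        - { name: shm, mountPath: /dev/shm }
        - { name: plasma-socket, mountPath: /socket }
    - name: plasma2
      image: my-plasma-image  # placeholder
      command: ["sleep", "infinity"]
      volumeMounts:
        - { name: shm, mountPath: /dev/shm }
        - { name: plasma-socket, mountPath: /socket }
```

Commands can then be run inside each container, for instance with kubectl exec.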
@remram44 Thanks! The hanging you are seeing is unrelated to using docker. It hangs because ObjectIDs need to be exactly 20 bytes long. So even without docker, this hangs:

```
In [5]: client.put("hello", plasma.ObjectID(b"hi"))
Out[5]: ObjectID(68690000537f0000300000000000000091010000)

In [6]: client.get(plasma.ObjectID(b"hi"))
```

Whereas this works:

```
In [3]: client.put("hello", plasma.ObjectID(20 * b"h"))
Out[3]: ObjectID(6868686868686868686868686868686868686868)

In [4]: client.get(plasma.ObjectID(20*b"h"))
Out[4]: 'hello'
```

Can you check whether this also fixes the problem on Kubernetes?
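Since IDs must be exactly 20 bytes, one convenient way to derive a valid ObjectID from an arbitrary name (a suggestion of mine, not something from this thread) is to hash it; a SHA-1 digest happens to be exactly 20 bytes:

```python
import hashlib
import pyarrow.plasma as plasma

def object_id_for(name):
    # sha1 digests are exactly 20 bytes, matching plasma's ObjectID length
    return plasma.ObjectID(hashlib.sha1(name.encode()).digest())

client = plasma.connect("/tmp/plasma")  # placeholder socket path
client.put("hello", object_id_for("greeting"))
print(client.get(object_id_for("greeting")))  # -> 'hello'
```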
This should probably raise a ValueError 😅 but I agree that it's a separate problem. I'll try again with valid IDs.
I am surprised that your Docker example even works, because the Plasma store uses /dev/shm.
Hanging on an invalid ObjectID is really surprising. :-) (It is interesting that GitHub highlights the invalid ObjectID with a red background?)
I don't know why it is red :) I agree it is not good behaviour and it should give an error. I submitted a JIRA ticket and will fix it ASAP: https://issues.apache.org/jira/browse/ARROW-1919 Thanks for finding the problem!
@pcmoritz: Do you understand why sharing works between containers even if /dev/shm is not shared between them?
I do not understand it and have not tried it, but it seems to be possible to share memory between docker containers in general; see https://stackoverflow.com/questions/29173193/shared-memory-with-docker-containers-docker-version-1-4-1
It seems we would have to use the --ipc argument, but the example above does not. This is why I am confused. @remram44, which Docker version are you using? If you go into the two docker containers and create a file in the /dev/shm of one, do you see it in the other? Also, @pcmoritz, is the plasma store using /dev/shm?
By default on Linux it is using /dev/shm.
What does it store there? Does it store whole objects and then mmap them? Because that does not sound like purely in-memory storage.
So we had the same suspicion and did performance experiments with this; it behaves very much like it is in-memory. We are actually unlinking the file before writing anything, so maybe that prevents flushing to disk. This is the same strategy Google Chrome uses for its shared memory.
Do both containers have to have access to the same file in /dev/shm, or is passing the file descriptor over the socket enough?
The file descriptor is sent over the socket. That's a good point; that is probably what makes it work. And yes, /dev/shm needs to be larger than the memory you give to the plasma store.
Yeah, I would also suspect so. So I would assume the object is stored in the /dev/shm of the container running the store.
The beauty here is that the OS does refcounting on the file descriptors and will release the resources when the last refcount goes out of scope. That's why we went through the pain of making the file-descriptor sending work and unlinking the original file; the combination of these makes sure there is no garbage left behind. Not sure what happens in the docker container case, however: does the host OS do the refcounting so that everything magically works? I don't know, but I suspect so. Let me know if you plan to look into this!
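To make the mechanism concrete, here is a small self-contained sketch of the create, unlink, mmap, then send-the-fd pattern described above. It uses a socketpair in one process for brevity; in plasma the two ends live in different processes (and possibly different containers), and the file name is a placeholder:

```python
import array
import mmap
import os
import socket

PATH = "/dev/shm/plasma-demo"  # placeholder name

# Create a shared-memory file, then unlink it immediately: the name disappears,
# but the memory stays alive as long as some process holds a descriptor.
fd = os.open(PATH, os.O_CREAT | os.O_EXCL | os.O_RDWR, 0o600)
os.ftruncate(fd, 4096)
os.unlink(PATH)

writer = mmap.mmap(fd, 4096)
writer[:5] = b"hello"

# Send the descriptor itself over a Unix socket (SCM_RIGHTS). In plasma this
# crosses the store's socket; a socketpair keeps the demo self-contained.
left, right = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
left.sendmsg([b"!"], [(socket.SOL_SOCKET, socket.SCM_RIGHTS,
                       array.array("i", [fd]))])

# Receiving side: extract the (renumbered) descriptor and map the same memory.
fds = array.array("i")
msg, ancdata, flags, addr = right.recvmsg(1, socket.CMSG_SPACE(fds.itemsize))
for level, ctype, data in ancdata:
    if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
        fds.frombytes(data[:fds.itemsize])

reader = mmap.mmap(fds[0], 4096)
print(reader[:5])  # b'hello': same physical memory, no file name left behind
```

Once both ends hold descriptors, the kernel keeps the pages alive until the last one is closed, which is the refcounting behaviour described above.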
Disregarding the details, I'm extremely happy to learn that it works across multiple docker containers. That's really great :) So in the future we could use docker to get isolation between workers! And if the things stored in the object store are not pickled and use arrow data instead (pickle could be deactivated), it might even be possible to get some level of security out of this, if you trust docker's isolation.
An issue right now is that Kubernetes doesn't have an equivalent of Docker's --shm-size option.
Ok, so again running on GKE, I could get plasma to run just fine with shm (staying under Docker's default shm size of 64MB and using 20-byte object IDs), but no luck with hugepages. Support for hugepages seems to be upcoming (alpha in Kubernetes 1.8; see here). Mounting a bigger shm from the host, one way or another, seems possible. So I guess plasma is usable on Docker and Kubernetes after all, just without hugepages?
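For the plain-Docker side of this, the default 64MB /dev/shm can be raised per container with the --shm-size flag (the image name here is a placeholder):

```
docker run --shm-size=1g my-plasma-image
```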
@remram44, in the log above, your plasma store is starting with 0.010GB; that's 10MB. Hugepages in the plasma store only start working with a minimum memory allocation of 1GB.

@remram44, if you are sure you are dealing with 2MB hugepages, you could try overriding that 1GB default with, say, 10MB instead, to fit your memory configuration.
You mean that plasma doesn't work with hugepages if the store is given less than 1GB of memory?
Yes, I believe that's correct, but it's a one-line change. I think we could log an error message on startup if the specified memory is too small for hugepages.
Same error when running with more memory.
@remram44, did you set up the mount point inside the docker containers to be backed by hugetlbfs? I'm not sure if you've gone through the process of setting up the mount point; a sketch of it follows. All of this has to happen inside the docker container running the plasma store. I haven't tried it in a docker container myself, so it's not officially supported, but let's see if we can make it work together :)
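A sketch of the hugepage setup, following the standard plasma/hugetlbfs instructions; the mount point, page count, and store size are placeholders to adapt to your machine:

```
sudo mkdir -p /mnt/hugepages
sudo mount -t hugetlbfs -o uid=`id -u`,gid=`id -g` none /mnt/hugepages
sudo bash -c "echo `id -g` > /proc/sys/vm/hugetlb_shm_group"
sudo bash -c "echo 2048 > /proc/sys/vm/nr_hugepages"   # 2048 x 2MB pages = 4GB

# Point the plasma store at the hugetlbfs-backed directory:
plasma_store -s /tmp/plasma -m 1000000000 -d /mnt/hugepages -h
```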
This is just one more reason why we should use huge pages instead of /dev/shm. |
Can you use the plasma socket between pods?
Please reopen if there are more questions/updates.
Yes, it works well. We just have a host-local directory that we mount into all pods and use for the plasma socket between pods.
I have not yet found a good solution for configuring this host-local directory in a scalable way, though, if you want your pods to run on multiple nodes. From some notes I wrote about this, there seem to be two ways to achieve it (a hypothetical example of one follows).
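One common pattern for a per-node directory shared by all pods (my assumption, not necessarily @mitar's setup) is a hostPath volume; the paths and image name are placeholders:

```yaml
# In each pod spec that should reach the node-local plasma store:
volumes:
  - name: plasma-socket
    hostPath:
      path: /var/run/plasma     # placeholder host directory
containers:
  - name: worker
    image: my-worker-image      # placeholder
    volumeMounts:
      - name: plasma-socket
        mountPath: /tmp/plasma-dir
```

Running the store itself as a DaemonSet, so each node gets exactly one copy, pairs naturally with this.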
@mitar thank you for the detailed response :) |
@mitar Hi! I'm also interested in the way you mount the host-local directory into all pods.
In the end I haven't done it in a way that would support automatic scheduling, so I cannot help you much here.