I sent this question by email but got no reply, so I am posting it here as an issue.
I used two nodes: one as the compute node and the other as the far memory node.
Both nodes have the following HW specifications:
CPU: Intel Xeon W-2245 (3.9GHz, 8-cores)
Memory: 32GB
NIC: ConnectX3 EN 40G
and SW specifications:
NIC Drivers
Mellanox ConnectX3 OFED 4.2-1.0.0.0 drivers (Both nodes)
OS
Ubuntu 16.04.7 server (Both nodes)
Fastswap kernel, compiled and installed following the provided instructions (Compute node)
Linux kernel 4.11.0 (Far memory node)
I wanted to use 24GB of far memory, so I created a 24GB swap file and deactivated all other swap devices on the compute node.
My questions are as follows:
[ farmemserver/rmserver.c ]
The paper says you used 32GB of memory on each node. In my case, malloc(BUFFER_SIZE) fails when it tries to allocate 32GB for the queues. After I changed BUFFER_SIZE to 24GB, it works. To use less memory than the default 32GB, is modifying BUFFER_SIZE in farmemserver/rmserver.c, as shown above, the right approach?
drivers/fastswap_rdma.c takes the number of CPUs from num_online_cpus(), but farmemserver/rmserver.c hard-codes the number of CPUs to 8. If hyperthreading is enabled (as in the paper), num_online_cpus() returns 16 on the compute node, which then tries to get 48 queues from the server, but rmserver only creates 24 queues. Is it okay to change NUM_PROCS in farmemserver/rmserver.c to 16, as in the screenshot attached above?
After compiling and running rmserver with the edited code, I could load the fastswap_rdma and fastswap modules on the compute node successfully. But when I ran the cfm test workloads, I hit the kernel error shown above on the compute node, and swap traffic went to the local swap space (the compute node made no RDMA requests). I rebooted both machines and set up far memory again, but the compute node used only the local swap file, without reporting any errors.
Have you ever experienced the same error as above? If so, could you tell me what was the problem and how you solved it?
Thank you for reading. I would appreciate your reply.
I just realized that the mailing list seems to be broken. I'll try to get it fixed. Sorry about that!
The paper says you used 32GB of memory on each node. In my case, malloc(BUFFER_SIZE) fails when it tries to allocate 32GB for the queues. After I changed BUFFER_SIZE to 24GB, it works. To use less memory than the default 32GB, is modifying BUFFER_SIZE in farmemserver/rmserver.c, as shown above, the right approach?
You should modify the source code of the program, and recompile it (in the screenshot, it seems you modified only the comment).
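As a rough illustration of the sizing change discussed here, the sketch below computes a buffer size in bytes with 64-bit arithmetic and checks the result of malloc. The names gib_to_bytes and alloc_far_memory_buffer are hypothetical and do not come from rmserver.c; how BUFFER_SIZE is actually defined there is not shown in this thread, so treat this only as one way to express "24GB" safely.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: convert GiB to bytes using 64-bit arithmetic,
 * so the multiplication cannot overflow a 32-bit int. */
static uint64_t gib_to_bytes(uint64_t gib)
{
    /* Guard against overflow of the shift below. */
    assert(gib <= (UINT64_MAX >> 30));
    return gib << 30; /* 1 GiB = 2^30 bytes */
}

/* Hypothetical helper: allocate the buffer and report failure instead of
 * continuing with a NULL pointer. Returns NULL on failure. */
static void *alloc_far_memory_buffer(uint64_t gib)
{
    uint64_t bytes = gib_to_bytes(gib);

    if (bytes > SIZE_MAX) {
        fprintf(stderr, "%llu bytes does not fit in size_t\n",
                (unsigned long long)bytes);
        return NULL;
    }

    void *buf = malloc((size_t)bytes);
    if (buf == NULL)
        perror("malloc");
    return buf;
}
```

With these helpers, a 24GB buffer would be requested as alloc_far_memory_buffer(24), and the caller should exit if it returns NULL. Whatever form the change takes, the server has to be recompiled for it to take effect.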
drivers/fastswap_rdma.c takes the number of CPUs from num_online_cpus(), but farmemserver/rmserver.c hard-codes the number of CPUs to 8. If hyperthreading is enabled (as in the paper), num_online_cpus() returns 16 on the compute node, which then tries to get 48 queues from the server, but rmserver only creates 24 queues. Is it okay to change NUM_PROCS in farmemserver/rmserver.c to 16, as in the screenshot attached above?
We didn't use hyperthreading in our experiments; quoting from the paper: "We use one hyperthread on each core and disable TurboBoost and CPU frequency scaling in order to reduce variability."
The number of queues must match exactly both in fastswap and in the memory server. So yes, modifying (and recompiling) the rmserver to create the same number of queues fastswap is trying to create should work.
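The arithmetic behind the matching requirement can be sketched as follows. The factor of three queues per CPU is an assumption inferred from the figures quoted in this thread (16 CPUs requesting 48 queues, 8 CPUs giving 24); the names QUEUES_PER_CPU, expected_queues, and queue_counts_match are hypothetical and are not taken from the fastswap sources.

```c
#include <assert.h>
#include <limits.h>

/* Assumption: three queues per CPU, inferred from the numbers in this
 * thread (16 CPUs -> 48 queues, 8 CPUs -> 24 queues). */
enum { QUEUES_PER_CPU = 3 };

/* Number of queues implied by a given CPU count. */
static unsigned int expected_queues(unsigned int cpus)
{
    assert(cpus <= UINT_MAX / QUEUES_PER_CPU); /* no overflow */
    return cpus * QUEUES_PER_CPU;
}

/* Returns 1 when the compute node's online CPU count and the server's
 * configured CPU count yield the same number of queues, 0 otherwise. */
static int queue_counts_match(unsigned int client_online_cpus,
                              unsigned int server_num_procs)
{
    return expected_queues(client_online_cpus) ==
           expected_queues(server_num_procs);
}
```

For the setup described in the question, queue_counts_match(16, 8) is 0 (48 queues requested, 24 created), while queue_counts_match(16, 16) is 1. Disabling hyperthreading on the compute node so that it reports 8 CPUs would be the other way to make the two sides agree.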
Have you ever experienced the same error as above? If so, could you tell me what was the problem and how you solved it?
No, I haven't seen this error before. Can you post your dmesg output from boot until the error shows up?
Can you also post the output of running ib_read_lat between the server and client?