Questions about installation #2

Open
BinZlP opened this issue Apr 22, 2021 · 1 comment

BinZlP commented Apr 22, 2021

I sent this question via email but got no reply, so I am posting it as an issue.

I used two nodes: one as the compute node and the other as the far-memory node.

Both nodes have the following HW specifications:

  • CPU: Intel Xeon W-2245 (3.9GHz, 8 cores)
  • Memory: 32GB
  • NIC: ConnectX3 EN 40G

and SW specifications:

NIC Drivers
  • Mellanox ConnectX3 OFED 4.2-1.0.0.0 drivers (both nodes)

OS
  • Ubuntu 16.04.7 server (both nodes)
  • Fastswap kernel, compiled and installed following the given instructions (compute node)
  • Linux kernel 4.11.0 (far-memory node)

I wanted to use 24GB of far memory, so I created a 24GB swap file and deactivated all other swap devices on my compute node.


So, the following are my questions:
[screenshot "rmserver_edit": edited farmemserver/rmserver.c]

  1. The paper says you used 32GB of memory for each node. In my case, there's an error when trying to allocate 32GB of memory for the queues with malloc(BUFFER_SIZE). I changed BUFFER_SIZE to 24GB and it works. To use less memory than the default 32GB, is it correct to modify BUFFER_SIZE in farmemserver/rmserver.c as above?
  2. drivers/fastswap_rdma.c gets the number of CPUs from num_online_cpus(), but farmemserver/rmserver.c hard-codes the number of CPUs to 8. If hyperthreading is enabled (as in the paper), num_online_cpus() will return 16 on the compute node and fastswap will try to get 48 queues from the server, but rmserver only creates 24 queues. Is it okay to change NUM_PROCS in farmemserver/rmserver.c to 16, as in the screenshot attached above? (Both edits are sketched just after this list.)
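
Roughly, the two edits look like this. This is only a sketch, assuming both values are plain #define constants in farmemserver/rmserver.c, so the exact definitions in the file may differ:

    /* farmemserver/rmserver.c -- sketch of the two edits described above;
     * check the actual definitions in the file before applying. */

    /* 1. Shrink the far-memory pool from the default 32GB to 24GB so that
     *    malloc(BUFFER_SIZE) succeeds on a 32GB machine. */
    #define BUFFER_SIZE (24ULL * 1024 * 1024 * 1024)

    /* 2. Match num_online_cpus() on the compute node (16 with
     *    hyperthreading enabled). */
    #define NUM_PROCS 16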

[screenshot "fastswap_kernel_error": kernel error on the compute node]
After I compiled and ran rmserver with the edited code, I could successfully load the fastswap_rdma and fastswap modules on the compute node. But when I tried to run the test workloads of cfm, I hit the kernel error shown above on the compute node, and the swap traffic went to the local swap space (the compute node didn't make any RDMA requests). I tried rebooting both machines and setting up far memory again, but the compute node only used the local swap file, without any errors.

  1. Have you ever experienced the same error as above? If so, could you tell me what the problem was and how you solved it?

Thank you for reading, and I would appreciate your reply.

amaro (Collaborator) commented Apr 25, 2021

I just realized that the mailing list seems to be broken. I'll try to get it fixed. Sorry about that!

The paper says you used 32GB of memory for each node. In my case, there's an error when trying to allocate 32GB of memory for the queues with malloc(BUFFER_SIZE). I changed BUFFER_SIZE to 24GB and it works. To use less memory than the default 32GB, is it correct to modify BUFFER_SIZE in farmemserver/rmserver.c as above?

You should modify the source code of the program, and recompile it (in the screenshot, it seems you modified only the comment).

drivers/fastswap_rdma.c gets the number of CPUs from num_online_cpus(), but farmemserver/rmserver.c hard-codes the number of CPUs to 8. If hyperthreading is enabled (as in the paper), num_online_cpus() will return 16 on the compute node and fastswap will try to get 48 queues from the server, but rmserver only creates 24 queues. Is it okay to change NUM_PROCS in farmemserver/rmserver.c to 16, as in the screenshot attached above?

We didn't use hyperthreading in our experiments; quoting from the paper: "We use one hyperthread on each core and disable TurboBoost and CPU frequency scaling in order to reduce variability."
The number of queues must match exactly between fastswap and the memory server. So yes, modifying (and recompiling) rmserver so that it creates the same number of queues fastswap is trying to create should work.
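
As a rough illustration (not code from the repository), and assuming fastswap requests three queues per CPU, which is what the 8 -> 24 and 16 -> 48 counts above imply, you can derive the NUM_PROCS value the memory server needs from the compute node's online CPU count:

    /* Sketch only: compute the NUM_PROCS value rmserver needs from the
     * compute node's online CPU count. QUEUES_PER_CPU = 3 is an
     * assumption inferred from the queue counts discussed in this issue. */
    #include <stdio.h>
    #include <unistd.h>

    #define QUEUES_PER_CPU 3

    int main(void)
    {
        /* Corresponds to what num_online_cpus() reports inside the kernel. */
        long cpus = sysconf(_SC_NPROCESSORS_ONLN);
        printf("compute node online CPUs    : %ld\n", cpus);
        printf("queues fastswap will request: %ld\n", cpus * QUEUES_PER_CPU);
        printf("NUM_PROCS for rmserver      : %ld\n", cpus);
        return 0;
    }

Run it on the compute node; with hyperthreading enabled it should report 16 CPUs and 48 queues.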

Have you ever experienced the same error as above? If so, could you tell me what the problem was and how you solved it?

No, I haven't seen this error before. Can you post your dmesg output from boot until the error shows up?

Can you also post the output of using ib_read_lat between the server and client?

amaro self-assigned this Apr 25, 2021
amaro added the question (Further information is requested) label Apr 25, 2021