Build fails in rust-kernel-module during compilation #4

Open
ShuguiW opened this issue Jul 10, 2024 · 7 comments
@ShuguiW

ShuguiW commented Jul 10, 2024

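Hello. When I run make km, an error occurs while it compiles rust-kernel-module; the error output is below. I located the build.rs file and added print statements to it, and the failure appears to happen at the line let bindings = builder.generate().expect("Unable to generate bindings");. The same error also occurs at the same statement when compiling rust-kernel-linux-util. I would appreciate any advice on what causes the error and how to fix it.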
(base) crow@crow-H310M-T-PRO:~/mitosis-core$ make km
cd mitosis-kms ; python build.py fork

  • [fork]
  • [RUNNING] ['make', '-C', '/home/crow/mitosis-core/mitosis-kms', 'TEST_NAME=fork', 'TEST_PATH=fork']
    make[1]: Entering directory '/home/crow/mitosis-core/mitosis-kms'
    cp -f /usr/src/ofa_kernel/default/Module*.symvers /home/crow/mitosis-core/mitosis-kms/Module.symvers
    make -C /lib/modules/4.15.0-46-generic/build M=/home/crow/mitosis-core/mitosis-kms CC=clang-9 CONFIG_CC_IS_CLANG=y
    make[2]: Entering directory '/usr/src/linux-headers-4.15.0-46-generic'
    cd /home/crow/mitosis-core/mitosis-kms/fork; CARGO_TARGET_DIR=../target cargo build -Z build-std=core,alloc --target=x86_64-unknown-none-linuxkernel --features "mitosis krdma-test cow use_rc" --no-default-features
    warning: /home/crow/mitosis-core/mitosis/Cargo.toml: dependency (x86_64) specified without providing a local path, Git repository, or version to use. This will be considered an error in future versions
    Compiling linux-kernel-module v0.1.0 (/home/crow/mitosis-core/deps/krcore/rust-kernel-rdma/deps/rust-kernel-module)
    Compiling rust-kernel-linux-util v0.1.0 (/home/crow/mitosis-core/deps/krcore/rust-kernel-rdma/rust-kernel-linux-util)
    error: failed to run custom build command for linux-kernel-module v0.1.0 (/home/crow/mitosis-core/deps/krcore/rust-kernel-rdma/deps/rust-kernel-module)

Caused by:
process didn't exit successfully: /home/crow/mitosis-core/mitosis-kms/fork/../target/debug/build/linux-kernel-module-e361eb8a63f7b057/build-script-build (exit status: 101)
--- stdout
cargo:rerun-if-env-changed=CC
cargo:rerun-if-env-changed=KDIR
cargo:rerun-if-env-changed=c_flags
cargo:rerun-if-changed=src/bindings_helper.h
cargo:rerun-if-changed=src/inline_helper.h
rust-kernel-module/186
rust-kernel-module/190
rust-kernel-module/197
set opaque type:desc_struct
set opaque type:xregs_state
rust-kernel-module/200

--- stderr
/lib/modules/4.15.0-46-generic/build/./include/linux/types.h:17:9: error: unknown type name '__kernel_ino_t'
/lib/modules/4.15.0-46-generic/build/./include/linux/types.h:18:9: error: unknown type name '__kernel_mode_t'
/lib/modules/4.15.0-46-generic/build/./include/linux/types.h:21:9: error: unknown type name '__kernel_off_t'
/lib/modules/4.15.0-46-generic/build/./include/linux/types.h:22:9: error: unknown type name '__kernel_pid_t'
/lib/modules/4.15.0-46-generic/build/./include/linux/types.h:23:9: error: unknown type name '__kernel_daddr_t'
/lib/modules/4.15.0-46-generic/build/./include/linux/types.h:25:9: error: unknown type name '__kernel_suseconds_t'
/lib/modules/4.15.0-46-generic/build/./include/linux/types.h:26:9: error: unknown type name '__kernel_timer_t'
/lib/modules/4.15.0-46-generic/build/./include/linux/types.h:27:9: error: unknown type name '__kernel_clockid_t'
/lib/modules/4.15.0-46-generic/build/./include/linux/types.h:32:9: error: unknown type name '__kernel_uid32_t'
/lib/modules/4.15.0-46-generic/build/./include/linux/types.h:33:9: error: unknown type name '__kernel_gid32_t'
/lib/modules/4.15.0-46-generic/build/./include/linux/types.h:34:9: error: unknown type name '__kernel_uid16_t'
/lib/modules/4.15.0-46-generic/build/./include/linux/types.h:35:9: error: unknown type name '__kernel_gid16_t'
/lib/modules/4.15.0-46-generic/build/./include/linux/types.h:41:9: error: unknown type name '__kernel_old_uid_t'
/lib/modules/4.15.0-46-generic/build/./include/linux/types.h:42:9: error: unknown type name '__kernel_old_gid_t'
/lib/modules/4.15.0-46-generic/build/./include/linux/types.h:46:9: error: unknown type name '__kernel_loff_t'
/lib/modules/4.15.0-46-generic/build/./include/linux/types.h:55:9: error: unknown type name '__kernel_size_t'
/lib/modules/4.15.0-46-generic/build/./include/linux/types.h:60:9: error: unknown type name '__kernel_ssize_t'
/lib/modules/4.15.0-46-generic/build/./include/linux/types.h:65:9: error: unknown type name '__kernel_ptrdiff_t'
/lib/modules/4.15.0-46-generic/build/./include/linux/types.h:70:9: error: unknown type name '__kernel_time_t'
fatal error: too many errors emitted, stopping now [-ferror-limit=]
/lib/modules/4.15.0-46-generic/build/./include/linux/types.h:17:9: error: unknown type name '__kernel_ino_t', err: true
/lib/modules/4.15.0-46-generic/build/./include/linux/types.h:18:9: error: unknown type name '__kernel_mode_t', err: true
/lib/modules/4.15.0-46-generic/build/./include/linux/types.h:21:9: error: unknown type name '__kernel_off_t', err: true
/lib/modules/4.15.0-46-generic/build/./include/linux/types.h:22:9: error: unknown type name '__kernel_pid_t', err: true
/lib/modules/4.15.0-46-generic/build/./include/linux/types.h:23:9: error: unknown type name '__kernel_daddr_t', err: true
/lib/modules/4.15.0-46-generic/build/./include/linux/types.h:25:9: error: unknown type name '__kernel_suseconds_t', err: true
/lib/modules/4.15.0-46-generic/build/./include/linux/types.h:26:9: error: unknown type name '__kernel_timer_t', err: true
/lib/modules/4.15.0-46-generic/build/./include/linux/types.h:27:9: error: unknown type name '__kernel_clockid_t', err: true
/lib/modules/4.15.0-46-generic/build/./include/linux/types.h:32:9: error: unknown type name '__kernel_uid32_t', err: true
/lib/modules/4.15.0-46-generic/build/./include/linux/types.h:33:9: error: unknown type name '__kernel_gid32_t', err: true
/lib/modules/4.15.0-46-generic/build/./include/linux/types.h:34:9: error: unknown type name '__kernel_uid16_t', err: true
/lib/modules/4.15.0-46-generic/build/./include/linux/types.h:35:9: error: unknown type name '__kernel_gid16_t', err: true
/lib/modules/4.15.0-46-generic/build/./include/linux/types.h:41:9: error: unknown type name '__kernel_old_uid_t', err: true
/lib/modules/4.15.0-46-generic/build/./include/linux/types.h:42:9: error: unknown type name '__kernel_old_gid_t', err: true
/lib/modules/4.15.0-46-generic/build/./include/linux/types.h:46:9: error: unknown type name '__kernel_loff_t', err: true
/lib/modules/4.15.0-46-generic/build/./include/linux/types.h:55:9: error: unknown type name '__kernel_size_t', err: true
/lib/modules/4.15.0-46-generic/build/./include/linux/types.h:60:9: error: unknown type name '__kernel_ssize_t', err: true
/lib/modules/4.15.0-46-generic/build/./include/linux/types.h:65:9: error: unknown type name '__kernel_ptrdiff_t', err: true
/lib/modules/4.15.0-46-generic/build/./include/linux/types.h:70:9: error: unknown type name '__kernel_time_t', err: true
fatal error: too many errors emitted, stopping now [-ferror-limit=], err: true
thread 'main' panicked at 'Unable to generate bindings: ()', /home/crow/mitosis-core/deps/krcore/rust-kernel-rdma/deps/rust-kernel-module/build.rs:203:39
note: run with RUST_BACKTRACE=1 environment variable to display a backtrace
warning: build failed, waiting for other jobs to finish...
error: build failed
/home/crow/mitosis-core/mitosis-kms/Kbuild:11: recipe for target '/home/crow/mitosis-core/mitosis-kms/target/x86_64-unknown-none-linuxkernel/debug/libfork.a' failed
make[3]: *** [/home/crow/mitosis-core/mitosis-kms/target/x86_64-unknown-none-linuxkernel/debug/libfork.a] Error 101
Makefile:1551: recipe for target 'module/home/crow/mitosis-core/mitosis-kms' failed
make[2]: *** [module/home/crow/mitosis-core/mitosis-kms] Error 2
make[2]: Leaving directory '/usr/src/linux-headers-4.15.0-46-generic'
Makefile:38: recipe for target 'all' failed
make[1]: *** [all] Error 2
make[1]: Leaving directory '/home/crow/mitosis-core/mitosis-kms'
Traceback (most recent call last):
File "/home/crow/mitosis-core/mitosis-kms/build.py", line 36, in
main(sys.argv)
File "/home/crow/mitosis-core/mitosis-kms/build.py", line 28, in main
run(
File "/home/crow/mitosis-core/mitosis-kms/build.py", line 15, in run
subprocess.check_call(list(args), cwd=cwd, env=environ)
File "/home/crow/miniconda3/lib/python3.12/subprocess.py", line 413, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['make', '-C', '/home/crow/mitosis-core/mitosis-kms', 'TEST_NAME=fork', 'TEST_PATH=fork']' returned non-zero exit status 2.
makefile:12: recipe for target 'km' failed
make: *** [km] Error 1

@wxdwfc
Collaborator

wxdwfc commented Jul 10, 2024

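Hello, I looked at your environment and it seems fine. Could you confirm that the software environment (kernel headers, Rust version, and clang version) was installed as described in the README? If everything checks out, try rm mitosis-kms/.cache.mk and then build again.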

@ShuguiW
Author

ShuguiW commented Jul 11, 2024

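Thank you very much. Following your suggestion, make km now builds successfully. I then tried the follow-up remote fork example and ran into a few small problems that I would like to ask about. Running make insmod produced the following error: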
(base) crow@crow-H310M-T-PRO:~ /mitosis-core$ make insmod
sudo rmmod fork ; sudo insmod mitosis-kms/fork.ko mac_id=0
[sudo] password for crow:
rmmod: ERROR: Module fork is not currently loaded
Segmentation fault (core dumped)
makefile:15: recipe for target 'insmod' failed
make: *** [insmod] Error 139
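I do not know where this segmentation fault comes from. Because of it, the /dev/mitosis-syscalls file was not created. I then ran make rmmod, which failed with the following error.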
(base) crow@crow-H310M-T-PRO:~ /mitosis-core$ make rmmod
sudo rmmod fork
rmmod: ERROR: Module fork is in use
makefile:18: recipe for target 'rmmod' failed
make: *** [rmmod] Error 1
So, unsurprisingly, when I tried to connect to the other machine, the connection failed.
(base) crow@crow-H310M-T-PRO:~ /mitosis-core/exp$ ./connector -gid="fe80:0000:0000:0000:1270:fdff:fe39:0e7a" -mac_id=0 -nic_id=0
connect res: -1
Below are the outputs of show_gids and ibstatus on my machine:
(base) crow@crow-H310M-T-PRO:~ /mitosis-core$ show_gids
DEV   PORT  INDEX GID                           IPv4             VER   DEV
---   ----  ----- ---                           ------------     ---   ---
mlx5_0      1     0     fe80:0000:0000:0000:1270:fdff:fe39:0e92               v1    enp1s0f0
mlx5_0      1     1     fe80:0000:0000:0000:1270:fdff:fe39:0e92               v2    enp1s0f0
mlx5_1      1     0     fe80:0000:0000:0000:1270:fdff:fe39:0e93               v1    enp1s0f1
mlx5_1      1     1     fe80:0000:0000:0000:1270:fdff:fe39:0e93               v2    enp1s0f1
n_gids_found=4
(base) crow@crow-H310M-T-PRO:~ /mitosis-core$ ibstatus
Infiniband device 'mlx5_0' port 1 status:
      default gid:       fe80:0000:0000:0000:1270:fdff:fe39:0e92
      base lid:    0x0
      sm lid:            0x0
      state:             4: ACTIVE
      phys state:  5: LinkUp
      rate:        100 Gb/sec (4X EDR)
      link_layer:  Ethernet

Infiniband device 'mlx5_1' port 1 status:
      default gid:       fe80:0000:0000:0000:1270:fdff:fe39:0e93
      base lid:    0x0
      sm lid:            0x0
      state:             1: DOWN
      phys state:  3: Disabled
      rate:        40 Gb/sec (4X QDR)
      link_layer:  Ethernet
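The connection most likely fails because /dev/mitosis-syscalls was never created in the earlier step. I would appreciate any suggestions for fixing this. Thank you very much.
I also tried the microbenchmark, and its output looks rather odd as well, as shown below: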
(mitosis) crow@crow-H310M-T-PRO:~/mitosis-core/exp_scripts$ make micro-c-prepare
rm -rf out/micro-c-prepare
python toml_generator.py -f templates-run/micro-c/template-run-micro-prepare.toml -o out/micro-c-prepare -d "{ 'pwd':'218','user':'crow', 'hosts':{'builder':['crow-H310M-T-PRO',] , 'parent':['crow-H310M-T-PRO'], 'child':[], },'path':'projects/mos', 'placeholder': {'parent_gid': 'fe80:0000:0000:0000:1270:fdff:fe39:0e92', 'parent_host': 'crow-H310M-T-PRO', 'child_hosts': ''} } "
creating toml output dir out/micro-c-prepare
python evaluation_runner.py --input out/micro-c-prepare --arguments="-k=" --filter="Prepare"
trace 1048576

finish

run-1048576.toml
trace 1073741824

finish

run-1073741824.toml
trace 134217728

finish

run-134217728.toml

@wxdwfc
Collaborator

wxdwfc commented Jul 12, 2024

Hello, it looks like the kernel module did not load successfully. Could you check the error messages in dmesg?
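When the dmesg output is long, it can help to pull out only the Rust panic lines first. The following is a minimal Python sketch, not part of MITOSIS: the function name find_rust_panics is hypothetical, and the regular expression only assumes the "panicked at '<message>', <file>:<line>:<col>" layout that the module prints through printk.

```python
import re

# Matches kernel log lines of the form:
# [ <timestamp>] <prefix> panicked at '<message>', <file>:<line>:<col>
PANIC_RE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s.*?panicked at '(?P<msg>[^']*)',\s*(?P<loc>\S+)$"
)


def find_rust_panics(dmesg_text):
    """Return (timestamp, message, location) for every Rust panic line."""
    panics = []
    for line in dmesg_text.splitlines():
        match = PANIC_RE.match(line.strip())
        if match:
            panics.append(
                (float(match.group("ts")), match.group("msg"), match.group("loc"))
            )
    return panics


# Example input: two lines in the same layout as a dmesg capture.
SAMPLE = (
    "[ 7368.180817] rust-kernel-rdma-base: enabling unsafe global rkey\n"
    "[ 7368.488917] buf info: panicked at 'should not fail: Creation(-22)', "
    "/home/crow/mitosis-core/mitosis/src/rdma_context.rs:50:66\n"
)

for ts, msg, loc in find_rust_panics(SAMPLE):
    print(ts, msg, loc)
```

To use it on a live machine, save the log with dmesg > dmesg.txt and pass the file contents to find_rust_panics.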

@wxdwfc
Collaborator

wxdwfc commented Jul 12, 2024

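Also, I see that you are using RoCE. Because of our machine setup we have not tested RoCE, so we recommend running the experiments on an IB network.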

@ShuguiW
Author

ShuguiW commented Jul 12, 2024

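Hello, below is the dmesg error output. Since there is a lot of it, I have also attached a Word file in which the failing location is highlighted in red for easier reading. Following your suggestion, I tried to switch link_layer to IB, but the switch did not succeed; it appears that my current environment does not support changing the transport mode. Finally, I tested connectivity over Ethernet. I hope to get your test examples running through some configuration changes, or could you give some hints on the code changes needed to run on RoCE? I look forward to your feedback, and thank you sincerely.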
dmesg打印信息.docx
[ 1231.613811] perf: interrupt took too long (2516 > 2500), lowering kernel.perf_event_max_sample_rate to 79250
[ 1630.526382] perf: interrupt took too long (3166 > 3145), lowering kernel.perf_event_max_sample_rate to 63000
[ 2191.290286] perf: interrupt took too long (3958 > 3957), lowering kernel.perf_event_max_sample_rate to 50500
[ 4541.191119] perf: interrupt took too long (4951 > 4947), lowering kernel.perf_event_max_sample_rate to 40250
[ 7368.180582] src/lib.rs@29: [INFO ] - Remote fork kernel module assigned ID=0
[ 7368.180585] /home/crow/mitosis-core/mitosis/src/startup.rs@49: [INFO ] - Try to start MITOSIS instance, init global services
[ 7368.180586] /home/crow/mitosis-core/mitosis/src/startup.rs@15: [INFO ] - [check]: use on-demand resume mode.
[ 7368.180587] /home/crow/mitosis-core/mitosis/src/startup.rs@19: [INFO ] - [check]: Parent is using copy-on-write (COW) mode.
[ 7368.180588] /home/crow/mitosis-core/mitosis/src/startup.rs@25: [INFO ] - [check]: Prefetch optimization is enabled, prefetch sz 1.
[ 7368.180589] /home/crow/mitosis-core/mitosis/src/startup.rs@36: [INFO ] - [check]: Not cache remote page table.
[ 7368.180590] /home/crow/mitosis-core/mitosis/src/startup.rs@42: [INFO ] - [check]: Use RDMA's dynamic connected transport for communications.
[ 7368.180591] /home/crow/mitosis-core/mitosis/src/startup.rs@45: [INFO ] - ********* All configuration check passes !*********
[ 7368.180817] rust-kernel-rdma-base: enabling unsafe global rkey
[ 7368.488917] buf info: panicked at 'should not fail: Creation(-22)', /home/crow/mitosis-core/mitosis/src/rdma_context.rs:50:66
[ 7368.488932] ------------[ cut here ]------------
[ 7368.488934] kernel BUG at src/helpers.c:14!
[ 7368.488944] invalid opcode: 0000 [#1] SMP PTI
[ 7368.488947] Modules linked in: fork(OE+) rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_ib(OE) ib_uverbs(OE) mlx4_ib(OE) ib_core(OE) mlx4_en(OE) mlx4_core(OE) binfmt_misc nls_iso8859_1 kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd glue_helper cryptd input_leds joydev video acpi_pad knem(OE) parport_pc ppdev lp parport autofs4 hid_generic usbhid hid mlx5_core(OE) r8101(OE) ahci mlx_compat(OE) libahci mlxfw(OE) devlink ptp pps_core
[ 7368.488997] CPU: 1 PID: 4844 Comm: insmod Tainted: G OE 4.15.0-46-generic #49-Ubuntu
[ 7368.488999] Hardware name: Colorful Technology And Development Co.,LTD H310M-T PRO/H310M-T PRO, BIOS 5.12 05/16/2019
[ 7368.489264] RIP: 0010:bug_helper+0x9/0x20 [fork]
[ 7368.489267] RSP: 0018:ffffab8002d56720 EFLAGS: 00010282
[ 7368.489271] RAX: 0000000000000071 RBX: ffffab8002d56738 RCX: 0000000000000000
[ 7368.489274] RDX: 0000000000000000 RSI: ffff9a036ed16498 RDI: ffff9a036ed16498
[ 7368.489276] RBP: ffffab8002d56720 R08: 0000000000000001 R09: 0000000000000344
[ 7368.489278] R10: ffffab8002d56580 R11: 0000000000000000 R12: 0000000000000000
[ 7368.489281] R13: ffff9a02ae9d1720 R14: ffffab8002d56c80 R15: 0000000000000001
[ 7368.489284] FS: 00007fe225653540(0000) GS:ffff9a036ed00000(0000) knlGS:0000000000000000
[ 7368.489287] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 7368.489290] CR2: 0000557a053811c0 CR3: 00000001814ca002 CR4: 00000000003606e0
[ 7368.489292] Call Trace:
[ 7368.489570] rust_begin_unwind+0x91/0xa0 [fork]
[ 7368.489578] ? __switch_to_asm+0x34/0x70
[ 7368.489604] ? rdma_port_get_link_layer+0x1e/0x50 [ib_core]
[ 7368.489879] ? ZN4core5slice29$LT$impl$u20$$u5b$T$u5d$$GT$15copy_from_slice17hf82914350afa714dE+0x80/0x80 [fork]
[ 7368.490107] _ZN4core9panicking9panic_fmt17he24d6cc5a36dd1dbE+0x2d/0x30 [fork]
[ 7368.490365] ? _ZN8KRdmaKit11queue_pairs27dynamic_connected_transport22DynamicConnectedTarget17get_datagram_meta17ha37b4d2717f1645dE+0x31/0xf0 [fork]
[ 7368.490586] _ZN4core6result13unwrap_failed17h93cdea133055b12cE+0x6c/0x70 [fork]
[ 7368.490803] ? ZN44$LT$$RF$T$u20$as$u20$core..fmt..Display$GT$3fmt17h0feed2dd3958df8bE+0x20/0x20 [fork]
[ 7368.491022] ? ZN42$LT$$RF$T$u20$as$u20$core..fmt..Debug$GT$3fmt17h884188d77116d6bbE+0x100/0x100 [fork]
[ 7368.491233] _ZN7mitosis12rdma_context10start_rdma17h698be4bafd69e08cE+0xa63/0xb10 [fork]
[ 7368.491458] ? ZN42$LT$$RF$T$u20$as$u20$core..fmt..Debug$GT$3fmt17hf9cb0ff4e10367dcE+0x10/0x10 [fork]
[ 7368.491675] ? ZN4core3fmt3num3imp52$LT$impl$u20$core..fmt..Display$u20$for$u20$i32$GT$3fmt17h840ec10b2de74afeE+0x20/0x20 [fork]
[ 7368.491895] ? ZN42$LT$$RF$T$u20$as$u20$core..fmt..Debug$GT$3fmt17hf9cb0ff4e10367dcE+0x10/0x10 [fork]
[ 7368.492122] ? ZN4core3fmt3num3imp52$LT$impl$u20$core..fmt..Display$u20$for$u20$i32$GT$3fmt17h840ec10b2de74afeE+0x20/0x20 [fork]
[ 7368.492350] _ZN7mitosis7startup12init_mitosis17hcead06ac6449721eE+0x167/0xdf0 [fork]
[ 7368.492591] ? ZN42$LT$$RF$T$u20$as$u20$core..fmt..Debug$GT$3fmt17hf9cb0ff4e10367dcE+0x10/0x10 [fork]
[ 7368.492797] ? ZN4core3fmt3num3imp52$LT$impl$u20$core..fmt..Display$u20$for$u20$i32$GT$3fmt17h840ec10b2de74afeE+0x20/0x20 [fork]
[ 7368.493074] ? ZN83$LT$rust_kernel_linux_util..level..Level$u20$as$u20$core..str..traits..FromStr$GT$8from_str17h30ed5e388135041cE+0x270/0x270 [fork]
[ 7368.493286] ? ZN57$LT$core..fmt..Arguments$u20$as$u20$core..fmt..Debug$GT$3fmt17hcc1dbed6d991ec2cE+0x60/0x60 [fork]
[ 7368.493530] _ZN7mitosis7startup14start_instance17h29fb7b63e3619507E+0x31/0x250 [fork]
[ 7368.493541] ? kmem_cache_alloc+0xa2/0x1b0
[ 7368.493546] ? mempool_alloc_slab+0x15/0x20
[ 7368.493551] ? wait_woken+0x80/0x80
[ 7368.493556] ? mempool_alloc_slab+0x15/0x20
[ 7368.493560] ? mempool_alloc+0x71/0x190
[ 7368.493564] ? mempool_alloc_slab+0x15/0x20
[ 7368.493569] ? mempool_alloc+0x71/0x190
[ 7368.493574] ? blk_rq_map_sg+0x13e/0x540
[ 7368.493785] ? _ZN4core3fmt9Formatter12pad_integral17hf8e301a155813e6cE+0x106/0x450 [fork]
[ 7368.494037] ? ZN79$LT$linux_kernel_module..printk..LogLineWriter$u20$as$u20$core..fmt..Write$GT$9write_str17h0f01d8afb6abda8bE+0x96/0x150 [fork]
[ 7368.494044] ? sched_clock+0x9/0x10
[ 7368.494048] ? sched_clock+0x9/0x10
[ 7368.494052] ? up+0x32/0x50
[ 7368.494058] ? irq_work_queue+0x99/0xa0
[ 7368.494062] ? console_unlock+0x2e5/0x4e0
[ 7368.494066] ? vprintk_emit+0x333/0x3a0
[ 7368.494255] ? _ZN4core3ptr61drop_in_place$LT$core..option..Option$LT$fork..Module$GT$$GT$17h8a82d34b02fe66e4E+0x170/0x170 [fork]
[ 7368.494269] ? vprintk_default+0x29/0x50
[ 7368.494273] ? vprintk_func+0x27/0x60
[ 7368.494277] ? printk+0x52/0x6e
[ 7368.494472] init_module+0x203/0x410 [fork]
[ 7368.494684] ? ZN4core3fmt3num3imp52$LT$impl$u20$core..fmt..Display$u20$for$u20$i64$GT$3fmt17h9e3ac72c4fc3d8eaE+0x30/0x30 [fork]
[ 7368.494691] ? __vunmap+0x71/0xb0
[ 7368.494891] ? _ZN4core3ptr61drop_in_place$LT$core..option..Option$LT$fork..Module$GT$$GT$17h8a82d34b02fe66e4E+0x170/0x170 [fork]
[ 7368.494898] do_one_initcall+0x52/0x19f
[ 7368.494903] ? __vunmap+0x81/0xb0
[ 7368.494908] ? _cond_resched+0x19/0x40
[ 7368.494913] ? kmem_cache_alloc_trace+0xa6/0x1b0
[ 7368.494918] ? do_init_module+0x27/0x209
[ 7368.494922] do_init_module+0x5f/0x209
[ 7368.494927] load_module+0x191e/0x1f10
[ 7368.494932] ? ima_post_read_file+0x96/0xa0
[ 7368.494938] SYSC_finit_module+0xfc/0x120
[ 7368.494942] ? SYSC_finit_module+0xfc/0x120
[ 7368.494948] SyS_finit_module+0xe/0x10
[ 7368.494952] do_syscall_64+0x73/0x130
[ 7368.494957] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[ 7368.494961] RIP: 0033:0x7fe22516d539
[ 7368.494964] RSP: 002b:00007ffe5d031a18 EFLAGS: 00000206 ORIG_RAX: 0000000000000139
[ 7368.494968] RAX: ffffffffffffffda RBX: 0000556703c5d7a0 RCX: 00007fe22516d539
[ 7368.494970] RDX: 0000000000000000 RSI: 0000556703c5d260 RDI: 0000000000000003
[ 7368.494973] RBP: 0000556703c5d260 R08: 0000000000000000 R09: 0000000000000000
[ 7368.494975] R10: 0000000000000003 R11: 0000000000000206 R12: 0000000000000000
[ 7368.494978] R13: 0000556703c5fe50 R14: 0000000000000000 R15: 0000556703c5d260
[ 7368.494981] Code: 01 00 e8 7b a1 0f d0 55 48 89 e5 48 c7 c6 1f 68 95 c0 48 c7 c2 58 c7 99 c0 e8 24 5e 7d cf 5d c3 00 00 e8 5b a1 0f d0 55 48 89 e5 <0f> 0b 0f 1f 44 00 00 eb fe 00 00 00 00 00 00 00 00 00 00 00 00
[ 7368.495274] RIP: bug_helper+0x9/0x20 [fork] RSP: ffffab8002d56720
[ 7368.495312] ---[ end trace e2c3c428894af0f9 ]---
[ 7571.210530] mlx5_core 0000:01:00.0 enp1s0f0: Link down
[ 7577.752104] mlx5_core 0000:01:00.0 enp1s0f0: Link up


Below is the information about switching to IB. I used mstconfig for this, but found that there is no LINK_PORT option.
(base) ll@ll-System-Product-Name:~ /mitosis-core/exp$ lspci -v | grep Mellanox
01:00.0 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]
      Subsystem: Mellanox Technologies Device 0008
01:00.1 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]
      Subsystem: Mellanox Technologies Device 0008
(base) ll@ll-System-Product-Name:~ /mitosis-core/exp$ sudo mstconfig -d 01:00.0 q

Device #1:

Device type: ConnectX5
Name: MCX516A-CDA_Ax_Bx
Description: ConnectX-5 Ex EN network interface card; 100GbE dual-port QSFP28; PCIe4.0 x16; tall bracket; ROHS R6
Device: 01:00.0

Configurations: Next Boot
MEMIC_BAR_SIZE 0
MEMIC_SIZE_LIMIT _256KB(1)
HOST_CHAINING_MODE DISABLED(0)
HOST_CHAINING_DESCRIPTORS Array[0..7]
HOST_CHAINING_TOTAL_BUFFER_SIZE Array[0..7]
FLEX_PARSER_PROFILE_ENABLE 0
FLEX_IPV4_OVER_VXLAN_PORT 0
ROCE_NEXT_PROTOCOL 254
ESWITCH_HAIRPIN_DESCRIPTORS Array[0..7]
ESWITCH_HAIRPIN_TOT_BUFFER_SIZE Array[0..7]
PF_BAR2_SIZE 0
NON_PREFETCHABLE_PF_BAR False(0)
VF_VPD_ENABLE False(0)
STRICT_VF_MSIX_NUM False(0)
VF_NODNIC_ENABLE False(0)
NUM_OF_VFS 0
PF_BAR2_ENABLE False(0)
SRIOV_EN False(0)
PF_LOG_BAR_SIZE 5
VF_LOG_BAR_SIZE 1
NUM_PF_MSIX 63
NUM_VF_MSIX 11
INT_LOG_MAX_PAYLOAD_SIZE AUTOMATIC(0)
PARTIAL_RESET_EN False(0)
SW_RECOVERY_ON_ERRORS False(0)
RESET_WITH_HOST_ON_ERRORS False(0)
ADVANCED_POWER_SETTINGS False(0)
CQE_COMPRESSION BALANCED(0)
IP_OVER_VXLAN_EN False(0)
MKEY_BY_NAME False(0)
ESWITCH_IPV4_TTL_MODIFY_ENABLE False(0)
PRIO_TAG_REQUIRED_EN False(0)
UCTX_EN True(1)
PCI_ATOMIC_MODE PCI_ATOMIC_DISABLED_EXT_ATOMIC_ENABLED(0)
TUNNEL_ECN_COPY_DISABLE False(0)
LRO_LOG_TIMEOUT0 6
LRO_LOG_TIMEOUT1 7
LRO_LOG_TIMEOUT2 8
LRO_LOG_TIMEOUT3 13
LOG_DCR_HASH_TABLE_SIZE 11
DCR_LIFO_SIZE 16384
ROCE_CC_PRIO_MASK_P1 255
ROCE_CC_ALGORITHM_P1 ECN(0)
ROCE_CC_PRIO_MASK_P2 255
ROCE_CC_ALGORITHM_P2 ECN(0)
CLAMP_TGT_RATE_AFTER_TIME_INC_P1 True(1)
CLAMP_TGT_RATE_P1 False(0)
RPG_TIME_RESET_P1 300
RPG_BYTE_RESET_P1 32767
RPG_THRESHOLD_P1 1
RPG_MAX_RATE_P1 0
RPG_AI_RATE_P1 5
RPG_HAI_RATE_P1 50
RPG_GD_P1 11
RPG_MIN_DEC_FAC_P1 50
RPG_MIN_RATE_P1 1
RATE_TO_SET_ON_FIRST_CNP_P1 0
DCE_TCP_G_P1 1019
DCE_TCP_RTT_P1 1
RATE_REDUCE_MONITOR_PERIOD_P1 4
INITIAL_ALPHA_VALUE_P1 1023
MIN_TIME_BETWEEN_CNPS_P1 4
CNP_802P_PRIO_P1 6
CNP_DSCP_P1 48
CLAMP_TGT_RATE_AFTER_TIME_INC_P2 True(1)
CLAMP_TGT_RATE_P2 False(0)
RPG_TIME_RESET_P2 300
RPG_BYTE_RESET_P2 32767
RPG_THRESHOLD_P2 1
RPG_MAX_RATE_P2 0
RPG_AI_RATE_P2 5
RPG_HAI_RATE_P2 50
RPG_GD_P2 11
RPG_MIN_DEC_FAC_P2 50
RPG_MIN_RATE_P2 1
RATE_TO_SET_ON_FIRST_CNP_P2 0
DCE_TCP_G_P2 1019
DCE_TCP_RTT_P2 1
RATE_REDUCE_MONITOR_PERIOD_P2 4
INITIAL_ALPHA_VALUE_P2 1023
MIN_TIME_BETWEEN_CNPS_P2 4
CNP_802P_PRIO_P2 6
CNP_DSCP_P2 48
LLDP_NB_DCBX_P1 False(0)
LLDP_NB_RX_MODE_P1 OFF(0)
LLDP_NB_TX_MODE_P1 OFF(0)
LLDP_NB_DCBX_P2 False(0)
LLDP_NB_RX_MODE_P2 OFF(0)
LLDP_NB_TX_MODE_P2 OFF(0)
DCBX_IEEE_P1 True(1)
DCBX_CEE_P1 True(1)
DCBX_WILLING_P1 True(1)
DCBX_IEEE_P2 True(1)
DCBX_CEE_P2 True(1)
DCBX_WILLING_P2 True(1)
KEEP_ETH_LINK_UP_P1 True(1)
KEEP_IB_LINK_UP_P1 True(1)
KEEP_LINK_UP_ON_BOOT_P1 False(0)
KEEP_LINK_UP_ON_STANDBY_P1 False(0)
DO_NOT_CLEAR_PORT_STATS_P1 False(0)
KEEP_ETH_LINK_UP_P2 True(1)
KEEP_IB_LINK_UP_P2 False(0)
KEEP_LINK_UP_ON_BOOT_P2 False(0)
KEEP_LINK_UP_ON_STANDBY_P2 False(0)
DO_NOT_CLEAR_PORT_STATS_P2 False(0)
NUM_OF_VL_P1 _4_VLs(3)
NUM_OF_TC_P1 _8_TCs(0)
NUM_OF_PFC_P1 8
NUM_OF_VL_P2 _4_VLs(3)
NUM_OF_TC_P2 _8_TCs(0)
NUM_OF_PFC_P2 8
DUP_MAC_ACTION_P1 LAST_CFG(0)
SRIOV_IB_ROUTING_MODE_P1 LID(1)
IB_ROUTING_MODE_P1 LID(1)
DUP_MAC_ACTION_P2 LAST_CFG(0)
SRIOV_IB_ROUTING_MODE_P2 LID(1)
IB_ROUTING_MODE_P2 LID(1)
PCI_WR_ORDERING per_mkey(0)
MULTI_PORT_VHCA_EN False(0)
PORT_OWNER True(1)
ALLOW_RD_COUNTERS True(1)
RENEG_ON_CHANGE True(1)
TRACER_ENABLE True(1)
IP_VER IPv4(0)
BOOT_UNDI_NETWORK_WAIT 0
UEFI_HII_EN True(1)
BOOT_DBG_LOG False(0)
UEFI_LOGS DISABLED(0)
BOOT_VLAN 1
LEGACY_BOOT_PROTOCOL PXE(1)
BOOT_RETRY_CNT NONE(0)
BOOT_INTERRUPT_DIS False(0)
BOOT_LACP_DIS True(1)
BOOT_VLAN_EN False(0)
BOOT_PKEY 0
ATS_ENABLED False(0)
DYNAMIC_VF_MSIX_TABLE False(0)
EXP_ROM_UEFI_ARM_ENABLE False(0)
EXP_ROM_UEFI_x86_ENABLE False(0)
EXP_ROM_PXE_ENABLE True(1)
ADVANCED_PCI_SETTINGS False(0)
SAFE_MODE_THRESHOLD 10
SAFE_MODE_ENABLE True(1)
(base) ll@ll-System-Product-Name:~ /mitosis-core/exp$ sudo mstconfig -d 01:00.0 q | grep LINK_PORT
Nothing is printed.
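The empty grep only shows that no setting name contains LINK_PORT. As an assumption that this thread does not confirm: on Mellanox VPI adapters the per-port protocol is usually exposed under names such as LINK_TYPE_P1 and LINK_TYPE_P2, and Ethernet-only (EN) models may not list them at all. A minimal Python sketch that scans saved query output for such names follows; the function names and the LINK_TYPE prefix are illustrative assumptions, not taken from the mstconfig documentation quoted here.

```python
import re

# Setting names look like ROCE_NEXT_PROTOCOL or EXP_ROM_UEFI_x86_ENABLE:
# an upper-case first letter and at least one underscore.
NAME_RE = re.compile(r"[A-Z][A-Za-z0-9]*(?:_[A-Za-z0-9]+)+")


def parse_query_settings(text):
    """Parse 'NAME VALUE' lines of query output into a dict."""
    settings = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2 and NAME_RE.fullmatch(parts[0]):
            settings[parts[0]] = parts[1].strip()
    return settings


def link_type_settings(settings, prefix="LINK_TYPE"):
    """Return only the settings whose name starts with the assumed prefix."""
    return {k: v for k, v in settings.items() if k.startswith(prefix)}


# A few lines in the same layout as the query output above.
QUERY_SAMPLE = """Device type: ConnectX5
ROCE_NEXT_PROTOCOL 254
KEEP_IB_LINK_UP_P1 True(1)
EXP_ROM_UEFI_x86_ENABLE False(0)
"""

found = link_type_settings(parse_query_settings(QUERY_SAMPLE))
print("link type settings:", found if found else "none listed")
```

If the result is empty for the full query output, that would be consistent with a card whose firmware does not offer a port protocol switch, but it is not proof on its own.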


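Next, I tested transfers over Ethernet. After manually assigning an IP address to the NIC, the test shows that the two machines can reach each other.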
(base) ll@ll-System-Product-Name:~ /mitosis-core/exp$ ibstatus
Infiniband device 'mlx5_0' port 1 status:
      default gid:       fe80:0000:0000:0000:1270:fdff:fe39:0e7a
      base lid:    0x0
      sm lid:            0x0
      state:             4: ACTIVE
      phys state:  5: LinkUp
      rate:        100 Gb/sec (4X EDR)
      link_layer:  Ethernet

Infiniband device 'mlx5_1' port 1 status:
      default gid:       fe80:0000:0000:0000:1270:fdff:fe39:0e7b
      base lid:    0x0
      sm lid:            0x0
      state:             1: DOWN
      phys state:  3: Disabled
      rate:        40 Gb/sec (4X QDR)
      link_layer:  Ethernet

(base) ll@ll-System-Product-Name:~ /mitosis-core/exp$ show_gids
DEV   PORT  INDEX GID                           IPv4             VER   DEV
---   ----  ----- ---                           ------------     ---   ---
mlx5_0      1     0     fe80:0000:0000:0000:1270:fdff:fe39:0e7a               v1    enp1s0f0
mlx5_0      1     1     fe80:0000:0000:0000:1270:fdff:fe39:0e7a               v2    enp1s0f0
mlx5_0      1     2     0000:0000:0000:0000:0000:ffff:c0a8:0101   192.168.1.1      v1    enp1s0f0
mlx5_0      1     3     0000:0000:0000:0000:0000:ffff:c0a8:0101   192.168.1.1      v2    enp1s0f0
mlx5_1      1     0     fe80:0000:0000:0000:1270:fdff:fe39:0e7b               v1    enp1s0f1
mlx5_1      1     1     fe80:0000:0000:0000:1270:fdff:fe39:0e7b               v2    enp1s0f1
n_gids_found=6
(base) ll@ll-System-Product-Name:~ /mitosis-core/exp$ ib_send_bw -d mlx5_0


  • Waiting for client to connect... *


                Send BW Test

Dual-port : OFF        Device : mlx5_0
Number of qps : 1          Transport type : IB
Connection type : RC         Using SRQ : OFF
RX depth : 512
CQ Moderation : 1
Mtu : 1024[B]
Link type : Ethernet
GID index : 3
Max inline data : 0[B]
rdma_cm QPs       : OFF
Data ex. method : Ethernet

local address: LID 0000 QPN 0x0047 PSN 0xd80157
GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:01:01
remote address: LID 0000 QPN 0x0051 PSN 0x9b7ca3
GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:01:02

#bytes #iterations BW peak[MB/sec] BW average[MB/sec] MsgRate[Mpps]
Conflicting CPU frequency values detected: 3477.746000 != 3397.849000. CPU Frequency is not max.
65536 1000 0.00 6645.12            0.106322

(base) crow@crow-H310M-T-PRO:~ /mitosis-core/exp$ ib_send_bw -d mlx5_0 192.168.1.1

                Send BW Test

Dual-port : OFF        Device : mlx5_0
Number of qps : 1          Transport type : IB
Connection type : RC         Using SRQ : OFF
TX depth : 128
CQ Moderation : 1
Mtu : 1024[B]
Link type : Ethernet
GID index : 3
Max inline data : 0[B]
rdma_cm QPs       : OFF
Data ex. method : Ethernet

local address: LID 0000 QPN 0x0051 PSN 0x9b7ca3
GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:01:02
remote address: LID 0000 QPN 0x0047 PSN 0xd80157
GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:01:01

#bytes #iterations BW peak[MB/sec] BW average[MB/sec] MsgRate[Mpps]
65536 1000 10751.53 6175.40           0.098806

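Finally, I noticed that although make insmod clearly failed, the fork module is still listed. I hope this information helps your analysis. The file /dev/mitosis-syscalls was not created, and I still think this is a fairly critical error.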
(base) crow@crow-H310M-T-PRO:~ /mitosis-core$ make insmod
sudo rmmod fork ; sudo insmod mitosis-kms/fork.ko mac_id=0
[sudo] password for crow:
rmmod: ERROR: Module fork is not currently loaded
Segmentation fault (core dumped)
makefile:15: recipe for target 'insmod' failed
make: *** [insmod] Error 139
(base) crow@crow-H310M-T-PRO:~ /mitosis-core$ lsmod
Module Size Used by
fork 3235840 1
rdma_ucm 28672 0
ib_ucm 20480 0
rdma_cm 57344 1 rdma_ucm
iw_cm 45056 1 rdma_cm
ib_ipoib 176128 0
ib_cm 53248 4 rdma_cm,ib_ipoib,fork,ib_ucm
ib_umad 24576 0
mlx5_ib 393216 0
ib_uverbs 131072 3 rdma_ucm,mlx5_ib,ib_ucm
mlx4_ib 221184 0
ib_core 323584 11 rdma_cm,ib_ipoib,mlx4_ib,fork,iw_cm,ib_umad,rdma_ucm,ib_uverbs,mlx5_ib,ib_cm,ib_ucm
mlx4_en 139264 0
mlx4_core 335872 2 mlx4_ib,mlx4_en
binfmt_misc 20480 1
nls_iso8859_1 16384 1
kvm_intel 212992 0
kvm 598016 1 kvm_intel
irqbypass 16384 1 kvm
crct10dif_pclmul 16384 0
crc32_pclmul 16384 0
ghash_clmulni_intel 16384 0
pcbc 16384 0
aesni_intel 188416 0
aes_x86_64 20480 1 aesni_intel
crypto_simd 16384 1 aesni_intel
glue_helper 16384 1 aesni_intel
cryptd 24576 3 crypto_simd,ghash_clmulni_intel,aesni_intel
input_leds 16384 0
joydev 24576 0
video 45056 0
acpi_pad 180224 0
knem 36864 0
parport_pc 36864 0
ppdev 20480 0
lp 20480 0
parport 49152 3 parport_pc,lp,ppdev
autofs4 40960 2
hid_generic 16384 0
usbhid 49152 0
hid 118784 2 usbhid,hid_generic
mlx5_core 1040384 1 mlx5_ib
r8101 196608 0
ahci 40960 3
mlx_compat 40960 14 rdma_cm,ib_ipoib,mlx4_core,mlx4_ib,iw_cm,ib_umad,mlx4_en,ib_core,rdma_ucm,ib_uverbs,mlx5_ib,ib_cm,mlx5_core,ib_ucm
libahci 32768 1 ahci
mlxfw 20480 1 mlx5_core
devlink 45056 4 mlx4_core,mlx4_ib,mlx4_en,mlx5_core
ptp 20480 2 mlx4_en,mlx5_core
pps_core 20480 1 ptp
(base) crow@crow-H310M-T-PRO:~ /mitosis-core$ file /dev/mitosis-syscalls
/dev/mitosis-syscalls: cannot open `/dev/mitosis-syscalls' (No such file or directory)

@wxdwfc
Collaborator

wxdwfc commented Jul 12, 2024

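Hello, I looked at the dmesg output. The error is reported at [ 589.589922] buf info: panicked at 'should not fail: Creation(-22)', /home/ll/mitosis-core/mitosis/src/rdma_context.rs:50:66, and the cause is that DCT creation failed. I am not sure whether your NIC supports DCT; you may want to check.

If you do not need the DCT feature, you can use the use_rc option in the Kbuild, for example by trying this Kbuild:

https://github.com/ProjectMitosisOS/mitosis-core/blob/main/mitosis-kms/Kbuild-mitosis-use-rc

(see the README for how to use it).

If that still does not work, the only remaining option is to switch to an IB card (I checked, and your card does not support IB); that is probably the most convenient route.

P.S. If a kernel panic occurs, I recommend rebooting the machine; otherwise you may run into undefined behavior.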
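Two quick checks can support the diagnosis above. The Python sketch below is illustrative only: both function names are hypothetical, the reading of the number inside Creation(...) as a negated Linux errno is an assumption about how the module reports errors, and the sysfs layout /sys/class/infiniband/<device>/ports/<port>/link_layer is the standard RDMA one. Neither check shows whether the NIC supports DCT.

```python
import errno
import os
import re
from pathlib import Path


def decode_creation_error(panic_msg):
    """Map 'Creation(-N)' to (N, errno name, description), or None."""
    match = re.search(r"Creation\((-?\d+)\)", panic_msg)
    if match is None:
        return None
    code = abs(int(match.group(1)))
    return code, errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)


def port_link_layers(sysfs_root="/sys/class/infiniband"):
    """Map 'device/port' to the link_layer string reported by sysfs."""
    layers = {}
    root = Path(sysfs_root)
    if not root.is_dir():
        return layers  # no RDMA devices visible on this machine
    for path in sorted(root.glob("*/ports/*/link_layer")):
        device, port = path.parts[-4], path.parts[-2]
        layers[f"{device}/{port}"] = path.read_text().strip()
    return layers


print(decode_creation_error("should not fail: Creation(-22)"))
for name, layer in port_link_layers().items():
    print(name, layer)
```

A port that reports Ethernet here is running RoCE, which matches the ibstatus output earlier in the thread.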

@ShuguiW
Author

ShuguiW commented Jul 15, 2024

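OK, thank you very much for your help. It does seem to be a problem with my hardware. Sorry for all the trouble over the past few days, and thanks again!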
