[Solved] Symbol issue within kernel module #214

Closed
marco44 opened this issue Dec 1, 2020 · 22 comments

Comments

@marco44

marco44 commented Dec 1, 2020

perf tries to read /proc/kallsyms, and that read gets killed as soon as the corefreqk module is loaded:

[root@marco ~]# cat /proc/kallsyms > /dev/null
[root@marco ~]# modprobe corefreqk 
[root@marco ~]# cat /proc/kallsyms > /dev/null
Killed

An extract from dmesg:

[60695.322004] BUG: unable to handle page fault for address: fffffffff5dbb850
[60695.322007] #PF: supervisor read access in kernel mode
[60695.322008] #PF: error_code(0x0000) - not-present page
[60695.322008] PGD 575613067 P4D 575613067 PUD 575615067 PMD 0 
[60695.322011] Oops: 0000 [#21] PREEMPT SMP NOPTI

Do I need to provide anything else?

@cyring
Owner

cyring commented Dec 1, 2020

CoreFreq was not designed for perf compatibility. This is explained in the project Wiki.

However, on Intel processors I have been able to run both simultaneously, as long as no PMC counter programming conflicts occur.
On Ryzen, perf top just crashes so far with the latest Arch-built kernel.

What's your processor?

For security purposes, corefreqk.ko also implements read-only and read-write pages, which perf might not like.

@marco44
Author

marco44 commented Dec 2, 2020

It's a Ryzen 3900X.

That's exactly what I've been looking into: perf top crashes, and according to strace it crashes because of the error shown above. perf tries to read kallsyms and gets killed by the page fault.

@cyring
Owner

cyring commented Dec 3, 2020

Arch Linux, after a full rolling system upgrade, with the 5.9.11 kernel build and no boot command-line options.
CoreFreq is not running; it has not even been started.

# cat /proc/kallsyms > /dev/null

# dmesg -t | tail 
Dazed and confused, but trying to continue
Uhhuh. NMI received for unknown reason 2c on CPU 26.
Do you have a strange power saving mode enabled?
Dazed and confused, but trying to continue
Uhhuh. NMI received for unknown reason 3c on CPU 26.
Do you have a strange power saving mode enabled?
Dazed and confused, but trying to continue
Uhhuh. NMI received for unknown reason 3c on CPU 25.
Do you have a strange power saving mode enabled?
Dazed and confused, but trying to continue

# perf top
Samples: 4K of event 'cycles', 4000 Hz, Event count (approx.): 1273347411 lost: 
Overhead  Shared Object            Symbol                                       
   2.79%  liblzma.so.5.2.5         [.] lzma_crc64
   1.06%  [kernel]                 [k] __x86_indirect_thunk_rax
   0.91%  perf                     [.] __symbols__insert
   0.59%  [kernel]                 [k] clear_page_rep
   0.54%  [kernel]                 [k] psi_group_change
   0.49%  libc-2.32.so             [.] __memmove_avx_unaligned_erms
   0.45%  perf                     [.] dso__load_sym
   0.45%  [kernel]                 [k] copy_user_generic_string
   0.42%  [vdso]                   [.] 0x0000000000000675
   0.42%  [kernel]                 [k] check_preemption_disabled
...

EDIT: now rebooting for CoreFreq...

# perf top
Samples: 224K of event 'cycles', 4000 Hz, Event count (approx.): 17098041160 los
Overhead  Shared Object                  Symbol                                 
   4.04%  [kernel]                       [k] copy_user_generic_string
   3.35%  nvidia_drv.so                  [.] 0x00000000000d02f1
   2.99%  [JIT] tid 1837                 [.] 0x00007fcff567c3f9
   2.88%  [kernel]                       [k] read_hpet
...
  • Tracking NMI count
    CoreFreq_NMI
  • Issue
# cat /proc/kallsyms
...
ffffffff83400000 D __init_scratch_begin
ffffffff83800000 D __init_scratch_end
ffffffffc03f1024 r _note_7      [corefreqk]
ffffffffc0406c98 b KPublic      [corefreqk]
ffffffffc03f4c00 r CSWTCH.4038  [corefreqk]
ffffffffc03f4bc0 r CSWTCH.4039  [corefreqk]
ffffffffc03f4b80 r CSWTCH.4044  [corefreqk]
ffffffffc03f4b40 r CSWTCH.4045  [corefreqk]
ffffffffc03f4b00 r CSWTCH.4050  [corefreqk]
ffffffffc03f4ac0 r CSWTCH.4051  [corefreqk]
ffffffffc03f4a80 r CSWTCH.4056  [corefreqk]
ffffffffc03f4a40 r CSWTCH.4057  [corefreqk]
ffffffffc03ce410 t Intel_Turbo_Cfg8C_PerCore    [corefreqk]
ffffffffc03ce5c0 t Intel_Turbo_Cfg15C_PerCore   [corefreqk]
ffffffffc03ce730 t Intel_Turbo_Cfg16C_PerCore   [corefreqk]
ffffffffc03ce8e0 t Intel_Turbo_Cfg18C_PerCore   [corefreqk]
ffffffffc03ce990 t Intel_Turbo_Cfg_SKL_X_PerCore        [corefreqk]
ffffffffc03cec30 t SNB_EP_HB    [corefreqk]
ffffffffc03cec40 t AMD_17h_ZenIF        [corefreqk]
ffffffffc0406c90 b KPrivate     [corefreqk]
ffffffffc03cec60 t Set_Core2_Target     [corefreqk]
ffffffffc03cec70 t Set_Nehalem_Target   [corefreqk]
ffffffffc03cec90 t Set_SandyBridge_Target       [corefreqk]
ffffffffc03ceca0 t Get_Core2_Target     [corefreqk]
ffffffffc03cecb0 t Get_Nehalem_Target   [corefreqk]
ffffffffc03cecc0 t Get_SandyBridge_Target       [corefreqk]
ffffffffc03cecd0 t Cmp_Core2_Target     [corefreqk]
ffffffffc03cecf0 t Cmp_Nehalem_Target   [corefreqk]
ffffffffc03ced10 t Cmp_SandyBridge_Target       [corefreqk]
ffffffffc03ced30 t Start_Uncore_Nehalem [corefreqk]
ffffffffc03cedd0 t Stop_Uncore_Nehalem  [corefreqk]
ffffffffc03cee20 t Start_Uncore_SandyBridge     [corefreqk]
ffffffffc03ceec0 t Stop_Uncore_SandyBridge      [corefreqk]
ffffffffc03cef10 t Start_Uncore_SandyBridge_EP  [corefreqk]
ffffffffc03cef70 t Stop_Uncore_SandyBridge_EP   [corefreqk]
ffffffffc03cefc0 t Start_Uncore_Haswell_ULT     [corefreqk]
ffffffffc03cf070 t Stop_Uncore_Haswell_ULT      [corefreqk]
ffffffffc03cf0c0 t Start_Uncore_Haswell_EP      [corefreqk]
ffffffffc03cf120 t Stop_Uncore_Haswell_EP       [corefreqk]
ffffffffc03cf170 t Start_Uncore_Skylake [corefreqk]
ffffffffc03cf210 t Stop_Uncore_Skylake  [corefreqk]
ffffffffc03cf260 t Start_Uncore_Skylake_X       [corefreqk]
ffffffffc03cf2a0 t CoreFreqK_Idle_State_Withdraw        [corefreqk]
ffffffffc03fe760 d CoreFreqK    [corefreqk]
ffffffffc03cf370 t CoreFreqK_Policy_Exit        [corefreqk]
ffffffffc03cf380 t CoreFreqK_Policy_Init        [corefreqk]
ffffffffc03fed0c d Register_Governor    [corefreqk]
ffffffffc03cf450 t Policy_GetFreq       [corefreqk]
ffffffffc03cf4a0 t CoreFreqK_DevNode    [corefreqk]
ffffffffc03cf4c0 t CoreFreqK_Empty_Func_Level_Down      [corefreqk]
ffffffffc03f0161 t CoreFreqK_ShutDown   [corefreqk]
ffffffffc03f06ad t CoreFreqK_Alloc_Features_Level_Down  [corefreqk]
ffffffffc03d3790 t CoreFreqK_Alloc_Device_Level_Down    [corefreqk]
ffffffffc03d3770 t CoreFreqK_Make_Device_Level_Down     [corefreqk]
ffffffffc03d3740 t CoreFreqK_Create_Device_Level_Down   [corefreqk]
Killed
  • Kernel log
# dmesg -t
BUG: unable to handle page fault for address: 00000000920188f8
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0 
Oops: 0000 [#5] PREEMPT SMP NOPTI
CPU: 25 PID: 3010 Comm: cat Tainted: P      D    OE     5.9.11-arch2-1 #1
Hardware name: System manufacturer System Product Name/ROG CROSSHAIR VIII HERO (WI-FI), BIOS 2206 08/13/2020
RIP: 0010:strlen+0x0/0x20
...
Call Trace:
 strlcpy+0xf/0x40
 module_get_kallsym+0xd1/0x1b0
 update_iter+0x171/0x280
 s_next+0x1d/0x30
 seq_read+0x2c1/0x460
 proc_reg_read+0x51/0x90
 vfs_read+0x9c/0x180
 ksys_read+0x67/0xe0
 do_syscall_64+0x33/0x40
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
  • Unloading CoreFreq
rmmod corefreqk
# cat /proc/kallsyms > /dev/null

@cyring
Owner

cyring commented Dec 3, 2020

This could happen in either of these two strlcpy calls:
https://elixir.bootlin.com/linux/latest/source/kernel/module.c#L4237
Perhaps the KSYM_NAME_LEN limit of 128 bytes is being hit.
I'm trying to sandbox a simple module with the same number of parameters as corefreqk.c, but I can't reproduce the kallsyms issue...

@marco44
Author

marco44 commented Dec 3, 2020

Can I provide anything?

@cyring
Owner

cyring commented Dec 3, 2020

> Can I provide anything?

It looks like the CoreFreq driver has so many symbols that kallsyms is running out of resources.
That's why I would say this is not a perf issue as such but a consequence: perf loads all symbols.

I have searched the kernel documentation for such a limitation and found nothing. I am aware, though, that corefreqk.c has grown so large for a driver that it may produce too many symbols.

To track the issue down, I don't see any immediate option other than tracing these parts of the kernel:

			strlcpy(name, kallsyms_symbol_name(kallsyms, symnum), KSYM_NAME_LEN);
			strlcpy(module_name, mod->name, MODULE_NAME_LEN);
  • printk the return code and the variables passed to strlcpy when the error happens
  • build and boot this kernel
  • check whether any symbol variable is NULL or overruns its buffer, which could hint at a limitation

Could you help with this, or suggest another way to debug it?

@marco44
Author

marco44 commented Dec 4, 2020

Yeah, sure. Can you give me a patch, or just an example printk, so that I do exactly what you want?

@cyring
Owner

cyring commented Dec 4, 2020

Arch Build System

# pacman -Sy
$ asp update && git pull
$ makepkg -osr
int module_get_kallsym(unsigned int symnum, unsigned long *value, char *type,
                        char *name, char *module_name, int *exported)
{
        struct module *mod;

        preempt_disable();
        list_for_each_entry_rcu(mod, &modules, list) {
                struct mod_kallsyms *kallsyms;

                if (mod->state == MODULE_STATE_UNFORMED)
                        continue;
                kallsyms = rcu_dereference_sched(mod->kallsyms);
                if (symnum < kallsyms->num_symtab) {
                        const Elf_Sym *sym = &kallsyms->symtab[symnum];
                        size_t len = 0;

                        *value = kallsyms_symbol_value(sym);
                        *type = kallsyms->typetab[symnum];
                        /* Debug trace: strlcpy() returns strlen(src); log empty names only */
                        len = strlcpy(name, kallsyms_symbol_name(kallsyms, symnum), KSYM_NAME_LEN);
                        if (len == 0) {
                                printk("strlcpy(name[%s])>%zu\n", name, len);
                        }
                        len = strlcpy(module_name, mod->name, MODULE_NAME_LEN);
                        if (len == 0) {
                                printk("strlcpy(module_name[%s], mod->name[%s])>%zu\n",
                                        module_name, mod->name, len);
                        }
                        *exported = is_exported(name, *value, mod);
                        preempt_enable();
                        return 0;
                }
                symnum -= kallsyms->num_symtab;
        }
        preempt_enable();
        return -ERANGE;
}
$ makepkg -eisrf

EDIT: (re)building the kernel in progress, I will tell you the results ...

@cyring
Owner

cyring commented Dec 4, 2020

# cat /proc/kallsyms
Killed
strlcpy(name[])>0
BUG: unable to handle page fault for address: 00000000b0160940
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page

@cyring
Owner

cyring commented Dec 4, 2020

OK, now changing to print all names going through this function ...

int module_get_kallsym(unsigned int symnum, unsigned long *value, char *type,
                        char *name, char *module_name, int *exported)
{
        struct module *mod;

        preempt_disable();
        list_for_each_entry_rcu(mod, &modules, list) {
                struct mod_kallsyms *kallsyms;

                if (mod->state == MODULE_STATE_UNFORMED)
                        continue;
                kallsyms = rcu_dereference_sched(mod->kallsyms);
                if (symnum < kallsyms->num_symtab) {
                        const Elf_Sym *sym = &kallsyms->symtab[symnum];
                        size_t len = 0;

                        *value = kallsyms_symbol_value(sym);
                        *type = kallsyms->typetab[symnum];
                        len = strlcpy(name, kallsyms_symbol_name(kallsyms, symnum), KSYM_NAME_LEN);

                        /* Debug trace: log every symbol name that goes through here */
                        printk("strlcpy(name[%s])>%zu\n", name, len);

                        len = strlcpy(module_name, mod->name, MODULE_NAME_LEN);
                        if (len == 0) {
                                printk("strlcpy(module_name[%s], mod->name[%s])>%zu\n",
                                        module_name, mod->name, len);
                        }
                        *exported = is_exported(name, *value, mod);
                        preempt_enable();
                        return 0;
                }
                symnum -= kallsyms->num_symtab;
        }
        preempt_enable();
        return -ERANGE;
}

@cyring
Owner

cyring commented Dec 4, 2020

  • All traces
strlcpy(name[])>0
strlcpy(name[_note_7])>7
strlcpy(name[KPublic])>7
strlcpy(name[CSWTCH.4036])>11
strlcpy(name[CSWTCH.4037])>11
strlcpy(name[CSWTCH.4042])>11
strlcpy(name[CSWTCH.4043])>11
strlcpy(name[CSWTCH.4048])>11
strlcpy(name[CSWTCH.4049])>11
strlcpy(name[CSWTCH.4054])>11
strlcpy(name[CSWTCH.4055])>11
strlcpy(name[Intel_Turbo_Cfg8C_PerCore])>25
strlcpy(name[Intel_Turbo_Cfg15C_PerCore])>26
strlcpy(name[Intel_Turbo_Cfg16C_PerCore])>26
strlcpy(name[Intel_Turbo_Cfg18C_PerCore])>26
strlcpy(name[Intel_Turbo_Cfg_SKL_X_PerCore])>29
strlcpy(name[SNB_EP_HB])>9
strlcpy(name[AMD_17h_ZenIF])>13
strlcpy(name[KPrivate])>8
strlcpy(name[Set_Core2_Target])>16
strlcpy(name[Set_Nehalem_Target])>18
strlcpy(name[Set_SandyBridge_Target])>22
strlcpy(name[Get_Core2_Target])>16
strlcpy(name[Get_Nehalem_Target])>18
strlcpy(name[Get_SandyBridge_Target])>22
strlcpy(name[Cmp_Core2_Target])>16
strlcpy(name[Cmp_Nehalem_Target])>18
strlcpy(name[Cmp_SandyBridge_Target])>22
strlcpy(name[Start_Uncore_Nehalem])>20
strlcpy(name[Stop_Uncore_Nehalem])>19
strlcpy(name[Start_Uncore_SandyBridge])>24
strlcpy(name[Stop_Uncore_SandyBridge])>23
strlcpy(name[Start_Uncore_SandyBridge_EP])>27
strlcpy(name[Stop_Uncore_SandyBridge_EP])>26
strlcpy(name[Start_Uncore_Haswell_ULT])>24
strlcpy(name[Stop_Uncore_Haswell_ULT])>23
strlcpy(name[Start_Uncore_Haswell_EP])>23
strlcpy(name[Stop_Uncore_Haswell_EP])>22
strlcpy(name[Start_Uncore_Skylake])>20
strlcpy(name[Stop_Uncore_Skylake])>19
strlcpy(name[Start_Uncore_Skylake_X])>22
strlcpy(name[CoreFreqK_Idle_State_Withdraw])>29
strlcpy(name[CoreFreqK])>9
strlcpy(name[CoreFreqK_Policy_Exit])>21
strlcpy(name[CoreFreqK_Policy_Init])>21
strlcpy(name[Register_Governor])>17
strlcpy(name[Policy_GetFreq])>14
strlcpy(name[CoreFreqK_DevNode])>17
strlcpy(name[CoreFreqK_Empty_Func_Level_Down])>31
strlcpy(name[CoreFreqK_ShutDown])>18
strlcpy(name[CoreFreqK_Alloc_Features_Level_Down])>35
strlcpy(name[CoreFreqK_Alloc_Device_Level_Down])>33
strlcpy(name[CoreFreqK_Make_Device_Level_Down])>32
strlcpy(name[CoreFreqK_Create_Device_Level_Down])>34
strlcpy(name[CoreFreqK_Alloc_Public_Level_Down])>33
strlcpy(name[CoreFreqK_Alloc_Public_Level_Down])>33
strlcpy(name[CoreFreqK_Alloc_Private_Level_Down])>34
strlcpy(name[CoreFreqK_Alloc_Processor_RO_Level_Down])>39
strlcpy(name[CoreFreqK_Alloc_Processor_RW_Level_Down])>39
strlcpy(name[CoreFreqK_Alloc_Public_Cache_Level_Down])>39
strlcpy(name[CoreFreqK_Alloc_Private_Cache_Level_Down])>40
strlcpy(name[CoreFreqK_Alloc_Per_CPU_Level_Down])>34
strlcpy(name[CoreFreqK_Ignition_Level_Down])>29
strlcpy(name[RunLevel])>8
strlcpy(name[ClockMod_HWP_PerCore])>20
strlcpy(name[CoreFreqK_NMI_Handler])>21
BUG: unable to handle page fault for address: 00000000a4da4940
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page

whereas

cat /proc/kallsyms
...
ffffffffc0441024 r _note_7      [corefreqk]
ffffffffc0456c98 b KPublic      [corefreqk]
ffffffffc0444c00 r CSWTCH.4036  [corefreqk]
ffffffffc0444bc0 r CSWTCH.4037  [corefreqk]
ffffffffc0444b80 r CSWTCH.4042  [corefreqk]
ffffffffc0444b40 r CSWTCH.4043  [corefreqk]
ffffffffc0444b00 r CSWTCH.4048  [corefreqk]
ffffffffc0444ac0 r CSWTCH.4049  [corefreqk]
ffffffffc0444a80 r CSWTCH.4054  [corefreqk]
ffffffffc0444a40 r CSWTCH.4055  [corefreqk]
ffffffffc041e410 t Intel_Turbo_Cfg8C_PerCore    [corefreqk]
ffffffffc041e5c0 t Intel_Turbo_Cfg15C_PerCore   [corefreqk]
ffffffffc041e730 t Intel_Turbo_Cfg16C_PerCore   [corefreqk]
ffffffffc041e8e0 t Intel_Turbo_Cfg18C_PerCore   [corefreqk]
ffffffffc041e990 t Intel_Turbo_Cfg_SKL_X_PerCore        [corefreqk]
ffffffffc041ec30 t SNB_EP_HB    [corefreqk]
ffffffffc041ec40 t AMD_17h_ZenIF        [corefreqk]
ffffffffc0456c90 b KPrivate     [corefreqk]
ffffffffc041ec60 t Set_Core2_Target     [corefreqk]
ffffffffc041ec70 t Set_Nehalem_Target   [corefreqk]
ffffffffc041ec90 t Set_SandyBridge_Target       [corefreqk]
ffffffffc041eca0 t Get_Core2_Target     [corefreqk]
ffffffffc041ecb0 t Get_Nehalem_Target   [corefreqk]
ffffffffc041ecc0 t Get_SandyBridge_Target       [corefreqk]
ffffffffc041ecd0 t Cmp_Core2_Target     [corefreqk]
ffffffffc041ecf0 t Cmp_Nehalem_Target   [corefreqk]
ffffffffc041ed10 t Cmp_SandyBridge_Target       [corefreqk]
ffffffffc041ed30 t Start_Uncore_Nehalem [corefreqk]
ffffffffc041edd0 t Stop_Uncore_Nehalem  [corefreqk]
ffffffffc041ee20 t Start_Uncore_SandyBridge     [corefreqk]
ffffffffc041eec0 t Stop_Uncore_SandyBridge      [corefreqk]
ffffffffc041ef10 t Start_Uncore_SandyBridge_EP  [corefreqk]
ffffffffc041ef70 t Stop_Uncore_SandyBridge_EP   [corefreqk]
ffffffffc041efc0 t Start_Uncore_Haswell_ULT     [corefreqk]
ffffffffc041f070 t Stop_Uncore_Haswell_ULT      [corefreqk]
ffffffffc041f0c0 t Start_Uncore_Haswell_EP      [corefreqk]
ffffffffc041f120 t Stop_Uncore_Haswell_EP       [corefreqk]
ffffffffc041f170 t Start_Uncore_Skylake [corefreqk]
ffffffffc041f210 t Stop_Uncore_Skylake  [corefreqk]
ffffffffc041f260 t Start_Uncore_Skylake_X       [corefreqk]
ffffffffc041f2a0 t CoreFreqK_Idle_State_Withdraw        [corefreqk]
ffffffffc044e760 d CoreFreqK    [corefreqk]
ffffffffc041f370 t CoreFreqK_Policy_Exit        [corefreqk]
ffffffffc041f380 t CoreFreqK_Policy_Init        [corefreqk]
ffffffffc044ed0e d Register_Governor    [corefreqk]
ffffffffc041f450 t Policy_GetFreq       [corefreqk]
ffffffffc041f4a0 t CoreFreqK_DevNode    [corefreqk]
ffffffffc041f4c0 t CoreFreqK_Empty_Func_Level_Down      [corefreqk]
ffffffffc0440151 t CoreFreqK_ShutDown   [corefreqk]
ffffffffc044069d t CoreFreqK_Alloc_Features_Level_Down  [corefreqk]
ffffffffc0423780 t CoreFreqK_Alloc_Device_Level_Down    [corefreqk]
ffffffffc0423760 t CoreFreqK_Make_Device_Level_Down     [corefreqk]
ffffffffc0423730 t CoreFreqK_Create_Device_Level_Down   [corefreqk]
Killed

@cyring
Owner

cyring commented Dec 4, 2020

I haven't found a solution.
I've created another kernel module with many functions (static or not), parameters and callbacks, and I can't reproduce this bug in a sandbox.
Like kvm or nvidia, CoreFreq is a big driver, but those two don't run into this issue.
I still believe the error comes from my code. I might be doing something that doesn't comply with the kernel rules, but I don't have a clue what.
If kallsyms had a bug, it would have shown up in the past and we could find reports about it, but I found nothing about this kind of issue.
Any help is welcome.

@cyring
Owner

cyring commented Dec 6, 2020

A fix is available in the develop branch.
Can you please give it a try?

@marco44
Author

marco44 commented Dec 7, 2020

Hi,

Problem fixed for me

Thanks a lot

@cyring
Owner

cyring commented Dec 7, 2020

> Hi,
>
> Problem fixed for me
>
> Thanks a lot

You are welcome.

@cyring cyring closed this as completed Dec 7, 2020
@cyring
Owner

cyring commented Jan 22, 2022

When corefreqk.ko is already running:

  • Only on AMD/Zen does running perf top get killed!
  • On an Intel Tiger Lake/U, perf starts nicely.

@cyring cyring reopened this Jan 22, 2022
@cyring cyring added enhancement and removed bugfix labels Jan 22, 2022
@cyring
Owner

cyring commented Jan 22, 2022

> When corefreqk.ko is already running:
>
>   • Only on AMD/Zen does running perf top get killed!
>   • On an Intel Tiger Lake/U, perf starts nicely.

This is indeed linked to cat /proc/kallsyms. However, the same disk with the same build shows no issue when plugged into the Intel/TGL machine, and shows the issue when plugged into the AMD/Matisse machine.
I'm reviewing the whole driver code for any mismatch in function prototypes, especially static ones, but I can't find any difference in the coding...

@cyring
Owner

cyring commented Jan 24, 2022

nm corefreqk.ko

0000000000000000 t CoreFreqK_Read_CS_From_Invariant_TSC

EDIT: Why?

  1. The same zero value goes through fine on other systems.
  2. With most mitigations and randomizations disabled, the issue is still there on this setup: ASUS ROG VIII (BIOS 3801), 3950X, Arch Linux (UEFI), latest kernel

@cyring cyring changed the title from "Cannot run perf when corefreq is loaded" to "Null symbol within kernel module" Jan 24, 2022
@cyring cyring changed the title from "Null symbol within kernel module" to "Symbol issue within kernel module" Jan 26, 2022
@cyring
Owner

cyring commented Jan 26, 2022

What! With 5.15.16-xanmod1-1 (EDIT: where amd-pstate is built-in) there is no kallsyms issue.

@cyring
Owner

cyring commented May 8, 2022

The mainline kernel still fails, but the latest xanmod build of 5.15.36 works fine with kallsyms plus CoreFreq.

Unsolved. Issue is postponed.

@cyring cyring closed this as completed May 8, 2022
@cyring
Owner

cyring commented May 10, 2022

Much Ado About Nothing

Solution

The -fno-inline compiler flag is the key.

  1. Build and install CoreFreq as below
make CC='cc -fno-inline' clean all
ln -s src/CoreFreq/corefreqk.ko /lib/modules/$(uname -r)/kernel/drivers/misc/

depmod -a
  2. Start CoreFreq
  3. Check for the kernel module symbols
grep corefreqk /proc/kallsyms

Perf profiling use-case

insmod corefreqk.ko Register_ClockSource=1 Register_Governor=1 Register_CPU_Freq=1 Register_CPU_Idle=1 NMI_Disable=0

./corefreqd
corefreq-cli -t interrupts

[Screenshots: 2022-05-10-102952_644x564_scrot, 2022-05-10-102819_642x1209_scrot]

@cyring cyring changed the title from "Symbol issue within kernel module" to "[Solved] Symbol issue within kernel module" May 10, 2022
@cyring
Owner

cyring commented May 26, 2022

Root cause was address relocation.
See commit bdfe6bc
