Backport the new slightly different algorithm for "MemAvailable" used by Linux kernel 4.8+ #2

Closed
giampaolo opened this issue Sep 19, 2016 · 11 comments

Comments

@giampaolo
giampaolo commented Sep 19, 2016

On Ubuntu 16.04 (kernel 4.0.0.36) the "avail" value differs from the system's value by about 90M.

~/svn/psutil$ ./free.pl  | grep avail
  -/+ avail           7135660    9189988
~/svn/psutil$ cat /proc/meminfo |grep Avail
MemAvailable:    9091500 kB

At first I thought it was the Perl interpreter overhead, but I tried running "cat /proc/meminfo |grep Avail" from the shell script as a subprocess and got a similar result.
Any idea?

@famzah
Owner

famzah commented Sep 20, 2016

I have no idea where this 1% error comes from. It could be that the counters in "/proc/meminfo" are not all updated atomically at the same time.

Can I see the whole output of your "/proc/meminfo"?

@famzah famzah changed the title Avail memory is 91M inaccurate Avail memory is 91M (1%) inaccurate Sep 20, 2016
@giampaolo
Author

~/svn/psutil$ cat /proc/zoneinfo | grep low
        low      20
        low      961
        low      20137
~/svn/psutil$ cat /proc/meminfo 
MemTotal:       16325648 kB
MemFree:         7580996 kB
MemAvailable:    9244724 kB
Buffers:          244084 kB
Cached:          2152960 kB
SwapCached:            0 kB
Active:          6975492 kB
Inactive:        1283500 kB
Active(anon):    5866260 kB
Inactive(anon):   553308 kB
Active(file):    1109232 kB
Inactive(file):   730192 kB
Unevictable:         216 kB
Mlocked:             216 kB
SwapTotal:       1952764 kB
SwapFree:        1952764 kB
Dirty:             30060 kB
Writeback:             0 kB
AnonPages:       5862216 kB
Mapped:           981876 kB
Shmem:            557628 kB
Slab:             229452 kB
SReclaimable:     171080 kB
SUnreclaim:        58372 kB
KernelStack:       16048 kB
PageTables:        76600 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    10115588 kB
Committed_AS:   16580352 kB
VmallocTotal:   34359738367 kB
VmallocUsed:           0 kB
VmallocChunk:          0 kB
HardwareCorrupted:     0 kB
AnonHugePages:   1980416 kB
CmaTotal:              0 kB
CmaFree:               0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:      325548 kB
DirectMap2M:    12150784 kB
DirectMap1G:     5242880 kB

@giampaolo
Author

I think I know why: the Linux kernel changed the way this is calculated.
Your Perl package starts calculating memory as:

avail = free - watermark_low

Whereas (recent) Linux kernels do avail = free - total_reserved_pages:
https://github.com/torvalds/linux/blob/6aa303defb7454a2520c4ddcdf6b081f62a15890/mm/page_alloc.c#L4025

Not sure how to get total_reserved_pages though. :-\
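
To make the difference concrete, here is a rough Python sketch that applies the older variant to the /proc/meminfo and /proc/zoneinfo values posted above. The thread only states the first term of each formula; the page-cache and reclaimable-slab terms are written from memory of the kernel's si_mem_available() and should be treated as assumptions, as should the 4 kB page size. estimate_available is a hypothetical helper name, not something taken from free.pl.

```python
PAGE_KB = 4  # assumed page size in kB (typical on x86-64)


def estimate_available(free, active_file, inactive_file, sreclaimable,
                       wmark_low, reserve):
    """Estimate MemAvailable in kB.

    `reserve` is what gets subtracted from free memory: the summed
    low watermark (older kernels) or totalreserve_pages (newer
    kernels), both already converted to kB.
    """
    available = free - reserve
    # Assumed term: count the file page cache, minus the part that is
    # expected to stay resident (at most half, capped by wmark_low).
    pagecache = active_file + inactive_file
    available += pagecache - min(pagecache // 2, wmark_low)
    # Assumed term: same treatment for reclaimable slab.
    available += sreclaimable - min(sreclaimable // 2, wmark_low)
    return max(available, 0)


# "low" values from /proc/zoneinfo above, in pages, converted to kB.
wmark_low_kb = (20 + 961 + 20137) * PAGE_KB

old_style = estimate_available(
    free=7580996, active_file=1109232, inactive_file=730192,
    sreclaimable=171080, wmark_low=wmark_low_kb, reserve=wmark_low_kb)

reported = 9244724  # MemAvailable from /proc/meminfo above
print(old_style, old_style - reported)  # 9338084 93360
```

With these numbers the old-style estimate is 93360 kB (about 91M) above what the kernel reports, which is consistent with the kernel subtracting a larger reserve than the low watermark. Passing a totalreserve_pages value (in kB) as `reserve` instead would give the newer variant.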

@giampaolo
Author

...and this is where the change got introduced:
torvalds/linux@84ad580

@giampaolo
Author

giampaolo commented Sep 21, 2016

The "free" command-line utility also makes the same mistake, so I filed an issue for the procps project:
https://gitlab.com/procps-ng/procps/issues/42

@famzah
Owner

famzah commented Sep 21, 2016

Giampaolo, thank you for the thorough research. Now we know the cause of the issue and how the current Linux implementation works. Unfortunately, I lack the knowledge of how to calculate "total_reserved_pages" from the info exported in "/proc".

I'll leave this case open. Maybe someone else could give us a hint.

@famzah famzah changed the title Avail memory is 91M (1%) inaccurate Backport the new slightly different algorithm for "MemAvailable" used by Linux kernel 4.8+ Sep 21, 2016
@clopez

clopez commented Nov 3, 2016

I think all the info is exposed in /proc/zoneinfo.

Check this example file, with the relevant fields annotated on the right: http://sprunge.us/EZba

I think it is a matter of iterating over the zones. The values for lowmem_reserve on each zone seem to be printed inside the protection() string.
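
As a rough illustration of that iteration, here is a minimal Python sketch that collects the per-zone fields from a zoneinfo-style text. The sample excerpt is made up, and the exact layout of /proc/zoneinfo (field names, indentation, the "protection: (...)" line) is an assumption that may differ between kernel versions; parse_zoneinfo is a hypothetical name.

```python
import re

# Illustrative excerpt only; the numbers are invented.
SAMPLE = """\
Node 0, zone      DMA
  pages free     3969
        min      16
        low      20
        high     24
        managed  3975
        protection: (0, 2934, 15940, 15940)
Node 0, zone    DMA32
  pages free     100000
        min      800
        low      961
        high     1200
        managed  751000
        protection: (0, 0, 13006, 13006)
"""

ZONE_RE = re.compile(r"^Node\s+(\d+),\s+zone\s+(\S+)")
FIELD_RE = re.compile(r"^\s+(low|high|managed)\s+(\d+)\s*$")
PROT_RE = re.compile(r"^\s+protection:\s*\(([^)]*)\)")


def parse_zoneinfo(text):
    """Return one dict per zone with the fields needed for the reserve."""
    zones = []
    for line in text.splitlines():
        m = ZONE_RE.match(line)
        if m:
            # A new "Node N, zone NAME" header starts a new zone record.
            zones.append({"node": int(m.group(1)), "zone": m.group(2)})
            continue
        if not zones:
            continue  # ignore anything before the first zone header
        m = FIELD_RE.match(line)
        if m:
            zones[-1][m.group(1)] = int(m.group(2))
            continue
        m = PROT_RE.match(line)
        if m:
            # The protection list is assumed to hold lowmem_reserve.
            zones[-1]["lowmem_reserve"] = [
                int(v) for v in m.group(1).split(",")]
    return zones


zones = parse_zoneinfo(SAMPLE)
for z in zones:
    print(z)
```

On a real system the same function could be fed the contents of /proc/zoneinfo instead of SAMPLE.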

famzah added a commit that referenced this issue Nov 6, 2016
@famzah
Owner

famzah commented Nov 6, 2016

@clopez thanks for the hints. I've backported the new Linux kernel implementation.

Guys, please test "free.pl" now, and if there is still a problem, re-open this ticket with additional debug info.

@famzah famzah closed this as completed Nov 6, 2016
@giampaolo
Author

@famzah I would be interested in doing the same in psutil. I don't know Perl so it's hard for me to understand what you did in there. Could you explain the logic you used with some pseudo code or something?

@famzah
Owner

famzah commented Nov 8, 2016

I'll try to explain the whole commit cfb776d:

  • _get_empty_zone_struct() -- returns a structure holding the regular expressions used to parse the "/proc/zoneinfo" values which we are interested in; this structure also contains the parsed values
  • _zone_end() -- called at the end of each zone definition in "/proc/zoneinfo"; checks that all expected values were parsed; splits the "lowmem_reserve" string into an array of integers; parses the Node ID and saves the whole structure as the next array element for this Node ID:
Node 0
   ZoneA info
   ZoneB info
   ...
Node 1
   ZoneA info
   ...
  • parse_proc_zoneinfo() -- iterates over each line of "/proc/zoneinfo" and populates the already mentioned data structures
  • get_max_nr_zones() -- a naive way to get the maximum number of Zones that each memory Node can contain; I count the elements in the array "lowmem_reserve"; in reality, MAX_NR_ZONES is defined in the Linux kernel header files
  • calculate_totalreserve_pages() -- this is a one-by-one translation from C to Perl of the same algorithm used in the Linux kernel; note that MAX_NR_ZONES can be greater than the actual Zone count that you see in "/proc/zoneinfo" -- once I detect this situation, I "break" the loop for this particular memory Node (since there are no more Zones to iterate over)
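
For readers who do not know Perl, the last two steps can be sketched in Python roughly as follows. This is written from my reading of the kernel's calculate_totalreserve_pages(), not from the Perl code in the commit, so treat the details (taking the largest lowmem_reserve entry, adding the high watermark, capping at the managed pages) as assumptions. The input structure and its numbers are hypothetical.

```python
def calculate_totalreserve_pages(nodes):
    """Sum the reserved pages over all zones of all memory nodes.

    `nodes` maps a node id to its list of zone dicts, in zone order.
    Each zone dict holds "high" and "managed" (in pages) and
    "lowmem_reserve" (a list of pages, one entry per possible zone).
    """
    total = 0
    for zones in nodes.values():
        # Naive MAX_NR_ZONES: the length of the lowmem_reserve array.
        max_nr_zones = max(len(z["lowmem_reserve"]) for z in zones)
        for i in range(max_nr_zones):
            if i >= len(zones):
                break  # fewer real zones than MAX_NR_ZONES on this node
            zone = zones[i]
            # Largest lowmem_reserve entry for this and higher zones.
            reserve = max(zone["lowmem_reserve"][i:], default=0)
            # The high watermark is treated as reserved as well.
            reserve += zone["high"]
            # A zone cannot reserve more pages than it manages.
            total += min(reserve, zone["managed"])
    return total


# Hypothetical single-node input; the numbers are made up.
nodes = {
    0: [
        {"high": 24, "managed": 3975,
         "lowmem_reserve": [0, 2934, 15940, 15940]},
        {"high": 1200, "managed": 751000,
         "lowmem_reserve": [0, 0, 13006, 13006]},
        {"high": 25000, "managed": 3300000,
         "lowmem_reserve": [0, 0, 0, 0]},
    ]
}
print(calculate_totalreserve_pages(nodes))  # 43181
```

The result is in pages; multiplying by the page size in kB gives a value that can be subtracted from MemFree in place of the low watermark.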
