Active Memory never reclaimed please help me understand

Prem Kumar prem.it.kumar at gmail.com
Wed Sep 23 16:46:50 EDT 2015


Dear Mulyadi,

Thank you for your response. Sorry for top-posting; I was waiting for my
post to arrive in my mailbox, but it took forever, so I top-posted in
eagerness. You are right in your observation that I couldn't possibly
have had my first application crash at 6GB; I should have said about 15GB.
I have many nodes, and I just picked the outputs from a couple of them to
present my observation. Below I will try to dissect my observation in the
hope that you can help me understand this; my OS concepts have become a
little rusty, as I haven't been in touch with them.


Here is a machine that is currently in this state:
$ free -g
             total       used       free     shared    buffers     cached
Mem:            23         14          8          0          0          0
-/+ buffers/cache:         14          9
Swap:            0          0          0

I have a program that just gobbles memory; here is what happens when I run
it:
$ ./eatmemory 8.99G
Eating 8589934592 bytes in chunks of 1024...
Done, press any key to free the memory

$ ./eatmemory 9G
Eating 9663676416 bytes in chunks of 1024...
Killed


I believe there is nothing wrong with the above observation in itself: RAM
is in use by other (presumably running) applications, so I only have so much
available for my program to run.
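
As a quick check on the two runs above, here is a minimal sketch that
converts the byte counts printed by eatmemory into GiB. It suggests that the
"8.99G" run actually requested exactly 8 GiB and the "9G" run exactly 9 GiB,
so the point where the program gets killed lies somewhere between those two
sizes. That eatmemory dropped the fractional part of its argument is an
assumption based only on the printed numbers; its source is not shown here.

```python
GIB = 1024 ** 3  # bytes per GiB


def to_gib(n_bytes):
    """Convert a byte count to GiB."""
    return n_bytes / GIB


# Byte counts copied from the two eatmemory runs above.
requested = {"8.99G": 8589934592, "9G": 9663676416}

for arg, n_bytes in requested.items():
    print(f"{arg}: {to_gib(n_bytes)} GiB")
```

For comparison, the MemFree value in the /proc/meminfo output further down
(9402768 kB) is a little under 9 GiB, although that snapshot may have been
taken at a slightly different moment.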

But my issue is that nothing other than system services is running on this
machine. This renders the node unusable for the next program that runs on
it and requests 9G or more, as above. Below is the output of /proc/meminfo
from the same machine:

$ cat /proc/meminfo
MemTotal:       24724728 kB
MemFree:         9402768 kB
Buffers:               0 kB
Cached:           217464 kB
SwapCached:            0 kB
Active:         14650896 kB
Inactive:          60456 kB
Active(anon):   14647052 kB
Inactive(anon):    40632 kB
Active(file):       3844 kB
Inactive(file):    19824 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:             0 kB
SwapFree:              0 kB
Dirty:                 0 kB
Writeback:             0 kB
AnonPages:      14493928 kB
Mapped:            19544 kB
Shmem:            193720 kB
Slab:             109720 kB
SReclaimable:      12300 kB
SUnreclaim:        97420 kB
KernelStack:        2968 kB
PageTables:        39100 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    12362364 kB
Committed_AS:   15684044 kB
VmallocTotal:   34359738367 kB
VmallocUsed:      493316 kB
VmallocChunk:   34346062668 kB
HardwareCorrupted:     0 kB
AnonHugePages:  13936640 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:        7652 kB
DirectMap2M:    25145344 kB
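
To make the numbers above easier to compare, here is a minimal sketch that
parses /proc/meminfo-style text and derives a few figures. The SAMPLE string
is an excerpt of the values posted above; parse_meminfo and summarize are
hypothetical helper names, and "used" is taken here as MemTotal minus
MemFree, Buffers and Cached, which is an assumption about how one wants to
count it rather than an official definition.

```python
import re

# Excerpt of the /proc/meminfo values posted above (all in kB).
SAMPLE = """\
MemTotal:       24724728 kB
MemFree:         9402768 kB
Buffers:               0 kB
Cached:           217464 kB
AnonPages:      14493928 kB
Shmem:            193720 kB
AnonHugePages:  13936640 kB
"""


def parse_meminfo(text):
    """Return a dict mapping each field name to its integer value."""
    fields = {}
    for line in text.splitlines():
        match = re.match(r"(\S+):\s+(\d+)", line)
        if match:
            fields[match.group(1)] = int(match.group(2))
    return fields


def summarize(fields):
    """Derive used memory and the share of it held by anonymous pages."""
    used = (fields["MemTotal"] - fields["MemFree"]
            - fields["Buffers"] - fields["Cached"])
    return {
        "used_kb": used,
        "anon_share_of_used": fields["AnonPages"] / used,
        "thp_share_of_anon": fields["AnonHugePages"] / fields["AnonPages"],
    }


summary = summarize(parse_meminfo(SAMPLE))
for name, value in summary.items():
    print(name, value)
```

On a live node the same functions could be fed the contents of
/proc/meminfo instead of SAMPLE. With the posted values, almost all of the
used memory is AnonPages, and almost all of AnonPages is reported as
AnonHugePages.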

Also here is my ulimit which is unlimited:
$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 192912
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 81920
cpu time               (seconds, -t) unlimited
max user processes              (-u) 1024
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

And /proc/self/maps
$ cat /proc/self/maps
00400000-0040b000 r-xp 00000000 00:10 67455633                           /bin/cat
0060a000-0060b000 rw-p 0000a000 00:10 67455633                           /bin/cat
0060b000-0060c000 rw-p 00000000 00:00 0
0080a000-0080b000 rw-p 0000a000 00:10 67455633                           /bin/cat
0209f000-020c0000 rw-p 00000000 00:00 0                                  [heap]
36d7e00000-36d7e20000 r-xp 00000000 00:10 67454760                       /lib64/ld-2.12.so
36d801f000-36d8020000 r--p 0001f000 00:10 67454760                       /lib64/ld-2.12.so
36d8020000-36d8021000 rw-p 00020000 00:10 67454760                       /lib64/ld-2.12.so
36d8021000-36d8022000 rw-p 00000000 00:00 0
36d8200000-36d838a000 r-xp 00000000 00:10 67456999                       /lib64/libc-2.12.so
36d838a000-36d858a000 ---p 0018a000 00:10 67456999                       /lib64/libc-2.12.so
36d858a000-36d858e000 r--p 0018a000 00:10 67456999                       /lib64/libc-2.12.so
36d858e000-36d858f000 rw-p 0018e000 00:10 67456999                       /lib64/libc-2.12.so
36d858f000-36d8594000 rw-p 00000000 00:00 0
7f754caad000-7f754cab0000 rw-p 00000000 00:00 0
7f754cac2000-7f754cac3000 rw-p 00000000 00:00 0
7fff5e496000-7fff5e4ab000 rw-p 00000000 00:00 0                          [stack]
7fff5e5f8000-7fff5e5f9000 r-xp 00000000 00:00 0                          [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]

On every machine where I ran into this problem, AnonPages are eating up the
memory, in effect shrinking the RAM available for programs to run.
Q) Now my question is: since the previous job/program that ran on this
machine has finished or died, my OS concepts tell me that the recently
used cached anonymous pages should be released to meet the request of
another application asking for memory/VM. What am I missing here?

What I also fail to understand is the state in which my diskless and
swapless nodes remain: what or who has control over the used-up memory, and
why is it not being granted to the next owner of the machine so that it can
run at full scale? I understand that I will not get all of it, but I would
expect at least 19GB out of 24GB. Below is the list of top processes on the
machine. Looking at it, I don't see any heavy use of memory ... the mystery
makes me feel dumb.

$ ps aux --sort -rss
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root      8402  0.0  0.0 119712 15896 ?        S    12:11   0:00 /usr/libexec/sssd/sssd_nss --uid 0 --gid 0 --debug-to-files
root      9555  0.0  0.0 3508796 4680 ?        S    Aug18   0:33 /usr/sbin/slurmd
useralap   8231  0.0  0.0  27224  4604 pts/0    S    12:06   0:00 -bash
root      8401  0.0  0.0 151052  4264 ?        S    12:11   0:00 /usr/libexec/sssd/sssd_be --domain default --uid 0 --gid 0 --debug-to-files
root      8153  0.0  0.0 111192  3240 pts/0    Ss   12:05   0:00 -bash
root      2078  0.0  0.0 720600  2968 ?        Ssl  Aug10   1:39 automount --pid-file /var/run/autofs.pid
root      1752  0.0  0.0 249344  2784 ?        Sl   Aug10   0:04 /sbin/rsyslogd -i /var/run/syslogd.pid -c 5
useralap   9898  1.0  0.0  26196  1468 pts/0    R+   15:42   0:00 ps aux --sort -rss
root      8150  0.0  0.0 111816  1296 ?        Ss   12:05   0:00 sshd: root@pts/0
munge     2146  0.0  0.0 225004  1292 ?        Sl   Aug10   0:36 /usr/sbin/munged
68        2006  0.0  0.0  41976  1228 ?        Ssl  Aug10   0:10 hald
root      1671  0.0  0.0   9120   976 ?        Ss   Aug10   0:00 /sbin/dhclient -1 -q -lf /var/lib/dhclient/dhclient-em1.leases -pf /var/run/dhclient-em1.pid
root      8400  0.0  0.0 114288   900 ?        Ss   12:11   0:00 /usr/sbin/sssd -f -D
root      8403  0.0  0.0 105264   876 ?        S    12:11   0:00 /usr/libexec/sssd/sssd_pam --uid 0 --gid 0 --debug-to-files
root      2174  0.0  0.0  20000   868 ?        Ss   Aug10   0:42 crond
root      2111  0.0  0.0  66188   712 ?        Ss   Aug10   0:00 /usr/sbin/sshd
root      3625  0.0  0.0 334616   648 ?        SLsl Aug10   0:00 /usr/sbin/ibacm
root       863  0.0  0.0  10832   592 ?        S<s  Aug10   0:00 /sbin/udevd -d
root      3391  0.0  0.0  10828   588 ?        S<   Aug10   0:00 /sbin/udevd -d
root      3523  0.0  0.0  10828   588 ?        S<   Aug10   0:00 /sbin/udevd -d
rpcuser   1893  0.0  0.0  25428   464 ?        Ss   Aug10   0:00 rpc.statd
root      1736  0.0  0.0  93176   460 ?        S<sl Aug10   0:07 auditd
root         1  0.0  0.0  23500   452 ?        Ss   Aug10   0:02 /sbin/init
root      1799  0.0  0.0  10912   452 ?        Ss   Aug10   6:45 irqbalance --pid=/var/run/irqbalance.pid
root      8230  0.0  0.0 165156   448 pts/0    S    12:06   0:00 su - useralap
rpc       1875  0.0  0.0  18976   300 ?        Ss   Aug10   0:02 rpcbind
dbus      1934  0.0  0.0  23484   280 ?        Ss   Aug10   0:00 dbus-daemon --system
root      2199  0.0  0.0  21076   212 ?        Ss   Aug10   0:00 /usr/sbin/atd
root      2207  0.0  0.0  21792   212 ?        S    Aug10   0:24 /usr/sbin/ipmievd sel pidfile=/var/run/ipmievd.pid
root      2007  0.0  0.0  20400   184 ?        S    Aug10   0:00 hald-runner
root      2043  0.0  0.0  22520   164 ?        S    Aug10   0:00 hald-addon-input: Listening on /dev/input/event0
68        2045  0.0  0.0  18008   148 ?        S    Aug10   0:00 hald-addon-acpi: listening on acpid socket /var/run/acpid.socket
root      1997  0.0  0.0   4080   116 ?        Ss   Aug10   0:00 /usr/sbin/acpid
root      2097  0.0  0.0   6260   116 ?        Ss   Aug10   0:00 /usr/sbin/mcelog --daemon
root      2222  0.0  0.0   4064    76 tty2     Ss+  Aug10   0:00 /sbin/mingetty /dev/tty2
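
To put a number on "I don't see any heavy use of memory", here is a minimal
sketch that adds up the RSS column of the ps listing above and compares the
total with the AnonPages value from /proc/meminfo. The RSS values are copied
by hand from the listing; total_rss_kb and unaccounted_kb are hypothetical
helper names. RSS counts shared pages once per process, so the sum is only a
rough upper bound on what these processes hold.

```python
# RSS values (kB) copied from the ps listing above, in the same order.
RSS_KB = [
    15896, 4680, 4604, 4264, 3240, 2968, 2784, 1468, 1296, 1292,
    1228, 976, 900, 876, 868, 712, 648, 592, 588, 588,
    464, 460, 452, 452, 448, 300, 280, 212, 212, 184,
    164, 148, 116, 116, 76,
]

# AnonPages (kB) from the /proc/meminfo output earlier in this message.
ANON_PAGES_KB = 14493928


def total_rss_kb(rss_values):
    """Sum the resident set sizes of all listed processes."""
    return sum(rss_values)


def unaccounted_kb(anon_pages_kb, rss_total_kb):
    """Anonymous memory that the listed processes cannot explain."""
    return anon_pages_kb - rss_total_kb


total = total_rss_kb(RSS_KB)
print("sum of RSS (kB):", total)
print("AnonPages not explained by listed processes (kB):",
      unaccounted_kb(ANON_PAGES_KB, total))
```

If the listing is complete, the processes shown add up to only a few tens of
MB, which leaves nearly all of the roughly 14GB of AnonPages without a
visible owner among them.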

Please advise and let me know if you need more information.
-best regards!!


On Wed, Sep 23, 2015 at 11:07 AM, Mulyadi Santosa <mulyadi.santosa at gmail.com> wrote:

>
>
> On Wed, Sep 23, 2015 at 4:47 AM, Prem Kumar <prem.it.kumar at gmail.com>
> wrote:
>
>> also wondering if there is a way I can list Active memory map showing me
>> what is cached?
>>
>> -regards.
>>
>> On Tue, Sep 22, 2015 at 3:08 PM, Prem Kumar <prem.it.kumar at gmail.com>
>> wrote:
>>
>>> Dear All,
>>>
>>> I have done quite a bit of reading on the Active memory reported in
>>> /proc/meminfo. In short, it says that Active memory is never reclaimed
>>> unless absolutely necessary, and that it caches recently used files/pages
>>> in memory. However, I fail to understand the consequences that I face here.
>>>
>>> I have disk-less and swap-less nodes, so all I have to play with is the
>>> RAM on the box. The issue that brought me here is investigating why,
>>> after running some applications, the used memory is never available to
>>> any other application.
>>>
>>> In other words, I cannot run any program that requests more memory than
>>> what is shown as free in the output of the free command and as MemFree
>>> in the output of cat /proc/meminfo.
>>> For example, any program that requires more than 6GB on the first node
>>> below, or more than 1GB on the second node below, fails instantly, and
>>> works fine if it stays within the limits of free. There is nothing
>>> else running on the system other than system processes/services.
>>>
>>>              total       used       free     shared    buffers     cached
>>> Mem:            23         17          6          0          0          9
>>> -/+ buffers/cache:          8         15
>>> Swap:            0          0          0
>>>
>>>              total       used       free     shared    buffers     cached
>>> Mem:            23         22          1          0          0          0
>>> -/+ buffers/cache:         21          1
>>> Swap:            0          0          0
>>>
>>> Since the applications that ran previously are not running any more
>>> (even though they died out of memory because they requested more memory
>>> than was available), shouldn't the OS treat any memory they used as no
>>> longer needed and reclaim it for the next job/program on that machine?
>>>
>>> On every machine where I have run into this problem, the output of
>>> /proc/meminfo shows that Active memory accounts for the amount shown as
>>> used by the free command, and this limits my further runs.
>>>
>>> This is driving me insane and making me feel stupid, knowing that the OS
>>> is smart enough to handle this. So what am I missing here?
>>> Please advise.
>>>
>>> Appreciate any insight into this.
>>>
>>> Best Regards,
>>> Prem
>>>
>>>
>>>
>>
>>
>>
> Dear Prem
>
> Welcome to kernelnewbies :) First of all, please don't top-post when
> replying. Follow what I and the rest of the list members do.
>
> Btw, looking at the free output, I doubt your statement that your first
> application took 6 GB and the second took 1 GB. Assuming your application
> doesn't do things like memory locking in kernel space, I guess it takes
> 20+ GB of RAM.
>
> So, before we go further, could you rerun your applications and use ps or
> top to see both the VSIZE and RSS they take?
>
> Regarding memory reclaiming: yes, after an app is killed (by any means
> possible: ctrl-c, sending a kill/term/quit signal, OOM, etc.), any memory
> allocated by the task is freed. This happens for both active and inactive
> pages.
>
>
> --
> regards,
>
> Mulyadi Santosa
> Freelance Linux trainer and consultant
>
> blog: the-hydra.blogspot.com
> training: mulyaditraining.blogspot.com
>