<div dir="ltr">Dear Mulyadi,<div><br></div><div>Thank you for your response. Sorry for top posting; I was waiting for my post to arrive in my mailbox, but it took forever, so I top posted in my eagerness. You are right that I couldn't possibly have had my first application crash at 6GB. I should have said about 15GB. I have many nodes; I just picked the outputs from a couple of them and presented my observations. Below I will try to dissect my observations in the hope that you can help me understand this. My OS concepts have become a little rusty, as I haven't been in touch with them. </div><div><br></div><div> </div><div>Here is a machine that currently has this state:</div><div><div>$ free -g</div><div> total used free shared buffers cached</div><div>Mem: 23 14 8 0 0 0</div><div>-/+ buffers/cache: 14 9</div><div>Swap: 0 0 0</div></div><div><br></div><div>I have a program that just gobbles up memory; here is what happens when I run it:</div><div><div>$ ./eatmemory 8.99G</div><div>Eating 8589934592 bytes in chunks of 1024...</div><div>Done, press any key to free the memory</div><div><br></div><div>$ ./eatmemory 9G</div><div>Eating 9663676416 bytes in chunks of 1024...</div><div>Killed</div></div><div><br></div><div><br></div><div>I believe there is nothing wrong with the above observation in itself, because RAM is in use by other (presumably running) applications and I only have so much available for my program to run.</div><div><br></div><div>But my issue is that nothing other than system services is running on this machine, and this renders the node unusable for the next program that runs on it and requests 9G or more, as above. 
Below here is the output of /proc/meminfo from the same machine</div><div><br></div><div><div>$ cat /proc/meminfo</div><div>MemTotal: 24724728 kB</div><div>MemFree: 9402768 kB</div><div>Buffers: 0 kB</div><div>Cached: 217464 kB</div><div>SwapCached: 0 kB</div><div>Active: 14650896 kB</div><div>Inactive: 60456 kB</div><div>Active(anon): 14647052 kB</div><div>Inactive(anon): 40632 kB</div><div>Active(file): 3844 kB</div><div>Inactive(file): 19824 kB</div><div>Unevictable: 0 kB</div><div>Mlocked: 0 kB</div><div>SwapTotal: 0 kB</div><div>SwapFree: 0 kB</div><div>Dirty: 0 kB</div><div>Writeback: 0 kB</div><div>AnonPages: 14493928 kB</div><div>Mapped: 19544 kB</div><div>Shmem: 193720 kB</div><div>Slab: 109720 kB</div><div>SReclaimable: 12300 kB</div><div>SUnreclaim: 97420 kB</div><div>KernelStack: 2968 kB</div><div>PageTables: 39100 kB</div><div>NFS_Unstable: 0 kB</div><div>Bounce: 0 kB</div><div>WritebackTmp: 0 kB</div><div>CommitLimit: 12362364 kB</div><div>Committed_AS: 15684044 kB</div><div>VmallocTotal: 34359738367 kB</div><div>VmallocUsed: 493316 kB</div><div>VmallocChunk: 34346062668 kB</div><div>HardwareCorrupted: 0 kB</div><div>AnonHugePages: 13936640 kB</div><div>HugePages_Total: 0</div><div>HugePages_Free: 0</div><div>HugePages_Rsvd: 0</div><div>HugePages_Surp: 0</div><div>Hugepagesize: 2048 kB</div><div>DirectMap4k: 7652 kB</div><div>DirectMap2M: 25145344 kB</div></div><div><br></div><div>Also here is my ulimit which is unlimited:</div><div><div>$ ulimit -a</div><div>core file size (blocks, -c) 0</div><div>data seg size (kbytes, -d) unlimited</div><div>scheduling priority (-e) 0</div><div>file size (blocks, -f) unlimited</div><div>pending signals (-i) 192912</div><div>max locked memory (kbytes, -l) unlimited</div><div>max memory size (kbytes, -m) unlimited</div><div>open files (-n) 1024</div><div>pipe size (512 bytes, -p) 8</div><div>POSIX message queues (bytes, -q) 819200</div><div>real-time priority (-r) 0</div><div>stack size (kbytes, -s) 
81920</div><div>cpu time (seconds, -t) unlimited</div><div>max user processes (-u) 1024</div><div>virtual memory (kbytes, -v) unlimited</div><div>file locks (-x) unlimited</div></div><div><br></div><div>And here is /proc/self/maps:</div><div><div>$ cat /proc/self/maps</div><div>00400000-0040b000 r-xp 00000000 00:10 67455633 /bin/cat</div><div>0060a000-0060b000 rw-p 0000a000 00:10 67455633 /bin/cat</div><div>0060b000-0060c000 rw-p 00000000 00:00 0</div><div>0080a000-0080b000 rw-p 0000a000 00:10 67455633 /bin/cat</div><div>0209f000-020c0000 rw-p 00000000 00:00 0 [heap]</div><div>36d7e00000-36d7e20000 r-xp 00000000 00:10 67454760 /lib64/ld-2.12.so</div><div>36d801f000-36d8020000 r--p 0001f000 00:10 67454760 /lib64/ld-2.12.so</div><div>36d8020000-36d8021000 rw-p 00020000 00:10 67454760 /lib64/ld-2.12.so</div><div>36d8021000-36d8022000 rw-p 00000000 00:00 0</div><div>36d8200000-36d838a000 r-xp 00000000 00:10 67456999 /lib64/libc-2.12.so</div><div>36d838a000-36d858a000 ---p 0018a000 00:10 67456999 /lib64/libc-2.12.so</div><div>36d858a000-36d858e000 r--p 0018a000 00:10 67456999 /lib64/libc-2.12.so</div><div>36d858e000-36d858f000 rw-p 0018e000 00:10 67456999 /lib64/libc-2.12.so</div><div>36d858f000-36d8594000 rw-p 00000000 00:00 0</div><div>7f754caad000-7f754cab0000 rw-p 00000000 00:00 0</div><div>7f754cac2000-7f754cac3000 rw-p 00000000 00:00 0</div><div>7fff5e496000-7fff5e4ab000 rw-p 00000000 00:00 0 [stack]</div><div>7fff5e5f8000-7fff5e5f9000 r-xp 00000000 00:00 0 [vdso]</div><div>ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]</div></div><div><br></div><div>On every machine where I ran into this problem, anonymous pages (AnonPages) are eating up the memory, in effect shrinking the RAM available for programs to run. 
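<div>To make the mismatch concrete, here is a minimal sketch of how AnonPages could be compared against what the running processes account for. It assumes Python is available on the node, and the function names are my own invention, not a standard tool. Summed VmRSS counts shared pages more than once and also includes file-backed pages, so the difference is only a rough estimate.</div>

```python
# Sketch: compare AnonPages from /proc/meminfo with the summed VmRSS of all
# processes. The helper names are illustrative, not part of any standard tool.
import glob


def parse_numeric_fields(text):
    """Map each 'Name:   123 kB' style line to its first integer value."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def total_process_rss_kb():
    """Sum VmRSS (in kB) over every readable /proc/PID/status file."""
    total = 0
    for path in glob.glob("/proc/[0-9]*/status"):
        try:
            with open(path) as handle:
                # Kernel threads have no VmRSS line, so default to 0.
                total += parse_numeric_fields(handle.read()).get("VmRSS", 0)
        except OSError:
            continue  # the process exited while we were scanning
    return total


def unaccounted_anon_kb(meminfo, rss_kb):
    """AnonPages minus summed RSS: a rough estimate of orphaned memory."""
    return meminfo.get("AnonPages", 0) - rss_kb


if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as handle:
            meminfo = parse_numeric_fields(handle.read())
    except OSError:
        meminfo = {}  # no Linux-style /proc here; report zeros
    rss_kb = total_process_rss_kb()
    print("AnonPages:        ", meminfo.get("AnonPages", 0), "kB")
    print("Sum of VmRSS:     ", rss_kb, "kB")
    print("Not accounted for:", unaccounted_anon_kb(meminfo, rss_kb), "kB")
```

<div>On a healthy node I would expect the last figure to be small or negative; a value in the range of many gigabytes would match what I describe here.</div>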
</div><div>Q) Now my question is this. Since the previous job/program that ran on these machines has finished or died, my understanding of OS concepts tells me that the recently used, cached anonymous pages should be released to meet the request of another application that asks for the memory/VM. What am I missing here?</div><div><br></div><div>What I also fail to understand is the state in which my diskless and swapless nodes remain. What or who has control over the used-up memory, and why is it not being granted to the next owner of the machine so that it can run at full scale? I understand that I will not get all of it, but I would expect at least 19GB out of 24GB. Below is the list of top processes on the machine. Looking at it, I don't see any heavy use of memory. The mystery makes me feel dumb.</div><div><br></div><div>$ ps aux --sort -rss<br></div><div><div>USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND</div><div>root 8402 0.0 0.0 119712 15896 ? S 12:11 0:00 /usr/libexec/sssd/sssd_nss --uid 0 --gid 0 --debug-to-files</div><div>root 9555 0.0 0.0 3508796 4680 ? S Aug18 0:33 /usr/sbin/slurmd</div><div>useralap 8231 0.0 0.0 27224 4604 pts/0 S 12:06 0:00 -bash</div><div>root 8401 0.0 0.0 151052 4264 ? S 12:11 0:00 /usr/libexec/sssd/sssd_be --domain default --uid 0 --gid 0 --debug-to-files</div><div>root 8153 0.0 0.0 111192 3240 pts/0 Ss 12:05 0:00 -bash</div><div>root 2078 0.0 0.0 720600 2968 ? Ssl Aug10 1:39 automount --pid-file /var/run/autofs.pid</div><div>root 1752 0.0 0.0 249344 2784 ? Sl Aug10 0:04 /sbin/rsyslogd -i /var/run/syslogd.pid -c 5</div><div>useralap 9898 1.0 0.0 26196 1468 pts/0 R+ 15:42 0:00 ps aux --sort -rss</div><div>root 8150 0.0 0.0 111816 1296 ? Ss 12:05 0:00 sshd: root@pts/0</div><div>munge 2146 0.0 0.0 225004 1292 ? Sl Aug10 0:36 /usr/sbin/munged</div><div>68 2006 0.0 0.0 41976 1228 ? Ssl Aug10 0:10 hald</div><div>root 1671 0.0 0.0 9120 976 ? 
Ss Aug10 0:00 /sbin/dhclient -1 -q -lf /var/lib/dhclient/dhclient-em1.leases -pf /var/run/dhclient-em1.pid</div><div>root 8400 0.0 0.0 114288 900 ? Ss 12:11 0:00 /usr/sbin/sssd -f -D</div><div>root 8403 0.0 0.0 105264 876 ? S 12:11 0:00 /usr/libexec/sssd/sssd_pam --uid 0 --gid 0 --debug-to-files</div><div>root 2174 0.0 0.0 20000 868 ? Ss Aug10 0:42 crond</div><div>root 2111 0.0 0.0 66188 712 ? Ss Aug10 0:00 /usr/sbin/sshd</div><div>root 3625 0.0 0.0 334616 648 ? SLsl Aug10 0:00 /usr/sbin/ibacm</div><div>root 863 0.0 0.0 10832 592 ? S<s Aug10 0:00 /sbin/udevd -d</div><div>root 3391 0.0 0.0 10828 588 ? S< Aug10 0:00 /sbin/udevd -d</div><div>root 3523 0.0 0.0 10828 588 ? S< Aug10 0:00 /sbin/udevd -d</div><div>rpcuser 1893 0.0 0.0 25428 464 ? Ss Aug10 0:00 rpc.statd</div><div>root 1736 0.0 0.0 93176 460 ? S<sl Aug10 0:07 auditd</div><div>root 1 0.0 0.0 23500 452 ? Ss Aug10 0:02 /sbin/init</div><div>root 1799 0.0 0.0 10912 452 ? Ss Aug10 6:45 irqbalance --pid=/var/run/irqbalance.pid</div><div>root 8230 0.0 0.0 165156 448 pts/0 S 12:06 0:00 su - useralap</div><div>rpc 1875 0.0 0.0 18976 300 ? Ss Aug10 0:02 rpcbind</div><div>dbus 1934 0.0 0.0 23484 280 ? Ss Aug10 0:00 dbus-daemon --system</div><div>root 2199 0.0 0.0 21076 212 ? Ss Aug10 0:00 /usr/sbin/atd</div><div>root 2207 0.0 0.0 21792 212 ? S Aug10 0:24 /usr/sbin/ipmievd sel pidfile=/var/run/ipmievd.pid</div><div>root 2007 0.0 0.0 20400 184 ? S Aug10 0:00 hald-runner</div><div>root 2043 0.0 0.0 22520 164 ? S Aug10 0:00 hald-addon-input: Listening on /dev/input/event0</div><div>68 2045 0.0 0.0 18008 148 ? S Aug10 0:00 hald-addon-acpi: listening on acpid socket /var/run/acpid.socket</div><div>root 1997 0.0 0.0 4080 116 ? Ss Aug10 0:00 /usr/sbin/acpid</div><div>root 2097 0.0 0.0 6260 116 ? 
Ss Aug10 0:00 /usr/sbin/mcelog --daemon</div><div>root 2222 0.0 0.0 4064 76 tty2 Ss+ Aug10 0:00 /sbin/mingetty /dev/tty2</div></div><div><br></div><div>Please advise, and let me know if you need more information.</div><div>-best regards!!</div><div><br></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Sep 23, 2015 at 11:07 AM, Mulyadi Santosa <span dir="ltr"><<a href="mailto:mulyadi.santosa@gmail.com" target="_blank">mulyadi.santosa@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div><div class="h5"><br><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Sep 23, 2015 at 4:47 AM, Prem Kumar <span dir="ltr"><<a href="mailto:prem.it.kumar@gmail.com" target="_blank">prem.it.kumar@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">I am also wondering whether there is a way to list the Active memory, showing me what is cached.<div><br></div><div>-regards.</div></div><div><div><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Sep 22, 2015 at 3:08 PM, Prem Kumar <span dir="ltr"><<a href="mailto:prem.it.kumar@gmail.com" target="_blank">prem.it.kumar@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Dear All,<div><br></div><div>I have done quite a bit of reading on the Active memory reported in /proc/meminfo. In short, it says that Active memory is never reclaimed unless absolutely necessary, and that it caches the recently used files/pages in memory. However, I fail to understand the consequences that I face here. </div><div><br></div><div>I have diskless and swapless nodes, so all I have to work with is the RAM on the box. 
The issue that brought me here is why, after running some applications, the used memory is never available for use by any other application. </div><div><br></div><div>In other words, I cannot run any program that requests more memory than what is shown as free in the output of the free command and as MemFree in the output of cat /proc/meminfo.</div><div>For example, if I run any program that requires more than 6GB on the first node below, or more than 1GB on the second node below, it fails instantly, whereas it works fine if it stays within the limits of free. There is nothing else running on the system other than system processes/services. </div><div><br></div><div><div> total used free shared buffers cached</div><div>Mem: 23 17 6 0 0 9</div><div>-/+ buffers/cache: 8 15</div><div>Swap: 0 0 0</div><div><br></div><div> total used free shared buffers cached</div><div>Mem: 23 22 1 0 0 0</div><div>-/+ buffers/cache: 21 1</div><div>Swap: 0 0 0</div></div><div><br></div><div>Since the applications that ran previously are no longer running (even though they died out of memory because they requested more memory than was available), shouldn't the OS treat any memory they used as no longer needed and reclaim it for the next job/program on that machine? </div><div><br></div><div>On every machine where I have run into this problem, the output of /proc/meminfo shows that Active memory accounts for the amount shown as used by the free command, and this limits my further runs. </div><div><br></div><div>This is driving me insane and making me feel stupid, since I know the OS is smart enough to handle this. What am I missing here? Please advise. </div><div><br></div><div>I appreciate any insight into this. </div><div><br></div><div>Best Regards,</div><div>Prem</div><div><br></div><div><br></div></div>
</blockquote></div><br></div>
</div></div><br><br></blockquote></div><br></div></div></div><div class="gmail_extra">Dear Prem<br><br></div><div class="gmail_extra">Welcome to kernelnewbies :) First of all, please don't top post when replying. Follow what I and the rest of the list members do.<br><br></div><div class="gmail_extra">Btw, looking at the free output, I have doubts about your statement that your first application took 6 GB and the second took 1 GB. Assuming your application doesn't do things like memory locking in kernel space, I guess it takes 20+ GB of RAM.<br><br></div><div class="gmail_extra">So, before we go further, could you rerun your applications and use ps or top to see both the VSIZE and RSS they take?<br><br></div><div class="gmail_extra">Regarding memory reclaiming: yes, after an app is killed (by any means possible: ctrl-c, sending a kill/term/quit signal, OOM, etc.), any memory allocated by the task is freed. This happens for both active and inactive pages.<span class="HOEnZb"><font color="#888888"><br></font></span></div><span class="HOEnZb"><font color="#888888"><div class="gmail_extra"><br clear="all"><br>-- <br><div>regards,<br><br>Mulyadi Santosa<br>Freelance Linux trainer and consultant<br><br>blog: <a href="http://the-hydra.blogspot.com" target="_blank">the-hydra.blogspot.com</a><br>training: <a href="http://mulyaditraining.blogspot.com" target="_blank">mulyaditraining.blogspot.com</a></div>
</div></font></span></div>
</blockquote></div><br></div>