Understanding the locking behavior of msync
Maximilian Böther
maximilian.boether at student.hpi.de
Wed Mar 24 07:56:46 EDT 2021
Hello!
I am investigating an application that writes random data in fixed-size
chunks (e.g. 4k) to random locations in a large buffer file. I have
several processes (not threads) doing that, each process has its own
buffer file assigned.
If I use mmap+msync to write and persist data to disk, I see a
performance spike at 16 processes, and a performance drop with more
processes (e.g. 32). The CPU has 32 logical cores in total, and we
are not CPU bound.
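To make the access pattern concrete, here is a minimal sketch of what the mmap+msync path does in one process. This is not the actual benchmark code: the function name mmap_random_writes, the use of MAP_SHARED and MS_SYNC, the per-chunk msync, and the 4 KiB page size are all assumptions for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define CHUNK 4096 /* assumed chunk size; must be a multiple of the page size */

/* Hypothetical helper: write n_writes chunks to random chunk-aligned
 * offsets of the file at path through a shared mapping, and persist
 * each chunk with msync. Returns 0 on success, -1 on error. */
int mmap_random_writes(const char *path, size_t file_size, int n_writes)
{
    assert(file_size >= CHUNK && file_size % CHUNK == 0);

    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)file_size) != 0) {
        close(fd);
        return -1;
    }

    char *buf = mmap(NULL, file_size, PROT_READ | PROT_WRITE,
                     MAP_SHARED, fd, 0);
    if (buf == MAP_FAILED) {
        close(fd);
        return -1;
    }

    char data[CHUNK];
    memset(data, 0xAB, sizeof data);
    size_t n_chunks = file_size / CHUNK;
    int rc = 0;

    for (int i = 0; i < n_writes; i++) {
        /* Chunk-aligned offset, so the msync address is page-aligned
         * (assuming 4 KiB pages). */
        size_t off = ((size_t)rand() % n_chunks) * CHUNK;
        memcpy(buf + off, data, CHUNK);
        if (msync(buf + off, CHUNK, MS_SYNC) != 0) {
            rc = -1;
            break;
        }
    }

    munmap(buf, file_size);
    close(fd);
    return rc;
}
```

Each process would call something like this on its own buffer file, so no file or mapping is shared between processes.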
If I use open+write+fsync, I do not see such a spike but a
performance plateau instead (and mmap is slower than open/write).
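For comparison, a minimal sketch of the open+write+fsync path under the same assumptions. Again, this is not the actual benchmark code: the function name write_random_chunks is hypothetical, pwrite stands in for a seek followed by write, and calling fsync after every chunk is an assumption.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define CHUNK 4096 /* assumed chunk size */

/* Hypothetical helper: write n_writes chunks to random chunk-aligned
 * offsets of the file at path with pwrite, and persist each one with
 * fsync. Returns 0 on success, -1 on error. */
int write_random_chunks(const char *path, size_t file_size, int n_writes)
{
    assert(file_size >= CHUNK && file_size % CHUNK == 0);

    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)file_size) != 0) {
        close(fd);
        return -1;
    }

    char data[CHUNK];
    memset(data, 0xAB, sizeof data);
    size_t n_chunks = file_size / CHUNK;
    int rc = 0;

    for (int i = 0; i < n_writes; i++) {
        off_t off = (off_t)(((size_t)rand() % n_chunks) * CHUNK);
        /* pwrite writes at an explicit offset without moving the
         * file position; a short write is treated as an error here. */
        if (pwrite(fd, data, CHUNK, off) != (ssize_t)CHUNK ||
            fsync(fd) != 0) {
            rc = -1;
            break;
        }
    }

    close(fd);
    return rc;
}
```

Unlike msync, fsync here flushes the whole file rather than a single range, so the two sketches are not equivalent in what they persist per call.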
I've read multiple times [1,2] that both mmap and msync can take locks.
With VTune, I confirmed that we are indeed spinning on locks, and that
most of the time is spent in the clear_page_erms and xas_load functions.
However, from reading the source code of msync [3], I cannot tell
whether these locks are global or per-file. The paper [2] states that
the locks protect radix trees within the kernel that are per-file.
Since I have one file per process and still observe spinlock
contention in the kernel, I suspect that some locks may be global.
Do you have an explanation for why we see such a spike at 16 processes
with mmap, and any input on the locking behavior of msync?
Thank you!
Best,
Maximilian Böther
[1]
https://kb.pmem.io/development/100000025-Why-msync-is-less-optimal-for-persistent-memory/
- I know it's about PMem, but the lock argument is important
[2] Optimizing Memory-mapped I/O for Fast Storage Devices, Papagiannis
et al., ATC '20
[3] https://elixir.bootlin.com/linux/latest/source/mm/msync.c