MADV_ZERO

Robert Nagy ronag89 at gmail.com
Tue Sep 1 14:34:41 EDT 2015


I recently switched to Linux after hitting an issue that seems unsolvable on Windows, hoping to find a solution here.

What I need is IPC persisted to a huge file. I currently do this by memory-mapping a huge backing file and writing to it sequentially, in a circular fashion, 24/7.
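To make the setup concrete, here is a rough sketch of that kind of circular writer. The names struct ring, ring_open and ring_write are made up for illustration and this is not the actual code; it only assumes a shared, writable mapping of a fixed-size file.

```c
#define _POSIX_C_SOURCE 200809L /* for ftruncate() under strict -std modes */
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical ring state: a shared file mapping plus a write cursor. */
struct ring {
    unsigned char *base; /* start of the mapping */
    size_t size;         /* mapping length in bytes */
    size_t head;         /* next write offset, always < size */
};

/* Map `size` bytes of `path` read/write and shared; returns 0 or -1. */
int ring_open(struct ring *r, const char *path, size_t size)
{
    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)size) != 0) {
        close(fd);
        return -1;
    }
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (p == MAP_FAILED)
        return -1;
    r->base = p;
    r->size = size;
    r->head = 0;
    return 0;
}

/* Append `len` bytes, wrapping to the start when the end is reached. */
void ring_write(struct ring *r, const void *src, size_t len)
{
    assert(len <= r->size);
    size_t tail = r->size - r->head;        /* room before the wrap point */
    size_t first = len < tail ? len : tail; /* bytes that fit before it */
    memcpy(r->base + r->head, src, first);
    memcpy(r->base, (const unsigned char *)src + first, len - first);
    r->head = (r->head + len) % r->size;
}
```

Every memcpy into a page that is not resident has to fault that page in first, which is where the cost described below comes from.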

However, this has some major throughput issues, since overwriting pages always causes page faults, even when entire pages are overwritten, which totally trashes disk performance.

Basically I would need a flag for madvise (e.g. MADV_ZERO) with functionality similar to FALLOC_FL_ZERO_RANGE, so that I would get much faster zero-fill page faults instead. The closest I’ve come is to use MADV_REMOVE before overwriting the range; however, that is suboptimal since, from my understanding, it will fragment the backing file and potentially degrade performance over time.
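For reference, the MADV_REMOVE workaround looks roughly like the sketch below. The helper name punch_then_write and its return convention are made up for illustration; madvise() and MADV_REMOVE are the real Linux interfaces, and the sketch assumes dst points into a shared, writable mapping.

```c
#define _DEFAULT_SOURCE /* for madvise() and MADV_REMOVE under strict -std modes */
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: drop the old contents of a page-aligned range of a
 * shared mapping with MADV_REMOVE, then overwrite it with `src`.
 * Returns 1 if the range was punched out first, 0 if MADV_REMOVE failed and
 * only the plain copy was done, or -EINVAL for unaligned arguments.
 */
int punch_then_write(void *dst, const void *src, size_t len)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    int punched;

    assert(dst != NULL && src != NULL);
    if ((uintptr_t)dst % page != 0 || len == 0 || len % page != 0)
        return -EINVAL;

    /* Frees the pages and their backing store; later reads see zeroes. */
    punched = (madvise(dst, len, MADV_REMOVE) == 0);

    memcpy(dst, src, len);
    return punched;
}
```

Since MADV_REMOVE frees the backing blocks, every overwrite forces the filesystem to allocate them again, which is the fragmentation concern above.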

What I’d like to be able to do is something like:

int mm_fast_write(void* dst, const void* src, size_t length)
{
  /* dst, src and length must all be page-aligned */
  if ((uintptr_t)dst & ~PAGE_MASK)
    return -EINVAL;
  if ((uintptr_t)src & ~PAGE_MASK)
    return -EINVAL;
  if (length & ~PAGE_MASK)
    return -EINVAL;
  madvise(dst, length, MADV_ZERO); // proposed flag, does not exist today
  memcpy(dst, src, length);
  madvise(dst, length, MADV_DONTNEED); // Might not do anything without msync?
  return 0;
}

Is it possible to implement or emulate something like MADV_ZERO in user mode? Or should I look into modifying the kernel? I believe it could be implemented based on madvise_remove by simply replacing the FALLOC_FL_PUNCH_HOLE flag with FALLOC_FL_ZERO_RANGE?




More information about the Kernelnewbies mailing list