Question on mmap / expecting more calls to vfs_read
Sebastian Pipping
sebastian at pipping.org
Tue Jan 11 14:16:38 EST 2011
On 01/07/11 07:26, Rajat Sharma wrote:
> so, the suitable position is to add hooks on the readpage a_op. And of
> course for doing that, you may have to capture the various paths through
> which an inode can come into memory, e.g. the lookup and create directory
> inode operations (for regular files). For your worst nightmare, NFS
> implements its readdir with an additional feature in the v3 protocol
> called READDIRPLUS, which not only gives you the names of a directory's
> children, but also initializes their inodes in memory, so you may have to
> hook readdir as well and trap the a_ops of every regular-file child after
> nfs_readdir has finished.
I see.
> As far as the offset and length of I/O are concerned, page->index gives
> you its index in the page cache, which in turn is equivalent to the file
> offset (page->index << PAGE_SHIFT). readpage is invoked to bring a
> complete page into memory. It may so happen that the page is a partial
> page (e.g. the last page of the file); in that case your I/O length will
> be inode->i_size & ~PAGE_MASK, otherwise it will be PAGE_SIZE. Don't
> worry about file holes, those are taken care of by the filesystem's
> original readpage method.
That sounds great. I have added these lines of code to read_pages() of
mm/readahead.c:
================================================
{
	struct page *page = list_to_page(pages);

	for (page_idx = 0; page_idx < nr_pages;
			page_idx++, page = list_to_page(page->lru.next)) {
		pgoff_t offset = 0;
		pgoff_t len = PAGE_SIZE;

		if (page) {
			offset = page->index << PAGE_SHIFT;
			if (page->mapping && page->mapping->host) {
				struct inode *inode = page->mapping->host;
				len = inode->i_size & ~PAGE_MASK;
			}
		}
		printk(KERN_DEBUG "VFS: BODY at 1 read_pages(...)["
				"page_idx=%u, offset=%lu, len=%lu, page_size=%lu]\n",
				page_idx, offset, len, PAGE_SIZE);
	}
}
================================================
Surprisingly, I never get len != PAGE_SIZE, i.e. partial pages.
Also, in a given list of pages, the offsets are all the same, all 0 in
this case:
================================================
[ 2811.993564] VFS: BODY at 1 read_pages(...)[page_idx=0, offset=0,
len=4096, page_size=4096]
[ 2811.996456] VFS: BODY at 1 read_pages(...)[page_idx=1, offset=0,
len=4096, page_size=4096]
[ 2811.999305] VFS: BODY at 1 read_pages(...)[page_idx=2, offset=0,
len=4096, page_size=4096]
[ 2812.002152] VFS: BODY at 1 read_pages(...)[page_idx=3, offset=0,
len=4096, page_size=4096]
================================================
Is page = list_to_page(page->lru.next) the correct way of walking the
pages? I am wondering how the main loop of read_pages()

for (page_idx = 0; page_idx < nr_pages; page_idx++) {
	struct page *page = list_to_page(pages);
	[..]
}

really iterates, given that neither the address of pages changes inside
the loop nor is page_idx passed to anything. I would think the loop
accesses the same page again and again, but somehow my own

page = list_to_page(page->lru.next)

above doesn't seem to do any better. Any insights?
Thanks!
> Having said above, it will still be better if you can state what you
> want to achieve in little layman language.
I want to trace all reads of and writes to the file system, including
filename, offset and byte count, into a log that I can replay inside a
simulator.
Best,
Sebastian