Userspace pages in UC mode
Sabela Ramos Garea
sabelaraga at gmail.com
Mon Sep 14 09:37:34 EDT 2015
Hi Pranay,
2015-09-12 3:12 GMT+02:00 Pranay Srivastava <pranjas at gmail.com>:
> Hi Sabela,
>
> On Fri, Sep 11, 2015 at 8:29 PM, Sabela Ramos Garea
> <sabelaraga at gmail.com> wrote:
>> Sorry, a small mistake while copy-pasting and cleaning up. The pages
>> and vma structs should look like this:
>>
>> struct page *pages --> struct page *pages[MAX_PAGES];
>> struct vm_area_struct *vma --> struct vm_area_struct *vma[MAX_PAGES];
>>
>> Where MAX_PAGES is defined to 5.
>>
>> Sabela.
>>
>> 2015-09-11 16:07 GMT+02:00 Sabela Ramos Garea <sabelaraga at gmail.com>:
>>> Dear all,
>>>
>>> For research purposes I need some userspace memory pages to be in
>>> uncacheable mode. I am using two different Intel architectures (Sandy
>>> Bridge and Haswell) and two different kernels (2.6.32-358 and
>>> 3.19.0-28).
>>>
>>> The non-temporal stores from Intel assembly are not a valid solution
>>> so I am programming a kernel module that gets a set of pages from user
>>> space reserved with posix_memalign (get_user_pages) and then sets them
>>> as uncacheable (I have tried set_pages_uc and set_pages_array_uc).
>>> When I use one page, the access times are not very consistent, and
>>> with more than one page the module crashes (on both architectures
>>> and both kernels).
>>>
>>> I wonder if I am using the correct approach or if I have to use kernel
>>> space pages in order to work with uncacheable memory. Or if I have to
>>> remap the memory. Just in case it makes it clearer, I am attaching the
>>> relevant lines of a kernel module function that should set the pages
>>> as uncacheable. (This function is the .write of a misc device; count
>>> is treated as the number of pages).
>>>
>>> Best and Thanks,
>>>
>>> Sabela.
>>>
>>> struct page *pages; //defined outside in order to be able to set them
>>> to WB in the release function.
>>> int numpages;
>>>
>>> static ssize_t setup_memory(struct file *filp, const char __user *buf,
>>> size_t count, loff_t * ppos)
>>> {
>>> int res;
>>> struct vm_area_struct *vmas;
>>>
> shouldn't this be rounded up?
>>> numpages = count/4096;
>>>
For the current tests I am assuming that count is a multiple of 4096
and that the user *buf is page-aligned. Anyway, isn't it safer to just
round down, so that I don't touch addresses outside the range of pages
that have to be set as uncached?
>>> down_read(&current->mm->mmap_sem);
>>> res = get_user_pages(current, current->mm,
>>>                      (unsigned long) buf,
>>>                      numpages, /* number of pages */
>>>                      1,        /* write: we want to write into them */
>>>                      1,        /* force */
>>>                      &pages,
>>>                      &vmas);
>>> up_read(&current->mm->mmap_sem);
>>>
>>> numpages=res;
>>>
>>> if (res > 0) {
>>> set_pages_uc(pages, numpages); /* Uncached */
>
> what about high-mem pages. set_memory_uc does __pa, so perhaps that's
> the reason for your kernel oops?
>
I have used kmap to map the user pages into kernel space as follows:

if (res > 0) {
        for (i = 0; i < res; i++) {
                kaddress = kmap(pages[i]);
                /* one page at a time: the userspace addresses
                 * don't have to be contiguous */
                set_memory_uc((unsigned long) kaddress, 1);
        }
        /* set_pages_array_uc(pages, count); */ /* Uncached */
        printk("Write: %d pages set as uncacheable\n", numpages);
}
But the user-space test code that tries to measure cached vs. uncached
accesses reports lower latency for the uncached pages. Accesses are
performed and measured like this:
CL_1 = (int *) buffer;
CL_2 = (int *) (buffer + CACHELINE);
//flush caches
//get timestamp
for (j = 0; j < 10; j++) {
        CL_2 = (int *) (buffer + CACHELINE);
        for (i = 1; i < naccesses; i++) {
                *CL_1 = *CL_2 + i;
                *CL_2 = *CL_1 + i;
                CL_2 = (int *) ((char *) CL_2 + CACHELINE);
        }
}
//get timestamp
//get timestamp
I've tried to do it within the kernel space but the results are similar.
Thanks,
Sabela.
More information about the Kernelnewbies mailing list