Question about memcpy

Himanshu Jha himanshujha199640 at gmail.com
Mon Jul 9 10:04:44 EDT 2018


Hi Bing,

On Sun, Jul 08, 2018 at 10:03:48PM +0800, bing zhu wrote:
> void *p = malloc(4096 * max);
> start = usec();
> for (i = 0; i < max; i++) {
> memcpy(p + i * 4096, page, 4096);
> }
> end = usec();
> printf("%s : %d time use %lu us \n", __func__, max,end - start);
> 
> static unsigned long usec(void)
> {
>         struct timeval tv;
>         gettimeofday(&tv, 0);
>         return (unsigned long)tv.tv_sec * 1000000 + tv.tv_usec;
> }

For this kind of benchmarking, I think you should use __rdtscp to
measure cycles and time accurately (more info in "AMD64 Architecture
Programmer’s Manual Volume 3: General-Purpose and System Instructions",
Pg 401).

Userspace:
----------------------------------------------------------------------
#include <inttypes.h>
#include <stdio.h>
#include <time.h>
#include <stdint.h>
#include <x86intrin.h>

volatile unsigned sink;	/* volatile keeps the loop from being optimized away */
unsigned int junk;	/* receives the aux value written by rdtscp; unused here */

int main(void)
{
	clock_t start = clock();
	uint64_t t = __rdtscp(&junk);

	for (size_t i = 0; i < 10000000; ++i)
		sink++;

	t = __rdtscp(&junk) - t;
	clock_t end = clock();
	double cpu_time_used = ((double)(end - start)) / CLOCKS_PER_SEC;

	/* uint64_t must be printed with PRIu64, not %zu */
	printf("for loop took %f seconds to execute, %" PRIu64 " cycles\n",
	       cpu_time_used, t);
	return 0;
}
---------------------------------------------------------------------

Kernelspace:
For measuring in kernel space, and if you want to dig deeper, see this
Intel white paper:
https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/ia-32-ia-64-benchmark-code-execution-paper.pdf


Thanks
-- 
Himanshu Jha
Undergraduate Student
Department of Electronics & Communication
Guru Tegh Bahadur Institute of Technology


