Tuesday, June 22, 2010

memset speed vs memcpy speed

Below I post a piece of C code I wrote to test memset.
memset can be 4-5 times faster than an ordinary loop.
There is one more effect worth pointing out.
When the loop and memset operate on the same array and the loop runs first (memset follows the loop), memset can be 10 times faster, probably due to cache usage. It does not work the other way round: if memset runs first and the loop second, the memset speedup is 4-5 times again.
The array is one-dimensional, so why doesn't the compiler take advantage of the cache? Or is the speedup when memset runs after the loop caused by better vectorization?
If the two arrays are completely separate memory areas, the memset speedup is again 4-5 times.
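A possible explanation for the 10x case, offered as an assumption rather than a confirmed diagnosis: on Linux with glibc, malloc typically serves blocks this large (7000 * 7000 * 4 bytes, about 196 MB) with mmap, and the kernel maps physical pages only when they are first written. Whichever pass touches a fresh array first then also pays for the page faults, which fits the observation that the effect vanishes when memset runs first or when the two passes use different arrays. An array of that size is far larger than the CPU caches, so cache reuse alone seems an unlikely cause. The hypothetical `fill_twice` helper below (not from the post) times two memset passes over the same freshly allocated buffer so the two costs can be compared on a given machine:

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: allocates `bytes` fresh bytes, fills them with
   memset twice and stores the clock() ticks of each pass.
   Returns 0 on success, -1 if malloc fails, -2 if a check fails. */
int fill_twice(size_t bytes, double *first_ticks, double *second_ticks)
{
    unsigned char *buf;
    clock_t t0, t1;
    size_t k, sum = 0;
    int ok;

    if (bytes == 0)
        return -2;
    buf = malloc(bytes);
    if (buf == NULL)
        return -1;

    t0 = clock();
    memset(buf, 0x01, bytes);   /* first write to the fresh buffer */
    t1 = clock();
    *first_ticks = (double)(t1 - t0);

    /* untimed read of every byte, so the first memset is actually used */
    for (k = 0; k < bytes; k++)
        sum += buf[k];

    t0 = clock();
    memset(buf, 0x02, bytes);   /* second write to the same pages */
    t1 = clock();
    *second_ticks = (double)(t1 - t0);

    ok = (sum == bytes) && (buf[0] == 0x02) && (buf[bytes - 1] == 0x02);
    free(buf);
    return ok ? 0 : -2;
}
```

Calling it with the post's array size and comparing `first_ticks` with `second_ticks` would show whether first-touch cost is significant on a particular system.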

Try it yourself if you wish, and please comment. I used gcc 4.4.1.


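#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

int main(void)
{
    unsigned int val = 253, val2 = 223;
    const int size = 7000;
    unsigned int *test, *testloop;
    int i, j;
    clock_t old_time;
    double loop_time, memset_time;

    // size*size elements per array (about 196 MB each with 4-byte unsigned int)
    test = malloc((size_t)size * size * sizeof(unsigned int));
    testloop = malloc((size_t)size * size * sizeof(unsigned int));
    if (test == NULL || testloop == NULL)
    {
        printf("cannot allocate so much memory, size: %d\n", size);
        free(test);
        free(testloop);
        return 1;
    }

    // fill testloop element by element
    old_time = clock();
    for (i = 0; i < size; i++)
        for (j = 0; j < size; j++)
            testloop[i * size + j] = val;
    loop_time = (double)(clock() - old_time);

    // fill test with memset; memset sets every BYTE to (unsigned char)val2,
    // so each element becomes 0xDFDFDFDF (with 4-byte ints), not 223
    old_time = clock();
    memset(test, val2, (size_t)size * size * sizeof(unsigned int));
    memset_time = (double)(clock() - old_time);

    printf("Loop took %5.3f ticks\n", loop_time);
    printf("memset took %5.3f ticks\n", memset_time);
    // read the arrays back so the fills are not discarded as dead stores
    printf("check: %u %u\n", testloop[size * size - 1], test[size * size - 1]);

    free(test);
    free(testloop);
    return 0;
}//main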
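memset and an element-wise loop do not generally produce the same array contents: memset converts its value argument to unsigned char and writes that byte into every byte of the region. For an unsigned int array it therefore reproduces an element-wise fill only when all bytes of the target value are equal, such as 0. The two hypothetical functions below (`fill_bytes` and `fill_values`, not from the post) make the difference checkable:

```c
#include <stddef.h>
#include <string.h>

/* Byte-wise fill: every byte of dst becomes (unsigned char)byte. */
void fill_bytes(unsigned int *dst, size_t n, int byte)
{
    memset(dst, byte, n * sizeof *dst);
}

/* Element-wise fill: every unsigned int of dst becomes value. */
void fill_values(unsigned int *dst, size_t n, unsigned int value)
{
    size_t k;
    for (k = 0; k < n; k++)
        dst[k] = value;
}
```

A like-for-like timing comparison could therefore fill both arrays with 0, or compare memset against a loop over an unsigned char array.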

memcpy runs slower than assignment
I had to check it, and it is true: assignment runs faster than an explicit memcpy. All tests were done with gcc and g++ 4.4.1. Here is the code for the struct copy. I tested memcpy and the loop in separate executions (that's why only one of them is enabled at a time).
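For a trivially copyable struct like `test_struct`, element-wise assignment and a single memcpy must leave the same member values in the destination; they differ only in how the copy is carried out. The sketch below restates the post's `test_struct` as a plain C struct with the same members and adds two hypothetical copy functions (`copy_assign` and `copy_memcpy`, not from the post). Members are compared one by one in checks, because struct assignment is not required to copy padding bytes.

```c
#include <stddef.h>
#include <string.h>

/* Same members as the post's test_struct, as a plain C struct. */
typedef struct {
    int a;
    int b;
    float dupa;
    double s;
} test_struct;

/* Element-wise struct assignment, like the post's loop. */
void copy_assign(test_struct *dst, const test_struct *src, size_t n)
{
    size_t k;
    for (k = 0; k < n; k++)
        dst[k] = src[k];
}

/* One bulk byte copy of the whole array. */
void copy_memcpy(test_struct *dst, const test_struct *src, size_t n)
{
    memcpy(dst, src, n * sizeof *src);
}
```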

#include <cstdio>
#include <cstring>
#include <ctime>

int main()
{
    const int size = 7000;

    class test_struct
    {
        int a;
        int b;
        float dupa;
        double s;
    public:
        test_struct() {}    // members are left uninitialized
    };

    // size*size structs per array
    test_struct *struct_array = new test_struct[size * size];
    test_struct *struct_array2 = new test_struct[size * size];

    std::clock_t old_time = std::clock();

    // Time the two variants in separate executions:
    // change 1 to 0 to measure memcpy instead of the assignment loop.
#if 1
    for (int i = 0; i < size; i++)
        for (int j = 0; j < size; j++)
            struct_array[i * size + j] = struct_array2[i * size + j];
#else
    // same direction as the loop: copy struct_array2 into struct_array
    std::memcpy(struct_array, struct_array2, (std::size_t)size * size * sizeof(test_struct));
#endif

    double copy_time = (double)(std::clock() - old_time);
    std::printf("copy took %5.3f ticks\n", copy_time);

    delete [] struct_array;
    delete [] struct_array2;
    return 0;
}
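Both variants above start from arrays that `new` has just returned and nothing has written yet, so, assuming the lazy page mapping described for large allocations also applies here, the first-touch cost lands inside the timed region in each separate execution. A minimal C sketch of a harness that avoids this, under that assumption: it runs the copy once untimed, then times several repetitions and converts ticks to seconds with `CLOCKS_PER_SEC`. The names `bytes_copy_loop`, `bytes_copy_memcpy` and `time_copy` are hypothetical, and byte buffers stand in for the struct arrays.

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Signature shared by the two copy variants being compared. */
typedef void (*copy_fn)(unsigned char *dst, const unsigned char *src, size_t n);

/* Element-by-element copy, the C analogue of the assignment loop. */
void bytes_copy_loop(unsigned char *dst, const unsigned char *src, size_t n)
{
    size_t k;
    for (k = 0; k < n; k++)
        dst[k] = src[k];
}

/* Bulk copy through the library routine. */
void bytes_copy_memcpy(unsigned char *dst, const unsigned char *src, size_t n)
{
    memcpy(dst, src, n);
}

/* One untimed warm-up call so both buffers have been touched,
   then `reps` timed calls. Returns CPU seconds. */
double time_copy(copy_fn fn, unsigned char *dst, const unsigned char *src,
                 size_t n, int reps)
{
    clock_t t0;
    int r;

    fn(dst, src, n);                 /* warm-up, not timed */
    t0 = clock();
    for (r = 0; r < reps; r++)
        fn(dst, src, n);
    return (double)(clock() - t0) / CLOCKS_PER_SEC;
}
```

The caller should fully initialize `src` first. Also note that GCC at higher optimization levels can recognize a loop like `bytes_copy_loop` and emit a memcpy call for it, so checking the generated assembly (`gcc -S`) shows what is really being compared.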
