You should swap the X and Y loops (Y outer, X inner) so that memory is accessed sequentially. Pixel buffers are stored row by row, so iterating over X in the inner loop touches adjacent pixels. Since rounding is slow, you could convert aSin and aCos to integers. To keep some precision, multiply them by a power of two before rounding (i.e. use fixed point), and then use shifts instead of rounding inside the loops. You should also consider inlining those SDL functions, and perhaps accept some limitations, such as not supporting 8-bit color.