Ok guys. I removed the SSE4 instructions from the "texturesampler_bilinear" procedure because, as we know, they caused small problems on AMD processors, and replaced them with a faster lookup table that calculates the in-tile addresses for bilinear sample fetching. FPS jumped from 26 to 32. With point sampling it is about 36-37 FPS, so it's a nice speedup.
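For anyone curious what such a lookup table can look like, here is a minimal sketch in C. It is not the actual code from the project: the 8x8 tile size, the struct, and all names (TileAddressLut, tile_lut_init, tile_address, tile_lut_free) are assumptions made for illustration. The idea is to precompute one offset per x coordinate and one per y coordinate, so the tiled address becomes two table reads and one add.

```c
#include <stdint.h>
#include <stdlib.h>

/* Assumed tile size: 8x8 texels. The real sampler may use a different one. */
#define TILE_SHIFT 3u
#define TILE_SIZE  (1u << TILE_SHIFT)
#define TILE_MASK  (TILE_SIZE - 1u)

/* Hypothetical lookup tables: one texel offset per x and one per y. */
typedef struct {
    uint32_t width, height;   /* both must be multiples of TILE_SIZE */
    uint32_t *x_offset;       /* tile column start + column inside the tile */
    uint32_t *y_offset;       /* tile row start + row inside the tile */
} TileAddressLut;

/* Builds the tables once per texture. Returns 0 on success, -1 on failure. */
static int tile_lut_init(TileAddressLut *lut, uint32_t width, uint32_t height)
{
    const uint32_t tiles_per_row   = width >> TILE_SHIFT;
    const uint32_t texels_per_tile = TILE_SIZE * TILE_SIZE;
    uint32_t x, y;

    lut->width    = width;
    lut->height   = height;
    lut->x_offset = malloc(width  * sizeof *lut->x_offset);
    lut->y_offset = malloc(height * sizeof *lut->y_offset);
    if (!lut->x_offset || !lut->y_offset) {
        free(lut->x_offset);
        free(lut->y_offset);
        lut->x_offset = lut->y_offset = NULL;
        return -1;
    }
    for (x = 0; x < width; ++x)
        lut->x_offset[x] = (x >> TILE_SHIFT) * texels_per_tile + (x & TILE_MASK);
    for (y = 0; y < height; ++y)
        lut->y_offset[y] = (y >> TILE_SHIFT) * tiles_per_row * texels_per_tile
                         + (y & TILE_MASK) * TILE_SIZE;
    return 0;
}

/* Texel index of (x, y) in the tiled texture: two loads and one add. */
static inline uint32_t tile_address(const TileAddressLut *lut,
                                    uint32_t x, uint32_t y)
{
    return lut->x_offset[x] + lut->y_offset[y];
}

static void tile_lut_free(TileAddressLut *lut)
{
    free(lut->x_offset);
    free(lut->y_offset);
    lut->x_offset = lut->y_offset = NULL;
}
```

A bilinear fetch would call tile_address four times, for (x, y), (x+1, y), (x, y+1) and (x+1, y+1), after clamping or wrapping the coordinates. The tables hide the tile-border cases, so the sampler needs no branches for them.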
I compared the tiled bilinear sampler against the linear bilinear sampler (linear being the standard in-memory image representation on PC), and the speed stayed almost the same. The linear representation of the texture was a bit slower because it is less cache-friendly. A tiled texture pays off for big textures: at high resolutions the speed doesn't drop as fast as with the linear (standard) representation. Of course, calculating the sample address from texture coordinates is much simpler in the linear case, but the cache pollution is much bigger and causes much bigger slowdowns.
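To put rough numbers on the cache argument, here is a small sketch in C that computes the byte offset of a texel in both layouts. The 8x8 tile size, 4 bytes per texel, and the function names are assumptions for illustration, not taken from the project.

```c
#include <stdint.h>

/* Assumed layout parameters, for illustration only. */
enum { TILE = 8, BYTES_PER_TEXEL = 4 };

/* Linear (row-major) layout: trivial address, but rows are far apart. */
static uint32_t linear_offset(uint32_t x, uint32_t y, uint32_t width)
{
    return (y * width + x) * BYTES_PER_TEXEL;
}

/* Tiled layout: tiles are stored one after another, each tile row-major.
   width must be a multiple of TILE. */
static uint32_t tiled_offset(uint32_t x, uint32_t y, uint32_t width)
{
    uint32_t tiles_per_row = width / TILE;
    uint32_t tile_index    = (y / TILE) * tiles_per_row + (x / TILE);
    uint32_t in_tile       = (y % TILE) * TILE + (x % TILE);
    return (tile_index * TILE * TILE + in_tile) * BYTES_PER_TEXEL;
}
```

For a texture 2048 texels wide, the two rows of a bilinear 2x2 footprint are 8192 bytes apart in the linear layout, and that distance grows with the texture width. In the tiled layout they are 32 bytes apart whenever both rows fall inside the same tile, no matter how wide the texture is. That fits the observation that the tiled sampler loses less speed on high-resolution textures.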

https://sourceforge.net/projects/phenomenon/