use only part of a texture

strangerranger · 24-01-2004, 10:58 PM

Is there a way to use only part of a texture, e.g. to stretch just this part onto PowerDraw?
I have a huge tgaimage (2048x204

and need to use only a part of it (for scrolling).

One workaround would be to create a buffer TGA image, copy the needed part onto it, and then stretch this buffer. Do I have to do the first part pixel by pixel, or is there a more efficient way?

cheers,
strangerranger

**jansoft** · 26-01-2004, 11:40 AM

Use the TextureMap method of the TPowerDraw object. But you must first convert your TGA into a TAGFImage.

This is part of the code I am using:
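[pascal]
// Draws the Size-sized region of Img that starts at Src to position Dest.
procedure BitBlt(Dest, Src, Size: TPoint; Img: TAGFImage);
var
  c: TTexCoord;
begin
  if (Size.x = 0) or (Size.y = 0) then Exit;  // nothing to draw
  // Select the source rectangle inside the texture.
  c.SrcX := Src.x;
  c.SrcY := Src.y;
  c.Width := Size.x;
  c.Height := Size.y;
  c.Pattern := 0;
  c.Flip := False;
  c.Mirror := False;
  PowerDraw.TextureMap(Img, pBounds4(Dest.x, Dest.y, Size.x, Size.y), cWhite4, c, effectSrcAlpha);
end;
[/pascal]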

strangerranger · 26-01-2004, 12:19 PM

Thanks a lot, jansoft.
That was easier than I expected.

Is there an equally easy way to draw a texture onto another texture?
I managed to write a pixel-by-pixel routine (two address pointers), but it is quite slow compared to PowerDraw's texture-drawing routines (about a factor of 10).
Can one draw textures onto each other using DX or D3D accelerated routines?

thanks,
strangerranger

**jansoft** · 26-01-2004, 03:35 PM

I don't know of any way to do this. In VTDbManager I use my own procedure to bitBlt in-memory RGBA data, but it is part of another procedure, so I can't post the source code.
In principle it is a pixel-by-pixel routine, but with optimised memory access:

- lock once, do everything, then unlock
- use block memory moves (rows of 4-byte pixels)
- precompute offset values such as (dest.x-src.x)*4
- and so on...
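
The points above can be sketched roughly as follows. This is only an illustration on plain in-memory arrays, not the VTDbManager code and not a PowerDraw call: CopyRect32 is a hypothetical helper, and with real images the pointers and pitches would presumably come from a single Lock before the call, followed by a single Unlock after it. It should compile with Delphi or Free Pascal.

[pascal]
program RowCopySketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

const
  SrcW = 8;  SrcH = 4;    // source size in pixels (example values)
  DstW = 16; DstH = 8;    // destination size in pixels (example values)

// Hypothetical helper: copies a W x H block of 32-bit pixels between two
// buffers whose rows start SrcPitch and DstPitch bytes apart.
procedure CopyRect32(Src: PByte; SrcPitch: Integer;
                     Dst: PByte; DstPitch: Integer;
                     W, H: Integer);
var
  Row: Integer;
begin
  for Row := 0 to H - 1 do
  begin
    Move(Src^, Dst^, W * 4);  // one block move per row, 4 bytes per pixel
    Inc(Src, SrcPitch);       // step to the start of the next source row
    Inc(Dst, DstPitch);       // step to the start of the next destination row
  end;
end;

var
  Src: array[0..SrcW * SrcH - 1] of LongWord;
  Dst: array[0..DstW * DstH - 1] of LongWord;
  I: Integer;
begin
  for I := 0 to High(Src) do
    Src[I] := I;                       // each pixel holds its own index
  FillChar(Dst, SizeOf(Dst), 0);

  // Copy a 4 x 2 block from source pixel (2, 1) to destination pixel (5, 3).
  // The start offsets are computed once, outside the copy loop.
  CopyRect32(@Src[1 * SrcW + 2], SrcW * 4,
             @Dst[3 * DstW + 5], DstW * 4, 4, 2);

  WriteLn(Dst[3 * DstW + 5]);  // top-left of the block: source index 10
  WriteLn(Dst[4 * DstW + 8]);  // bottom-right of the block: source index 21
end.
[/pascal]

Because each row is moved as one block, the per-pixel address arithmetic disappears entirely; only two pointer increments per row remain.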

strangerranger · 26-01-2004, 08:32 PM

The code below is 20 times slower than drawing a texture using TextureMap. It simply copies the whole srcpic (type TAGFImage) onto destpic.
Is this the limit, or are there any further tweaks?

regards,
strangerranger

[pascal]
srcpic.Lock(0, r0);
destpic.Lock(0, r1);

x := srcpic.PatternWidth - 1;
y := srcpic.PatternHeight - 1;
for i := 0 to x do
  for j := 0 to y do
  begin
    // The address of every pixel is recomputed from scratch.
    ScrPtr := Pointer(Integer(r0.Bits) + (j * r0.Pitch) + (i * 4));
    DestPtr := Pointer(Integer(r1.Bits) + (r1.Pitch * j) + (i * 4));
    PInteger(DestPtr)^ := Integer(ScrPtr^);
  end;

srcpic.Unlock(0);
destpic.Unlock(0);
[/pascal]

**Paulius** · 26-01-2004, 10:17 PM

You probably won't get much of a speed improvement from this, but a more optimized version of your code would look like this:
[pascal]
srcpic.Lock(0, r0);
destpic.Lock(0, r1);

x := srcpic.PatternWidth - 1;
y := srcpic.PatternHeight - 1;
// Bytes to skip at the end of each row. Both use the source width,
// because that is how far each pointer advances within a row.
PitchSrc := r0.Pitch - srcpic.PatternWidth shl 2;
PitchDst := r1.Pitch - srcpic.PatternWidth shl 2;
ScrPtr := r0.Bits;
DestPtr := r1.Bits;
for j := 0 to y do  // rows outermost: walk memory in storage order
begin
  for i := 0 to x do
  begin
    PInteger(DestPtr)^ := PInteger(ScrPtr)^;  // copy one 32-bit pixel
    Inc(Cardinal(ScrPtr), 4);
    Inc(Cardinal(DestPtr), 4);
  end;
  Inc(Cardinal(ScrPtr), PitchSrc);
  Inc(Cardinal(DestPtr), PitchDst);
end;

srcpic.Unlock(0);
destpic.Unlock(0);
[/pascal]

strangerranger · 26-01-2004, 11:07 PM

Paulius' code raised my throughput from 2*10^7 to 1*10^8 pixels per second.
Apparently it takes a huge amount of time to read .Bits and .Pitch.

Now I've got half the performance of TextureMap. I can live with that, since bitBlt doesn't use D3D9-accelerated features.

Thanks for your help,
strangerranger

**Paulius** · 27-01-2004, 09:47 AM

"Apparently it takes a huge amount of time to read .Bits and .Pitch."

Nope, reading them should be like reading any other variable. To explain the speedup: even without counting hidden additions and the like, the first variant visibly had 4 extra multiplications and 2 extra additions per iteration, so for a 256x256 texture (65536 pixels) we'd get 262144 extra multiplications and 131072 extra additions. (Things like r0.Bits turn into the address of r0 plus the offset of Bits, so it is better to store them in a variable before the loop to avoid unnecessary additions.) Another thing is that it's better to loop over the data in the order it is stored in memory (images are stored in rows) to minimize cache misses.

strangerranger · 27-01-2004, 10:55 AM

I guess this code has to be run by the CPU (instead of the GPU).
So on my 2.8 GHz P4, it took about 140 cycles per pixel (2.8e9 Hz / 2e7 pixels per second) to do the 4 + and 4 * operations.
Getting rid of the * operations (and going in the right direction) reduced it to 28 cycles per pixel (2.8e9 Hz / 1e8 pixels per second).

I think the P4 pipeline is 20 stages long, so one would expect about 20 cycles saved per multiplication.

Makes sense.
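
The cycle estimate can be checked with a few lines. This sketch assumes the clock speed and the two throughput figures quoted in this thread (2*10^7 and 1*10^8 pixels per second); the constant names are made up for the example.

[pascal]
program CyclesPerPixel;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

const
  ClockHz  = 2.8e9;  // 2.8 GHz P4
  SlowRate = 2.0e7;  // pixels per second, per-pixel address computation
  FastRate = 1.0e8;  // pixels per second, pointer-stepping loop

begin
  // Cycles per pixel = clock frequency / pixels per second.
  WriteLn('slow loop: ', ClockHz / SlowRate:0:0, ' cycles per pixel');  // 140
  WriteLn('fast loop: ', ClockHz / FastRate:0:0, ' cycles per pixel');  // 28
end.
[/pascal]

That is a saving of 112 cycles per pixel between the two loops.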

strangerranger · 27-01-2004, 01:04 PM

Now that pixel addressing is working so fast:
If I want to draw a pixel depending on the alpha value of its top-left neighbour (or anything similar), this would not be a problem.
But it would have to be done by the CPU, although I think pixel shaders were developed for exactly that purpose.

Does anyone know a way (using PowerDraw) to let pixel shaders do that work?