1) Alpha Blending 2) Calculating pixel address offset 16bit



Gadget
25-01-2003, 11:24 AM
How can I 'blend' surfaces together?

I have a simple interface image that I want to appear semi-translucent... Any ideas how I do this?


Also, I have written a routine for a weather system: I can draw rain / snow particles on one of my surfaces and blit it over the main display. The problem is that I don't quite understand how to work out the byte offset for pixels in 16-bit mode in DirectX.

Alimonster
25-01-2003, 12:52 PM
This post assumes that you're referring to DirectDraw and not Direct3D.

You have to blend surfaces by hand in DirectDraw, which means locking them and going over the pixels. You'd read the pixels, break them up into r, g, and b components, do some calculations on them, then recombine the results. This ain't the fastest thing out there. You will get more speed with MMX, so try having a look at this article at Gamedev (note: it's in C++ - if that's a problem, let me know): MMX Enhanced Blending (http://www.gamedev.net/reference/articles/article817.asp). There may be other blending articles at Gamedev; I forget.
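A minimal sketch of that break-up / calculate / recombine step for a single pixel, assuming the surface is in the common 16-bit 5-6-5 layout (check the surface's pixel format before relying on that). BlendPixel565 and its 0..256 Alpha range are made-up names and conventions for illustration, not part of DirectDraw:

```pascal
program BlendSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

// Blend two RGB565 pixels. Alpha runs from 0 (all Dest) to 256 (all Src).
// Assumption: 5 bits red, 6 bits green, 5 bits blue.
function BlendPixel565(Src, Dest: Word; Alpha: Integer): Word;
var
  sr, sg, sb, dr, dg, db, r, g, b: Integer;
begin
  // Break both pixels up into their components.
  sr := (Src shr 11) and $1F;
  sg := (Src shr 5) and $3F;
  sb := Src and $1F;
  dr := (Dest shr 11) and $1F;
  dg := (Dest shr 5) and $3F;
  db := Dest and $1F;

  // Weighted average per channel.
  r := (sr * Alpha + dr * (256 - Alpha)) shr 8;
  g := (sg * Alpha + dg * (256 - Alpha)) shr 8;
  b := (sb * Alpha + db * (256 - Alpha)) shr 8;

  // Recombine into one 16-bit pixel.
  Result := Word((r shl 11) or (g shl 5) or b);
end;

begin
  // White over black at roughly 50%.
  WriteLn(BlendPixel565($FFFF, $0000, 128));
end.
```

Doing this per pixel is the slow, general case; a fixed 50% blend can be done with a mask and a shift instead, which is much cheaper.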

As for calculating the offsets... have a look at the pixel manipulation part of my yet-to-be-finished DDraw tutorial: http://www.alistairkeys.co.uk/ddraw2.shtml (Btw: the layout may screw up in IE 5 due to its broken box model implementation -- if so, upgrade to a less broken browser such as IE 6 or Mozilla).
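In short, the usual rule is: byte offset = y * pitch + x * bytes per pixel, where the pitch is the lPitch value from the locked surface description (in bytes, and possibly larger than width * 2). A rough sketch that fakes a surface with a plain array; the sizes and the PixelByteOffset helper are invented for the example:

```pascal
program OffsetSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

const
  SurfHeight = 3;
  SurfPitch  = 12;  // bytes per row; hypothetical, wider than 4 pixels * 2 bytes
                    // to mimic the padding a real surface may have

var
  // Stand-in for the locked surface memory, viewed as 16-bit words.
  Surface: array[0..(SurfPitch div 2) * SurfHeight - 1] of Word;

// Byte offset of pixel (X, Y) on a 16-bit surface.
function PixelByteOffset(X, Y, PitchBytes: Integer): Integer;
begin
  Result := Y * PitchBytes + X * 2;
end;

begin
  // When indexing an array of Word, halve the byte offset
  // (the same thing as using lPitch shr 1 as the row stride).
  Surface[PixelByteOffset(3, 2, SurfPitch) shr 1] := $F800;
  WriteLn(PixelByteOffset(3, 2, SurfPitch));
end.
```

The important part is to use the pitch and not the width when stepping from one row to the next.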

Gadget
25-01-2003, 07:55 PM
Thanks very much for those :) I tried the simple 50% blending routine and it works fine :) (tested with TBitmap) I think it will be too slow though to keep it for what I have in mind :P Will have to try coding that MMX example on the same page... Bah, there looks to be loads of code in the MMX routine, seems hard to believe it could be more efficient than that one line of code for simple 50/50 blend. Will check out your site now as this pixel offset is driving me nuts.

Is there any way of stopping my PC locking due to the Win16Mutex crap?

Alimonster
25-01-2003, 08:35 PM
Try adding the flag DDLOCK_NOSYSLOCK when locking your surface. Can't say I've tried it myself, though, so I'm not sure how it affects performance or anything else.

Gadget
25-01-2003, 09:56 PM
I will look into that! As for those alpha blending routines... Bah @ alpha blending lol! I started translating the C++ MMX sample and ran into problems with unsupported ASM instructions (minor problem), but also lack of MMX register addresses etc...

Using that 50/50 simple routine my engine drops from 60fps to 5fps, and I am only making the top 1/4 of the screen 50% transparent! I think I might leave the interface without alpha and just use that alpha routine for tiny images.

Alimonster
26-01-2003, 12:47 AM
Out of interest, how fast does this (http://www.alistairkeys.co.uk/new_dissolve.zip) (224K) run on your comp? If it's any value >= 5 FPS then your blending could probably be improved, especially since the above is doing much more fancy stuff than just alpha-blending the screen (which it does too, though). It doesn't use MMX yet. [It works best with white backgrounds; if you press 'z' to get rid of the picture then it looks pretty snazzy].

Yes, this is sort of irrelevant and I'm showing off. :wink:

You probably would be better off with a non-translucent interface. Save most of the CPU cycles for the game :).

Gadget
26-01-2003, 02:45 PM
LOL @ 110 FPS... How on earth can I optimize this:- (other than using MMX)


dwSrcOffset := (ddsdSrc.lPitch shr 1);
dwDestOffset := (ddsdDest.lPitch shr 1);

for iY := 0 to ddsdSrc.dwHeight - 1 do
begin
  for iX := 0 to ddsdSrc.dwWidth - 1 do
  begin
    if SrcBuf[iX + iY * dwSrcOffset] <> $0000 then
    begin
      DestBuf[iX + iY * dwDestOffset] :=
        ((SrcBuf[iX + iY * dwSrcOffset] and $F7DE) shr 1) +
        ((DestBuf[iX + iY * dwDestOffset] and $F7DE) shr 1);
    end;
  end;
end;

Alimonster
26-01-2003, 04:45 PM
Warning: this post contains vague, fuzzy optimisation info. :P

First of all, and before all else, ensure that you're working with a system memory surface rather than a video memory surface. This is very important. Reading from a video memory surface is most definitely not a good plan. If possible, specify DDLOCK_READONLY or DDLOCK_WRITEONLY when locking, as they sometimes help.

The first trick is based on the assumption that your GUI will be relatively static. It's not applicable if it changes all the time. If it's static then you can precompute the blending amount for that half of the equation - instead of calculating it each time you'd read back the 50% blended value. That's potentially half of your work cut out right there! Of course, this assumption breaks if your GUI constantly changes (maybe it will, maybe not). If it changes a little then you'd still get away with recalculating the values on each change. Of course, there's a cut-off point to this. Repeat: store your GUI in 50% blended format, rather than 100% normal format, if possible, so you can simply add it directly to the other side of the equation without any more thought.
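A small sketch of the pre-blended idea, assuming 5-6-5 pixels and the same $F7DE mask as the 50/50 routine. The arrays, their sizes and the Half565 helper are invented for the example; a real version would read from the locked surfaces:

```pascal
program PreHalvedSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

const
  N = 4;  // hypothetical pixel count, just for the demo

var
  Gui, HalfGui, Dest: array[0..N - 1] of Word;
  i: Integer;

// Halve every channel of an RGB565 pixel. The mask drops the lowest bit
// of each channel so the shift cannot leak into the neighbouring channel.
function Half565(P: Word): Word;
begin
  Result := (P and $F7DE) shr 1;
end;

begin
  for i := 0 to N - 1 do
  begin
    Gui[i] := $FFFF;   // pretend GUI pixels
    Dest[i] := $0000;  // pretend back buffer pixels
  end;

  // Done once, or whenever the GUI changes: store the GUI at 50%.
  for i := 0 to N - 1 do
    HalfGui[i] := Half565(Gui[i]);

  // Done every frame: only the destination still needs halving.
  for i := 0 to N - 1 do
    Dest[i] := HalfGui[i] + Half565(Dest[i]);

  WriteLn(Dest[0]);
end.
```

One thing to watch: a very dark GUI pixel can halve to 0, so if 0 is the transparent colour, the transparency test has to be made against the original GUI data (or stored separately) rather than the halved copy.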

Next, the array index (iX + iY * dwSrcOffset, and likewise for the destination) is recomputed every time a buffer is indexed, even though it doesn't change within one pass of the inner x loop. You can calculate it once and reuse it in multiple places inside there. That'll save you 3 muls and 3 additions per inner loop, which is quite handy. If you think about it, each pixel will be a successively higher array index: array[0], array[1], and so on. There's no reason to recalculate it each pixel. Instead, you'd initialise it to 0 outside of all the loops and then inc it each inner x loop (if the pitch is wider than the row, also add the difference at the end of each row to skip the padding). Much simpler! As always, though, compare before and after FPS to ensure that it *is* an improvement - don't assume! You can also use pointers directly to the elements and inc them instead - this sometimes helps.

Looping down to zero can give a speed boost: "for ix := 0 to whatever - 1 do" becomes "for ix := whatever - 1 downto 0 do". This is a little micro-optimisation, though, so it may not buy you much. It's only possible if you don't rely on array indices (which would go back to front). Instead, you'd use pointers to the first element and inc them in the inner loop so that they still go the same way.
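Here is roughly what the running-index version could look like, as a sketch on plain arrays rather than locked surfaces. BlendRows and its parameters are invented for the example; the pitches are given in words (i.e. lPitch shr 1), and the test data is made up:

```pascal
program RunningIndexSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

// 50/50 blend of Src over Dest, skipping Src pixels equal to 0.
// SrcPitch and DestPitch are in words and may be larger than Width.
procedure BlendRows(const Src: array of Word; var Dest: array of Word;
  Width, Height, SrcPitch, DestPitch: Integer);
var
  x, y, si, di: Integer;
begin
  si := 0;
  di := 0;
  for y := 0 to Height - 1 do
  begin
    for x := 0 to Width - 1 do
    begin
      if Src[si] <> 0 then
        Dest[di] := ((Src[si] and $F7DE) shr 1) + ((Dest[di] and $F7DE) shr 1);
      // Step to the next pixel: no multiply needed.
      Inc(si);
      Inc(di);
    end;
    // Skip any padding between the end of this row and the start of the next.
    Inc(si, SrcPitch - Width);
    Inc(di, DestPitch - Width);
  end;
end;

const
  // 2x2 source image with one padding word per row (pitch = 3 words).
  Src: array[0..5] of Word = ($FFFF, $0000, $AAAA, $FFFF, $FFFF, $AAAA);

var
  Dest: array[0..3] of Word;  // 2x2 destination, no padding (pitch = 2 words)
  i: Integer;

begin
  for i := 0 to 3 do
    Dest[i] := 0;
  BlendRows(Src, Dest, 2, 2, 3, 2);
  for i := 0 to 3 do
    WriteLn(Dest[i]);
end.
```

Whether this actually beats the multiply version depends on the compiler and CPU, so time both.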

You might also want to unroll the inner loop. This sometimes helps, but sometimes doesn't. Unrolling is simply repeating your inner loop code several times so that you do more work per loop. A quick example:

for i := 1 to 100 do
begin
  something
end;

//becomes

for i := 1 to 25 do
begin
  something
  something
  something
  something
end;

This can help because it reduces the loop overhead - you have roughly a quarter of the checking-of-loop-vars (remember that the loop has to check if it's finished each iteration!). Test this first; sometimes you'll blow the cache by making your inner loops too large, which makes things slower!

It's not a good plan to have an if statement in an inner loop - a mispredicted branch is very slow. The CPU is always grabbing various things prematurely in the expectation that they'll be used. This gives a speed boost if the branch is taken and the stuff can be used as expected, but if it's not then the CPU has to chuck out the prematurely grabbed info (a speed hit) and get the real info. The CPU has branch prediction, which attempts to predict whether a jump will be taken, but it's not perfect. If you are going to have if statements in the inner loop then try to have the if do the least likely case first - for example, if it's more likely that you'll have blended rather than empty pixels, you might say "if this_pixel = 0 then continue". In general, "if something_unlikely then do_unlikely_thing else do_likely_thing" or, better yet, "do_likely_thing; if very_unlikely_thing then begin undo_likely_thing; do_very_unlikely_thing; end;". The above arrangement of if/else is from my memory; confirm it yourself rather than believing me immediately, please.

It's probably possible to get rid of that if statement with a little precalc (maybe fiddle it so that it blends to only the underlying surface with full intensity, i.e. no blending). You can also consider using run-length encoding to get rid of the if statement. I can't think of a nice way to explain this, however. You'd fiddle with the surface so that it had the number of consecutive transparent pixels, the number of non-transparent pixels, and the values themselves. You could code it like this (pseudocode)

var this_index := 0;
var however_many := read_amount_of_transparent_pixels(surface);
inc(this_index, however_many);

however_many := read_amount_of_gui_pixels(surface);
for i := 0 to however_many - 1 do
begin
  blend_this_pixel(this_index);
  inc(this_index);
end;

(Repeat for each block of unused then blended run of pixels.)

You could probably optimise the above significantly, of course. The idea with that pseudocode is to avoid an if statement - you'll know how many blended/not blended pixels there are in sequences, so you don't have to check each one and you can directly skip over the empty ones!
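For something closer to compilable code, here is one possible sketch of the run-length idea for a single row. The encoding is invented for the example: pairs of (transparent count, GUI count), each followed by that many GUI pixels stored already halved, with the runs adding up to exactly the row width. BlendRleRow and the demo data are likewise hypothetical:

```pascal
program RleBlendSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

// Blend one run-length encoded GUI row onto Dest at 50%.
// Assumed layout of Rle: skip count, pixel count, then that many pre-halved
// RGB565 pixels, repeated until the runs cover exactly Width pixels.
procedure BlendRleRow(const Rle: array of Word; var Dest: array of Word;
  Width: Integer);
var
  r, d, i, count: Integer;
begin
  r := 0;
  d := 0;
  while d < Width do
  begin
    // Jump straight over the transparent run: no per-pixel test.
    Inc(d, Rle[r]);
    Inc(r);
    // Blend the run of GUI pixels that follows.
    count := Rle[r];
    Inc(r);
    for i := 1 to count do
    begin
      Dest[d] := Rle[r] + ((Dest[d] and $F7DE) shr 1);
      Inc(d);
      Inc(r);
    end;
  end;
end;

const
  // Row of 6: 2 transparent, 3 GUI pixels (half-white), 1 transparent.
  Row: array[0..6] of Word = (2, 3, $7BEF, $7BEF, $7BEF, 1, 0);

var
  Dest: array[0..5] of Word;
  i: Integer;

begin
  for i := 0 to 5 do
    Dest[i] := $FFFF;
  BlendRleRow(Row, Dest, 6);
  for i := 0 to 5 do
    WriteLn(Dest[i]);
end.
```

The encoding step itself is only worth the effort if the GUI changes rarely, since the runs have to be rebuilt whenever it does.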

By far the most important tip: DO NOT ASSUME ANYTHING. Test it! For example, I assumed for my effect that precalculating a certain var would be quicker, but it wasn't ;). Sometimes, reading from memory isn't as quick as recalculating because of the memory transfer speed -- however, sometimes reading from memory w/ a precalc is quicker!

The only way to be certain is to try out the different possibilities. Sometimes, the fastest method isn't the most obvious. As an aside, you could consider using the GDI/VCL as a quick test cradle. It'll be affected by the blit speed, but you should still be able to try out different ideas and see whether they are a speed boost. The real trick after this, though, is transferring those results to DX :).

Here's the main loop from my effect (after the convolution filter and other stuff)... the actual blending is something similar to this: "this_pixel := FTransPic[y * BITMAP_WIDTH + x] + FTransValues[DestPixel^];" This is the precalc bit I talked about - it was quicker doing the y * BITMAP_WIDTH + x in this case rather than having a var set to 0 and inc'ing it. Unintuitive! The background picture is stored in a pre-blended format (you load it up and store the 50% version of it, rather than the proper picture). The particle palette is also precalc'ed - it's stored as 256 different 50% colours. As a result, the blending is reduced to some reading and an addition, rather than anything more complicated. Woot.

I don't know whether the above is tremendously helpful, mind you, since my effect had certain constraints (e.g. background not changing). Constraints are always a massive boost for optimisation since you can precalc a bunch of stuff. In your case, the main source for optimisation is the GUI not changing often - aim there, precalc if possible.

Bear in mind that system memory blits are much slower than video memory blits. You might want to use MMX to copy over the results, four pixels at a time, onto your back buffer (rather than using a standard Blt).

I wish you the best of luck! If it's possible for my effect to be > 100FPS using 32-bit and the GDI, you should be able to do blending at a suitable speed w/ DX, which rocks for pixel manipulation.

Gadget
26-01-2003, 09:21 PM
Wow, that's an excellent reply ;) That will help tremendously with most stuff in future. I always assumed most of what I wrote could not be optimized much... I didn't think of storing one of the images part blended, and I also did not realize about the video memory either! I had always assumed that reading and writing to vid ram was much faster. You have opened my eyes!

Useless Hacker
27-01-2003, 10:56 AM
Gadget wrote: "As for those alpha blending routines... Bah @ alpha blending lol! I started translating the C++ MMX sample and ran into problems with unsupported ASM instructions (minor problem), but also lack of MMX register addresses etc..."

Which version of Delphi are you using? I think Delphi 6 & 7 have support for MMX instructions built in. For earlier versions, there are utilities which will convert the MMX instructions into machine code for you, such as this (http://www.yks.ne.jp/~hori/MMXasm-e.html) one by Hori (the maker of DelphiX).