I'm guilty of not profiling my code. I just do my best: I write each new processor-intensive task in a separate test app, set an arbitrary usage pattern for the test that stays fixed from then on, and get it running as quickly as I can before I get fed up with optimizing.

That way, if I ever decide I need to optimize some more, I can just go back to any suspiciously expensive test app and poke and prod it a bit more.
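For illustration, one of those test apps might look roughly like the Python sketch below. Everything in it is an assumption made up for the example: `task` is a hypothetical stand-in for the expensive code (here just a sort), and `make_workload` and `best_time` are invented helper names. The seeded random generator is what keeps the usage pattern fixed between runs, so a timing taken today can be compared with one taken after later tweaks.

```python
import random
import time


def make_workload(size=10_000, seed=42):
    """Build the same input on every run so timings stay comparable."""
    rng = random.Random(seed)  # fixed seed = fixed usage pattern
    return [rng.randint(0, 1_000_000) for _ in range(size)]


def task(data):
    """Hypothetical stand-in for the processor-intensive task under test."""
    return sorted(data)


def best_time(func, data, repeats=5):
    """Return the fastest wall-clock time in seconds over several runs."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func(data)
        timings.append(time.perf_counter() - start)
    return min(timings)


if __name__ == "__main__":
    data = make_workload()
    print(f"best of 5: {best_time(task, data):.6f} s")
```

Taking the best of several runs is just one way to damp noise from whatever else the machine is doing; it says nothing about how the task behaves inside the full system.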

Obviously this won't work for any task with multiple steps that can't be divided out into separate test apps due to mutual dependence, and it's not a true test of a typical usage pattern in the system as a whole.

But it works for me and encourages a good modular design.

Edit: I'm informed that this technique is similar to extreme programming (http://www.extremeprogramming.org/).

An interesting idea.