That tutorial said to prevent clipping by calculating a volume multiplier: divide 256 (the number of levels in an 8-bit unsigned sample) by the number of voices, then multiply the samples by that. That's practically the same thing as calculating the average of the samples, which is what I was already doing (except that my way is much slower)... so it didn't really help.
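
To show what I mean by averaging, here is a minimal sketch in C. The function name `mix_average_u8` and the buffer layout (one array per voice, unsigned 8-bit samples) are my own assumptions for illustration, not anything taken from the tutorial. Dividing the sum by the voice count gives the same result as scaling every sample by 1/voices first, apart from rounding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: mix `voices` unsigned 8-bit streams by averaging.
   in[v][i] is sample i of voice v; out receives `frames` mixed samples. */
void mix_average_u8(const uint8_t *const *in, size_t voices,
                    size_t frames, uint8_t *out)
{
    assert(voices > 0);
    for (size_t i = 0; i < frames; i++) {
        unsigned sum = 0;
        for (size_t v = 0; v < voices; v++)
            sum += in[v][i];
        /* sum <= 255 * voices, so the average always fits in 8 bits. */
        out[i] = (uint8_t)(sum / voices);
    }
}
```

This can never clip, but every voice is always attenuated by the full voice count, even when the other voices are silent.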

I wonder if I should implement some sort of "dynamic compressor" that analyzes the sum of the samples ahead of time and lowers the volume when it notices that clipping _would_ occur. But that seems very slow, and I don't think it's needed for normal mixing anyway...
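
For reference, the look-ahead idea could be sketched roughly like this in C. Everything here is an assumption made for illustration: the name `limit_lookahead`, the `LIMIT` constant, the `release` step, and the choice to work on an already summed buffer of signed samples centered on zero. It is a crude version, not a proper compressor: the gain drops in one step as soon as a too-loud peak enters the look-ahead window, then creeps back up by `release` per sample.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define LIMIT 127  /* assumed ceiling for a signed 8-bit output sample */

/* Hypothetical sketch: scale a summed mix so it never exceeds +/-LIMIT.
   sum[i] is the raw sum of all voices at sample i (signed, zero-centered). */
void limit_lookahead(const int *sum, size_t frames, size_t lookahead,
                     float release, int8_t *out)
{
    float gain = 1.0f;
    assert(release > 0.0f);
    for (size_t i = 0; i < frames; i++) {
        /* Find the loudest sample from i up to i + lookahead. */
        int peak = 0;
        for (size_t j = i; j < frames && j - i <= lookahead; j++) {
            int mag = abs(sum[j]);
            if (mag > peak)
                peak = mag;
        }
        /* Largest gain that keeps that peak within the limit. */
        float target = peak > LIMIT ? (float)LIMIT / (float)peak : 1.0f;
        /* Recover slowly, but never rise above the safe target. */
        gain += release;
        if (gain > target)
            gain = target;
        out[i] = (int8_t)(int)((float)sum[i] * gain);
    }
}
```

The inner scan makes this cost roughly frames times lookahead operations, which is the slowness I was worried about. A running-maximum structure would cut that down, but I have not tried it.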