Ah, I thought it crashed when you tried the code in FPC. LDDQU is only useful on the P4 Prescott, where it avoids the cache-line split penalty on unaligned loads (some interesting reading about this: http://x264dev.multimedia.cx/?p=8). And if you're sure the data is aligned, it doesn't matter anyway.
Btw, doesn't Delphi have an Align() function, or is that an FPC-only feature?
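In case it helps, here is a rough sketch in C of the round-up arithmetic an align helper typically performs. This is an illustration, not the actual FPC or Delphi implementation: `align_up` and `align_ptr` are hypothetical names, and the code assumes the alignment is a non-zero power of two (e.g. 16 for aligned SSE loads).

```c
#include <assert.h>
#include <stdint.h>

/* Round addr up to the next multiple of alignment.
   Assumes alignment is a non-zero power of two (16 for SSE). */
static uintptr_t align_up(uintptr_t addr, uintptr_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    /* Adding alignment-1 and masking off the low bits rounds up;
       an already-aligned address is returned unchanged. */
    return (addr + alignment - 1) & ~(alignment - 1);
}

/* Pointer variant: first address >= p that is a multiple of alignment. */
static void *align_ptr(void *p, uintptr_t alignment)
{
    return (void *)align_up((uintptr_t)p, alignment);
}
```

Typical use: over-allocate the buffer by 15 bytes, call `align_ptr(buf, 16)`, and work from the returned pointer. Then every 16-byte load from it is aligned, so MOVDQA works and LDDQU buys you nothing.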