pascal and learning 3d

**phibermon** · 30-01-2013, 07:34 PM

Oh and anybody who doesn't think that record alignment is important for OpenGL, you could not be more wrong. In situations like loading single/dual channel image data or using uniform buffers, you'll very quickly discover that you need to set up either your record alignment or OpenGL's packing/unpacking options. You might not have come across these issues, but that'll be because the default alignment on your platform matches that of your hardware/OpenGL driver implementation (see the std140 block layout for an example). Deal with the more exotic GL features on multiple architectures and it matters a lot.
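To put a number on the image case, here's a rough sketch of how many bytes OpenGL expects per row of GL_UNSIGNED_BYTE pixel data for a given GL_UNPACK_ALIGNMENT (the default is 4). PaddedRowSize is just a name I made up for illustration, it's not part of any GL header.

```pascal
program RowStride;

{$IFDEF FPC}{$mode objfpc}{$ENDIF}

// Hypothetical helper: bytes per image row as OpenGL reads them for
// byte-sized components, given GL_UNPACK_ALIGNMENT (1, 2, 4 or 8).
function PaddedRowSize(Width, Channels, Alignment: Integer): Integer;
begin
  // Round the real row size up to the next multiple of Alignment.
  Result := ((Width * Channels + Alignment - 1) div Alignment) * Alignment;
end;

begin
  // 3 pixels wide, single channel: 3 bytes of real data per row.
  WriteLn(PaddedRowSize(3, 1, 4)); // 4: default alignment expects one pad byte per row
  WriteLn(PaddedRowSize(3, 1, 1)); // 3: after glPixelStorei(GL_UNPACK_ALIGNMENT, 1)
end.
```

So a tightly packed 3-pixel-wide single channel image only uploads correctly if you either pad each row yourself or set the unpack alignment to 1.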

You can't just use things like NumX * sizeof(X) and expect the layout on the target hardware to be the same.

Oh and I've come across plenty of formats that store arrays of structures that must be tightly packed (aligned to a 1-byte boundary). If Pascal pads a field out with a few bytes for optimized memory access and you just stream the bytes from a file across the array, you'll have written your last byte from the file before you reach the last byte in the array, and every element after the first will be misaligned.
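A quick way to see this for yourself is to compare SizeOf for a padded and a packed version of the same record. The record below is a made-up file entry, not from any real format, and the size of the non-packed one depends on your compiler and alignment settings, which is exactly the problem.

```pascal
program PackedDemo;

{$IFDEF FPC}{$mode objfpc}{$ENDIF}

type
  // Hypothetical on-disk entry: 1 + 4 bytes, stored with no padding in the file.
  TEntryPadded = record
    Kind: Byte;
    Value: LongWord;
  end;

  // Same fields, but the compiler is told not to insert padding.
  TEntryPacked = packed record
    Kind: Byte;
    Value: LongWord;
  end;

begin
  // Depends on compiler/alignment settings (commonly larger than 5).
  WriteLn('default record: ', SizeOf(TEntryPadded), ' bytes');
  // Always 5, matching the file layout.
  WriteLn('packed record:  ', SizeOf(TEntryPacked), ' bytes');
  // Only Count * SizeOf(TEntryPacked) is the right number of bytes
  // to stream from the file straight into an array.
end.
```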

Unfortunately, because of Intel CPUs and things like standards, most things are just 4-byte aligned, so many programmers never learn about it and will one day spend weeks trying to find the bug.

The point is that records are optimized by the compiler so that memory operations operate as fast as possible.

Assuming 32-bit floats (Single), it's actually faster on Intel hardware to load TVec3 data (4+4+4 bytes) that's aligned to TVec4 (4+4+4+4 bytes) into OpenGL with a stride parameter than it is to load tightly packed TVec3 (assuming you've got hardware that comes with decent drivers). That's because the vec3 data will only be aligned to vec4 boundaries on the hardware anyway, so any space you save by tightly packing your data you lose in A) system read performance and B) the GPU unpacking operation.
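For reference, a Single is 4 bytes, so a TVec3 is 12 bytes and a TVec4 is 16. Something like the sketch below is what I mean by vec3 data in vec4-sized slots. TPaddedVec3 is a made-up name, and the GL call is only in a comment so this compiles without any GL units.

```pascal
program Vec3Stride;

{$IFDEF FPC}{$mode objfpc}{$ENDIF}

type
  TVec3 = packed record
    X, Y, Z: Single;
  end;

  TVec4 = packed record
    X, Y, Z, W: Single;
  end;

  // Hypothetical padded vertex: a TVec3 sitting in a TVec4-sized slot.
  TPaddedVec3 = packed record
    V: TVec3;
    Pad: Single; // unused, only there to reach 16 bytes
  end;

begin
  WriteLn(SizeOf(TVec3));       // 12
  WriteLn(SizeOf(TVec4));       // 16
  WriteLn(SizeOf(TPaddedVec3)); // 16
  // The stride you would hand to OpenGL is SizeOf(TPaddedVec3), e.g.
  // glVertexAttribPointer(Index, 3, GL_FLOAT, GL_FALSE, SizeOf(TPaddedVec3), nil);
end.
```

The explicit Pad field is one way to do it. The point is just that the stride is 16 while the attribute size stays 3.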

You will only see a performance difference between a packed and non-packed record/array if the combined size of the record/array elements doesn't already fall on the boundary. If it doesn't, then packing the data will slow memory operations.

In fact, on the latest cards it's pretty pointless to use anything but a vec4. It takes no longer to copy the data (the bus is essentially transferring your vec3 to the GPU in a vec4 'box') and it's all operating on 32/64byte vectors in silicon.
