Thread: pascal and learning 3d

  1. #37
    phibermon (PGD Staff / News Reporter)
    Join Date
    Sep 2009
    Location
    England
    Posts
    524
    Oh, and anybody who doesn't think that record alignment is important for OpenGL: you could not be more wrong. In situations like loading single/dual channel image data or using uniform buffers, you'll very quickly discover that you need to set up either your record alignment or OpenGL's packing/unpacking options. You might not have come across these issues, but that'll be because the default alignment on your platform matches that of your hardware/OpenGL driver implementation (see the std140 block layout for an example). Deal with the more exotic GL features on multiple architectures, though, and it matters a lot.
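To make the image-loading case concrete, here's a minimal sketch in C (the language the GL headers are written in) of how far OpenGL steps between rows of client pixel data for a given `GL_UNPACK_ALIGNMENT`. The helper name `gl_row_stride` is made up for illustration, and the sketch assumes `GL_UNPACK_ROW_LENGTH` is left at 0 and the usual 1-, 2- or 4-byte component types.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of bytes OpenGL advances per row when reading
   client pixel data, for a given GL_UNPACK_ALIGNMENT (valid values are
   1, 2, 4 and 8; the default is 4). Assumes GL_UNPACK_ROW_LENGTH == 0. */
static size_t gl_row_stride(size_t width, size_t bytes_per_pixel, size_t alignment)
{
    assert(alignment == 1 || alignment == 2 || alignment == 4 || alignment == 8);

    size_t row_bytes = width * bytes_per_pixel;

    /* Round the raw row size up to the next multiple of the alignment. */
    return (row_bytes + alignment - 1) / alignment * alignment;
}
```

So a 3-pixel-wide, single-channel, 8-bit image has 3 bytes per row in memory, but with the default alignment of 4 the driver reads it as if each row were 4 bytes long. If your rows are tightly packed, either pad them yourself or call `glPixelStorei(GL_UNPACK_ALIGNMENT, 1)` before the upload.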

    You can't just use things like NumX * sizeof(X) and expect the layout on the target hardware to be the same.

    Oh, and I've come across plenty of formats that store arrays of structures that must be tightly packed (aligned to a 1-byte boundary). If Pascal pads a field out with a few bytes for optimized memory access and you just stream the bytes from the file across the array, you'll have written your last byte from the file before you reach the last byte in the array, and every element after the first will be shifted out of place.
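Here's a rough C sketch of that failure mode and one way around it. The entry layout (a 1-byte tag followed by a 4-byte value) is invented for illustration, the sizes in the comments are what you'd typically see on x86 and ARM, and the value is assumed to be stored in the host's byte order. In Pascal terms, `PackedEntry` plays the role of a `packed record` and `NaturalEntry` the role of a plain `record`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical on-disk entry: no padding between the fields. */
#pragma pack(push, 1)
typedef struct {
    uint8_t  tag;
    uint32_t value;
} PackedEntry;              /* 5 bytes */
#pragma pack(pop)

/* Same fields with default alignment: the compiler typically inserts
   3 padding bytes after 'tag' so that 'value' starts on a 4-byte boundary. */
typedef struct {
    uint8_t  tag;
    uint32_t value;
} NaturalEntry;             /* typically 8 bytes */

/* Copy 'count' tightly packed entries from raw file bytes into naturally
   aligned structs, one field at a time, instead of streaming the bytes
   straight across the destination array. */
static void unpack_entries(const uint8_t *src, NaturalEntry *dst, size_t count)
{
    assert(src != NULL && dst != NULL);

    for (size_t i = 0; i < count; ++i) {
        PackedEntry tmp;
        memcpy(&tmp, src + i * sizeof(PackedEntry), sizeof tmp);
        dst[i].tag   = tmp.tag;
        dst[i].value = tmp.value;
    }
}
```

Two entries take 10 bytes in the file but 16 bytes in the padded array, which is exactly why a single block read into the padded array lands everything after the first tag in the wrong place.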

    Unfortunately, because of Intel CPUs and things like standards, most things are just 4-byte aligned, so many programmers never learn about it and will one day spend weeks trying to find the bug.

    The point is that the compiler pads records so that memory operations run as fast as possible.

    Assuming 32-bit floats (Single), it's actually faster on Intel hardware to load TVec3 data (4+4+4 bytes) that's aligned to TVec4 (4+4+4+4 bytes) into OpenGL with a stride parameter than it is to load tightly packed TVec3 (assuming you've got hardware that comes with decent drivers). The Vec3 data will only be aligned to vec4 boundaries on the hardware anyway, so any space you save by tightly packing your data you lose in A) system read performance and B) the GPU unpacking operation.
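As a small C sketch of what "vec3 aligned to vec4 with a stride" looks like in practice: the type and function names below are made up for illustration, and the sizes assume 4-byte floats. Nothing here measures performance; it only shows the layout.

```c
#include <assert.h>
#include <stddef.h>

/* Tightly packed position: 12 bytes per element. */
typedef struct { float x, y, z; } Vec3;

/* Same position padded out to a vec4-sized slot: 16 bytes per element.
   The 'pad' component is never read by a 3-component attribute. */
typedef struct { float x, y, z, pad; } Vec3Padded;

/* Copy tightly packed vec3 data into 16-byte slots. */
static void pad_vec3_array(const Vec3 *src, Vec3Padded *dst, size_t count)
{
    assert(src != NULL && dst != NULL);

    for (size_t i = 0; i < count; ++i) {
        dst[i].x   = src[i].x;
        dst[i].y   = src[i].y;
        dst[i].z   = src[i].z;
        dst[i].pad = 0.0f;
    }
}

/* When describing the padded buffer to OpenGL, the attribute is still
   3 floats wide; only the stride changes, e.g.
   glVertexAttribPointer(index, 3, GL_FLOAT, GL_FALSE,
                         sizeof(Vec3Padded), (void *)0);               */
```

The attribute size stays at 3 and the stride becomes `sizeof(Vec3Padded)` (16) instead of `sizeof(Vec3)` (12), so every element starts on a 16-byte boundary.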

    You will only see a performance difference between a packed and a non-packed record/array if the combined size of the record/array elements doesn't already fall on the boundary. If it doesn't, then packing the data will slow memory operations.

    It is, in fact, pretty pointless on the latest cards to use anything but a vec4. It takes no longer to copy the data (the bus is essentially transferring your vec3 to the GPU in a vec4 'box') and it's all operating on 32/64-byte vectors in silicon.
    Last edited by phibermon; 30-01-2013 at 08:05 PM.
    When the moon hits your eye like a big pizza pie - that's an extinction level impact event.
