
Thread: Surprise! Why multiplication by inline const may work 3 times slower in 64-bit code

  1. #1
    Quote Originally Posted by SilverWarior View Post
    Are you testing this on only one machine, or on multiple different machines?
    Ka-whoops!
    I was only testing on an i5 2450m

    Let's try @ Ryzen 7 5800X...
    ...multiplication by an inline constant not wrapped in a type-cast (and thus using the FPU) is 2.7 times slower in 32-bit code (0.81 vs 2.18 GFLOPS) and about 4.1 times slower in 64-bit code (0.468 vs 1.94 GFLOPS).
    So on Ryzens this hits even harder, affecting 32-bit code as well as 64-bit.
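    The two constant variants also produce different checksums in the output below, which is what you would expect if the untyped constant is multiplied at a wider precision and the result is only then rounded to single. Here is a minimal Python sketch of that rounding effect. It is not the Free Pascal code from determchk: the names to_single, mul_wide_const and mul_single_const are made up for illustration, and the wide path is modelled with double precision (an assumption; the x87 FPU may well use 80-bit extended instead).

```python
import struct

# The constant from the benchmark, held here as a double-precision value.
PI = 3.141592653589793


def to_single(value: float) -> float:
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


def mul_wide_const(x: float) -> float:
    """Model of x * 3.14159...: multiply at double precision, then round to single."""
    return to_single(to_single(x) * PI)


def mul_single_const(x: float) -> float:
    """Model of x * float(3.14159...): round the constant to single first, then multiply."""
    # The product of two single-precision values is exact in a double,
    # so the final to_single() is the only rounding of the result.
    return to_single(to_single(x) * to_single(PI))


if __name__ == "__main__":
    for x in (1.0, 5.0):
        print(x, mul_wide_const(x).hex(), mul_single_const(x).hex())
```

    For x = 1.0 both functions return the same value, while for x = 5.0 they differ by one unit in the last place. Differences like that are enough to change an md5 checksum taken over the whole float range.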

    P.S. You can try it yourself, as I mentioned before
    (note: make sure your browser doesn't rewrite http to https, since I still haven't fixed my server's Let's Encrypt setup and the https version has an invalid certificate)
    pure source http://chentrah.chebmaster.com/downloads/determchk.zip (7Kb)
    with binaries compiled for x86 and x86-64 using both Free Pascal 3.2.2 and Free Pascal 2.6.4 : http://chentrah.chebmaster.com/downl...thbinaries.zip (199Kb)

    http://chentrah.chebmaster.com/downl...ple_output.txt
    :
    Microsoft Windows [Version 10.0.19044.2965]
    (c) Microsoft Corporation. All rights reserved.

    x:\stuff\determchk>determchk_322_x86.exe

    Determinism checker, built using 3.2.2 for Win32 i386
    (c) 2016, 2023 ChebMaster
    This program calculates md5 checksums over the entire float range
    (4 billion something calculations per formula) to test
    if reproducibility is possible using Free Pascal
    -----------------------------------------
    Init timer...
    Setting hardware timer to 1ms... Ok
    Setting THREAD_PRIORITY_TIME_CRITICAL... Ok
    Measuring TSC frequency... Ok
    Resetting thread priority back to normal... Ok
    Calling timeEndPeriod(1)...Ok
    Ultra-res timer at 4,19 GHz (error of 0,239 nanoseconds)
    -----------------------------------------

    ..checking round(x) (-1 million to +1 million)
    .................................
    ..ok, in 8 (pure 1) seconds (1,21 GFLOPS)
    ..md5 checksum = 71AD5C546C02DCE7A1804554B2ACE0BA

    ..checking trunc(x) (-1 million to +1 million)
    .................................
    ..ok, in 8 (pure 1) seconds (1,21 GFLOPS)
    ..md5 checksum = A5AEE527EC2F8F587A5294C5D9D999A7

    ..checking frac(x) (-1 million to +1 million)
    .................................
    ..ok, in 14 (pure 6,43) seconds (0,188 GFLOPS)
    ..md5 checksum = CA2119DA4E2ECEC02F00B78116120B86

    ..checking sin(x) (0 to Pi)
    .................................
    ..ok, in 42 (pure 35,2) seconds (0,0304 GFLOPS)
    ..md5 checksum = 4DE8EFC27CBB692E5E3DEB7A7E561EAB

    ..checking fake quick sin() (0 to Pi)
    .................................
    ..ok, in 30 (pure 23,9) seconds (0,0447 GFLOPS)
    ..md5 checksum = 78E20BDF40F0D2352EFB0F50427AAFC0

    ..checking tricky fake quick sin() based on Trunc() instead of Frac() (0 to Pi)
    .................................
    ..ok, in 14 (pure 7,7) seconds (0,139 GFLOPS)
    ..md5 checksum = 78E20BDF40F0D2352EFB0F50427AAFC0

    ..checking x * y (two values)
    .................................
    ..ok, in 14 (pure 0,919) seconds (2,32 GFLOPS)
    ..md5 checksum = 3D703727DCD17C3EDCE64B89560A98E9

    ..checking float(x * y) (two values wrapped in type-cast)
    .................................
    ..ok, in 14 (pure 0,916) seconds (2,32 GFLOPS)
    ..md5 checksum = 3D703727DCD17C3EDCE64B89560A98E9

    ..checking x * 3.141592653589793 (inline const)
    .................................
    ..ok, in 31 (pure 5,18) seconds (0,81 GFLOPS)
    ..md5 checksum = 0FC3738303DEA3CFC8C6F7AFBF585BE6

    ..checking x * float(3.141592653589793) (inline const with type-cast)
    .................................
    ..ok, in 27 (pure 1,92) seconds (2,18 GFLOPS)
    ..md5 checksum = 9CA6E7B818FA046C3DAE722C35196729

    ..checking 1/x
    .................................
    ..ok, in 15 (pure 1,61) seconds (1,32 GFLOPS)
    ..md5 checksum = 00144058D1BFF4A090304684F39E6020

    ..checking sqrt(x)
    .................................
    ..ok, in 16 (pure 2,53) seconds (0,844 GFLOPS)
    ..md5 checksum = 10B012DFF8522837F45FBC1DA821B545

    ..checking 1/sqrt(x)
    .................................
    ..ok, in 17 (pure 4,31) seconds (0,494 GFLOPS)
    ..md5 checksum = 7BA70F1439D5E2955151CC565477E924

    ..checking SSE SIMD4 1/sqrt(x)
    .................................
    ..ok, in 14 (pure 1,06) seconds (2 GFLOPS)
    ..md5 checksum = 7BA70F1439D5E2955151CC565477E924

    ..checking SSE SIMD4 RSQRTPS (packed quick reverse square root)
    .................................
    ..ok, in 13 (pure 0,274) seconds (7,78 GFLOPS)
    ..md5 checksum = EF9B294032F7BA3051A1025B06EA3C96


    Press Enter to close.


    x:\stuff\determchk>determchk_322_x86-64.exe

    Determinism checker, built using 3.2.2 for Win64 x86_64
    (c) 2016, 2023 ChebMaster
    This program calculates md5 checksums over the entire float range
    (4 billion something calculations per formula) to test
    if reproducibility is possible using Free Pascal
    -----------------------------------------
    Init timer...
    Setting hardware timer to 1ms... Ok
    Setting THREAD_PRIORITY_TIME_CRITICAL... Ok
    Measuring TSC frequency... Ok
    Resetting thread priority back to normal... Ok
    Calling timeEndPeriod(1)...Ok
    Ultra-res timer at 4,2 GHz (error of 0,238 nanoseconds)
    -----------------------------------------

    ..checking round(x) (-1 million to +1 million)
    .................................
    ..ok, in 10 (pure 0,642) seconds (1,88 GFLOPS)
    ..md5 checksum = 71AD5C546C02DCE7A1804554B2ACE0BA

    ..checking trunc(x) (-1 million to +1 million)
    .................................
    ..ok, in 10 (pure 0,683) seconds (1,77 GFLOPS)
    ..md5 checksum = A5AEE527EC2F8F587A5294C5D9D999A7

    ..checking frac(x) (-1 million to +1 million)
    .................................
    ..ok, in 16 (pure 6,31) seconds (0,191 GFLOPS)
    ..md5 checksum = CA2119DA4E2ECEC02F00B78116120B86

    ..checking sin(x) (0 to Pi)
    .................................
    ..ok, in 17 (pure 8,37) seconds (0,128 GFLOPS)
    ..md5 checksum = 4DE8EFC27CBB692E5E3DEB7A7E561EAB

    ..checking fake quick sin() (0 to Pi)
    .................................
    ..ok, in 11 (pure 2,06) seconds (0,52 GFLOPS)
    ..md5 checksum = 78E20BDF40F0D2352EFB0F50427AAFC0

    ..checking tricky fake quick sin() based on Trunc() instead of Frac() (0 to Pi)
    .................................
    ..ok, in 10 (pure 1,58) seconds (0,677 GFLOPS)
    ..md5 checksum = 78E20BDF40F0D2352EFB0F50427AAFC0

    ..checking x * y (two values)
    .................................
    ..ok, in 18 (pure 1,07) seconds (2 GFLOPS)
    ..md5 checksum = 3D703727DCD17C3EDCE64B89560A98E9

    ..checking float(x * y) (two values wrapped in type-cast)
    .................................
    ..ok, in 18 (pure 1,07) seconds (2 GFLOPS)
    ..md5 checksum = 3D703727DCD17C3EDCE64B89560A98E9

    ..checking x * 3.141592653589793 (inline const)
    .................................
    ..ok, in 43 (pure 8,96) seconds (0,468 GFLOPS)
    ..md5 checksum = 0FC3738303DEA3CFC8C6F7AFBF585BE6

    ..checking x * float(3.141592653589793) (inline const with type-cast)
    .................................
    ..ok, in 36 (pure 2,16) seconds (1,94 GFLOPS)
    ..md5 checksum = 9CA6E7B818FA046C3DAE722C35196729

    ..checking 1/x
    .................................
    ..ok, in 19 (pure 1,7) seconds (1,25 GFLOPS)
    ..md5 checksum = 00144058D1BFF4A090304684F39E6020

    ..checking sqrt(x)
    .................................
    ..ok, in 20 (pure 2,56) seconds (0,833 GFLOPS)
    ..md5 checksum = 10B012DFF8522837F45FBC1DA821B545

    ..checking 1/sqrt(x)
    .................................
    ..ok, in 22 (pure 4,33) seconds (0,492 GFLOPS)
    ..md5 checksum = 7BA70F1439D5E2955151CC565477E924

    ..checking SSE SIMD4 1/sqrt(x)
    Unknown check kind (dck_sse_one_div_sqrt)
    ..checking SSE SIMD4 RSQRTPS (packed quick reverse square root)
    Unknown check kind (dck_sse_rsqrtps)

    Press Enter to close.


    x:\stuff\determchk>
    Note that the checksum of each check should match across all platforms and CPU models, *except* for the SSE SIMD4 RSQRTPS one. That one will have a different checksum on each CPU model, so it is not deterministic and thus not suitable for physics, but it is very useful for secondary things like animation.
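    To make the checksum idea concrete, here is a minimal Python sketch of the general technique: feed consecutive single-precision bit patterns through a formula and hash the raw result bytes. It is not the determchk source. The function names, the starting bit pattern and the tiny range are assumptions chosen for illustration, so its checksums will not match the ones printed above.

```python
import hashlib
import struct


def checksum(formula, first_bits: int, count: int) -> str:
    """MD5 over the single-precision results of formula for consecutive float bit patterns."""
    md5 = hashlib.md5()
    for bits in range(first_bits, first_bits + count):
        # Reinterpret the 32-bit integer as a single-precision float.
        x = struct.unpack("<f", struct.pack("<I", bits))[0]
        # Store the result as single precision and hash its raw bytes.
        md5.update(struct.pack("<f", formula(x)))
    return md5.hexdigest().upper()


def double_by_mul(x: float) -> float:
    return x * 2.0


def double_by_add(x: float) -> float:
    return x + x


def times_three(x: float) -> float:
    return x * 3.0


if __name__ == "__main__":
    start = 0x3F800000  # bit pattern of 1.0
    print(checksum(double_by_mul, start, 1000))
    print(checksum(double_by_add, start, 1000))
    print(checksum(times_three, start, 1000))
```

    Two implementations that agree bit for bit (here x * 2.0 and x + x) give the same checksum, just like the two "fake quick sin" variants in the output above, while any formula whose results differ gives a different one.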

    P.S. Now I'm itching to unearth my Core2 Duo rig and check there as well (only the 32-bit version, alas: although the CPU itself is 64-bit, the WinXP reigning there is not).
    Last edited by SilverWarior; 20-05-2023 at 12:42 PM. Reason: Fixed second download link
